CN115942108A - Video processing method and electronic equipment - Google Patents
- Publication number
- CN115942108A (application CN202110927102.8A)
- Authority
- CN
- China
- Prior art keywords
- audio
- sound
- audio signal
- electronic device
- zoom magnification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
- H04N5/92—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
Abstract
A video processing method and an electronic device are provided. By implementing the video processing method of this application, when the electronic device generates a video, it can zoom the image according to changes in the zoom magnification and can also zoom the audio according to those changes. The audio zooming performed by the electronic device includes: when the zoom magnification increases and the field of view decreases, the sound of subjects outside the imaging range is suppressed and the sound of subjects within the imaging range is enhanced; when the zoom magnification decreases and the field of view increases, the suppression of the sound of subjects outside the imaging range is reduced and the enhancement of the sound of subjects within the imaging range is reduced.
Description
Technical Field
The present application relates to the field of terminals and communication technologies, and in particular, to a video processing method and an electronic device.
Background
With the development of electronic devices, more and more of them provide an image zoom function when recording video. During video recording, image zooming refers to changing the size of a subject in the captured image by changing the zoom magnification. Even if the subject's position relative to the electronic device does not change, increasing the zoom magnification makes the subject appear larger when the electronic device displays it in the video, giving the user the feeling that the subject is closer; decreasing the zoom magnification makes the subject appear smaller, giving the user the feeling that the subject is farther away. In this way, the subject that really needs to be shown can be highlighted in the recorded video, which visually better meets the user's needs.
However, some electronic devices cannot perform corresponding processing on the audio while zooming the image in a video. As a result, the video recorded by such devices meets the user's needs visually but not aurally.
Therefore, how an electronic device can perform audio zooming is key to improving video quality and is a direction of ongoing research.
Disclosure of Invention
The present application provides a video processing method and an electronic device, so that a video recorded by the electronic device can achieve the effect of zooming the audio and the image simultaneously.
In a first aspect, the present application provides a video processing method applied to an electronic device, where the method includes:
the electronic device starts a camera; displays a preview interface, where the preview interface includes a first control; detects a first operation on the first control; in response to the first operation, starts shooting; displays a shooting interface, where the shooting interface includes a second control used to adjust the zoom magnification; at a first time, with the zoom magnification being a first zoom magnification, displays a first shot image, where the first shot image includes a first target object and a second target object; detects a second operation on the second control; in response to the second operation, adjusts the zoom magnification to a second zoom magnification, the second zoom magnification being greater than the first zoom magnification; at a second time, displays a second shot image that includes the first target object but does not include the second target object; at the second time, a microphone collects first audio, where the first audio includes a first sound and a second sound, the first sound corresponding to the first target object and the second sound corresponding to the second target object; detects a third operation on a third control; and in response to the third operation, stops shooting and saves a first video, where the first video includes the first shot image and, at the second time, the second shot image and second audio, the second audio being obtained by processing the first audio according to the second zoom magnification, the second audio including a third sound and a fourth sound, the third sound corresponding to the first target object and the fourth sound corresponding to the second target object, the third sound being enhanced relative to the first sound and the fourth sound being suppressed relative to the second sound.
In the above embodiments, the electronic device can zoom the image and the audio simultaneously while recording a video. When the zoom magnification increases, the subject that remains displayed on the screen appears larger, and at the same time that subject's sound is enhanced so that it sounds louder. For a subject no longer displayed on the screen, its sound is suppressed so that it sounds faint or becomes inaudible.
With reference to the first aspect, in one embodiment, the method further includes: the electronic device processes the first audio according to the second zoom magnification to obtain first output audio, where the first sound in the first output audio is unchanged and the second sound is suppressed; enhances the first output audio according to the second zoom magnification to obtain second output audio, where both the first sound and the second sound in the second output audio are enhanced; and, according to the second zoom magnification and in combination with the first audio, suppresses the second sound in the second output audio to obtain the second audio.
In the above embodiment, the electronic device determines, according to the second zoom magnification, the degree to which the target sound in the first audio is enhanced and the degree to which the non-target sound is suppressed, thereby implementing audio zooming.
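The three processing stages above can be sketched as a simple pipeline. This is an illustrative outline only; the function names and stage interfaces are assumptions, not taken from the patent:

```python
import numpy as np

def audio_zoom(first_audio, zoom, beamform, enhance, suppress):
    """Hypothetical three-stage audio-zoom pipeline:
    1) beamform:  keep the target sound, suppress off-axis sound;
    2) enhance:   amplify the whole signal by a zoom-dependent gain;
    3) suppress:  filter the amplified signal to re-suppress non-target sound,
                  using the raw microphone audio as a reference."""
    first_output = beamform(first_audio, zoom)    # target unchanged, non-target suppressed
    second_output = enhance(first_output, zoom)   # both sounds amplified
    second_audio = suppress(second_output, first_audio)
    return second_audio
```

The concrete beamforming, enhancement, and suppression steps are sketched in the embodiments below.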
With reference to the first aspect, in one embodiment, the first output audio includes one channel of audio; processing the first audio according to the second zoom magnification to obtain the first output audio specifically includes: the electronic device acquires a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction within the range from 10 degrees clockwise to 70 degrees clockwise of the direction directly in front of the electronic device; the second direction is any direction within the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction within the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device; a first beam corresponding to the first direction is obtained by combining the first filter coefficient with the first audio; a second beam corresponding to the second direction is obtained by combining the second filter coefficient with the first audio; a third beam corresponding to the third direction is obtained by combining the third filter coefficient with the first audio; and the electronic device obtains the first output audio from the first beam, the second beam, and the third beam according to the second zoom magnification, where the first sound in the first output audio is unchanged and the second sound is suppressed.
In the above embodiment, the electronic device processes the first audio with the filter coefficients to obtain a first beam, a second beam, and a third beam, and then fuses the three beams according to the second zoom magnification to obtain the first output audio, so that the target sound remains unchanged and the non-target sound is suppressed.
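A minimal sketch of this beamform-and-fuse step, assuming the first audio is a multi-microphone STFT of shape (mics, freq_bins) and each filter is a complex weight array of the same shape. The fusion rule (side-beam weight shrinking as zoom grows) is an illustrative assumption, not the patent's formula:

```python
import numpy as np

def make_beam(filter_coeffs, audio_stft):
    # Weighted sum over the microphone axis yields one beam per frequency bin.
    return np.sum(filter_coeffs.conj() * audio_stft, axis=0)

def fuse_beams(beam_left, beam_center, beam_right, zoom):
    # Illustrative fusion: the side-beam weight shrinks as zoom grows past 1x,
    # so a large zoom keeps mostly the forward (center) beam.
    w_side = max(0.0, 1.0 - (zoom - 1.0))
    return (w_side * beam_left + beam_center + w_side * beam_right) / (1.0 + 2.0 * w_side)
```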
With reference to the first aspect, in one implementation, the second output audio includes one channel of audio; enhancing the first output audio according to the second zoom magnification to obtain the second output audio specifically includes: determining an adjustment parameter corresponding to the second zoom magnification, where the adjustment parameter is used to enhance the audio; converting the first output audio from the frequency domain to the time domain to obtain the first output audio in the time domain; enhancing the first output audio in the time domain with the adjustment parameter to obtain the second output audio in the time domain; and converting the second output audio in the time domain to the frequency domain as the second output audio.
In the above embodiment, the target sound in the first output audio changes from unchanged to enhanced, so that the target sound in the second output audio becomes louder as the zoom magnification increases.
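The frequency-domain/time-domain round trip above can be sketched per frame as follows. A plain real FFT is used for illustration; a real implementation would use an overlap-add STFT:

```python
import numpy as np

def enhance_frame(frame_spectrum, adjustment):
    # frequency -> time, apply the zoom-dependent adjustment parameter,
    # then time -> frequency again.
    time_frame = np.fft.irfft(frame_spectrum)
    time_frame = time_frame * adjustment
    return np.fft.rfft(time_frame)
```

Because the FFT is linear, scaling in the time domain scales every frequency bin equally, which matches the claim that both the first sound and the second sound in the second output audio are enhanced.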
With reference to the first aspect, in one implementation, the second audio includes one channel of audio; suppressing the second sound in the second output audio according to the second zoom magnification and in combination with the first audio to obtain the second audio specifically includes: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; combining, according to the frequency of the second output audio, part of the first target gain and part of the second target gain to obtain a third target gain; and suppressing the second sound in the second output audio with the third target gain to obtain the second audio.
In the above embodiment, both the non-target sound and the target sound in the second output audio are enhanced, and the second output audio can be filtered to suppress the non-target sound in it. Zelinski filtering filters high-frequency sound well, and the coherent-to-diffuse power ratio algorithm filters low-frequency sound well, so filtering the second output audio with a combination of the two algorithms improves the filtering effect.
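Combining the two per-bin gains into a third target gain can be sketched as a frequency split: below an assumed crossover bin the CDR-based gain is used, above it the Zelinski-based gain. The crossover index and gain values here are illustrative, not specified by the patent:

```python
import numpy as np

def combine_gains(gain_zelinski, gain_cdr, crossover_bin):
    # Low band takes the CDR-based gain, high band takes the Zelinski-based gain.
    gains = np.array(gain_cdr, dtype=float)
    gains[crossover_bin:] = np.asarray(gain_zelinski, dtype=float)[crossover_bin:]
    return gains

def apply_suppression(spectrum, gains):
    # Per-bin multiplicative gain suppresses the non-target sound.
    return np.asarray(spectrum) * gains
```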
With reference to the first aspect, in one embodiment, the second audio includes one channel of audio; suppressing the second sound in the second output audio according to the second zoom magnification and in combination with the first audio to obtain the second audio specifically includes: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; the electronic device obtains a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; and the electronic device suppresses the sound corresponding to high-frequency points in the second output audio with the first target gain and suppresses the sound corresponding to low-frequency points in the second output audio with the second target gain to obtain the second audio.
In the above embodiment, both the non-target sound and the target sound in the second output audio are enhanced, and the second output audio can be filtered to suppress the non-target sound in it. Zelinski filtering filters high-frequency sound well, and the coherent-to-diffuse power ratio algorithm filters low-frequency sound well, so filtering the second output audio with a combination of the two algorithms improves the filtering effect.
With reference to the first aspect, in one implementation, the first output audio includes left channel audio and right channel audio;
processing the first audio according to the second zoom magnification to obtain the first output audio specifically includes: acquiring a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction within the range from 10 degrees clockwise to 70 degrees clockwise of the direction directly in front of the electronic device; the second direction is any direction within the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction within the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device; a first beam corresponding to the first direction is obtained by combining the first filter coefficient with the first audio; a second beam corresponding to the second direction is obtained by combining the second filter coefficient with the first audio; a third beam corresponding to the third direction is obtained by combining the third filter coefficient with the first audio; the left channel audio of the first output audio is obtained from the first beam and the second beam according to the second zoom magnification; and the right channel audio of the first output audio is obtained from the second beam and the third beam, where the first sound in both the left channel audio and the right channel audio is unchanged and the second sound is suppressed.
In the above embodiment, the electronic device generates the left channel audio and the right channel audio from the first audio; the non-target sound in both the left channel audio and the right channel audio is suppressed, while the target sound is unchanged. Therefore, when the electronic device plays the second audio, a stereo effect can be achieved.
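The stereo split above can be sketched by mixing each side beam with the center beam. The mixing rule (side beams fading relative to the center beam as zoom grows) is an illustrative assumption; the patent does not specify the weights:

```python
import numpy as np

def stereo_from_beams(beam_left, beam_center, beam_right, zoom):
    # Left channel mixes the left-side beam with the center beam; right channel
    # mixes the right-side beam with the center beam. Side weight 1/zoom makes
    # the side beams fade as the zoom magnification grows.
    w_side = 1.0 / zoom
    left = (w_side * beam_left + beam_center) / (1.0 + w_side)
    right = (w_side * beam_right + beam_center) / (1.0 + w_side)
    return left, right
```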
With reference to the first aspect, in one implementation, the second output audio includes left channel audio and right channel audio; enhancing the first output audio according to the second zoom magnification to obtain the second output audio specifically includes: determining an adjustment parameter corresponding to the second zoom magnification, where the adjustment parameter is used to enhance the audio; converting the left channel audio and the right channel audio of the first output audio from the frequency domain to the time domain; enhancing each of them in the time domain with the adjustment parameter to obtain the left channel audio and the right channel audio of the second output audio in the time domain; and converting these back to the frequency domain as the second output audio.
In the above embodiment, the electronic device processes the left channel audio and the right channel audio separately, so that the target sound in both the left and right channel audio signals is enhanced.
With reference to the first aspect, in one embodiment, the second audio includes left channel audio and right channel audio; suppressing the second sound in the second output audio according to the second zoom magnification and in combination with the first audio to obtain the second audio specifically includes: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; combining, according to the frequency of the second output audio, part of the first target gain and part of the second target gain to obtain a third target gain; and suppressing, with the third target gain, the second sound in the left channel audio and the right channel audio of the second output audio, respectively, to obtain the left channel audio and the right channel audio of the second audio.
In the above embodiment, both the non-target sound and the target sound in the left channel and right channel audio signals are enhanced, and the left channel audio and the right channel audio can be filtered separately to suppress the non-target sound in them. Zelinski filtering filters high-frequency sound well, and the coherent-to-diffuse power ratio algorithm filters low-frequency sound well, so filtering the left channel audio and the right channel audio with a combination of the two algorithms improves the filtering effect.
With reference to the first aspect, in one implementation, the second audio includes left channel audio and right channel audio; suppressing the second sound in the second output audio according to the second zoom magnification and in combination with the first audio to obtain the second audio specifically includes: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; and suppressing, with the first target gain, the sound corresponding to high-frequency points in the left channel audio and the right channel audio of the second output audio, and suppressing, with the second target gain, the sound corresponding to low-frequency points in them, to obtain the left channel audio and the right channel audio of the second audio.
In the above embodiment, both the non-target sound and the target sound in the left channel and right channel audio signals are enhanced, and the left channel audio and the right channel audio can be filtered separately to suppress the non-target sound in them. Zelinski filtering filters high-frequency sound well, and the coherent-to-diffuse power ratio algorithm filters low-frequency sound well, so filtering the left channel audio and the right channel audio with a combination of the two algorithms improves the filtering effect.
With reference to the first aspect, in one implementation, the first filter coefficient, the second filter coefficient, and the third filter coefficient are preset in the electronic device. In the first filter coefficient, the coefficient corresponding to a sound signal in the first direction is 1, indicating that the sound signal in the first direction is not suppressed; the farther a sound signal's direction is from the first direction, the further its coefficient is from 1 and the greater the degree of suppression. In the second filter coefficient, the coefficient corresponding to a sound signal in the second direction is 1, indicating that the sound signal in the second direction is not suppressed; the farther a sound signal's direction is from the second direction, the further its coefficient is from 1 and the greater the degree of suppression. In the third filter coefficient, the coefficient corresponding to a sound signal in the third direction is 1, indicating that the sound signal in the third direction is not suppressed; the farther a sound signal's direction is from the third direction, the further its coefficient is from 1 and the greater the degree of suppression.
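The shape of such a direction-dependent coefficient can be illustrated as follows. The patent only states that the coefficient is 1 in the filter's look direction and that suppression grows with angular distance; the raised-cosine falloff and the 90-degree width used here are assumptions:

```python
import math

def coefficient_for(sound_dir_deg, look_dir_deg, width_deg=90.0):
    # 1.0 exactly in the look direction, decaying to 0 at width_deg away.
    delta = abs(sound_dir_deg - look_dir_deg)
    if delta >= width_deg:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * delta / width_deg))
```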
With reference to the first aspect, in one implementation, the adjustment parameter corresponding to the second zoom magnification is preset in the electronic device; the value of the adjustment parameter is positively correlated with the zoom magnification, and any second zoom magnification corresponds to exactly one adjustment parameter.
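A single-valued, monotone zoom-to-parameter mapping of this kind can be sketched as a preset lookup table with linear interpolation. The table values here are hypothetical, chosen only to illustrate the one-to-one, increasing relationship:

```python
# Hypothetical preset table: (zoom magnification, adjustment parameter).
ZOOM_GAIN_TABLE = [(1.0, 1.0), (2.0, 1.5), (5.0, 2.5), (10.0, 4.0)]

def adjustment_for(zoom):
    # Clamp below/above the table, interpolate linearly in between, so each
    # zoom magnification maps to exactly one adjustment parameter.
    zooms = [z for z, _ in ZOOM_GAIN_TABLE]
    gains = [g for _, g in ZOOM_GAIN_TABLE]
    if zoom <= zooms[0]:
        return gains[0]
    if zoom >= zooms[-1]:
        return gains[-1]
    for (z0, g0), (z1, g1) in zip(ZOOM_GAIN_TABLE, ZOOM_GAIN_TABLE[1:]):
        if z0 <= zoom <= z1:
            return g0 + (g1 - g0) * (zoom - z0) / (z1 - z0)
```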
In a second aspect, the present application provides an electronic device, including: one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code including computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform: starting a camera; displaying a preview interface, where the preview interface includes a first control; detecting a first operation on the first control; in response to the first operation, starting shooting; displaying a shooting interface, where the shooting interface includes a second control used to adjust the zoom magnification; at a first time, with the zoom magnification being a first zoom magnification, displaying a first shot image, where the first shot image includes a first target object and a second target object; detecting a second operation on the second control; in response to the second operation, adjusting the zoom magnification to a second zoom magnification, the second zoom magnification being greater than the first zoom magnification; at a second time, displaying a second shot image that includes the first target object but does not include the second target object; at the second time, a microphone collecting first audio, where the first audio includes a first sound and a second sound, the first sound corresponding to the first target object and the second sound corresponding to the second target object; detecting a third operation on a third control; and in response to the third operation, stopping shooting and saving a first video, where the first video includes the first shot image and, at the second time, the second shot image and second audio, the second audio being obtained by processing the first audio according to the second zoom magnification, the second audio including a third sound and a fourth sound, the third sound corresponding to the first target object and the fourth sound corresponding to the second target object, the third sound being enhanced relative to the first sound and the fourth sound being suppressed relative to the second sound.
In the above embodiments, the electronic device can zoom the image and the audio simultaneously while recording a video. When the zoom magnification increases, the subject that remains displayed on the screen appears larger, and at the same time that subject's sound is enhanced so that it sounds louder. For a subject no longer displayed on the screen, its sound is suppressed so that it sounds faint or becomes inaudible.
With reference to the second aspect, in one embodiment, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: processing the first audio according to the second zoom magnification to obtain first output audio, where the first sound in the first output audio is unchanged and the second sound is suppressed; enhancing the first output audio according to the second zoom magnification to obtain second output audio, where both the first sound and the second sound in the second output audio are enhanced; and, according to the second zoom magnification and in combination with the first audio, suppressing the second sound in the second output audio to obtain the second audio.
In the above embodiment, the electronic device determines, according to the second zoom magnification, the degree to which the target sound in the first audio is enhanced and the degree to which the non-target sound is suppressed, thereby implementing audio zooming.
With reference to the second aspect, in one embodiment, the first output audio includes one channel of audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: acquiring a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction within the range from 10 degrees clockwise to 70 degrees clockwise of the direction directly in front of the electronic device; the second direction is any direction within the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction within the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device; obtaining a first beam corresponding to the first direction by combining the first filter coefficient with the first audio; obtaining a second beam corresponding to the second direction by combining the second filter coefficient with the first audio; obtaining a third beam corresponding to the third direction by combining the third filter coefficient with the first audio; and obtaining the first output audio from the first beam, the second beam, and the third beam according to the second zoom magnification, where the first sound in the first output audio is unchanged and the second sound is suppressed.
In the above embodiment, the electronic device processes the first audio with the filter coefficients to obtain a first beam, a second beam, and a third beam, and then fuses the three beams according to the second zoom magnification to obtain the first output audio, so that the target sound remains unchanged and the non-target sound is suppressed.
With reference to the second aspect, in one embodiment, the second output audio includes one channel of audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: determining an adjustment parameter corresponding to the second zoom magnification, where the adjustment parameter is used to enhance the audio; converting the first output audio from the frequency domain to the time domain to obtain the first output audio in the time domain; enhancing the first output audio in the time domain with the adjustment parameter to obtain the second output audio in the time domain; and converting the second output audio in the time domain to the frequency domain as the second output audio.
In the above embodiment, the target sound in the first output audio changes from unchanged to enhanced, so that the target sound in the second output audio becomes louder as the zoom magnification increases.
With reference to the second aspect, in one embodiment, the second audio includes one channel of audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; combining, according to the frequency of the second output audio, part of the first target gain and part of the second target gain to obtain a third target gain; and suppressing the second sound in the second output audio with the third target gain to obtain the second audio.
In the above embodiment, both the non-target sound and the target sound in the second output audio are enhanced, and the second output audio can be filtered to suppress the non-target sound in it. Zelinski filtering filters high-frequency sound well, and the coherent-to-diffuse power ratio algorithm filters low-frequency sound well, so filtering the second output audio with a combination of the two algorithms improves the filtering effect.
With reference to the second aspect, in one embodiment, the second audio includes one channel of audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: performing Zelinski filtering on the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain from the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; and suppressing the sound corresponding to high-frequency points in the second output audio with the first target gain and suppressing the sound corresponding to low-frequency points in the second output audio with the second target gain to obtain the second audio.
In the above embodiment, both the non-target sound and the target sound in the second output audio have been enhanced, so the second output audio may be filtered to suppress the non-target sound in it. Zelinski filtering has a good filtering effect on high-frequency sound, and the coherent-to-diffuse power ratio algorithm has a good filtering effect on low-frequency sound, so filtering the second output audio with a combination of the two algorithms improves the filtering effect.
With reference to the second aspect, in one embodiment, the first output audio includes a left channel audio and a right channel audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: acquiring a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction in the range from 10 degrees clockwise of the direction directly in front of the electronic device to 70 degrees clockwise of that direction; the second direction is any direction in the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction in the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device; combining the first filter coefficient with the first audio to obtain a first beam corresponding to the first direction; combining the second filter coefficient with the first audio to obtain a second beam corresponding to the second direction; combining the third filter coefficient with the first audio to obtain a third beam corresponding to the third direction; obtaining, according to the second zoom magnification, the left channel audio in the first output audio by using the first beam and the second beam; and obtaining the right channel audio in the first output audio by using the second beam and the third beam, where the first sound in the left channel audio and in the right channel audio is unchanged and the second sound is suppressed.
In the above embodiment, the electronic device generates the left channel audio and the right channel audio by using the first audio; the non-target sound in both the left channel audio and the right channel audio is suppressed, while the target sound is unchanged. Therefore, when the electronic device plays the second audio, a stereo effect can be achieved.
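A minimal sketch of how three directional beams might be mixed into left and right channels as the zoom magnification changes; the linear blending weight and the function name are assumptions for illustration, not the patented formula:

```python
import numpy as np

def mix_stereo(beam_left: np.ndarray, beam_front: np.ndarray,
               beam_right: np.ndarray, zoom: float, max_zoom: float = 10.0):
    """Blend the front (center) beam into both channels as zoom grows,
    narrowing the stereo image toward the center of the frame."""
    w = (min(max(zoom, 1.0), max_zoom) - 1.0) / (max_zoom - 1.0)
    left = (1.0 - w) * beam_left + w * beam_front
    right = (1.0 - w) * beam_right + w * beam_front
    return left, right
```

At 1x zoom the channels keep their side beams intact; at maximum zoom both channels converge on the front beam, matching the narrowed field angle.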
With reference to the second aspect, in one embodiment, the second output audio includes left channel audio and right channel audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: determining an adjustment parameter corresponding to the second zoom magnification according to the second zoom magnification, wherein the adjustment parameter is used for enhancing the audio; converting the left channel audio and the right channel audio in the first output audio from the frequency domain to the time domain respectively to obtain the left channel audio and the right channel audio in the first output audio in the time domain; respectively enhancing the left channel audio and the right channel audio in the first output audio in the time domain by using the adjusting parameters to obtain a left channel audio and a right channel audio in the second output audio in the time domain; and converting the left channel audio and the right channel audio in the second output audio in the time domain into the frequency domain to be used as the second output audio.
In the above embodiment, the electronic device processes the left channel audio and the right channel audio separately, so that the target sound in both the left channel audio signal and the right channel audio signal is enhanced.
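The frequency-to-time round trip described above can be sketched as follows, with the adjustment parameter treated as a scalar; since the transforms are linear, scaling in the time domain is equivalent to scaling each frequency bin:

```python
import numpy as np

def enhance_channel(spectrum: np.ndarray, adjust: float) -> np.ndarray:
    """Frequency domain -> time domain, amplify by the adjustment
    parameter, then transform back to the frequency domain."""
    samples = np.fft.irfft(spectrum)   # to the time domain
    samples = samples * adjust         # time-domain enhancement
    return np.fft.rfft(samples)        # back to the frequency domain
```

This would be applied once to the left channel and once to the right channel of the first output audio.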
With reference to the second aspect, in one embodiment, the second audio includes a left channel audio and a right channel audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: performing Zelinski filtering by using the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain by using the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; combining, according to the frequency of the second output audio, a part of the first target gain and a part of the second target gain to obtain a third target gain; and suppressing, by using the third target gain, the second sound in the left channel audio and the right channel audio in the second output audio, respectively, to obtain the left channel audio and the right channel audio in the second audio.
In the above embodiment, both the non-target sound and the target sound in the left channel audio signal and the right channel audio signal have been enhanced, so the left channel audio and the right channel audio may be filtered separately to suppress the non-target sound in them. Zelinski filtering has a good filtering effect on high-frequency sound, and the coherent-to-diffuse power ratio algorithm has a good filtering effect on low-frequency sound, so filtering the left channel audio and the right channel audio with a combination of the two algorithms improves the filtering effect.
With reference to the second aspect, in one embodiment, the second audio includes a left channel audio and a right channel audio; the one or more processors are specifically configured to invoke the computer instructions to cause the electronic device to perform: performing Zelinski filtering by using the first audio to obtain a first target gain, where the first target gain is used to filter out the sound corresponding to high-frequency points in the second output audio; obtaining a second target gain by using the first audio based on a coherent-to-diffuse power ratio algorithm, where the second target gain is used to filter out the sound corresponding to low-frequency points in the second output audio; and suppressing, by using the first target gain, the sound corresponding to high-frequency points in the left channel audio and the right channel audio, and suppressing, by using the second target gain, the sound corresponding to low-frequency points in the left channel audio and the right channel audio, to obtain the left channel audio and the right channel audio in the second audio.
In the above embodiment, both the non-target sound and the target sound in the left channel audio signal and the right channel audio signal have been enhanced, so the left channel audio and the right channel audio may be filtered separately to suppress the non-target sound in them. Zelinski filtering has a good filtering effect on high-frequency sound, and the coherent-to-diffuse power ratio algorithm has a good filtering effect on low-frequency sound, so filtering the left channel audio and the right channel audio with a combination of the two algorithms improves the filtering effect.
In a third aspect, the present application provides an electronic device comprising: one or more processors and memory; the memory is coupled to the one or more processors and is configured to store computer program code comprising computer instructions that are invoked by the one or more processors to cause the electronic device to perform a method as described in the first aspect or any one of the embodiments of the first aspect.
In the above embodiments, the electronic device may zoom both the image and the audio simultaneously during the recording of a video. When the zoom magnification becomes larger, a subject still displayed on the screen appears larger, and at the same time the subject's sound is enhanced, so that it sounds louder. For a subject no longer displayed on the screen, the subject's sound is suppressed, so that it sounds quiet or is inaudible.
In a fourth aspect, the present application provides a chip system, which is applied to an electronic device, and the chip system includes one or more processors, and the processors are configured to invoke computer instructions to cause the electronic device to perform the method described in the first aspect or any one of the implementation manners of the first aspect.
In the above embodiments, during the process of recording a video, the electronic device may zoom the image and the audio simultaneously. When the zoom magnification becomes larger, a subject still displayed on the screen appears larger, and at the same time the subject's sound is enhanced, so that it sounds louder. For a subject no longer displayed on the screen, the subject's sound is suppressed, so that it sounds quiet or is inaudible.
In a fifth aspect, the present application provides a computer program product containing instructions, which when run on an electronic device, causes the electronic device to perform the method as described in the first aspect or any one of the implementation manners of the first aspect.
In the above embodiment, during the process of recording a video, the electronic device may zoom the image and the audio simultaneously. When the zoom magnification becomes larger, a subject still displayed on the screen appears larger, and at the same time the subject's sound is enhanced, so that it sounds louder. For a subject no longer displayed on the screen, the subject's sound is suppressed, so that it sounds quiet or is inaudible.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on an electronic device, cause the electronic device to perform the method described in the first aspect or any one of the implementation manners of the first aspect.
In the above embodiment, during the process of recording a video, the electronic device may zoom the image and the audio simultaneously. When the zoom magnification becomes larger, a subject still displayed on the screen appears larger, and at the same time the subject's sound is enhanced, so that it sounds louder. For a subject no longer displayed on the screen, the subject's sound is suppressed, so that it sounds quiet or is inaudible.
Drawings
FIGS. 1 a-1 e illustrate an exemplary set of user interfaces for image zooming when an electronic device records a video;
FIGS. 2-4 are exemplary sets of user interfaces for image zooming but without audio zooming in videos recorded by electronic devices;
FIGS. 5 a-5 d are a set of exemplary user interfaces for an electronic device to preview a video recorded by the electronic device;
6 a-6 f are a set of exemplary user interfaces for post-processing recorded video by an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of processing a current frame image and a current frame input audio signal set in real time to obtain a video according to an embodiment of the present application;
FIG. 8 is an exemplary flowchart illustrating an electronic device processing an input audio signal of a current frame in real time according to an embodiment of the present application;
FIGS. 9-11 show schematic diagrams of a beamforming technique;
FIG. 12 is an exemplary flow chart for when an electronic device generates a left channel audio signal or a right channel audio signal;
FIG. 13 is a diagram illustrating an example of a first direction, a second direction, and a third direction provided by an embodiment of the present application;
FIG. 14 is an exemplary flowchart of an electronic device generating a first filter corresponding to the first direction according to an embodiment of the present application;
FIG. 15 is an exemplary flow chart for adjusting the amplitude of the first output audio signal by the electronic device;
FIG. 16 is a schematic flow chart of the electronic device suppressing a non-target sound signal in the second output audio signal;
FIG. 17 is a schematic diagram of processing a current frame image and a current frame input audio signal set in real time and then playing the processed image and audio signal set;
FIG. 18 is an exemplary flowchart for post-processing a frame of input audio signals by an electronic device according to an embodiment of the present application;
FIG. 19 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The terminology used in the following examples of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the application, unless stated otherwise, "plurality" means two or more.
The term "User Interface (UI)" in the following embodiments of the present application is a media interface for interaction and information exchange between an application program or an operating system and a user, and implements conversion between an internal form of information and a form acceptable to the user. The user interface is source code written in a specific computer language such as Java or extensible markup language (XML); the interface source code is parsed and rendered on the electronic device and finally presented as content that the user can recognize. A commonly used presentation form of the user interface is the Graphical User Interface (GUI), which refers to a user interface related to computer operations and displayed in a graphical manner. It may include visual interface elements such as text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets displayed in the display of the electronic device.
For ease of understanding, the related terms and concepts related to the embodiments of the present application will be described below.
(1) Focal length and field angle
In the embodiment of the present application, the focal length refers to a focal length used by an electronic device in a process of recording a video or shooting an image.
The field angle is the angle formed, with the lens of the electronic device as the vertex, by the two edges of the maximum range within which the image of a subject can pass through the lens. The size of the field angle determines the field of view of the electronic device: a person within the field of view can be displayed in the image, while a person outside the field of view cannot.
Specifically, when the electronic device records a video or shoots an image, for the same subject whose position relative to the electronic device does not change, using different focal lengths causes the electronic device to acquire different images. For example, in one case, the larger the focal length used by the electronic device, the smaller its field angle. In this case, the subject appears larger in the image acquired by the electronic device, and the limited display screen of the electronic device may cause only part of the subject to be displayed. In another case, the smaller the focal length used by the electronic device, the larger its field angle, and the smaller the subject appears in the captured image. Generally, the larger the field angle of the electronic device, the more other subjects are displayed in the acquired image.
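The inverse relation between focal length and field angle described above follows from the standard thin-lens approximation; the sensor width used below is an assumed example value, not a parameter from this application:

```python
import math

def field_angle_deg(focal_length_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field angle under the thin-lens approximation:
    2 * atan(sensor_width / (2 * focal_length))."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# A longer focal length gives a narrower field angle.
wide = field_angle_deg(focal_length_mm=5.0, sensor_width_mm=6.4)   # short focal length
tele = field_angle_deg(focal_length_mm=25.0, sensor_width_mm=6.4)  # long focal length
```

Here `wide` comes out to roughly 65 degrees and `tele` to roughly 15 degrees, illustrating why increasing the zoom magnification narrows what the camera (and, with audio zooming, the "acoustic camera") can see.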
In some embodiments, the electronic device may change the focal length according to user settings when recording a video or taking an image.
In other embodiments, the electronic device may change the focal length according to some preset rules when recording a video or taking an image. For example, when recording an interesting video, the electronic device may change the focal distance according to a preset rule.
The change of the focal length includes the focal length becoming larger and the focal length becoming smaller. In some embodiments, the electronic device may change the focal length by adjusting a zoom magnification. The user can select the magnification through a zoom magnification control in the user interface, and can also select the zoom magnification through inputting a gesture command in the user interface.
The zoom magnification control may be the zoom magnification control 111 shown in the user interface 11 in fig. 1b; reference may be made to the description of the zoom magnification control 111 below. By adjusting the zoom magnification, the user continuously enlarges the subject on the preview interface. The user can select the zoom magnification through a zoom magnification button on the device, or input a gesture command through the display screen of the device to select the zoom magnification. In general, zoom photographing includes two modes: optical zoom and digital zoom. Both can change the size of objects in the preview image displayed by the electronic device.
(2) Image zooming and audio zooming
Image zooming means that the electronic device changes the focal length in the process of shooting an image; the electronic device can change the focal length by adjusting the zoom magnification to complete the image zooming. For example, when a user photographs a distant object with the electronic device, the object necessarily appears small in the displayed preview image. Without changing position, the user can increase the zoom magnification so that the object displayed on the interface of the electronic device is enlarged, realizing image zooming. In the embodiment of the present application, the object displayed on the interface of the electronic device may be enlarged or reduced by adjusting the zoom magnification; this can be applied both in the process of recording a video and in the process of playing a video.
Audio zooming is analogous to image zooming. When the zoom magnification is increased and the subject displayed by the electronic device in the video becomes larger, the user is given the feeling that the subject is relatively closer, and the sound of the displayed subject is correspondingly increased. If the zoom magnification is reduced and the subject displayed by the electronic device becomes smaller in the video, the user is given the feeling that the subject is farther from the electronic device. If the images and the corresponding audio can both be zoomed, the effect of zooming audio and images simultaneously improves the user's sensory experience and adds interest.
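A hedged sketch of the zoom-to-gain mapping implied above; the linear mapping, the 2x ceiling, and the `max_zoom` limit are assumptions for illustration only, not values taken from this application:

```python
def audio_zoom_gains(zoom: float, max_zoom: float = 10.0):
    """Hypothetical mapping from zoom magnification to
    (target_gain, non_target_gain): at 1x nothing changes; toward
    max_zoom the in-view sound is boosted and out-of-view sound fades."""
    z = min(max(zoom, 1.0), max_zoom)
    frac = (z - 1.0) / (max_zoom - 1.0)
    return 1.0 + frac, 1.0 - frac
```

For example, at 1x zoom this returns (1.0, 1.0), i.e. no change; at the maximum zoom it returns (2.0, 0.0), i.e. the in-frame subject's sound is doubled and out-of-frame sound is removed.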
FIGS. 1 a-1 e illustrate an exemplary set of user interfaces for image zooming when an electronic device records a video.
Fig. 1 a-1 e show an electronic device with three microphones. Taking fig. 1a as an example, the microphones of the electronic device may include a first microphone, a second microphone, and a third microphone. In the process of recording a video by the electronic device, the microphone acquires and processes a plurality of frames of audio signals in the environment to generate an audio stream. Meanwhile, the camera can acquire multi-frame images and process the multi-frame images to generate an image stream. Then, the audio stream and the image stream are mixed to obtain a recorded video.
It should be understood that when recording a video, the number of the microphones used by the electronic device may be N, where N is a positive integer greater than or equal to 2, and is not limited to the first microphone, the second microphone, and the third microphone mentioned above.
In fig. 1a to 1d, the subject may include a subject 101 (dog), a subject 102 (man near the dog in the figure), a subject 103 (friend in the figure), and the like. At this time, the electronic device records a video by using the rear camera.
As shown in fig. 1a, the user interface 10 of the electronic device is a preview interface for recording a video. A recording control 101 may be included in the user interface 10, and the recording control 101 may be configured to receive an instruction to record a video. In response to a user operation (e.g., a click operation) on the recording control 101, the electronic device may begin recording a video, displaying a user interface as shown in FIG. 1 b.
As shown in fig. 1b, the user interface 11 of the electronic device may be a user interface used when recording a video; in this case, the electronic device acquires the image corresponding to the 1st second and the audio signal corresponding to the 1st second in the video. A zoom magnification control 111, a zoom magnification increase control 112, and a zoom magnification decrease control 113 may be included in the user interface. The zoom magnification control 111 is configured to receive an instruction to change the zoom magnification and to prompt the user with the current zoom magnification of the electronic device; for example, 1.0 indicates a 1-time zoom magnification and 5.0 indicates a 5-time zoom magnification, where the 5-time zoom magnification is greater than the 1-time zoom magnification. The zoom magnification increase control 112 is configured to receive an instruction to increase the zoom magnification. The zoom magnification decrease control 113 is used to receive an instruction to decrease the zoom magnification. At this time, as can be seen from the zoom magnification control 111, during recording of the video, the image corresponding to the 1st second is captured at a 1-time zoom magnification, and includes the subject 101, the subject 102, and the subject 103. In response to an operation of the user sliding the zoom magnification control 111 upward, the electronic device may change the zoom magnification used when recording the video. The electronic device may display the user interface shown in fig. 1c.
As shown in fig. 1c, the user interface 12 is a user interface for the electronic device to record video. At this time, the electronic device acquires the image corresponding to the 2nd second and the audio signal corresponding to the 2nd second in the video. The positions of all the subjects relative to the electronic device do not change. However, since the zoom magnification is increased from the 1-time zoom magnification to the 5-time zoom magnification, the field angle of the electronic device becomes smaller. Then, compared to the image captured by the electronic device at the 1-time zoom magnification in fig. 1b, it can be seen that, in the image displayed in the user interface 12, the subject 101 is no longer displayed, and the other displayed subjects become larger; for example, the subject 102 and the subject 103 become larger. At this time, in response to an operation of the user sliding the zoom magnification control 111 upward, the electronic device may change the zoom magnification used when recording the video. The electronic device may display the user interface shown in fig. 1d.
As shown in fig. 1d, the user interface 13 is a user interface of the electronic device when recording video. A stop recording control 131 may be included in the user interface 13, and the stop recording control 131 may be configured to receive an instruction to stop recording the video. At this time, the electronic device acquires the image corresponding to the 3rd second and the audio signal corresponding to the 3rd second in the video. The positions of all the subjects relative to the electronic device do not change. However, since the zoom magnification is increased to a 10-time zoom magnification, the field angle of the electronic device becomes smaller. Then, compared to the image captured by the electronic device at the 5-time zoom magnification in fig. 1c, it can be seen that, in the image displayed in the user interface 13, the subject 101 and the subject 103 are no longer displayed, and the other displayed subject becomes larger; for example, the subject 102 becomes larger. In response to a user operation (e.g., a click operation) on the stop recording control 131, the electronic device may display the user interface shown in FIG. 1 e.
The user interface 14 as shown in fig. 1e is one user interface after the electronic device has completed the video recording. The electronic device may save the recorded video.
It should be understood that fig. 1a to 1e above illustrate an exemplary set of user interfaces for an electronic device to change the angle of view due to the change of zoom magnification during the process of recording video, thereby changing the captured image, and should not be limited to the embodiments of the present application. The electronic device may also change the zoom magnification in other ways. The embodiments of the present application do not limit this.
In the embodiment of the application, in the process of generating a video by the electronic device, the image can be enlarged or reduced according to the change of the zoom magnification; this process is called image zooming. The audio can also be processed according to the change of the zoom magnification; this process is called audio zooming. Image zooming and audio zooming are described in detail in term (2) above.
(3) Inhibition and enhancement
In the embodiment of the present application, suppressing refers to reducing the energy of an audio signal so that the audio signal sounds quieter or even inaudible. Suppression of an audio signal may be achieved by reducing its amplitude.
Enhancing refers to increasing the energy of an audio signal so that the audio signal sounds louder. Enhancement of an audio signal may be achieved by increasing its amplitude.
The amplitude represents the voltage magnitude corresponding to the audio signal; it may also represent the energy level of the audio signal, or its decibel magnitude.
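Suppression and enhancement by amplitude scaling, and the amplitude-to-decibel relation mentioned above, can be illustrated with a small sketch (the sample values are arbitrary):

```python
import math

def scale_amplitude(samples, gain):
    """Suppress (gain < 1.0) or enhance (gain > 1.0) an audio signal
    by scaling its amplitude."""
    return [s * gain for s in samples]

def gain_to_db(gain):
    """Amplitude gain expressed in decibels: 20 * log10(gain)."""
    return 20.0 * math.log10(gain)

# Halving the amplitude lowers the level by about 6 dB.
quieter = scale_amplitude([0.8, -0.4, 0.2], 0.5)
```

A gain of 0.5 thus suppresses the signal by roughly 6 dB, while a gain of 2.0 enhances it by the same amount.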
(4) Beamforming and gain coefficients
In the embodiment of the present application, beamforming may be used to describe the correspondence between the audio collected by the microphones of the electronic device and that audio when it is transmitted to a speaker for playing. The correspondence is a set of gain coefficients representing the degree of suppression of the audio signals picked up by the microphones in the respective directions. Suppressing means reducing the energy of an audio signal so that it sounds quieter or even inaudible. The degree of suppression describes how much the audio signal is reduced: the greater the degree of suppression, the more the energy of the audio signal is reduced. For example, a gain coefficient of 0.0 indicates complete removal of the audio signal, and a gain coefficient of 1.0 indicates no suppression. The closer the gain coefficient is to 0.0, the greater the degree of suppression; the closer it is to 1.0, the smaller the degree of suppression.
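The role of direction-dependent gain coefficients can be sketched as follows; the gain table below is hypothetical and far coarser than a real beam pattern, and the function name is an illustration only:

```python
# Hypothetical gain table sampled every 45 degrees around the device;
# 1.0 keeps a direction unchanged, 0.0 removes it entirely.
GAIN_BY_DIRECTION = {0: 1.0, 45: 0.8, 90: 0.4, 135: 0.1,
                     180: 0.0, 225: 0.1, 270: 0.4, 315: 0.8}

def suppress_by_direction(amplitude: float, direction_deg: int) -> float:
    """Apply the gain coefficient of the nearest tabulated direction."""
    angle = direction_deg % 360
    nearest = min(GAIN_BY_DIRECTION,
                  key=lambda d: min(abs(d - angle), 360 - abs(d - angle)))
    return amplitude * GAIN_BY_DIRECTION[nearest]
```

With this table, sound arriving from directly in front (0 degrees) is kept intact, while sound from directly behind (180 degrees) is removed entirely.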
In one approach, the electronic device may zoom the image according to the change in zoom magnification while generating a video, but does not zoom the audio accordingly. The result is that, in the video recorded by the electronic device, the picture of the subject shows a near-and-far change, but the subject's sound does not change in loudness. In addition, the recorded video may contain not only the sound of the subject but also the sounds of other objects not displayed in the image.
In this scenario, the process of recording the video by the electronic device is the process described with reference to fig. 1a to 1e. If the electronic device does not perform audio zooming during the recording of the video, the electronic device may play the video as described below with reference to fig. 2 to 4.
Fig. 2-4 are exemplary sets of user interfaces for an electronic device recording video with image zoom but without audio zoom.
In fig. 2 (b) to fig. 4 (b), the sound signal in the icon 301 is drawn with a solid line, which in this embodiment indicates that the sound of the subject 102 is not a suppressed object: the subject 102 is a target object, and its sound can be heard when the video is played. In the icon 302, the subject 101 is drawn with a dotted line, indicating that the subject 101 does not appear in the picture of the video and is a non-target object of the shooting. In the icon 303, the subject 102 is drawn with a solid line, indicating that the subject appears in the image of the video and is a target object of the shooting. Optionally, in this embodiment of the present application, the sound of a non-target object is a suppressed object. It is understood that a target object is an object appearing in the video picture whose sound does not need to be suppressed; correspondingly, a non-target object is an object that does not appear in the video frame and whose sound needs to be suppressed.
It should be understood that, in fig. 2 (b) to fig. 4 (b), icons having similar shapes have the same meaning and are not explained one by one. For example, when the subject is drawn with a dotted line, the meaning of the representation is that the subject does not appear in the picture of the video and belongs to a non-target subject of shooting. For example, in fig. 4 (b), when both the subject 101 and the subject 103 are drawn by dotted lines, it indicates that the subject 101 and the subject 103 do not appear in the picture of the video and belong to a non-target object to be photographed.
As shown in fig. 2 (a), the user interface 20 is a user interface when the electronic device plays a video. At this time, the electronic device plays the image and audio corresponding to the 1 st second recorded in fig. 1 b. The zoom magnification of the electronic device is 1 zoom magnification. The user interface 20 shows that the currently played image includes a subject 101, a subject 102, and a subject 103.
As shown in (b) of fig. 2, a beamforming diagram corresponds to the audio when the electronic device plays the audio corresponding to the 1st second. Beamforming may be used to describe the correspondence between audio captured by a microphone of the electronic device and that audio as it is transmitted to a speaker for playback. The correspondence is a set of gain coefficients representing the degree of suppression of the audio signals picked up by the microphones in the respective directions. Suppressing means reducing the energy of an audio signal so that it sounds quieter or even inaudible. The degree of suppression describes how much the audio signal is reduced: the greater the degree of suppression, the more the energy of the audio signal is reduced. For example, in fig. 2 (b), a gain coefficient of 0.0 indicates that the audio signal is completely removed, and a gain coefficient of 1.0 indicates that no suppression is performed. The closer the gain coefficient is to 0.0, the greater the degree of suppression; the closer to 1.0, the smaller the degree of suppression.
For a detailed description of the gain factor, reference may be made to the following description of the gain factor in step S302, which is not repeated herein.
The electronic device can suppress the audio signals collected by the microphones according to the gain coefficients and then transmit them to the speaker for playback. For example, in the beamforming diagram shown in fig. 2 (b), the gain coefficients applied by the electronic device to the audio picked up by the microphones in all directions are 1, indicating that the electronic device does not suppress the captured audio signal. Since the gain coefficients corresponding to the directions of the subject 101, the subject 102, and the subject 103 are all 1 (or close to 1), the audio signal collected by the electronic device includes the sounds of the subject 101, the subject 102, and the subject 103 without suppression. That is, in the audio, the sounds of the subject 101, the subject 102, and the subject 103 are not suppressed, and the user can hear all three.
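The lookup-and-multiply relationship between directions and gain coefficients described above can be sketched as follows. This is an illustrative sketch only: the 36-sector discretization of directions, the function name, and the (direction, amplitude) source representation are assumptions, not the patent's implementation.

```python
def apply_directional_gains(signals, gains, num_sectors=36):
    """Suppress sound sources according to per-direction gain coefficients.

    signals: list of (direction_degrees, amplitude) pairs, one per source.
    gains:   list of num_sectors gain coefficients, indexed by angular
             sector (0.0 = completely removed, 1.0 = not suppressed).
    """
    sector_width = 360 / num_sectors
    return [(direction, amplitude * gains[int(direction % 360 // sector_width)])
            for direction, amplitude in signals]

# All-ones beamforming as in fig. 2 (b): no direction is suppressed.
sources = [(30.0, 0.5), (180.0, 0.8), (300.0, 0.3)]
unchanged = apply_directional_gains(sources, [1.0] * 36)
```

With an all-ones gain table the output equals the input, matching the "no suppression" case in fig. 2 (b); an all-zeros table would remove every source entirely.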
After the electronic device finishes playing the video corresponding to the 1 st second, the video corresponding to the 2 nd second can be played.
As shown in fig. 3 (a), the user interface 30 is a user interface when the electronic device plays the video corresponding to the 2 nd second. At this time, the electronic device plays the image and audio corresponding to the 2 nd second recorded in fig. 1 c. At this time, the zoom magnification of the electronic apparatus is increased from 1-time zoom magnification to 5-time zoom magnification, and it can be seen in the user interface 30 that the image currently played includes the subject 102 and the subject 103, but does not include the subject 101 any more.
Fig. 3 (b) shows the beamforming diagram corresponding to the audio when the electronic device plays the audio for the 2nd second. The gain coefficients corresponding to the directions of the subject 101, the subject 102, and the subject 103 are all 1 (or close to 1), so the audio signal collected by the electronic device includes the sounds of the subject 101, the subject 102, and the subject 103 without suppression, and the user can hear all three. That is, the corresponding audio still includes the sound of the subject 101 even though the subject 101 no longer appears in the played image, and although the images of the subject 102 and the subject 103 become larger, their sounds in the audio do not become louder.
After the electronic device finishes playing the video corresponding to the 2 nd second, the video corresponding to the 3 rd second can be played.
Referring to the foregoing descriptions of fig. 2 and fig. 3, in conjunction with fig. 4 (a) and fig. 4 (b), the user interface 40 is the user interface when the electronic device plays the video corresponding to the 3rd second. At this time, the electronic device plays the image and audio corresponding to the 3rd second recorded in fig. 1d. The zoom magnification of the electronic device is increased from 5-time zoom magnification to 10-time zoom magnification, and the user interface 40 shows that the currently played image includes the subject 102 but no longer includes the subject 101 and the subject 103. However, the corresponding audio still includes the sounds of the subject 101 and the subject 103, and although the image of the subject 102 becomes larger, the sound of the subject 102 in the audio does not become louder.
Thus, when a video recorded with this scheme is played, what the user sees does not match what the user hears: the sound of a subject does not become louder or quieter as its image is enlarged or reduced, and the user can still hear the sounds of objects that are not displayed in the image.
By implementing the video processing method in this application, when the electronic device generates a video, it can zoom the image according to the change of the zoom magnification and can also zoom the audio according to the change of the zoom magnification. Audio zooming by the electronic device comprises: when the zoom magnification increases and the field angle decreases, suppressing the sound of subjects outside the imaging range and enhancing the sound of subjects within the imaging range; when the zoom magnification decreases and the field angle increases, suppressing the sound of subjects outside the imaging range and reducing the sound of subjects within the imaging range.
Here, enhancement means increasing the energy of an audio signal so that it sounds louder, and suppression means reducing the energy of an audio signal so that it sounds quieter or even becomes inaudible. The change in the energy of the audio signal can be achieved by adjusting its amplitude. For the enhancement and suppression of the audio signal, reference may be made to the following description of step S105, which is not repeated here.
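The amplitude adjustment described above can be sketched as follows. The dB-based interface and the function name are illustrative assumptions; the point is only that energy is changed by scaling sample amplitudes.

```python
def scale_audio(frame, gain_db):
    """Enhance (gain_db > 0) or suppress (gain_db < 0) an audio frame by
    scaling its sample amplitudes; signal energy changes with the square
    of the amplitude, so a -20 dB gain reduces amplitude to one tenth."""
    factor = 10 ** (gain_db / 20)   # convert a dB gain to an amplitude ratio
    return [s * factor for s in frame]

louder = scale_audio([0.1, -0.2, 0.3], 6.0)     # roughly doubles the amplitude
quieter = scale_audio([0.1, -0.2, 0.3], -20.0)  # one tenth of the amplitude
```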
In this way, in the video recorded by the electronic device, as the displayed image of a subject becomes larger or smaller, the subject's sound also becomes correspondingly louder or quieter. In addition, in the recorded video, the sounds of objects not displayed in the image are suppressed, so that, to the user's hearing, only the sounds of the displayed subjects can be heard.
In this application, the process of recording video by the electronic device refers to the process mentioned in fig. 1a to 1 e. The electronic equipment performs audio zooming in the process of recording the video, and the effect of zooming images and audio simultaneously is achieved.
Three usage scenarios related to the embodiments of the present application are described below. In the embodiments of the present application, in the process of generating a video, the electronic device may enlarge or reduce an image according to the change of the zoom magnification (for convenience of description, this is referred to as image zooming, meaning enlargement or reduction of the image presented on the mobile phone interface), and may also process the audio according to the change of the zoom magnification (hereinafter referred to as audio zooming). As described above, when the zoom magnification increases, the field angle decreases, and the electronic device can suppress the sound of subjects outside the field angle and enhance the sound of subjects within it. When the zoom magnification decreases, the field angle increases, and the electronic device can suppress the sound of subjects outside the field angle and reduce the sound of subjects within it. The audio zooming process of the electronic device can occur in different scenarios, three of which are described in detail below.
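The dependence of the field angle on the zoom magnification, and the resulting in-view/out-of-view decision, can be sketched as follows. The 78-degree 1x field of view, the pinhole-camera approximation, and all names are illustrative assumptions, not values from the patent.

```python
import math

def field_of_view(base_fov_deg, zoom):
    """Approximate horizontal field of view at a given zoom magnification,
    assuming focal length grows linearly with zoom (pinhole model)."""
    half = math.radians(base_fov_deg / 2)
    return 2 * math.degrees(math.atan(math.tan(half) / zoom))

def gain_for_subject(angle_deg, zoom, base_fov_deg=78.0):
    """Return 1.0 (candidate for keeping/enhancing) if the subject lies
    inside the current field of view, 0.0 (suppress) otherwise. angle_deg
    is measured from the camera axis; base_fov_deg is an assumed 1x FOV."""
    return 1.0 if abs(angle_deg) <= field_of_view(base_fov_deg, zoom) / 2 else 0.0

# A subject 30 degrees off-axis is in frame at 1x but outside it at 5x.
in_frame_1x = gain_for_subject(30.0, 1.0)
in_frame_5x = gain_for_subject(30.0, 5.0)
```

At 5x the assumed 78-degree field narrows to roughly 18 degrees, so the 30-degree subject falls outside it, mirroring how the subject 101 drops out of frame as the zoom magnification rises.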
Scene 1: using the video processing method, the electronic device can perform image zooming on each acquired frame of image in real time according to the change of the zoom magnification, and perform audio zooming on each acquired frame of audio in real time according to the change of the zoom magnification. Finally, an image stream is generated from the multiple frames of images and an audio stream from the multiple frames of audio, and the image stream and the audio stream are mixed to obtain the recorded video, which can then be played.
Wherein, the exemplary user interface involved in recording the video in the scene 1 may refer to the foregoing description of fig. 1 a-1 e. The process involved in playing the video may refer to the following description of fig. 9-11, which is not repeated here.
The method involved in video processing in the scene 1 may refer to the following description of step S101 to step S107 in fig. 8, which is not repeated herein.
Scene 2: in the process of recording the video, the electronic device can display the image while collecting audio through the microphones, process the audio, and play it through connected earphones. That is, it performs image zooming on each acquired frame of image in real time according to the change of the zoom magnification and audio zooming on each acquired frame of audio in real time according to the change of the zoom magnification, and each frame of image and audio is played as it is generated.
The exemplary user interface involved in scenario 2 may refer to the description of fig. 5 a-5 d below.
In fig. 5a to 5d, the photographic subjects may include a subject 101 (the dog), a subject 102 (the man closest to the dog in the drawing), a subject 103 (the friend in the drawing), and the like. The electronic device records the video with the rear camera, and it is assumed that, from the start to the end of recording, the positions of all subjects relative to the electronic device do not change and the volume of their sounds does not change. To prevent the played-back audio from being picked up again by the electronic device during preview and interfering with subsequent audio collection, the electronic device can play the audio through connected earphones.
In some embodiments, instead of using earphones, the electronic device may play audio directly through its local speaker and then use acoustic echo cancellation (AEC) to remove the audio played by the speaker from the captured signal.
As shown in fig. 5a, the user interface 80 is a user interface for the electronic device to preview a video. The user interface 80 may include a recording control 801, a zoom magnification control 802, a zoom magnification increase control 803, and a zoom magnification decrease control 804. The zoom magnification of the electronic device is 1-time zoom magnification. The electronic device may capture and process the image and display it on the screen, while processing the captured audio for playback through the earphones 805. The user can hear the sounds of the subject 101, the subject 102, and the subject 103. In response to the user sliding the zoom magnification control 802 upward, the electronic device may change the zoom magnification used when recording the video and display the user interface shown in fig. 5b.
As shown in fig. 5b, the user interface 81 is another user interface for the electronic device to preview a video. The zoom magnification of the electronic device is changed from 1-time zoom magnification to 5-time zoom magnification. The electronic device may capture and process the images and display them on the screen, while processing the captured audio for playback through the earphones 805. From the image displayed in the user interface 81, the user can see the subject 102 and the subject 103 becoming larger but can no longer see the subject 101; meanwhile, the user can hear the sounds of the subject 102 and the subject 103 becoming louder but can no longer hear the sound of the subject 101. In response to the user sliding the zoom magnification control 802 upward, the electronic device can change the zoom magnification used when recording the video and display the user interface shown in fig. 5c.
As shown in fig. 5c, the user interface 82 is another user interface for the electronic device to preview a video. The zoom magnification of the electronic device is changed from 5-time zoom magnification to 10-time zoom magnification. The electronic device may capture and process the image and display it on the screen, while processing the captured audio for playback through the earphones 805. From the image displayed in the user interface 82, the user can see the subject 102 becoming larger but can no longer see the subject 101 and the subject 103; meanwhile, the user can hear the sound of the subject 102 becoming louder but can no longer hear the sounds of the subject 101 and the subject 103. In response to a user operation (e.g., a click operation) on the record control 801, the electronic device can begin recording video and display the user interface shown in fig. 5d.
Fig. 5d shows a user interface 83 when the electronic device is recording a video. At this time, the zoom magnification of the electronic apparatus is 10 times zoom magnification.
In this way, by previewing the recorded video, the user can find the most suitable zoom magnification before recording begins.
In some embodiments, the electronic device may store each generated frame of image and audio, generate an image stream according to a multi-frame image, generate an audio stream according to a multi-frame audio, and mix the image stream and the audio stream to obtain a video, in addition to playing the generated image and audio in real time.
The following description of steps S101 to S107 may be referred to as a method related to video processing in scene 2, and will not be repeated here.
Scene 3: the electronic device can perform audio zooming on the audio stream of an already recorded video. When recording the video, the electronic device can store the zoom magnification corresponding to each frame of audio. Afterwards, the electronic device may acquire any frame of audio in the audio stream together with the zoom magnification corresponding to that frame, and then perform audio zooming on the frame according to that zoom magnification. After the electronic device has performed audio zooming on each frame of audio in the audio stream, it re-encodes the result to obtain a new audio stream.
Assume that the electronic device did not perform audio zooming while recording the video, and only saved the zoom magnification in use when each frame of audio was collected.
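The offline re-processing of scene 3 amounts to a loop over saved (audio frame, zoom magnification) pairs. The sketch below is illustrative: `zoom_fn` is a deliberately simplified stand-in for the real per-frame audio-zoom processing, and all names are assumptions.

```python
def rezoom_audio_stream(audio_frames, zoom_per_frame, zoom_fn):
    """Offline audio zoom for scene 3: apply the zoom magnification saved
    for each frame of audio to that frame; the caller then re-encodes the
    processed frames into a new audio stream."""
    assert len(audio_frames) == len(zoom_per_frame)
    return [zoom_fn(frame, zoom)
            for frame, zoom in zip(audio_frames, zoom_per_frame)]

# Hypothetical zoom_fn: simply scale amplitude with the zoom magnification.
frames = [[0.1, 0.2], [0.1, 0.2]]
zooms = [1.0, 5.0]          # saved zoom magnification per audio frame
new_stream = rezoom_audio_stream(frames, zooms,
                                 lambda f, z: [s * z for s in f])
```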
The process of recording the video by the electronic device may refer to the foregoing description of fig. 1 a-1 e.
Exemplary user interfaces involved in scenario 3 may be referred to in the description of fig. 6 a-6 f below.
As shown in fig. 6a, the user interface 90 is a user interface when the electronic device has finished recording a video. A playback control 901 is included in the user interface 90 and can be used to display the video or image most recently recorded or captured by the electronic device. In response to a user operation (e.g., a click operation) on the playback control 901, the electronic device can display the user interface shown in fig. 6b.
As shown in fig. 6b, the user interface 91 may be a user interface for setting a video by the electronic device. Further controls 911 are included in the user interface 91, and the further controls 911 may be used to display further settings for the video. In response to a user operation (e.g., a click operation) on the more control 911, the electronic device may display a user interface as shown in FIG. 6 c.
As shown in fig. 6c, a setting item for setting the video may be displayed in the user interface 92. Including a zoom mode setting item 921. The zoom mode setting item may be for receiving an instruction to audio zoom the video. In response to a user operation (e.g., a click operation) on the zoom mode setting item 921, the electronic device may perform audio zooming on the audio in the video, displaying a user interface as shown in fig. 6 d.
As shown in fig. 6d, the user interface 93 is an exemplary user interface while the electronic device performs audio zooming on the audio in the video. A prompt box 931 may be displayed in the user interface 93 with the prompt text: "Audio zooming the file 'video 1', please wait". The prompt text indicates that the electronic device is currently performing audio zooming on the video. After the electronic device has finished the audio zooming, a user interface as shown in fig. 6e may be displayed.
As shown in fig. 6e, the user interface 94 is the user interface after the electronic device has completed the audio zooming. A prompt box 941 may be included in the user interface 94 with the prompt text: "Audio zooming of the file 'video 1' is complete. Replace the original file?". In response to a user operation (e.g., a click operation) on the yes control, the electronic device may store the audio-zoomed video in place of the original video that was not audio zoomed, and may then display the user interface shown in fig. 6f.
As shown in fig. 6f, the user interface 95 is displayed after the video that was not audio zoomed has been replaced with the audio-zoomed video. The user interface 95 may include the prompt text 951: "Replacement successful", which indicates that the electronic device has successfully replaced the original video with the audio-zoomed video. A play control 952 may also be included in the user interface 95. In response to a user operation (e.g., a click operation) on the play control 952, the electronic device can play the audio-zoomed video.
The following description of step S601 to step S609 may be referred to as a method related to video processing in scene 3, and is not repeated here.
The video processing method is suitable for electronic equipment with N microphones, wherein N is an integer greater than or equal to 2. The following describes in detail the process of the video processing method related to the above three scenes, taking an example in which the electronic device has three microphones.
Scene 1: an exemplary set of user interfaces for the audio processing method of the present application in scene 1 may refer to the descriptions of the user interfaces 10-14 in fig. 1a-1e above. In the real-time video processing method of scene 1, from the start of recording, the electronic device processes each acquired current frame image in real time while performing audio zooming on each acquired current-frame input audio signal set in real time according to the change of the zoom magnification. Assume that from the start to the completion of recording there are N frames of images and N input audio signal sets in total. The electronic device may generate an image stream from the N frames of images and an audio stream from the N frames of audio, and mix the image stream and the audio stream to obtain the recorded video.
Each current-frame input audio signal set comprises multiple frames of input audio signals, where any one frame of input audio signal is the audio signal collected by one of the microphones of the electronic device.
Fig. 7 shows a schematic diagram of processing a current frame image and a current frame input audio signal set in real time in scene 1 to obtain a video.
The process described below with respect to fig. 7 continues until the electronic device finishes recording the video. In the process of generating the image stream and the audio stream, the electronic device processes the acquired current frame images in acquisition order and stores them in the image stream buffer, while processing the acquired current-frame input audio signal sets in acquisition order and storing them in the audio stream buffer. Then the images in the image stream buffer are encoded and otherwise processed to generate the image stream, and the audio in the audio stream buffer is encoded and otherwise processed to generate the audio stream.
For the current frame image, the electronic device may adopt a processing technique in the prior art to process the current frame image, which is not described herein again.
For the current frame input audio signal set, the electronic device may process the current frame input audio signal set by using the audio zooming method mentioned in this application, and this process will be described in detail in steps S101 to S107 in fig. 8 below, which is not described herein again.
Specifically, first, the electronic device starts recording a video, and acquires a first frame image and a first frame input audio signal set. Then, the first frame image is processed, and the processed first frame image is buffered in the region 1 of the image stream buffer. And meanwhile, processing the first frame input audio signal set, and caching the processed first frame input audio signal set into the area 1 of the audio stream cache. During playing, the electronic device can play the processed first frame of input audio signal set while playing the processed first frame of image.
Then, after the electronic device finishes capturing the first frame image and the first frame input audio signal set, during the process of processing the first frame image and the first frame input audio signal set, the electronic device may continue to capture the second frame image and the second frame input audio signal set, and the processing process is similar to the first frame image and the first frame input audio signal set. The electronic device may buffer the processed second frame image into the region 2 of the image stream buffer, and buffer the processed second frame input audio signal set into the region 2 of the audio stream buffer. During playing, the electronic device can play the processed second frame of input audio signal set while playing the processed second frame of image.
By analogy, after the electronic device collects the (N-1)th frame image and the (N-1)th frame input audio signal set, it may continue to collect the Nth frame image and the Nth frame input audio signal set while processing the (N-1)th frame, and the processing is similar to that of the first frame. The electronic device may buffer the processed Nth frame image into region N of the image stream buffer, and buffer the processed Nth frame input audio signal set into region N of the audio stream buffer. During playback, the electronic device can play the processed Nth frame input audio signal set while playing the processed Nth frame image.
In some embodiments, the time for the electronic device to play one frame of image is 30 ms, and the time to play one frame of audio is 10 ms; therefore, in fig. 7, while the electronic device plays one frame of image, the corresponding input audio signal set contains 3 frames of audio.
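The 30 ms / 10 ms relationship above implies a simple grouping of audio frames per image frame, which can be sketched as follows (function and variable names are illustrative):

```python
def align_streams(image_frames, audio_frames, image_ms=30, audio_ms=10):
    """Group audio frames with the image frame they play alongside:
    with 30 ms image frames and 10 ms audio frames, each image frame
    plays together with 3 audio frames."""
    per_image = image_ms // audio_ms
    return [(img, audio_frames[i * per_image:(i + 1) * per_image])
            for i, img in enumerate(image_frames)]

pairs = align_streams(["img1", "img2"], ["a1", "a2", "a3", "a4", "a5", "a6"])
```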
The processing procedure for any frame of the input audio signal set related to scenario 1 above is described in detail below.
Taking an electronic device with three microphones as an example, for the video processing process of fig. 7, any frame of input audio signal set collected by the electronic device includes three frames of input audio signals, and the electronic device may convert the three frames of input audio signals into the frequency domain to obtain three frames of audio signals. Then, using the zoom magnification corresponding to that input audio signal set, the electronic device may keep the target sound signal in the three frames of audio signals unchanged while suppressing the non-target sound signal, thereby generating a first output audio signal.
In some embodiments, if the zoom magnification has increased, the first output audio signal is enhanced; if the zoom magnification has decreased, the first output audio signal is suppressed. The enhanced or suppressed first output audio signal is output to the audio stream buffer, which completes the processing of that frame's input audio signal set.
In other embodiments, the electronic device uses the enhanced or suppressed first output audio signal as a second output audio signal, suppresses a non-target sound signal in the second output audio signal to generate a third output audio signal, and buffers the third output audio signal in an audio stream buffer, so that the processing of the frame input audio signal set may be completed.
The above process can refer to the following detailed description of steps S101 to S107 in fig. 8:
S101, the electronic device collects a first input audio signal, a second input audio signal, and a third input audio signal;
an exemplary user interface for the electronic device to capture the first input audio signal, the second input audio signal, and the third input audio signal may refer to the user interfaces shown above for fig. 1 b-1 d.
The first input audio signal, the second input audio signal and the third input audio signal collected by the electronic device are any one of the sets of frame input audio signals referred to in fig. 7.
The first input audio signal is a current frame audio signal converted from a sound signal collected by a first microphone of the electronic device in a first time period. The second input audio signal is a current frame audio signal converted from a sound signal collected by a second microphone of the electronic device in a first time period. The third input audio signal is a current frame audio signal converted from a sound signal collected by a third microphone of the electronic device in the first time period.
Take the example of the electronic device capturing the first input audio signal.
Specifically, during the first time period, the first microphone of the electronic device may collect a sound signal and convert it into an analog electrical signal. The electronic device then samples the analog electrical signal and converts it into an audio signal in the time domain. The audio signal in the time domain is a digital audio signal consisting of W sampling points of the analog electrical signal. The first input audio signal may be represented in the electronic device by an array, where any element of the array represents one sampling point and includes two values: one value represents the time, and the other represents the amplitude of the audio signal at that time, where the amplitude represents the voltage corresponding to the audio signal.
It is to be understood that, the process of acquiring the second input audio signal and the third input audio signal by the electronic device may refer to the description of the first input audio signal, and will not be described herein again.
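The sampling described above can be sketched as follows. The 48 kHz sample rate, 10 ms frame length, and test tone are illustrative assumptions, not values from the patent; the point is the (time, amplitude) representation of the W sampling points.

```python
import math

def sample_to_input_audio(analog, sample_rate_hz, duration_s):
    """Sample an analog signal (modelled as a function of time) into a
    time-domain digital frame: a list of (time, amplitude) pairs, where
    the amplitude stands for the microphone voltage at that instant."""
    w = int(sample_rate_hz * duration_s)    # W sampling points per frame
    return [(i / sample_rate_hz, analog(i / sample_rate_hz)) for i in range(w)]

# 48 kHz sampling of a 1 kHz tone over a 10 ms frame gives W = 480 points.
frame = sample_to_input_audio(lambda t: math.sin(2 * math.pi * 1000 * t),
                              48_000, 0.010)
```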
S102, the electronic device converts the first input audio signal, the second input audio signal, and the third input audio signal to the frequency domain to obtain a first audio signal, a second audio signal, and a third audio signal;
the first input audio signal, the second input audio signal, and the third input audio signal involved in step S101 above are audio signals in the time domain. For convenience of processing, the electronic device may convert the first input audio signal to the frequency domain to obtain a first audio signal, convert the second input audio signal to the frequency domain to obtain a second audio signal, and convert the third input audio signal to the frequency domain to obtain a third audio signal.
Take the example that the electronic device converts the first input audio signal into the first audio signal in the frequency domain.
Specifically, the electronic device may transform the first input audio signal from the time domain to the frequency domain using a Fourier transform (FT), for example a discrete Fourier transform (DFT).
In some embodiments, the electronic device may divide the first input audio signal into a first audio signal corresponding to N frequency points by a 2N-point DFT. N is an integer power of 2; its value is determined by the computing power of the electronic device, and the faster the processing speed of the electronic device, the larger N may be.
In the embodiments of the present application, the following explanation takes as an example the electronic device dividing the first input audio signal into a first audio signal corresponding to 1024 frequency points through a 2048-point DFT. The electronic device may represent the first audio signal as an array of 1024 elements. Any element represents one frequency point and includes two values: one value represents the frequency (Hz) of the audio signal at that frequency point, and the other represents the amplitude of the audio signal at that frequency point in decibels (dB).
It should be understood that the electronic device may express the first audio signal in other ways besides an array, such as a matrix, and the like, which is not limited by the embodiment of the present application.
It can be understood that, the process of converting the second input audio signal to the frequency domain by the electronic device to obtain the second audio signal, and converting the third input audio signal to the frequency domain to obtain the third audio signal is the same as the process of converting the first input audio signal to the frequency domain to obtain the first audio signal, and is not described herein again.
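The conversion of 2N time-domain samples into N frequency points can be sketched with a direct (unoptimized) DFT. A 16-point frame is used here instead of 2048 points purely to keep the example small; the function name and the pure-tone test signal are illustrative assumptions.

```python
import cmath, math

def to_frequency_domain(samples, sample_rate_hz):
    """Convert a time-domain frame of 2N real samples into N frequency
    points via a 2N-point DFT; each point is a (frequency_hz, amplitude_db)
    pair, a scaled-down stand-in for the 2048-point DFT / 1024 frequency
    points described above."""
    two_n = len(samples)
    points = []
    for k in range(two_n // 2):
        x = sum(s * cmath.exp(-2j * math.pi * k * i / two_n)
                for i, s in enumerate(samples))
        magnitude = abs(x) / two_n
        db = 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")
        points.append((k * sample_rate_hz / two_n, db))
    return points

# A pure 1 kHz tone sampled at 8 kHz concentrates its energy in one point.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(16)]
bins = to_frequency_domain(tone, rate)   # 8 frequency points
```

With 16 samples the frequency points are spaced 500 Hz apart, and the tone's energy lands entirely in the 1000 Hz point.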
S103, the electronic equipment acquires a first zoom magnification;
An exemplary user interface for the electronic device to acquire the first zoom magnification may refer to the user interfaces illustrated above with respect to fig. 1b-1d.
The first zoom magnification refers to a zoom magnification used by the electronic device when the current frame image is collected. When the electronic device starts to record a video, the default zoom magnification used is 1 zoom magnification, and the electronic device can change the zoom magnification used when the current frame image is collected according to the setting of a user. One way to change the zoom magnification is as described above with reference to fig. 1 a-1 e. The description about the zoom magnification may refer to the foregoing description of the term (1).
It should be understood that the execution order of step S102 and step S103 is not fixed: the electronic device may execute step S102 before step S103, execute step S103 before step S102, or execute both simultaneously, which is not limited in the embodiments of the present application.
S104, the electronic equipment generates a first output audio signal by using the first audio signal, the second audio signal and the third audio signal according to the first zoom ratio, wherein a target sound signal in the first output audio signal is kept, and a non-target sound signal is suppressed;
the target sound signal is an audio signal corresponding to a sound emitted by the subject in the field angle range at the first zoom magnification. The non-target sound signal is an audio signal corresponding to a sound emitted from a subject out of the field angle range.
In step S104, the electronic device may filter and combine the first audio signal, the second audio signal, and the third audio signal according to the first zoom magnification to generate the first output audio signal. The first output audio signal may include multiple audio signals, for example two audio signals: a left channel audio signal and a right channel audio signal. The first output audio signal may also include a single channel of audio signal. Specifically, the number of audio signals included in the first output audio signal may be determined by the number of speakers of the electronic device; this is not limited in the embodiments of the present application.
Wherein the purpose of filtering is to suppress the non-target sound signal in the first audio signal, the second audio signal and the third audio signal, while keeping the target sound signal unchanged.
Optionally, the process may employ beamforming techniques.
Specifically, in order to give the generated first output audio signal a stereo effect, the electronic device may first filter the first audio signal, the second audio signal, and the third audio signal with filter coefficients corresponding to different directions, obtaining an audio signal corresponding to each direction, and then synthesize the audio signals corresponding to all directions to generate the first output audio signal.
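The filter-and-combine step can be sketched as follows, assuming frequency-domain microphone signals and per-microphone complex filter coefficients for each direction; every name and array shape here is illustrative, not this application's API.

```python
import numpy as np

def directional_component(mic_signals, filter_coeffs):
    """Apply one direction's filter coefficients to the three microphones'
    frequency-domain signals and sum across microphones.

    mic_signals:   (3, n_bins) complex array, one row per microphone
    filter_coeffs: (3, n_bins) complex array, one row per microphone
    """
    return np.sum(np.conj(filter_coeffs) * mic_signals, axis=0)

def synthesize_output(mic_signals, coeffs_by_direction):
    """Combine the components of all directions into one output signal."""
    return sum(directional_component(mic_signals, c) for c in coeffs_by_direction)

rng = np.random.default_rng(1)
mics = rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))
# Toy coefficients: one pass-through direction plus one fully suppressed one.
coeffs = [np.ones((3, 8)) / 3, np.zeros((3, 8))]
out = synthesize_output(mics, coeffs)
print(out.shape)  # (8,)
```

With the toy coefficients above, the output is just the average of the three microphone signals; real coefficients would differ per direction and per frequency bin.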
In some embodiments, the first output audio signal includes a single audio signal, i.e., a monaural audio signal. Illustratively, this is the case when the electronic device includes a single speaker.
In other embodiments, the first output audio signal includes two audio signals, i.e., a left channel audio signal and a right channel audio signal. Illustratively, this is the case when the electronic device includes two speakers.
Fig. 9-11 show a schematic of a beamforming technique.
In order to compare the scheme related to the foregoing fig. 2-4 with the embodiment of the present application, the shooting scenes in fig. 2-4 are taken as an example, and the beamforming diagrams of the electronic device at 1x, 5x, and 10x zoom magnification are described.
In fig. 9 (b) to 11 (b), icon 601 shows a sound signal drawn with a solid line; in this embodiment, this indicates that the subject's sound is not suppressed and the subject is a target object, so the sound of the subject 103 can be heard when the video is played. Icon 602 shows a sound signal with a cross drawn over it; in this embodiment, this indicates that the subject's sound is suppressed and the subject is a non-target object, so the sound of the subject 101 cannot be heard when the video is played. In icon 603, the subject 101 is drawn with a dotted line, indicating that the subject 101 does not appear in the video picture and is a non-target object. In icon 604, the subject 103 is drawn with a solid line, indicating that the subject 103 appears in the video picture and is a target object. Optionally, in this embodiment of the present application, the sound of a non-target object is suppressed. It is understood that a target object is an object appearing in the video picture, whose sound does not need to be suppressed; correspondingly, a non-target object is an object that does not appear in the video picture, whose sound needs to be suppressed.
The above description of the respective icons in fig. 9 (b) to 11 (b) is equally applicable in fig. 9 (c) to 11 (c).
It should be understood that, in fig. 9 (b) - fig. 11 (b), icons having similar shapes have the same meaning and are not explained one by one. For example, when a subject is drawn with a solid line, this indicates that the subject appears in the video picture and belongs to the captured target objects. For example, in fig. 9 (b), the subject 101 and the subject 102 are drawn with solid lines, indicating that the subject 101 and the subject 102 appear in the video picture and belong to the target objects of the shooting.
As shown in fig. 9 (a), the user interface 50 is a user interface when the electronic device plays a video. At this time, the electronic device plays the image and audio corresponding to the 1 st second recorded in fig. 1 b. At this time, the zoom magnification of the electronic apparatus is 1-time zoom magnification. In the user interface 50, it can be seen that the currently played image includes a subject 101, a subject 102, and a subject 103.
When the audio corresponding to the 1 st second in the video generated by the electronic device is a monaural audio, the monaural audio can be generated by the monaural beam forming diagram shown in fig. 9 (b).
When the audio corresponding to the 1 st second in the video generated by the electronic device includes a left channel audio signal and a right channel audio signal, the left channel audio signal may be generated using the beam forming pattern of the left channel shown in (c) of fig. 9, and the right channel audio signal may be generated using the beam forming pattern of the right channel shown in (c) of fig. 9.
Optionally, the beamforming pattern of the left channel and the beamforming pattern of the right channel are symmetric.
In some embodiments, the electronic device may use the beamforming diagram for the left channel and the beamforming diagram for the right channel shown in fig. 9 (c) to perform fusion, resulting in a monophonic beamforming diagram shown in fig. 9 (b).
As shown in fig. 9 (b), a monaural beamforming diagram is obtained at 1x zoom magnification, with its line of symmetry in the 0° direction. The electronic device can generate monaural audio using this diagram. As can be seen from the diagram, the gain coefficients corresponding to the directions of the subject 101, the subject 102, and the subject 103 are all 1 (or close to 1), so the audio signal collected by the electronic device retains the sounds of the subject 101, the subject 102, and the subject 103 without suppressing them (since a gain coefficient close to 1 produces a very weak suppression effect, it can be treated as no suppression; the same applies to other embodiments). That is, in the monaural audio, the voices of the subject 101, the subject 102, and the subject 103 are not suppressed, and the user can hear all three.
As shown in fig. 9 (c), the left channel beam forming pattern and the right channel beam forming pattern are obtained at 1-time zoom magnification. The line of symmetry of the beamforming pattern for the left channel is in the 45 ° direction, the electronic device may generate the left channel audio signal using the beamforming pattern for the left channel, the line of symmetry of the beamforming pattern for the right channel is in the 315 ° direction, and the electronic device may generate the right channel audio signal using the beamforming pattern for the right channel.
It is understood that the angles 45° and 315° in this embodiment are only examples and may be adjusted to other angles as needed; the present application does not limit this. The same applies to the angles in the embodiments of fig. 10 and 11.
At this time, whether the electronic device suppresses the sound of the subject 101, the subject 102, and the subject 103 is the effect presented after the left channel audio signal and the right channel audio signal are output. For the subject 101, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 1 (or close to 1), the electronic device does not suppress the sound of the subject 101. For the subject 102, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 1 (or close to 1), the electronic device does not suppress the sound of the subject 102. For the subject 103, since the gain coefficient in the beamforming diagram of the left channel is 1 (or close to 1) and the gain coefficient in the beamforming diagram of the right channel is 1 (or close to 1), the electronic device does not suppress the sound of the subject 103. Thus, when the electronic device plays the image and audio corresponding to the 1st second, the subject 101, the subject 102, and the subject 103 can all be seen, and their sounds can all be heard.
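The role the gain coefficient plays in the examples above can be sketched with a toy lookup: a gain near 1 keeps a subject's sound, a gain near 0 suppresses it. The gain tables below are invented for illustration and are not the application's actual beam patterns.

```python
def apply_beam_gain(signal_amplitude, gain):
    """Scale a source's amplitude by the beam-pattern gain for its
    direction: gain near 1 keeps the sound, gain near 0 suppresses it."""
    return signal_amplitude * gain

# Toy beam pattern at 1x zoom: every subject direction has gain ~1.
gain_1x = {"subject_101": 0.98, "subject_102": 1.0, "subject_103": 0.97}
# Toy beam pattern at 5x zoom: subject 101 falls outside the field of view.
gain_5x = {"subject_101": 0.02, "subject_102": 1.0, "subject_103": 0.95}

loudness_5x = {name: apply_beam_gain(1.0, g) for name, g in gain_5x.items()}
print(loudness_5x["subject_101"])  # heavily suppressed
print(loudness_5x["subject_102"])  # retained
```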
As shown in fig. 10 (a), the user interface 60 is a user interface when the electronic device plays the video corresponding to the 2 nd second. At this time, the electronic device plays the image and audio corresponding to the 2 nd second recorded in fig. 1 c. At this time, the zoom magnification of the electronic apparatus is increased from 1-time zoom magnification to 5-time zoom magnification. It can be seen in the user interface 60 that the currently played image includes the subject 102 and the subject 103, but no subject 101.
When the audio corresponding to the 2 nd second in the video generated by the electronic device is a monaural audio, the monaural audio can be generated by the monaural beam forming diagram shown in fig. 10 (b).
When the audio corresponding to the 2 nd second in the video generated by the electronic device includes a left channel audio signal and a right channel audio signal, the left channel audio signal may be generated using a beam forming pattern of a left channel shown in (c) in fig. 10, and the right channel audio signal may be generated using a beam forming pattern of a right channel shown in (c) in fig. 10.
Optionally, the beamforming pattern of the left channel and the beamforming pattern of the right channel are symmetric.
In some embodiments, the electronic device may use the beamforming diagram for the left channel and the beamforming diagram for the right channel shown in fig. 10 (c) to perform fusion, resulting in a monophonic beamforming diagram shown in fig. 10 (b).
As shown in fig. 10 (b), a monaural beamforming diagram is obtained at 5x zoom magnification, with its line of symmetry in the 0° direction. The electronic device can generate monaural audio using this diagram. As can be seen from the diagram, the gain coefficients corresponding to the directions of the subject 102 and the subject 103 are both 1 (or close to 1), so the electronic device does not suppress their sounds. However, the gain coefficient corresponding to the direction of the subject 101 is 0 (or close to 0), so the electronic device suppresses the sound of the subject 101. The audio signal collected by the electronic device includes the voices of the subject 101, the subject 102, and the subject 103, but the voice of the subject 101 is suppressed in the played audio: it cannot be heard, or it sounds very faint.
As shown in fig. 10 (c), when the zoom magnification is 5 times, the beam forming pattern of the left channel and the beam forming pattern of the right channel are obtained. The line of symmetry of the beamforming pattern for the left channel is in the 45 ° direction, the electronic device may generate the left channel audio signal using the beamforming pattern for the left channel, the line of symmetry of the beamforming pattern for the right channel is in the 315 ° direction, and the electronic device may generate the right channel audio signal using the beamforming pattern for the right channel.
At this time, whether the electronic device suppresses the sound of the subject 101, the subject 102, and the subject 103 is the effect presented after the left channel audio signal and the right channel audio signal are output. For the subject 101, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 0 (or close to 0), the electronic device suppresses the sound of the subject 101. For the subject 102, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 1 (or close to 1), the electronic device does not suppress the sound of the subject 102. For the subject 103, since the gain coefficient in the beamforming diagram of the left channel is 1 (or close to 1) and the gain coefficient in the beamforming diagram of the right channel is 1 (or close to 1), the electronic device does not suppress the sound of the subject 103. Thus, when the electronic device plays the image and audio corresponding to the 2nd second, the subject 102 and the subject 103 can be seen but the subject 101 cannot, and the sounds of the subject 102 and the subject 103 can be heard while the sound of the subject 101 cannot be heard or sounds very faint.
As shown in fig. 11 (a), the user interface 70 is a user interface when the electronic device plays the video corresponding to the 3 rd second. At this time, the electronic device plays the image and audio corresponding to the 3 rd second recorded in fig. 1 d. At this time, the zoom magnification of the electronic apparatus is increased from 5 times zoom magnification to 10 times zoom magnification. It can be seen in the user interface 70 that the currently played image includes the subject 102, but no subject 101 and no subject 103.
When the audio corresponding to the 3 rd second in the video generated by the electronic device is monaural audio, the monaural audio can be generated by the monaural beam forming diagram shown in fig. 11 (b).
When the audio corresponding to the 3 rd second in the video generated by the electronic device includes a left channel audio signal and a right channel audio signal, the left channel audio signal may be generated using the beam forming pattern of the left channel shown in (c) of fig. 11, and the right channel audio signal may be generated using the beam forming pattern of the right channel shown in (c) of fig. 11.
Optionally, the beamforming pattern of the left channel and the beamforming pattern of the right channel are symmetric.
In some embodiments, the electronic device may use the beamforming diagram for the left channel and the beamforming diagram for the right channel shown in fig. 11 (c) to perform fusion, resulting in a monophonic beamforming diagram shown in fig. 11 (b).
As shown in fig. 11 (b), a monaural beamforming diagram is obtained at 10x zoom magnification, with its line of symmetry in the 0° direction. The electronic device can generate monaural audio using this diagram. As can be seen from the diagram, the gain coefficients corresponding to the directions of the subject 101 and the subject 103 are both 0 (or close to 0), so the electronic device suppresses their sounds. However, the gain coefficient corresponding to the direction of the subject 102 is 1 (or close to 1), so the electronic device does not suppress the sound of the subject 102. The audio signal collected by the electronic device includes the voices of the subject 101, the subject 102, and the subject 103, but the voices of the subject 101 and the subject 103 are suppressed in the played audio: they cannot be heard, or they sound very faint.
As shown in fig. 11 (c), the left channel beam forming pattern and the right channel beam forming pattern are obtained at a zoom magnification of 10 times. The left channel beamforming pattern has a line of symmetry in the 10 ° direction, the electronic device may generate the left channel audio signal using the left channel beamforming pattern, the right channel beamforming pattern has a line of symmetry in the 350 ° direction, and the electronic device may generate the right channel audio signal using the right channel beamforming pattern.
At this time, whether the electronic device suppresses the sound of the subject 101, the subject 102, and the subject 103 is the effect presented after the left channel audio signal and the right channel audio signal are output. For the subject 101, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 0 (or close to 0), the electronic device suppresses the sound of the subject 101. For the subject 102, since the gain coefficient in the beamforming diagram of the left channel and the gain coefficient in the beamforming diagram of the right channel are both 1 (or close to 1), the electronic device does not suppress the sound of the subject 102. For the subject 103, since the gain coefficient in the beamforming diagram of the left channel is 0 (or close to 0) and the gain coefficient in the beamforming diagram of the right channel is 0 (or close to 0), the electronic device suppresses the sound of the subject 103. Thus, when the electronic device plays the image and audio corresponding to the 3rd second, the subject 102 can be seen but the subject 101 and the subject 103 cannot, and the sound of the subject 102 can be heard while the sounds of the subject 101 and the subject 103 cannot be heard or sound very faint.
In the case where the first output audio signal comprises two audio signals, namely a left channel audio signal and a right channel audio signal, the process of the electronic device generating the left channel audio signal and the right channel audio signal from the first audio signal, the second audio signal, and the third audio signal according to the first zoom magnification may refer to the following description of steps S201 to S203.
Fig. 12 is an exemplary flowchart of when an electronic device generates a left channel audio signal or a right channel audio signal.
S201, the electronic equipment acquires a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction and a third filter coefficient corresponding to a third direction;
for a detailed description of the first, second and third directions, reference may be made to the following:
the direction in which the rear camera faces is taken as the front of the electronic device. The first direction is any direction within the range from 10° clockwise from the front of the electronic device to 70° clockwise from the front, for example 45° clockwise from the front. The second direction may be any direction within the range from 10° clockwise from the front of the electronic device to 10° counterclockwise from the front, for example the front itself. The third direction is any direction within the range from 10° counterclockwise from the front of the electronic device to 70° counterclockwise from the front, for example 45° counterclockwise from the front.
The electronic device can adjust the angles of the first direction, the second direction and the third direction as required.
For example, at 1x zoom magnification, the beamforming diagram in fig. 9 (c) applies: the second direction is the direction directly in front of the electronic device, i.e. the 0° direction in the diagram; the first direction is 45° clockwise from the front, i.e. the 45° direction formed by the beam of the left channel in the diagram; and the third direction is 45° counterclockwise from the front, i.e. the 315° direction formed by the beam of the right channel in the diagram.
It is understood that the angles 45 ° and 315 ° in (c) in fig. 9 are only examples, and may be adjusted to other angles as needed, and the present application does not limit this. The same applies to the angles in the embodiment of fig. 10 and 11.
Referring to fig. 13, fig. 13 is a diagram illustrating an example of the first direction, the second direction, and the third direction.
As can be seen from the front view of the electronic device shown in fig. 13 (a) and the top view shown in fig. 13 (b), and referring to fig. 13 (c), the front of the electronic device spans the range from 90° clockwise to 270°. The direction directly in front of the electronic device is 0°; 45° clockwise from the front is the 45° direction in the figure, and 45° counterclockwise from the front is the 315° direction in the figure. The first direction may be defined as the 45° direction, the second direction as the 0° direction, and the third direction as the 315° direction.
For example, as shown in (c) of fig. 9 described above, the first direction may be a 45 ° direction, the second direction may be 0 °, and the third direction may be 315 °. As shown in (c) of fig. 10, the first direction may be a 30 ° direction, the second direction is 0 °, and the third direction may be 330 °. As shown in (c) of fig. 11, the first direction may be a 10 ° direction, the second direction is 0 °, and the third direction may be 350 °.
It is understood that the above mentioned angles are only examples, and can be adjusted to other angles according to the requirement, and the present application does not limit the present invention.
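The angle convention above (front = 0°, clockwise angles positive, counterclockwise angles expressed as 315°, 330°, 350°) can be captured with a tiny helper. This is an illustrative sketch, not code from this application.

```python
def to_patent_angle(offset_deg):
    """Map a signed offset from the device's front (positive = clockwise,
    negative = counterclockwise) onto the 0-360 degree convention used
    above, where the front is 0 and 45 degrees counterclockwise is 315."""
    return offset_deg % 360

# First/second/third direction examples from the text:
print(to_patent_angle(45), to_patent_angle(0), to_patent_angle(-45))   # 45 0 315
print(to_patent_angle(10), to_patent_angle(0), to_patent_angle(-10))   # 10 0 350
```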
In order for the left channel audio signal and the right channel audio signal to embody stereo, the audio signals retained and the audio signals suppressed must differ between the left channel audio signal and the right channel audio signal. In other words, in the left channel audio signal, the audio signals collected from directions to the left of the front of the electronic device are retained, and the audio signals collected from directions to the right are suppressed. In the right channel audio signal, the audio signals collected from directions to the right of the front of the electronic device are retained, and the audio signals collected from directions to the left are suppressed.
In the embodiment of the present application, the leftward direction corresponds to the first direction, the front corresponds to the second direction, and the rightward direction corresponds to the third direction.
The first direction is deviated to the left relative to the right of the electronic equipment, the third direction is deviated to the right relative to the right of the electronic equipment, and the second direction is relatively positioned at the right front of the electronic equipment.
The first filter coefficient corresponding to the first direction and the second filter coefficient corresponding to the second direction are used for generating the left channel audio signal: audio signals collected from directions to the left of the front of the electronic device are retained, and audio signals collected from directions to the right are suppressed. The second filter coefficient corresponding to the second direction and the third filter coefficient corresponding to the third direction are used for generating the right channel audio signal: audio signals collected from directions to the right of the front of the electronic device are retained, and audio signals collected from directions to the left are suppressed.
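The pairing described above (left channel from the first- and second-direction components, right channel from the second- and third-direction components) can be sketched as follows. The simple sum is an assumed mixing rule for illustration; the application does not specify the exact combination.

```python
import numpy as np

def make_stereo(comp_first, comp_second, comp_third):
    """Combine per-direction components into left/right channels.

    Left keeps the leftward (first) and front (second) components;
    right keeps the front (second) and rightward (third) components.
    """
    left = comp_first + comp_second
    right = comp_second + comp_third
    return left, right

left_dir = np.array([1.0, 0.0])   # toy component captured from the left
front = np.array([0.5, 0.5])      # toy component captured from the front
right_dir = np.array([0.0, 1.0])  # toy component captured from the right
L, R = make_stereo(left_dir, front, right_dir)
print(L, R)
```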
The first filter coefficient corresponding to the first direction, the second filter coefficient corresponding to the second direction, and the third filter coefficient corresponding to the third direction are previously configured in the electronic device before the electronic device is shipped.
For an example of the electronic device generating the first filter corresponding to the first direction, the process may refer to the following description of step S301 to step S303 in fig. 14.
The first filter is described in detail as follows:
the first filter coefficient corresponding to the first direction includes a first filter coefficient corresponding to the first microphone in the first direction, a first filter coefficient corresponding to the second microphone in the first direction, and a first filter coefficient corresponding to the third microphone in the first direction. The first filter coefficient corresponding to the first microphone in the first direction may be used to retain the audio signals in the first audio signal collected from directions to the left of the front of the electronic device, and to suppress the audio signals collected from the front and from directions to the right. The first filter coefficient corresponding to the second microphone in the first direction may be used to retain the audio signals in the second audio signal collected from directions to the left of the front of the electronic device, and to suppress the audio signals collected from the front and from directions to the right. The first filter coefficient corresponding to the third microphone in the first direction may be used to retain the audio signals in the third audio signal collected from directions to the left of the front of the electronic device, and to suppress the audio signals collected from the front and from directions to the right. Details of the process may refer to the following description of step S202.
If the first audio signal includes N frequency points, the first filter coefficient corresponding to the first microphone in the first direction should also have N elements (coefficients), where the jth element represents the suppression degree of the jth frequency point in the N frequency points corresponding to the first audio signal.
Specifically, when the j-th element is equal to 1 or close to 1, the electronic device does not suppress the audio signal corresponding to the j-th frequency point (when close to 1, the degree of suppression is so low that the signal can be considered retained); that is, the signal is retained, and the direction of the audio signal corresponding to the j-th frequency point is considered to be to the left. In other cases, the audio signal corresponding to the j-th frequency point is suppressed. For example, when the j-th element is equal to or close to 0, the electronic device suppresses the audio signal corresponding to the j-th frequency point to a greater degree; that is, suppression is performed, and the direction of the audio signal corresponding to the j-th frequency point is considered to be further to the right.
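The per-frequency-point behavior described above (the j-th filter element scales the j-th frequency point) amounts to an element-wise product; the arrays below are toy data for illustration.

```python
import numpy as np

def filter_bins(audio_bins, coeff_per_bin):
    """Multiply the j-th frequency bin by the j-th filter element.
    Elements near 1 leave a bin retained; elements near 0 suppress it."""
    assert audio_bins.shape == coeff_per_bin.shape
    return audio_bins * coeff_per_bin

n_bins = 8
audio = np.ones(n_bins, dtype=complex)            # toy frequency-domain signal
coeffs = np.array([1.0, 1.0, 0.9, 0.5, 0.1, 0.0, 0.0, 1.0])  # toy filter
filtered = filter_bins(audio, coeffs)
print(abs(filtered))
```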
Fig. 14 is an exemplary flowchart of the electronic device generating a first filter corresponding to the first direction.
In some embodiments, the specific process of the electronic device generating the first filter coefficient corresponding to the first direction may refer to the following description of steps S301 to S303:
S301, the electronic device obtains a first test audio signal, a second test audio signal, and a third test audio signal at different distances in multiple directions, respectively.
The direction refers to the horizontal angle between the sound-emitting object and the electronic device, and the distance refers to the Euclidean distance between the sound-emitting object and the electronic device. A single sound-emitting object is used.
Audio signals at different distances are obtained in multiple directions so that the generated first filter coefficients have universality. That is, after the electronic device is shipped, when a video is recorded, the directions of the collected first input audio signal, second input audio signal, and third input audio signal are the same as, or close to, one of these directions, so the first filter coefficients remain applicable to the first input audio signal, the second input audio signal, and the third input audio signal.
In some embodiments, the plurality of directions may include 36 directions, and one direction every 10 ° around the electronic device. The plurality of distances may include 3 distances of 1m, 2m, and 3m, respectively.
The first test audio signal is a set of input audio signals at different distances respectively acquired by a first microphone of the electronic device in a plurality of directions.
The second test audio signal is a set of input audio signals at different distances respectively acquired by a second microphone of the electronic device in a plurality of directions.
The third test audio signal is a set of input audio signals at different distances respectively acquired by a third microphone of the electronic device in a plurality of directions.
S302, the electronic equipment acquires a first target wave beam corresponding to a first direction.
The first target beam is used by the electronic device to generate the first filter coefficient corresponding to the first direction; it describes the degree of filtering performed by the electronic device in the plurality of directions. The first target beam may be understood as the desired beam, i.e. the beam that is expected to be formed.
In some embodiments, when the plurality of directions is 36 directions, there are 36 gain coefficients in the first target beam, one for each direction; the i-th gain coefficient represents the degree of suppression in the i-th direction. The gain coefficient corresponding to the first direction is 1, and for each further 10° a direction lies from the first direction, the gain coefficient decreases by another 1/36. Thus, the gain coefficients of directions closer to the first direction are closer to 1, and those of directions farther from the first direction are closer to 0.
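The construction of the first target beam described above can be sketched as follows. The text does not specify whether the 10° steps away from the first direction are counted circularly around 360°; this sketch uses a simple linear step, floored at 0, which is an assumption.

```python
import numpy as np

def target_beam(center_index, n_directions=36):
    """Target beam over n_directions (one per 10 degrees): gain 1 at the
    chosen direction, decreasing by 1/n_directions per 10-degree step away.
    Linear (non-circular) distance is an assumed interpretation."""
    return np.array([
        max(0.0, 1.0 - abs(i - center_index) / n_directions)
        for i in range(n_directions)
    ])

beam = target_beam(center_index=4)  # e.g. a first direction at 45 degrees
print(beam[4], beam[5], beam[35])   # 1 at the center, decaying with distance
```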
And S303, the electronic equipment generates a first filter coefficient corresponding to the first direction by using the first test audio, the second test audio, the third test audio and the first target beam through a device-dependent transfer function.
The electronic device generates the first filter coefficient corresponding to the first direction according to the following formula (1):

w_1(ω) = argmin_{w_1} || w_1(ω)^H · G(H_1(ω), H_2(ω), H_3(ω)) - ℋ_1 ||^2    (1)

In formula (1), w_1(ω) is the first filter coefficient, comprising 3 elements, where the i-th element may be denoted w_1i(ω); w_1i(ω) is the first filter coefficient corresponding to the i-th microphone in the first direction. H_1(ω) represents the first test audio signal, H_2(ω) represents the second test audio signal, and H_3(ω) represents the third test audio signal. G(H_1(ω), H_2(ω), H_3(ω)) means that the first test audio signal, the second test audio signal, and the third test audio signal are processed by a device-dependent transfer function, which may be used to describe the correlation among the three test audio signals. ℋ_1 represents the first target beam, w_1 denotes the filter coefficients that can be obtained in the first direction, and argmin denotes that the w_1 found using the least-squares frequency-invariant fixed beamforming method is taken as the first filter coefficient corresponding to the first direction.
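A least-squares solve of the kind formula (1) describes can be sketched per frequency bin as follows. The 3x36 shape of G (three microphones, 36 test directions) and the use of numpy's `lstsq` are assumptions made for illustration, not this application's implementation.

```python
import numpy as np

def solve_filter(G, target):
    """Least-squares filter coefficients for one frequency bin.

    G:      (3, 36) complex array, a stand-in for G(H1, H2, H3): the three
            microphones' responses for 36 test directions (assumed shape).
    target: (36,) real target-beam gains.
    Returns w of shape (3,) minimizing || w^H G - target ||^2.
    """
    # For 1-D arrays, w.conj() @ G computes w^H G, so solve G^T v = target
    # with v = conj(w) in the least-squares sense.
    v, *_ = np.linalg.lstsq(G.T, target.astype(complex), rcond=None)
    return v.conj()

rng = np.random.default_rng(2)
G = rng.standard_normal((3, 36)) + 1j * rng.standard_normal((3, 36))
target = np.linspace(1.0, 0.0, 36)  # toy target beam, gain 1 down to 0
w = solve_filter(G, target)
residual = np.linalg.norm(w.conj() @ G - target)
print(w.shape, residual)
```

With only three microphones the 36-direction target can only be approximated, which is why the solve is a least-squares minimization rather than an exact fit.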
The second filter coefficient corresponding to the second direction includes a second filter coefficient corresponding to the first microphone in the second direction, a second filter coefficient corresponding to the second microphone in the second direction, and a second filter coefficient corresponding to the third microphone in the second direction. The second filter coefficient corresponding to the first microphone in the second direction may be used to retain the audio signals in the first audio signal collected from the front of the electronic device, and to suppress the audio signals collected from the left and right directions. The second filter coefficient corresponding to the second microphone in the second direction may be used to retain the audio signals in the second audio signal collected from the front of the electronic device, and to suppress the audio signals collected from the left and right directions. The second filter coefficient corresponding to the third microphone in the second direction may be used to retain the audio signals in the third audio signal collected from the front of the electronic device, and to suppress the audio signals collected from the left and right directions.
For the detailed description of the second filter, reference may be made to the detailed description of the first filter, and details are not repeated here.
The formula for the electronic device to generate the second filter coefficient corresponding to the second direction is the following formula (2):
the description of formula (2) may refer to the foregoing description of steps S401 to S402, with the difference that w₂(ω) is the second filter coefficient, which includes 3 elements, where the i-th element may be represented as w₂ᵢ(ω); w₂ᵢ(ω) is the second filter coefficient corresponding to the i-th microphone in the second direction. H₂ represents the second target beam corresponding to the second direction, w₂ denotes the filter coefficients that can be found in the second direction, and argmin denotes taking the w₂ found by the least-squares frequency-invariant fixed beamforming method as the second filter coefficient corresponding to the second direction.
Wherein the second target beam is used for the electronic device to generate a second filter corresponding to a second direction, which describes a filtering degree of the electronic device in a plurality of directions.
In some embodiments, when the plurality of directions is 36 directions, the second target beam contains 36 gain coefficients. The i-th gain coefficient represents the degree of filtering in the i-th direction, and each direction corresponds to one gain coefficient. The gain coefficient corresponding to the second direction is 1, and for every additional 10° of difference from the second direction, the gain coefficient decreases by a further 1/36. Thus, the elements corresponding to directions closer to the second direction are closer to 1, and the elements corresponding to directions farther from the second direction are closer to 0.
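The construction of such a target beam can be sketched as follows; the circular (wrap-around) handling of angular distance is an assumption, since the text only states that the gain decreases by 1/36 per 10° of difference.

```python
import numpy as np

# Illustrative construction of a 36-direction target beam as described:
# gain 1 in the steered direction, decreasing by 1/36 for every 10
# degrees of angular difference. The wrap-around distance is assumed.
def target_beam(steer_index, num_dirs=36):
    gains = np.empty(num_dirs)
    for i in range(num_dirs):
        # angular distance from the steered direction, in 10-degree steps
        steps = min(abs(i - steer_index), num_dirs - abs(i - steer_index))
        gains[i] = 1.0 - steps / num_dirs
    return gains

beam = target_beam(steer_index=0)
```

The resulting vector is 1 in the steered direction, symmetric on both sides of it, and smallest in the direction directly opposite.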
The third filter coefficients corresponding to the third direction include a third filter coefficient corresponding to the first microphone in the third direction, a third filter coefficient corresponding to the second microphone in the third direction, and a third filter coefficient corresponding to the third microphone in the third direction. The third filter coefficient corresponding to the first microphone in the third direction may be used to retain, in the first audio signal, the audio signal collected from the right direction relative to the front of the electronic device and to suppress the audio signals collected from the front and the left direction. The third filter coefficient corresponding to the second microphone in the third direction may be used to retain, in the second audio signal, the audio signal collected from the right direction relative to the front of the electronic device and to suppress the audio signals collected from the front and the left direction. The third filter coefficient corresponding to the third microphone in the third direction may be used to retain, in the third audio signal, the audio signal collected from the right direction relative to the front of the electronic device and to suppress the audio signals collected from the front and the left direction.
For a detailed description of the third filter, reference may be made to the above detailed description of the first filter, which is not repeated here.
The formula for the electronic device to generate the third filter coefficient corresponding to the third direction is the following formula (3):
the description of formula (3) may refer to the foregoing description of steps S401 to S402, with the difference that w₃(ω) is the third filter coefficient, which includes 3 elements, where the i-th element may be represented as w₃ᵢ(ω); w₃ᵢ(ω) is the third filter coefficient corresponding to the i-th microphone in the third direction. H₃ represents the third target beam corresponding to the third direction, w₃ denotes the filter coefficients that can be found in the third direction, and argmin denotes taking the w₃ found by the least-squares frequency-invariant fixed beamforming method as the third filter coefficient corresponding to the third direction.
Wherein the third target beam is used for the electronic device to generate a third filter corresponding to a third direction, which describes the filtering degree of the electronic device in multiple directions.
In some embodiments, when the plurality of directions is 36 directions, the third target beam contains 36 gain coefficients. The i-th gain coefficient represents the degree of filtering in the i-th direction, and each direction corresponds to one gain coefficient. The gain coefficient corresponding to the third direction is 1, and for every additional 10° of difference from the third direction, the gain coefficient decreases by a further 1/36. Thus, the elements corresponding to directions closer to the third direction are closer to 1, and the elements corresponding to directions farther from the third direction are closer to 0.
S202, generating a first beam corresponding to the first direction, a second beam corresponding to the second direction, and a third beam corresponding to the third direction by using the first filter coefficient, the second filter coefficient, and the third filter coefficient, respectively, in combination with the first audio signal, the second audio signal, and the third audio signal;
the first beam corresponding to the first direction is an audio signal obtained by synthesizing the first audio signal, the second audio signal and the third audio signal by the electronic device. In the synthesizing process, the electronic device may reserve the audio signal collected in the left direction with respect to the front of the electronic device among the first audio signal, the second audio signal, and the third audio signal, and suppress the audio signals collected in the front and the right directions.
The second beam corresponding to the second direction is an audio signal obtained by synthesizing the first audio signal, the second audio signal and the third audio signal by the electronic device. In the synthesizing process, the electronic device may reserve the audio signal collected directly in front of the electronic device among the first audio signal, the second audio signal, and the third audio signal, and suppress the audio signals collected in the left and right directions.
The third beam corresponding to the third direction is an audio signal obtained by synthesizing the first audio signal, the second audio signal and the third audio signal by the electronic device. In the synthesizing process, the electronic device may reserve the audio signal collected in the right direction with respect to the front of the electronic device among the first audio signal, the second audio signal, and the third audio signal, and suppress the audio signal collected in the front and the right direction.
The electronic device uses the first filter coefficient, in combination with the first input audio signal, the second input audio signal, and the third input audio signal, to generate the first beam corresponding to the first direction according to the following formula (4):
y₁ represents the first beam corresponding to the first direction, which includes N elements. Any element is used to represent a frequency point. The number of frequency points corresponding to the first beam is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal.
In the formula, w₁ᵢ(ω) is the first filter coefficient corresponding to the i-th microphone in the first direction, and the j-th element in w₁ᵢ(ω) represents the degree to which the audio signal corresponding to the j-th frequency point in the audio signal is suppressed. xᵢ(ω) is the audio signal corresponding to the i-th microphone, and the j-th element in xᵢ(ω) represents the complex value of the j-th frequency point, which carries the amplitude and phase information of the sound signal corresponding to that frequency point.
For example, the j-th element of the first filter coefficient corresponding to the i-th microphone in the first direction is denoted as aⱼᵢ, and the j-th element of the audio signal corresponding to the i-th microphone is denoted as bᵢⱼ. The above formula can then be expressed as the following formula (5):
the first beam corresponding to the first direction can be specifically expressed as the following formula (6):
the process of synthesizing the first audio signal, the second audio signal, and the third audio signal by the electronic device can be seen from the above formula (6). In the process of synthesis, the audio signals collected in the first audio signal, the second audio signal, and the third audio signal from the left direction relative to directly in front of the electronic device are retained, and the audio signals collected from directly in front and from the right direction are suppressed.
Specifically, it can be seen from the above formulas (4) to (6) that the electronic device multiplies, element by element, the complex values of the N frequency points in the audio signals collected by the M microphones (including the first audio signal, the second audio signal, and the third audio signal) with the N elements of the first filter coefficients corresponding to the M microphones in the first direction, obtaining M products, and then adds the M products point to point to finally obtain the first beam. The first beam includes N new frequency points. The element-wise multiplication is the process of retaining the audio signals collected from the left direction relative to directly in front of the electronic device and suppressing the audio signals collected from directly in front and from the right direction. The point-to-point addition is the process of synthesis.
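The multiply-and-add synthesis described above can be sketched as follows; the microphone count, frequency-point count, and signal values are arbitrary illustrations.

```python
import numpy as np

# Sketch of the synthesis in formulas (4)-(6): multiply, element by
# element, each microphone's N frequency-point values by that
# microphone's N filter-coefficient elements, then add the M products
# point to point to obtain the beam.
def synthesize_beam(filters, signals):
    """filters: (M, N) per-microphone filter coefficients;
    signals: (M, N) per-microphone frequency-domain audio.
    Returns the (N,) beam."""
    return np.sum(filters * signals, axis=0)

M, N = 3, 8
rng = np.random.default_rng(0)
filters = rng.random((M, N))                            # real gains in [0, 1)
signals = rng.random((M, N)) + 1j * rng.random((M, N))  # complex frequency points
beam = synthesize_beam(filters, signals)
```

A coefficient near 1 passes the corresponding frequency point through unchanged; a coefficient near 0 removes it, matching the retain/suppress behavior described above.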
It should be understood that when the j-th element of the first filter coefficients corresponding to the M microphones in the first direction equals or is close to 1, the electronic device does not suppress, i.e., retains, the audio signal at the frequency point multiplied by that element, and the direction of the audio signal corresponding to the j-th frequency point is considered close to the first direction. Otherwise, the audio signal at the frequency point multiplied by the j-th element is suppressed: the closer the j-th element is to 0, the greater the degree of suppression applied by the electronic device and the farther the direction of the audio signal corresponding to the j-th frequency point is from the first direction.
The electronic device uses the second filter coefficient, in combination with the first input audio signal, the second input audio signal, and the third input audio signal, to generate the second beam corresponding to the second direction according to the following formulas (7)-(9); the description of formulas (7)-(9) may refer to the description of formulas (4)-(6):
y₂ represents the second beam corresponding to the second direction, which includes N elements. Any element is used to represent a frequency point. The number of frequency points corresponding to the second beam is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal.
In the formula, w₂ᵢ(ω) is the second filter coefficient corresponding to the i-th microphone in the second direction, and the j-th element in w₂ᵢ(ω) represents the degree to which the audio signal corresponding to the j-th frequency point in the audio signal is suppressed.
For example, the j-th element of the second filter coefficient corresponding to the i-th microphone in the second direction is denoted as cⱼᵢ, and the j-th element of the audio signal corresponding to the i-th microphone is denoted as bᵢⱼ. The above formula can then be expressed as the following formula (8):
the second beam corresponding to the second direction can be specifically expressed as the following formula (9):
the process of synthesizing the first audio signal, the second audio signal, and the third audio signal by the electronic device can be seen from the above formula (9). In the process of synthesis, the audio signals collected in the first audio signal, the second audio signal, and the third audio signal from directly in front of the electronic device are retained, and the audio signals collected from the left and right directions are suppressed. For a detailed description of the process, reference may be made to the above description of formulas (4)-(6), which is not repeated here.
The electronic device generates, by using the third filter coefficient, a formula related to the third beam corresponding to the third direction, such as the following formula (10) -formula (12), in combination with the first input audio signal, the third input audio signal, and the third audio input signal, and the description of the formula (10) -formula (12) may refer to the description of the formula (4) -formula (6):
y₃ represents the third beam corresponding to the third direction, which includes N elements. Any element is used to represent a frequency point. The number of frequency points corresponding to the third beam is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal.
In the formula, w₃ᵢ(ω) is the third filter coefficient corresponding to the i-th microphone in the third direction, and the j-th element in w₃ᵢ(ω) represents the degree to which the audio signal corresponding to the j-th frequency point in the audio signal is suppressed.
For example, the j-th element of the third filter coefficient corresponding to the i-th microphone in the third direction is denoted as dⱼᵢ, and the j-th element of the audio signal corresponding to the i-th microphone is denoted as bᵢⱼ. The above formula can then be expressed as the following formula (11):
the third beam corresponding to the third direction can be specifically expressed as the following formula (12):
the process of synthesizing the first audio signal, the second audio signal, and the third audio signal by the electronic device can be seen from the above formula (12). In the process of synthesis, the audio signals collected in the first audio signal, the second audio signal, and the third audio signal from the right direction relative to directly in front of the electronic device are retained, and the audio signals collected from directly in front and from the left direction are suppressed. For a detailed description of the process, reference may be made to the above description of formulas (4)-(6), which is not repeated here.
S203, according to the first zoom magnification, generating a left channel audio signal by using the first beam and the second beam, and generating a right channel audio signal by using the second beam and the third beam;
In the left channel audio signal, the audio signals collected from directly in front of the electronic device and from the left direction are retained, while the audio signals collected from the right direction are suppressed. The left channel audio signal is an audio signal in the frequency domain that includes N frequency points (the number of frequency points is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal).
In the right channel audio signal, the audio signals collected from directly in front of the electronic device and from the right direction are retained, while the audio signals collected from the left direction are suppressed. The right channel audio signal is an audio signal in the frequency domain that includes N frequency points (the number of frequency points is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal).
Specifically, the electronic device may fuse the first beam and the second beam into a left channel audio signal according to the fusion coefficient, and the following formula (13) may be referred to in the process:
yₗ = αy₁ + (1 − α)y₂    formula (13)
In the above formula (13), yₗ represents the left channel audio signal, y₁ the first beam, y₂ the second beam, and α the fusion coefficient, whose specific value is preset in the electronic device. The value of the fusion coefficient is directly related to the first zoom magnification, and any first zoom magnification corresponds to exactly one fusion coefficient, so the electronic device can determine the fusion coefficient from the first zoom magnification. The fusion coefficient takes values in [0, 1]: the larger the zoom magnification, the smaller the fusion coefficient. For example, at 1× zoom magnification the fusion coefficient may be 1, and at the maximum zoom magnification it may be 0.
The electronic device fuses the third beam and the second beam into the right channel audio signal according to the fusion coefficient, and the following formula (14) can be referred to in the process:
yᵣ = αy₃ + (1 − α)y₂    formula (14)
In the above formula (14), yᵣ is the right channel audio signal; the detailed description of formula (14) can refer to the foregoing description of formula (13) and is not repeated here.
The fusion coefficient is used to determine whether the left channel audio signal is oriented more to the left or more to the front, and whether the right channel audio signal is oriented more to the right or more to the front; its value is directly related to the zoom magnification. The principle is that as the zoom magnification increases, the field angle decreases (the field of view deviates less to the left of the front and less to the right of the front), and the left channel audio signal and the right channel audio signal should be more concentrated on the front, that is, the second direction; as can be seen from formula (13) and formula (14), α should then be smaller. It should be understood that the smaller the zoom magnification, the larger the field angle and the larger α, so that the left channel audio signal can retain more of the audio signal to the left of the front (i.e., in the first direction), and the right channel audio signal can retain more of the audio signal to the right of the front (i.e., in the third direction).
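A minimal sketch of the fusion in formulas (13) and (14) follows. The zoom-to-α mapping is an assumed linear example; the patent only fixes the endpoints (α = 1 at 1× zoom, 0 at the maximum zoom) and states that α is preset and decreases as the zoom magnification grows.

```python
import numpy as np

# Sketch of formulas (13)-(14): fuse the three beams into left and
# right channels using a fusion coefficient alpha in [0, 1].
def fuse_channels(y1, y2, y3, alpha):
    y_left = alpha * y1 + (1 - alpha) * y2    # formula (13)
    y_right = alpha * y3 + (1 - alpha) * y2   # formula (14)
    return y_left, y_right

def zoom_to_alpha(zoom, max_zoom=10.0):
    # hypothetical linear interpolation between the stated endpoints
    return max(0.0, min(1.0, (max_zoom - zoom) / (max_zoom - 1.0)))

y1, y2, y3 = np.array([1.0]), np.array([2.0]), np.array([3.0])
left, right = fuse_channels(y1, y2, y3, zoom_to_alpha(1.0))
# at 1x zoom alpha = 1, so left equals y1 and right equals y3
```

As the zoom grows, α shrinks and both channels collapse toward the front-facing second beam y₂, which is exactly the narrowing field angle described above.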
It can be understood that the different fusion coefficients α corresponding to different zoom magnifications result in different beam patterns during fusion. For example, the single-channel beam patterns at 1× zoom magnification shown in (b) of fig. 9, at 5× zoom magnification shown in (b) of fig. 10, and at 10× zoom magnification shown in (b) of fig. 11 differ in shape. Likewise, the left channel and right channel beam patterns at 1× zoom magnification shown in (c) of fig. 9, at 5× zoom magnification shown in (c) of fig. 10, and at 10× zoom magnification shown in (c) of fig. 11 differ in shape.
The above step S203 is optional.
In other embodiments, the first output audio signal includes only one audio signal.
The formula for the electronic device to generate the first output audio signal is as follows:
the parameters in the formula (15) can refer to the foregoing descriptions of the formula (13) and the formula (14), and are not described herein again. The formula (15) shows that the audio signals collected in the field angle range of the electronic device are reserved, and the audio signals collected in other directions are suppressed.
S105, the electronic equipment enhances or suppresses the first output audio signal according to the first zoom ratio to obtain a second output audio signal, wherein both a target sound signal and a non-target sound signal in the second output audio signal are enhanced or suppressed compared with the first output audio signal;
the enhancing of the first output audio signal means that the amplitude of the first output audio signal is adjusted to be larger, so that the decibel of the first output audio signal can be larger. The suppressing of the first output audio signal means that the amplitude of the first output audio signal is adjusted to be reduced, so that the decibel of the first output audio signal can be reduced.
When the first output audio signal includes a left channel audio signal and a right channel audio signal, the second output audio signal also includes a left channel audio signal and a right channel audio signal. The enhancing or suppressing of the first output audio signal in step S105 is to enhance or suppress the left channel audio signal and the right channel audio signal in the first output audio signal, and the process of enhancing or suppressing the left channel audio signal and the right channel audio signal in the first output audio signal can refer to the following description of step S401 to step S404.
In the case where the first output audio signal includes only one audio signal, the second output audio signal also includes only one audio signal. The enhancement or suppression of the first output audio signal in step S105 is to enhance or suppress the audio signal, and the process can also refer to the following description of steps S401 to S404.
In the case where the first zoom magnification becomes large, the closer the subject is at the time of imaging, the larger the sound should be, and the electronic device may enhance the first output audio signal. In the case where the first zoom magnification becomes small, the electronic apparatus can suppress the first output audio signal as the sound should be smaller the farther the subject is when imaging.
FIG. 15 is an exemplary flow chart for adjusting the amplitude of the first output audio signal by the electronic device:
the electronic device may refer to steps S401 to S404 described below to adjust the amplitude of the first output audio signal according to the first zoom magnification.
S401, the electronic equipment determines an adjustment parameter corresponding to a first zooming magnification according to the first zooming magnification, wherein the adjustment parameter is used for adjusting the amplitude of an audio signal, and the adjustment comprises one of enhancement or suppression;
for the adjustment parameter, its specific value is preset in the electronic device. The value of the adjustment parameter is directly related to the zoom magnification, and any first zoom magnification corresponds to exactly one adjustment parameter. The electronic device may determine the adjustment parameter for the first zoom magnification from the first zoom magnification; the adjustment parameter is a numerical value whose unit, like that of the amplitude, is dB.
The adjustment parameter is used to adjust the amplitude of the audio signal, the adjustment comprising one of enhancement or suppression.
Specifically, when the first zoom magnification is greater than 1× zoom magnification, the larger the first zoom magnification, the larger the adjustment parameter, and the adjustment parameter is positive; in this case the electronic device uses the adjustment parameter to enhance the first output audio signal, and the larger the adjustment parameter, the greater the degree of enhancement. When the first zoom magnification is less than 1× zoom magnification, the smaller the first zoom magnification, the smaller the adjustment parameter, and the adjustment parameter is negative; in this case the electronic device uses the adjustment parameter to suppress the first output audio signal, and the smaller the adjustment parameter, the greater the degree of suppression.
S402, converting the first output audio signal from a frequency domain to a time domain to obtain a first output audio signal in the time domain;
the adjustment of the amplitude of the first output audio signal by the electronic device may be performed in the time domain: the electronic device may convert the first output audio signal from the frequency domain to the time domain using an inverse Fourier transform (IFT), obtaining the first output audio signal in the time domain.
The first output audio signal in the time domain is a digital audio signal, which may be W sampling points of the analog electrical signal. The first output audio signal may be represented by an array in the electronic device, where any element in the array represents one sampling point and includes two values: one value represents a time, and the other represents the amplitude of the audio signal at that time, expressed in decibels (dB) and indicating the decibel level of the audio signal at that time.
S403, adjusting the amplitude of the first output audio signal in the time domain by using the adjustment parameter to obtain a second output audio signal in the time domain;
in some embodiments, the electronic device may adjust the amplitude of the first output audio signal in the time domain using an automatic gain control (AGC) or dynamic range compression (DRC) method.
The following detailed description takes as an example the electronic device using a dynamic range compression algorithm to adjust the amplitude of the first output audio signal in the time domain:
in a possible case, when the first zoom magnification is greater than 1, the electronic device may increase the amplitudes of all the sampling points in the first output audio signal in the time domain, and the formula for the electronic device to increase the amplitude of the first output audio signal in the time domain by using the adjustment parameter is as follows:
A′ᵢ = Aᵢ + |D|,  i ∈ (1, M)    formula (16)
When the first zoom magnification is smaller than 1, the formula for the electronic device to reduce the amplitude of the first output audio signal in the time domain using the adjustment parameter is as follows:
A′ᵢ = Aᵢ − |D|,  i ∈ (1, M)    formula (17)
In formulas (16)-(17), Aᵢ represents the amplitude of the i-th sampling point, and A′ᵢ represents the adjusted amplitude. D is the adjustment parameter. M is the total number of sampling points.
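Formulas (16) and (17) can be sketched directly; the sample amplitudes and the adjustment parameter D below are illustrative values.

```python
# Sketch of formulas (16)-(17): shift every time-domain sample's
# amplitude (in dB) by the adjustment parameter D -- added when the
# first zoom magnification exceeds 1, subtracted when it is below 1.
def adjust_amplitudes(amplitudes_db, d, zoom):
    if zoom > 1:
        return [a + abs(d) for a in amplitudes_db]   # formula (16)
    if zoom < 1:
        return [a - abs(d) for a in amplitudes_db]   # formula (17)
    return list(amplitudes_db)

samples = [-20.0, -18.5, -25.0]
boosted = adjust_amplitudes(samples, d=6.0, zoom=5.0)
# each sample is raised by 6 dB: [-14.0, -12.5, -19.0]
```

Because the shift is applied uniformly to all M sampling points, both the target and non-target sound signals are adjusted together, which is why step S106 then suppresses the non-target signal again.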
And S404, converting the second output audio signal in the time domain into the frequency domain to be used as a second output audio signal.
The process of step S404 is similar to the process of step S102, and reference may be made to the foregoing description of step S102, which is not repeated herein.
The second output audio signal is an audio signal in the frequency domain.
When the second output audio signal includes a left channel audio signal and a right channel audio signal, the left channel audio signal and the right channel audio signal each correspond to N frequency points, and the number of the corresponding frequency points is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal.
When the second output audio signal includes only one audio signal, the second output audio signal corresponds to N frequency points, and the number of the corresponding frequency points is the same as the number of frequency points corresponding to the first audio signal, the second audio signal, and the third audio signal.
And S106, the electronic equipment combines the first audio signal, the second audio signal and the third audio signal according to the first zoom ratio to suppress the non-target sound signal in the second output audio signal again to generate a third output audio signal.
In the case where the second output audio signal includes a left channel audio signal and a right channel audio signal, the third output audio signal also includes a left channel audio signal and a right channel audio signal. The suppression of the non-target sound signals in the second output audio signals, that is, the suppression of the non-target sound signals in the left channel audio signals and the right channel audio signals in the second output audio signals, which is referred to in step S106, can refer to the following description of steps S501 to S504.
In the case that the second output audio signal includes only one audio signal, the third output audio signal also includes only one audio signal. The procedure of suppressing the non-target sound signal in the second output audio signal, which is involved in step S106, can also refer to the following description of steps S501 to S504.
In the foregoing step S105, the electronic device may enhance the first output audio signal to obtain the second output audio signal. In that process, the non-target sound signal in the first output audio signal is also enhanced, so the second output audio signal includes the non-target sound signal. In order that the audio signal played by the electronic device is not affected by the non-target sound signal, the electronic device may further suppress the non-target sound signal.
In some embodiments, the electronic device may filter the second output audio signal using an existing filtering method to further suppress the non-target sound signal; common filtering methods include spectral subtraction, Wiener filtering, and the like.
In other embodiments, it is difficult for a single filtering algorithm to completely filter both the high-frequency and the low-frequency audio signals in an audio signal. Therefore, the electronic device may calculate a first target gain using a first filtering algorithm that performs better at high frequencies, and then use the first target gain to filter the high-frequency audio signals in the second output audio signal. Meanwhile, it may calculate a second target gain using a second filtering algorithm that performs better at low frequencies, and then use the second target gain to filter the low-frequency audio signals in the second output audio signal.
Wherein the first filtering algorithm may include a Zelinski filtering algorithm. The second filtering algorithm may include a coherent-to-diffuse power ratio (CDR) algorithm.
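One way to combine the two gains, sketched under the assumption that the split between "low" and "high" frequency points is a simple bin index (the patent does not specify how the bands are delimited):

```python
import numpy as np

# Sketch of the two-band idea above: use the gain from the
# high-frequency-friendly algorithm for frequency points above a split
# index and the gain from the low-frequency-friendly algorithm below
# it. The split index and the constant gains are invented values.
def combine_gains(gain_high, gain_low, split_bin):
    return np.concatenate([gain_low[:split_bin], gain_high[split_bin:]])

N = 8
gain_high = np.full(N, 0.8)  # e.g. from a Zelinski-style filter
gain_low = np.full(N, 0.4)   # e.g. from a CDR-style filter
combined = combine_gains(gain_high, gain_low, split_bin=4)
```

In a real pipeline both algorithms would produce per-bin gains from the microphone signals; only the band-splitting combination is shown here.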
Fig. 16 is a schematic flow chart of the electronic device suppressing a non-target sound signal in the second output audio signal.
S501, performing Zelinski filtering using the first audio signal, the second audio signal, and the third audio signal to obtain a first target gain;
the first target gain is calculated by the Zelinski filtering algorithm; for a specific description, refer to the following formula (18) and formula (19). The first target gain is used to filter the non-target sound signals corresponding to the high-frequency points in the second output audio signal.
The first target gain is N target gain values (the size of N is the same as the number of frequency points in the second output audio signal), and if the value of any gain is not greater than a first threshold, the value is equal to the first threshold, wherein the first threshold is preset in the electronic device. The first threshold may be obtained by a difference (Δ) between the suppression degree of the non-target sound signal in step S104 and the enhancement degree of the non-target sound in step S105, and the corresponding relation to the difference is Δ =20logx, where x is the first threshold. And when the target gain value is equal to the first threshold value, the sound signals corresponding to the frequency point are inhibited but not removed.
The first threshold is preset in the electronic device, and any first zoom magnification corresponds to exactly one first threshold. The first threshold may be obtained from the difference (Δ) between the suppression degree of the non-target sound signal in step S104 and the enhancement degree of the non-target sound in step S105; the correspondence between the difference and the first threshold is Δ = 20logx, where x is the first threshold. The difference (Δ) is determined as follows: in step S104, the non-target sound signal may be suppressed, and the degree of suppression is directly related to the first zoom magnification, any first zoom magnification uniquely corresponding to one suppression degree. The suppression degree is a suppression parameter, a numerical value whose unit, like that of amplitude, is dB. The enhancement degree of the non-target sound signal in S105 is the aforementioned adjustment parameter. For example, if the suppression degree is m (dB) and the enhancement degree is n (dB), then Δ = |m − n|.
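As a hedged illustration, the relation Δ = 20logx can be inverted to derive the threshold x from the suppression degree m and the enhancement degree n. The sign convention is an assumption here: since the threshold acts as a gain floor below 1, Δ is treated as an attenuation in dB; the function name is illustrative.

```python
def first_threshold(m_db: float, n_db: float) -> float:
    """Derive the first threshold x (a linear gain floor) from the
    suppression degree m (dB, step S104) and the enhancement degree
    n (dB, step S105), using delta = |m - n| and delta = 20*log10(x).
    Treating delta as an attenuation (an assumption) gives x < 1."""
    delta = abs(m_db - n_db)          # difference of the two degrees, in dB
    return 10.0 ** (-delta / 20.0)    # invert delta = 20*log10(1/x)
```

For example, a suppression degree of 20 dB with no enhancement yields a floor of 0.1; frequency points held at the floor are attenuated by 20 dB but not removed.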
Specifically, the electronic device may first calculate a first gain using the first audio signal, the second audio signal and the third audio signal. The first gain consists of N gain values (N is the same as the number of frequency points in the second output audio signal), and each gain value is either equal (or close) to 1 or equal (or close) to 0. When a gain value is equal or close to 1, the sound signal corresponding to the frequency point is retained; when it is equal or close to 0, the sound signal corresponding to the frequency point is suppressed and removed. The process of generating the first gain by the electronic device may refer to the description of formula (18) below.
Then, the electronic device adjusts N gains of the first gains to obtain a first target gain. The process may refer to the following description of equation (19).
The equations involved in the electronic device generating the first target gain are equation (18) and equation (19) below:
In formula (18), Gain(ω) represents the first gain, h_1 represents the first threshold, R is the operator that takes the real part, N represents the number of microphones, i represents the i-th microphone, j represents the j-th microphone, and {i, j} ∈ N indicates that the calculation is performed between pairs of the N microphones. φ_ij(ω) represents the cross-power spectrum, at the frequency point, between the frequency-domain audio signal corresponding to the i-th microphone (e.g., the first audio signal corresponding to the first microphone) and the frequency-domain audio signal corresponding to the j-th microphone; φ_ii(ω) represents the self-power spectrum of the first audio signal at the frequency point; and φ_jj(ω) represents the self-power spectrum of the second output audio signal at the frequency point.
In formula (19), Gain_1(ω) represents the first target gain and Gain_i(ω) represents the i-th gain value of the first gain. The electronic device may use formula (19) to adjust the N gain values of the first gain so that gain values greater than the first threshold remain unchanged, while gain values less than the first threshold are set to the first threshold.
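Since formulas (18) and (19) are not reproduced in this text, the following sketch assumes the standard Zelinski post-filter form: the gain at each frequency point is the ratio of the averaged real cross-power spectra to the averaged self-power spectra, and formula (19) is modeled as flooring each gain value at h_1. Function and variable names are illustrative, not the patent's exact formulas.

```python
import numpy as np

def zelinski_gain(X, h1=0.1):
    """First target gain sketch: Zelinski-style post-filter over N microphone
    spectra, floored at h1. X: complex array of shape (n_mics, n_bins),
    one frequency-domain frame per microphone."""
    n_mics = X.shape[0]
    num = np.zeros(X.shape[1])
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            num += np.real(X[i] * np.conj(X[j]))   # cross-power spectra φ_ij
    num *= 2.0 / (n_mics * (n_mics - 1))           # average over mic pairs
    den = np.mean(np.abs(X) ** 2, axis=0)          # averaged self-power φ_ii
    gain = np.clip(num / np.maximum(den, 1e-12), 0.0, 1.0)
    return np.maximum(gain, h1)                    # formula (19) analogue: floor at h1
```

Identical signals at all microphones (fully coherent sound) yield a gain of 1, so the target sound is retained; incoherent content is driven toward the floor h1, i.e., suppressed but not removed.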
S502, the electronic device obtains a second target gain using the first audio signal, the second audio signal and the third audio signal based on the coherent-to-diffuse power ratio algorithm;
the second target gain is calculated by the coherent-to-diffuse power ratio algorithm; for the specific description, refer to formula (20) and formula (21) below. It is used to filter out the non-target sound signals corresponding to the low-frequency points in the second output audio signal.
The second target gain consists of N target gain values (N is the same as the number of frequency points in the second output audio signal). If any gain value is not greater than the first threshold, that value is set equal to the first threshold. For the specific description of the first threshold, reference may be made to the foregoing related contents, which are not repeated here.
Specifically, the electronic device may first calculate a second gain using the first audio signal, the second audio signal and the third audio signal. The second gain consists of N gain values (N is the same as the number of frequency points in the second output audio signal), and each gain value is either equal (or close) to 1 or equal (or close) to 0. For the detailed description, reference may be made to the foregoing related contents, which are not repeated here.
Then, the electronic device adjusts N gains in the second gains to obtain a second target gain. The process may refer to the following description of equation (21).
The formulas involved in the electronic device generating the second target gain are formula (20) and formula (21) below:
In formula (20), Gain′(ω) represents the second gain and R is the operator that takes the real part; the remaining parameters are as described for formula (18) above. Among the parameters of formula (20) that do not appear in formula (18), π represents the circumference ratio, f represents the frequency corresponding to the frequency point, d represents the distance between the i-th microphone and the j-th microphone, and c represents the speed of sound.
The related description of formula (21) may refer to the description of formula (19) above. Gain_2(ω) represents the second target gain and Gain′_i(ω) represents the i-th gain value of the second gain. The electronic device may use formula (21) to adjust the N gain values of the second gain so that gain values greater than the first threshold remain unchanged, while gain values less than the first threshold are set to the first threshold.
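Formulas (20) and (21) are likewise not reproduced here. A common CDR-based construction, under the assumption that the π, f, d, c parameters form the diffuse-field coherence sin(2πfd/c)/(2πfd/c), is sketched below; the CDR estimator, the gain mapping, and the default parameter values are assumptions, not the patent's exact formulas.

```python
import numpy as np

def cdr_gain(Xi, Xj, freqs, d=0.02, c=343.0, h1=0.1):
    """Second target gain sketch: coherent-to-diffuse power ratio (CDR)
    based gain for a microphone pair, floored at h1. Xi, Xj: complex spectra
    of microphones i and j; freqs: frequency (Hz) of each point; d: mic
    spacing (m); c: speed of sound (m/s)."""
    phi_ij = Xi * np.conj(Xj)
    denom = np.maximum(np.sqrt(np.abs(Xi) ** 2 * np.abs(Xj) ** 2), 1e-12)
    gamma_x = np.real(phi_ij) / denom          # measured coherence (real part)
    gamma_d = np.sinc(2.0 * freqs * d / c)     # diffuse coherence sin(2πfd/c)/(2πfd/c)
    # CDR: how much more coherent the signal is than a diffuse field
    cdr = np.maximum((gamma_d - gamma_x) / np.minimum(gamma_x - 1.0, -1e-6), 0.0)
    gain = cdr / (cdr + 1.0)                   # Wiener-style mapping to [0, 1)
    return np.maximum(gain, h1)                # formula (21) analogue: floor at h1
```

A coherent (directional) sound keeps a gain near 1, while a diffuse field has CDR near 0 and falls to the floor h1, which matches the role of the low-frequency branch of the filter.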
S503, the electronic equipment combines the first target gain and the second target gain according to the frequency of the second output audio signal to generate a third target gain;
the first target gain is used for filtering out non-target sound signals corresponding to high-frequency points in the second output audio signal, and the second target gain is used for filtering out non-target sound signals corresponding to low-frequency points in the second output audio signal.
The electronic device may determine that frequencies greater than a (kHz) are high frequencies and frequencies less than a (kHz) are low frequencies, where a may take a value from 0.5 to 1.5, for example, 1.
The electronic device determines that the first K of the N frequency points corresponding to the second output audio signal are low frequencies and the last N−K frequency points are high frequencies. The electronic device may then obtain the first K gain values from the second target gain (which filters the low-frequency points) and the last N−K gain values from the first target gain (which filters the high-frequency points), and place the K gain values before the N−K gain values to obtain N gain values as the third target gain.
And S504, the electronic equipment suppresses the non-target sound signal in the second output audio signal by using the third target gain to generate a third output audio signal.
The electronic device adjusts the N frequency points corresponding to the second output audio signal using the N gain values of the third target gain, respectively, suppressing the non-target sound source in the second output audio signal. Specifically, the electronic device multiplies the i-th gain value of the third target gain by the i-th frequency point of the second output audio signal; when the i-th gain value is greater than the first threshold, the sound signal at that frequency point is retained, and when the gain value equals the first threshold, the sound signal at that frequency point is suppressed.
The electronic device takes the N frequency points obtained by adjusting the N frequency points corresponding to the second output audio signal as the third output audio signal.
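Steps S503-S504 can be sketched as follows: the third target gain takes its first K values from the gain computed for the low-frequency points and its remaining N−K values from the gain computed for the high-frequency points, and the result scales each frequency point of the second output audio signal. Names are illustrative.

```python
import numpy as np

def combine_and_apply(Y2, gain_for_low, gain_for_high, K):
    """S503: concatenate the first K values of the low-frequency gain with
    the last N-K values of the high-frequency gain into the third target
    gain. S504: multiply each frequency point of the second output audio
    signal Y2 (a complex spectrum of N points) by its gain value."""
    gain3 = np.concatenate([gain_for_low[:K], gain_for_high[K:]])
    return gain3 * Y2   # third output audio signal (frequency domain)
```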
In some embodiments, steps S503-S504 described above are optional, and the electronic device may instead suppress the non-target sound signal in the second output audio signal as follows. The electronic device determines that the first K of the N frequency points corresponding to the second output audio signal are low frequencies, obtains the first K gain values from the second target gain, and adjusts those K frequency points with them. Meanwhile, the electronic device determines that the last N−K frequency points corresponding to the second output audio signal are high frequencies, obtains the last N−K gain values from the first target gain, and adjusts those N−K frequency points with them. The N frequency points obtained by adjusting the N frequency points corresponding to the second output audio signal are then taken as the third output audio signal. The process of adjusting a frequency point may refer to the foregoing description of step S504 and is not repeated here.
And S107, the electronic equipment stores the third output audio signal.
In some embodiments, the electronic device may buffer the third output audio signal into a buffer. This buffer may be the audio stream buffer referred to in fig. 7 above.
Scene 2: an exemplary set of user interfaces used in the audio processing method of the present application in scene 2 may refer to the descriptions of user interfaces 80-83 in fig. 5a-5d above. In the real-time video processing method related to scene 2, after the electronic device starts recording video, it can process the acquired current frame image in real time and perform audio zooming on the acquired current frame input audio signal set according to the change of the zoom magnification. Each time one frame of image and one frame of input audio signal set have been processed, they are played.
It should be understood that, since audio processing takes a certain amount of time, when the electronic device plays the processed current frame image, the audio being played may not be the processed current frame input audio signal set but the processed input audio signal set from N frames earlier. N is a positive integer greater than or equal to 1, and its specific value may be determined by factors such as the processing speed of the electronic device. However, the gap of these N audio frames is not noticeable to the user.
Fig. 17 shows a schematic diagram of processing the current frame image and the current frame input audio signal set in real time in scene 2 and then playing the processed current frame image and current frame input audio signal set.
The process of processing the acquired current frame image and the current frame input audio signal set by the electronic device may refer to the description in fig. 7, and is not described herein again.
As shown in fig. 17, when the electronic device finishes processing the first frame image, it may play the processed first frame image for preview. At this point, the electronic device has not yet finished processing the first frame input audio signal set. The electronic device then processes the second frame image signal and plays the processed second frame image; meanwhile, it processes the first frame input audio signal set and plays the processed first frame input audio signal set for preview. Thus, when the electronic device plays the processed current frame image, it plays the processed previous frame input audio signal set. For example, the electronic device processes the third frame image signal and plays the processed third frame image, while it processes the second frame input audio signal set and plays the processed second frame input audio signal set for preview. By analogy, the electronic device processes the N-th frame image signal and plays the processed N-th frame image, while it processes the (N−1)-th frame input audio signal set and plays the processed (N−1)-th frame input audio signal set for preview.
In some embodiments, the time for playing one frame of image is 30 ms and the time for playing one frame of audio is 10 ms; thus, in fig. 17, while the electronic device plays one frame of image, the input audio signal set includes 3 frames of audio.
The process of the processing procedure of the electronic device for any frame of input audio signal set related to the scene 2 is similar to the real-time video processing procedure related to the scene 1, and reference may be made to the foregoing description of steps S101 to S107, which is not repeated herein.
Scene 3: an exemplary set of user interfaces in scenario 3 when using the audio processing method of the present application may refer to the descriptions of user interfaces 90-95 in fig. 6 a-6 f, previously described. The electronic device can perform post-processing on the audio signal by using the audio processing method.
When any microphone of the electronic device collects the current frame input audio signal, the zoom magnification used for collecting the frame input audio signal can be stored, and one frame input audio signal corresponds to one zoom magnification. The electronic device collects N frames of input audio signals, namely N zoom magnifications can be obtained. Meanwhile, the electronic equipment can respectively store the N frames of input audio signals collected by any microphone to obtain input audio streams, and if the electronic equipment is provided with M microphones, M input audio streams can be obtained.
The electronic device may obtain the M input audio streams, and sequentially obtain N input audio signals of the M input audio streams from a first frame of input audio signals of the M input audio streams, for example, first obtain M first frame of input audio signals, then obtain M second frame of input audio signals, and so on. For the M ith frame input audio signals in the M input audio streams, the electronic device may perform audio zooming on the input audio signals using the methods of steps S101 to S107 described above.
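The frame-by-frame traversal of the M input audio streams can be sketched as follows; `audio_zoom` stands in for the S101-S107 pipeline and is a placeholder, not the patent's implementation.

```python
def post_process_streams(streams, zoom_ratios, audio_zoom):
    """Scene 3 post-processing sketch.
    streams: list of M input audio streams, each a list of N frames;
    zoom_ratios: the first zoom magnification stored with each frame index;
    audio_zoom: callable applying steps S101-S107 to one frame set."""
    n_frames = len(streams[0])
    output = []
    for i in range(n_frames):
        # gather the M i-th frame input audio signals across all streams
        frame_set = [stream[i] for stream in streams]
        output.append(audio_zoom(frame_set, zoom_ratios[i]))
    return output
```

Each frame set is zoomed with the zoom magnification that was recorded when that frame was captured, mirroring the one-ratio-per-frame bookkeeping described above.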
Fig. 18 is an exemplary flowchart of an electronic device performing post-processing on an i-th frame audio signal by using the audio processing method according to the present application.
The process may refer to the following description of step S601 to step S609.
S601, the electronic equipment acquires a first input audio stream, a second input audio stream and a third input audio stream, wherein any input audio stream comprises multi-frame input audio signals;
one exemplary user interface for the electronic device to obtain the first input audio stream, the second input audio stream, and the third input audio stream may be as shown in user interface 92 of fig. 6 c.
The first input audio stream refers to a set of N frames of input audio signals captured by a first microphone of the electronic device.
The second input audio stream refers to a set of N frames of input audio signals captured by a second microphone of the electronic device.
The third input audio stream refers to a set of N frames of input audio signals captured by a third microphone of the electronic device.
S602, the electronic equipment acquires zoom information, wherein the zoom information comprises a plurality of first zoom magnifications;
the zoom information may include N first zoom magnifications. The ith first zooming magnification corresponds to an ith frame input audio signal collected by a microphone of the electronic equipment.
S603, the electronic device determines a first input audio signal from the first input audio stream, a second input audio signal from the second input audio stream, and a third input audio signal from the third input audio stream;
the first input audio signal is the frame of input audio signal with the earliest acquisition time in all the input audio signals of the first audio stream which are not subjected to audio zooming currently.
The second input audio signal is the frame of input audio signal with the earliest acquisition time in all the input audio signals which are not subjected to audio zooming currently in the second audio stream.
The third input audio signal is the frame of input audio signal with the earliest acquisition time in all the input audio signals of the third audio stream which are not subjected to audio zooming currently.
S604, the electronic equipment acquires a first zoom magnification corresponding to the acquisition of the first input audio signal, the second input audio signal and the third input audio signal from the zoom information;
the first zoom magnification is the first of the zoom magnifications in the zoom information that the electronic device has not yet acquired.
S605, the electronic device converts the first input audio signal, the second input audio signal and the third input audio signal into the frequency domain to obtain a first audio signal, a second audio signal and a third audio signal;
the process is the same as the description of step S102, and reference may be made to the description of step S102, which is not repeated herein.
S606, the electronic equipment generates a first output audio signal by using the first audio signal, the second audio signal and the third audio signal according to the first zoom ratio, wherein a target sound signal in the first output audio signal is unchanged, and a non-target sound signal is suppressed;
the process is the same as the description of step S104, and reference may be made to the description of step S104, which is not repeated herein.
S607, the electronic equipment enhances or suppresses the first output audio signal according to the first zoom ratio to obtain a second output audio signal, wherein both a target sound signal and a non-target sound signal in the second output audio signal are enhanced or suppressed compared with the first output audio signal;
the process is the same as the description of step S105, and reference may be made to the description of step S105, which is not repeated herein.
And S608, the electronic device combines the first audio signal, the second audio signal and the third audio signal according to the first zoom ratio to suppress a non-target sound signal in the second output audio signal, so as to generate a third output audio signal.
The process is the same as the description of step S106, and reference may be made to the description of step S106, which is not repeated herein.
And S609, the electronic equipment stores the third output audio signal.
The user interface involved in this step S609 may be as shown in the user interface 94 shown in fig. 6 e.
The process is the same as the description of step S107, and reference may be made to the description of step S107, which is not repeated herein.
In the embodiments of the present application, any audio signal may be referred to as audio or sound.
The audio signal collected by the electronic device may be referred to as a first audio, and the first audio may include audio signals collected by the electronic device, such as a first input audio signal, a second input audio signal, and a third input audio signal.
The third output audio signal may also be referred to as second audio.
The zoom magnification control may be referred to as a second control.
The first zoom magnification may be referred to as a second zoom magnification.
The video processing method in the embodiments of the present application is suitable for processing acquired input audio signals in real time while the electronic device records video, for example, the processes related to scene 1 and scene 2. It may also be used for post-processing of an audio stream, as in scene 3. Of course, the method is not limited to the aforementioned scenes 1-3. When the video processing method of the present application is implemented in these scenes, as the zoom magnification in the video played by the electronic device increases, the sound of the subject within the field angle is enhanced and the sound of the subject outside the field angle is suppressed; as the zoom magnification decreases and the field angle increases, the sound of the subject within the field angle is reduced and the sound of the subject outside the field angle is suppressed.
An exemplary electronic device provided by embodiments of the present application is described below.
Fig. 19 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
The following describes an embodiment specifically by taking an electronic device as an example. It should be understood that an electronic device may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The electronic device may include: the mobile terminal includes a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not limit the electronic device. In other embodiments of the present application, an electronic device may include more or fewer components than illustrated, or some components may be combined, or some components may be split, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The controller can be a neural center and a command center of the electronic device. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the electronic device. In other embodiments of the present application, the electronic device may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The electronic device implements the display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like.
The electronic device may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used for performing fourier transform and the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The electronic device may support one or more video codecs. In this way, the electronic device can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent cognition of electronic equipment, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area.
The electronic device may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into analog audio signals for output, and also used to convert analog audio inputs into digital audio signals. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into a sound signal. The electronic device can listen to music through the speaker 170A or listen to a hands-free call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into a sound signal. When the electronic device answers a call or voice information, it can answer the voice by placing the receiver 170B close to the ear of the person.
The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal to the microphone 170C by speaking with the mouth close to the microphone 170C. The electronic device may be provided with at least one microphone 170C. In other embodiments, the electronic device may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, perform directional recording, and the like.
The earphone interface 170D is used to connect a wired earphone.
The gyro sensor 180B may be used to determine the motion pose of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (i.e., x, y, and z axes) may be determined by the gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen".
In this embodiment, the processor 110 may call a computer instruction stored in the internal memory 121 to enable the electronic device to execute the audio processing method in this embodiment.
In this embodiment of the present application, the internal memory 121 of the electronic device, or a storage device externally connected via the external memory interface 120, may store related instructions of the video processing method according to the embodiment of the present application, so that the electronic device executes the video processing method according to the embodiment of the present application.
The workflow of the electronic device is exemplarily described below in connection with steps 1 to 7 (S101 to S107) and the hardware structure of the electronic device.
1. The electronic equipment acquires a first input audio signal, a second input audio signal and a third input audio signal;
In some embodiments, the touch sensor 180K of the electronic device receives a touch operation (triggered by the user touching the shooting control), and a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation) and stores it at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event.
For example, the touch operation is a single-click touch operation, and the control corresponding to the single-click operation is the shooting control in the camera application. The camera application calls an interface of the application framework layer and then starts the microphone driver through the kernel layer, collecting the first input audio signal through the first microphone, the second input audio signal through the second microphone, and the third input audio signal through the third microphone.
Specifically, the microphones 170C (the first microphone, the second microphone, and the third microphone) of the electronic device convert the collected sound signals into analog electrical signals, which are then converted into audio signals in the time domain. A time-domain audio signal is a digital audio signal, stored as binary data, which the processor of the electronic device can process. Here, the audio signals are the first input audio signal, the second input audio signal, and the third input audio signal.
The electronic device may store the first input audio signal, the second input audio signal, and the third input audio signal in the internal memory 121 or a storage device externally connected to the storage interface 120.
2. The electronic equipment converts the first input audio signal, the second input audio signal, and the third input audio signal to the frequency domain to obtain a first audio signal, a second audio signal, and a third audio signal;
The digital signal processor of the electronic device obtains the first input audio signal, the second input audio signal, and the third input audio signal from the internal memory 121 or a storage device externally connected through the storage interface 120, and converts them from the time domain to the frequency domain by discrete Fourier transform (DFT) to obtain the first audio signal, the second audio signal, and the third audio signal.
The electronic device may store the first audio signal, the second audio signal, and the third audio signal in the internal memory 121 or a storage device externally connected to the storage interface 120.
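The time-to-frequency-domain conversion in this step can be sketched as follows. This is a minimal illustration only: the frame length, hop size, sample rate, and the 440 Hz stand-in signal are assumptions for the example, not values stated in the patent.

```python
import numpy as np

def to_frequency_domain(signal, frame_len=512, hop=256):
    """Split a time-domain digital audio signal into overlapping windowed
    frames and apply a DFT to each frame (frame/hop sizes are assumed)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft: real input -> frame_len // 2 + 1 complex frequency bins per frame
    return np.fft.rfft(frames, axis=1)

# Each microphone's input audio signal is converted independently:
fs = 16000                                   # assumed sample rate
t = np.arange(fs) / fs
first_input = np.sin(2 * np.pi * 440 * t)    # stand-in for first microphone data
first_audio = to_frequency_domain(first_input)
```

The second and third input audio signals would be converted the same way, yielding the second and third (frequency-domain) audio signals.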
3. The electronic equipment acquires a first zoom magnification;
In some embodiments, the touch sensor 180K of the electronic device receives a touch operation (triggered when the user touches the zoom magnification control), and a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation) and stores it at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event.
For example, the touch operation is a slide operation, and the control corresponding to the slide operation is the zoom magnification control in the camera application. The camera application calls an interface of the application framework layer to acquire the parameter corresponding to the zoom magnification control, namely the first zoom magnification.
4. The electronic equipment generates a first output audio signal by using the first audio signal, the second audio signal, and the third audio signal according to the first zoom magnification;
The electronic device may obtain, through the processor 110, the first audio signal, the second audio signal, and the third audio signal stored in the internal memory 121 or a storage device externally connected through the storage interface 120. The processor 110 of the electronic device invokes the associated computer instructions to generate the first output audio signal based on the first audio signal, the second audio signal, and the third audio signal.
5. The electronic equipment enhances or suppresses the first output audio signal according to the first zoom magnification to obtain a second output audio signal;
the processor 110 of the electronic device invokes a relevant computer instruction to enhance or suppress the first output audio signal according to the first zoom magnification, so as to obtain a second output audio signal.
6. The electronic equipment suppresses the non-target sound signal in the second output audio signal to generate a third output audio signal;
the processor 110 of the electronic device invokes the associated computer instructions to suppress the non-target sound signal in the second output audio signal to generate a third output audio signal.
7. The electronic equipment stores the third output audio signal;
The electronic device buffers the third output audio signal in a buffer.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
As used in the above embodiments, the term "when…" may be interpreted to mean "if…", "after…", "in response to determining…", or "in response to detecting…", depending on the context. Similarly, the phrase "when it is determined that…" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined that…", "in response to determining…", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)", depending on the context.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk), among others.
Those skilled in the art can understand that all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer readable storage medium and can include the processes of the method embodiments described above when executed. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Claims (16)
1. A video processing method applied to an electronic device, the method comprising:
the electronic equipment starts a camera;
displaying a preview interface, wherein the preview interface comprises a first control;
detecting a first operation on the first control;
in response to the first operation, starting shooting;
displaying a shooting interface, wherein the shooting interface comprises a second control, and the second control is used for adjusting the zoom magnification;
at a first moment, displaying a first shot image with a zoom magnification of a first zoom magnification, wherein the first shot image comprises a first target object and a second target object;
detecting a second operation on the second control;
in response to the second operation, the zoom magnification is adjusted to a second zoom magnification, the second zoom magnification being greater than the first zoom magnification;
displaying a second photographed image at a second time, the second photographed image including the first target object and not including the second target object;
at the second moment, the microphone collects a first audio, wherein the first audio comprises a first sound and a second sound, the first sound corresponds to the first target object, and the second sound corresponds to the second target object;
detecting a third operation on a third control;
stopping shooting in response to the third operation, saving the first video, wherein,
the first video includes the first captured image and, at the second time, the second captured image and a second audio, the second audio being obtained by processing the first audio according to the second zoom magnification, wherein the second audio includes a third sound and a fourth sound, the third sound corresponds to the first target object, the fourth sound corresponds to the second target object, the third sound is enhanced with respect to the first sound, and the fourth sound is suppressed with respect to the second sound.
2. The method of claim 1, further comprising:
the electronic equipment processes the first audio according to the second zoom ratio to obtain a first output audio, wherein the first sound in the first output audio is unchanged, and the second sound is suppressed;
enhancing the first output audio according to the second zoom magnification to obtain a second output audio; the first sound and the second sound in the second output audio are both enhanced;
and according to the second zoom magnification, combining the first audio to suppress a second sound in the second output audio to obtain a second audio.
3. The method of claim 2, wherein the first output audio comprises a single audio;
according to the second zoom magnification, processing the first audio to obtain a first output audio, specifically including:
the electronic equipment acquires a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction within the range from 10 degrees clockwise to 70 degrees clockwise of the direction directly in front of the electronic device; the second direction is any direction within the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction within the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device;
obtaining a first beam corresponding to the first direction by applying the first filter coefficient to the first audio; obtaining a second beam corresponding to the second direction by applying the second filter coefficient to the first audio; and obtaining a third beam corresponding to the third direction by applying the third filter coefficient to the first audio;
and the electronic equipment obtains the first output audio by using the first beam, the second beam and the third beam according to the second zoom magnification, wherein the first sound in the first output audio is unchanged, and the second sound is suppressed.
4. The method of claim 2 or 3, wherein the second output audio comprises a single audio;
according to the second zoom magnification, enhancing the first output audio to obtain a second output audio, specifically comprising:
determining an adjustment parameter corresponding to the second zoom magnification according to the second zoom magnification, wherein the adjustment parameter is used for enhancing the audio;
converting the first output audio from a frequency domain to a time domain to obtain a first output audio in the time domain;
enhancing the first output audio in the time domain by using the adjustment parameter to obtain a second output audio in the time domain;
converting the second output audio in the time domain to the frequency domain as the second output audio.
5. The method of any of claims 2-4, wherein the second audio comprises a single audio;
according to the second zoom magnification, in combination with the first audio, suppressing a second sound in the second output audio to obtain a second audio, specifically including:
performing Zelinski filtering by using the first audio to obtain a first target gain, the first target gain being used to filter out sounds corresponding to high-frequency points in the second output audio;
obtaining a second target gain based on a coherent-to-diffuse power ratio algorithm by using the first audio, the second target gain being used to filter out sounds corresponding to low-frequency points in the second output audio;
combining, according to the frequency of the second output audio, a part of the first target gain and a part of the second target gain to obtain a third target gain;
and utilizing the third target gain to suppress a second sound in the second output audio to obtain a second audio.
6. The method according to any one of claims 2-4, wherein the second audio comprises a single audio;
according to the second zoom magnification, in combination with the first audio, suppressing a second sound in the second output audio to obtain a second audio, specifically including:
performing Zelinski filtering by using the first audio to obtain a first target gain, the first target gain being used to filter out sounds corresponding to high-frequency points in the second output audio;
the electronic equipment obtains a second target gain based on a coherent-to-diffuse power ratio algorithm by using the first audio, the second target gain being used to filter out sounds corresponding to low-frequency points in the second output audio;
and the electronic equipment suppresses the sounds corresponding to high-frequency points in the second output audio by using the first target gain, and suppresses the sounds corresponding to low-frequency points in the second output audio by using the second target gain, to obtain the second audio.
7. The method of claim 2, wherein the first output audio comprises left channel audio and right channel audio;
according to the second zoom magnification, processing the first audio to obtain a first output audio specifically includes: acquiring a first filter coefficient corresponding to a first direction, a second filter coefficient corresponding to a second direction, and a third filter coefficient corresponding to a third direction; the first direction is any direction within the range from 10 degrees clockwise to 70 degrees clockwise of the direction directly in front of the electronic device; the second direction is any direction within the range from 10 degrees counterclockwise to 10 degrees clockwise of the direction directly in front of the electronic device; the third direction is any direction within the range from 10 degrees counterclockwise to 70 degrees counterclockwise of the direction directly in front of the electronic device;
obtaining a first beam corresponding to the first direction by applying the first filter coefficient to the first audio; obtaining a second beam corresponding to the second direction by applying the second filter coefficient to the first audio; obtaining a third beam corresponding to the third direction by applying the third filter coefficient to the first audio;
obtaining the left channel audio of the first output audio by using the first beam and the second beam according to the second zoom magnification; and obtaining the right channel audio of the first output audio by using the second beam and the third beam, wherein the first sound in the left channel audio and the right channel audio is unchanged, and the second sound is suppressed.
8. The method of claim 6 or 7, wherein the second output audio comprises left channel audio and right channel audio;
according to the second zoom magnification, enhancing the first output audio to obtain a second output audio, specifically comprising:
determining an adjustment parameter corresponding to the second zooming magnification according to the second zooming magnification, wherein the adjustment parameter is used for enhancing the audio;
converting the left channel audio and the right channel audio in the first output audio from the frequency domain to the time domain respectively to obtain the left channel audio and the right channel audio in the first output audio in the time domain;
respectively enhancing the left channel audio and the right channel audio in the first output audio in the time domain by using the adjusting parameters to obtain a left channel audio and a right channel audio in the second output audio in the time domain;
and converting the left channel audio and the right channel audio in the second output audio in the time domain to the frequency domain to be used as the second output audio.
9. The method of any of claims 6-8, wherein the second audio comprises left channel audio and right channel audio;
according to the second zoom magnification, in combination with the first audio, suppressing a second sound in the second output audio to obtain a second audio, specifically including:
performing Zelinski filtering by using the first audio to obtain a first target gain, the first target gain being used to filter out sounds corresponding to high-frequency points in the second output audio;
obtaining a second target gain based on a coherent-to-diffuse power ratio algorithm by using the first audio, the second target gain being used to filter out sounds corresponding to low-frequency points in the second output audio;
according to the frequency of a second output audio, combining a part of the first target gain and a part of the second target gain to obtain a third target gain;
and utilizing the third target gain to respectively suppress second sound in the left channel audio and the right channel audio in the second output audio to obtain the left channel audio and the right channel audio in the second audio.
10. The method of claim 4 or 5, wherein the second audio comprises left channel audio and right channel audio;
according to the second zoom magnification, in combination with the first audio, suppressing the second sound in the second output audio to obtain the second audio specifically includes:
performing Zelinski filtering by using the first audio to obtain a first target gain, the first target gain being used to filter out sounds corresponding to high-frequency points in the second output audio;
obtaining a second target gain based on a coherent-to-diffuse power ratio algorithm by using the first audio, the second target gain being used to filter out sounds corresponding to low-frequency points in the second output audio;
and suppressing the sounds corresponding to high-frequency points in the left channel audio and the right channel audio of the second output audio by using the first target gain, and suppressing the sounds corresponding to low-frequency points by using the second target gain, to obtain the left channel audio and the right channel audio of the second audio.
11. The method according to any one of claims 3-10, wherein the first filter coefficient, the second filter coefficient, and the third filter coefficient are preset in the electronic device; in the first filter coefficient, the coefficient corresponding to a sound signal in the first direction is 1, indicating that the sound signal in the first direction is not suppressed, and the farther a sound signal's direction is from the first direction, the farther its coefficient is from 1 and the greater the degree of suppression; in the second filter coefficient, the coefficient corresponding to a sound signal in the second direction is 1, indicating that the sound signal in the second direction is not suppressed, and the farther a sound signal's direction is from the second direction, the farther its coefficient is from 1 and the greater the degree of suppression; in the third filter coefficient, the coefficient corresponding to a sound signal in the third direction is 1, indicating that the sound signal in the third direction is not suppressed, and the farther a sound signal's direction is from the third direction, the farther its coefficient is from 1 and the greater the degree of suppression.
12. The method according to any one of claims 4-11, wherein the adjustment parameter corresponding to the second zoom magnification is preset in the electronic device; the value of the adjustment parameter is positively correlated with the zoom magnification, and any second zoom magnification corresponds to exactly one adjustment parameter.
13. An electronic device, comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-12.
14. A chip system for application to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform the method of any of claims 1-12.
15. A computer program product comprising instructions for causing an electronic device to perform the method according to any of claims 1-12 when the computer program product is run on the electronic device.
16. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110927102.8A CN115942108A (en) | 2021-08-12 | 2021-08-12 | Video processing method and electronic equipment |
PCT/CN2022/094536 WO2023016032A1 (en) | 2021-08-12 | 2022-05-23 | Video processing method and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110927102.8A CN115942108A (en) | 2021-08-12 | 2021-08-12 | Video processing method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115942108A true CN115942108A (en) | 2023-04-07 |
Family
ID=85199812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110927102.8A Pending CN115942108A (en) | 2021-08-12 | 2021-08-12 | Video processing method and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115942108A (en) |
WO (1) | WO2023016032A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116347318B (en) * | 2023-05-29 | 2023-08-22 | 深圳东原电子有限公司 | Intelligent production test method and system for sound equipment |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007026827A1 (en) * | 2005-09-02 | 2007-03-08 | Japan Advanced Institute Of Science And Technology | Post filter for microphone array |
CN105869651B (en) * | 2016-03-23 | 2019-05-31 | 北京大学深圳研究生院 | Binary channels Wave beam forming sound enhancement method based on noise mixing coherence |
CN106157986B (en) * | 2016-03-29 | 2020-05-26 | 联想(北京)有限公司 | Information processing method and device and electronic equipment |
CN107197187A (en) * | 2017-05-27 | 2017-09-22 | 维沃移动通信有限公司 | The image pickup method and mobile terminal of a kind of video |
CN114727193A (en) * | 2018-09-03 | 2022-07-08 | 斯纳普公司 | Acoustic zoom |
WO2020103035A1 (en) * | 2018-11-21 | 2020-05-28 | 深圳市欢太科技有限公司 | Audio processing method and apparatus, and storage medium and electronic device |
CN113132863B (en) * | 2020-01-16 | 2022-05-24 | 华为技术有限公司 | Stereo pickup method, apparatus, terminal device, and computer-readable storage medium |
JP6739064B1 (en) * | 2020-01-20 | 2020-08-12 | パナソニックIpマネジメント株式会社 | Imaging device |
CN113365012A (en) * | 2020-03-06 | 2021-09-07 | 华为技术有限公司 | Audio processing method and device |
CN113365013A (en) * | 2020-03-06 | 2021-09-07 | 华为技术有限公司 | Audio processing method and device |
CN114125258B (en) * | 2020-08-26 | 2023-04-18 | 华为技术有限公司 | Video processing method and electronic equipment |
CN112866453B (en) * | 2021-01-18 | 2023-02-10 | Oppo广东移动通信有限公司 | Electronic equipment and method and device for video and audio recording compatible with electronic equipment |
CN114363512B (en) * | 2021-09-30 | 2023-10-24 | 北京荣耀终端有限公司 | Video processing method and related electronic equipment |
2021
- 2021-08-12 CN CN202110927102.8A patent/CN115942108A/en active Pending
2022
- 2022-05-23 WO PCT/CN2022/094536 patent/WO2023016032A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2023016032A1 (en) | 2023-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP4054177B1 (en) | Audio processing method and device | |
EP4099688A1 (en) | Audio processing method and device | |
CN110970057B (en) | Sound processing method, device and equipment | |
CN114363512B (en) | Video processing method and related electronic equipment | |
CN108156561B (en) | Audio signal processing method and device and terminal | |
JP2023540908A (en) | Audio processing methods and electronic devices | |
EP4138381A1 (en) | Method and device for video playback | |
EP4044578A1 (en) | Audio processing method and electronic device | |
CN114697812A (en) | Sound collection method, electronic equipment and system | |
CN113810589A (en) | Electronic device, video shooting method and medium thereof | |
WO2022262416A1 (en) | Audio processing method and electronic device | |
WO2023016032A1 (en) | Video processing method and electronic device | |
CN114422935A (en) | Audio processing method, terminal and computer readable storage medium | |
WO2023231787A1 (en) | Audio processing method and apparatus | |
CN116055869B (en) | Video processing method and terminal | |
EP4280211A1 (en) | Sound signal processing method and electronic device | |
KR20220036210A (en) | Device and method for enhancing the sound quality of video | |
CN111508513A (en) | Audio processing method and device and computer storage medium | |
CN115956270A (en) | Audio processing method, device, equipment and storage medium | |
WO2022232458A1 (en) | Context aware soundscape control | |
CN117044233A (en) | Context aware soundscape control | |
CN116634319A (en) | Audio processing method, device, electronic equipment and storage medium | |
CN115225840A (en) | Video recording method and electronic equipment | |
CN116781817A (en) | Binaural sound pickup method and device | |
CN117596538A (en) | Audio playing method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||