CN117880696A

CN117880696A - Sound mixing method, device, computer equipment and storage medium

Info

Publication number: CN117880696A
Application number: CN202211245868.9A
Authority: CN
Inventors: 陈明良
Original assignee: Guangzhou Kaidelian Software Technology Co ltd
Current assignee: Guangzhou Kaidelian Software Technology Co ltd
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2024-04-12

Abstract

The application relates to a sound mixing method, a device, a computer device and a storage medium, wherein the method comprises the following steps: acquiring a first audio signal of a near-field microphone and a second audio signal of a far-field microphone; performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal, wherein the third audio signal is the signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is the signal obtained by performing first time delay alignment on the second audio signal; detecting a first volume of the third audio signal and a second volume of the fourth audio signal; and comparing the first volume and the second volume with a preset threshold value, and performing audio mixing processing on the third audio signal and the fourth audio signal according to an audio mixing method corresponding to the comparison result to obtain an audio mixing result, thereby improving the audio mixing quality of the sound of a teacher and a student.

Description

Sound mixing method, device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of audio processing technologies, and in particular, to a method and apparatus for mixing audio, a computer device, and a storage medium.

Background

In a remote teaching scene, a recording and broadcasting host, a near-field microphone and a far-field microphone are arranged in a classroom. The near-field microphone collects the sound of the teacher, and the far-field microphone collects the sound of the students. The recording and broadcasting host receives the sound of the teacher collected by the near-field microphone and the sound of the students collected by the far-field microphone and mixes them, so that the sound of the teacher and the sound of the students are accurately mixed together into a sound mixing result. The sound mixing result is shared to a network teaching platform, so that teaching resources are shared.

Because sound attenuates as it propagates, the signal-to-noise ratio of the teacher's sound collected by the far-field microphone is low. Consequently, when the teacher's sound collected by the far-field microphone is mixed with the teacher's sound collected by the near-field microphone, the quality of the teacher's sound is degraded.

Disclosure of Invention

Accordingly, an object of the present application is to provide a mixing method, apparatus, computer device, and storage medium, which can improve the mixing quality of sound of a teacher and a student.

According to a first aspect of embodiments of the present application, there is provided a mixing method, including the steps of:

acquiring a first audio signal of a near-field microphone and a second audio signal of a far-field microphone;

performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal;

detecting a first volume of the third audio signal and a second volume of the fourth audio signal;

comparing the first volume and the second volume with a preset threshold value, and performing audio mixing processing on the third audio signal and the fourth audio signal according to an audio mixing method corresponding to the comparison result to obtain an audio mixing result.

According to a second aspect of embodiments of the present application, there is provided a mixing device, including:

the signal acquisition module is used for acquiring a first audio signal of the near-field microphone and a second audio signal of the far-field microphone;

the signal alignment module is used for performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal;

the volume detection module is used for detecting the first volume of the third audio signal and the second volume of the fourth audio signal;

and the signal mixing module is used for comparing the first volume and the second volume with a preset threshold value, and carrying out mixing processing on the third audio signal and the fourth audio signal according to a mixing method corresponding to the comparison result to obtain a mixing result.

According to a third aspect of embodiments of the present application, there is provided a computer device comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the mixing method according to any of the preceding claims.

According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a mixing method as described in any of the above.

The method comprises the steps of obtaining a first audio signal of a near-field microphone and a second audio signal of a far-field microphone; performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal; detecting a first volume of the third audio signal and a second volume of the fourth audio signal; comparing the first volume and the second volume with a preset threshold value, and performing audio mixing processing on the third audio signal and the fourth audio signal according to the audio mixing method corresponding to the comparison result to obtain an audio mixing result, so that the corresponding audio mixing method is determined according to the first volume of the third audio signal and the second volume of the fourth audio signal, and the audio mixing quality of the teacher and the students is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

For a better understanding and implementation, the present invention is described in detail below with reference to the drawings.

Drawings

Fig. 1 is an application scenario schematic diagram of a mixing method according to an embodiment of the present application;

Fig. 2 is a flow chart of a mixing method according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of step S40 in the audio mixing method according to an embodiment of the present application;

Fig. 4 is a schematic flow chart of step S42 in the audio mixing method according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of step S43 in the audio mixing method according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of step S10 in the audio mixing method according to an embodiment of the present application;

Fig. 7 is a block diagram of a sound mixing device according to an embodiment of the present application;

Fig. 8 is a schematic block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.

It should be understood that the described embodiments are merely some, but not all, of the embodiments of the present application. All other embodiments, based on the embodiments herein, which would be apparent to one of ordinary skill in the art without making any inventive effort, are intended to be within the scope of the present application.

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims. In the description of this application, it should be understood that the terms "first," "second," "third," and the like are used merely to distinguish between similar objects and are not necessarily used to describe a particular order or sequence, nor should they be construed to indicate or imply relative importance. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.

Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The audio mixing method in the embodiment of the application can be applied to a remote teaching scenario, and can also be applied to meetings, reports, lectures and other scenarios. In the embodiment of the application, the scheme is described by taking the remote teaching scenario as an example.

As shown in fig. 1, an application scenario of the audio mixing method in the embodiment of the present application includes a near-field microphone 100, a plurality of far-field microphones 110, and a recording and broadcasting host 120. The near-field microphone 100 is a microphone carried by a teacher, and may be worn or held by the teacher, for collecting the teacher's teaching voice. The far-field microphones 110 may be installed at front and rear positions of a student area, for example, on the front and rear walls of a classroom, to collect the sounds of front-row students and rear-row students, respectively. The recording and broadcasting host 120 can be arranged at the rear of the classroom, and the near-field microphone 100 and the far-field microphones 110 are connected with the recording and broadcasting host 120 by wire or wirelessly. The recording and broadcasting host 120 receives the audio signals collected by the near-field microphone 100 and the far-field microphones 110, and mixes the teacher sound collected by the far-field microphones 110 with the teacher sound collected by the near-field microphone 100, so that the sound of the teacher and the sound of the students can be accurately mixed together to obtain a mixing result. The mixing result is shared to a network teaching platform, so that teaching resources are shared.

Because sound attenuates as it propagates, the signal-to-noise ratio of the teacher's teaching sound collected by the far-field microphone 110 is lower than that collected by the near-field microphone 100. Therefore, when the teacher's teaching sound collected by the far-field microphone 110 is mixed with that collected by the near-field microphone 100, the quality of the teacher's teaching sound is reduced, which degrades the remote teaching quality and affects the teaching experience.

Example 1

Please refer to fig. 2, which is a flowchart of a method for mixing audio according to an embodiment of the present application. The audio mixing method provided by the embodiment of the application comprises the following steps:

S10: a first audio signal of the near-field microphone and a second audio signal of the far-field microphone are acquired.

The near-field microphone may be a microphone carried by the teacher, specifically, the near-field microphone may be a microphone worn by the teacher or a microphone held by the teacher, and the near-field microphone may also be a microphone disposed near the teacher.

The near-field microphone is used for collecting the teaching sound of the teacher, so the first audio signal collected by the near-field microphone generally contains only the audio signal of the teacher. When students answer questions or read aloud, the near-field microphone can also pick up the students' sound, and the first audio signal then also includes the audio signal of the students.

The far-field microphone can be a microphone installed on a wall or a ceiling in the teaching room, and can also be a microphone built in the teaching recording and playing equipment.

The far-field microphone is used for collecting the sound of the students, which may specifically include the sound of students answering questions or reading aloud in unison, so the second audio signal of the far-field microphone is generally the audio signal of the students. When the teacher speaks, the teacher's sound can also be picked up by the far-field microphone, and the second audio signal then also includes the audio signal of the teacher.

It can be seen that the first audio signal of the near-field microphone and the second audio signal of the far-field microphone may be signals collected from the same sound source; for example, when the teacher gives a lesson or the students read aloud, both the near-field microphone and the far-field microphone collect the corresponding audio signal. The first audio signal and the second audio signal may also be signals collected from different sound sources; for example, when a student interjects while the teacher is speaking, both the near-field microphone and the far-field microphone collect an audio signal in which the teacher and the student are mixed. Therefore, during mixing it is necessary to improve the audio quality when the sound source is the same and to preserve the sounds of different sound sources.

S20: performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal.

Because the performance parameters of the microphones differ (for example, the performance parameters include the directivity of a microphone, and microphones respond differently to sounds from different directions), there is an inherent delay difference between the near-field microphone and the far-field microphone when they collect the same audio signal. If the inherent delay difference is not eliminated during mixing, the mixed audio signal is out of sync, which reduces the quality of the mixed audio. To eliminate the inherent delay difference between the near-field microphone and the far-field microphone, first delay alignment is performed on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal.

In the embodiment of the present application, the delay value between the first audio signal and the second audio signal may be obtained from the start time point of the first audio signal and the start time point of the second audio signal. If the start time point of the first audio signal is earlier than that of the second audio signal, zeros can be prepended to the first audio signal for a time period corresponding to the delay value, thereby performing first time delay alignment on the first audio signal and the second audio signal and obtaining the third audio signal and the fourth audio signal. If the start time point of the first audio signal is later than that of the second audio signal, zeros can be prepended to the second audio signal for a time period corresponding to the delay value, thereby performing first time delay alignment and obtaining the third audio signal and the fourth audio signal. After alignment, the first audio signal and the second audio signal correspond one-to-one in time, so that delay alignment is achieved.
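The zero-padding alignment described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `align_by_zero_padding`, the use of sample indices for the start time points, and the tail padding that equalizes the two lengths are all assumptions made for the example.

```python
def align_by_zero_padding(first, second, first_onset, second_onset):
    """Delay whichever signal starts earlier by prepending zeros to it.

    first_onset / second_onset are the sample indices at which the sound
    starts in each buffer (how they are measured is outside this sketch).
    Returns the (third, fourth) aligned signals as lists.
    """
    delay = abs(first_onset - second_onset)  # delay value in samples
    if first_onset < second_onset:
        third = [0.0] * delay + list(first)
        fourth = list(second)
    else:
        third = list(first)
        fourth = [0.0] * delay + list(second)
    # Assumption: pad the tail of the shorter signal so both have equal length.
    length = max(len(third), len(fourth))
    third += [0.0] * (length - len(third))
    fourth += [0.0] * (length - len(fourth))
    return third, fourth
```

With `first = [1, 2, 3, 0, 0]` starting at index 0 and `second = [0, 0, 1, 2, 3]` starting at index 2, the first signal receives two leading zeros and both outputs line up sample for sample.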

S30: the first volume of the third audio signal and the second volume of the fourth audio signal are detected.

In the embodiment of the application, the volume value of the first volume and the volume value of the second volume are obtained by detecting the first volume of the third audio signal and the second volume of the fourth audio signal.
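The text does not state how the volume value is measured. One common choice, shown here purely as an assumption, is the root-mean-square level of a block of samples expressed in decibels relative to full scale (samples normalized to the range -1 to 1). The name `volume_db` and the `floor_db` parameter are hypothetical.

```python
import math


def volume_db(samples, floor_db=-120.0):
    """RMS level of a sample block in dB relative to full scale.

    Assumption: samples are floats in [-1.0, 1.0]. Silent or empty blocks
    return floor_db instead of negative infinity.
    """
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)
```

The first volume would then be `volume_db(third)` and the second volume `volume_db(fourth)`, with the preset threshold expressed in the same unit.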

S40: comparing the first volume and the second volume with a preset threshold value, and performing audio mixing processing on the third audio signal and the fourth audio signal according to an audio mixing method corresponding to the comparison result to obtain an audio mixing result.

In the embodiment of the application, the volume value of the first volume is compared with a preset threshold. If the volume value of the first volume is greater than or equal to the preset threshold, it is judged that the teacher is speaking or the students are reading aloud in unison. The volume value of the second volume is then compared with the preset threshold. If the volume value of the second volume is greater than or equal to the preset threshold, it is judged that the students are reading aloud in unison, and the third audio signal and the fourth audio signal may be mixed to obtain a mixing result. If the volume value of the second volume is smaller than the preset threshold, it is judged that the teacher is speaking, and the third audio signal can be used as the mixing result. If the volume value of the first volume is smaller than the preset threshold, it is judged that the teacher is not speaking and the students are not reading aloud in unison; a student may merely be answering a question. In that case, the third audio signal and the fourth audio signal can be mixed to obtain a mixing result, or the fourth audio signal can be used directly as the mixing result.
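The single-threshold decision above can be sketched as a small dispatch function. The names `mix` and `mix_single_threshold` are hypothetical, and the sample-wise average used for mixing is an assumption, since the text does not specify the mixing operation itself.

```python
def mix(a, b):
    """Assumption: mix two aligned signals by sample-wise averaging."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def mix_single_threshold(third, fourth, first_volume, second_volume, threshold):
    """Choose the mixing result from the two volumes and one preset threshold."""
    if first_volume >= threshold:
        if second_volume >= threshold:
            # Students reading together: keep both signals.
            return mix(third, fourth)
        # Teacher speaking: keep only the near-field signal.
        return list(third)
    # Teacher silent, a student may be answering: the text allows either
    # mixing both signals or using the far-field signal directly; this
    # sketch uses the far-field signal.
    return list(fourth)
```

Returning the far-field signal in the last branch is one of the two options the text permits; replacing it with `mix(third, fourth)` gives the other.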

By applying the embodiment of the application, the first audio signal of the near-field microphone and the second audio signal of the far-field microphone are obtained; performing first time delay alignment on the first audio signal and the second audio signal to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal; detecting a first volume of the third audio signal and a second volume of the fourth audio signal; and comparing the first volume and the second volume with a preset threshold value, and performing audio mixing processing on the third audio signal and the fourth audio signal according to the audio mixing method corresponding to the comparison result to obtain an audio mixing result, so that the corresponding audio mixing method is determined according to the first volume of the third audio signal and the second volume of the fourth audio signal, and the audio mixing quality of the teacher and the students is improved.

In an alternative embodiment, referring to fig. 3, the preset threshold includes a first preset threshold and a second preset threshold, and step S40, in which the first volume and the second volume are compared with the preset threshold and the third audio signal and the fourth audio signal are mixed according to the mixing method corresponding to the comparison result to obtain a mixing result, includes steps S41 to S43, which are specifically as follows:

S41: if the first volume is larger than or equal to a first preset threshold value and the second volume is smaller than a second preset threshold value, performing second time delay alignment on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal; the fifth audio signal is a signal obtained by performing second time delay alignment on the third audio signal, and the sixth audio signal is a signal obtained by performing second time delay alignment on the fourth audio signal.

The first preset threshold is a volume threshold for the teacher speaking, the second preset threshold is a volume threshold for the students reading aloud in unison, and the second preset threshold is larger than the first preset threshold.

In this embodiment of the present application, if the first volume is greater than or equal to the first preset threshold and the second volume is less than the second preset threshold, it may be determined that the teacher is speaking and the students are not reading aloud in unison. A student may be interjecting, or may not be speaking.

Because the near-field microphone and the far-field microphone are at different distances from the teacher and the students, when only the teacher speaks, there is a time delay between the teacher's sound in the first audio signal collected by the near-field microphone and the teacher's sound in the second audio signal collected by the far-field microphone. Specifically, the teacher moves around the classroom while teaching; the distance from the teacher to the near-field microphone is fixed while the distance from the teacher to the far-field microphone varies, so there is a time delay between the teacher's sound in the first audio signal and the teacher's sound in the second audio signal.

When only students speak, there is likewise a time delay between the students' sound in the first audio signal collected by the near-field microphone and the students' sound in the second audio signal collected by the far-field microphone. If coherence is computed directly on a first audio signal and a second audio signal that have a time delay between them, the accuracy of the coherence computation is reduced.

In order to improve the accuracy of the coherence computation, second time delay alignment is performed on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal. Specifically, the delay value between the third audio signal and the fourth audio signal may be obtained from the start time point of the third audio signal and the start time point of the fourth audio signal. If the start time point of the third audio signal is earlier than that of the fourth audio signal, then after the third audio signal is acquired, the fourth audio signal is acquired at the time delayed by the period corresponding to the delay value. For example, if the third audio signal is 1 ms earlier than the fourth audio signal, the fourth audio signal is acquired 1 ms after the time at which the third audio signal is acquired. If the start time point of the third audio signal is later than that of the fourth audio signal, then after the fourth audio signal is acquired, the third audio signal is acquired at the time delayed by the period corresponding to the delay value. In this way, second time delay alignment is performed on the third audio signal and the fourth audio signal to obtain the fifth audio signal and the sixth audio signal.
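Unlike the zero padding of the first alignment, the second alignment reads the later signal from an offset. A minimal sketch follows, assuming the delay value has already been estimated; the names `ms_to_samples` and `align_by_offset_read`, the sign convention for the delay, and the truncation to a common length are assumptions for illustration.

```python
def ms_to_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000)


def align_by_offset_read(third, fourth, delay_samples):
    """Pair each sample of the leading signal with the delayed sample of the other.

    delay_samples > 0: the third signal leads, so the fourth is read
    starting delay_samples later. delay_samples < 0: the fourth leads.
    Returns (fifth, sixth), truncated to a common length (an assumption).
    """
    if delay_samples >= 0:
        fifth, sixth = list(third), list(fourth[delay_samples:])
    else:
        fifth, sixth = list(third[-delay_samples:]), list(fourth)
    length = min(len(fifth), len(sixth))
    return fifth[:length], sixth[:length]
```

For the 1 ms example in the text, a 16 kHz stream gives `ms_to_samples(1, 16000) == 16`, so the fourth signal would be read 16 samples after the corresponding sample of the third signal.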

S42: calculating the coherence of the fifth audio signal and the sixth audio signal to obtain a coherence result.

The coherence of signals refers to the degree of correlation between the signals. When the first audio signal and the second audio signal are the same audio signal, specifically, when both are audio signals of the teacher or both are audio signals of the students, the coherence is high. When the first audio signal and the second audio signal are both mixed audio signals, specifically, mixed audio signals of the teacher and the students, the coherence is low.

In the embodiment of the present application, the coherence result is obtained by calculating the coherence of the fifth audio signal and the sixth audio signal, and according to the coherence result, it is determined whether the first audio signal and the second audio signal are the same audio signal.

S43: obtaining a mixing result of the fifth audio signal and the sixth audio signal according to the coherence result.

In the embodiment of the present application, the coherence result may be a specific value, and different values correspond to different mixing methods. Specifically, there may be a mapping relationship between the value and the mixing method. When the value is greater than or equal to a preset coherence threshold, the corresponding mixing method is to use the fifth audio signal directly as the mixing result. When the value is smaller than the preset coherence threshold, the corresponding mixing method is to mix the fifth audio signal and the sixth audio signal and use the mixed audio signal as the mixing result.

Alternatively, a mapping relationship may exist between the interval in which the value lies and the mixing method, with different intervals corresponding to different mixing methods. Specifically, when the value lies in a first interval, the corresponding mixing method is to use the fifth audio signal directly as the mixing result. When the value lies in a second interval, the corresponding mixing method is to mix the fifth audio signal and the sixth audio signal and use the mixed audio signal as the mixing result. Values in the first interval are greater than values in the second interval.

By calculating the coherence of the fifth audio signal and the sixth audio signal, it is possible to distinguish whether the first audio signal and the second audio signal are the same audio signal or a mixed audio signal, thereby improving the quality of sound mixing of a teacher and a student.

In an alternative embodiment, referring to fig. 4, step S42, in which the coherence of the fifth audio signal and the sixth audio signal is calculated to obtain a coherence result, includes steps S421 to S422, specifically as follows:

S421: if the first volume is larger than or equal to a first preset threshold and the second volume is smaller than a second preset threshold, respectively performing time-frequency conversion on the fifth audio signal and the sixth audio signal to obtain a first frequency domain signal corresponding to the fifth audio signal and a second frequency domain signal corresponding to the sixth audio signal.

The first frequency domain signal corresponds to a fifth audio signal, the second frequency domain signal corresponds to a sixth audio signal, the first frequency domain signal and the second frequency domain signal correspond to the same frequency domain, and the frequency domain comprises a plurality of frequency points. The time-frequency transformation method is the prior art and will not be described in detail here.

S422: the coherence result is obtained by dividing the square of the cross-power spectrum of the first frequency domain signal and the second frequency domain signal by the product between the power spectrum of the first frequency domain signal and the power spectrum of the second frequency domain signal.

In the embodiment of the present application, the calculation formula of the coherence result is as follows:

$$C_{yx}(\omega) = \frac{\left|S_{yx}(\omega)\right|^{2}}{S_{x}(\omega)\,S_{y}(\omega)}$$

wherein $C_{yx}(\omega)$ represents the coherence result at the $\omega$ frequency point, $S_{yx}(\omega)$ represents the cross-power spectrum of the first frequency domain signal and the second frequency domain signal at the $\omega$ frequency point, $S_{x}(\omega)$ represents the power spectrum of the first frequency domain signal at the $\omega$ frequency point, and $S_{y}(\omega)$ represents the power spectrum of the second frequency domain signal at the $\omega$ frequency point.

The coherence of the fifth audio signal and the sixth audio signal can be automatically and quickly calculated by the first frequency domain signal and the second frequency domain signal.
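Steps S421 and S422 can be sketched in plain Python as follows. Two points are assumptions rather than statements from the text: the spectra are accumulated over several frames before the ratio is taken (a ratio formed from a single frame equals 1 at every frequency point regardless of the signals, so some averaging is needed for the value to be informative), and the per-frequency values are reduced to one number by a simple mean. The function names and the frame length are hypothetical.

```python
import cmath


def dft(frame):
    """Naive one-sided DFT of a real frame (adequate for short example frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, frame_len=32):
    """Squared cross-power spectrum divided by the product of the power spectra."""
    bins = frame_len // 2 + 1
    s_x = [0.0] * bins   # accumulated power spectrum of x
    s_y = [0.0] * bins   # accumulated power spectrum of y
    s_yx = [0j] * bins   # accumulated cross-power spectrum
    usable = min(len(x), len(y))
    for start in range(0, usable - frame_len + 1, frame_len):
        fx = dft(x[start:start + frame_len])
        fy = dft(y[start:start + frame_len])
        for k in range(bins):
            s_x[k] += abs(fx[k]) ** 2
            s_y[k] += abs(fy[k]) ** 2
            s_yx[k] += fy[k] * fx[k].conjugate()
    eps = 1e-12  # avoids division by zero in silent frequency points
    return [abs(s_yx[k]) ** 2 / (s_x[k] * s_y[k] + eps) for k in range(bins)]


def mean_coherence(x, y, frame_len=32):
    """Assumption: reduce the per-frequency values to one number by averaging."""
    values = coherence(x, y, frame_len)
    return sum(values) / len(values)
```

A scaled copy of a signal yields values close to 1 at every frequency point, while two unrelated noise signals yield values that shrink as more frames are averaged. A production system would use an FFT rather than this quadratic-time transform.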

In an alternative embodiment, referring to fig. 5, step S43, in which a mixing result of the fifth audio signal and the sixth audio signal is obtained according to the coherence result, includes steps S431 to S432, specifically as follows:

S431: if the coherence result is greater than or equal to a preset coherence threshold, taking the fifth audio signal as a mixing result.

If the first audio signal and the second audio signal are the same audio signal, specifically, if the first audio signal and the second audio signal are both audio signals of the teacher, only the first audio signal may be retained and the second audio signal may be ignored during mixing. This is because the teacher's voice attenuates as it propagates, so the signal-to-noise ratio of the first audio signal is higher than that of the second audio signal, and mixing the second audio signal with the first audio signal yields lower audio quality than the first audio signal alone.

In the embodiment of the present application, the calculated coherence result is compared with a preset coherence threshold, if the coherence result is greater than or equal to the preset coherence threshold, it indicates that the coherence between the fifth audio signal and the sixth audio signal is high, the fifth audio signal and the sixth audio signal are both audio signals of a teacher, and the fifth audio signal is reserved as a mixing result.

S432: if the coherence result is smaller than a preset coherence threshold, mixing the fifth audio signal and the sixth audio signal to obtain a first mixed signal, and taking the first mixed signal as a mixing result.

If the first audio signal and the second audio signal are mixed audio signals of the teacher and a student, that is, a student interjects while the teacher is speaking, the fifth audio signal and the sixth audio signal can be mixed to obtain a mixing result, so that the speech of both the teacher and the student is retained.

In the embodiment of the present application, if the coherence result is smaller than the preset coherence threshold, the coherence between the fifth audio signal and the sixth audio signal is low, which indicates that both are mixed audio signals of a teacher and a student. The fifth audio signal and the sixth audio signal are mixed to obtain a first mixed signal, and the first mixed signal is taken as the mixing result, so that the voices of the teacher and the student are both preserved.

By comparing the calculated coherence result with a preset coherence threshold, the mixing result of the fifth audio signal and the sixth audio signal can be automatically and quickly determined according to the comparison result.
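Steps S431 and S432 can be sketched as a single comparison. This is a minimal sketch under stated assumptions: the threshold value of 0.6, the function name `mix_by_coherence`, and the use of sample-wise addition with clipping to the range [-1, 1] as the mixing operation are hypothetical, since the application does not fix a threshold value or a particular mixing operation.

```python
import numpy as np


def mix_by_coherence(fifth, sixth, coherence, coherence_threshold=0.6):
    """Select the mixing result from a coherence result (steps S431/S432)."""
    if coherence >= coherence_threshold:
        # S431: both microphones carry the teacher's voice; keep the near-field signal only.
        return fifth.copy()
    # S432: the far-field signal carries a student's voice; keep both.
    # Assumption: mixing is sample-wise addition, clipped to the valid amplitude range.
    return np.clip(fifth + sixth, -1.0, 1.0)
```

For example, with a coherence result of 0.9 the function returns the fifth audio signal unchanged, and with a coherence result of 0.2 it returns the clipped sum of the two signals.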

In an optional embodiment, step S40, comparing the first volume and the second volume with a preset threshold and performing mixing processing on the third audio signal and the fourth audio signal according to the mixing method corresponding to the comparison result to obtain a mixing result, includes step S44, specifically as follows:

S44: If the first volume is greater than or equal to a first preset threshold and the second volume is greater than or equal to a second preset threshold, mixing the third audio signal and the fourth audio signal to obtain a second mixed signal, and taking the second mixed signal as the mixing result.

In the embodiment of the present application, when only the students are reading aloud in unison, their combined voice is loud, so both the far-field microphone and the near-field microphone pick up the sound of the students reading. If the first volume is greater than or equal to the first preset threshold and the second volume is greater than or equal to the second preset threshold, it can be determined that the students are reading aloud in unison. At this time, the teacher may or may not be speaking.

The third audio signal and the fourth audio signal are mixed to obtain a second mixed signal, and the second mixed signal is taken as the mixing result, which ensures that the voices of the teacher and of the students reading in unison are both preserved.
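The volume comparison that selects between step S44 and the coherence-based branch can be sketched as follows. The sketch assumes that volume is measured as the RMS level in decibels relative to full scale and that both preset thresholds are -40 dB; the function names `volume_db` and `choose_branch`, the volume measure, and the threshold values are hypothetical. Only the two volume cases described in this section are distinguished; any other case is returned under a placeholder label.

```python
import numpy as np


def volume_db(signal):
    """RMS level in dB relative to full scale (assumed volume measure)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(max(rms, 1e-12))


def choose_branch(third, fourth, first_threshold_db=-40.0, second_threshold_db=-40.0):
    """Map the two detected volumes to a mixing branch."""
    first_volume = volume_db(third)    # near-field microphone, after alignment
    second_volume = volume_db(fourth)  # far-field microphone, after alignment
    if first_volume >= first_threshold_db and second_volume >= second_threshold_db:
        return "mix_both"          # step S44: mix the third and fourth audio signals
    if first_volume >= first_threshold_db and second_volume < second_threshold_db:
        return "coherence_check"   # second time delay alignment and coherence steps
    return "not_covered_here"      # remaining cases are outside this sketch
```

Keeping the branch selection separate from the mixing itself makes it straightforward to tune the two thresholds independently for a given classroom.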

In an alternative embodiment, referring to fig. 6, step S10 of acquiring a first audio signal of a near-field microphone and a second audio signal of a far-field microphone includes steps S101 to S102, which are specifically as follows:

S101: acquiring a first audio signal of a near-field microphone and a third audio signal of a far-field microphone;

S102: taking the first audio signal as a reference signal, and performing silencing processing on the third audio signal to obtain a second audio signal of the far-field microphone.

In the embodiment of the present application, the third audio signal of the far-field microphone (here the raw far-field capture, as distinct from the third audio signal obtained by the first time delay alignment) may be an audio signal of a teacher collected by the far-field microphone, or a mixed audio signal of a teacher and a student collected by the far-field microphone. With the first audio signal as a reference signal, the third audio signal is subjected to silencing processing so that the teacher's audio is removed from it, yielding a second audio signal that contains only the students' audio. The silencing processing may use an echo cancellation method or an adaptive filtering algorithm.

Because the second audio signal of the far-field microphone is obtained by performing silencing processing on the third audio signal, it no longer contains the teacher's audio. The second audio signal of the far-field microphone can therefore be mixed directly with the first audio signal of the near-field microphone without degrading the mixing quality of the teacher's and students' voices.
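One possible form of the silencing processing in step S102 is a normalized least-mean-squares (NLMS) adaptive filter, sketched below. The application names only "an echo cancellation method or an adaptive filtering algorithm", so the choice of NLMS, the function name `nlms_silence`, the number of taps, and the step size are assumptions made for illustration.

```python
import numpy as np


def nlms_silence(reference, far_field, taps=32, mu=0.5, eps=1e-8):
    """Remove the component of `far_field` that is predictable from `reference`.

    reference: first audio signal (near-field microphone, teacher's voice)
    far_field: raw far-field capture (teacher plus possibly students)
    Returns the residual, used here as the second audio signal.
    """
    weights = np.zeros(taps)   # adaptive estimate of the near-to-far acoustic path
    history = np.zeros(taps)   # most recent reference samples, newest first
    residual = np.zeros(len(far_field))
    for n in range(len(far_field)):
        history = np.roll(history, 1)
        history[0] = reference[n]
        estimate = weights @ history          # predicted teacher component
        error = far_field[n] - estimate       # what the reference cannot explain
        # Normalized LMS update: step size scaled by the reference energy.
        weights += mu * error * history / (history @ history + eps)
        residual[n] = error
    return residual
```

When the far-field capture contains only a filtered copy of the teacher's voice, the residual decays toward zero once the filter has converged; when students are also speaking, their voices remain in the residual because they are not predictable from the reference signal.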

Example 2

The following are examples of apparatus that may be used to perform the method of example 1 of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method in embodiment 1 of the present application.

Fig. 7 is a schematic structural diagram of a mixing device according to an embodiment of the present application. The mixing device 5 provided in the embodiment of the present application includes:

a signal acquisition module 51 for acquiring a first audio signal of a near-field microphone and a second audio signal of a far-field microphone;

a signal alignment module 52, configured to perform a first time delay alignment on the first audio signal and the second audio signal, so as to obtain a third audio signal and a fourth audio signal; the third audio signal is a signal obtained by performing first time delay alignment on the first audio signal, and the fourth audio signal is a signal obtained by performing first time delay alignment on the second audio signal;

a volume detection module 53 for detecting a first volume of the third audio signal and a second volume of the fourth audio signal;

the signal mixing module 54 is configured to compare the first volume and the second volume with a preset threshold, and perform mixing processing on the third audio signal and the fourth audio signal according to a mixing method corresponding to the comparison result, so as to obtain a mixing result.
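As an illustration of what the signal alignment module 52 might do, the sketch below estimates the delay between the two microphone signals from the peak of their cross-correlation and trims both signals to a common, aligned span. The application does not state how the time delay is estimated, so cross-correlation, the function name `align_by_cross_correlation`, and the trimming strategy are assumptions.

```python
import numpy as np


def align_by_cross_correlation(first, second):
    """Return (third, fourth, lag): time-aligned copies of the inputs.

    lag > 0 means `second` is delayed relative to `first` by `lag` samples.
    """
    corr = np.correlate(second, first, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(first) - 1).
    lag = int(np.argmax(corr)) - (len(first) - 1)
    if lag > 0:
        third, fourth = first, second[lag:]
    else:
        third, fourth = first[-lag:], second
    n = min(len(third), len(fourth))   # keep only the overlapping span
    return third[:n], fourth[:n], lag
```

For two captures of the same sound offset by a few samples, the returned signals line up sample by sample, which is the precondition for the volume detection and mixing modules that follow.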

Optionally, the signal acquisition module includes:

a first audio signal acquisition unit configured to acquire a first audio signal of a near-field microphone and a third audio signal of a far-field microphone;

and a signal silencing processing unit, configured to perform silencing processing on the third audio signal by taking the first audio signal as a reference signal, to obtain a second audio signal of the far-field microphone.

Optionally, the signal mixing module includes:

the signal alignment unit is used for performing second time delay alignment on the third audio signal and the fourth audio signal if the first volume is larger than or equal to a first preset threshold value and the second volume is smaller than a second preset threshold value, so as to obtain a fifth audio signal and a sixth audio signal; the fifth audio signal is a signal obtained by performing second time delay alignment on the third audio signal, and the sixth audio signal is a signal obtained by performing second time delay alignment on the fourth audio signal;

a coherence calculating unit for calculating coherence of the fifth audio signal and the sixth audio signal to obtain a coherence result;

and a mixing result obtaining unit for obtaining the mixing result of the fifth audio signal and the sixth audio signal according to the coherence result.

Optionally, the coherence calculating unit includes:

the time-frequency conversion unit is used for respectively performing time-frequency conversion on the fifth audio signal and the sixth audio signal if the first volume is larger than or equal to a first preset threshold value and the second volume is smaller than a second preset threshold value, so as to obtain a first frequency domain signal corresponding to the fifth audio signal and a second frequency domain signal corresponding to the sixth audio signal;

a coherence result obtaining unit, configured to divide the square of the cross power spectrum of the first frequency domain signal and the second frequency domain signal by the product of the power spectrum of the first frequency domain signal and the power spectrum of the second frequency domain signal, to obtain a coherence result.

Optionally, the mixing result obtaining unit includes:

the first judging unit is used for taking the fifth audio signal as a sound mixing result if the coherence result is greater than or equal to a preset coherence threshold value;

and the second judging unit is used for mixing the fifth audio signal and the sixth audio signal to obtain a first mixed signal if the coherence result is smaller than a preset coherence threshold value, and taking the first mixed signal as a mixed result.

Optionally, the signal mixing module includes:

a mixing unit, configured to, if the first volume is greater than or equal to a first preset threshold and the second volume is greater than or equal to a second preset threshold, mix the third audio signal and the fourth audio signal to obtain a second mixed signal, and take the second mixed signal as the mixing result.

Example 3

The following are device embodiments of the present application that may be used to perform the method of embodiment 1 of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method in embodiment 1 of the present application.

Referring to fig. 8, the present application further provides an electronic device 300, which may specifically be a computer, a mobile phone, a tablet computer, an interactive tablet, or the like. In an exemplary embodiment of the present application, the electronic device 300 is an interactive tablet, which may include: at least one processor 301, at least one memory 302, at least one display, at least one network interface 303, a user interface 304, and at least one communication bus 305.

The user interface 304 is mainly used for providing an input interface for a user and acquiring data input by the user. Optionally, the user interface may also include a standard wired interface and a wireless interface.

The network interface 303 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.

Wherein a communication bus 305 is used to enable connected communications between these components.

Wherein the processor 301 may include one or more processing cores. The processor uses various interfaces and lines to connect the various parts of the electronic device, and performs the functions of the electronic device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by invoking data stored in the memory. Optionally, the processor may be implemented in hardware in at least one of the following forms: digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display layer; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor and may instead be implemented as a separate chip.

The memory 302 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory includes a non-transitory computer-readable storage medium. The memory may be used to store instructions, programs, code sets, or instruction sets. The memory may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor. The memory, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program.

The processor may be configured to call an application program of the mixing method stored in the memory and specifically execute the method steps of embodiment 1 above. For the specific execution process, refer to the description of embodiment 1, which is not repeated here.

Example 4

The present application further provides a computer-readable storage medium on which a computer program is stored. The instructions of the computer program are adapted to be loaded by a processor to execute the method steps of embodiment 1 above. For the specific execution process, refer to the description of embodiment 1, which is not repeated here. The device in which the storage medium resides may be an electronic device such as a personal computer, a notebook computer, a smartphone, or a tablet computer.

For the device embodiments, since they essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The above-described device embodiments are merely illustrative: components described as separate may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without creative effort.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A method of mixing sound, the method comprising the steps of:

2. The mixing method according to claim 1, characterized in that:

the preset threshold comprises a first preset threshold and a second preset threshold;

the step of comparing the first volume and the second volume with a preset threshold, and performing mixing processing on the third audio signal and the fourth audio signal according to a mixing method corresponding to the comparison result to obtain a mixing result, comprises:

if the first volume is greater than or equal to the first preset threshold and the second volume is less than the second preset threshold, performing second time delay alignment on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal; the fifth audio signal is a signal obtained by performing second time delay alignment on the third audio signal, and the sixth audio signal is a signal obtained by performing second time delay alignment on the fourth audio signal;

calculating coherence of the fifth audio signal and the sixth audio signal to obtain a coherence result;

and obtaining a mixing result of the fifth audio signal and the sixth audio signal according to the coherence result.

3. The mixing method according to claim 2, characterized in that:

the step of obtaining a mixing result of the fifth audio signal and the sixth audio signal according to the coherence result includes:

if the coherence result is greater than or equal to a preset coherence threshold, taking the fifth audio signal as a sound mixing result;

and if the coherence result is smaller than the preset coherence threshold, mixing the fifth audio signal and the sixth audio signal to obtain a first mixed signal, and taking the first mixed signal as a mixed result.

4. A mixing method according to any one of claims 2 to 3, characterized in that:

the step of calculating the coherence of the fifth audio signal and the sixth audio signal to obtain a coherence result includes:

if the first volume is greater than or equal to the first preset threshold and the second volume is less than the second preset threshold, performing time-frequency conversion on the fifth audio signal and the sixth audio signal respectively to obtain a first frequency domain signal corresponding to the fifth audio signal and a second frequency domain signal corresponding to the sixth audio signal;

and dividing the square of the cross power spectrum of the first frequency domain signal and the second frequency domain signal by the product of the power spectrum of the first frequency domain signal and the power spectrum of the second frequency domain signal to obtain a coherence result.

5. The mixing method according to claim 1, characterized in that:

and if the first volume is larger than or equal to the first preset threshold value and the second volume is larger than or equal to the second preset threshold value, mixing the third audio signal and the fourth audio signal to obtain a second mixed signal, and taking the second mixed signal as a mixed result.

6. A mixing method according to any one of claims 1 to 3 or claim 5, wherein:

the step of acquiring the first audio signal of the near-field microphone and the second audio signal of the far-field microphone comprises the following steps:

acquiring a first audio signal of a near-field microphone and a third audio signal of a far-field microphone;

and taking the first audio signal as a reference signal, and performing silencing processing on the third audio signal to obtain a second audio signal of the far-field microphone.

7. A mixing device, characterized by comprising:

8. The apparatus of claim 7, wherein the signal mixing module comprises:

a signal alignment unit, configured to perform second time delay alignment on the third audio signal and the fourth audio signal if the first volume is greater than or equal to the first preset threshold and the second volume is smaller than the second preset threshold, so as to obtain a fifth audio signal and a sixth audio signal; the fifth audio signal is a signal obtained by performing second time delay alignment on the third audio signal, and the sixth audio signal is a signal obtained by performing second time delay alignment on the fourth audio signal;

a coherence calculating unit, configured to calculate coherence of the fifth audio signal and the sixth audio signal, and obtain a coherence result;

and a mixing result obtaining unit, configured to obtain a mixing result of the fifth audio signal and the sixth audio signal according to the coherence result.

9. A computer device, comprising: a processor, a memory and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.