CN111899753A

CN111899753A - Audio separation device, computer equipment and method

Info

Publication number: CN111899753A
Application number: CN202010703227.8A
Authority: CN
Inventors: 赵安国; 林舸
Original assignee: Tianyu Quanganyin Technology Co ltd
Current assignee: Tianyu Quanganyin Technology Co ltd
Priority date: 2020-07-20
Filing date: 2020-07-20
Publication date: 2020-11-06

Abstract

The invention discloses an audio separation apparatus, a computer device and a method. The apparatus comprises: an acquisition module configured to acquire an audio signal to be separated; an extraction module configured to extract a plurality of frequency points of the audio signal, the frequency points being peak points in the frequency domain of the audio signal; an identification module configured to identify the phases of the frequency points and the maximum phase difference among them; and a separation module configured to separate the audio signal according to the maximum phase difference of the frequency points. By exploiting the large phase difference between the voices of different people, the invention can separate the voices of different people in the audio signal to be separated, which makes the recording easier to organize afterwards and improves the efficiency of organizing recordings.

Description

Audio separation device, computer equipment and method

Technical Field

The invention relates to the technical field of audio processing, and in particular to an audio separation device, computer equipment and method.

Background

Conference minutes are the record of how a conference was organized and what was discussed, produced by a recorder during the conference. Minutes may be either detailed or summary. Summary minutes record the gist of the conference, i.e., the important or main remarks made in it; detailed minutes require that the recorded items be complete and the recorded remarks be complete and detailed. Producing minutes with this content relies on recording. Recording includes audio recording and video recording; for conference minutes, audio or video recording is usually the only means, and the recorded content is finally transcribed into text. However, during recording, several voices are present in the same place, which makes it difficult to distinguish each person's voice when the recording is organized later. An audio separation method that can separate the voices of different people is therefore urgently needed, to facilitate later organization of the recording and improve the efficiency of organizing recordings.

Disclosure of Invention

Therefore, the technical problem to be solved by the present invention is to overcome the defect in the prior art that the sound of different people in a recording file cannot be separated, so as to provide an audio separation apparatus, a computer device and a method.

According to a first aspect, an embodiment of the present invention discloses an audio separation apparatus, including: an acquisition module configured to acquire an audio signal to be separated; an extraction module configured to extract a plurality of frequency points of the audio signal, the frequency points being peak points in the frequency domain of the audio signal; an identification module configured to identify the phases of the frequency points and the maximum phase difference among them; and a separation module configured to separate the audio signal according to the maximum phase difference of the frequency points.

Optionally, the apparatus further comprises: a judging module configured to judge whether the maximum phase difference is greater than a preset threshold; and a separation submodule configured to, if the maximum phase difference is greater than the preset threshold, separate the audio signal according to the maximum phase difference, take each separated audio signal as a new audio signal to be separated, and return to the step of extracting the plurality of frequency points of the audio signal.

Optionally, the apparatus further comprises: a first determining module configured to determine that the audio signal is a single-channel audio signal, i.e., one containing the voice of only one person, if the maximum phase difference is smaller than the preset threshold.

Optionally, the separation module comprises: a second determining module configured to determine a first phase shift value and a second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference; a phase shifting module configured to shift the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio; and a merging module configured to merge the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio.

According to a second aspect, an embodiment of the present invention further discloses an audio separation method, including the following steps: acquiring an audio signal to be separated; extracting a plurality of frequency points of the audio signal, the frequency points being peak points in the frequency domain of the audio signal; identifying the phases of the frequency points and the maximum phase difference among them; and separating the audio signal according to the maximum phase difference of the frequency points.

Optionally, before the audio signal is separated according to the maximum phase difference of the frequency points, the method further includes: judging whether the maximum phase difference is greater than a preset threshold; and, if it is, separating the audio signal according to the maximum phase difference, taking each separated audio signal as a new audio signal to be separated, and returning to the step of extracting the plurality of frequency points of the audio signal.

Optionally, the method further comprises: if the maximum phase difference is smaller than the preset threshold, determining that the audio signal is a single-channel audio signal.

Optionally, separating the audio signal according to the maximum phase difference of the frequency points includes: determining a first phase shift value and a second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference; shifting the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio; and combining the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio.

According to a third aspect, an embodiment of the present invention further discloses a computer device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the audio separation method according to the second aspect or any one of the alternative embodiments of the second aspect.

According to a fourth aspect, the present invention further discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the audio separation method according to the second aspect or any one of the optional embodiments of the second aspect.

The technical scheme of the invention has the following advantages:

The audio separation apparatus and method provided by the invention acquire an audio signal to be separated, extract a plurality of frequency points of the audio signal (peak points in its frequency domain), identify the phases of the frequency points and the maximum phase difference among them, and separate the audio signal according to that maximum phase difference. By exploiting the large phase difference between the voices of different people, the invention can separate those voices in the audio signal to be separated, which makes the recording easier to organize afterwards and improves the efficiency of organizing recordings.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic block diagram of a specific example of an audio separation apparatus according to an embodiment of the present invention;

Fig. 2 is a schematic block diagram of another specific example of the audio separation apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of still another specific example of the audio separation apparatus according to an embodiment of the present invention;

Fig. 4 is a flowchart of a specific example of an audio separation method according to an embodiment of the present invention;

Fig. 5 is a diagram of a specific example of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "linked," and "connected" are to be construed broadly: for example, a connection may be fixed, removable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; an internal communication between two elements; or wireless or wired. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

An embodiment of the present invention discloses an audio separation apparatus which, as shown in Fig. 1, comprises:

The acquisition module 11 is configured to acquire an audio signal to be separated.

Illustratively, the audio signal may contain the voice of only one person or the voices of several people. The audio signal to be separated may be the audio of a conference, a recording of a public security (police) interrogation, and the like. It may be a signal recorded in advance by another recording device and obtained over a wired or wireless connection, or a signal captured directly in real time.

To improve the quality of the audio signal, noise reduction may be applied to the audio signal to be separated to filter out the noise it contains.

The extraction module 12 is configured to extract a plurality of frequency points of the audio signal, the frequency points being peak points in the frequency domain of the audio signal.

For example, the voices of different people have different frequency points. The peak points in the frequency domain of the audio signal, i.e., the plurality of frequency points, can be extracted in hardware or software. The frequency points may belong to the voice of one person or to the voices of several people.
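The extraction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a short mono frame held in a list, uses a naive discrete Fourier transform (an FFT library such as `numpy.fft` would normally be used instead), and treats a "peak point" as a local maximum of the one-sided magnitude spectrum that is at least a given fraction of the strongest bin. The function names and the `rel_threshold` parameter are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform; adequate for short illustrative frames."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def extract_peak_bins(signal, rel_threshold=0.1):
    """Return indices of local maxima in the one-sided magnitude spectrum.

    rel_threshold (hypothetical parameter) discards peaks weaker than this
    fraction of the strongest bin, so round-off ripples are not counted.
    """
    spectrum = dft(signal)
    half = len(signal) // 2
    mags = [abs(c) for c in spectrum[: half + 1]]
    floor = rel_threshold * max(mags)
    peaks = []
    for k in range(1, half):
        # A peak is higher than its left neighbour and not lower than its right one.
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1] and mags[k] >= floor:
            peaks.append(k)
    return peaks
```

With a sampling rate of 6400 Hz and a 64-sample frame, bin k corresponds to k × 100 Hz, so a mixture of 100 Hz and 1800 Hz tones yields peak bins 1 and 18.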

The identification module 13 is configured to identify the phases of the frequency points and the maximum phase difference among them.

For example, when the frequency points all come from the same person, that person speaks from a single position, so the phase differences among them stay within a certain range. In the embodiment of the present invention, a phase identifier or software may be used to identify the phase corresponding to each frequency point; the phase difference between every two phases is then calculated to determine the maximum phase difference, according to which the audio signal is separated.
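A sketch of the identification step, under the same assumptions (a mono frame in a list, hypothetical function names): the phase of each peak bin is read from its single-bin DFT coefficient, so phases are measured relative to the first sample of the frame, and the largest pairwise difference is returned together with the pair of bins that produced it. Wrapping each difference onto [0°, 180°] is an assumption added here, since phases are only defined modulo 360°.

```python
import cmath
import math
from itertools import combinations


def phase_at_bin(signal, k):
    """Phase in degrees of DFT bin k, computed from the single-bin DFT sum."""
    n = len(signal)
    coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return math.degrees(cmath.phase(coeff))


def phase_difference(p, q):
    """Absolute phase difference in degrees, wrapped onto [0, 180] (assumption)."""
    d = abs(p - q) % 360.0
    return 360.0 - d if d > 180.0 else d


def max_phase_difference(signal, peak_bins):
    """Return (largest pairwise phase difference, (bin_i, bin_j)) over the peak bins."""
    phases = {k: phase_at_bin(signal, k) for k in peak_bins}
    best = (0.0, None)
    for i, j in combinations(peak_bins, 2):
        d = phase_difference(phases[i], phases[j])
        if d > best[0]:
            best = (d, (i, j))
    return best
```

For a 64-sample frame containing cos(2π·t/64) plus 0.5·cos(2π·18t/64 + 1 rad), the two peak phases are about 0° and 57.3°, so the maximum phase difference is about 57.3° between bins 1 and 18.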

The separation module 14 is configured to separate the audio signal according to the maximum phase difference of the frequency points.

For example, in the embodiment of the present invention, separating the audio signal according to the maximum phase difference means separating it according to the phases of the two frequency points corresponding to the maximum phase difference: the audio signal is phase-shifted according to the phase of each of these two frequency points, and each shifted copy is added to the original audio signal to obtain the separated audio signals. The audio signal to be separated may be divided into two audio signals according to the maximum phase difference, or into more than two.

The audio separation apparatus provided by the invention acquires an audio signal to be separated, extracts a plurality of frequency points of the audio signal (peak points in its frequency domain), identifies the phases of the frequency points and the maximum phase difference among them, and separates the audio signal according to that maximum phase difference. By exploiting the large phase difference between the voices of different people, the invention can separate those voices in the audio signal to be separated, which makes the recording easier to organize afterwards and improves the efficiency of organizing recordings.

As an alternative embodiment of the present invention, as shown in Fig. 2, the audio separation apparatus further includes:

The judging module 15 is configured to judge whether the maximum phase difference is greater than a preset threshold.

For example, the preset threshold may be 10° or 20°; the embodiment of the present invention does not limit its value, which may be set according to the actual situation.

The separation submodule 141 is configured to, if the maximum phase difference is greater than the preset threshold, separate the audio signal according to the maximum phase difference, take each separated audio signal as a new audio signal to be separated, and return to the step of extracting the plurality of frequency points of the audio signal.

Illustratively, when the maximum phase difference is greater than the preset threshold, the audio signal is considered to contain at least two voices, and it is separated according to the maximum phase difference. To separate the voices of all the people, each separated audio signal is taken as a new audio signal to be separated and the process returns to the step of extracting the plurality of frequency points. This repeats until the maximum phase difference is smaller than the preset threshold, at which point the audio signal is considered to be the voice of a single person and separation is complete.

The first determining module 142 is configured to determine that the audio signal is a single-channel audio signal if the maximum phase difference is smaller than the preset threshold.

Illustratively, if the maximum phase difference is smaller than the preset threshold, the audio signal is considered to contain the voice of only one person, i.e., it is a single-channel audio signal, and it is not separated further.
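The judge, separate and repeat loop described above can be sketched as a recursive driver. In this sketch the `measure` and `split` callables are hypothetical stand-ins for the identification and separation steps; `threshold_deg` defaults to the 20° example value; a difference exactly equal to the threshold is treated as not exceeding it (the text does not cover that case); and `max_depth` is a safeguard against non-terminating recursion that is an addition, not part of the described method.

```python
def separate_recursively(signal, measure, split, threshold_deg=20.0, max_depth=8):
    """Recursively separate a signal until every part is judged single-channel.

    measure(signal) -> maximum phase difference in degrees (hypothetical callable
                       standing in for the identification step)
    split(signal)   -> list of separated signals (hypothetical callable standing
                       in for the separation step)
    """
    max_diff = measure(signal)
    # Not above the threshold (or depth budget exhausted): treat as single-channel.
    if max_diff <= threshold_deg or max_depth == 0:
        return [signal]
    parts = []
    for part in split(signal):
        # Each separated signal becomes a new audio signal to be separated.
        parts.extend(separate_recursively(part, measure, split, threshold_deg, max_depth - 1))
    return parts
```

As a toy check, a "signal" can be a list of speaker labels, `measure` can return 90° for more than one label and 0° otherwise, and `split` can halve the list; a three-speaker input then ends as three single-speaker parts.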

As an alternative embodiment of the present invention, as shown in Fig. 3, the audio separation apparatus further includes:

The second determining module 1411 is configured to determine the first phase shift value and the second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference.

Exemplarily, the first and second phase shift values may be taken directly as the phases of the two frequency points, or as the differences between 180° and the phases of the two frequency points, respectively.

The embodiment of the present invention takes the latter as an example, i.e., the first and second phase shift values are obtained as the differences between 180° and the phases of the two frequency points. For example, when the phases of the two frequency points corresponding to the maximum phase difference are a = P(100 Hz) and b = P(1800 Hz), the first phase shift value may be (180° - a) and the second phase shift value may be (180° - b), where P(100 Hz) and P(1800 Hz) denote the phases at the 100 Hz and 1800 Hz frequency points.
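With hypothetical phase values (a = 30° and b = 75° are illustrative only, not taken from the text), the 180°-complement rule of this example works out as:

```latex
\theta_1 = 180^\circ - a, \qquad \theta_2 = 180^\circ - b, \qquad a = P(100\,\mathrm{Hz}),\; b = P(1800\,\mathrm{Hz})

a = 30^\circ,\; b = 75^\circ \;\Rightarrow\; \theta_1 = 150^\circ,\; \theta_2 = 105^\circ,\; |a - b| = 45^\circ
```

Here the maximum phase difference of 45° exceeds both example thresholds (10° and 20°), so this pair of frequency points would be used for separation.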

The phase shifting module 1412 is configured to shift the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio.

Exemplarily, the whole audio signal may be shifted to the left by the first phase shift value and by the second phase shift value, respectively, to obtain the first and second phase-shifted audio; alternatively, the whole audio signal may be shifted to the right by these values to obtain the first and second phase-shifted audio.
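One way to shift the phase of an entire signal by a constant angle is to rotate every frequency component in the DFT domain, as sketched below. This is an assumed technique, not one specified in the text: positive-frequency bins are multiplied by e^{jθ} and their mirror bins by e^{-jθ} so the output stays real, and a positive angle is taken to mean a "left" (leading) shift, with a negative angle for a "right" shift. The function name is hypothetical and the naive transforms are for illustration only.

```python
import cmath
import math


def phase_shift(signal, shift_deg):
    """Return a copy of a real signal with every frequency component's phase rotated by shift_deg.

    Positive-frequency DFT bins are multiplied by exp(+j*theta) and their mirror
    (negative-frequency) bins by exp(-j*theta), so the output stays real.
    The DC bin (and the Nyquist bin for even lengths) is left unchanged,
    because a real signal cannot carry an arbitrary phase there.
    Positive shift_deg is treated as a 'left' (leading) shift; pass a negative
    value for a 'right' shift. Naive O(N^2) transforms, for illustration only.
    """
    n = len(signal)
    theta = math.radians(shift_deg)
    # Forward DFT
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Rotate each positive bin and its conjugate-symmetric partner
    for k in range(1, (n + 1) // 2):
        spectrum[k] *= cmath.exp(1j * theta)
        spectrum[n - k] *= cmath.exp(-1j * theta)
    # Inverse DFT; the imaginary part is only round-off, so keep the real part
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For a pure tone, a 90° shift turns cos(ωt) into cos(ωt + 90°) = -sin(ωt), and a 180° shift inverts the signal.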

The merging module 1413 is configured to combine the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio.

Illustratively, following the principle of inverse (anti-phase) cancellation, the first phase-shifted audio and the second phase-shifted audio are each added to the audio signal to obtain the first audio and the second audio. In this way the audio at the two frequency points is respectively cancelled, so that the audio signal is divided into two audio signals.

In the embodiment of the present invention, the audio signal is separated by dynamic analysis of the frequency domain and phase, so the audio of each person does not need to be stored in advance. Compared with approaches that rely on pre-stored audio, this saves storage space and reduces network load.
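The inverse cancellation principle behind the adding step can be illustrated with a minimal sketch, assuming a two-tone mixture that stands in for two voices (the 100 Hz and 1800 Hz frequencies follow the example above; the phases, amplitudes and sampling rate are hypothetical). It cancels one component by adding a copy of it offset by 180°; it does not reproduce the whole-signal shift of the phase shifting module 1412.

```python
import math


def tone(freq_hz, phase_deg, n, fs, amp=1.0):
    """n samples of amp * cos(2*pi*freq_hz*t/fs + phase), t = 0..n-1."""
    phase = math.radians(phase_deg)
    return [amp * math.cos(2 * math.pi * freq_hz * t / fs + phase) for t in range(n)]


def combine(a, b):
    """Sample-wise addition of two equal-length signals (the 'adding and combining' step)."""
    return [x + y for x, y in zip(a, b)]


# Two hypothetical components standing in for two voices.
fs, n = 6400, 64
mixture = combine(tone(100, 30, n, fs), tone(1800, 75, n, fs, amp=0.5))

# Adding a copy of the 100 Hz component offset by 180 degrees cancels it,
# leaving only the 1800 Hz component.
anti_phase = tone(100, 30 + 180, n, fs)
remainder = combine(mixture, anti_phase)
```

Exact cancellation requires the added copy to match the component's frequency and amplitude and to differ from it in phase by exactly 180°.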

An embodiment of the present invention further discloses an audio separation method which, as shown in Fig. 4, comprises the following steps:

S21: Acquire an audio signal to be separated. For the specific implementation, see the description of the acquisition module 11 in the above embodiment; details are not repeated here.

S22: Extract a plurality of frequency points of the audio signal, the frequency points being peak points in the frequency domain of the audio signal. For the specific implementation, see the description of the extraction module 12 in the above embodiment; details are not repeated here.

S23: Identify the phases of the frequency points and the maximum phase difference among them. For the specific implementation, see the description of the identification module 13 in the above embodiment; details are not repeated here.

S24: Separate the audio signal according to the maximum phase difference of the frequency points. For the specific implementation, see the description of the separation module 14 in the above embodiment; details are not repeated here.

The audio separation method provided by the invention acquires an audio signal to be separated, extracts a plurality of frequency points of the audio signal (peak points in its frequency domain), identifies the phases of the frequency points and the maximum phase difference among them, and separates the audio signal according to that maximum phase difference. By exploiting the large phase difference between the voices of different people, the invention can separate those voices in the audio signal to be separated, which makes the recording easier to organize afterwards and improves the efficiency of organizing recordings.

As an optional embodiment of the present invention, before step S24, the audio separation method further includes:

Judge whether the maximum phase difference is greater than a preset threshold. For the specific implementation, see the description of the judging module 15 in the above embodiment; details are not repeated here.

If the maximum phase difference is greater than the preset threshold, separate the audio signal according to the maximum phase difference, take each separated audio signal as a new audio signal to be separated, and return to the step of extracting the plurality of frequency points of the audio signal. For the specific implementation, see the description of the separation submodule 141 in the above embodiment; details are not repeated here.

If the maximum phase difference is smaller than the preset threshold, determine that the audio signal is a single-channel audio signal. For the specific implementation, see the description of the first determining module 142 in the above embodiment; details are not repeated here.

As an optional embodiment of the present invention, separating the audio signal according to the maximum phase difference of the frequency points includes:

Determine a first phase shift value and a second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference. For the specific implementation, see the description of the second determining module 1411 in the above embodiment; details are not repeated here.

Shift the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio. For the specific implementation, see the description of the phase shifting module 1412 in the above embodiment; details are not repeated here.

Combine the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio. For the specific implementation, see the description of the merging module 1413 in the above embodiment; details are not repeated here.

An embodiment of the present invention further provides a computer device. As shown in Fig. 5, the computer device may include a processor 31 and a memory 32, which may be connected by a bus or in another manner; Fig. 5 takes connection by a bus as an example.

The processor 31 may be a central processing unit (CPU). The processor 31 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory 32, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the audio separation method in the embodiment of the present invention (e.g., the acquisition module 11, the extraction module 12, the identification module 13, and the separation module 14 shown in Fig. 1). By running the non-transitory software programs, instructions and modules stored in the memory 32, the processor 31 executes its various functional applications and data processing, that is, implements the audio separation method of the above method embodiment.

The memory 32 may include a program storage area and a data storage area: the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the processor 31, and the like. In addition, the memory 32 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and such remote memory may be connected to the processor 31 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 32 and, when executed by the processor 31, perform the audio separation method of the embodiment shown in Fig. 4.

For details of the computer device, refer to the corresponding description and effects of the embodiment shown in Fig. 4; they are not repeated here.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. An audio separation apparatus, comprising:

an acquisition module, configured to acquire an audio signal to be separated;

an extraction module, configured to extract a plurality of frequency points of the audio signal, wherein the frequency points are peak points in the frequency domain of the audio signal;

an identification module, configured to identify the phases of the frequency points and the maximum phase difference among them; and

a separation module, configured to separate the audio signal according to the maximum phase difference of the frequency points.

2. The apparatus of claim 1, further comprising:

a judging module, configured to judge whether the maximum phase difference is greater than a preset threshold; and

a separation submodule, configured to, if the maximum phase difference is greater than the preset threshold, separate the audio signal according to the maximum phase difference, take each separated audio signal as a new audio signal to be separated, and return to the step of extracting the plurality of frequency points of the audio signal.

3. The apparatus of claim 2, further comprising:

a first determining module, configured to determine that the audio signal is a single-channel audio signal if the maximum phase difference is smaller than the preset threshold.

4. The apparatus of claim 1 or 2, wherein the separation module comprises:

a second determining module, configured to determine a first phase shift value and a second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference;

a phase shifting module, configured to shift the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio; and

a merging module, configured to merge the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio.

5. An audio separation method, comprising the steps of:

acquiring an audio signal to be separated;

extracting a plurality of frequency points of the audio signal, wherein the frequency points are peak points in the frequency domain of the audio signal;

identifying the phases of the frequency points and the maximum phase difference among them; and

separating the audio signal according to the maximum phase difference of the frequency points.

6. The method of claim 5, wherein before the separating the audio signals according to the maximum phase difference of the plurality of frequency points, the method further comprises:

judging whether the maximum phase difference is greater than a preset threshold; and

if the maximum phase difference is greater than the preset threshold, separating the audio signal according to the maximum phase difference, taking each separated audio signal as a new audio signal to be separated, and returning to the step of extracting the plurality of frequency points of the audio signal.

7. The method of claim 6, further comprising: determining that the audio signal is a single-channel audio signal if the maximum phase difference is smaller than the preset threshold.

8. The method according to claim 5 or 6, wherein the separating the audio signals according to the maximum phase difference of the plurality of frequency points comprises:

determining a first phase shift value and a second phase shift value according to the phases of the two frequency points corresponding to the maximum phase difference;

shifting the phase of the audio signal by the first phase shift value and by the second phase shift value, respectively, to obtain a first phase-shifted audio and a second phase-shifted audio; and

combining the first phase-shifted audio and the second phase-shifted audio, respectively, with the audio signal to obtain a first audio and a second audio.

9. A computer device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the audio separation method of any of claims 5-8.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the audio separation method according to any one of claims 5 to 8.