WO2024093264A1

WO2024093264A1 - Audio detection method, apparatus and device

Info

Publication number: WO2024093264A1
Application number: PCT/CN2023/101461
Authority: WO
Inventors: 陈亚克
Original assignee: 华为云计算技术有限公司
Priority date: 2022-10-31
Filing date: 2023-06-20
Publication date: 2024-05-10
Also published as: CN117998011A

Abstract

The present application provides an audio detection method, apparatus and device. The method comprises: acquiring a first audio code stream sent by an audio sender to an audio receiver; acquiring a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver; detecting, according to the first audio code stream and the second audio code stream, whether the second audio code stream comprises an echo; and when the echo is detected, determining, according to the identity information, the audio receiver generating the echo. The present application makes it possible to quickly identify the source of the echo, which improves the user experience.

Description

Audio detection method, device and equipment

This application claims priority to the Chinese patent application filed with the China Patent Office on October 31, 2022, with application number 202211352488.5 and application name “Audio detection method, device and equipment”, the entire contents of which are incorporated by reference in this application.

Technical Field

The present application relates to the technical field of conference systems, and in particular to an audio detection method, device and equipment.

Background Art

In the related art, the generation and operation of a voice conference includes the following steps: the conference terminal establishes a connection with the conference media server; the conference media server receives the media stream sent by each conference terminal and forwards the media stream to each conference terminal; each conference terminal plays the media stream received from the conference media server, and the above process is repeated until the voice conference ends.

During a voice conference, each conference terminal may have an echo. Currently, conference staff locate the source of the echo by muting each conference terminal one by one, which takes a long time and affects the user experience.

Summary of the invention

In view of this, the present application provides an audio detection method, apparatus and device, which can quickly identify the source of the echo and improve the user experience.

A first aspect of an embodiment of the present application provides an audio detection method, comprising: obtaining a first audio code stream sent by an audio sender to an audio receiver; obtaining a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver; detecting whether the second audio code stream includes an echo based on the first audio code stream and the second audio code stream; and when the echo is detected, determining the audio receiver that generates the echo based on the identity information.

Compared with the related art, the embodiments of the present application have at least the following advantages:

Whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; since the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and also improves the user experience.

In some possible implementations, obtaining the second audio code stream returned by the audio receiver includes: obtaining multiple second audio code streams returned by multiple audio receivers, wherein one audio receiver corresponds to one second audio code stream. Detecting whether the second audio code stream includes an echo according to the first audio code stream and the second audio code stream includes: detecting whether the multiple second audio code streams include the echo according to the first audio code stream and the multiple second audio code streams, and when it is detected that a second audio code stream includes the echo, identifying the second audio code stream including the echo from among the multiple second audio code streams. Determining the audio receiver that generates the echo according to the identity information includes: determining the audio receiver that generates the echo according to the identity information corresponding to the second audio code stream including the echo.

By adopting this technical solution, when there are multiple audio receivers, the identity information of the audio receiver that generates the echo can be determined.

In some possible implementations, detecting whether the plurality of second audio code streams include echoes according to the first audio code stream and the plurality of second audio code streams includes: performing coherence detection on the first audio code stream and each of the second audio code streams respectively; and determining whether any second audio code stream includes echoes according to a result of the coherence detection.

By adopting this technical solution, the audio code stream containing the echo can be accurately detected, thereby improving the reliability of the audio detection method.

In some possible implementations, after determining the audio receiver that generates the echo based on the identity information, the method further includes: uploading the identity information to a signaling server, so that after receiving the identity information, the signaling server sends a reminder message to the audio receiver corresponding to the identity information.

By adopting this technical solution, the audio receiver that generates the echo can be notified more quickly, so that the audio receiver can eliminate the echo as soon as possible, further improving the user experience.

A second aspect of an embodiment of the present application provides an audio detection device, comprising: an audio sending module, the audio sending module being used to send a first audio code stream to an audio receiving module; an audio receiving module, the audio receiving module being used to receive the first audio code stream and send a second audio code stream to the audio sending module, wherein the second audio code stream carries identity information of an audio receiver; the audio sending module is also used to receive the second audio code stream, detect whether the second audio code stream includes an echo based on the first audio code stream and the second audio code stream, and when the echo is detected, determine the audio receiver that generates the echo based on the identity information.

In some possible implementations, the audio sending module is specifically used to: perform coherence detection on the first audio code stream and each of the second audio code streams respectively; and determine whether any second audio code stream includes an echo based on a result of the coherence detection.

In some possible implementations, there are multiple audio receiving modules that return the second audio code stream, wherein one audio receiving module corresponds to one second audio code stream; the audio sending module is specifically used to: receive multiple second audio code streams returned by multiple audio receiving modules, detect whether there is a second audio code stream including the echo among the multiple second audio code streams according to the first audio code stream and the multiple second audio code streams, and when it is detected that there is a second audio code stream including the echo, identify the second audio code stream including the echo among the multiple second audio code streams; determine the audio receiver that generates the echo according to the identity information corresponding to the second audio code stream including the echo.

In some possible implementations, the audio detection device also includes a signaling server; the audio sending module is used to upload the identity information to the signaling server; and the signaling server is used to send a reminder message to an audio receiver corresponding to the identity information after receiving the identity information.

The third aspect of an embodiment of the present application discloses a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the above-mentioned audio detection method.

A fourth aspect of an embodiment of the present application discloses a computer program product comprising instructions. When the instructions are executed by a computing device cluster, the computing device cluster executes the above-mentioned audio detection method.

A fifth aspect of an embodiment of the present application discloses a computer-readable storage medium, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the above-mentioned audio detection method.

It can be understood that the computing device cluster of the third aspect, the computer program product of the fourth aspect and the computer-readable storage medium of the fifth aspect provided above all correspond to the method of the first aspect. Therefore, for the beneficial effects that they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, which are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an echo generation scene in the related art;

FIG. 2 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application;

FIG. 5 is a diagram of an interactive process of audio detection provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of functional modules of an audio detection system provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the hardware structure of a computing device provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application.

Detailed Description

In order to more clearly understand the above-mentioned purposes, features and advantages of the present application, the present application is described in detail below in conjunction with the accompanying drawings and specific implementations. It should be noted that the implementations of the present application and the features in the implementations can be combined with each other without conflict.

In the following description, many specific details are set forth to facilitate a full understanding of the present application. The described implementations are only part of the implementations of the present application, rather than all of the implementations.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as those commonly understood by those skilled in the art to which this application belongs. The terms used herein in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.

It should be further noted that, in this document, the terms "include", "comprise" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further limitations, an element specified by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or apparatus comprising the element.

In this application, "at least one" means one or more, and "more than one" means two or more than two. "And/or" describes the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone, where A and B can be singular or plural. The terms "first", "second", "third", "fourth", etc. (if any) in the specification, claims and drawings of this application are used to distinguish similar objects, rather than to describe a specific order or sequence.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as "exemplary" or "for example" is intended to present related concepts in a specific way.

As shown in Figure 1, it is a scene diagram of echo generation in the related art. In the related art, terminal A sends a normal audio code stream to the media server, and the media server forwards the normal audio code stream to terminals B, C, and D. The dotted arrow indicates the transmission path of the echo audio code stream generated after terminal B leaks the echo. After the echo audio code stream is transmitted to the media server, it is forwarded by the media server to terminals A, C, and D. Terminals A, C, and D hear the echo, resulting in poor sound effects for terminals A, C, and D.

However, there is no method for detecting the source of the echo in the related art, and the echo source needs to be located by muting each terminal one by one, which is time-consuming and affects the user experience.

Please refer to Figure 2, which is a flow chart of the audio detection method provided in an embodiment of the present application. This embodiment is applied to the audio sender side, as shown in Figure 2, and includes the following steps:

Step S101: Acquire a first audio code stream sent by an audio sender to an audio receiver.

In some embodiments, the audio sender is a terminal device that generates the first audio code stream, such as a microphone, a computer, a mobile communication device, etc. The audio sender can send the first audio code stream to one audio receiver, or can send the first audio code stream to multiple audio receivers.

In some embodiments, the audio sender sends the first audio code stream to the conference media server, and the conference media server forwards the first audio code stream to the audio receiver.

In some embodiments, the main functions of the conference media server are: media routing allocation, media code stream forwarding, etc. In this embodiment, the conference media server forwards the audio code stream to each terminal.
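The forwarding role described above can be illustrated with a minimal sketch. The `MediaServer` class, its `join` and `forward` methods, and the in-memory inboxes are hypothetical names chosen for illustration only; the application does not specify how the conference media server is implemented.

```python
class MediaServer:
    """Toy forwarder: relays each packet to every joined terminal except its sender."""

    def __init__(self):
        # terminal name -> list of (sender, payload) pairs received so far
        self.inboxes = {}

    def join(self, terminal):
        # Register a terminal with an empty inbox (hypothetical join step).
        self.inboxes.setdefault(terminal, [])

    def forward(self, sender, payload):
        # Relay the payload to all other terminals in the conference.
        for terminal, inbox in self.inboxes.items():
            if terminal != sender:
                inbox.append((sender, payload))


server = MediaServer()
for name in ("A", "B", "C", "D"):
    server.join(name)

# Terminal A speaks; B, C and D receive the stream, A does not.
server.forward("A", b"speech-from-A")
```

In this sketch, if terminal B leaked the received audio back into its own uplink, a later `server.forward("B", ...)` call would deliver that echo to terminals A, C and D, which is the situation the method is intended to detect.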

Step S102: Acquire a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.

In some embodiments, the audio receiver is a terminal device that generates the second audio code stream, such as a microphone, a computer, a mobile communication device, etc.

In some embodiments, when an audio receiver or an audio sender joins a conference, the conference media server assigns it a unique ID. Assuming that the audio receiver is terminal B, when terminal B joins the conference, the conference media server assigns a unique ID to terminal B, and this ID is the identity information of terminal B.

In this way, all subsequent audio code stream packets sent by terminal B contain the ID. After receiving an audio code stream sent by terminal B, another terminal can identify that the audio code stream comes from terminal B by parsing the ID field in the audio code stream packet.
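As a rough sketch of how an ID field might be carried and parsed, the following assumes a hypothetical packet layout in which a 4-byte big-endian terminal ID precedes the audio payload. The layout, the `pack_packet` and `parse_packet` helpers, and the example ID value are assumptions; the application does not define the packet format.

```python
import struct
from typing import Tuple

# Hypothetical header: one unsigned 32-bit terminal ID in network byte order.
HEADER = struct.Struct("!I")


def pack_packet(terminal_id: int, payload: bytes) -> bytes:
    """Prefix the audio payload with the sender's ID."""
    return HEADER.pack(terminal_id) + payload


def parse_packet(packet: bytes) -> Tuple[int, bytes]:
    """Split a packet into (terminal ID, audio payload)."""
    (terminal_id,) = HEADER.unpack_from(packet)
    return terminal_id, packet[HEADER.size:]


# Terminal B (assumed here to have been assigned ID 42) sends a packet.
packet = pack_packet(42, b"encoded-audio")
sender_id, audio = parse_packet(packet)
```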

In some embodiments, the identity information does not involve the information security of the terminal, but is only a field that represents the identity of the terminal.

Step S103: Detect, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo.

In some embodiments, the audio sender performs coherence detection on the audio code stream (first audio code stream) sent by itself and the received audio code stream (second audio code stream) to evaluate whether the audio sender receives an echo.

The specific process of coherence detection will be described in detail later, and will not be described here again to avoid repetition.

Step S104: When an echo is detected, determine the audio receiver generating the echo according to the identity information.

In some embodiments, as described above, the audio sender can identify the audio receiver that sent the second audio code stream by parsing the ID field in the second audio code stream.

Compared with the related art, the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and improves the user experience.

Please refer to FIG. 3, which is a flow chart of an audio detection method provided in an embodiment of the present application. This embodiment is a further description of the aforementioned embodiment, and specifically describes: when there are multiple audio receivers, a specific method for determining the audio receiver that generates an echo.

This embodiment is applied to the audio sender side, as shown in FIG. 3, and includes the following steps:

S201: Send a first audio code stream to multiple audio receivers.

S202: Acquire multiple second audio code streams returned by multiple audio receivers.

Steps S201 to S202 of this embodiment are similar to steps S101 to S102 of the aforementioned embodiment, and are not described again here to avoid repetition.

S203: Detect, according to the first audio code stream and the multiple second audio code streams, whether any of the multiple second audio code streams includes an echo, and when an echo is detected, identify the second audio code stream that includes the echo from among the multiple second audio code streams.

In some embodiments, an audio code stream containing an echo among multiple second audio code streams can be identified in the following manner: performing coherence detection on the first audio code stream and each second audio code stream respectively; determining whether an echo is generated based on the result of the coherence detection; and taking each second audio code stream for which the coherence detection result indicates an echo as an audio code stream containing the echo.

Specifically, assuming that there are 5 audio receivers returning the second audio code stream, after receiving the 5 second audio code streams, the audio sender performs coherence detection on the first audio code stream and each second audio code stream respectively, and obtains 5 coherence detection results, so that it can be known which audio receiver generates the echo according to the detection results. It is understandable that multiple audio receivers may also generate the echo.

In some embodiments, the auto power spectral density of the first audio code stream and of each second audio code stream, and the cross power spectral density of the first audio code stream and each second audio code stream, are obtained, and coherence detection is performed using the following formula (given here in the standard magnitude-squared coherence form):

$$C_{xy}(f) = \frac{|G_{xy}(f)|^2}{G_{xx}(f)\,G_{yy}(f)}$$

where $G_{xy}(f)$ is the cross power spectral density, $G_{xx}(f)$ is the auto power spectral density of the first audio code stream, $G_{yy}(f)$ is the auto power spectral density of the second audio code stream, and $C_{xy}(f)$ is the coherence.

Specifically, the coherence is greater than or equal to 0 and less than or equal to 1, where 0 indicates that the two audio code streams are completely uncorrelated and 1 indicates that they are completely coherent. When the coherence is greater than or equal to a constant K, it is determined that the audio receiver corresponding to that coherence has generated an echo. It can be understood that this embodiment does not specifically limit the value of the constant K, which can be set according to actual needs.
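The coherence test can be sketched as follows, assuming the coherence is the standard magnitude-squared form, i.e. the squared magnitude of the cross power spectral density divided by the product of the two auto power spectral densities. Several choices here are assumptions made only for illustration and are not stated in the application: the streams are taken to be already decoded into sample sequences, the spectral densities are estimated by averaging Hann-windowed segments of length 32, the per-frequency coherence is reduced to a single score by averaging over frequency bins, and K is set to 0.5.

```python
import cmath
import math
import random


def dft(segment):
    """Naive DFT returning the non-negative-frequency bins of a real segment."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len=32):
    """Magnitude-squared coherence per frequency bin, averaged over segments."""
    n_bins = seg_len // 2 + 1
    gxx = [0.0] * n_bins  # auto power spectral density of x (unnormalized)
    gyy = [0.0] * n_bins  # auto power spectral density of y (unnormalized)
    gxy = [0j] * n_bins   # cross power spectral density of x and y
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / seg_len) for i in range(seg_len)]
    usable = min(len(x), len(y))
    for start in range(0, usable - seg_len + 1, seg_len):
        fx = dft([w * v for w, v in zip(window, x[start:start + seg_len])])
        fy = dft([w * v for w, v in zip(window, y[start:start + seg_len])])
        for k in range(n_bins):
            gxx[k] += abs(fx[k]) ** 2
            gyy[k] += abs(fy[k]) ** 2
            gxy[k] += fx[k].conjugate() * fy[k]
    return [
        abs(gxy[k]) ** 2 / (gxx[k] * gyy[k]) if gxx[k] > 0 and gyy[k] > 0 else 0.0
        for k in range(n_bins)
    ]


def has_echo(sent, received, k_threshold=0.5):
    """Assumed decision rule: mean coherence over frequency is at least K."""
    c = coherence(sent, received)
    return sum(c) / len(c) >= k_threshold


# Synthetic data: an attenuated, delayed copy of the sent signal plus noise
# stands in for an echo; independent noise stands in for unrelated speech.
random.seed(0)
sent = [random.gauss(0, 1) for _ in range(1024)]
echoed = [
    0.5 * (sent[i - 2] if i >= 2 else 0.0) + 0.05 * random.gauss(0, 1)
    for i in range(len(sent))
]
unrelated = [random.gauss(0, 1) for _ in range(len(sent))]

echo_found = has_echo(sent, echoed)
false_alarm = has_echo(sent, unrelated)
```

Averaging over several segments matters in this sketch: with a single segment the estimate is identically 1 at every frequency, regardless of the signals, so it could not distinguish an echo from unrelated audio.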

S204: Determine an audio receiver generating the echo according to the identity information corresponding to the second audio code stream including the echo.
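Once a coherence result is available for each second audio code stream, mapping the result back to identity information can be sketched as below. The function name `find_echo_sources`, the terminal IDs, the score values, and the choice of K = 0.6 are all made up for illustration; the five entries mirror the five-receiver example above.

```python
def find_echo_sources(coherence_by_id, k_threshold):
    """Return the IDs whose coherence with the first stream is at least K."""
    return sorted(
        terminal_id
        for terminal_id, score in coherence_by_id.items()
        if score >= k_threshold
    )


# Hypothetical coherence results for five receivers, keyed by the ID carried
# in each second audio code stream.
scores = {"B": 0.91, "C": 0.07, "D": 0.12, "E": 0.64, "F": 0.03}
echo_sources = find_echo_sources(scores, k_threshold=0.6)
```

Because the result is a list rather than a single ID, the sketch also covers the case where more than one audio receiver generates an echo.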

Please refer to Figure 4, which is a flow chart of an audio detection method provided in an embodiment of the present application. This embodiment is a further improvement of the aforementioned embodiment. The main improvement is that, after determining the audio receiver that generates the echo according to the identity information, the method also includes: uploading the identity information to a signaling server, wherein the signaling server sends a reminder message to the audio receiver corresponding to the identity information after receiving the identity information. In this way, the audio receiver that generates the echo can be notified more quickly, so that the audio receiver can eliminate the echo as soon as possible, further improving the user experience.

This embodiment is applied to the audio sender side, as shown in FIG. 4, and includes the following steps:

Step S301: Acquire a first audio code stream sent by an audio sender to an audio receiver.

Step S302: Acquire a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.

Step S303: Detect, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo.

Step S304: When an echo is detected, determine the audio receiver generating the echo according to the identity information.

Steps S301 to S304 of this embodiment are similar to steps S101 to S104 of the above-mentioned embodiment, and are not described again here to avoid repetition.

Step S305: Upload the identity information to the signaling server.

In some embodiments, the audio sender uploads the identity information to the signaling server, and after receiving the identity information, the signaling server sends a reminder message to the audio receiver corresponding to the identity information. After receiving the reminder message, the audio receiver turns off the microphone to eliminate the echo.
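The reporting and reminder step can be sketched as follows. The `SignalingServer` class, its `register` and `report_echo` methods, the in-memory reminder lists, and the message text are hypothetical and serve only to illustrate the flow; the application does not describe the signaling protocol.

```python
class SignalingServer:
    """Toy signaling server that delivers reminders to registered terminals."""

    REMINDER = "Echo detected from your terminal; please mute or check your microphone."

    def __init__(self):
        # terminal ID -> reminder messages delivered to that terminal
        self.reminders = {}

    def register(self, terminal_id):
        self.reminders.setdefault(terminal_id, [])

    def report_echo(self, terminal_id):
        """Called by the audio sender with the identity information of the echo source."""
        if terminal_id not in self.reminders:
            raise KeyError("unknown terminal: %r" % (terminal_id,))
        self.reminders[terminal_id].append(self.REMINDER)


signaling = SignalingServer()
for terminal_id in ("B", "C", "D"):
    signaling.register(terminal_id)

# The audio sender has determined that terminal B generates the echo.
signaling.report_echo("B")
```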

For ease of understanding, the audio detection process of this embodiment is specifically described below in conjunction with FIG. 5:

Assume that the audio sender is terminal A, and the audio receivers are terminals B, C, and D. Terminals A, B, C, and D are connected to the conference system and forward their respective audio code streams through the media server. The conference system has a downlink echo detection module for terminal A, which is used to detect whether terminals B, C, and D generate echoes.

It can be understood that the downlink echo detection module of terminal A shown in FIG. 5 is a hardware device. In practical applications, the downlink echo detection module of terminal A may also be a functional module of terminal A.

(1) Terminal A sends a first audio code stream to the media server and the downlink echo detection module of terminal A, and the media server forwards the first audio code stream to terminals B, C, and D.

(2) Terminal B, terminal C and terminal D send the second audio code stream to the media server respectively, and the media server forwards the second audio code stream to terminal A and the downlink echo detection module of terminal A. It can be understood that terminal A and the downlink echo detection module of terminal A will receive three different second audio code streams, and the second audio code streams sent by terminal B, terminal C and terminal D are different from each other, and the second audio code stream sent by each terminal contains its own identity information.

(3) The downlink echo detection module of terminal A performs coherence detection on the first audio code stream and each second audio code stream respectively. When an echo is detected, the module analyzes the second audio code stream and detects that the echo is generated by terminal B.

(4) Report the identity information of terminal B to the signaling server.

(5) The signaling server sends a reminder message to terminal B, and terminal B turns off the microphone or switches to another microphone device to eliminate the echo.

In some embodiments, the conference system accessed by terminal A, terminal B, terminal C and terminal D has a conference background management system, whose main function is to monitor the operation of the entire conference system and summarize the data. The conference background management system obtains its data from terminal A, terminal B, terminal C, terminal D, the media server and the signaling server, each of which actively reports data to it. Therefore, when it is detected that terminal B generates an echo, the downlink echo detection module of terminal A also reports this information to the conference background management system.

Please refer to Figure 6, which is a functional module diagram of an audio detection device 600 provided in an embodiment of the present application, including: an audio sending module 10, the audio sending module 10 is used to send a first audio code stream to an audio receiving module 20; an audio receiving module 20, the audio receiving module 20 is used to receive the first audio code stream, and send a second audio code stream to the audio sending module 10, wherein the second audio code stream carries the identity information of the audio receiver; the audio sending module 10 is also used to receive the second audio code stream, and according to the first audio code stream and the second audio code stream, detect whether the second audio code stream includes an echo, and when an echo is detected, determine the audio receiver that generates the echo according to the identity information. The module referred to in the embodiment of the present application can be a program segment that completes a specific function, which is more suitable for describing the execution process of the software in the processor than a program. The one or more modules can be stored in a memory and configured to be executed by one or more processors.

In some embodiments, a conference media server 30 is also included. The conference media server 30 may be a functional module of the audio detection device 600, or may be a hardware device or functional module independent of the audio detection device 600. As shown in FIG. 6, the conference media server 30 is a hardware device independent of the audio detection device 600: the audio sending module 10 sends the first audio code stream to the conference media server 30, and the conference media server 30 forwards the first audio code stream to the audio receiving module 20; the audio receiving module 20 sends the second audio code stream to the conference media server 30, and the conference media server 30 forwards the second audio code stream to the audio sending module 10.

In some embodiments, there are multiple audio receiving modules 20 that return the second audio code stream, wherein one audio receiving module 20 corresponds to one second audio code stream; the audio sending module 10 is specifically used to: receive multiple second audio code streams returned by multiple audio receiving modules 20, detect whether an echo is generated according to the first audio code stream and the multiple second audio code streams, and when an echo is detected, identify the audio code stream containing the echo in the multiple second audio code streams; determine the audio receiver that generates the echo according to the identity information corresponding to the audio code stream containing the echo.

In some embodiments, the audio sending module 10 performs coherence detection on the first audio code stream and each second audio code stream respectively; and determines whether any second audio code stream includes an echo according to the result of the coherence detection.

In some embodiments, the audio detection device 600 also includes a signaling server 40, and the audio sending module 10 is used to upload the identity information to the signaling server 40; the signaling server 40 is used to send a reminder message to the audio receiver corresponding to the identity information after receiving the identity information.

In some embodiments, the signaling server 40 is a hardware device or functional module independent of the audio detection device 600.

Please refer to FIG. 7, which is a schematic diagram of the hardware structure of a computing device 100 provided in an embodiment of the present application. As shown in FIG. 7, the computing device 100 includes: a bus 102, a processor 104, a memory 106, and a communication interface 108. The processor 104, the memory 106, and the communication interface 108 communicate through the bus 102. The computing device 100 can be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 100.

The bus 102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is represented by only one line in FIG. 7, but this does not mean that there is only one bus or one type of bus. The bus 102 may include a path for transmitting information between various components of the computing device 100 (e.g., the memory 106, the processor 104, the communication interface 108).

The processor 104 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).

The memory 106 may include a volatile memory, such as a random access memory (RAM). The memory 106 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

The memory 106 stores executable program code, and the processor 104 executes the executable program code. The codes are used to implement the functions of the aforementioned audio sending module and audio receiving module respectively, thereby implementing the audio detection method. That is, the memory 106 stores instructions for executing the audio detection method.

The communication interface 108 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 100 and other devices or a communication network.

The embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.

As shown in FIG. 8, the computing device cluster includes at least one computing device 100. The memory 106 in one or more computing devices 100 in the computing device cluster may store the same instructions for executing the audio detection method.

In some possible implementations, the memory 106 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the audio detection method. In other words, the combination of one or more computing devices 100 may jointly execute instructions for executing the audio detection method.

It should be noted that the memories 106 in different computing devices 100 in the computing device cluster can store different instructions, each used to execute part of the functions of the audio detection device. That is, the instructions stored in the memories 106 of different computing devices 100 can implement the functions of one or more of the audio sending module and the audio receiving module.

In some possible implementations, one or more computing devices in the computing device cluster can be connected via a network. The network can be a wide area network, a local area network, or the like. FIG. 9 shows one possible implementation. As shown in FIG. 9, two computing devices 100A and 100B are connected via a network. Specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 106 in the computing device 100A stores instructions for executing the functions of the audio sending module, while the memory 106 in the computing device 100B stores instructions for executing the functions of the audio receiving module.

The connection arrangement of the computing device cluster shown in FIG. 9 takes into account that the audio detection method provided in this application requires a large amount of data storage and analysis; the functions implemented by the audio receiving module are therefore handed over to the computing device 100B for execution.

It should be understood that the functions of the computing device 100A shown in FIG. 9 may also be completed by multiple computing devices 100. Similarly, the functions of the computing device 100B may also be completed by multiple computing devices 100.
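The division of instructions between the computing devices 100A and 100B can be pictured with a small in-process sketch. It is a simplification under stated assumptions: the class `ComputingDevice`, the functions `audio_sending` and `audio_receiving`, the dictionary message format, and the identity string are hypothetical, and direct function calls stand in for the network connection.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingDevice:
    """One computing device 100 and the module instructions its memory holds."""
    name: str
    modules: dict = field(default_factory=dict)  # module name -> callable

    def run(self, module, *args):
        """Execute a stored module, or fail if this device does not hold it."""
        if module not in self.modules:
            raise LookupError(f"{self.name} stores no instructions for {module}")
        return self.modules[module](*args)


def audio_sending(samples):
    """Hypothetical sending-module step: package the first audio code stream."""
    return {"kind": "first_stream", "samples": list(samples)}


def audio_receiving(message, identity):
    """Hypothetical receiving-module step: return a second stream with identity.

    The returned samples simply loop back the input, a stand-in for whatever
    the receiver's microphone would actually capture.
    """
    return {
        "kind": "second_stream",
        "identity": identity,
        "samples": list(message["samples"]),
    }


# Device 100A holds only the sending module; device 100B only the receiving one.
device_a = ComputingDevice("100A", {"audio_sending": audio_sending})
device_b = ComputingDevice("100B", {"audio_receiving": audio_receiving})

first = device_a.run("audio_sending", [0.1, -0.2, 0.3])
second = device_b.run("audio_receiving", first, "terminal-A")
```

Because each device exposes only the modules it stores, the functions of 100A or 100B could be spread over further devices by registering the same callables on additional `ComputingDevice` instances.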

The embodiment of the present application also provides another computing device cluster. The connection relationship between the computing devices in this computing device cluster can be similar to the connection mode of the computing device cluster described with reference to FIG. 8 and FIG. 9. The difference is that the memory 106 in one or more computing devices 100 in the computing device cluster can store the same instructions for executing the above-mentioned audio detection method.

The present application also provides a computer program product including instructions. The computer program product may be software or a program product that includes instructions and that can be run on a computing device or stored in any available medium. When executed on at least one computing device, the computer program product causes the at least one computing device to perform the audio detection method.

This embodiment also provides a computer-readable storage medium, which can be any available medium that a computing device can store, or a data storage device, such as a data center, that contains one or more available media. The available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive), etc. The computer-readable storage medium includes instructions that instruct the computing device to execute the audio detection method.

In the several embodiments provided in the present application, the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely schematic: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components can be combined or integrated into another device, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or units, and can be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separated, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that enable a device (which can be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

The above description is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions within the technical scope disclosed in the present application should be included in the protection scope of the present application.

Claims

An audio detection method, characterized by comprising:

Obtaining a first audio code stream sent by an audio sender to an audio receiver;

Obtaining a second audio code stream returned by the audio receiver, wherein the second audio code stream carries the identity information of the audio receiver;

Detecting, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo;

When the echo is detected, determining, according to the identity information, the audio receiver generating the echo.
The audio detection method according to claim 1, wherein obtaining the second audio code stream returned by the audio receiver comprises:

Obtaining multiple second audio code streams returned by multiple audio receivers, wherein one audio receiver corresponds to one second audio code stream;

The detecting, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo comprises:

Detecting, according to the first audio code stream and the plurality of second audio code streams, whether any of the plurality of second audio code streams includes the echo, and, when it is detected that a second audio code stream includes the echo, identifying the second audio code stream including the echo among the plurality of second audio code streams;

The determining, according to the identity information, the audio receiver generating the echo comprises:

Determining the audio receiver generating the echo according to the identity information corresponding to the second audio code stream including the echo.
The audio detection method according to claim 2, wherein the detecting whether the plurality of second audio code streams include an echo based on the first audio code stream and the plurality of second audio code streams comprises:

Performing coherence detection on the first audio code stream and each of the second audio code streams respectively;

Determining whether the second audio code stream includes an echo according to the result of the coherence detection.
The audio detection method according to claim 1, characterized in that, after determining the audio receiver generating the echo according to the identity information, the method further comprises:

Uploading the identity information to a signaling server, so that after receiving the identity information, the signaling server sends reminder information to the audio receiver corresponding to the identity information.
An audio detection device, characterized by comprising:

An audio sending module, the audio sending module being used to send a first audio code stream to an audio receiving module;

an audio receiving module, the audio receiving module being used to receive the first audio code stream and send a second audio code stream to the audio sending module, wherein the second audio code stream carries identity information of an audio receiver;

The audio sending module is further configured to receive the second audio code stream, detect whether the second audio code stream includes an echo according to the first audio code stream and the second audio code stream, and, when the echo is detected, determine the audio receiver generating the echo according to the identity information.
The audio detection device according to claim 5, characterized in that there are multiple audio receiving modules that return the second audio code stream, wherein one audio receiving module corresponds to one second audio code stream;

The audio sending module is specifically used to: receive multiple second audio code streams returned by multiple audio receiving modules, detect whether there is a second audio code stream including the echo in the multiple second audio code streams according to the first audio code stream and the multiple second audio code streams, and when it is detected that there is a second audio code stream including the echo, identify the second audio code stream including the echo in the multiple second audio code streams; determine the audio receiver that generates the echo according to the identity information corresponding to the second audio code stream including the echo.
The audio detection device according to claim 5, characterized in that the audio sending module is specifically used to: perform coherence detection on the first audio code stream and each of the second audio code streams respectively; and determine whether the second audio code stream includes an echo based on the result of the coherence detection.
The audio detection device according to claim 5, characterized in that the audio detection device also includes a signaling server;

The audio sending module is used to upload the identity information to the signaling server;

The signaling server is used to send reminder information to the audio receiver corresponding to the identity information after receiving the identity information.
A computing device cluster, characterized in that it includes at least one computing device, each computing device includes a processor and a memory;

The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method according to claim 1.
A computer program product comprising instructions, characterized in that when the instructions are executed by a computing device cluster, the computing device cluster is caused to perform the method according to claim 1.
A computer-readable storage medium, characterized in that it includes computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster executes the method according to claim 1.