WO2024093264A1 - Audio detection method, apparatus and device - Google Patents

Audio detection method, apparatus and device

Info

Publication number
WO2024093264A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
code stream
audio code
echo
identity information
Prior art date
Application number
PCT/CN2023/101461
Other languages
French (fr)
Chinese (zh)
Inventor
陈亚克
Original Assignee
华为云计算技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024093264A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities

Definitions

  • the present application relates to the technical field of conference systems, and in particular to an audio detection method, device and equipment.
  • the generation and operation of a voice conference includes the following steps: the conference terminal establishes a connection with the conference media server; the conference media server receives the media stream sent by each conference terminal and forwards the media stream to each conference terminal; each conference terminal plays the media stream received from the conference media server, and the above process is repeated until the voice conference ends.
  • each conference terminal may have an echo.
  • conference staff locate the source of the echo by muting each conference terminal one by one, which takes a long time and affects the user experience.
  • the present application provides an audio detection method, apparatus and device, which can quickly identify the source of the echo and improve the user experience.
  • a first aspect of an embodiment of the present application provides an audio detection method, comprising: obtaining a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver; detecting whether the second audio code stream includes an echo based on the first audio code stream and the second audio code stream; and when the echo is detected, determining the audio receiver that generates the echo based on the identity information.
  • the embodiments of the present application have at least the following advantages:
  • Whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; since the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and also improves the user experience.
  • the obtaining the second audio code stream returned by the audio receiver includes: obtaining multiple second audio code streams returned by multiple audio receivers, wherein one audio receiver corresponds to one second audio code stream; the detecting whether the second audio code stream includes an echo according to the first audio code stream and the second audio code stream includes: detecting whether the multiple second audio code streams include the echo according to the first audio code stream and the multiple second audio code streams, and when it is detected that a second audio code stream includes the echo, identifying the second audio code stream including the echo from among the multiple second audio code streams; and the determining the audio receiver that generates the echo according to the identity information comprises: determining the audio receiver that generates the echo according to the identity information corresponding to the second audio code stream including the echo.
  • the identity information of the audio receiver that generates the echo can be determined.
  • detecting whether the plurality of second audio code streams include echoes according to the first audio code stream and the plurality of second audio code streams includes: performing coherence detection on the first audio code stream and each of the second audio code streams respectively; and determining whether any second audio code stream includes echoes according to a result of the coherence detection.
  • the audio code stream containing the echo can be accurately detected, thereby improving the reliability of the audio detection method.
  • the method further includes: uploading the identity information to a signaling server, so that after receiving the identity information, the signaling server sends a reminder message to the audio receiver corresponding to the identity information.
  • the audio receiver that generates the echo can be notified more quickly, so that the audio receiver can eliminate the echo as soon as possible, further improving the user experience.
  • a second aspect of an embodiment of the present application provides an audio detection device, comprising: an audio sending module, the audio sending module being used to send a first audio code stream to an audio receiving module; an audio receiving module, the audio receiving module being used to receive the first audio code stream and send a second audio code stream to the audio sending module, wherein the second audio code stream carries identity information of an audio receiver; the audio sending module is also used to receive the second audio code stream, detect whether the second audio code stream includes an echo based on the first audio code stream and the second audio code stream, and when the echo is detected, determine the audio receiver that generates the echo based on the identity information.
  • the audio sending module is specifically used to: perform coherence detection on the first audio code stream and each of the second audio code streams respectively; and determine whether any second audio code stream includes an echo based on a result of the coherence detection.
  • the audio sending module is specifically used to: receive multiple second audio code streams returned by multiple audio receiving modules, detect whether there is a second audio code stream including the echo among the multiple second audio code streams according to the first audio code stream and the multiple second audio code streams, and when it is detected that there is a second audio code stream including the echo, identify the second audio code stream including the echo among the multiple second audio code streams; determine the audio receiver that generates the echo according to the identity information corresponding to the second audio code stream including the echo.
  • the audio detection device also includes a signaling server; the audio sending module is used to upload the identity information to the signaling server; and the signaling server is used to send a reminder message to an audio receiver corresponding to the identity information after receiving the identity information.
  • the third aspect of an embodiment of the present application discloses a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the above-mentioned audio detection method.
  • a fourth aspect of an embodiment of the present application discloses a computer program product comprising instructions.
  • When the instructions are executed by a computing device cluster, the computing device cluster executes the above-mentioned audio detection method.
  • a fifth aspect of an embodiment of the present application discloses a computer-readable storage medium, including computer program instructions.
  • When the computer program instructions are executed by a computing device cluster, the computing device cluster executes the above-mentioned audio detection method.
  • FIG1 is a diagram of an echo generation scene in the related art.
  • FIG2 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a flow chart of an audio detection method provided in an embodiment of the present application.
  • FIG5 is a diagram of an interactive process of audio detection provided by an embodiment of the present application.
  • FIG6 is a schematic diagram of functional modules of an audio detection system provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of the hardware structure of a computing device provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application.
  • the terms “include”, “comprise” or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device.
  • an element specified by the phrase “comprising a ...” does not exclude the existence of other identical elements in the process, method, article or apparatus comprising the element.
  • “at least one” means one or more, and “more than one” means two or more than two.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
  • the terms “first”, “second”, “third”, “fourth”, etc. (if any) in the specification, claims and drawings of this application are used to distinguish similar objects, rather than to describe a specific order or sequence.
  • words such as “exemplary” or “for example” are used to indicate examples, illustrations or descriptions. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as “exemplary” or “for example” is intended to present related concepts in a specific way.
  • terminal A sends a normal audio code stream to the media server, and the media server forwards the normal audio code stream to terminals B, C, and D.
  • the dotted arrow indicates the transmission path of the echo audio code stream generated after terminal B leaks the echo.
  • After the echo audio code stream is transmitted to the media server, it is forwarded by the media server to terminals A, C, and D.
  • Terminals A, C, and D hear the echo, resulting in poor sound effects for terminals A, C, and D.
  • Figure 2 is a flow chart of the audio detection method provided in an embodiment of the present application. This embodiment is applied to the audio sender side, as shown in Figure 2, and includes the following steps:
  • Step S101 Acquire a first audio code stream sent by an audio sender to an audio receiver.
  • the audio sender is a terminal device that generates the first audio code stream, such as a microphone, a computer, a mobile communication device, etc.
  • the audio sender can send the first audio code stream to one audio receiver, or can send the first audio code stream to multiple audio receivers.
  • the audio sender sends the first audio code stream to the conference media server, and the conference media server forwards the first audio code stream to the audio receiver.
  • the main functions of the conference media server are: media routing allocation, media code stream forwarding, etc.
  • the conference media server forwards the audio code stream to each terminal.
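  • The forwarding role of the conference media server described above can be sketched as follows. This is a minimal illustration only: the class name `MediaServer`, its methods, and the in-memory inboxes are assumptions made for the example and do not come from the application.

```python
class MediaServer:
    """Toy conference media server (hypothetical): forwards each payload
    to every joined terminal except the one that sent it."""

    def __init__(self):
        # terminal name -> list of (sender, payload) tuples received so far
        self.inboxes = {}

    def join(self, terminal):
        # Register a terminal with an empty inbox.
        self.inboxes.setdefault(terminal, [])

    def forward(self, sender, payload):
        # Deliver the payload to all terminals other than the sender.
        for terminal, inbox in self.inboxes.items():
            if terminal != sender:
                inbox.append((sender, payload))


server = MediaServer()
for name in ("A", "B", "C", "D"):
    server.join(name)

# Terminal A sends one audio frame; B, C and D receive it, A does not.
server.forward("A", b"frame-1")
```

  • In this sketch, an echo leaked by terminal B would simply be another `forward("B", ...)` call, which is why terminals A, C and D all end up hearing it.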
  • Step S102 Acquire a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.
  • the audio receiver is a terminal device that generates the second audio code stream, such as a microphone, a computer, a mobile communication device, etc.
  • when a terminal joins a conference, the conference media server assigns a unique ID to it, whether it acts as an audio receiver or an audio sender. Assuming that the audio receiver is terminal B, when terminal B joins the conference, the conference media server assigns a unique ID to terminal B, which is the identity information of terminal B.
  • all subsequent audio stream packets sent by terminal B contain the ID.
  • After receiving the audio code stream sent by terminal B, a terminal can identify that the audio code stream comes from terminal B by parsing the ID field in the audio code stream packet.
  • the identity information does not involve the information security of the terminal, but is only a field that represents the identity of the terminal.
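  • One way to picture an audio code stream packet that carries an ID field is sketched below. The application does not specify a packet layout, so the 4-byte big-endian header, the `HEADER` constant, and the helper names `pack_packet` and `parse_packet` are all assumptions chosen for illustration.

```python
import struct

# Hypothetical layout: a 4-byte unsigned big-endian terminal ID,
# followed by the encoded audio payload.
HEADER = struct.Struct("!I")


def pack_packet(terminal_id, audio_payload):
    """Prefix the audio payload with the sender's ID."""
    return HEADER.pack(terminal_id) + audio_payload


def parse_packet(packet):
    """Split a packet into (terminal_id, audio_payload)."""
    (terminal_id,) = HEADER.unpack_from(packet)
    return terminal_id, packet[HEADER.size:]


# Terminal B (assumed here to have been assigned ID 7) sends a packet.
packet = pack_packet(7, b"\x01\x02")
sender_id, payload = parse_packet(packet)
```

  • Because the ID travels with every packet, the receiving side needs no extra lookup to attribute a stream to its terminal.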
  • Step S103 According to the first audio code stream and the second audio code stream, detecting whether the second audio code stream includes an echo.
  • the audio sender performs coherence detection on the audio code stream (first audio code stream) sent by itself and the received audio code stream (second audio code stream) to evaluate whether the audio sender receives an echo.
  • Step S104 when an echo is detected, determining the audio receiver generating the echo according to the identity information.
  • the audio sender can identify the identity of the audio receiver that sends the second audio stream by parsing the ID field in the second audio stream.
  • the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and improves the user experience.
  • FIG. 3 is a flow chart of an audio detection method provided in an embodiment of the present application.
  • This embodiment is a further description of the aforementioned embodiment, and specifically describes: when there are multiple audio receivers, a specific method for determining the audio receiver that generates an echo.
  • This embodiment is applied to the audio sender side, as shown in FIG3 , and includes the following steps:
  • S201 Send a first audio code stream to multiple audio receivers.
  • S202 Acquire multiple second audio code streams returned by multiple audio receivers.
  • Steps S201 to S202 of this embodiment are similar to steps S101 to S102 of the aforementioned embodiment, and are not described again here to avoid repetition.
  • S203 Detecting, according to the first audio code stream and the plurality of second audio code streams, whether any of the plurality of second audio code streams includes an echo, and identifying, when detecting that any of the second audio code streams includes an echo, the second audio code stream including the echo in the plurality of second audio code streams.
  • an audio code stream containing an echo among multiple second audio code streams can be identified in the following manner: performing coherence detection on the first audio code stream and each second audio code stream respectively; determining whether an echo is generated based on the result of the coherence detection; and taking each second audio code stream for which the coherence detection result indicates an echo as an audio code stream containing the echo.
  • For example, assuming there are five audio receivers, after receiving the five second audio code streams, the audio sender performs coherence detection on the first audio code stream and each second audio code stream respectively, and obtains five coherence detection results, from which it can determine which audio receiver generates the echo. It is understandable that more than one audio receiver may generate an echo.
  • the auto power spectral density of the first audio code stream and of each second audio code stream, and the cross power spectral density of the first audio code stream and each second audio code stream, are obtained, and coherence detection is performed by the following formula:
  • $$C_{xy}(f) = \frac{|G_{xy}(f)|^{2}}{G_{xx}(f)\,G_{yy}(f)}$$
  • where Gxy(f) is the cross power spectral density of the first audio code stream and the second audio code stream,
  • Gxx(f) is the auto power spectral density of the first audio code stream,
  • Gyy(f) is the auto power spectral density of the second audio code stream, and
  • Cxy(f) is the coherence.
  • the coherence is greater than or equal to 0 and less than or equal to 1, where 0 means completely incoherent and 1 means completely coherent.
  • When the coherence is greater than or equal to the constant K, it is determined that the audio receiver corresponding to the coherence has generated an echo. It can be understood that this embodiment does not specifically limit the size of the constant K, and it can be set according to actual needs.
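  • The coherence test can be sketched in plain Python as below, using the standard magnitude-squared coherence Cxy(f) = |Gxy(f)|^2 / (Gxx(f) Gyy(f)) estimated by averaging over windowed segments. Several choices here are assumptions not stated in the application: the streams are treated as already decoded sample sequences, the spectra are averaged over non-overlapping Hann-windowed segments of 64 samples, the per-frequency coherence is reduced to its mean before comparison with K, and K = 0.5 is an arbitrary illustrative value.

```python
import cmath
import math
import random


def dft_half(segment):
    """Naive DFT of a real segment, returning bins 0..N/2."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len=64):
    """Estimate Cxy(f) = |Gxy(f)|^2 / (Gxx(f) * Gyy(f)) per frequency bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len) for t in range(seg_len)]
    bins = seg_len // 2 + 1
    gxx = [0.0] * bins  # accumulated auto power spectrum of x
    gyy = [0.0] * bins  # accumulated auto power spectrum of y
    gxy = [0j] * bins   # accumulated cross power spectrum of x and y
    for start in range(0, min(len(x), len(y)) - seg_len + 1, seg_len):
        fx = dft_half([x[start + t] * window[t] for t in range(seg_len)])
        fy = dft_half([y[start + t] * window[t] for t in range(seg_len)])
        for k in range(bins):
            gxx[k] += abs(fx[k]) ** 2
            gyy[k] += abs(fy[k]) ** 2
            gxy[k] += fx[k].conjugate() * fy[k]
    # The common 1/segments scaling cancels in the ratio, so sums suffice.
    return [
        abs(gxy[k]) ** 2 / (gxx[k] * gyy[k]) if gxx[k] * gyy[k] > 0 else 0.0
        for k in range(bins)
    ]


def has_echo(first, second, k_threshold=0.5):
    """Assumed decision rule: mean coherence over all bins >= K."""
    c = coherence(first, second)
    return sum(c) / len(c) >= k_threshold


# Synthetic demo signals (noise stands in for decoded audio).
rng = random.Random(0)
first = [rng.gauss(0, 1) for _ in range(1024)]
echoed = [0.0, 0.0] + [0.6 * s for s in first[:-2]]  # delayed, attenuated copy
unrelated = [rng.gauss(0, 1) for _ in range(1024)]   # independent signal
```

  • Averaging over several segments matters: with a single segment the estimate is trivially 1 at every frequency, so it could not separate an echo from unrelated audio.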
  • S204 Determine an audio receiver generating the echo according to the identity information corresponding to the second audio code stream including the echo.
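  • Steps S203 and S204 together amount to thresholding one coherence result per receiver and reading off the matching identity information. The sketch below assumes the coherence results have already been reduced to one number per receiver ID; the function name `find_echo_sources`, the dictionary shape, and the numeric values are made up for illustration.

```python
def find_echo_sources(coherence_by_id, k_threshold):
    """Return the receiver IDs whose coherence with the first audio
    code stream is greater than or equal to the constant K."""
    return sorted(
        receiver_id
        for receiver_id, value in coherence_by_id.items()
        if value >= k_threshold
    )


# Illustrative values only: one coherence result per second audio code stream,
# keyed by the ID carried in that stream.
scores = {"B": 0.91, "C": 0.08, "D": 0.12}
sources = find_echo_sources(scores, k_threshold=0.5)
```

  • The result is a list rather than a single ID because, as noted above, several audio receivers may generate an echo at the same time.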
  • the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and improves the user experience.
  • FIG 4 is a flow chart of an audio detection method provided in an embodiment of the present application.
  • This embodiment is a further improvement of the aforementioned embodiment, and the main improvement is that: in this embodiment, after determining the audio receiver that generates the echo according to the identity information, it also includes: uploading the identity information to a server, wherein the server sends a reminder message to the audio receiver corresponding to the identity information after receiving the identity information. In this way, the audio receiver that generates the echo can be notified more quickly, so that the audio receiver can eliminate the echo as soon as possible, further improving the user experience.
  • This embodiment is applied to the audio sender side, as shown in FIG4 , and includes the following steps:
  • Step S301 Acquire a first audio code stream sent by an audio sender to an audio receiver.
  • Step S302 Acquire a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.
  • Step S303 According to the first audio code stream and the second audio code stream, detecting whether the second audio code stream includes an echo.
  • Step S304 when an echo is detected, determining the audio receiver generating the echo according to the identity information.
  • Steps S301 to S304 of this embodiment are similar to steps S101 to S104 of the above-mentioned embodiment, and are not described again here to avoid repetition.
  • Step S305 Upload the identity information to the signaling server.
  • the audio sender uploads the identity information to the signaling server, and after receiving the identity information, the signaling server sends a reminder message to the audio receiver corresponding to the identity information. After receiving the reminder message, the audio receiver turns off the microphone to eliminate the echo.
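  • The reminder path of step S305 can be sketched with two small in-memory classes. Everything named here (`Terminal`, `SignalingServer`, `register`, `report_echo`, the reminder text) is hypothetical; the application does not define a signaling protocol, and a real system would exchange messages over a network rather than call methods directly.

```python
class Terminal:
    """Hypothetical conference terminal that reacts to reminder messages."""

    def __init__(self, terminal_id):
        self.terminal_id = terminal_id
        self.mic_on = True
        self.reminders = []

    def on_reminder(self, message):
        # Record the reminder and turn off the microphone to eliminate the echo.
        self.reminders.append(message)
        self.mic_on = False


class SignalingServer:
    """Hypothetical signaling server that maps identity information to terminals."""

    def __init__(self):
        self.terminals = {}

    def register(self, terminal):
        self.terminals[terminal.terminal_id] = terminal

    def report_echo(self, terminal_id):
        """Called by the audio sender with the uploaded identity information."""
        terminal = self.terminals.get(terminal_id)
        if terminal is None:
            return False  # unknown ID: nothing to remind
        terminal.on_reminder("Echo detected from your device")
        return True


signaling = SignalingServer()
terminal_b = Terminal("B")
terminal_c = Terminal("C")
signaling.register(terminal_b)
signaling.register(terminal_c)

# The audio sender has determined that terminal B generates the echo.
delivered = signaling.report_echo("B")
```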
  • Assume that the audio sender is terminal A, and the audio receivers are terminals B, C, and D. Terminals A, B, C, and D are connected to the conference system and forward their respective audio code streams through the media server.
  • the conference system has a terminal A downlink echo detection module, which is used to detect whether terminals B, C and D generate echoes.
  • the downlink echo detection module of terminal A shown in FIG. 5 is a hardware device. In practical applications, the downlink echo detection module of terminal A may also be a functional module of terminal A.
  • Terminal A sends a first audio code stream to the media server and the downlink echo detection module of terminal A, and the media server forwards the first audio code stream to terminals B, C, and D.
  • Terminal B, terminal C and terminal D send the second audio code stream to the media server respectively, and the media server forwards the second audio code stream to terminal A and the downlink echo detection module of terminal A. It can be understood that terminal A and the downlink echo detection module of terminal A will receive three different second audio code streams, and the second audio code streams sent by terminal B, terminal C and terminal D are different from each other, and the second audio code stream sent by each terminal contains its own identity information.
  • the downlink echo detection module of terminal A performs coherence detection on the first audio code stream and each second audio code stream respectively. When an echo is detected, the module analyzes the second audio code stream and detects that the echo is generated by terminal B.
  • the signaling server sends a reminder message to terminal B, and terminal B turns off the microphone or switches to another microphone device to eliminate the echo.
  • the conference system accessed by the above-mentioned terminal A, terminal B, terminal C and terminal D has a conference background management system, and the main function of the conference background management system is to monitor the operation of the entire conference system and summarize the data.
  • the conference background management system obtains its data from terminal A, terminal B, terminal C, terminal D, the media server and the signaling server, all of which actively report data to it. Therefore, when it is detected that terminal B generates an echo, the downlink echo detection module of terminal A also reports this information to the conference background management system.
  • the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined based on the sent first audio code stream and the received second audio code stream; because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, when an echo is detected, the source of the echo can be quickly identified based on the identity information, which effectively improves the efficiency of echo detection and improves the user experience.
  • FIG. 6 is a functional module diagram of an audio detection device 600 provided in an embodiment of the present application, including: an audio sending module 10, the audio sending module 10 is used to send a first audio code stream to an audio receiving module 20; an audio receiving module 20, the audio receiving module 20 is used to receive the first audio code stream, and send a second audio code stream to the audio sending module 10, wherein the second audio code stream carries the identity information of the audio receiver; the audio sending module 10 is also used to receive the second audio code stream, and according to the first audio code stream and the second audio code stream, detect whether the second audio code stream includes an echo, and when an echo is detected, determine the audio receiver that generates the echo according to the identity information.
  • the module referred to in the embodiment of the present application can be a program segment that completes a specific function, which is more suitable for describing the execution process of the software in the processor than a program.
  • the one or more modules can be stored in a memory and configured to be executed by one or more processors.
  • a conference media server 30 is included, which may be a functional module of the audio detection device 600; the conference media server 30 can also be a hardware device or functional module independent of the audio detection device 600.
  • When the conference media server 30 is a hardware device independent of the audio detection device 600, the audio sending module 10 sends the first audio code stream to the conference media server 30, and the conference media server 30 forwards the first audio code stream to the audio receiving module 20; the audio receiving module 20 sends the second audio code stream to the conference media server 30, and the conference media server 30 forwards the second audio code stream to the audio sending module 10.
  • the audio sending module 10 is specifically used to: receive multiple second audio code streams returned by multiple audio receiving modules 20, detect whether an echo is generated according to the first audio code stream and the multiple second audio code streams, and when an echo is detected, identify the audio code stream containing the echo in the multiple second audio code streams; determine the audio receiver that generates the echo according to the identity information corresponding to the audio code stream containing the echo.
  • the audio sending module 10 performs coherence detection on the first audio code stream and each second audio code stream respectively; and determines whether any second audio code stream includes an echo according to the result of the coherence detection.
  • the audio detection device 600 also includes a signaling server 40, and the audio sending module 10 is used to upload the identity information to the signaling server 40; the signaling server 40 is used to send a reminder message to the audio receiver corresponding to the identity information after receiving the identity information.
  • the signaling server 40 is a hardware device or functional module independent of the audio detection device 600 .
  • FIG. 7 is a schematic diagram of the hardware structure of a computing device 100 provided in an embodiment of the present application.
  • the computing device 100 includes: a bus 102, a processor 104, a memory 106, and a communication interface 108.
  • the processor 104, the memory 106, and the communication interface 108 communicate through the bus 102.
  • the computing device 100 can be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 100.
  • the bus 102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • For ease of representation, the bus is shown as only one line in FIG. 7, but this does not mean that there is only one bus or one type of bus.
  • the bus 102 may include a path for transmitting information between various components of the computing device 100 (e.g., the memory 106, the processor 104, the communication interface 108).
  • the processor 104 may include any one or more processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).
  • the memory 106 may include a volatile memory, such as a random access memory (RAM).
  • the memory 106 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 106 stores executable program code, and the processor 104 executes the executable program code to implement the functions of the aforementioned audio sending module and audio receiving module respectively, thereby implementing the audio detection method. That is, the memory 106 stores instructions for executing the audio detection method.
  • the communication interface 108 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 100 and other devices or a communication network.
  • the embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device can be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smart phone.
  • the computing device cluster includes at least one computing device 100.
  • the memory 106 in one or more computing devices 100 in the computing device cluster may store the same instructions for executing the audio detection method.
  • the memory 106 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the audio detection method.
  • the combination of one or more computing devices 100 may jointly execute instructions for executing the audio detection method.
  • the memory 106 in different computing devices 100 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the audio detection module. That is, the instructions stored in the memory 106 in different computing devices 100 can implement the functions of one or more modules in the audio sending module and the audio receiving module.
  • one or more computing devices in the computing device cluster can be connected via a network.
  • the network can be a wide area network or a local area network, etc.
  • FIG. 9 shows a possible implementation. As shown in FIG. 9, two computing devices 100A and 100B are connected via a network. Specifically, each computing device connects to the network via its communication interface.
  • the memory 106 in the computing device 100A stores instructions for executing the functions of the audio transmission module. Meanwhile, the memory 106 in the computing device 100B stores instructions for executing the functions of the audio receiving module.
  • the connection mode of the computing device cluster shown in FIG. 9 takes into account that the audio detection method provided in this application requires a large amount of data storage and analysis, so the functions implemented by the audio receiving module are handed over to the computing device 100B for execution.
  • the functions of the computing device 100A shown in FIG. 9 may also be completed by multiple computing devices 100.
  • the functions of the computing device 100B may also be completed by multiple computing devices 100.
  • the embodiment of the present application also provides another computing device cluster.
  • the connection relationship between the computing devices in the computing device cluster can be similar to the connection mode of the computing device cluster described in Figures 8 and 9.
  • the difference is that the memory 106 in one or more computing devices 100 in the computing device cluster can store the same instructions for executing the above-mentioned audio detection method.
  • the memory 106 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the audio detection method.
  • the combination of one or more computing devices 100 may jointly execute instructions for executing the audio detection method.
  • the present application also provides a computer program product including instructions.
  • the computer program product may be a computer program product including instructions that can be run on a computing device or stored in any available medium.
  • the software or program product when executed on at least one computing device, causes the at least one computing device to perform the audio detection method.
  • This embodiment also provides a computer-readable storage medium, which can be any available medium that can be stored by a computing device or a data storage device such as a data center containing one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state hard disk), etc.
  • the computer-readable storage medium includes instructions that instruct the computing device to execute the audio detection method.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are schematic.
  • the division of the modules or units is a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another device, or some features can be ignored or not executed.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the unit described as a separate component may or may not be physically separated, and the component shown as a unit may be one physical unit or multiple physical units, that is, it may be located in one place or distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to enable a device (which can be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The present application provides an audio detection method, apparatus and device. The method comprises: acquiring a first audio code stream sent by an audio sender to an audio receiver; acquiring a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver; according to the first audio code stream and the second audio code stream, detecting whether the second audio code stream comprises an echo; and when the echo is detected, according to the identity information, determining the audio receiver generating the echo. The present application can quickly identify the source of an echo and improves the user experience.

Description

Audio detection method, apparatus and device
This application claims priority to Chinese patent application No. 202211352488.5, filed with the China Patent Office on October 31, 2022 and entitled "Audio detection method, apparatus and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of conference systems, and in particular to an audio detection method, apparatus and device.
Background
In the related art, setting up and running a voice conference includes the following steps: each conference terminal establishes a connection with the conference media server; the conference media server receives the media streams sent by the conference terminals and forwards them to the conference terminals; each conference terminal plays the media streams received from the conference media server; and the above process is repeated until the voice conference ends.
During a voice conference, an echo may occur at any conference terminal. Currently, conference staff locate the source of the echo by muting the conference terminals one by one, which takes a long time and affects the user experience.
Summary
In view of this, the present application provides an audio detection method, apparatus and device that can quickly identify the source of an echo and improve the user experience.
A first aspect of the embodiments of the present application provides an audio detection method, comprising: acquiring a first audio code stream sent by an audio sender to an audio receiver; acquiring a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver; detecting, based on the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo; and when the echo is detected, determining, based on the identity information, the audio receiver that generates the echo.
Compared with the related art, the embodiments of the present application have at least the following advantages:
Whether an echo is generated can be accurately determined from the sent first audio code stream and the received second audio code stream. Because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, the source of the echo can be quickly identified from the identity information when an echo is detected, which improves the efficiency of echo detection and the user experience.
In some possible implementations, acquiring the second audio code stream returned by the audio receiver includes: acquiring multiple second audio code streams returned by multiple audio receivers, wherein each audio receiver corresponds to one second audio code stream. Detecting, based on the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo includes: detecting, based on the first audio code stream and the multiple second audio code streams, whether any of the multiple second audio code streams includes the echo, and when it is detected that a second audio code stream includes the echo, identifying the second audio code stream that includes the echo among the multiple second audio code streams. Determining, based on the identity information, the audio receiver that generates the echo includes: determining the audio receiver that generates the echo based on the identity information corresponding to the second audio code stream that includes the echo.
By adopting this technical solution, when there are multiple audio receivers, the identity information of the audio receiver that generates the echo can be determined.
In some possible implementations, detecting, based on the first audio code stream and the multiple second audio code streams, whether any of the multiple second audio code streams includes an echo includes: performing coherence detection between the first audio code stream and each of the second audio code streams; and determining, based on the results of the coherence detection, whether any second audio code stream includes an echo.
By adopting this technical solution, an audio code stream containing an echo can be accurately detected, which improves the reliability of the audio detection method.
In some possible implementations, after the audio receiver that generates the echo is determined based on the identity information, the method further includes: uploading the identity information to a signaling server, so that after receiving the identity information, the signaling server sends a reminder message to the audio receiver corresponding to the identity information.
By adopting this technical solution, the audio receiver that generates the echo can be notified more quickly, so that it can eliminate the echo as soon as possible, further improving the user experience.
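The notification step above can be sketched as follows. This is a minimal in-memory illustration only: the `SignalingServer` class, its method names, and the reminder text are hypothetical, since the application does not define a signaling protocol or message format.

```python
from typing import Dict, List


class SignalingServer:
    """Hypothetical in-memory stand-in for the signaling server."""

    def __init__(self) -> None:
        # Maps each registered terminal ID to the reminders delivered to it.
        self.inbox: Dict[int, List[str]] = {}

    def register(self, terminal_id: int) -> None:
        """Record a terminal that has joined the conference."""
        self.inbox[terminal_id] = []

    def report_echo_source(self, terminal_id: int) -> None:
        """Called by the audio sender; delivers a reminder to the matching receiver."""
        if terminal_id not in self.inbox:
            raise KeyError(f"unknown terminal ID: {terminal_id}")
        self.inbox[terminal_id].append("Your terminal appears to be producing an echo.")


server = SignalingServer()
for tid in (1, 2, 3):
    server.register(tid)

# The audio sender has determined that the terminal with ID 2 generates the echo.
server.report_echo_source(2)
```

Only the terminal whose identity information was uploaded receives the reminder; the other participants are not interrupted.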
A second aspect of the embodiments of the present application provides an audio detection apparatus, comprising: an audio sending module configured to send a first audio code stream to an audio receiving module; and the audio receiving module, configured to receive the first audio code stream and send a second audio code stream to the audio sending module, wherein the second audio code stream carries identity information of an audio receiver. The audio sending module is further configured to receive the second audio code stream, detect, based on the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo, and when the echo is detected, determine, based on the identity information, the audio receiver that generates the echo.
In some possible implementations, the audio sending module is specifically configured to: perform coherence detection between the first audio code stream and each of the second audio code streams; and determine, based on the results of the coherence detection, whether any second audio code stream includes an echo.
In some possible implementations, there are multiple audio receiving modules that return second audio code streams, wherein each audio receiving module corresponds to one second audio code stream. The audio sending module is specifically configured to: receive the multiple second audio code streams returned by the multiple audio receiving modules; detect, based on the first audio code stream and the multiple second audio code streams, whether any of the multiple second audio code streams includes the echo; when it is detected that a second audio code stream includes the echo, identify the second audio code stream that includes the echo among the multiple second audio code streams; and determine the audio receiver that generates the echo based on the identity information corresponding to the second audio code stream that includes the echo.
In some possible implementations, the audio detection apparatus further includes a signaling server; the audio sending module is configured to upload the identity information to the signaling server; and the signaling server is configured to send, after receiving the identity information, a reminder message to the audio receiver corresponding to the identity information.
A third aspect of the embodiments of the present application discloses a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the above audio detection method.
A fourth aspect of the embodiments of the present application discloses a computer program product comprising instructions. When the instructions are run by a computing device cluster, the computing device cluster performs the above audio detection method.
A fifth aspect of the embodiments of the present application discloses a computer-readable storage medium, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the above audio detection method.
It can be understood that the computing device cluster of the third aspect, the computer program product of the fourth aspect, and the computer-readable storage medium of the fifth aspect all correspond to the method of the first aspect. For their beneficial effects, refer to the beneficial effects of the corresponding method described above; they are not repeated here.
Brief Description of the Drawings
FIG. 1 is a diagram of an echo generation scenario in the related art;
FIG. 2 is a schematic flow chart of an audio detection method provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of an audio detection method provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart of an audio detection method provided in an embodiment of the present application;
FIG. 5 is a diagram of the interaction process of audio detection provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of the functional modules of an audio detection system provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of the hardware structure of a computing device provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of a computing device cluster provided in an embodiment of the present application.
Detailed Description
To make the above purposes, features and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific implementations. It should be noted that the implementations of the present application and the features in the implementations can be combined with each other where they do not conflict.
Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described implementations are only some of the implementations of the present application, not all of them.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this application belongs. The terms used in the specification of this application are only for the purpose of describing specific implementations and are not intended to limit this application.
It should be further noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B can mean: A exists alone, A and B both exist, or B exists alone, where A and B can be singular or plural. The terms "first", "second", "third", "fourth", etc. (if any) in the specification, claims and drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be interpreted as more preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
FIG. 1 shows an echo generation scenario in the related art. In the related art, terminal A sends a normal audio code stream to the media server, and the media server forwards it to terminals B, C, and D. The dotted arrows indicate the transmission path of the echo audio code stream generated when terminal B leaks an echo. After the echo audio code stream reaches the media server, the media server forwards it to terminals A, C, and D, which then hear the echo, degrading the sound quality at terminals A, C, and D.
However, the related art provides no method for detecting the source of the echo. The source has to be located by muting the terminals one by one, which is time-consuming and affects the user experience.
Please refer to FIG. 2, which is a flow chart of the audio detection method provided in an embodiment of the present application. This embodiment is applied on the audio sender side and, as shown in FIG. 2, includes the following steps:
Step S101: Acquire a first audio code stream sent by an audio sender to an audio receiver.
In some embodiments, the audio sender is the terminal device that generates the first audio code stream, such as a microphone, a computer, or a mobile communication device. The audio sender can send the first audio code stream to one audio receiver or to multiple audio receivers.
In some embodiments, the audio sender sends the first audio code stream to the conference media server, and the conference media server forwards the first audio code stream to the audio receiver.
In some embodiments, the main functions of the conference media server include media route allocation and media code stream forwarding. In this embodiment, the conference media server forwards the audio code stream to each terminal.
Step S102: Acquire a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.
In some embodiments, the audio receiver is the terminal device that generates the second audio code stream, such as a microphone, a computer, or a mobile communication device.
In some embodiments, whenever a terminal joins the conference, whether as an audio receiver or an audio sender, the conference media server assigns it an ID that is unique within the conference. For example, if the audio receiver is terminal B, then when terminal B joins the conference, the conference media server assigns terminal B an ID that is unique within the conference, and this ID is the identity information of terminal B.
In this way, all audio code stream packets subsequently sent by terminal B contain this ID. After receiving an audio code stream sent by terminal B, other terminals can identify that it comes from terminal B by parsing the ID field in the audio code stream packet.
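As a rough illustration of how a conference-unique ID could travel with each audio packet, the sketch below prefixes the payload with a 4-byte ID and parses it back on receipt. The packet layout, the field width, and the function names `pack_audio_packet` and `parse_audio_packet` are assumptions made for this example; the application does not specify a wire format.

```python
import struct
from typing import Tuple

# Hypothetical layout (not specified by the application):
# a 4-byte big-endian terminal ID followed by the encoded audio payload.
HEADER = struct.Struct(">I")


def pack_audio_packet(terminal_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the conference-unique terminal ID."""
    return HEADER.pack(terminal_id) + payload


def parse_audio_packet(packet: bytes) -> Tuple[int, bytes]:
    """Return (terminal_id, payload) parsed from a received packet."""
    if len(packet) < HEADER.size:
        raise ValueError("packet too short to contain an ID field")
    (terminal_id,) = HEADER.unpack_from(packet)
    return terminal_id, packet[HEADER.size:]


# Assume the media server assigned ID 2 to terminal B when it joined.
packet = pack_audio_packet(2, b"\x01\x02\x03")
sender_id, audio = parse_audio_packet(packet)
```

Any terminal that receives `packet` can recover `sender_id` without inspecting the audio payload itself.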
In some embodiments, the identity information does not involve the information security of the terminal; it is only a field that represents the identity of the terminal.
Step S103: Detect, based on the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo.
In some embodiments, the audio sender performs coherence detection between the audio code stream it sent (the first audio code stream) and the audio code stream it received (the second audio code stream) to evaluate whether it has received an echo.
The specific process of coherence detection is described in detail later and is not repeated here.
Step S104: When an echo is detected, determine, based on the identity information, the audio receiver that generates the echo.
In some embodiments, as described above, the audio sender can identify the audio receiver that sent the second audio code stream by parsing the ID field in the second audio code stream.
Compared with the related art, the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined from the sent first audio code stream and the received second audio code stream; and because the second audio code stream returned by the audio receiver contains the identity information of the audio receiver, the source of the echo can be quickly identified from the identity information when an echo is detected, which improves the efficiency of echo detection and the user experience.
Please refer to FIG. 3, which is a flow chart of an audio detection method provided in an embodiment of the present application. This embodiment further elaborates on the previous embodiment and describes how to determine the audio receiver that generates an echo when there are multiple audio receivers.
This embodiment is applied on the audio sender side and, as shown in FIG. 3, includes the following steps:
S201: Send a first audio code stream to multiple audio receivers.
S202: Acquire multiple second audio code streams returned by the multiple audio receivers.
Steps S201 to S202 of this embodiment are similar to steps S101 to S102 of the previous embodiment and are not repeated here.
S203: Detect, based on the first audio code stream and the multiple second audio code streams, whether any of the multiple second audio code streams includes an echo, and when it is detected that a second audio code stream includes an echo, identify the second audio code stream that includes the echo among the multiple second audio code streams.
In some embodiments, the audio code stream containing an echo can be identified among the multiple second audio code streams as follows: perform coherence detection between the first audio code stream and each second audio code stream; determine, based on the result of each coherence detection, whether an echo is generated; and take each second audio code stream whose coherence detection result indicates an echo as an audio code stream containing the echo.
Specifically, suppose 5 audio receivers return second audio code streams. After receiving the 5 second audio code streams, the audio sender performs coherence detection between the first audio code stream and each second audio code stream, obtaining 5 coherence detection results, from which it can tell which audio receiver generated the echo. It can be understood that more than one audio receiver may have generated an echo.
In some embodiments, the auto-power spectral densities of the first audio code stream and of each second audio code stream, and the cross-power spectral density of the first audio code stream and each second audio code stream, are obtained, and coherence detection is performed by the following formula:
$$C_{xy}(f) = \frac{\left|G_{xy}(f)\right|^{2}}{G_{xx}(f)\,G_{yy}(f)}$$
where $G_{xy}(f)$ is the cross-power spectral density, $G_{xx}(f)$ is the auto-power spectral density of the first audio code stream, $G_{yy}(f)$ is the auto-power spectral density of the second audio code stream, and $C_{xy}(f)$ is the coherence.
Specifically, the coherence lies between 0 and 1 inclusive, where 0 indicates that the two streams are completely incoherent and 1 indicates that they are completely coherent. When the coherence is greater than or equal to a constant K, it is determined that the audio receiver corresponding to that coherence has generated an echo. It can be understood that this embodiment does not limit the value of the constant K, which can be set according to actual requirements.
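For illustration only, the coherence test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the spectral densities are estimated by averaging over non-overlapping segments (a single segment would give a coherence of 1 at every frequency), the per-frequency coherence is reduced to one number by taking its mean, and the names `dft`, `coherence`, `has_echo`, `seg_len` and the default `k_threshold=0.8` are all hypothetical.

```python
import cmath
import math
import random


def dft(segment):
    """Naive discrete Fourier transform (non-negative frequencies only)."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len=64):
    """Estimate C_xy(f) = |G_xy(f)|^2 / (G_xx(f) * G_yy(f)) per frequency bin.

    G_xx, G_yy and G_xy are accumulated over non-overlapping segments
    (an assumption of this sketch; the source does not fix the estimator).
    """
    n_seg = min(len(x), len(y)) // seg_len
    bins = seg_len // 2 + 1
    gxx = [0.0] * bins
    gyy = [0.0] * bins
    gxy = [0j] * bins
    for s in range(n_seg):
        fx = dft(x[s * seg_len:(s + 1) * seg_len])
        fy = dft(y[s * seg_len:(s + 1) * seg_len])
        for k in range(bins):
            gxx[k] += abs(fx[k]) ** 2
            gyy[k] += abs(fy[k]) ** 2
            gxy[k] += fx[k].conjugate() * fy[k]
    return [
        abs(gxy[k]) ** 2 / (gxx[k] * gyy[k]) if gxx[k] > 0 and gyy[k] > 0 else 0.0
        for k in range(bins)
    ]


def has_echo(x, y, k_threshold=0.8):
    """Flag an echo when the mean coherence reaches the constant K (assumed rule)."""
    c = coherence(x, y)
    return sum(c) / len(c) >= k_threshold


# Synthetic demonstration: an attenuated, delayed copy versus unrelated noise.
random.seed(0)
first = [random.gauss(0.0, 1.0) for _ in range(1024)]
echoed = [0.0, 0.0] + [0.5 * v for v in first[:-2]]
unrelated = [random.gauss(0.0, 1.0) for _ in range(1024)]
print(has_echo(first, echoed), has_echo(first, unrelated))
```

In practice the decoded audio samples would take the place of the synthetic signals, and the segment length and the constant K would be tuned to the expected echo delay and noise level.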
S204: Determine the audio receiver generating the echo according to the identity information corresponding to the second audio code stream that includes the echo.
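Step S204 can be pictured with the following short Python sketch. It assumes that each returned stream has already been reduced to a single coherence value and that the identity information is available as a string; the `SecondStream` layout, the function name `find_echo_sources`, the identifiers and the coherence values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SecondStream:
    """A returned stream with the identity information it carries (hypothetical layout)."""
    receiver_id: str
    coherence: float  # coherence against the first audio code stream, in [0, 1]


def find_echo_sources(streams, k_threshold=0.8):
    """Return the identities of all receivers whose coherence reaches K."""
    return [s.receiver_id for s in streams if s.coherence >= k_threshold]


# Illustrative values only; they are not measurements from the source.
streams = [
    SecondStream("terminal-B", 0.93),
    SecondStream("terminal-C", 0.07),
    SecondStream("terminal-D", 0.12),
]
print(find_echo_sources(streams))
```

Returning a list rather than a single identity reflects the remark above that several audio receivers may generate an echo at the same time.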
Compared with the related art, the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined from the sent first audio code stream and the received second audio code stream; and because the second audio code stream returned by the audio receiver carries the identity information of the audio receiver, the source of a detected echo can be quickly identified from the identity information, which improves the efficiency of echo detection and the user experience.
Please refer to FIG. 4, which is a flowchart of an audio detection method provided in an embodiment of the present application. This embodiment is a further improvement of the foregoing embodiment. The main improvement is that, in this embodiment, after the audio receiver generating the echo is determined according to the identity information, the method further includes: uploading the identity information to a server, where the server, after receiving the identity information, sends reminder information to the audio receiver corresponding to the identity information. In this way, the audio receiver generating the echo can be notified sooner and can eliminate the echo as soon as possible, further improving the user experience.
This embodiment is applied on the audio sender side and, as shown in FIG. 4, includes the following steps:
Step S301: Obtain a first audio code stream sent by an audio sender to an audio receiver.
Step S302: Obtain a second audio code stream returned by the audio receiver, where the second audio code stream carries the identity information of the audio receiver.
Step S303: Detect, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo.
Step S304: When an echo is detected, determine the audio receiver generating the echo according to the identity information.
Steps S301 to S304 of this embodiment are similar to steps S101 to S104 of the foregoing embodiment and, to avoid repetition, are not described again here.
Step S305: Upload the identity information to a signaling server.
In some embodiments, the audio sender uploads the identity information to the signaling server. After receiving the identity information, the signaling server sends reminder information to the audio receiver corresponding to the identity information. After receiving the reminder information, the audio receiver turns off its microphone to eliminate the echo.
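The reminder flow of step S305 can be sketched as below. This is an in-memory stand-in under stated assumptions, not a real signaling protocol: the classes `Receiver` and `SignalingServer`, their methods and the reminder text are hypothetical names introduced only for illustration.

```python
class Receiver:
    """Hypothetical audio receiver that mutes its microphone when reminded."""

    def __init__(self, identity):
        self.identity = identity
        self.microphone_on = True
        self.last_reminder = None

    def on_reminder(self, message):
        # The embodiment describes turning off the microphone to remove the echo.
        self.last_reminder = message
        self.microphone_on = False


class SignalingServer:
    """Hypothetical in-memory stand-in for the signaling server."""

    def __init__(self):
        self._receivers = {}

    def register(self, receiver):
        self._receivers[receiver.identity] = receiver

    def upload_identity(self, identity):
        """Called by the audio sender; reminds the receiver matching the identity."""
        receiver = self._receivers.get(identity)
        if receiver is None:
            return False  # unknown identity: nothing to remind
        receiver.on_reminder("Your audio is producing an echo.")
        return True


server = SignalingServer()
terminal_b, terminal_c = Receiver("terminal-B"), Receiver("terminal-C")
server.register(terminal_b)
server.register(terminal_c)
server.upload_identity("terminal-B")
print(terminal_b.microphone_on, terminal_c.microphone_on)
```

Only the receiver whose identity was uploaded is reminded; the other receivers keep their microphones on.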
For ease of understanding, the audio detection process of this embodiment is described in detail below with reference to FIG. 5:
Assume that the audio sender is terminal A and the audio receivers are terminal B, terminal C and terminal D. Terminals A, B, C and D are connected to a conference system and forward their respective audio code streams to one another through a media server. The conference system has a terminal A downlink echo detection module, which is used to detect whether terminal B, terminal C or terminal D generates an echo.
It can be understood that the terminal A downlink echo detection module shown in FIG. 4 is a hardware device. In practical applications, the terminal A downlink echo detection module may also be a functional module of terminal A.
(1) Terminal A sends the first audio code stream to the media server and to the terminal A downlink echo detection module, and the media server forwards the first audio code stream to terminal B, terminal C and terminal D.
(2) Terminal B, terminal C and terminal D each send a second audio code stream to the media server, and the media server forwards the second audio code streams to terminal A and to the terminal A downlink echo detection module. It can be understood that terminal A and the terminal A downlink echo detection module receive three different second audio code streams: the second audio code streams sent by terminal B, terminal C and terminal D differ from one another, and each carries the identity information of the terminal that sent it.
(3) The terminal A downlink echo detection module performs coherence detection between the first audio code stream and each second audio code stream. When an echo is detected, it parses the corresponding second audio code stream and determines that terminal B generated the echo.
(4) The identity information of terminal B is reported to the signaling server.
(5) The signaling server sends a reminder message to terminal B, and terminal B turns off its microphone or switches to another microphone device, eliminating the echo.
In some embodiments, the conference system to which terminal A, terminal B, terminal C and terminal D are connected has a conference back-end management system, whose main function is to monitor the operation of the entire conference system and to aggregate data. Its data comes from terminal A, terminal B, terminal C, terminal D, the media server and the signaling server, all of which actively report data to it. Therefore, when it is detected that terminal B has generated an echo, the terminal A downlink echo detection module also reports this information to the conference back-end management system.
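As a rough illustration of the aggregation role described above, the following Python sketch collects echo reports and summarizes them per terminal. The class `ConferenceBackend`, its methods and the report format are assumptions made for this example; the source does not specify what data is reported or how it is summarized.

```python
from collections import Counter


class ConferenceBackend:
    """Hypothetical aggregator for echo reports pushed by detection modules."""

    def __init__(self):
        self._reports = []  # (reporter, echo_source) pairs

    def report_echo(self, reporter, echo_source):
        """Record that `reporter` detected an echo generated by `echo_source`."""
        self._reports.append((reporter, echo_source))

    def summary(self):
        """Return (echo_source, count) pairs, most frequently reported first."""
        return Counter(source for _, source in self._reports).most_common()


backend = ConferenceBackend()
backend.report_echo("terminal-A-detector", "terminal-B")
backend.report_echo("terminal-A-detector", "terminal-B")
backend.report_echo("terminal-A-detector", "terminal-C")
print(backend.summary())
```

A summary of this kind would let an operator see which terminals repeatedly generate echoes across a conference.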
Compared with the related art, the embodiments of the present application have at least the following advantages: whether an echo is generated can be accurately determined from the sent first audio code stream and the received second audio code stream; and because the second audio code stream returned by the audio receiver carries the identity information of the audio receiver, the source of a detected echo can be quickly identified from the identity information, which improves the efficiency of echo detection and the user experience.
Please refer to FIG. 6, which is a schematic diagram of the functional modules of an audio detection device 600 provided in an embodiment of the present application. The device includes: an audio sending module 10, configured to send a first audio code stream to an audio receiving module 20; and the audio receiving module 20, configured to receive the first audio code stream and to send a second audio code stream to the audio sending module 10, where the second audio code stream carries the identity information of the audio receiver. The audio sending module 10 is further configured to receive the second audio code stream, detect, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo, and, when an echo is detected, determine the audio receiver generating the echo according to the identity information. A module in the embodiments of the present application may be a program segment that performs a specific function and is better suited than a program to describing how software executes on a processor. The one or more modules may be stored in a memory and configured to be executed by one or more processors.
In some embodiments, a conference media server 30 is included. The conference media server 30 may be a functional module of the audio detection device 600, or may be a hardware device or functional module independent of the audio detection device 600. As shown in FIG. 6, the conference media server 30 is a hardware device independent of the audio detection device 600: the audio sending module 10 sends the first audio code stream to the conference media server 30, which forwards it to the audio receiving module 20; the audio receiving module 20 sends the second audio code stream to the conference media server 30, which forwards it to the audio sending module 10.
In some embodiments, there are multiple audio receiving modules 20 that return second audio code streams, with one audio receiving module 20 corresponding to one second audio code stream. The audio sending module 10 is specifically configured to: receive the multiple second audio code streams returned by the multiple audio receiving modules 20; detect, according to the first audio code stream and the multiple second audio code streams, whether an echo is generated; when an echo is detected, identify the audio code stream containing the echo among the multiple second audio code streams; and determine the audio receiver generating the echo according to the identity information corresponding to the audio code stream containing the echo.
In some embodiments, the audio sending module 10 performs coherence detection between the first audio code stream and each second audio code stream, and determines from the coherence detection results whether any second audio code stream includes an echo.
In some embodiments, the audio detection device 600 further includes a signaling server 40. The audio sending module 10 is configured to upload the identity information to the signaling server 40, and the signaling server 40 is configured to send, after receiving the identity information, reminder information to the audio receiver corresponding to the identity information.
In some embodiments, the signaling server 40 is a hardware device or functional module independent of the audio detection device 600.
Please refer to FIG. 7, which is a schematic diagram of the hardware structure of a computing device 100 provided in an embodiment of the present application. As shown in FIG. 7, the computing device 100 includes a bus 102, a processor 104, a memory 106 and a communication interface 108. The processor 104, the memory 106 and the communication interface 108 communicate with one another through the bus 102. The computing device 100 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 100.
The bus 102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses and so on. For ease of illustration, only one line is drawn in FIG. 7, but this does not mean that there is only one bus or one type of bus. The bus 102 may include a path for transferring information between the components of the computing device 100 (for example, the memory 106, the processor 104 and the communication interface 108).
The processor 104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP) and other processors.
The memory 106 may include a volatile memory, such as a random access memory (RAM). The memory 106 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
The memory 106 stores executable program code, and the processor 104 executes the executable program code to implement the functions of the aforementioned audio sending module and audio receiving module respectively, thereby implementing the audio detection method. That is, the memory 106 stores instructions for executing the audio detection method.
The communication interface 108 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 100 and other devices or a communication network.
An embodiment of the present application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer or a smartphone.
As shown in FIG. 8, the computing device cluster includes at least one computing device 100. The memories 106 in one or more computing devices 100 in the computing device cluster may store the same instructions for executing the audio detection method.
In some possible implementations, the memories 106 of one or more computing devices 100 in the computing device cluster may instead each store part of the instructions for executing the audio detection method. In other words, a combination of one or more computing devices 100 may jointly execute the instructions for executing the audio detection method.
It should be noted that the memories 106 in different computing devices 100 in the computing device cluster may store different instructions, each used to perform part of the functions of the audio detection module. That is, the instructions stored in the memories 106 of different computing devices 100 may implement the functions of one or more of the audio sending module and the audio receiving module.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 9 shows one possible implementation. As shown in FIG. 9, two computing devices 100A and 100B are connected through a network; specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 106 in the computing device 100A stores instructions for performing the functions of the audio sending module, and the memory 106 in the computing device 100B stores instructions for performing the functions of the audio receiving module.
The connection manner of the computing device cluster shown in FIG. 9 may take into account that the audio detection method provided in the present application needs to store and analyze a large amount of data; the functions implemented by the audio receiving module are therefore assigned to the computing device 100B.
It should be understood that the functions of the computing device 100A shown in FIG. 9 may also be performed by multiple computing devices 100. Likewise, the functions of the computing device 100B may also be performed by multiple computing devices 100.
An embodiment of the present application further provides another computing device cluster. For the connection relationship between the computing devices in this cluster, reference may be made to the connection manner of the computing device cluster described with reference to FIG. 8 and FIG. 9. The difference is that the memories 106 in one or more computing devices 100 in this computing device cluster may store the same instructions for executing the above audio detection method.
In some possible implementations, the memories 106 of one or more computing devices 100 in the computing device cluster may instead each store part of the instructions for executing the audio detection method. In other words, a combination of one or more computing devices 100 may jointly execute the instructions for executing the audio detection method.
An embodiment of the present application further provides a computer program product containing instructions. The computer program product may be a software or program product that contains instructions and can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to execute the audio detection method.
This embodiment further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can store data on, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to execute the audio detection method.
In the several embodiments provided in the present application, the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution within the technical scope disclosed in the present application shall fall within the protection scope of the present application.

Claims (11)

  1. An audio detection method, characterized by comprising:
    obtaining a first audio code stream sent by an audio sender to an audio receiver;
    obtaining a second audio code stream returned by the audio receiver, wherein the second audio code stream carries identity information of the audio receiver;
    detecting, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo; and
    when the echo is detected, determining, according to the identity information, the audio receiver generating the echo.
  2. The audio detection method according to claim 1, characterized in that the obtaining a second audio code stream returned by the audio receiver comprises:
    obtaining a plurality of the second audio code streams returned by a plurality of the audio receivers, wherein one audio receiver corresponds to one second audio code stream;
    the detecting, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo comprises:
    detecting, according to the first audio code stream and the plurality of second audio code streams, whether any of the plurality of second audio code streams includes the echo, and, when it is detected that a second audio code stream includes the echo, identifying the second audio code stream that includes the echo among the plurality of second audio code streams; and
    the determining, according to the identity information, the audio receiver generating the echo comprises:
    determining the audio receiver generating the echo according to the identity information corresponding to the second audio code stream that includes the echo.
  3. The audio detection method according to claim 2, characterized in that the detecting, according to the first audio code stream and the plurality of second audio code streams, whether the plurality of second audio code streams include an echo comprises:
    performing coherence detection between the first audio code stream and each of the second audio code streams; and
    determining, according to the result of the coherence detection, whether any second audio code stream includes an echo.
  4. The audio detection method according to claim 1, characterized in that, after the audio receiver generating the echo is determined according to the identity information, the method further comprises:
    uploading the identity information to a signaling server, so that the signaling server, after receiving the identity information, sends reminder information to the audio receiver corresponding to the identity information.
  5. An audio detection device, characterized by comprising:
    an audio sending module, configured to send a first audio code stream to an audio receiving module; and
    the audio receiving module, configured to receive the first audio code stream and to send a second audio code stream to the audio sending module, wherein the second audio code stream carries identity information of an audio receiver;
    wherein the audio sending module is further configured to receive the second audio code stream, detect, according to the first audio code stream and the second audio code stream, whether the second audio code stream includes an echo, and, when the echo is detected, determine, according to the identity information, the audio receiver generating the echo.
  6. The audio detection device according to claim 5, characterized in that there are a plurality of the audio receiving modules returning the second audio code stream, wherein one audio receiving module corresponds to one second audio code stream;
    the audio sending module is specifically configured to: receive the plurality of second audio code streams returned by the plurality of audio receiving modules; detect, according to the first audio code stream and the plurality of second audio code streams, whether any of the plurality of second audio code streams includes the echo; when it is detected that a second audio code stream includes the echo, identify the second audio code stream that includes the echo among the plurality of second audio code streams; and determine the audio receiver generating the echo according to the identity information corresponding to the second audio code stream that includes the echo.
  7. The audio detection device according to claim 5, characterized in that the audio sending module is specifically configured to: perform coherence detection between the first audio code stream and each of the second audio code streams; and determine, according to the result of the coherence detection, whether any second audio code stream includes an echo.
  8. The audio detection device according to claim 5, characterized in that the audio detection device further comprises a signaling server;
    the audio sending module is configured to upload the identity information to the signaling server; and
    the signaling server is configured to send, after receiving the identity information, reminder information to the audio receiver corresponding to the identity information.
  9. A computing device cluster, characterized in that it includes at least one computing device, each computing device including a processor and a memory;
    The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to claim 1.
  10. A computer program product containing instructions, characterized in that, when the instructions are run by a computing device cluster, the computing device cluster is caused to perform the method according to claim 1.
  11. A computer-readable storage medium, characterized in that it includes computer program instructions, and when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to claim 1.
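
For illustration only, the per-stream detection and identity lookup recited in the apparatus claims can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes the audio code streams have already been decoded to lists of PCM samples, it uses the peak of a normalized cross-correlation over a range of delays as a simple stand-in for the coherence detection, and the names `peak_normalized_correlation`, `find_echo_sources`, `max_lag`, `threshold`, and the identities `user-a` and `user-b` are hypothetical.

```python
import math
import random


def peak_normalized_correlation(reference, returned, max_lag):
    """Largest |normalized cross-correlation| between the two signals,
    searched over delays 0..max_lag of `returned` relative to `reference`."""
    best = 0.0
    for lag in range(max_lag + 1):
        n = min(len(reference), len(returned) - lag)
        if n <= 0:
            break
        ref = reference[:n]
        ret = returned[lag:lag + n]
        dot = sum(a * b for a, b in zip(ref, ret))
        energy = math.sqrt(sum(a * a for a in ref) * sum(b * b for b in ret))
        if energy > 0.0:
            best = max(best, abs(dot) / energy)
    return best


def find_echo_sources(first_stream, second_streams, max_lag=50, threshold=0.5):
    """Return the identities whose returned stream appears to contain a
    delayed copy of the first stream (hypothetical threshold rule)."""
    return [
        identity
        for identity, samples in second_streams.items()
        if peak_normalized_correlation(first_stream, samples, max_lag) >= threshold
    ]


# Synthetic data standing in for decoded audio (assumption: noise as "speech").
rng = random.Random(0)
first = [rng.uniform(-1.0, 1.0) for _ in range(400)]

# Hypothetical receiver "user-a": returns only its own, unrelated audio.
clean = [rng.uniform(-1.0, 1.0) for _ in range(400)]

# Hypothetical receiver "user-b": its own audio plus the sender's audio,
# attenuated and delayed by 20 samples, i.e. an echo.
own = [rng.uniform(-1.0, 1.0) for _ in range(400)]
echoed = [0.3 * own[i] + (0.8 * first[i - 20] if i >= 20 else 0.0)
          for i in range(400)]

# Each second stream is keyed by the identity information of its receiver.
echo_sources = find_echo_sources(first, {"user-a": clean, "user-b": echoed})
print(echo_sources)
```

Keying each returned stream by its receiver's identity information means the detection result directly names the receiver to be reminded, which is the information the sending side would upload to the signaling server.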
PCT/CN2023/101461 2022-10-31 2023-06-20 Audio detection method, apparatus and device WO2024093264A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211352488.5 2022-10-31
CN202211352488.5A CN117998011A (en) 2022-10-31 2022-10-31 Audio detection method, device and equipment

Publications (1)

Publication Number Publication Date
WO2024093264A1 (en) 2024-05-10

Family

ID=90896454

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101461 WO2024093264A1 (en) 2022-10-31 2023-06-20 Audio detection method, apparatus and device

Country Status (2)

Country Link
CN (1) CN117998011A (en)
WO (1) WO2024093264A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130003962A1 (en) * 2011-06-30 2013-01-03 Nokia Corporation Methods, apparatuses and computer program products for providing active echo-cancellation for systems combining voice communication and synchronous audio content
US11122160B1 (en) * 2020-07-08 2021-09-14 Lenovo (Singapore) Pte. Ltd. Detecting and correcting audio echo
CN113516991A (en) * 2020-08-18 2021-10-19 腾讯科技(深圳)有限公司 Audio playing and equipment management method and device based on group session
CN113571093A (en) * 2021-07-27 2021-10-29 陈元有 Audio data processing method

Also Published As

Publication number Publication date
CN117998011A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
JP2019207710A (en) Interaction method between smart voice apparatuses, device, apparatus, and storage medium
US10284616B2 (en) Adjusting a media stream in a video communication system based on participant count
CN111654519B (en) Method and device for transmitting data processing requests
WO2022017007A1 (en) Audio data processing method, server, and storage medium
WO2023103419A1 (en) Message queue-based method and apparatus for sending 5g messages in batches, and electronic device
WO2021238404A1 (en) Data card switching method and apparatus, and terminal, and storage medium
US20190087054A1 (en) Information processing apparatus, conference system, and control method of information processing apparatus
CN108573393A (en) Comment information processing method, device, server and storage medium
CN105704338A (en) Audio mixing method, audio mixing equipment and system
CN109889922B (en) Method, device, equipment and storage medium for forwarding streaming media data
WO2018223936A1 (en) Method and system for interaction between ap and modem, and storage medium
US10848530B2 (en) Conference system and method for interworking between audio conference and web conference
CN109787900B (en) Transmission method, device, equipment and machine readable medium
WO2024093264A1 (en) Audio detection method, apparatus and device
WO2023165320A1 (en) Playback parameter configuration method and device
CN102025730B (en) Data transmission device based on HTTP protocol and implementation method thereof
CN112073215B (en) Method for realizing application and service controller
WO2016095510A1 (en) Path selection method, device and system
CN114500546B (en) Communication message sending method, device and equipment based on load balancing cluster
CN111835801A (en) File downloading method, device, server, edge device, terminal and medium
CN113507394B (en) Network performance detection method and device, electronic equipment and storage medium
CN113873301A (en) Video stream acquisition method and device, server and storage medium
US20190089753A1 (en) Conference system and method for handling conference connection thereof
CN115022411B (en) WebRTC-based media server scheduling system, method and device
CN114900710B (en) Multi-device synchronization method and device, electronic device and storage medium