CN115171728A - Illegal audio stream identification method and device, computer equipment and storage medium

Info

Publication number: CN115171728A
Application number: CN202210907944.1A
Authority: CN
Inventors: 钟正阳; 李一文; 刘名运; 周渝雄
Original assignee: Hunan Yingke Mutual Entertainment Network Information Co ltd
Current assignee: Hunan Yingke Mutual Entertainment Network Information Co ltd
Priority date: 2022-07-29
Filing date: 2022-07-29
Publication date: 2022-10-11

Abstract

The application relates to a method and a device for identifying an illegal audio stream, computer equipment and a storage medium. The method comprises: obtaining multiple audio streams of a live broadcast room; monitoring and calculating each audio stream with the audio detector and filter bound to it to obtain an audio code rate; subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate; comparing the incremental code rate with the audio code rate at the previous moment to generate an audio trend value; comparing the audio trend value with a preset audio trend threshold to judge whether the audio stream has sound; giving a label to each audio stream that has sound; and identifying the multiple audio streams by listening to the sound content and checking the labels, thereby capturing the illegal audio streams. The method speeds up the capture of illegal audio streams, and because the illegal content is identified by directly listening to the sound content of the multiple audio streams, the accuracy of violation identification is improved.

Description

Illegal audio stream identification method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of mobile internet technologies, and in particular, to a method and an apparatus for identifying an illegal audio stream, a computer device, and a storage medium.

Background

With the arrival of the mobile internet, modes of network interaction have become more and more diversified, and live audio and video interaction is one of them. In a single live broadcast room, multiple microphones may interact simultaneously, and regulatory requirements demand that the related content be supervised and audited. The general approach is to build an auditing system that audits the pushed audio and video. Because several people often talk at the same time in one room during a mic-linking (co-hosting) session, accurately capturing the offender during auditing becomes a problem. Normally, the live streams are generated by mobile phone software, and the auditing platform adopts a Browser/Server (B/S) architecture. Conventional processing means include the following. First, each stream in the live broadcast room is played individually on the auditing platform. Individual playback captures the offender accurately, but when many people are linked, capturing a violation takes too long; if the streams are played simultaneously instead, the offender cannot be identified. Second, all streams are played simultaneously and the mobile phone software reports who is currently speaking. The report is sent in real time over socket communication, the background listens on the socket to stay synchronized, and the speaker is marked explicitly. This scheme requires collaborative development of the mobile phone software, is difficult to implement, and suffers from synchronization delay, which can cause violation judgment errors. Third, each audio stream is converted from speech to text, the speaker of each converted text is marked, and the texts are audited one by one. This incurs a speech-to-text cost, and some illegal sounds are transcribed poorly, so a large number of illegal sounds are missed.

As described above, each of these solutions has problems in identifying the offending user and cannot satisfy the supervision requirements for efficiency and accuracy.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, and a storage medium for identifying an illegal audio stream, which can quickly and accurately identify the illegal audio stream.

A method of identifying an offending audio stream, the method comprising:

acquiring multiple audio streams of a live broadcast room, stripping and instantiating each audio stream to obtain an audio context, and creating a media element audio source node and an analysis node according to the audio context;

setting a buffer average data frame parameter according to a media element audio source node and an analysis node, and detecting an audio stream according to the buffer average data frame parameter to obtain a data frame binary stream carrying audio data;

monitoring and calculating the data frame binary stream according to an audio detector and a filter bound on each path of audio stream to obtain an audio code rate, subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; the incremental code rate comprises a positive incremental code rate and a negative incremental code rate;

and comparing the audio trend value with a preset audio trend threshold, judging whether the audio stream has sound according to the comparison result, giving a label to the audio stream with the sound, identifying the multi-channel audio stream by listening to the sound content and the label, and capturing the illegal audio stream.

In one embodiment, the monitoring and calculating the data frame binary stream according to the audio detector and the filter bound to each path of audio stream to obtain the audio code rate includes:

creating a floating point array carrying the audio track data according to the length of the binary stream of the data frame;

and monitoring and calculating the audio track data in the floating point array according to the audio detector and the filter bound on each path of audio stream to obtain the audio code rate.

In one embodiment, the audio code rate is obtained by monitoring and calculating the audio track data in the floating point array according to the audio detector and the filter bound to each audio stream, and the method comprises the following steps:

monitoring and denoising the audio track data in the floating point array according to the audio detector bound on each path of audio stream to obtain denoised audio track data;

sampling the denoised audio track data according to a filter to obtain a plurality of groups of audio track data samples and index filter values;

and multiplying and accumulating the squares of the multiple groups of audio track data samples and the index filter value to obtain the audio code rate.
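The multiply-and-accumulate step above can be illustrated with a minimal sketch. It assumes that each audio track sample is paired with one index filter value (0 or 1) and that the audio code rate is the sum of the squared samples weighted by those values; the function name `audio_code_rate` and the sample data are hypothetical and do not come from the application.

```python
from typing import Sequence


def audio_code_rate(samples: Sequence[float], index_filter: Sequence[int]) -> float:
    """Multiply each squared sample by its index filter value and accumulate.

    Assumption: samples and index_filter are aligned one-to-one.
    """
    if len(samples) != len(index_filter):
        raise ValueError("samples and index_filter must have the same length")
    return sum(s * s * f for s, f in zip(samples, index_filter))


# Hypothetical data: four samples, the filter keeps only the middle two.
rate = audio_code_rate([0.5, 1.0, -2.0, 3.0], [0, 1, 1, 0])
print(rate)  # 5.0
```

Under this reading, samples whose index filter value is 0 contribute nothing, so the result reflects only the part of the signal that the filter lets through.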

In one embodiment, sampling the noise-reduced audio track data according to a filter to obtain a plurality of sets of audio track data samples and an index filter value, includes:

sampling the noise-reduced audio track data according to the sampling frequency set by the filter to obtain a plurality of groups of audio track data samples;

quantizing the length of the sampling number to obtain a sample index;

and comparing the product of the sample index and the sampling frequency with an initial threshold value set by a filter to obtain an index filter value.

In one embodiment, the initial threshold set by the filter comprises a first initial threshold and a second initial threshold;

comparing the product of the sample index and the sampling frequency with an initial threshold value set by a filter to obtain an index filter value, comprising:

comparing the product of the sample index and the sampling frequency with an initial threshold set by a filter, and setting the index filter value as 0 when the product is less than a first initial threshold; when the product is less than a second initial threshold, the index filter value is set to 1.

In one embodiment, comparing the incremental code rate with the audio code rate at the previous time to obtain an audio trend value includes:

setting an initial audio trend value to be zero;

comparing the positive incremental code rate with the positive multiple of the audio code rate at the last moment, wherein when the positive incremental code rate is greater than the positive multiple of the audio code rate at the last moment, the audio trend value is the initial audio trend value plus one, otherwise, the audio trend value keeps the initial audio trend value unchanged;

and comparing the negative incremental code rate with the negative multiple of the audio code rate at the last moment, wherein when the negative incremental code rate is smaller than the negative multiple of the audio code rate at the last moment, the audio trend value is the initial audio trend value minus one, otherwise, the audio trend value keeps the initial audio trend value unchanged.

In one embodiment, comparing the audio trend value with a preset audio trend threshold value, and determining whether the audio stream has sound according to the comparison result includes:

and comparing the audio trend value with a preset audio trend threshold, judging that the audio stream has sound when the audio trend value is greater than the audio trend threshold, and otherwise, judging that the audio stream has no sound.

An apparatus for identifying an offending audio stream, the apparatus comprising:

the audio stream acquisition module is used for acquiring multiple audio streams of a live broadcast room, stripping and instantiating each audio stream to obtain an audio context, and creating a media element audio source node and an analysis node according to the audio context;

the audio stream detection module is used for setting a buffering average data frame parameter according to the media element audio source node and the analysis node, and detecting the audio stream according to the buffering average data frame parameter to obtain a data frame binary stream carrying audio data;

the audio code rate calculation module is used for monitoring and calculating the data frame binary stream according to the audio detector and the filter bound on each path of audio stream to obtain the audio code rate; subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; the incremental code rate comprises a positive incremental code rate and a negative incremental code rate;

and the violation identification module is used for comparing the audio trend value with a preset audio trend threshold, judging whether the audio stream has sound according to the comparison result, giving a label to the audio stream with the sound, identifying the multi-channel audio stream by listening to the sound content and the label, and capturing the violation audio stream.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

setting a buffering average data frame parameter according to a media element audio source node and an analysis node, and detecting an audio stream according to the buffering average data frame parameter to obtain a data frame binary stream carrying audio data;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

With the above method, device, computer equipment and storage medium for identifying an illegal audio stream, when multiple audio streams are played simultaneously, each audio stream is monitored and calculated by the audio detector and filter bound to it to obtain the audio code rate of that stream. The audio code rate at the previous moment is subtracted from the audio code rate at the current moment to obtain the incremental code rate, the incremental code rate is compared with the audio code rate at the previous moment to generate an audio trend value, and the audio trend value is compared with the preset audio trend threshold to judge whether the audio stream has sound. Audio streams that have sound are given a label. When an auditor hears illegal content, the auditor checks which audio stream is currently labeled as having sound and thereby learns which user the illegal audio stream belongs to. This effectively speeds up the capture and identification of the offending user, and because the auditor identifies the illegal content by directly listening to the sound content of the multiple audio streams, the accuracy of violation identification is higher.

Drawings

FIG. 1 is a flow diagram illustrating a method for identifying an offending audio stream in one embodiment;

FIG. 2 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and do not limit it.

In one embodiment, as shown in fig. 1, there is provided a method for identifying an illegal audio stream, including the steps of:

Step 102, obtaining multiple audio streams of a live broadcast room, stripping and instantiating each audio stream to obtain an audio context, and creating a media element audio source node and an analysis node according to the audio context.

It can be understood that, by stripping and instantiating the video object currently playing in real time on each audio stream, an audio context (AudioContext) can be obtained, and a media element audio source node (MediaElementAudioSourceNode) and an analysis node (AnalyserNode) are created from the audio context, wherein the media element audio source node is connected to the analysis node and to the audio context destination (AudioContext.destination).

Step 104, setting a buffered average data frame parameter according to the media element audio source node and the analysis node, and detecting the audio stream according to the buffered average data frame parameter to obtain a data frame binary stream carrying the audio data.

It will be appreciated that the size of the buffered average data frame parameter determines the sensitivity of audio stream detection: the larger the parameter is set, the lower the sensitivity of audio detection. The buffered average data frame parameter is typically set to 0.1.
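The relationship between this parameter and detection sensitivity can be illustrated with a small sketch. The application does not state how the parameter is applied; the sketch assumes, purely for illustration, that it acts as an exponential averaging weight between the previous smoothed frame and the current frame. The name `smooth_frame` and the symbol `tau` are hypothetical.

```python
from typing import List, Sequence


def smooth_frame(previous: Sequence[float], current: Sequence[float], tau: float) -> List[float]:
    """Blend the current frame with the previous smoothed frame.

    Assumption: tau stands in for the buffered average data frame parameter.
    A larger tau gives more weight to history, so new sound shows up more slowly.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must be in [0, 1)")
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    return [tau * p + (1.0 - tau) * c for p, c in zip(previous, current)]


# A silent history followed by a loud frame (hypothetical values).
silent, loud = [0.0, 0.0], [1.0, 1.0]
responsive = smooth_frame(silent, loud, tau=0.1)  # reacts strongly to the new frame
sluggish = smooth_frame(silent, loud, tau=0.9)    # reacts weakly to the new frame
print(responsive[0] > sluggish[0])  # True
```

With the typical value 0.1, most of the weight falls on the current frame under this assumption, which matches the statement that smaller settings are more sensitive.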

Step 106, monitoring and calculating the data frame binary stream according to the audio detector and the filter bound on each path of audio stream to obtain an audio code rate, subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; wherein the incremental code rate comprises a positive incremental code rate and a negative incremental code rate.

It can be understood that a positive incremental code rate indicates that the audio code rate at the current moment is greater than the audio code rate at the previous moment, and a negative incremental code rate indicates that the audio code rate at the current moment is less than the audio code rate at the previous moment.

It can be understood that each audio stream is bound to its own independent audio detector and filter, so that when multiple streams are played simultaneously, the data frame binary stream of each audio stream can be monitored and calculated to obtain its audio code rate. The larger the code rate value, the louder the sound; the smaller the code rate value, the quieter the sound. The incremental code rate is obtained by subtracting the audio code rate at the previous moment from the audio code rate at the current moment, and comparing the incremental code rate with the audio code rate at the previous moment yields the audio trend value, which reduces misjudgment and improves the accuracy of the subsequent capture of illegal audio streams.
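As a small illustration of the incremental code rate, the following sketch computes current-minus-previous differences over a series of audio code rates and names their sign. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence


def incremental_code_rates(rates: Sequence[float]) -> List[float]:
    """Return current-minus-previous differences for consecutive code rates."""
    return [current - previous for previous, current in zip(rates, rates[1:])]


def increment_kind(delta: float) -> str:
    """Name the sign of an incremental code rate."""
    if delta > 0:
        return "positive"
    if delta < 0:
        return "negative"
    return "unchanged"


# Hypothetical code rates measured at three consecutive moments.
deltas = incremental_code_rates([1.0, 4.0, 2.0])
print(deltas)                               # [3.0, -2.0]
print([increment_kind(d) for d in deltas])  # ['positive', 'negative']
```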

Step 108, comparing the audio trend value with a preset audio trend threshold, judging whether the audio stream has sound according to the comparison result, giving a label to the audio stream with the sound, identifying the multi-channel audio stream by listening to the sound content and the label, and capturing the illegal audio stream.

It can be understood that when the audio trend value is greater than the preset audio trend threshold, the audio stream has sound; otherwise, the audio stream has no sound. Once sound is recognized on an audio stream, an obvious mark is added to show that the stream currently has sound. After hearing illegal content, the auditor checks which audio streams carry the sound mark, and the illegal audio can then be detected quickly and accurately from the sound mark and the sound content.
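The labeling decision can be sketched as follows, assuming each stream already has an audio trend value. The stream identifiers, the threshold value 0 and the function name `label_streams` are hypothetical; the application does not give a concrete threshold.

```python
from typing import Dict, List


def label_streams(trend_values: Dict[str, int], trend_threshold: int) -> List[str]:
    """Return the ids of the streams judged to have sound.

    A stream has sound when its audio trend value is greater than the threshold.
    """
    return [stream_id for stream_id, trend in trend_values.items() if trend > trend_threshold]


# Hypothetical trend values for three linked microphones.
with_sound = label_streams({"mic-1": 3, "mic-2": 0, "mic-3": 1}, trend_threshold=0)
print(with_sound)  # ['mic-1', 'mic-3']
```

In an auditing page, the returned ids would be the streams that receive the visible sound mark, so that an auditor who hears a violation only needs to look at the marked streams.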

Specifically, a 32-bit floating point array (Float32Array) is created to carry the audio track data according to the length of the data frame binary stream.

In one embodiment, the monitoring and calculating the audio track data in the floating point array according to the audio detector and the filter bound to each audio stream to obtain the audio code rate includes:

Performing noise reduction on the audio track data with the audio detector removes the noise in the audio track data, prevents the noise from interfering with the subsequent calculation of the audio code rate, and improves the accuracy of that calculation.

In one embodiment, sampling the noise-reduced audio track data according to a filter to obtain a plurality of sets of audio track data samples and an index filter value, includes: performing Fast Fourier Transform (FFT) sampling on the noise-reduced audio track data according to the sampling frequency set by the filter to obtain a plurality of groups of audio track data samples;

carrying out quantization processing on the length of the sampling number to obtain a sample index;

and comparing the product of the sample index and the sampling frequency with an initial threshold value set by the filter to obtain an index filter value. Here, the length of the index filter values is obtained by dividing the sampling frequency by the fast Fourier transform size (FFT size).

It is understood that the sampling frequency set by the filter refers to the number of times the audio track data is sampled per unit time.

comparing the product of the sample index and the sampling frequency with an initial threshold set by a filter to obtain an index filter value, comprising:

comparing the product of the sample index and the sampling frequency with an initial threshold value set by the filter, and setting the index filter value to 0 when the product is smaller than the first initial threshold; when the product is smaller than the second initial threshold, the index filter value is set to 1. The effective range of the index filter is therefore from the first threshold to the second threshold: infrasound falls below the first threshold and ultrasound falls above the second threshold, and both are treated as silence.
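One way to read the index filter is as a band-pass mask over frequency indices. The sketch below rests on several assumptions that the application does not confirm: the frequency of index `i` is taken as `i * sample_rate / fft_size`, values above the second threshold are set to 0 (following the remark that ultrasound counts as silence), and the thresholds of 20 Hz and 20000 Hz are illustrative only. The function name `build_index_filter` is hypothetical.

```python
from typing import List


def build_index_filter(length: int, sample_rate: float, fft_size: int,
                       low_hz: float, high_hz: float) -> List[int]:
    """Return 1 for indices whose frequency lies in [low_hz, high_hz], else 0.

    Assumption: index i corresponds to the frequency i * sample_rate / fft_size.
    """
    if fft_size <= 0:
        raise ValueError("fft_size must be positive")
    step_hz = sample_rate / fft_size
    return [1 if low_hz <= i * step_hz <= high_hz else 0 for i in range(length)]


# Hypothetical settings: 48 kHz sampling, FFT size 32, so each index spans 1500 Hz.
mask = build_index_filter(length=16, sample_rate=48000, fft_size=32,
                          low_hz=20, high_hz=20000)
print(mask)  # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Index 0 (0 Hz) falls below the lower threshold, and indices 14 and 15 (21000 Hz and 22500 Hz) fall above the upper one, so only the audible band contributes under these assumptions.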

setting an initial audio trend value to be zero;

Specifically, the initial audio trend value is set to 0, the positive multiple of the audio code rate at the previous moment is set to 2, and the negative multiple of the audio code rate at the previous moment is set to 0.5; the multiples can be adjusted according to actual conditions.

It can be understood that, by calculating the audio trend value from the audio code rate, the audio trend value is maintained while the sound is steady and does not fluctuate, which avoids the error that would otherwise arise when the audio code rate does not change.
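The trend update can be sketched as below. The sketch assumes that the positive comparison is `delta > 2 * previous` and that the negative comparison is `delta < -0.5 * previous`; the sign convention of the negative multiple is an interpretation, not something the application states explicitly. The function name `update_trend` and the sample code rates are hypothetical.

```python
def update_trend(trend: int, previous_rate: float, current_rate: float,
                 up_multiple: float = 2.0, down_multiple: float = 0.5) -> int:
    """Raise, lower or keep the audio trend value based on the incremental code rate."""
    delta = current_rate - previous_rate  # incremental code rate
    if delta > up_multiple * previous_rate:
        return trend + 1  # sharp rise: sound is starting
    if delta < -down_multiple * previous_rate:
        return trend - 1  # sharp fall: sound is stopping
    return trend  # steady code rate: keep the trend value


# Hypothetical code rates: a rise, a steady stretch, then a fall.
rates = [1.0, 4.0, 4.0, 1.0]
trend = 0  # the initial audio trend value is zero
history = []
for previous, current in zip(rates, rates[1:]):
    trend = update_trend(trend, previous, current)
    history.append(trend)
print(history)  # [1, 1, 0]
```

The middle step shows the point made above: while the code rate stays at 4.0 the trend value stays at 1, so a speaker holding a steady volume is still treated as having sound.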

It can be understood that, by marking the audio streams that have sound, an auditor can quickly see which stream is producing the current illegal sound and thereby capture the actual offending user.

It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, there is provided an apparatus for identifying an offending audio stream, including: an audio stream acquisition module, an audio stream detection module, an audio code rate calculation module and a violation identification module, wherein:

and the audio stream acquisition module is used for acquiring a plurality of paths of audio streams of a live broadcast room, stripping and instantiating each path of audio stream to obtain an audio context, and creating a media element audio source node and an analysis node according to the audio context.

It is understood that by stripping and instantiating the currently live video object on each audio stream, an audio context can be obtained, and a media element audio source node and an analysis node are created according to the audio context, wherein the media element audio source node is connected with the analysis node and an audio context destination.

And the audio stream detection module is used for setting a buffer average data frame parameter according to the media element audio source node and the analysis node, and detecting the audio stream according to the buffer average data frame parameter to obtain a data frame binary stream carrying audio data.

It will be appreciated that the size of the buffered average data frame parameter indicates the sensitivity to audio stream detection filtering, the greater the setting of the buffered average data frame parameter, the less sensitive to audio detection, and typically the buffered average data frame parameter is set to 0.1.

The audio code rate calculation module is used for monitoring and calculating the data frame binary stream according to the audio detector and the filter bound on each path of audio stream to obtain the audio code rate; subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; the incremental code rate comprises a positive incremental code rate and a negative incremental code rate.

For specific limitations of the identification device for the illegal audio stream, reference may be made to the above limitations of the identification method for the illegal audio stream, which are not described herein again. The modules in the above device for identifying an offending audio stream can be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 2. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of identifying an offending audio stream. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:

acquiring multiple paths of audio streams of a live broadcast room, stripping and instantiating each path of audio stream to obtain an audio context, and creating a media element audio source node and an analysis node according to the audio context;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

monitoring and calculating the data frame binary stream according to the audio detector and the filter bound on each path of audio stream to obtain an audio code rate, subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; the incremental code rate comprises a positive incremental code rate and a negative incremental code rate; and comparing the audio trend value with a preset audio trend threshold, judging whether the audio stream has sound according to the comparison result, giving a label to the audio stream with the sound, identifying the multi-channel audio stream by listening to the sound content and the label, and capturing the illegal audio stream.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described, but any combination should be considered within the scope of this specification as long as the combined features do not contradict each other.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of identifying an offending audio stream, the method comprising:

setting a buffering average data frame parameter according to the media element audio source node and the analysis node, and detecting the audio stream according to the buffering average data frame parameter to obtain a data frame binary stream carrying audio data;

monitoring and calculating the data frame binary stream according to an audio detector and a filter bound on each path of audio stream to obtain an audio code rate, subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; wherein the incremental code rate comprises a positive incremental code rate and a negative incremental code rate;

and comparing the audio trend value with a preset audio trend threshold, judging whether the audio stream has sound according to a comparison result, giving a label to the audio stream with the sound, identifying the multi-channel audio stream by listening to sound content and the label, and capturing to obtain the illegal audio stream.

2. The method of claim 1, wherein the monitoring and calculating the data frame binary stream according to the audio detector and the filter bound to each audio stream to obtain the audio code rate comprises:

creating a floating point array carrying audio track data according to the length of the data frame binary stream;

3. The method of claim 2, wherein the monitoring and calculating the audio track data in the floating point array according to the audio detector and the filter bound to each audio stream to obtain the audio code rate comprises:

4. The method of claim 3, wherein sampling the noise-reduced audio track data according to a filter to obtain a plurality of sets of audio track data samples and index filter values comprises:

5. The method of claim 4, wherein the initial threshold set by the filter comprises a first initial threshold and a second initial threshold;

comparing the product of the sample index and the sampling frequency to an initial threshold set by a filter, and setting the index filter value to 0 when the product is less than the first initial threshold; setting the index filter value to 1 when the product is less than the second initial threshold.

6. The method of claim 1, wherein comparing the incremental code rate with the audio code rate at the previous time to obtain an audio trend value comprises:

setting an initial audio trend value to be zero;

comparing the positive incremental code rate with a positive multiple of the audio code rate at the last moment, wherein when the positive incremental code rate is greater than the positive multiple of the audio code rate at the last moment, the audio trend value is the initial audio trend value plus one, otherwise, the audio trend value keeps the initial audio trend value unchanged;

7. The method of claim 1, wherein comparing the audio trend value with a preset audio trend threshold value, and determining whether the audio stream has sound according to the comparison result comprises:

8. An apparatus for identifying an offending audio stream, said apparatus comprising:

the audio code rate calculation module is used for monitoring and calculating the data frame binary stream according to the audio detector and the filter bound on each path of audio stream to obtain an audio code rate, subtracting the audio code rate at the previous moment from the audio code rate at the current moment to obtain an incremental code rate, and comparing the incremental code rate with the audio code rate at the previous moment to obtain an audio trend value; wherein the incremental code rate comprises a positive incremental code rate and a negative incremental code rate;

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.