
CN113824986B - Method, device, storage medium and equipment for auditing live audio based on context

Info

Publication number: CN113824986B
Application number: CN202111110487.5A
Authority: CN
Inventors: 姚庆; 樊伟华; 杜晓祥
Original assignee: Beijing Yunshang Technology Co ltd
Current assignee: Beijing Yunshang Technology Co ltd
Priority date: 2021-09-18
Filing date: 2021-09-18
Publication date: 2024-03-29
Anticipated expiration: 2041-09-18
Also published as: CN113824986A

Abstract

A context-based live audio auditing method, device, storage medium and equipment. A live audio stream to be audited is acquired and preprocessed, wherein the preprocessing comprises cutting the live audio stream at a preset frequency to obtain a plurality of single audio files; the acquired single audio files are sorted in chronological order; adjacent single audio files are spliced to obtain a spliced audio file to be checked; voice recognition is performed on the audio file to be checked to obtain its text content; and text auditing is performed on the text content to check whether predefined illegal content exists in it. Building on segmented auditing, the invention uses an adjustable segmentation frequency to acquire audio fragments at second-level granularity, which preserves the real-time performance of auditing, and re-integrates the segmented files, which resolves the fragmentation of file content and improves the accuracy of subsequent recognition.

Description

Method, device, storage medium and equipment for auditing live audio based on context

Technical Field

The invention relates to the technical field of audio processing, in particular to a live audio auditing method, device, storage medium and equipment based on context.

Background

Currently, in internet scenarios, content auditing is often necessary for compliance reasons or because of actual business requirements. Compared with other carriers such as images or text, audio published by users is more prone to sensitive or illegal content, and live audio in particular calls for detection means with stronger real-time performance.

In the prior art, live audio auditing schemes generally adopt segmented auditing, which cuts a real-time audio stream into a plurality of audio files for voice recognition and text detection. When such a scheme cuts the audio stream, it is difficult to preserve the integrity of the voice content: a sentence or passage is easily split across several fragments. This fragmentation means that the accuracy of file-based voice recognition cannot be guaranteed, and a large number of missed detections and false detections easily result. Auditing live audio with the traditional scheme alone therefore makes it difficult to ensure the coverage of content auditing, and a new live audio auditing solution is needed.

Disclosure of Invention

Therefore, the invention provides a method, a device, a storage medium and equipment for auditing live audio based on context, which solve the problems that existing live audio auditing is prone to false detections and missed detections and that the coverage of audio content auditing is difficult to ensure.

In order to achieve the above object, the present invention provides the following technical solutions: in a first aspect, a method for auditing live audio based on context is provided, including the following steps:

acquiring a live audio stream to be audited, and preprocessing the live audio stream, wherein the preprocessing comprises cutting the live audio stream according to a preset frequency, and cutting the live audio stream to obtain a plurality of single audio files;

sequencing the acquired single audio files according to a time sequence;

splicing the adjacent single audio files to obtain spliced audio files to be checked;

performing voice recognition on the audio file to be checked to obtain text content of the audio file to be checked;

and carrying out text auditing on the text content, and checking whether predefined illegal contents exist in the text content.
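The five steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the names `audit_stream`, `recognize` and `banned_terms` are hypothetical, the segments are assumed to be headerless audio bytes that can be concatenated directly, and simple substring matching stands in for the unspecified text auditing.

```python
from typing import Callable, Iterable, List, Tuple


def audit_stream(
    segments: Iterable[Tuple[float, bytes]],
    recognize: Callable[[bytes], str],
    banned_terms: List[str],
) -> List[Tuple[float, List[str]]]:
    """Audit already-cut segments given as (start_time_seconds, audio_bytes).

    Returns (start_time, matched_terms) for every spliced file with a hit.
    """
    # Sort the single audio files in chronological order.
    ordered = sorted(segments, key=lambda item: item[0])
    findings: List[Tuple[float, List[str]]] = []
    previous = b""
    for start, audio in ordered:
        # Splice the current file with the file of the previous time period.
        spliced = previous + audio
        # Voice recognition (supplied by the caller; a stub in the usage below).
        text = recognize(spliced)
        # Text auditing: assumed here to be plain substring matching.
        hits = [term for term in banned_terms if term in text]
        if hits:
            findings.append((start, hits))
        previous = audio
    return findings
```

With a stub recognizer that simply decodes the bytes, a term split across two segments ("bad" followed by "word") is found only in the spliced file, which is the fragmentation problem the splicing step is meant to address.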

As a preferred scheme of the context-based live audio auditing method, the process of preprocessing the live audio stream includes:

acquiring a live audio stream, and dividing the live audio stream into single audio files in real time according to a preset frequency;

and acquiring the single audio file after segmentation, and splicing the single audio file with the single audio file in the previous time period to obtain a new audio file to be checked after splicing.

As a preferred scheme of the context-based live audio auditing method, the sequence of cutting the live audio stream is as follows:

and cutting the live audio stream into a plurality of single audio file fragments with equal duration at fixed frequency according to the time sequence until the live broadcast is finished or interrupted, and stopping cutting the live broadcast audio stream.
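A minimal sketch of equal-duration cutting, assuming the stream has already been decoded to raw PCM bytes held in a buffer. The function name `cut_fixed` and its parameters are illustrative; a real live pipeline would read incrementally rather than from a complete buffer.

```python
from typing import Iterator


def cut_fixed(
    pcm: bytes,
    sample_rate: int,
    sample_width: int,
    channels: int,
    seconds: float,
) -> Iterator[bytes]:
    """Yield consecutive fragments of `seconds` duration, in time order."""
    bytes_per_segment = int(sample_rate * seconds) * sample_width * channels
    if bytes_per_segment <= 0:
        raise ValueError("segment duration is too short for this sample rate")
    for offset in range(0, len(pcm), bytes_per_segment):
        # The final fragment may be shorter when the broadcast ends or is
        # interrupted part-way through a segment.
        yield pcm[offset:offset + bytes_per_segment]
```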

As a preferred scheme of the context-based live audio auditing method, two adjacent single audio files are spliced as follows:

and acquiring a single audio file to be checked, finding the previous audio of the single audio file according to the time sequence, combining the single audio file and the previous audio into a new audio file to be checked, and then performing voice recognition processing.
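As an illustration of combining a single audio file with its previous audio, the sketch below concatenates two WAV files using Python's standard `wave` module. The patent does not name a file format or tool; WAV input, in-memory bytes, and the function name `splice_wav` are assumptions made for this example.

```python
import io
import wave


def splice_wav(previous: bytes, current: bytes) -> bytes:
    """Concatenate two WAV files (given as bytes) that share one format."""
    with wave.open(io.BytesIO(previous), "rb") as first, \
            wave.open(io.BytesIO(current), "rb") as second:
        fmt_first = (first.getnchannels(), first.getsampwidth(), first.getframerate())
        fmt_second = (second.getnchannels(), second.getsampwidth(), second.getframerate())
        if fmt_first != fmt_second:
            raise ValueError("segments must share channels, sample width and rate")
        # Previous audio first, then the current file, preserving time order.
        frames = first.readframes(first.getnframes()) + second.readframes(second.getnframes())

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(fmt_first[0])
        writer.setsampwidth(fmt_first[1])
        writer.setframerate(fmt_first[2])
        writer.writeframes(frames)
    return out.getvalue()
```

The returned bytes form the new audio file to be checked and can be handed to whatever voice recognition system is in use.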

As a preferred scheme of the context-based live audio auditing method, checking whether predefined offending content exists in the text content includes:

defining the scope of the illegal contents, and determining the specific type of the illegal contents;

locating the detected illegal audio fragment in time, and marking the real-time point in the live broadcast at which it occurs;

and locating the detected illegal audio fragment in time, and marking the duration of the audio fragment.
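One possible way to derive the two marks (live time and duration) from segment bookkeeping is sketched below. It assumes equal-duration segments numbered from zero and that location is done at the granularity of the spliced file; the names `ViolationMark` and `locate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ViolationMark:
    live_time: datetime       # wall-clock time at which the flagged audio starts
    duration_seconds: float   # length of the flagged audio fragment
    category: str             # specific type of illegal content


def locate(
    stream_start: datetime,
    segment_index: int,
    segment_seconds: float,
    spliced_count: int,
    category: str,
) -> ViolationMark:
    """Map a hit in a spliced file back to live-broadcast time.

    `segment_index` is the index of the newest segment in the spliced file and
    `spliced_count` is how many consecutive segments were combined.
    """
    first_index = max(segment_index - (spliced_count - 1), 0)
    offset = timedelta(seconds=first_index * segment_seconds)
    duration = (segment_index - first_index + 1) * segment_seconds
    return ViolationMark(stream_start + offset, duration, category)
```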

In a second aspect, a context-based live audio auditing apparatus is provided, including:

the audio stream preprocessing module is used for acquiring a live audio stream to be audited, preprocessing the live audio stream, wherein the preprocessing comprises the steps of cutting the live audio stream according to preset frequency, and cutting the live audio stream to obtain a plurality of single audio files;

the audio file ordering module is used for ordering the acquired single audio files according to the time sequence;

the audio file splicing module splices the adjacent single audio files to obtain spliced audio files to be checked;

the voice recognition module is used for carrying out voice recognition on the audio file to be checked to obtain text content of the audio file to be checked;

and the illegal content auditing module is used for conducting text auditing on the text content and checking whether predefined illegal content exists in the text content.

As a preferable scheme of the context-based live broadcast audio auditing device, the audio stream preprocessing module segments the obtained live broadcast audio stream into single audio files in real time according to a preset frequency;

and in the audio file splicing module, a single audio file after being cut is obtained, and the single audio file is spliced with a single audio file in the previous time period to obtain a new audio file to be checked after being spliced.

As a preferred scheme of the context-based live audio auditing device, the order in which the audio stream preprocessing module cuts the live audio stream is: the live audio stream is cut, in chronological order and at a fixed frequency, into a plurality of single audio file fragments of equal duration until the live broadcast ends or is interrupted, at which point cutting of the live audio stream stops.

As a preferred scheme of the context-based live audio auditing device, the audio file splicing module splices two adjacent single audio files as follows: a single audio file to be checked is acquired, the previous audio of that single audio file is found according to the time sequence, the single audio file and the previous audio are combined into a new audio file to be checked, and voice recognition processing is then performed.

As a preferred solution of the context-based live broadcast audio auditing apparatus, the checking whether the predefined offence content exists in the text content in the offence content auditing module includes:

In a third aspect, a computer-readable storage medium is provided, having stored therein program code for a context-based live audio auditing method, the program code comprising instructions for performing the context-based live audio auditing method of the first aspect or any possible implementation thereof.

In a fourth aspect, an electronic device is provided, the electronic device comprising a processor coupled with a storage medium, wherein the processor, when executing instructions in the storage medium, causes the electronic device to perform the context-based live audio auditing method of the first aspect or any possible implementation thereof.

The invention has the following advantages: a live audio stream to be audited is acquired and preprocessed, wherein the preprocessing comprises cutting the live audio stream at a preset frequency to obtain a plurality of single audio files; the acquired single audio files are sorted in chronological order; adjacent single audio files are spliced to obtain a spliced audio file to be checked; voice recognition is performed on the audio file to be checked to obtain its text content; and text auditing is performed on the text content to check whether predefined illegal content exists in it. Building on segmented auditing, the invention uses an adjustable segmentation frequency to acquire audio fragments at second-level granularity, which preserves the real-time performance of auditing, and re-integrates the segmented files, which resolves the fragmentation of file content and improves the accuracy of subsequent recognition.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.

The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.

Fig. 1 is a schematic flow chart of a live audio auditing method based on context according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a context-based live audio auditing apparatus according to an embodiment of the present invention.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description, which describes certain specific embodiments of the invention by way of illustration, not all embodiments. All other embodiments that can be obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to be within the scope of the invention.

As is known, speech recognition uses speech signal processing and pattern recognition to let a machine automatically recognize and understand human speech, converting the speech signal into corresponding text or commands. A speech recognition system is essentially a pattern recognition system, and typically includes three basic units: feature extraction, pattern matching, and a reference pattern library.

Example 1

Referring to fig. 1, embodiment 1 of the present invention provides a method for auditing live audio based on context, comprising the following steps:

S1, acquiring a live audio stream to be audited, and preprocessing the live audio stream, wherein the preprocessing comprises cutting the live audio stream according to a preset frequency to obtain a plurality of single audio files;

S2, sequencing the acquired single audio files according to a time sequence;

S3, splicing the adjacent single audio files to obtain spliced audio files to be checked;

S4, performing voice recognition on the audio file to be checked to obtain text content of the audio file to be checked;

S5, conducting text auditing on the text content, and checking whether predefined illegal contents exist in the text content.

In this embodiment, in step S1, the process of preprocessing the live audio stream includes:

and acquiring a live audio stream, and dividing the live audio stream into single audio files in real time according to a preset frequency.

In this embodiment, the order of cutting the live audio stream in step S1 is as follows:

Specifically, the live audio stream is preprocessed at the fixed frequency: the higher the fixed frequency (that is, the shorter each segment), the sooner each single audio file becomes available, and therefore the better the real-time performance of auditing. As the live broadcast continues, new single audio files are continually generated from the live audio stream and ordered chronologically.

Specifically, the fixed frequency is adjustable; at the shortest setting, one single audio file fragment is generated every second. A tool is used to continuously obtain equal-duration audio file fragments at the fixed frequency, which ensures the real-time performance of auditing; the single audio files are stored on disk in chronological order.
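The text does not name the tool used to obtain the fragments. As one hedged possibility, the sketch below builds a command line for ffmpeg's segment muxer (an assumption, not the tool named by the source) and orders the resulting files by the timestamp embedded in their names. The function names, the `seg_` file-name pattern, and the example URL are illustrative only; the command is constructed but not executed here.

```python
import re
from typing import List


def build_segment_command(stream_url: str, seconds: int, out_dir: str) -> List[str]:
    """Build an ffmpeg command that writes one WAV file every `seconds` seconds."""
    return [
        "ffmpeg",
        "-i", stream_url,
        "-vn",                            # drop any video track, keep audio only
        "-f", "segment",                  # use the segment muxer
        "-segment_time", str(seconds),    # adjustable segment duration
        "-strftime", "1",                 # expand strftime codes in the file name
        f"{out_dir}/seg_%Y%m%d%H%M%S.wav",
    ]


def sort_segments(names: List[str]) -> List[str]:
    """Order segment files chronologically using the timestamp in each name."""
    stamp = re.compile(r"seg_(\d{14})\.wav$")
    matching = [name for name in names if stamp.search(name)]
    return sorted(matching, key=lambda name: stamp.search(name).group(1))
```

Passing `seconds=1` corresponds to the shortest setting described above, one fragment per second.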

In this embodiment, in step S3, a single audio file after segmentation is obtained, and the single audio file is spliced with a single audio file in a previous time period to obtain a new audio file to be checked after splicing.

Specifically, a single audio file to be checked is obtained and preprocessed, and each single audio file is combined with the previous single audio file, which preserves the integrity of the speech as far as possible and thereby improves recognition accuracy.

In this embodiment, in step S3, the manner of splicing the two adjacent single audio files is as follows:

In addition, the single audio files can be spliced in other ways as required: a single audio file can be spliced with the audio file of the adjacent previous time period, with the audio file of the adjacent next time period, or with the adjacent audio files of both the previous and the next time periods at the same time; the spliced file preserves the integrity of each section of audio content.
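The three splicing options can be sketched as a single function with a mode switch. This is an illustration under the assumption that segments are headerless audio bytes held in a time-ordered sequence; `splice_with_context` and the mode names are hypothetical.

```python
from typing import List, Sequence


def splice_with_context(
    segments: Sequence[bytes], index: int, mode: str = "previous"
) -> bytes:
    """Splice segment `index` with its neighbours according to `mode`.

    mode is one of "previous", "next" or "both".
    """
    if mode not in {"previous", "next", "both"}:
        raise ValueError(f"unknown splicing mode: {mode}")
    parts: List[bytes] = []
    if mode in {"previous", "both"} and index > 0:
        parts.append(segments[index - 1])
    parts.append(segments[index])
    if mode in {"next", "both"} and index + 1 < len(segments):
        parts.append(segments[index + 1])
    return b"".join(parts)
```

Note that the "next" and "both" modes can only run once the following segment exists, so in a live setting they delay the audit of each file by one segment duration.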

In this embodiment, in step S5, checking whether predefined offensive content exists in the text content includes:

Specifically, the audio file to be audited is converted into a file format that the voice recognition system can recognize, and the processed file is passed to the voice recognition system for content extraction. The recognized text content is processed as follows: the text content is checked for sensitive or illegal content, and the time point of any illegal content is located, which reduces false detections and ensures the real-time performance and accuracy of live audio auditing.

Specifically, if the list of checking results for illegal content is not empty, illegal content exists and the corresponding position marks the illegal content; if the list is empty, no illegal content exists.
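The empty versus non-empty result list can be illustrated as follows. The source does not specify how the text audit itself works, so keyword matching against a term-to-category table is an assumption made for this sketch, and `audit_text` and `has_violation` are hypothetical names.

```python
from typing import Dict, List, Tuple


def audit_text(text: str, banned: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (character_position, term, category) for every occurrence found."""
    results: List[Tuple[int, str, str]] = []
    for term, category in banned.items():
        if not term:
            continue  # ignore empty terms, which would match everywhere
        start = text.find(term)
        while start != -1:
            results.append((start, term, category))
            start = text.find(term, start + 1)
    return sorted(results)


def has_violation(results: List[Tuple[int, str, str]]) -> bool:
    """A non-empty result list means illegal content exists."""
    return len(results) > 0
```

Each entry's position identifies where in the recognized text the illegal content sits, which can then be mapped back to a time point in the broadcast.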

In summary, the invention acquires a live audio stream to be audited and preprocesses it, wherein the preprocessing comprises cutting the live audio stream at a preset frequency to obtain a plurality of single audio files; sorts the acquired single audio files in chronological order; splices adjacent single audio files to obtain a spliced audio file to be checked; performs voice recognition on the audio file to be checked to obtain its text content; and performs text auditing on the text content to check whether predefined illegal content exists in it. Building on segmented auditing, the invention uses an adjustable segmentation frequency to acquire audio fragments at second-level granularity, which preserves the real-time performance of auditing, and re-integrates the segmented files, which resolves the fragmentation of file content and improves the accuracy of subsequent recognition.

Example 2

Referring to fig. 2, embodiment 2 of the present invention provides a device for auditing live audio based on context, including:

the audio stream preprocessing module 1 is used for acquiring a live audio stream to be audited, preprocessing the live audio stream, wherein the preprocessing comprises cutting the live audio stream according to a preset frequency, and cutting the live audio stream to obtain a plurality of single audio files;

the audio file ordering module 2 is used for ordering the acquired single audio files according to the time sequence;

the audio file splicing module 3 splices the adjacent single audio files to obtain spliced audio files to be checked;

the voice recognition module 4 is used for carrying out voice recognition on the audio file to be checked to obtain text content of the audio file to be checked;

and the illegal content auditing module 5 is used for conducting text auditing on the text content and checking whether predefined illegal content exists in the text content.

In this embodiment, the audio stream preprocessing module 1 segments the obtained live audio stream into single audio files in real time according to a preset frequency;

and in the audio file splicing module 3, a single audio file after segmentation is obtained, and the single audio file is spliced with a single audio file in the previous time period to obtain a new audio file to be checked after splicing.

In this embodiment, the order in which the audio stream preprocessing module 1 cuts the live audio stream is: the live audio stream is cut, in chronological order and at a fixed frequency, into a plurality of single audio file fragments of equal duration until the live broadcast ends or is interrupted, at which point cutting of the live audio stream stops.

In this embodiment, the audio file splicing module 3 splices two adjacent single audio files as follows: a single audio file to be checked is acquired, the previous audio of that single audio file is found according to the time sequence, the single audio file and the previous audio are combined into a new audio file to be checked, and voice recognition processing is then performed.

In this embodiment, the checking whether the predefined illegal content exists in the text content in the illegal content auditing module 5 includes:

It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned apparatus is based on the same concept as the method embodiment in embodiment 1 of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.

Example 3

Embodiment 3 of the present invention provides a computer-readable storage medium having stored therein program code for a context-based live audio auditing method, the program code comprising instructions for performing the context-based live audio auditing method of embodiment 1 or any possible implementation thereof.

The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).

Example 4

Embodiment 4 of the present invention provides an electronic device including a processor coupled with a storage medium, which when executing instructions in the storage medium, causes the electronic device to perform the context-based live audio auditing method of embodiment 1 or any possible implementation thereof.

Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and which may reside separately.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they can be stored in a memory device and executed by the computing devices. In some cases the steps shown or described may be performed in an order different from that shown or described, the modules or steps may each be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. A context-based live audio auditing method, characterized by comprising the following steps:

sequencing the acquired single audio files according to a time sequence;

performing text auditing on the text content, and checking whether predefined illegal contents exist in the text content;

the process of preprocessing the live audio stream comprises the following steps:

acquiring a single audio file after segmentation, and splicing the single audio file with a single audio file in a previous time period to obtain a new audio file to be checked after splicing;

the method for splicing the adjacent two single audio files is as follows:

acquiring a single audio file to be checked, finding the previous audio of the single audio file according to the time sequence, combining the single audio file and the previous audio into a new audio file to be checked, and then performing voice recognition processing;

checking whether predefined offending content exists in the text content includes:

2. The context-based live audio auditing method of claim 1, characterized in that the order in which the live audio streams are cut is:

3. A context-based live audio auditing device, characterized by comprising:

the illegal content auditing module is used for conducting text auditing on the text content and checking whether predefined illegal content exists in the text content;

the audio stream preprocessing module divides the obtained live audio stream into single audio files in real time according to preset frequency;

the audio file splicing module is used for acquiring a single audio file after segmentation, and splicing the single audio file with a single audio file in a previous time period to obtain a new audio file to be checked after splicing;

the audio file splicing module splices two adjacent single audio files in the following way: acquiring a single audio file to be checked, finding the previous audio of the single audio file according to the time sequence, combining the single audio file and the previous audio into a new audio file to be checked, and then performing voice recognition processing;

in the offence content auditing module, checking whether predefined offence content exists in the text content includes:

4. The context-based live audio auditing device of claim 3, wherein the order in which the audio stream preprocessing module cuts the live audio stream is: and cutting the live audio stream into a plurality of single audio file fragments with equal duration at fixed frequency according to the time sequence until the live broadcast is finished or interrupted, and stopping cutting the live broadcast audio stream.

5. A computer readable storage medium having stored therein program code for a context based live audio auditing method, characterized in that the program code comprises instructions for performing the context based live audio auditing method of any of claims 1-2.

6. An electronic device comprising a processor coupled to a storage medium, wherein the processor, when executing instructions in the storage medium, causes the electronic device to perform the context-based live audio auditing method of any of claims 1-2.