CN113852835A

CN113852835A - Live broadcast audio processing method and device, electronic equipment and storage medium

Info

Publication number: CN113852835A
Application number: CN202111111150.6A
Authority: CN
Inventors: 杜康
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2021-12-28

Abstract

The disclosure provides a live audio processing method and apparatus, an electronic device, and a storage medium, and relates to the field of computer technologies, in particular to the field of speech technologies. The specific implementation scheme is as follows: in response to completion of generation of live audio information, acquiring, in real time, a transport stream file corresponding to the live audio information; converting the transport stream file into an audio file and a text file; determining whether the audio file and the text file include predetermined abnormal information; and, in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, sending a handling instruction to a live subject associated with the live audio information.

Description

Live broadcast audio processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a live audio processing method and apparatus, an electronic device, and a storage medium.

Background

With the development of internet technology, more and more people are paying attention to live webcasting, which takes two forms: live video and live audio. Live audio is a real-time audio streaming technology. Like live video, it emphasizes real-time delivery; it differs in that only audio is provided, with few or no visual elements.

Disclosure of Invention

The disclosure provides a live audio processing method and device, electronic equipment and a storage medium.

According to an aspect of the present disclosure, there is provided a live audio processing method including: in response to completion of generation of live audio information, acquiring, in real time, a transport stream file corresponding to the live audio information; converting the transport stream file into an audio file and a text file; determining whether the audio file and the text file include predetermined abnormal information; and, in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, sending a handling instruction to a live subject associated with the live audio information.

According to another aspect of the present disclosure, there is provided a live audio processing apparatus including: an acquisition module configured to acquire, in real time and in response to completion of generation of live audio information, a transport stream file corresponding to the live audio information; a conversion module configured to convert the transport stream file into an audio file and a text file; a determining module configured to determine whether the audio file and the text file include predetermined abnormal information; and a first sending module configured to send a handling instruction to a live subject associated with the live audio information in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a live audio processing method as described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the live audio processing method as described above.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a live audio processing method as described above.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

Fig. 1 schematically illustrates an exemplary system architecture to which the live audio processing method and apparatus may be applied, according to an embodiment of the present disclosure;

Fig. 2 schematically shows a flowchart of a live audio processing method according to an embodiment of the present disclosure;

Fig. 3 schematically illustrates acquiring a transport stream file corresponding to live audio information in response to completion of generation of the live audio information, according to an embodiment of the present disclosure;

Fig. 4 schematically illustrates conversion and recognition performed on a transport stream file, according to an embodiment of the present disclosure;

Fig. 5 schematically illustrates a live audio review flow, according to an embodiment of the present disclosure;

Fig. 6 schematically shows a block diagram of a live audio processing apparatus according to an embodiment of the present disclosure; and

Fig. 7 illustrates a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.

Audio live streaming is a form of voice-based social live broadcasting with no visual scene. In this scenario, it is very important to audit what the host or audience members say in a live room quickly, effectively, and in real time, and to identify speech involving pornographic or politically sensitive content, so that penalties can be applied in real time.

To audit what the host or audience members say in a live room, a reviewer can listen in real time to the audio streams of a large number of live rooms and check the speech by ear. Alternatively, after the live broadcast ends, the speech can be transcribed with a general-purpose speech recognition tool and the transcript reviewed.

In the course of arriving at the concept of the present disclosure, the inventor found that having reviewers listen to audio streams in real time is practical only for a small number of live rooms, so its applicability is limited. When many voice live rooms are open at the same time, a large amount of review manpower is consumed and the cost is high. Moreover, with many live voice streams, reviewers become fatigued after listening for long periods, which can lead to misjudgments. Performing speech recognition only after the live broadcast ends and then sending the result to a review platform has poor timeliness: by the time the content is reviewed, abnormal information may already have been exposed to listeners.

In view of this, the embodiments of the present disclosure provide a scheme that can accurately identify the quality of audio data in a live voice broadcast room in real time, so as to alleviate the above drawbacks.

Fig. 1 schematically shows an exemplary system architecture to which the live audio processing method and apparatus may be applied, according to an embodiment of the present disclosure.

It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the live audio processing method and apparatus may be applied may include a terminal device, and the terminal device may implement the live audio processing method and apparatus provided in the embodiments of the present disclosure without interacting with a server.

As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) that supports content browsed by users on the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as user requests, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.

It should be noted that the live audio processing method provided by the embodiment of the present disclosure may generally be executed by the terminal device 101, 102, or 103. Accordingly, the live audio processing apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.

Alternatively, the live audio processing method provided by the embodiment of the present disclosure may also be generally performed by the server 105. Accordingly, the live audio processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The live audio processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the live audio processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

For example, when live audio is generated, the terminal devices 101, 102, and 103 may acquire target content in an electronic book at which the user's line of sight is directed, and then send the acquired target content to the server 105. The server 105 analyzes the target content to determine its feature information, predicts content of interest to the user according to that feature information, and extracts the content of interest to the user. Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 may analyze the target content and ultimately extract the content of interest to the user.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2 schematically shows a flow chart of a live audio processing method according to an embodiment of the present disclosure.

As shown in fig. 2, the method includes operations S210 to S240.

In operation S210, in response to the generation of the live audio information being completed, a transport stream file corresponding to the live audio information is acquired in real time.

The transport stream file is converted into an audio file and a text file in operation S220.

In operation S230, it is determined whether the audio file and the text file include predetermined abnormality information.

In operation S240, in case it is determined that predetermined abnormality information is included in at least one of the audio file and the text file, a handling instruction is transmitted to a live subject related to the live audio information.
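Operations S210 to S240 can be read as one per-stream function. The following is a minimal Python sketch, not the disclosed implementation: the callables `fetch_ts`, `convert`, `detect_audio`, `detect_text`, and `send_instruction`, and the default `"warning"` instruction, are hypothetical stand-ins for the services described in the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AuditResult:
    room_id: str
    u_id: str
    anomalies: List[str]
    instruction: Optional[str]


def process_live_audio(
    room_id: str,
    u_id: str,
    fetch_ts: Callable[[str, str], bytes],              # S210: fetch the TS file (hypothetical)
    convert: Callable[[bytes], Tuple[bytes, str]],      # S220: TS -> (audio, text)
    detect_audio: Callable[[bytes], List[str]],         # S230: audio-side checks
    detect_text: Callable[[str], List[str]],            # S230: text-side checks
    send_instruction: Callable[[str, str, str], None],  # S240: notify the live subject
    instruction: str = "warning",                       # assumed default handling instruction
) -> AuditResult:
    ts_file = fetch_ts(room_id, u_id)
    audio_file, text_file = convert(ts_file)
    # Abnormal information in EITHER file is enough to trigger handling (S240).
    anomalies = detect_audio(audio_file) + detect_text(text_file)
    sent: Optional[str] = None
    if anomalies:
        send_instruction(room_id, u_id, instruction)
        sent = instruction
    return AuditResult(room_id, u_id, anomalies, sent)
```

Passing each stage in as a callable keeps the sketch independent of any particular downloader, transcoder, or ASR engine.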

According to an embodiment of the present disclosure, the live audio information may include voice information uttered by a live user in at least one of a video live room, an audio live room, and the like. Whenever a live user finishes uttering a segment of voice information, of any length, the generation of the live audio information may be regarded as completed. The transport stream file may represent live audio information stored in a transport stream format. The transport stream format may include the TS (transport stream) format; TS-based storage can store the live audio information in segments as TS files, so that any segment of the live audio information can be decoded independently to obtain live audio.
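Independent decoding works because an MPEG-TS stream is a sequence of fixed 188-byte packets, each starting with the sync byte `0x47` and carrying a 13-bit packet identifier (PID). The disclosure does not describe such a check; the sketch below is an illustrative sanity check a downloader might run on a TS segment before handing it to a decoder.

```python
from typing import List

TS_PACKET_SIZE = 188  # fixed MPEG-TS packet length in bytes
SYNC_BYTE = 0x47      # every TS packet starts with this byte


def ts_packet_pids(segment: bytes) -> List[int]:
    """Validate a TS segment and return the PID of each packet, in order."""
    if len(segment) % TS_PACKET_SIZE != 0:
        raise ValueError("segment length is not a multiple of 188 bytes")
    pids: List[int] = []
    for offset in range(0, len(segment), TS_PACKET_SIZE):
        packet = segment[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {offset}")
        # 13-bit PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
        pids.append(((packet[1] & 0x1F) << 8) | packet[2])
    return pids
```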

According to an embodiment of the present disclosure, the predetermined abnormal information may include various kinds of abnormal information that violate public order and good customs. The live subject may include at least one of a live room and a live user. The handling instructions may include instructions to apply at least one of a fine (deduction), a warning, a speech ban (muting), an account suspension, and the like to at least one of the live room and the live user that produced the abnormal information.

According to the embodiment of the present disclosure, when a live user in a live room utters voice information, the voice information can be audited in real time to determine whether it includes abnormal information that violates public order and good customs. If it does, a handling instruction can be issued in real time to at least one of the corresponding live user and the live room in which the live user is located, and handling operations such as fine deduction, warning, muting, and account suspension can be applied to at least one of them.

Through the above embodiment of the present disclosure, live audio can be audited in real time while it is being played, and when it is determined that the live audio includes predetermined abnormal information, a handling instruction can be promptly sent to the live subject producing the live audio. In this way, the quality of the live audio data is controlled in a timely manner, and the leakage and spread of abnormal information are reduced.

The method shown in fig. 2 is further described below with reference to specific embodiments.

According to an embodiment of the present disclosure, the predetermined abnormal information may include information related to user contact details, such as a mobile phone number, an address, or a name. The predetermined abnormal information may also include information related to advertisements, which may be detected, for example, by predefining advertising words and then matching the audio information against them. The predetermined abnormal information may also include other information, which is not limited herein.
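Contact details and advertising words can be matched against the transcript with simple patterns. This is a minimal sketch under stated assumptions: the phone-number rule (11 digits beginning with 13 to 19, as for mainland-China mobile numbers) and the function names are illustrative, not taken from the disclosure.

```python
import re
from typing import Iterable, List

# Assumption: 11-digit mainland-China mobile numbers beginning with 13-19.
# The lookarounds stop a match from starting or ending inside a longer digit run.
MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def find_phone_numbers(text: str) -> List[str]:
    return MOBILE_RE.findall(text)


def find_ad_words(text: str, ad_words: Iterable[str]) -> List[str]:
    # Substring match: ASR transcripts of Chinese speech have no word delimiters.
    return [word for word in ad_words if word in text]
```

A real deployment would likely also need to catch numbers spoken digit by digit or with spaces in between, which this sketch does not handle.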

Through the embodiment of the present disclosure, the range of information covered by the predetermined abnormal information can be broadened, so that audio data can be screened along more dimensions and the quality of the output audio is improved.

According to an embodiment of the present disclosure, the handling instructions may include at least one of a warning instruction, a muting (speech-ban) instruction, and an account-suspension instruction. The handling instructions may also include other instructions with a punitive meaning, which are not limited herein.

Through the above embodiment of the present disclosure, handling instructions are defined and used to handle the live subject that produces live audio including predetermined abnormal information, so that the audio data output by the live subject can be effectively controlled and the quality of the output audio improved.

According to an embodiment of the present disclosure, the live audio information may include a plurality of pieces of live audio information. Acquiring, in real time and in response to completion of generation of the live audio information, the transport stream file corresponding to the live audio information may include: in response to completion of generation of the plurality of pieces of live audio information, acquiring, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information.

According to embodiments of the present disclosure, multiple live users may be broadcasting at the same time. The voice messages uttered by these live users at the same moment can be acquired in parallel and audited in real time. When one or more of the voice messages include abnormal information that violates public order and good customs, a handling instruction is sent in real time to at least one of the live user who produced the voice message and the live room in which that user is located, and handling operations such as fine deduction, warning, muting, and account suspension are applied accordingly.
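Concurrent auditing of many streams can be sketched with a thread pool keyed by `(room_id, u_id)`. This is an assumption-laden illustration: `audit_one` is a hypothetical per-stream audit function (for example, the whole S210 to S240 chain), and threads are just one possible concurrency mechanism; the disclosure does not specify one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Stream = Tuple[str, str]  # (room_id, u_id)


def audit_streams_concurrently(
    streams: List[Stream],
    audit_one: Callable[[str, str], List[str]],
    max_workers: int = 8,
) -> Dict[Stream, List[str]]:
    """Run audit_one for every live stream in parallel; map each stream to its anomalies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {stream: pool.submit(audit_one, *stream) for stream in streams}
        # result() blocks until that stream's audit finishes (and re-raises its errors).
        return {stream: fut.result() for stream, fut in futures.items()}
```

Threads suit this I/O-bound workload (downloads, calls to remote ASR); CPU-heavy steps such as transcoding would more naturally run in separate processes.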

Through the above embodiment of the present disclosure, multiple pieces of live audio information can be audited simultaneously, and performing the subsequent audit on transport stream files effectively alleviates the audit errors caused by reviewer fatigue when many live audio streams are reviewed manually. In addition, the quality of the audio data of each live audio stream can be controlled in time, and the leakage and spread of any piece of abnormal information are reduced.

According to an embodiment of the present disclosure, acquiring, in real time and in response to completion of generation of the live audio information, the transport stream file corresponding to the live audio may include: in response to completion of generation of the live audio information, acquiring a transport stream address used to request transport stream fragment information related to the live audio information; acquiring the transport stream fragment information according to the transport stream address; and generating the transport stream file according to the transport stream fragment information.

According to the embodiment of the disclosure, the transport stream fragment information may represent information of the TS file obtained by storing the live audio information in fragments, the transport stream address may represent a TS file address of the TS file, and the transport stream file may be a TS file with a preset duration obtained by merging a plurality of transport stream fragment information sorted in time sequence.
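Merging time-ordered fragments into files of a preset duration can be sketched as follows. The `TsFragment` record, ordering by media sequence number, and plain byte concatenation are assumptions for illustration (concatenating MPEG-TS segments from the same encoder generally yields a playable TS stream); fragments that do not yet reach the target duration are returned so they can be carried over to the next round.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TsFragment:
    seq: int          # media sequence number, used as the time order
    duration: float   # seconds, e.g. taken from the playlist's #EXTINF tag
    data: bytes


def merge_fragments(
    fragments: List[TsFragment], target_seconds: float
) -> Tuple[List[bytes], List[TsFragment]]:
    """Concatenate time-ordered fragments into chunks of at least target_seconds."""
    merged: List[bytes] = []
    pending: List[TsFragment] = []
    acc = 0.0
    for frag in sorted(fragments, key=lambda f: f.seq):
        pending.append(frag)
        acc += frag.duration
        if acc >= target_seconds:
            merged.append(b"".join(f.data for f in pending))
            pending, acc = [], 0.0
    # Leftover fragments are returned so the caller can merge them next time.
    return merged, pending
```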

Fig. 3 schematically shows a schematic diagram of acquiring a transport stream file corresponding to live audio information in response to completion of generation of the live audio information according to an embodiment of the present disclosure.

As shown in fig. 3, multiple live rooms 310, such as live room A and live room B in fig. 3, may be open online at the same time. Each live room can support live broadcasts by multiple users; for example, user a1 and user a2 may broadcast in live room A, user B1 and user B2 in live room B, and so on. After a live room is opened, the live basic service module 320 may generate corresponding live room identification information room_id for it. After a live user logs in to a live room, the live basic service module 320 may generate corresponding live user identification information u_id for that user. In addition, the live basic service module 320 may generate an m3u8 (a playlist file format) stream address, i.e., an m3u8 url, for the live room and live user according to the room_id and the u_id, and store it in the m3u8 address pool 330. The m3u8 url may be expressed, for example, as xxxx. The pull stream parsing download service 340 can access the m3u8 url in real time and obtain a TS file address for requesting a TS file. The obtained TS file address may be stored in the TS file pool 350.
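Extracting TS file addresses from an m3u8 playlist is a small parsing step: in an HLS media playlist, each non-comment line is a segment URI (possibly relative to the playlist URL), and the preceding `#EXTINF:<seconds>,` tag gives its duration. The sketch below is illustrative only; the playlist and URLs in the usage check are hypothetical, and a production parser would handle more tags.

```python
from typing import List, Tuple
from urllib.parse import urljoin


def parse_m3u8(playlist_text: str, playlist_url: str) -> List[Tuple[str, float]]:
    """Return (absolute TS url, duration in seconds) for each segment in a media playlist."""
    segments: List[Tuple[str, float]] = []
    duration = 0.0
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:2.000," -> 2.0 (an optional title follows the comma)
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#"):
            # Segment URIs may be relative to the playlist location.
            segments.append((urljoin(playlist_url, line), duration))
            duration = 0.0
    return segments
```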

As shown in fig. 3, the pull stream parsing download service 340 may include a pull stream parsing main service 341, an m3u8 parsing service 342, and a TS download service 343. When the pull stream parsing main service 341 is started, the m3u8 parsing service 342 and the TS download service 343 can be started at the same time, and heartbeat detection can be used to check whether they are alive. If a failure is detected in at least one of the m3u8 parsing service 342 and the TS download service 343, it is handled in time so that each service stays in a normal operating state, which effectively maintains the real-time performance of the entire live audio processing flow.
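Heartbeat detection can be reduced to tracking when each sub-service last reported in and restarting any that has been silent longer than a timeout. A minimal sketch, assuming a timeout policy and a `restart` callback that the disclosure does not specify; the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Track sub-service heartbeats and restart services that go silent."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat: Dict[str, float] = {}

    def beat(self, service: str) -> None:
        # Called by (or on behalf of) a sub-service, e.g. the m3u8 parsing service.
        self.last_beat[service] = self.clock()

    def stale_services(self) -> List[str]:
        now = self.clock()
        return [s for s, t in self.last_beat.items() if now - t > self.timeout_s]

    def check(self, restart: Callable[[str], None]) -> List[str]:
        stale = self.stale_services()
        for service in stale:
            restart(service)
            self.beat(service)  # treat the restarted service as alive again
        return stale
```

The main service would call `check` periodically, for example once per timeout interval.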

According to the embodiment of the present disclosure, the pull stream parsing main service 341 may obtain m3u8 urls from the m3u8 address pool 330 in real time and concurrently, and obtain the corresponding m3u8 stream address files. An m3u8 stream address file may include the TS file addresses of the corresponding TS files. When it is determined that the current user is on the mic or online, the m3u8 stream address file may be pushed to the m3u8 parsing service 342. The m3u8 parsing service 342 may parse the m3u8 stream address file to obtain TS file addresses for requesting the corresponding TS files. The parsed TS file addresses may be pushed to the TS download service 343, which may download the TS files in parallel using multiple concurrent tasks and store them in the TS file pool 350.

With the above embodiment for acquiring transport stream files in real time, the transport stream fragment information is acquired according to the transport stream address obtained in response to completion of generation of the live audio information, and the transport stream file is generated from the transport stream fragment information. This effectively improves the real-time performance of transport stream file acquisition, and hence of live audio processing, so that the quality of the live audio data is controlled in time and the leakage and spread of abnormal information are reduced.

According to an embodiment of the present disclosure, converting the transport stream file into the audio file and the text file includes: converting the transport stream file into a voice file in pulse code modulation (PCM) format; and performing speech recognition on the voice file to obtain the text file.

According to an embodiment of the present disclosure, converting the transport stream file into the audio file may include at least one of converting the TS file into an mp3 (an audio coding format) file and converting the TS file into a pcm (pulse code modulation) file. Converting the TS file into a pcm file may include first converting the TS file into an mp3 file and then converting the mp3 file into a pcm file. Converting the transport stream file into a text file may include converting the pcm file into a text file.
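The two-step TS to mp3 to pcm conversion maps onto two ffmpeg invocations. This sketch only builds the command lines (and runs them via `subprocess` if an `ffmpeg` binary is on PATH); the output parameters, headerless signed 16-bit little-endian mono PCM at 16 kHz, are an assumption about what the ASR service expects, not something the disclosure states.

```python
import subprocess
from typing import List


def ts_to_mp3_cmd(ts_path: str, mp3_path: str) -> List[str]:
    # -vn ignores any video stream; libmp3lame is ffmpeg's MP3 encoder.
    return ["ffmpeg", "-y", "-i", ts_path, "-vn", "-acodec", "libmp3lame", mp3_path]


def mp3_to_pcm_cmd(mp3_path: str, pcm_path: str, sample_rate: int = 16000) -> List[str]:
    # Headerless PCM: signed 16-bit little-endian, mono. 16 kHz is an assumed ASR input rate.
    return ["ffmpeg", "-y", "-i", mp3_path, "-f", "s16le", "-acodec", "pcm_s16le",
            "-ac", "1", "-ar", str(sample_rate), pcm_path]


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 16000) -> float:
    # Mono s16le uses 2 bytes per sample.
    return num_bytes / (sample_rate * 2)


def run(cmd: List[str]) -> None:
    # Requires an ffmpeg binary on PATH; raises CalledProcessError on failure.
    subprocess.run(cmd, check=True, capture_output=True)
```

At these settings one second of audio is 32,000 bytes of PCM, which is a convenient way to sanity-check the size of each converted segment.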

Fig. 4 schematically shows a schematic diagram of performing conversion recognition on a transport stream file according to an embodiment of the present disclosure.

As shown in fig. 4, when the speech recognition main service 410 is started, an ffmpeg (an open-source program with audio conversion capabilities) processing service 420 and an ASR (automatic speech recognition) recognition service 430 may be started at the same time, and heartbeat detection may be used to check whether they are alive. If a failure is detected in at least one of the ffmpeg processing service 420 and the ASR recognition service 430, it is handled in time so that each service stays in a normal operating state, which effectively maintains the real-time performance of the entire live audio processing flow.

According to an embodiment of the present disclosure, the voice recognition main service 410 may obtain a combined TS file having a preset duration from the TS file pool 350, and then send the combined TS file to the ffmpeg processing service 420. The ffmpeg processing service 420 may convert the TS file into an mp3 file 421 and store the mp3 file 421. The ffmpeg processing service 420 may also convert the mp3 file into a pcm formatted voice file 422 and then send the pcm formatted voice file 422 to the ASR recognition service 430. The ASR recognition service 430 may recognize the pcm formatted speech file 422, obtain a text file 431, and store the text file 431.

Through the embodiment of the present disclosure, the transport stream file can be converted into an audio file and a text file that can each be audited, which allows the corresponding audit modes to be combined, so that live audio information is processed efficiently and in real time.

According to an embodiment of the present disclosure, the live audio processing method may further include: in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word, sending at least one of the audio file and the text file to a manual handling platform; and, in response to the manual handling platform returning a result of "not passed", sending a handling instruction to the live subject related to the live audio information.

Fig. 5 schematically shows a schematic diagram of a live audio review flow according to an embodiment of the disclosure.

As shown in fig. 5, the mp3 file 421 and the text file 431 converted from the transport stream file corresponding to the live audio information may be sent to the machine audit module 510 for auditing. The machine audit module 510 may apply a text-policy-based machine audit method and a voice-policy-based machine audit method. If the machine audit determines that the text file includes predetermined abnormal information, a handling instruction may be sent to at least one of the live room and the live user associated with the live audio information. If the machine audit determines that at least one of the mp3 file 421 and the text file 431 includes a predetermined sensitive word, the mp3 file 421 and the text file 431 may be sent to the manual audit module 520. The manual audit module 520 may fetch mp3 files 421 and text files 431 in batches for fast review and, when it determines that a handling instruction needs to be sent to at least one of the live room and the live user related to the live audio information, sends the handling instruction so that the live room and the live users in it are penalized.
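The review flow of fig. 5 amounts to a three-way routing decision. This sketch assumes that abnormal information takes precedence over sensitive words when both are present; that precedence, the `Decision` names, and the function names are illustrative assumptions rather than part of the disclosure.

```python
from enum import Enum
from typing import List


class Decision(Enum):
    PASS = "pass"
    DISPOSE = "dispose"       # send a handling instruction to the live room / live user
    MANUAL_REVIEW = "manual"  # queue the mp3 and text files for the manual audit module


def route_machine_audit(text_anomalies: List[str], sensitive_hits: List[str]) -> Decision:
    # Predetermined abnormal information in the text file: handle directly.
    if text_anomalies:
        return Decision.DISPOSE
    # Only sensitive words: a human reviewer makes the final call.
    if sensitive_hits:
        return Decision.MANUAL_REVIEW
    return Decision.PASS


def route_manual_result(passed: bool) -> Decision:
    # A "not passed" result from manual review leads to a handling instruction.
    return Decision.PASS if passed else Decision.DISPOSE
```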

According to embodiments of the present disclosure, the text policy may include a lexicon policy and a contact identification policy. The lexicon policy may determine, based on a preset lexicon, whether the live audio information includes predetermined abnormal information by checking whether words in the text file match words in the lexicon. For example, when a word in the text file hits a word in the lexicon, it may be determined that the live audio information includes predetermined abnormal information. The contact identification policy may check whether the text in the text file includes information related to user contact details, advertisements, and the like.
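The lexicon policy can be illustrated as substring search of each lexicon word in the transcript, recording where each hit occurs. This is a naive sketch for clarity: it scans the text once per word, so for large lexicons a multi-pattern matcher such as Aho-Corasick would be the usual replacement. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def lexicon_hits(text: str, lexicon: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (word, start_index) for every occurrence of every lexicon word, sorted by position."""
    hits: List[Tuple[str, int]] = []
    for word in lexicon:
        if not word:
            continue  # an empty entry would match everywhere
        start = text.find(word)
        while start != -1:
            hits.append((word, start))
            start = text.find(word, start + 1)  # +1 also finds overlapping occurrences
    return sorted(hits, key=lambda hit: hit[1])


def lexicon_policy_hit(text: str, lexicon: Iterable[str]) -> bool:
    # The policy "hits" when at least one lexicon word occurs in the transcript.
    return bool(lexicon_hits(text, lexicon))
```

Keeping hit positions, rather than a bare yes/no, lets a manual reviewer jump straight to the flagged part of the transcript.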

Through the above embodiment of the present disclosure, introducing a manual handling platform adds a manual review step, which can further improve the accuracy of the live audio processing result.

Fig. 6 schematically shows a block diagram of a live audio processing apparatus according to an embodiment of the present disclosure.

As shown in fig. 6, the live audio processing apparatus 600 includes an acquisition module 610, a conversion module 620, a determination module 630, and a first sending module 640.

The acquisition module 610 is configured to acquire, in real time, a transport stream file corresponding to the live audio information in response to completion of generation of the live audio information.

The conversion module 620 is configured to convert the transport stream file into an audio file and a text file.

The determination module 630 is configured to determine whether the audio file and the text file include predetermined abnormality information.

The first sending module 640 is configured to send a handling instruction to a live subject related to the live audio information if it is determined that at least one of the audio file and the text file includes predetermined abnormality information.
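One way to picture how modules 610 to 640 fit together is a small class whose collaborators are injected as callables. This is a hypothetical Python sketch of the architecture, not the apparatus itself. The class name, the callable signatures, and the use of a stream identifier to address the live subject are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LiveAudioProcessor:
    """Hypothetical wiring of modules 610-640, each injected as a callable."""

    acquire: Callable[[str], bytes]                      # 610: stream id -> TS file bytes
    convert: Callable[[bytes], Tuple[bytes, str]]        # 620: TS bytes -> (audio, text)
    detect: Callable[[bytes, str], Tuple[bool, bool]]    # 630: (audio flagged, text flagged)
    send_handling: Callable[[str], None]                 # 640: notify the live subject

    def on_audio_generated(self, stream_id: str) -> bool:
        """Run one pass; return True if a handling instruction was sent."""
        ts_file = self.acquire(stream_id)
        audio_file, text_file = self.convert(ts_file)
        audio_flagged, text_flagged = self.detect(audio_file, text_file)
        # Module 640 fires when at least one of the two files is flagged.
        if audio_flagged or text_flagged:
            self.send_handling(stream_id)
            return True
        return False
```

Injecting each module keeps the orchestration testable with stubs and lets the ASR engine or audit policies be swapped without touching the pipeline.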

According to an embodiment of the present disclosure, the acquisition module includes a first acquisition unit, a second acquisition unit, and a generation unit.

The first acquisition unit is configured to acquire, in response to completion of generation of the live audio information, a transport stream address used to request transport stream segment information related to the live audio information.

The second acquisition unit is configured to acquire the transport stream segment information according to the transport stream address.

The generation unit is configured to generate the transport stream file according to the transport stream segment information.
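The disclosure does not name a streaming protocol. A common arrangement, assumed here, is HLS: the transport stream address is the URL of an M3U8 media playlist, and the segment information is the list of MPEG-TS segment URIs it contains. The hypothetical sketch below resolves those URIs and concatenates the segments into one TS file. The `fetch` callable is injected; in practice it might be `lambda u: urllib.request.urlopen(u).read()`.

```python
from typing import Callable, List
from urllib.parse import urljoin


def parse_segment_uris(playlist_text: str, playlist_url: str) -> List[str]:
    """Return absolute URIs of the media segments listed in an HLS media playlist."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' are tags or comments;
        # every other line is a segment URI, possibly relative to the playlist URL.
        if line and not line.startswith("#"):
            uris.append(urljoin(playlist_url, line))
    return uris


def build_ts_file(playlist_url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Fetch the playlist (the transport stream address), then join its segments."""
    playlist = fetch(playlist_url).decode("utf-8")
    # MPEG-TS is a packetized format, so HLS segments can typically be byte-concatenated.
    return b"".join(fetch(uri) for uri in parse_segment_uris(playlist, playlist_url))
```

A live playlist is a sliding window. A real-time acquirer would poll it periodically and skip segments it has already fetched, which this sketch omits.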

According to an embodiment of the present disclosure, the conversion module includes a conversion unit and a speech recognition unit.

The conversion unit is configured to convert the transport stream file into a speech file in pulse code modulation (PCM) format.

The speech recognition unit is configured to perform speech recognition on the speech file to obtain the text file.
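The TS-to-PCM conversion is often done with FFmpeg, which is assumed here; the disclosure names no tool. The sketch below builds an FFmpeg command that extracts raw signed 16-bit little-endian PCM. The 16 kHz mono defaults are a common speech-recognition input format but are an assumption, and the function names are hypothetical. The recognition step itself is not shown because no ASR engine is specified.

```python
import subprocess
from typing import List


def ffmpeg_pcm_cmd(ts_path: str, pcm_path: str, sample_rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an ffmpeg command that extracts raw 16-bit little-endian PCM from a TS file."""
    return [
        "ffmpeg", "-y",
        "-i", ts_path,
        "-vn",                      # drop any video stream
        "-acodec", "pcm_s16le",     # signed 16-bit little-endian samples
        "-ac", str(channels),
        "-ar", str(sample_rate),
        "-f", "s16le",              # raw PCM output, no WAV header
        pcm_path,
    ]


def ts_to_pcm(ts_path: str, pcm_path: str, **kwargs) -> None:
    # Requires an ffmpeg binary on PATH; raises CalledProcessError on failure.
    subprocess.run(ffmpeg_pcm_cmd(ts_path, pcm_path, **kwargs), check=True)


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 16000, channels: int = 1,
                         bytes_per_sample: int = 2) -> float:
    """Duration of raw PCM data: bytes / (rate * channels * bytes per sample)."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)
```

At 16 kHz mono 16-bit, one second of speech is 32,000 bytes, so a 320,000-byte PCM file holds 10 seconds of audio.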

According to an embodiment of the present disclosure, the live audio processing apparatus further includes a second sending module and a third sending module.

The second sending module is configured to send at least one of the audio file and the text file to a manual handling platform if it is determined that at least one of the audio file and the text file includes a predetermined sensitive word.

The third sending module is configured to send a handling instruction to the live subject related to the live audio information in response to a handling result of the manual handling platform indicating a failed audit.
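The hand-off between the second and third sending modules and the manual handling platform can be sketched as a queue that reviewers drain in batches. This is a hypothetical in-memory stand-in written in Python. The `ReviewItem` fields, the batch API, and the boolean verdict are assumptions; a real platform would persist items and track reviewers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class ReviewItem:
    stream_id: str
    audio_path: str
    text: str


class ManualReviewQueue:
    """Hypothetical in-memory stand-in for the manual handling platform."""

    def __init__(self, send_handling: Callable[[str], None]):
        self._pending: Deque[ReviewItem] = deque()
        self._send_handling = send_handling

    def submit(self, item: ReviewItem) -> None:
        # Second sending module: escalate an item flagged for a sensitive word.
        self._pending.append(item)

    def take_batch(self, size: int) -> List[ReviewItem]:
        # Reviewers pull several items at once for fast auditing (FIFO order).
        batch = []
        while self._pending and len(batch) < size:
            batch.append(self._pending.popleft())
        return batch

    def record_verdict(self, item: ReviewItem, passed: bool) -> None:
        # Third sending module: a failed audit triggers a handling instruction.
        if not passed:
            self._send_handling(item.stream_id)
```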

According to an embodiment of the present disclosure, the live audio information includes a plurality of pieces of live audio information. The acquisition module is configured to acquire, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information in response to completion of generation of those pieces of live audio information.
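Concurrent acquisition for many live rooms can be sketched with `asyncio.gather` and a semaphore that caps how many fetches run at once. This is one possible Python approach; a thread pool would also work. The function name, the `acquire_one` coroutine signature, and the default cap of 8 are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def acquire_all(
    stream_ids: Iterable[str],
    acquire_one: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 8,
) -> Dict[str, bytes]:
    """Fetch TS files for many live streams concurrently, with a concurrency cap."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(stream_id: str):
        # The semaphore keeps at most max_concurrency fetches in flight.
        async with semaphore:
            return stream_id, await acquire_one(stream_id)

    pairs = await asyncio.gather(*(guarded(sid) for sid in stream_ids))
    return dict(pairs)
```

The cap prevents a burst of newly finished live rooms from overwhelming the CDN or the downstream ASR service.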

According to an embodiment of the present disclosure, the predetermined abnormality information includes information related to user contact information.

According to an embodiment of the present disclosure, the handling instruction includes at least one of a warning instruction, a mute instruction, and an account ban instruction.
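The three handling instructions can be modelled as an enum. The disclosure does not say how one is chosen. The escalation rule below (warn on a first violation, mute on a second, ban the account after that) is purely a hypothetical Python illustration, as are the names `HandlingInstruction` and `choose_instruction`.

```python
from enum import Enum


class HandlingInstruction(Enum):
    WARNING = "warning"          # warn the live user or live room
    MUTE = "mute"                # forbid the user from speaking
    ACCOUNT_BAN = "account_ban"  # ban (suspend) the account


def choose_instruction(prior_violations: int) -> HandlingInstruction:
    """Hypothetical escalation: warn first, then mute, then ban the account."""
    if prior_violations <= 0:
        return HandlingInstruction.WARNING
    if prior_violations == 1:
        return HandlingInstruction.MUTE
    return HandlingInstruction.ACCOUNT_BAN
```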

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the live audio processing method as described above.

According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the live audio processing method described above.

According to an embodiment of the present disclosure, a computer program product includes a computer program that, when executed by a processor, implements the live audio processing method described above.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the respective methods and processes described above, such as the live audio processing method. For example, in some embodiments, the live audio processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the live audio processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the live audio processing method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A live audio processing method, comprising:

acquiring, in real time and in response to completion of generation of live audio information, a transport stream file corresponding to the live audio information;

converting the transport stream file into an audio file and a text file;

determining whether the audio file and the text file include predetermined abnormality information;

in a case where it is determined that at least one of the audio file and the text file includes predetermined abnormality information, sending a handling instruction to a live subject related to the live audio information.

2. The method of claim 1, wherein the acquiring, in real time and in response to completion of generation of the live audio information, of a transport stream file corresponding to the live audio information comprises:

acquiring, in response to completion of generation of the live audio information, a transport stream address for requesting transport stream segment information related to the live audio information;

acquiring the transport stream segment information according to the transport stream address; and

generating the transport stream file according to the transport stream segment information.

3. The method of claim 1, wherein the converting the transport stream file into an audio file and a text file comprises:

converting the transport stream file into a speech file in a pulse code modulation format; and

performing speech recognition on the speech file to obtain the text file.

4. The method of claim 1, further comprising:

in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word, sending at least one of the audio file and the text file to a manual handling platform; and

in response to a handling result of the manual handling platform indicating a failed audit, sending a handling instruction to the live subject related to the live audio information.

5. The method of claim 1, wherein the live audio information comprises a plurality of pieces of live audio information; and

the acquiring, in real time and in response to completion of generation of the live audio information, of a transport stream file corresponding to the live audio information comprises:

acquiring, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information in response to completion of generation of the plurality of pieces of live audio information.

6. The method of any one of claims 1 to 5, wherein the predetermined abnormality information includes information related to user contact information.

7. The method of any one of claims 1 to 6, wherein the handling instruction includes at least one of a warning instruction, a mute instruction, and an account ban instruction.

8. A live audio processing apparatus comprising:

an acquisition module configured to acquire, in real time and in response to completion of generation of live audio information, a transport stream file corresponding to the live audio information;

a conversion module configured to convert the transport stream file into an audio file and a text file;

a determination module configured to determine whether the audio file and the text file include predetermined abnormality information; and

a first sending module configured to send a handling instruction to a live subject related to the live audio information in a case where it is determined that at least one of the audio file and the text file includes predetermined abnormality information.

9. The apparatus of claim 8, wherein the acquisition module comprises:

a first acquisition unit configured to acquire, in response to completion of generation of the live audio information, a transport stream address for requesting transport stream segment information related to the live audio information;

a second acquisition unit configured to acquire the transport stream segment information according to the transport stream address; and

a generation unit configured to generate the transport stream file according to the transport stream segment information.

10. The apparatus of claim 8, wherein the conversion module comprises:

a conversion unit configured to convert the transport stream file into a speech file in a pulse code modulation format; and

a speech recognition unit configured to perform speech recognition on the speech file to obtain the text file.

11. The apparatus of claim 8, further comprising:

a second sending module configured to send at least one of the audio file and the text file to a manual handling platform in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word; and

a third sending module configured to send a handling instruction to the live subject related to the live audio information in response to a handling result of the manual handling platform indicating a failed audit.

12. The apparatus of claim 8, wherein the live audio information comprises a plurality of pieces of live audio information; and

the acquisition module is configured to acquire, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information in response to completion of generation of the plurality of pieces of live audio information.

13. The apparatus of any one of claims 8 to 12, wherein the predetermined abnormality information includes information related to user contact information.

14. The apparatus of any one of claims 8 to 13, wherein the handling instruction includes at least one of a warning instruction, a mute instruction, and an account ban instruction.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.