CN113852835A - Live broadcast audio processing method and device, electronic equipment and storage medium - Google Patents

Live broadcast audio processing method and device, electronic equipment and storage medium

Publication number
CN113852835A
Authority
CN
China
Prior art keywords
file
live
information
transport stream
audio
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111111150.6A
Other languages
Chinese (zh)
Inventor
杜康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111111150.6A
Publication of CN113852835A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4882 Data services, e.g. news ticker for displaying messages, e.g. warnings, reminders

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a live audio processing method and apparatus, an electronic device, and a storage medium, and relates to the field of computer technologies, in particular to the field of speech technology. The specific implementation is as follows: in response to completion of generation of live audio information, acquiring, in real time, a transport stream file corresponding to the live audio information; converting the transport stream file into an audio file and a text file; determining whether the audio file and the text file include predetermined abnormal information; and, in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, sending a handling instruction to a live object related to the live audio information.

Description

Live broadcast audio processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a live audio processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, more and more people are paying attention to live webcasting, which takes two forms: live video and live audio. Live audio is a real-time audio playback technology. Like live video, it emphasizes real-time performance. The difference is that it provides only audio and has fewer image elements.
Disclosure of Invention
The disclosure provides a live audio processing method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a live audio processing method including: in response to completion of generation of live audio information, acquiring, in real time, a transport stream file corresponding to the live audio information; converting the transport stream file into an audio file and a text file; determining whether the audio file and the text file include predetermined abnormal information; and, in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, sending a handling instruction to a live object related to the live audio information.
According to another aspect of the present disclosure, there is provided a live audio processing apparatus including: an acquisition module configured to acquire, in real time, a transport stream file corresponding to live audio information in response to completion of generation of the live audio information; a conversion module configured to convert the transport stream file into an audio file and a text file; a determining module configured to determine whether the audio file and the text file include predetermined abnormal information; and a handling module configured to send a handling instruction to a live object related to the live audio information in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a live audio processing method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the live audio processing method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a live audio processing method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the live audio processing method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of a live audio processing method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a diagram of acquiring a transport stream file corresponding to live audio information in response to completion of generation of the live audio information, according to an embodiment of the present disclosure;
FIG. 4 schematically shows a diagram of performing conversion and recognition on a transport stream file, according to an embodiment of the present disclosure;
FIG. 5 schematically shows a diagram of a live audio review flow, according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a live audio processing apparatus according to an embodiment of the present disclosure; and
FIG. 7 shows a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Live audio is a voice-based social live broadcast with no visual picture. In this scenario, it is very important to audit the speech of anchors and audience members in a live room quickly, effectively, and in real time, and to identify pornographic or politically sensitive speech so that the corresponding penalties can be applied in real time.
To audit the speech of anchors or audience members in a live room, auditors can listen to the audio streams of many live rooms in real time and review them by ear. Alternatively, after the live broadcast ends, the speech can be transcribed with a general-purpose speech recognition tool and then audited.
In the process of realizing the concept of the present disclosure, the inventor found that having auditors listen to audio streams in real time is applicable only when there are few live rooms, so its range of application is limited. When many voice streams are live at the same time, a large amount of auditing manpower is consumed and the cost is high. Moreover, auditors become fatigued after listening for too long, which can lead to misjudgment. Performing speech recognition after the live broadcast ends and then sending the result to an auditing platform has poor timeliness, so abnormal information may already have been exposed.
In view of this, the embodiments of the present disclosure provide a scheme that can accurately assess the quality of the audio data in a live voice room in real time, so as to alleviate the above drawbacks.
Fig. 1 schematically shows an exemplary system architecture to which the live audio processing method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the live audio processing method and apparatus may be applied may include a terminal device, but the terminal device may implement the live audio processing method and apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a traditional physical host and a VPS service ("Virtual Private Server", or "VPS" for short). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be noted that the live audio processing method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the live audio processing apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
Alternatively, the live audio processing method provided by the embodiment of the present disclosure may also be generally performed by the server 105. Accordingly, the live audio processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The live audio processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the live audio processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when generating live audio, the terminal devices 101, 102, and 103 may acquire target content in an electronic book pointed to by the user's line of sight and send the acquired target content to the server 105. The server 105 analyzes the target content to determine feature information of the target content, predicts the content of interest to the user according to the feature information, and extracts the content of interest to the user. Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 may analyze the target content and extract the content of interest to the user.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a live audio processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, in response to the generation of the live audio information being completed, a transport stream file corresponding to the live audio information is acquired in real time.
The transport stream file is converted into an audio file and a text file in operation S220.
In operation S230, it is determined whether the audio file and the text file include predetermined abnormal information.
In operation S240, in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, a handling instruction is sent to a live object related to the live audio information.
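Operations S220 to S240 can be sketched as a single function applied to each transport stream file acquired in operation S210. This is a minimal sketch only: the names `Converted` and `process_live_audio`, the injected callables, and the fixed "warning" instruction are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Converted:
    audio: bytes   # e.g. audio data converted from the transport stream file
    text: str      # speech-recognition transcript of the same audio


def process_live_audio(
    ts_file: bytes,
    convert: Callable[[bytes], Converted],
    audio_is_abnormal: Callable[[bytes], bool],
    text_is_abnormal: Callable[[str], bool],
    live_object: str,
) -> Optional[dict]:
    """Run S220-S240 on one transport stream file acquired in S210."""
    converted = convert(ts_file)                        # S220: convert
    abnormal = (audio_is_abnormal(converted.audio)      # S230: check both files
                or text_is_abnormal(converted.text))
    if abnormal:                                        # S240: at least one hit
        # Hypothetical instruction payload; the real format is not specified.
        return {"target": live_object, "instruction": "warning"}
    return None
```

Passing the converter and the two checks as parameters keeps the sketch independent of any particular transcoder or recognition service.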
According to an embodiment of the present disclosure, the live audio information may include voice information uttered by a live user in at least one of a live video room, a live audio room, and the like. Whenever a live user utters a piece of voice information of any length, generation of the live audio information can be regarded as completed. The transport stream file is a file that represents the live audio information in a transport stream format. The transport stream format may include the TS (transport stream) format. With TS-based storage, the live audio information can be stored in segments as TS files, so that any segment can be decoded independently to obtain the live audio.
According to an embodiment of the present disclosure, the predetermined abnormal information may include various kinds of abnormal information that violate public order and good customs. The live object may include at least one of a live room and a live user. The handling instruction may include an instruction to impose at least one of a deduction of funds, a warning, a mute, an account ban, and the like on at least one of the live room and the live user that generated the abnormal information.
According to the embodiment of the present disclosure, when a live user in a live room utters voice information, the voice information can be audited in real time to determine whether it includes abnormal information that violates public order and good customs. If it does, a handling instruction can be issued in real time to at least one of the live user and the live room where the live user is located, so that handling operations such as a deduction of funds, a warning, a mute, or an account ban are applied to them.
Through the above embodiment of the present disclosure, live audio can be audited in real time while live audio is being played, and a disposition instruction can be sent to a live object generating the live audio in time to perform disposition under the condition that it is determined that the live audio includes predetermined abnormal information, so that the quality of audio data of the live audio can be controlled in time, and the outflow and diffusion of the abnormal information are reduced.
The method shown in fig. 2 is further described below with reference to specific embodiments.
According to an embodiment of the present disclosure, the predetermined abnormal information may include information related to a user's contact information, for example a mobile phone number, an address, or a name. The predetermined abnormal information may also include information related to advertisements, which may be determined, for example, by predefining advertisement words and then matching the audio information against them. The predetermined abnormal information may also include other information, which is not limited herein.
Through the embodiment of the present disclosure, the coverage of the predefined abnormal information can be widened, so that the audio data can be checked along more dimensions and the quality of the output audio is improved.
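As an illustration of matching a transcript against contact information and predefined advertisement words, consider the following minimal sketch. The 11-digit phone pattern, the word list, and the function name `find_abnormal_info` are assumptions made for the example; the disclosure does not specify the matching rules.

```python
import re
from typing import List

# Hypothetical pattern and word list, for illustration only.
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")   # an 11-digit number starting with 1
AD_WORDS = ("discount code", "add me on", "limited offer")


def find_abnormal_info(text: str) -> List[str]:
    """Return a label for each kind of predetermined abnormal information found."""
    found = []
    if PHONE_PATTERN.search(text):
        found.append("contact")
    lowered = text.lower()
    if any(word in lowered for word in AD_WORDS):
        found.append("advertisement")
    return found
```

The lookarounds in the pattern keep a longer digit string, such as an order number, from being reported as a phone number.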
According to an embodiment of the present disclosure, the handling instruction may include at least one of a warning instruction, a mute instruction, and an account-ban instruction. The handling instruction may also include other punitive instructions, which are not limited herein.
Through the above embodiment of the present disclosure, handling instructions are provided to handle the live object that produced live audio containing predetermined abnormal information, so that the audio data output by the live object can be effectively controlled and the quality of the output audio improved.
According to an embodiment of the present disclosure, the live audio information may include a plurality of pieces of live audio information. Acquiring, in real time, the transport stream file corresponding to the live audio information in response to completion of generation of the live audio information may include: in response to completion of generation of the plurality of pieces of live audio information, acquiring, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information.
According to embodiments of the present disclosure, multiple live users may be broadcasting at the same time. The pieces of voice information uttered by multiple live users at the same moment can be acquired in parallel and audited in real time. If one or more pieces of voice information include abnormal information that violates public order and good customs, a handling instruction is sent in real time to at least one of the live users who produced that voice information and the live rooms where they are located, so that handling operations such as a deduction of funds, a warning, a mute, or an account ban are applied to them.
Through the above embodiment of the present disclosure, a plurality of live audio information can be audited simultaneously, and the problem of audit error caused by fatigue during manual audit of a plurality of live audio can be effectively alleviated through the process of carrying out subsequent audit based on the transport stream file. In addition, the quality of the audio data of each live broadcast audio can be controlled in time, and the outflow and diffusion of any piece of abnormal information are reduced.
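Concurrent acquisition for several live streams could look like the following minimal sketch, which uses a thread pool from the Python standard library. The function name `acquire_concurrently`, the `fetch` callable, and the worker count are assumptions; the disclosure does not state which concurrency mechanism is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def acquire_concurrently(
    stream_ids: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch the transport stream file of every stream in parallel."""
    ids = list(stream_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        return dict(zip(ids, pool.map(fetch, ids)))
```

In practice `fetch` would perform network I/O, which is the kind of workload a thread pool handles well.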
According to an embodiment of the present disclosure, acquiring, in real time, the transport stream file corresponding to the live audio information in response to completion of generation of the live audio information may include: in response to completion of generation of the live audio information, acquiring a transport stream address for requesting transport stream fragment information related to the live audio information; acquiring the transport stream fragment information according to the transport stream address; and generating the transport stream file according to the transport stream fragment information.
According to the embodiment of the present disclosure, the transport stream fragment information may represent the TS files obtained by storing the live audio information in segments, the transport stream address may represent the address of such a TS file, and the transport stream file may be a TS file of a preset duration obtained by merging multiple pieces of transport stream fragment information sorted in time order.
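Merging time-ordered fragments into a file of a preset duration can be sketched as follows. The `Segment` fields and the function `merge_segments` are hypothetical, and the sketch assumes plain byte concatenation, which generally works for MPEG-TS segments of the same stream but is not confirmed by the disclosure as the method used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    sequence: int      # position of the segment in the stream
    duration: float    # length in seconds
    data: bytes        # raw TS bytes


def merge_segments(segments: List[Segment], preset_duration: float) -> Tuple[bytes, List[Segment]]:
    """Concatenate segments in sequence order until preset_duration is reached.

    Returns the merged bytes and the segments left over for the next file.
    """
    ordered = sorted(segments, key=lambda s: s.sequence)
    merged, total, used = bytearray(), 0.0, 0
    for seg in ordered:
        if total >= preset_duration:
            break
        merged += seg.data
        total += seg.duration
        used += 1
    return bytes(merged), ordered[used:]
```

Returning the leftover segments lets the caller start the next merged file without re-downloading anything.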
Fig. 3 schematically shows a schematic diagram of acquiring a transport stream file corresponding to live audio information in response to completion of generation of the live audio information according to an embodiment of the present disclosure.
As shown in FIG. 3, multiple live rooms 310, such as live room A and live room B in FIG. 3, may be open online at the same time. Each live room can host live broadcasts by multiple users. For example, user A1 and user A2 may broadcast in live room A, and user B1 and user B2 may broadcast in live room B. After a live room is opened, the live basic service module 320 may generate corresponding live room identification information room_id for it. After a live user logs in to a live room, the live basic service module 320 may generate corresponding live user identification information u_id for that user. In addition, the live basic service module 320 may generate an m3u8 (a file format) stream address, that is, an m3u8 url, for the live room and live user according to the room_id and the u_id, and store the m3u8 stream address in the m3u8 address pool 330. The m3u8 url may be expressed, for example, as xxxx. The pull stream parsing download service 340 can access the m3u8 url in real time and obtain a TS file address for requesting a TS file. The resulting TS file address may be stored in the TS file pool 350.
As shown in fig. 3, the pull stream parsing download service 340 may include a pull stream parsing main service 341, an m3u8 parsing service 342, and a TS download service 343. Under the condition that the pull stream analysis main service 341 is started, the m3u8 analysis service 342 and the TS download service 343 can be started at the same time, and whether the m3u8 analysis service 342 and the TS download service 343 are active can be detected in a heartbeat detection manner, so that timely processing is performed under the condition that at least one of the m3u8 analysis service 342 and the TS download service 343 is detected to be in a fault, the normal operation state of each service is maintained, and the real-time performance of the whole process of live audio processing is effectively maintained.
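Heartbeat detection of child services can be sketched as below: each service reports a heartbeat, and the main service treats any service whose last heartbeat is older than a timeout as failed. The class `HeartbeatMonitor`, the timeout value, and the service names are assumptions made for illustration; the disclosure does not describe the heartbeat protocol in detail.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Track the last heartbeat of each child service and report stale ones."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, service: str, now: Optional[float] = None) -> None:
        # Record a heartbeat; `now` can be injected to make tests deterministic.
        self.last_seen[service] = time.monotonic() if now is None else now

    def failed(self, now: Optional[float] = None) -> List[str]:
        # A service counts as failed when its last heartbeat is older than timeout.
        current = time.monotonic() if now is None else now
        return sorted(name for name, seen in self.last_seen.items()
                      if current - seen > self.timeout)
```

A main service could call `failed()` periodically and restart whichever services it returns.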
According to the embodiment of the present disclosure, the pull stream parsing main service 341 may obtain the m3u8 urls in the m3u8 address pool 330 in real time and concurrently, and obtain the corresponding m3u8 stream address files. An m3u8 stream address file may include the TS file addresses of the corresponding TS files. In a case where it is determined that the current user is on the microphone or online, the m3u8 stream address file may be pushed to the m3u8 parsing service 342. The m3u8 parsing service 342 may parse the m3u8 stream address file to obtain the TS file address for requesting the corresponding TS file. The parsed TS file address may be pushed to the TS download service 343. The TS download service 343 may invoke multiple protocols to download TS files in parallel and store them in the TS file pool 350.
According to the embodiment of the present disclosure, the transport stream address is acquired in response to completion of generation of the live audio information, the transport stream fragment information is acquired according to the transport stream address, and the transport stream file is generated from the transport stream fragment information. This effectively improves the real-time performance of acquiring the transport stream file and of live audio processing as a whole, so that the quality of the audio data of the live audio is controlled in time and the outflow and spread of abnormal information are reduced.
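Parsing an m3u8 stream address file to obtain TS file addresses can be sketched as follows. In an m3u8 playlist, lines beginning with `#` are tags or comments and the remaining non-empty lines are segment addresses, which may be relative to the playlist url. The function name `parse_ts_addresses` and the example urls are assumptions for illustration.

```python
from typing import List
from urllib.parse import urljoin


def parse_ts_addresses(playlist_text: str, playlist_url: str) -> List[str]:
    """Return absolute addresses of the media segments listed in an m3u8 playlist."""
    addresses = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blank lines and tags/comments
            continue
        # Relative segment addresses are resolved against the playlist url.
        addresses.append(urljoin(playlist_url, line))
    return addresses
```

For a live stream the playlist changes over time, so a caller would typically re-fetch it and skip addresses it has already seen.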
According to an embodiment of the present disclosure, converting the transport stream file into the audio file and the text file includes: converting the transport stream file into a voice file in pulse code modulation format; and performing speech recognition on the voice file to obtain the text file.
According to an embodiment of the present disclosure, converting the transport stream file into the audio file may include at least one of converting the TS file into an mp3 (an audio format) file and converting the TS file into a pcm (pulse code modulation) file. Converting the TS file into a pcm file may include converting the TS file into an mp3 file and then converting the mp3 file into a pcm file. Converting the transport stream file into a text file may include converting the pcm file into a text file.
Fig. 4 schematically shows a schematic diagram of performing conversion recognition on a transport stream file according to an embodiment of the present disclosure.
As shown in fig. 4, in the case of starting the speech recognition main service 410, ffmpeg (open source program with audio conversion function) processing service 420 and ASR (automatic speech recognition technology) recognition service 430 may be started at the same time, and whether ffmpeg processing service 420 and ASR recognition service 430 are active may be detected by means of heartbeat detection, so as to process in time in the case of detecting that at least one of ffmpeg processing service 420 and ASR recognition service 430 is failed, and maintain the normal operation state of each service, thereby effectively maintaining the real-time performance of the whole process of live audio processing.
According to an embodiment of the present disclosure, the speech recognition main service 410 may obtain a combined TS file having a preset duration from the TS file pool 350, and then send the combined TS file to the ffmpeg processing service 420. The ffmpeg processing service 420 may convert the TS file into an mp3 file 421 and store the mp3 file 421. The ffmpeg processing service 420 may also convert the mp3 file into a pcm-formatted voice file 422 and then send the pcm-formatted voice file 422 to the ASR recognition service 430. The ASR recognition service 430 may recognize the pcm-formatted voice file 422, obtain a text file 431, and store the text file 431.
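Obtaining a combined TS file of a preset duration from the pool might look like the sketch below. The `TsSegment` type, the list-based pool, and the rule of waiting until enough audio is buffered are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TsSegment:
    data: bytes
    duration_s: float


def take_combined_segment(pool: List[TsSegment], preset_s: float) -> Tuple[bytes, float]:
    """Pop segments from the front of the pool until preset_s is reached.

    Returns (combined bytes, combined duration). If the pool holds less than
    preset_s of audio, nothing is removed and (b"", 0.0) is returned.
    """
    total = 0.0
    count = 0
    for segment in pool:
        total += segment.duration_s
        count += 1
        if total >= preset_s:
            break
    else:
        # Loop finished without reaching preset_s (or the pool was empty).
        return b"", 0.0
    taken = pool[:count]
    del pool[:count]
    return b"".join(s.data for s in taken), total
```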
Through the embodiments of the present disclosure, the transport stream file can be converted into an audio file and a text file that can be audited. Combined with the corresponding auditing methods, the live audio information can be processed efficiently, and the real-time performance of the processing is improved.
According to an embodiment of the present disclosure, the live audio processing method may further include: in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word, sending at least one of the audio file and the text file to a manual handling platform; and in response to a handling result of the manual handling platform indicating that the review is not passed, sending a handling instruction to a live object related to the live audio information.
Fig. 5 schematically shows a schematic diagram of a live audio review flow according to an embodiment of the disclosure.
As shown in fig. 5, the mp3 file 421 and the text file 431 converted from the transport stream file corresponding to the live audio information may be sent to the machine auditing module 510 for auditing. The machine auditing module 510 may include a machine auditing method based on a text policy and a machine auditing method based on a voice policy. In a case where it is determined according to the machine auditing method that the text file includes predetermined abnormal information, a handling instruction may be sent to at least one of a live room and a live user related to the live audio information. In a case where it is determined according to the machine auditing method that at least one of the mp3 file 421 and the text file 431 includes a predetermined sensitive word, the mp3 file 421 and the text file 431 may be sent to the manual auditing module 520 for auditing. The manual auditing module 520 may obtain mp3 files 421 and text files 431 in batches for fast auditing. In a case where the manual audit determines that a handling instruction needs to be sent to at least one of the live room and the live user related to the live audio information, the handling instruction is sent, and the live room and the live users in the live room are penalized.
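The review flow of fig. 5 can be summarized as a small routing function. The names below are hypothetical, and checking abnormal information before sensitive words is an assumption of this sketch; the disclosure does not state a precedence between the two checks.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DISPOSE = "send handling instruction"
    MANUAL_REVIEW = "send to manual review"
    PASS = "no action"


@dataclass
class MachineAuditResult:
    has_abnormal_info: bool    # predetermined abnormal information found
    has_sensitive_word: bool   # predetermined sensitive word found


def route_after_machine_audit(result: MachineAuditResult) -> Route:
    """Decide the next step after machine review."""
    if result.has_abnormal_info:
        return Route.DISPOSE
    if result.has_sensitive_word:
        return Route.MANUAL_REVIEW
    return Route.PASS


def route_after_manual_review(passed: bool) -> Route:
    """A failed manual review leads to a handling instruction."""
    return Route.PASS if passed else Route.DISPOSE
```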
According to embodiments of the present disclosure, the text policy may include a vocabulary policy and a contact identification policy. Based on a preset vocabulary library, the vocabulary policy may determine whether the live audio information includes predetermined abnormal information by checking whether the text information in the text file matches word information in the vocabulary library. For example, in a case where the text information in the text file hits word information in the vocabulary library, it may be determined that the live audio information includes predetermined abnormal information. The contact identification policy may check whether the text information in the text file includes information related to a user's contact information, advertisements, and the like.
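A minimal sketch of the two text policies follows. The substring match against the vocabulary library and the digit-run pattern used to spot contact information are illustrative assumptions only; a production rule set would be considerably richer.

```python
import re
from typing import Iterable, List

# Assumed pattern: a standalone run of 7 to 15 digits is treated as a phone-like contact.
CONTACT_PATTERN = re.compile(r"(?<!\d)\d{7,15}(?!\d)")


def vocabulary_hits(text: str, vocabulary: Iterable[str]) -> List[str]:
    """Return vocabulary entries that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [word for word in vocabulary if word and word.lower() in lowered]


def contact_hits(text: str) -> List[str]:
    """Return digit runs that look like contact numbers."""
    return CONTACT_PATTERN.findall(text)


def includes_abnormal_info(text: str, vocabulary: Iterable[str]) -> bool:
    """True if either the vocabulary policy or the contact policy is hit."""
    return bool(vocabulary_hits(text, vocabulary)) or bool(contact_hits(text))
```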
Through the above embodiments of the present disclosure, a manual handling platform is introduced and a manual review step is added, which can further improve the accuracy of the live audio processing result.
Fig. 6 schematically shows a block diagram of a live audio processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the live audio processing apparatus 600 includes an acquisition module 610, a conversion module 620, a determination module 630, and a first transmission module 640.
The acquisition module 610 is configured to acquire, in real time, a transport stream file corresponding to the live audio information in response to completion of generation of the live audio information.
The conversion module 620 is configured to convert the transport stream file into an audio file and a text file.
The determination module 630 is configured to determine whether the audio file and the text file include predetermined abnormal information.
The first sending module 640 is configured to send a handling instruction to a live object related to the live audio information in a case where it is determined that at least one of the audio file and the text file includes predetermined abnormal information.
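The cooperation of the four modules can be pictured as a simple pipeline. In this sketch each module is represented by a callable supplied by the caller; the `LiveAudioProcessor` class and its field names are hypothetical and only mirror modules 610 to 640.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LiveAudioProcessor:
    acquire: Callable[[str], bytes]                  # acquisition module 610
    convert: Callable[[bytes], Tuple[bytes, str]]    # conversion module 620
    is_abnormal: Callable[[bytes, str], bool]        # determination module 630
    send_instruction: Callable[[str], None]          # first sending module 640

    def process(self, live_id: str) -> bool:
        """Run one pass of the pipeline; return True if an instruction was sent."""
        ts_file = self.acquire(live_id)
        audio_file, text_file = self.convert(ts_file)
        abnormal = self.is_abnormal(audio_file, text_file)
        if abnormal:
            self.send_instruction(live_id)
        return abnormal
```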
According to an embodiment of the present disclosure, the acquisition module includes a first acquisition unit, a second acquisition unit, and a generation unit.
The first acquisition unit is configured to acquire, in response to completion of generation of the live audio information, a transport stream address for requesting transport stream fragment information related to the live audio information.
The second acquisition unit is configured to acquire the transport stream fragment information according to the transport stream address.
The generation unit is configured to generate the transport stream file according to the transport stream fragment information.
According to an embodiment of the present disclosure, the conversion module includes a conversion unit and a voice recognition unit.
The conversion unit is configured to convert the transport stream file into a voice file in a pulse code modulation format.
The voice recognition unit is configured to perform voice recognition on the voice file to obtain the text file.
According to an embodiment of the present disclosure, the live audio processing apparatus further includes a second sending module and a third sending module.
The second sending module is configured to send at least one of the audio file and the text file to the manual handling platform in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word.
The third sending module is configured to send a handling instruction to the live object related to the live audio information in response to a handling result of the manual handling platform indicating that the review is not passed.
According to an embodiment of the present disclosure, the live audio information includes a plurality of pieces of live audio information. The acquisition module is configured to acquire, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information in response to completion of generation of the plurality of pieces of live audio information.
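Concurrent acquisition for several live streams can be sketched with a thread pool. The function name, the `acquire` callable, and the worker count are assumptions; the disclosure does not specify the concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def acquire_concurrently(
    stream_ids: Iterable[str],
    acquire: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Acquire the transport stream file of every stream in parallel."""
    ids = list(stream_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with ids.
        results = list(pool.map(acquire, ids))
    return dict(zip(ids, results))
```

Threads suit this step because fetching stream fragments is dominated by network waiting rather than computation.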
According to an embodiment of the present disclosure, the predetermined abnormal information includes information related to a user's contact information.
According to an embodiment of the present disclosure, the handling instruction includes at least one of a warning instruction, a mute instruction, and an account ban instruction.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the live audio processing method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the live audio processing method described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the live audio processing method described above.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the respective methods and processes described above, such as the live audio processing method. For example, in some embodiments, the live audio processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the live audio processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the live audio processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A live audio processing method, comprising:
in response to completion of generation of live audio information, acquiring, in real time, a transport stream file corresponding to the live audio information;
converting the transport stream file into an audio file and a text file;
determining whether the audio file and the text file include predetermined abnormal information; and
in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information, sending a handling instruction to a live object related to the live audio information.
2. The method of claim 1, wherein the acquiring, in real time, a transport stream file corresponding to the live audio information in response to completion of generation of the live audio information comprises:
in response to completion of generation of the live audio information, acquiring a transport stream address for requesting transport stream fragment information related to the live audio information;
acquiring the transport stream fragment information according to the transport stream address; and
generating the transport stream file according to the transport stream fragment information.
3. The method of claim 1, wherein the converting the transport stream file into an audio file and a text file comprises:
converting the transport stream file into a voice file in a pulse code modulation format; and
performing voice recognition on the voice file to obtain the text file.
4. The method of claim 1, further comprising:
in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word, sending at least one of the audio file and the text file to a manual handling platform; and
in response to a handling result of the manual handling platform indicating that the review is not passed, sending a handling instruction to the live object related to the live audio information.
5. The method of claim 1, wherein the live audio information comprises a plurality of pieces of live audio information;
the acquiring, in real time, a transport stream file corresponding to the live audio information in response to completion of generation of the live audio information comprises:
in response to completion of generation of the plurality of pieces of live audio information, acquiring, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information.
6. The method of any one of claims 1 to 5, wherein the predetermined abnormal information includes information related to a user's contact information.
7. The method of any one of claims 1 to 6, wherein the handling instruction includes at least one of a warning instruction, a mute instruction, and an account ban instruction.
8. A live audio processing apparatus comprising:
an acquisition module, configured to acquire, in real time, a transport stream file corresponding to live audio information in response to completion of generation of the live audio information;
a conversion module, configured to convert the transport stream file into an audio file and a text file;
a determination module, configured to determine whether the audio file and the text file include predetermined abnormal information; and
a first sending module, configured to send a handling instruction to a live object related to the live audio information in a case where it is determined that at least one of the audio file and the text file includes the predetermined abnormal information.
9. The apparatus of claim 8, wherein the acquisition module comprises:
a first acquisition unit, configured to acquire, in response to completion of generation of the live audio information, a transport stream address for requesting transport stream fragment information related to the live audio information;
a second acquisition unit, configured to acquire the transport stream fragment information according to the transport stream address; and
a generation unit, configured to generate the transport stream file according to the transport stream fragment information.
10. The apparatus of claim 8, wherein the conversion module comprises:
a conversion unit, configured to convert the transport stream file into a voice file in a pulse code modulation format; and
a voice recognition unit, configured to perform voice recognition on the voice file to obtain the text file.
11. The apparatus of claim 8, further comprising:
a second sending module, configured to send at least one of the audio file and the text file to a manual handling platform in a case where it is determined that at least one of the audio file and the text file includes a predetermined sensitive word; and
a third sending module, configured to send a handling instruction to the live object related to the live audio information in response to a handling result of the manual handling platform indicating that the review is not passed.
12. The apparatus of claim 8, wherein the live audio information comprises a plurality of pieces of live audio information;
the acquisition module is configured to acquire, in real time and concurrently, the transport stream files corresponding to the plurality of pieces of live audio information in response to completion of generation of the plurality of pieces of live audio information.
13. The apparatus according to any one of claims 8 to 12, wherein the predetermined abnormal information includes information related to a user's contact information.
14. The apparatus of any one of claims 8 to 13, wherein the handling instruction comprises at least one of a warning instruction, a mute instruction, and an account ban instruction.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202111111150.6A 2021-09-22 2021-09-22 Live broadcast audio processing method and device, electronic equipment and storage medium Pending CN113852835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111111150.6A CN113852835A (en) 2021-09-22 2021-09-22 Live broadcast audio processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113852835A true CN113852835A (en) 2021-12-28

Family

ID=78979086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111111150.6A Pending CN113852835A (en) 2021-09-22 2021-09-22 Live broadcast audio processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113852835A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150633A (en) * 2022-06-30 2022-10-04 广州方硅信息技术有限公司 Processing method for live broadcast reading, computer equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933330A (en) * 2016-06-13 2016-09-07 武汉斗鱼网络科技有限公司 Sticky method and device based on live broadcast bullet screen controller
CN108881937A (en) * 2018-07-31 2018-11-23 成都华栖云科技有限公司 A kind of intelligent identification Method included based on live stream and system
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
US20190327183A1 (en) * 2018-04-20 2019-10-24 International Business Machines Corporation Live video anomaly detection
CN111010614A (en) * 2019-12-26 2020-04-14 北京奇艺世纪科技有限公司 Method, device, server and medium for displaying live caption
CN111464819A (en) * 2020-03-30 2020-07-28 腾讯音乐娱乐科技(深圳)有限公司 Live image detection method, device, equipment and storage medium
CN112653904A (en) * 2020-12-16 2021-04-13 杭州当虹科技股份有限公司 Rapid video clipping method based on PTS and DTS modification
CN112860939A (en) * 2021-02-19 2021-05-28 北京百度网讯科技有限公司 Audio and video data processing method, device, equipment and storage medium
CN113038153A (en) * 2021-02-26 2021-06-25 深圳道乐科技有限公司 Financial live broadcast violation detection method, device and equipment and readable storage medium
CN113301373A (en) * 2021-05-21 2021-08-24 山东新一代信息产业技术研究院有限公司 Method and system for realizing live video broadcasting and playback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination