CN113038153B - Financial live broadcast violation detection method, device, equipment and readable storage medium - Google Patents

Financial live broadcast violation detection method, device, equipment and readable storage medium

Info

Publication number
CN113038153B
Authority
CN
China
Prior art keywords
video
violation
audio
financial
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110222936.9A
Other languages
Chinese (zh)
Other versions
CN113038153A (en)
Inventor
蔡树彬
卢良楷
林旭恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Daole Technology Co ltd
Original Assignee
Shenzhen Daole Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Daole Technology Co ltd
Priority to CN202110222936.9A
Publication of CN113038153A
Application granted
Publication of CN113038153B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a financial live broadcast violation detection method, device, equipment and readable storage medium. The method comprises the following steps: acquiring an audio and video file of a financial live program; determining the violation type of the text information in the audio and video file; and calculating a violation score for the financial live program according to the violation type, thereby achieving intelligent detection of the violation risk of the financial live program.

Description

Financial live broadcast violation detection method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of video detection, and in particular, to a method, apparatus, device, and readable storage medium for detecting a financial live broadcast violation.
Background
In the prior art, a video platform must review each video with its own auditing system before the video is published. The audited content mainly covers administrative content, pornography, abuse, terrorism and the like. Such auditing systems, however, mainly check the video pictures for pornography, violence and similar content using image processing technology and do not analyze the speech in the video. They are therefore unsuitable for auditing financial live broadcast video, and the prior art lacks an auditing method for financial live broadcasts.
Disclosure of Invention
The main object of the invention is to provide a method, a device, equipment and a computer readable storage medium for detecting violations in financial live broadcasts, so that financial live programs can be audited for violations. The financial live broadcast violation detection method comprises the following steps:
acquiring an audio and video file of a financial live program;
determining the violation type of the text information in the audio and video file;
and calculating the violation score of the financial live program according to the violation type.
In one embodiment, the step of obtaining an audio/video file of a live financial program includes:
acquiring a live link of a financial live program by utilizing a crawler tool;
acquiring an m3u8 file corresponding to the live link by using a packet capture tool;
and extracting a plurality of video ts slices from the m3u8 file, and merging the video ts slices to obtain an audio and video file.
In one embodiment, the audio and video file includes an audio portion and a video portion, and the step of determining the violation type of the text information in the audio and video file includes:
performing voice recognition on the audio portion of the audio and video file to obtain text information;
segmenting the video portion of the audio and video file to obtain video clips;
and performing violation risk identification on the text information and on the text in the video clips to obtain the violation type.
In one embodiment, the step of segmenting the video portion in the audio-video file to obtain a video segment includes:
acquiring the complete sentence in the text information;
and segmenting the video part in the audio and video file according to the complete sentence to obtain a video segment corresponding to the complete sentence.
In one embodiment, the step of performing violation risk identification on the text information and on the text in the video clip to obtain the violation type further includes:
inputting the text information into a text classification model, one complete sentence at a time, to obtain the violation type corresponding to each complete sentence;
performing character recognition on each video image in the video clip, and de-duplicating the recognition results to obtain the text in the video clip;
and acquiring the violation type corresponding to the text in the video clip based on a preset violation keyword library.
In one embodiment, before the step of performing text recognition on each video image in the video clip, the method further includes:
extracting video frames from the video clip at a preset interval duration to obtain video images.
In one embodiment, the step of calculating the violation score of the financial live program according to the violation type comprises:
obtaining a degree value of each violation type and the number of times that the violation type appears in the complete sentence or the video clip corresponding to the complete sentence;
substituting the degree values and the numbers of occurrences into a risk calculation formula to obtain the violation score of the financial live program.
In addition, in order to achieve the above object, the present invention further provides a financial live broadcast violation detection device, including:
an acquisition module for acquiring an audio and video file of a financial live program;
a determining module for determining the violation type of the text information in the audio and video file;
and a calculation module for calculating the violation score of the financial live program according to the violation type.
In addition, in order to achieve the above object, the present invention also provides a financial live broadcast violation detection device, which includes a memory, a processor, and a financial live broadcast violation detection program stored on the memory and executable on the processor, wherein the financial live broadcast violation detection program when executed by the processor implements the steps of the financial live broadcast violation detection method as described above.
In addition, in order to achieve the above object, the present invention further provides a computer readable storage medium on which a financial live broadcast violation detection program is stored; when executed by a processor, the program implements the steps of the financial live broadcast violation detection method described above.
According to the invention, the audio and video file of the financial live program is acquired, the violation type of the text information in the audio and video file is determined, and the violation score of the financial live program is calculated according to the violation type, so that the violation risk of the financial live program is detected intelligently.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of a terminal for implementing various embodiments of the present invention;
FIG. 2 is a flow chart of an embodiment of a method for detecting financial live broadcast violations of the present invention;
FIG. 3 is a schematic view of the framework of the invention.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a terminal, and referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operation environment related to an embodiment of the invention.
It should be noted that FIG. 1 may be a schematic structural diagram of the hardware operating environment of a terminal. The terminal of the embodiment of the invention may be a hardware device such as a PC (Personal Computer), a portable computer or a server.
As shown in FIG. 1, the terminal includes a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004 and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.
Optionally, the terminal may also include RF (Radio Frequency) circuitry, sensors, a Wi-Fi module, and the like.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer readable storage medium, may include an operating system, a network communication module, a user interface module and a financial live broadcast violation detection program. The operating system is a program that manages and controls the hardware and software resources of the terminal and supports the running of the financial live broadcast violation detection program and other software or programs.
The terminal shown in FIG. 1 may be used to audit a financial live program for violations. The user interface 1003 is mainly used to receive or output information, such as receiving an audio and video file as input and outputting a violation score; the network interface 1004 is mainly used to communicate with a background server; and the processor 1001 may be configured to invoke the financial live broadcast violation detection program stored in the memory 1005 and perform the following operations:
acquiring an audio and video file of a financial live program;
determining the violation type of the text information in the audio and video file;
and calculating the violation score of the financial live program according to the violation type.
According to the invention, the audio and video file of the financial live program is acquired, the violation type of the text information in the audio and video file is determined, and the violation score of the financial live program is calculated according to the violation type, so that the violation risk of the financial live program is detected intelligently.
The specific implementation of the terminal is basically the same as the embodiments of the financial live broadcast violation detection method below, and is not repeated here.
Based on the structure, the embodiment of the financial live broadcast violation detection method is provided.
The invention provides a financial live broadcast violation detection method.
Referring to fig. 2, fig. 2 is a flow chart of an embodiment of the method for detecting a financial live broadcast violation according to the present invention.
In the present embodiment, an embodiment of a financial live violation detection method is provided, and it should be noted that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order different from that herein.
In this embodiment, the method for detecting the violation of the live financial broadcast includes:
step S10, acquiring an audio and video file of a financial live program;
At present, the concept of finance is becoming increasingly popular and has given rise to many financial live programs. Unscrupulous merchants may, for various purposes, add exaggerated wording and non-compliant video content to such programs, so an auditing method is needed to strengthen the management of financial live programs. An audit is only meaningful if its result is available before the program is broadcast. Therefore, the audio and video file of the financial live program is acquired before the program is played, and the program is allowed to be played only after the audio and video file has passed the audit.
In some embodiments, step S10 includes:
step a, acquiring live links of financial live programs by utilizing a crawler tool;
A crawler tool is used to crawl the financial live broadcast web pages and obtain the live links of the financial live programs. When there are multiple live links, they form a live link list, which is updated automatically every day by a script.
To circumvent the anti-crawling measures of websites, crawling is performed using Scrapy in combination with the Selenium automated testing tool. Selenium is an automated testing tool that can simulate a user's operations in a browser. Here, Selenium is embedded into Scrapy as the web page parser: after a request is sent through Scrapy, Selenium performs the page parsing and page operations and returns the parsing result to the Scrapy framework, which then carries out subsequent operations such as data processing and storage.
FIG. 3 shows the Scrapy framework. The Scrapy Engine is responsible for communication among the four components Scheduler, Item Pipeline, Downloader and Spiders. The Scheduler stores the URLs (request information) sent by the Scrapy Engine and hands them back to the Scrapy Engine in order so that the requests can be issued. The Downloader downloads the requested URLs and returns the downloaded web pages to the Scrapy Engine. The Downloader Middlewares (download middleware module) can apply countermeasures against anti-crawling, such as IP proxies or wrapped request headers. The Spider Middlewares (crawler middleware) allow code to be added to process the responses sent to the Spiders and the items and requests generated by the Spiders. The Spiders act as the parser: they receive the Responses sent by the Scrapy Engine and parse them according to the parsing rules written in them, send the parsed content to the Scrapy Engine as a storage request, and send any new URLs found during parsing to the Scrapy Engine as Requests. The Item Pipeline stores the data: the storage structure can be defined in the Item, which simplifies data storage, and the data stream can be processed in the Pipeline, for example by passing the collected data through several processing steps before storing it.
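The link-list maintenance in step a can be illustrated without Scrapy or Selenium. The following is a minimal sketch using only the Python standard library; the `/live/` URL marker, the function names and the example domain are hypothetical and not taken from the patent, and a real crawler would obtain the HTML through the Scrapy and Selenium setup described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LiveLinkParser(HTMLParser):
    """Collect href values of <a> tags whose URL contains a marker substring."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # hypothetical marker that identifies live rooms
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_live_links(html, base_url, marker="/live/"):
    """Return the live links found in one crawled page."""
    parser = LiveLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links


def update_link_list(existing, found):
    """Merge newly crawled links into the list, keeping order and dropping duplicates."""
    seen = set(existing)
    merged = list(existing)
    for link in found:
        if link not in seen:
            seen.add(link)
            merged.append(link)
    return merged
```

A daily script could call `extract_live_links` on each crawled page and pass the result to `update_link_list` together with the stored list.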
Step b, acquiring an m3u8 file corresponding to the live link by using a packet capturing tool;
This embodiment provides two packet capture tools for obtaining the m3u8 file corresponding to a live link. The first is Packet Capture, an app for sniffing HTTP/HTTPS network traffic on the Android platform, which decrypts SSL based on the VpnService API provided by Android. First, the global packet capture mode of Packet Capture is turned on, and then a live link from the live link list is opened. After the live broadcast begins, the characteristic traffic of the financial live program can be found in Packet Capture, and the SSL data of the program can then be located. The SSL data contains the real address of the m3u8 file, and the complete m3u8 file can be obtained by combining the parameters after the GET field with the Host.
The second is Wireshark, a packet capture tool that can analyze the underlying protocols. First, the hotspot function of the wireless network card is turned on and the mobile device is connected to it. Once connected, the mobile device opens the financial live broadcast page and plays the video. Entering http contains ".m3u8" in the Wireshark filter then yields the complete m3u8 file.
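Combining the GET field with the Host, as described for Packet Capture, amounts to rebuilding a URL from a captured HTTP request. The sketch below assumes the decrypted request is available as plain text; the function name, the default `https` scheme and the sample host are illustrative assumptions, not details from the patent.

```python
def m3u8_url_from_request(raw_request, scheme="https"):
    """Rebuild the m3u8 address from the request line and the Host header.

    Returns None when the request is not a GET for an .m3u8 resource
    or when no Host header is present.
    """
    lines = raw_request.replace("\r\n", "\n").split("\n")
    if not lines[0].startswith("GET "):
        return None
    parts = lines[0].split(" ")
    if len(parts) < 2 or ".m3u8" not in parts[1]:
        return None
    path = parts[1]  # path plus query string, i.e. the part after GET
    for line in lines[1:]:
        if line == "":
            break  # blank line marks the end of the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return f"{scheme}://{value.strip()}{path}"
    return None
```

The same helper would apply to a request exported from Wireshark, since only the request line and the Host header are used.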
Step c, extracting a plurality of video ts slices from the m3u8 file, and merging the video ts slices to obtain an audio and video file.
The content of the obtained m3u8 file is parsed to obtain the corresponding video ts slices, which are merged using the ffmpeg library and converted into mp4 format to obtain the audio and video file.
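Step c can be sketched as follows. The patent does not say how ffmpeg is invoked, so the use of ffmpeg's concat demuxer with stream copy is an assumption, as are the function names and file names. The sketch only parses the playlist and builds the command; downloading the slices and running the command are left out.

```python
from urllib.parse import urljoin


def ts_segments(m3u8_text, playlist_url):
    """Return absolute URLs of the ts slices listed in a media playlist.

    Lines starting with '#' are tags or comments; every other non-empty
    line is a segment URI, possibly relative to the playlist URL.
    """
    segments = []
    for line in m3u8_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        segments.append(urljoin(playlist_url, line))
    return segments


def concat_list(local_paths):
    """Build the text of a list file for ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in local_paths)


def ffmpeg_merge_command(list_file, output_mp4):
    """Command that merges the listed slices into one mp4 without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_mp4]
```

Once the slices are downloaded and the list file is written, the command could be run with `subprocess.run(ffmpeg_merge_command("list.txt", "program.mp4"), check=True)`.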
Step S20, determining the violation type of the text information in the audio and video file;
The audio and video file contains audio information and video information. Converting the audio information into text yields text information, and the video picture may contain content such as bullet comments (barrage) and comments, so the video information may also contain text.
The violation risk of a financial live program is reflected in its text. Violation detection on the audio and video file therefore means detecting violations in the text contained in the file and determining the violation type of that text. It should be noted that one piece of text may correspond to several different violation types.
Specifically, step S20 further includes:
step d, performing voice recognition on the audio part in the audio-video file to obtain text information;
The audio portion of the audio and video file is extracted and voice recognition is performed on it to obtain the corresponding text information, that is, information composed of characters.
Step e, segmenting the video portion of the audio and video file to obtain video clips;
The video portion of the audio and video file is extracted and then segmented. The specific segmentation steps are: acquiring the complete sentences in the text information; and segmenting the video portion of the audio and video file according to the complete sentences to obtain the video clip corresponding to each complete sentence.
The text information contains a plurality of complete sentences. Because audio and video play simultaneously during playback, each complete sentence in the audio corresponds to a video clip, and this embodiment obtains the video clip corresponding to each complete sentence in the text information.
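The sentence-based segmentation can be sketched as below. The patent does not describe the output format of the voice recognition step, so this sketch assumes it yields text fragments with start and end times in seconds and that sentences end at common Chinese or English end punctuation; both are assumptions made for illustration.

```python
SENTENCE_END = "。！？.!?"  # assumed sentence-ending punctuation


def sentence_spans(fragments):
    """Group timestamped fragments into complete sentences.

    fragments: list of (text, start_sec, end_sec) in playback order.
    Returns a list of (sentence, start_sec, end_sec); each time range
    is the video clip that corresponds to that sentence.
    """
    spans = []
    buffer, span_start, last_end = "", None, None
    for text, start, end in fragments:
        if span_start is None:
            span_start = start
        buffer += text
        last_end = end
        if text and text[-1] in SENTENCE_END:
            spans.append((buffer, span_start, end))
            buffer, span_start = "", None
    if buffer:
        # Keep trailing text that never reached end punctuation.
        spans.append((buffer, span_start, last_end))
    return spans
```

Each returned time range can then be used to cut the video portion into the clip that accompanies the sentence.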
Step f, performing violation risk identification on the text information and on the text in the video clip to obtain the violation type.
A video clip may contain bullet comments or comments, so both the text information and the video clip carry text. Violation risk detection is performed on this text to obtain the violation types of the text information and of the video clip. These violation types are also the violation types of the stretch of audio and video corresponding to the complete sentence, and one stretch of audio and video may have several violation types.
In some embodiments, step f further comprises:
Step f1, inputting the text information into a text classification model, one complete sentence at a time, to obtain the violation type corresponding to each complete sentence;
The text classification model provided in this embodiment is a BERT transfer learning model. In many deep learning application scenarios, large amounts of labeled data cannot be obtained because of high labeling costs and similar problems; transfer learning addresses this problem well.
Transfer learning refers to transferring knowledge from one domain to a target domain so that a better learning result is obtained in the target domain. Transfer learning was first applied widely in image processing and has since been applied successfully to natural language processing: a basic pre-trained language model is obtained by pre-training on massive amounts of web text, and the pre-trained model is then trained on a small scale with labeled data for a specific task. Models such as ELMo, BERT and GPT were created in this way. Pre-trained language models greatly reduce the amount of training data needed for individual downstream tasks and make models reusable, which has effectively advanced the field of natural language processing.
To train the text classification model, a sentence-level violation risk text training set is constructed by collecting and organizing what anchors say during financial live broadcasts, and the data is labeled according to the different violation types. A BERT pre-trained classification model is then built with the TensorFlow and Keras deep learning frameworks (the open source model has been pre-trained on a large amount of web data, so little data is needed for subsequent fine-tuning), and the violation risk text training set is fed in to fine-tune the model (that is, to retrain it), yielding a BERT transfer learning model that can judge the violation risk of a sentence.
The text information is input into the text classification model one complete sentence at a time to obtain the violation type of each complete sentence. The violation types output by the text classification model may include: expressions that describe finance in an exaggerated way; expressions that claim no risk or ignore risk; expressions that one-sidedly emphasize a limited marketing period; expressions that defame others; and content prohibited in advertising.
Step f2, performing character recognition on each video image in the video segment, and performing de-duplication on the recognition result to obtain character information in the video segment;
This embodiment uses an OCR model to perform character recognition on the video images in a video clip. The OCR model is based on Baidu's self-developed framework, which supports importing and training on local data sets. The open source ICDAR2019-LSVT data set is selected, which contains 450,000 Chinese street view images and the corresponding recognition results. The trained OCR model reaches an accuracy of more than 85 percent. Because the text in live broadcast pictures is mainly printed, clear and front-facing, the accuracy of character recognition in the live broadcast room is higher still.
The video images in a video clip are obtained as follows: video frames are extracted from the video clip at a preset interval duration to obtain video images.
Video frames are extracted from the video clip using the Python OpenCV tool, and the extraction interval, that is, the preset interval duration, may be 0.5 seconds. It should be noted that two extracted video images may be identical, so the character recognition results may contain repeated text. The recognition results are therefore de-duplicated to obtain the final text in the video clip.
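The sampling and de-duplication logic can be sketched independently of OpenCV and the OCR model. The function names are hypothetical, and the exact-match de-duplication is an assumption, since the patent does not say how repeated text is detected. In practice each timestamp would be used to seek the clip with OpenCV, and each grabbed frame would be passed to the OCR model.

```python
def sample_times(start_sec, end_sec, interval_sec=0.5):
    """Timestamps (in seconds) at which frames are taken from one clip."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    times = []
    i = 0
    while start_sec + i * interval_sec < end_sec:
        times.append(round(start_sec + i * interval_sec, 3))
        i += 1
    return times


def dedupe_text(results):
    """Drop repeated or empty OCR lines while keeping first-seen order."""
    seen = set()
    unique = []
    for line in results:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

For a clip from 0 to 2 seconds, `sample_times(0.0, 2.0)` gives four sampling points at the 0.5 second interval mentioned above.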
Step f3, acquiring the violation type corresponding to the text in the video clip based on a preset violation keyword library.
Based on an analysis of financial live broadcast violation risk cases, the high-frequency keywords that signal violation risk during live broadcasts are collected, such as "no risk" and "high income", and divided into separate violation keyword libraries according to violation type. The text obtained from the video clip is checked against the violation keyword libraries to obtain the violation risk detection result for the video images. The violation types of the video images include: picture content that describes finance in an exaggerated way; picture content that claims no risk or ignores risk; picture content that defames others; and picture content prohibited in advertising.
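Keyword-library matching of this kind can be sketched as a simple substring search. The library contents and type names below are hypothetical apart from the two example keywords "no risk" and "high income" given above, and plain substring matching is an assumption, since the patent does not specify the matching rule.

```python
def match_violations(text, keyword_libraries):
    """Return {violation_type: [matched keywords]} for one piece of text.

    keyword_libraries maps each violation type to its list of keywords.
    Types with no matching keyword are left out of the result.
    """
    hits = {}
    for violation_type, keywords in keyword_libraries.items():
        found = [kw for kw in keywords if kw in text]
        if found:
            hits[violation_type] = found
    return hits


# Hypothetical libraries, one per violation type.
EXAMPLE_LIBRARIES = {
    "no_risk_claim": ["no risk"],
    "exaggeration": ["high income"],
}
```

With these libraries, a clip whose on-screen text reads "no risk and high income" would be reported under both violation types.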
Step S30, calculating the violation score of the financial live program according to the violation type.
In some embodiments, step S30 includes:
Step g, obtaining the degree value of each violation type and the number of times the violation type occurs in the complete sentence or in the video clip corresponding to the complete sentence;
Step h, substituting the degree values and the numbers of occurrences into a risk calculation formula to obtain the violation score of the financial live program.
A video clip may have several violation types, and one violation type may occur several times. In this embodiment, a degree value is set for each violation type according to its degree of risk. The degree values of the violation types found in the complete sentences of the text information and in the corresponding video clips, together with the number of times each violation type occurs in those sentences and clips, are substituted into the risk calculation formula to calculate the violation score of the financial live program.
The risk calculation formula is:
[Formula image: Figure BDA0002954504350000091]
where f_1 to f_k are the degree values corresponding to the individual violation types, which can be set freely by the user, and x_1 to x_k are the numbers of times the individual violation types occur. By combining the occurrence counts of the violation types found in the audio and the images with the corresponding degree values, the formula yields the violation score of the financial live program. A violation score of 0 indicates that no risky content was detected. If the violation score is not 0, the violation types, the violation risk keywords, the time periods in which the violation risks occur and similar details are marked in the system so that the user can inspect them further.
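The risk calculation formula itself is an image that is not reproduced in this text, so its exact form is unknown. Purely as an illustration, the sketch below assumes a weighted sum of occurrence counts, which is consistent with the description (a degree value per violation type, a count per violation type, and a score of 0 when nothing is detected) but may differ from the patented formula. The type names and degree values are hypothetical.

```python
def violation_score(degree_values, counts):
    """Assumed weighted sum: degree value times occurrence count, summed over types.

    degree_values: {violation_type: f_i}, set by the user.
    counts: {violation_type: x_i}, occurrences found in audio text and video text.
    A violation type in counts without a degree value raises KeyError.
    """
    return sum(degree_values[vtype] * n for vtype, n in counts.items())
```

For example, with degree values `{"no_risk_claim": 3, "exaggeration": 2}` and counts `{"no_risk_claim": 2, "exaggeration": 1}`, this assumed formula gives a score of 8.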
In this embodiment, the audio-video file of the financial live program is acquired, the violation type of the text information in the audio-video file is determined, and the violation score of the financial live program is calculated according to the violation type, thereby achieving intelligent detection of violation risk in financial live programs.
This embodiment also implements: automatic, scheduled crawling and updating of live room links; automatic conversion of live streams into video files; analysis of the live audio with speech recognition, and conversion of both the audio and the images into text data with text recognition; a text-based violation detection method, tailored to the business characteristics of financial live broadcast, that detects violations from the core content of the broadcast and thereby overcomes the limitation that image-based violation detection can only catch violations involving visual objects; a keyword dictionary and a text classification corpus of a certain scale, built for financial live broadcast violation risk detection and closely tied to the business, which makes them highly practical; fine-tuning of a pre-trained BERT model on annotated data by transfer learning, which addresses the large amount of training corpus that a deep learning model for this domain would otherwise need; and statistical analysis of the violation risk detection results, so that the violation risk of different financial live broadcasts can be compared side by side.
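The conversion from live stream to video file mentioned above can be sketched as two small steps: listing the segment URIs in an m3u8 playlist and concatenating the downloaded segments. This is a minimal sketch under stated assumptions: the function names are hypothetical, downloading is left out, and simple byte concatenation is assumed to be adequate for MPEG-TS segments from a single continuous stream.

```python
from urllib.parse import urljoin


def list_ts_segments(playlist_text, playlist_url):
    """Return absolute URLs of the media segments listed in an m3u8 playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the others are segment URIs.
        if line and not line.startswith("#"):
            segments.append(urljoin(playlist_url, line))
    return segments


def merge_ts_segments(segment_payloads):
    """Concatenate downloaded .ts payloads, in playlist order, into one byte stream."""
    return b"".join(segment_payloads)
```

Relative segment URIs are resolved against the playlist URL, so the same function handles playlists that list either relative or absolute segment paths.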
In addition, the embodiment of the invention also provides a financial live broadcast violation detection device, which comprises:
an acquisition module, configured to acquire an audio-video file of the financial live program;
a determining module, configured to determine the violation type of the text information in the audio-video file;
and a calculation module, configured to calculate the violation score of the financial live program according to the violation type.
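One way to picture how the three modules fit together is the sketch below, which wires them into a single pipeline. The class name `ViolationDetector`, the field names, and the choice to model each module as a plain callable are illustrative assumptions rather than the structure of the claimed device.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ViolationDetector:
    """Chains the three modules; each module is modeled as a plain callable."""

    acquire: Callable[[str], bytes]                # live link -> audio-video file
    determine: Callable[[bytes], Dict[str, int]]   # file -> {violation type: count}
    calculate: Callable[[Dict[str, int]], float]   # counts -> violation score

    def run(self, live_link: str) -> float:
        media = self.acquire(live_link)
        counts = self.determine(media)
        return self.calculate(counts)
```

Keeping the modules as injected callables makes it easy to test the pipeline with stubs before real crawling, recognition, and scoring code is plugged in.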
In addition, the embodiment of the invention also provides a financial live broadcast violation detection device, which comprises a memory, a processor and a financial live broadcast violation detection program stored in the memory and capable of running on the processor, wherein the financial live broadcast violation detection program realizes the steps of the financial live broadcast violation detection method when being executed by the processor.
In addition, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium is stored with a financial live broadcast violation detection program, and the financial live broadcast violation detection program realizes each step of the financial live broadcast violation detection method when being executed by a processor.
Note that the computer storage medium may be provided in the terminal-based system.
The specific implementation manner of the computer readable storage medium of the present invention is basically the same as the above embodiments of the method for detecting a live financial violation, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description of the preferred embodiments of the present invention should not be taken as limiting the scope of the invention, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the following description and drawings, or by direct or indirect application to other relevant art(s).

Claims (6)

1. The financial live broadcast violation detection method is characterized by comprising the following steps of:
acquiring an audio and video file of a financial live program;
determining the violation type of text information in the audio-video file;
calculating the violation score of the financial live program according to the violation type;
the step of obtaining the audio and video file of the financial live program comprises the following steps:
acquiring a live link of a financial live program by utilizing a crawler tool;
acquiring an m3u8 file corresponding to the live link by using a packet capture tool;
extracting a plurality of video ts slices from the m3u8 file, and merging the video ts slices to obtain an audio-video file;
the obtaining the live link of the financial live program by using the crawler tool comprises the following steps:
crawling with Scrapy in combination with the Selenium automated testing tool;
the audio-video file comprises an audio part and a video part, and the step of determining the violation type of the text information in the audio-video file comprises the following steps:
performing voice recognition on the audio part in the audio-video file to obtain text information;
segmenting a video part in the audio and video file to obtain a video segment;
carrying out violation risk identification on the text information and the text information in the video clip to obtain a violation type;
the step of segmenting the video part in the audio/video file to obtain a video segment comprises the following steps:
acquiring all complete sentences in the text information;
segmenting the video part in the audio and video file according to the complete sentence to obtain a video segment corresponding to the complete sentence;
and the step of performing violation risk identification on the text information and the text information in the video clip to obtain the violation type further comprises the following steps:
inputting the text information into a text classification model, one complete sentence at a time, to obtain the violation type corresponding to each complete sentence;
performing character recognition on each video image in the video clip, and de-duplicating the recognition results to obtain the text information in the video clip;
obtaining, based on a preset violation keyword library, the violation type corresponding to the text information in the video clip;
the text information in the video clip comprises bullet comments (barrage) and comments.
2. The method of claim 1, wherein, prior to the step of performing character recognition on each video image in the video clip, the method further comprises:
extracting video frames from the video clip at a preset interval duration to obtain the video images.
3. A method of detecting a financial live violation according to claim 2, wherein the step of calculating a violation score for the financial live program based on the violation type comprises:
obtaining a degree value of each violation type and the number of times that the violation type appears in the complete sentence or the video clip corresponding to the complete sentence;
substituting the degree value and the number of times into a risk calculation formula to obtain the violation score of the financial live program.
4. A financial live broadcast violation detection device, characterized in that the financial live broadcast violation detection device comprises:
an acquisition module, configured to acquire an audio-video file of a financial live program;
a determining module, configured to determine the violation type of text information in the audio-video file;
a calculation module, configured to calculate the violation score of the financial live program according to the violation type;
the step of obtaining the audio and video file of the financial live program comprises the following steps:
acquiring a live link of a financial live program by utilizing a crawler tool;
acquiring an m3u8 file corresponding to the live link by using a packet capture tool;
extracting a plurality of video ts slices from the m3u8 file, and merging the video ts slices to obtain an audio-video file;
the obtaining the live link of the financial live program by using the crawler tool comprises the following steps:
crawling with Scrapy in combination with the Selenium automated testing tool;
the audio-video file comprises an audio part and a video part, and the step of determining the violation type of the text information in the audio-video file comprises the following steps:
performing voice recognition on the audio part in the audio-video file to obtain text information;
segmenting a video part in the audio and video file to obtain a video segment;
carrying out violation risk identification on the text information and the text information in the video clip to obtain a violation type;
the step of segmenting the video part in the audio/video file to obtain a video segment comprises the following steps:
acquiring all complete sentences in the text information;
segmenting the video part in the audio and video file according to the complete sentence to obtain a video segment corresponding to the complete sentence;
and the step of performing violation risk identification on the text information and the text information in the video clip to obtain the violation type further comprises the following steps:
inputting the text information into a text classification model, one complete sentence at a time, to obtain the violation type corresponding to each complete sentence;
performing character recognition on each video image in the video clip, and de-duplicating the recognition results to obtain the text information in the video clip;
obtaining, based on a preset violation keyword library, the violation type corresponding to the text information in the video clip;
the text information in the video clip comprises bullet comments (barrage) and comments.
5. A financial live broadcast violation detection device comprising a memory, a processor and a financial live broadcast violation detection program stored on the memory and executable on the processor, the financial live broadcast violation detection program when executed by the processor implementing the steps of the financial live broadcast violation detection method of any of claims 1-3.
6. A computer readable storage medium, characterized in that it has stored thereon a financial live broadcast violation detection program, which when executed by a processor, implements the steps of the financial live broadcast violation detection method of any of claims 1-3.
CN202110222936.9A 2021-02-26 2021-02-26 Financial live broadcast violation detection method, device, equipment and readable storage medium Active CN113038153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110222936.9A CN113038153B (en) 2021-02-26 2021-02-26 Financial live broadcast violation detection method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110222936.9A CN113038153B (en) 2021-02-26 2021-02-26 Financial live broadcast violation detection method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113038153A CN113038153A (en) 2021-06-25
CN113038153B true CN113038153B (en) 2023-06-02

Family

ID=76464880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110222936.9A Active CN113038153B (en) 2021-02-26 2021-02-26 Financial live broadcast violation detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113038153B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852835A (en) * 2021-09-22 2021-12-28 北京百度网讯科技有限公司 Live broadcast audio processing method and device, electronic equipment and storage medium
CN114245160A (en) * 2021-12-07 2022-03-25 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN114786035A (en) * 2022-05-25 2022-07-22 上海氪信信息技术有限公司 Compliance quality inspection and interactive question-answering system and method for live scene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2824330A1 (en) * 2011-01-12 2012-07-19 Videonetics Technology Private Limited An integrated intelligent server based system and method/systems adapted to facilitate fail-safe integration and/or optimized utilization of various sensory inputs
CN107222780A (en) * 2017-06-23 2017-09-29 中国地质大学(武汉) A kind of live platform comprehensive state is perceived and content real-time monitoring method and system
CN108170813A (en) * 2017-12-29 2018-06-15 智搜天机(北京)信息技术有限公司 A kind of method and its system of full media content intelligent checks
CN108419126A (en) * 2018-01-23 2018-08-17 广州虎牙信息科技有限公司 Abnormal main broadcaster's recognition methods, storage medium and the terminal of platform is broadcast live
CN111767482A (en) * 2020-05-21 2020-10-13 中国地质大学(武汉) Self-adaptive crawling method for focused web crawler

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7698188B2 (en) * 2005-11-03 2010-04-13 Beta-Rubicon Technologies, Llc Electronic enterprise capital marketplace and monitoring apparatus and method
FR2981189B1 (en) * 2011-10-10 2013-11-01 Thales Sa NON-SUPERVISED SYSTEM AND METHOD OF ANALYSIS AND THEMATIC STRUCTURING MULTI-RESOLUTION OF AUDIO STREAMS
KR101872870B1 (en) * 2017-10-12 2018-06-29 이광재 Pulse diagnosis apparatus and pulse diagnosis method thereof
CN110019817A (en) * 2018-12-04 2019-07-16 阿里巴巴集团控股有限公司 A kind of detection method, device and the electronic equipment of text in video information
CN111415654B (en) * 2019-01-07 2023-12-08 北京嘀嘀无限科技发展有限公司 Audio recognition method and device and acoustic model training method and device
CN110085213B (en) * 2019-04-30 2021-08-03 广州虎牙信息科技有限公司 Audio abnormity monitoring method, device, equipment and storage medium
CN110310663A (en) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Words art detection method, device, equipment and computer readable storage medium in violation of rules and regulations
CN110798703A (en) * 2019-11-04 2020-02-14 云目未来科技(北京)有限公司 Method and device for detecting illegal video content and storage medium
CN111090776B (en) * 2019-12-20 2023-06-30 广州市百果园信息技术有限公司 Video auditing method and device, auditing server and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
浅谈IPTV业务监管中EPG的采集技术与应用;祖燕;《电视工程》;全文 *

Also Published As

Publication number Publication date
CN113038153A (en) 2021-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant