CN112995696A - Live broadcast room violation detection method and device - Google Patents

Live broadcast room violation detection method and device

Info

Publication number
CN112995696A
Authority
CN
China
Prior art keywords
live broadcast
video
violation
broadcast room
anchor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110424189.7A
Other languages
Chinese (zh)
Other versions
CN112995696B (en)
Inventor
魏海巍
王伟伟
刘凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gongdao Network Technology Co ltd
Original Assignee
Gongdao Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gongdao Network Technology Co ltd
Priority to CN202110424189.7A
Publication of CN112995696A
Application granted
Publication of CN112995696B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for detecting violations in a live broadcast room. The method comprises the following steps: acquiring a live broadcast data stream of a target live broadcast room, the live broadcast data stream comprising a video stream and an audio stream corresponding to the video stream; segmenting the video stream to obtain a plurality of video clips; extracting features from the video clips, and determining whether the extracted features match the anchor features corresponding to the target live broadcast room; and if so, converting the audio clip corresponding to the video clip into text information, and performing violation detection on the target live broadcast room based on the text information. By identifying the video clips in the live broadcast data stream that contain the anchor features, the technical scheme makes violation detection more targeted; by performing violation detection on the audio clips corresponding to those video clips, that is, by converting what the anchor says while selling goods through live broadcast into text information, it improves the accuracy of violation detection.

Description

Live broadcast room violation detection method and device
Technical Field
The application relates to the technical field of communication, and in particular to a live broadcast room violation detection method and device.
Background
Currently, with the surging popularity of live broadcast platforms, selling goods through live broadcast has become a popular sales mode.
Generally, the anchor introduces the basic information, functional characteristics, preferential schemes, and the like of the commodities during the live broadcast. However, in order to attract customers, the anchor may commit violations such as exaggerated claims, illegal sales promotion, and even fraud or the sale of counterfeit products.
Such violations infringe the rights and interests of consumers and disturb market order, so they must be handled in time. However, because live broadcasting is so popular, a large number of live broadcast rooms exist, and a live broadcast can start at any time. Manual supervision alone therefore can hardly meet current supervision requirements.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for detecting violations in a live broadcast room, in which a violation of the live broadcast room is detected by determining whether the anchor has committed a violation.
Specifically, the method is realized through the following technical scheme:
in a first aspect, the present application provides a method for detecting a live broadcast violation, where the method includes:
acquiring a live broadcast data stream of a target live broadcast room; the live data stream comprises a video stream and an audio stream corresponding to the video stream;
carrying out segmentation processing on the video stream to obtain a plurality of video segments;
extracting features from the video clips, and determining whether the extracted features are matched with anchor features corresponding to the target live broadcast room;
and if so, converting the audio clip corresponding to the video clip into text information, and carrying out violation detection on the target live broadcast room based on the text information.
In a second aspect, the present application further provides a device for detecting violation in a live broadcast room, where the device includes:
the acquisition unit is used for acquiring a live broadcast data stream of a target live broadcast room; the live data stream comprises a video stream and an audio stream corresponding to the video stream;
the segmentation unit is used for segmenting the video stream to obtain a plurality of video segments;
the matching unit is used for extracting features from the video clips and determining whether the extracted features are matched with the anchor features corresponding to the target live broadcast room;
and the detection unit is used for converting the audio clip corresponding to the video clip into text information when the extracted features are matched with the anchor features corresponding to the target live broadcast room, and carrying out violation detection on the target live broadcast room based on the text information.
As the above technical scheme shows, the video clips containing the anchor features are determined by extracting features from the video stream of the target live broadcast room and matching them against the anchor features corresponding to the target live broadcast room; the audio clip corresponding to each such video clip is then converted into text information, and violation detection is performed on the target live broadcast room based on the text information. The anchor, as the person responsible for the live broadcast room, is accountable for the live broadcast content, so determining the video clips in the live broadcast data stream that contain the anchor features makes violation detection more targeted. Further, because live-broadcast selling relies on the anchor's spoken promotion, the audio clips carry far more information than the video clips. The application therefore performs violation detection on the audio clips corresponding to the video clips determined above, converting what the anchor says while selling goods into text information, which improves the accuracy of violation detection.
Drawings
Fig. 1 is a flowchart illustrating a live broadcast room violation detection method according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart illustrating another live broadcast room violation detection method according to an exemplary embodiment of the present application;
Fig. 3 is a block diagram illustrating a live broadcast room violation detection apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
When violation detection is carried out in the live broadcast room, different detection strategies can be adopted according to different live broadcast contents.
For example, for lifestyle or outdoor live broadcast rooms, the video frames of the video stream in the live broadcast data stream may be inspected to determine whether illegal content appears in the live broadcast picture.
For another example, for an emotion-type or chat-type live broadcast room, the audio stream in the live broadcast data stream may be recognized and converted into text content for detection, to determine whether the anchor says illegal content during the live broadcast.
Live-broadcast selling is a new live broadcast type that combines the characteristics of the two kinds of live broadcast rooms above. While displaying a commodity, the anchor promotes it, introduces its functional characteristics, discounts, and the like, interacts with the audience of the live broadcast room through the barrage (bullet-screen comments), and answers audience questions. In addition, the anchor may invite guests to help with promotion or ask assistants to help display the commodities.
When promoting, the anchor introduces commodities to the audience of the live broadcast room mainly by speaking, so the important information is contained in the audio stream. If only the video pictures are audited, illegal content is hard to detect, because the video stream mainly shows conversations among people and displays of commodities. However, if the entire audio stream is converted into text content for auditing, then besides the anchor's commodity promotion, which is the content of interest, the audio stream also contains many useless conversations, background noise, and other content irrelevant to violation judgment, and processing this irrelevant content reduces detection efficiency.
Therefore, for the new mode of live-broadcast selling, continuing to use the existing detection strategies cannot meet the supervision requirements. Manual inspection can improve detection accuracy, but it is inefficient and cannot meet the need to monitor a massive number of live broadcast rooms.
In view of this, the present application provides a technical solution for converting an audio clip corresponding to a video clip containing anchor characteristics into text information in a targeted manner, and performing violation detection on a live broadcast room based on the text information.
In implementation, when violation detection is performed on a target live broadcast room, the live broadcast data stream of the target live broadcast room can be obtained; the live broadcast data stream may include a video stream and an audio stream corresponding to the video stream.
After the live broadcast data stream of the target live broadcast room is obtained, the video stream can be processed in a segmented mode to obtain a plurality of video segments;
for example, the audio stream may be segmented by a voice segmentation technique, an audio segment with voice is retained, and the video stream is segmented according to a timestamp range corresponding to the retained audio segment to obtain a plurality of video segments.
Extracting features from the video clips, and determining whether the extracted features are matched with anchor features corresponding to the target live broadcast room;
for example, the features extracted from the video segment may be face features, and the face features extracted from the video segment are identified by a face recognition technology, so as to determine whether they match the face features of the anchor corresponding to the target live broadcast room.
If the features extracted from the video clip are matched with the anchor features corresponding to the target live broadcast room, indicating that an anchor picture appears in the video clip, then further converting the audio clip corresponding to the video clip into text information, and carrying out violation detection on the target live broadcast room based on the text information;
for example, when the features extracted from the video clip are matched with the anchor features corresponding to the target live broadcast room, the audio clip corresponding to the video clip can be converted into text information according to a voice recognition technology; the text information is subjected to word segmentation to obtain a plurality of keywords, the keywords are matched with violation keywords in a preset violation keyword library, and whether the target live broadcast room violates rules or not is determined according to a matching result.
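The keyword-matching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword library contents, the function names `segment_words` and `match_violations`, and the regex tokenizer are all hypothetical, and a real system handling Chinese speech transcripts would use a proper word segmentation tool rather than this simple stand-in.

```python
import re
from typing import List, Set

# Hypothetical preset violation keyword library (illustrative entries only).
VIOLATION_KEYWORDS: Set[str] = {"miracle", "guaranteed", "cure"}


def segment_words(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_violations(text: str, library: Set[str] = VIOLATION_KEYWORDS) -> List[str]:
    """Return the distinct keywords from the text that appear in the violation library."""
    return sorted({word for word in segment_words(text) if word in library})


if __name__ == "__main__":
    transcript = "This cream is a guaranteed miracle cure!"
    print(match_violations(transcript))
```

How the matching result is turned into a final violation decision (for example, a single hit versus a hit count) is left open in the text, so the sketch only returns the matched keywords.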
In the above technical scheme, the video clips containing the anchor features can be determined by extracting features from the video stream of the target live broadcast room and matching them against the anchor features corresponding to the target live broadcast room; the audio clip corresponding to each such video clip can then be converted into text information, and violation detection is performed on the target live broadcast room based on the text information. The anchor, as the person responsible for the live broadcast room, is accountable for the live broadcast content, so determining the video clips in the live broadcast data stream that contain the anchor features makes violation detection more targeted. Further, because live-broadcast selling relies on the anchor's spoken promotion, the audio clips carry far more information than the video clips. The application therefore performs violation detection on the audio clips corresponding to the video clips determined above, converting what the anchor says while selling goods into text information, which improves the accuracy of violation detection.
Next, examples of the present application will be described in detail.
Referring to fig. 1, fig. 1 is a flowchart illustrating a live broadcast room violation detection method according to an exemplary embodiment of the present application, where the method includes the following steps:
step 101: acquiring a live broadcast data stream of a target live broadcast room; the live data stream comprises a video stream and an audio stream corresponding to the video stream;
step 102: carrying out segmentation processing on the video stream to obtain a plurality of video segments;
step 103: extracting features from the video clips, and determining whether the extracted features are matched with anchor features corresponding to the target live broadcast room;
step 104: and if so, converting the audio clip corresponding to the video clip into text information, and carrying out violation detection on the target live broadcast room based on the text information.
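As a rough sketch of how steps 103 and 104 fit together once the stream has been segmented, the control flow might look like the following. The `Segment` type and the three injected callables (`contains_anchor`, `transcribe`, `find_violations`) are hypothetical placeholders for the feature matching, speech recognition, and text-based detection components; none of these names come from the application itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Segment:
    """A clip identified by its timestamp range on the shared audio/video time axis."""
    start: float  # seconds
    end: float    # seconds


def detect_room_violations(
    segments: Iterable[Segment],
    contains_anchor: Callable[[Segment], bool],
    transcribe: Callable[[Segment], str],
    find_violations: Callable[[str], List[str]],
) -> List[Tuple[Segment, List[str]]]:
    """Transcribe and check only the segments whose video matches the anchor features."""
    flagged = []
    for segment in segments:
        # Step 103: skip clips in which the anchor does not appear.
        if not contains_anchor(segment):
            continue
        # Step 104: convert the matching audio clip to text, then check the text.
        text = transcribe(segment)
        hits = find_violations(text)
        if hits:
            flagged.append((segment, hits))
    return flagged
```

Passing the components in as callables keeps the sketch independent of any particular face recognition or speech recognition library.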
In this embodiment, when violation detection is performed on the target live broadcast room, the live broadcast data stream of the target live broadcast room may be obtained through the live broadcast platform, and the video stream and the audio stream corresponding to the video stream may then be parsed from the live broadcast data stream.
The video stream and the audio stream correspond to each other and have the same time axis, so that the picture in the video and the sound in the audio can be synchronized.
In addition, the live broadcast data stream of the live broadcast room can be obtained in several ways. It can be obtained in real time, so that the live broadcast room is detected without interruption; the live broadcast room can be spot-checked at random, with the live broadcast data stream for a specific time period obtained and sampling-based violation detection performed; or the live broadcast data stream can be acquired periodically at a preset interval, so that the live broadcast room is detected periodically. The acquisition mode of the live broadcast data stream is not specifically limited in this specification, and in practical applications a person skilled in the art can select different acquisition modes according to the detection requirements of the live broadcast platform for different live broadcast rooms.
For example, for a live broadcast room with a high possibility of violation or with frequent violations, stricter detection can be achieved by increasing the acquisition frequency of the live broadcast data stream; for a live broadcast room with a low possibility of violation or with few violations, frequently acquiring and detecting the live broadcast data stream would waste a lot of work, so a looser detection requirement can be adopted and met through sampling detection.
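One way to picture this choice of acquisition mode is a small policy function keyed on a room's violation history. The sketch below is purely illustrative: the `AcquisitionMode` names, the use of a past-violation count as the risk signal, and the threshold values are all assumptions, since the specification deliberately leaves the acquisition mode open.

```python
from enum import Enum


class AcquisitionMode(Enum):
    REAL_TIME = "real_time"          # uninterrupted detection
    PERIODIC = "periodic"            # acquire at a preset interval
    RANDOM_SAMPLE = "random_sample"  # random spot checks


def choose_mode(past_violations: int, high: int = 5, low: int = 1) -> AcquisitionMode:
    """Pick a stricter acquisition mode for rooms with more recorded violations.

    The thresholds `high` and `low` are hypothetical tuning parameters.
    """
    if past_violations >= high:
        return AcquisitionMode.REAL_TIME
    if past_violations >= low:
        return AcquisitionMode.PERIODIC
    return AcquisitionMode.RANDOM_SAMPLE
```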
In practical application, since the violation of the anchor often has no persistence, it usually only occurs in a certain specific period of the live broadcast process; thus, in a live data stream in a live room, there may often be a large number of invalid video or audio segments that do not help with violation detection. In this case, if the video stream and the audio stream in the live data stream are subjected to the violation detection frame by frame, it is obvious that the detection efficiency of the violation detection is reduced.
For example, in the live broadcasting process, if the anchor does not speak in a certain time period but looks at the barrage of the live broadcasting room, the audio clip corresponding to the time period has no voice, only background noise and no effect on violation detection; therefore, if the audio stream in the live broadcast data stream is subjected to the frame-by-frame violation detection, it is obvious that the detection efficiency of the violation detection is affected by the need to process a large number of audio frames that do not have any effect on the violation detection.
In this embodiment, in order to improve the efficiency of violation detection, before violation detection is performed on the live broadcast data stream of the live broadcast room, the video stream and the audio stream in the obtained live broadcast data stream may be segmented, and violation detection is then performed on the video segments and audio segments obtained after the segmentation.
Note that, a specific segmentation method for performing segmentation processing on a video stream and an audio stream in an acquired live data stream is not particularly limited in this specification.
Live-broadcast selling is characterized in that the anchor can interact with the audience in real time and answer audience questions while displaying the goods from all angles, so the audio part of the live broadcast data stream that contains the anchor's voice signal is an important detection object during violation detection.
Thus, in one illustrated embodiment, the video stream in the live broadcast data stream may be segmented based on the effective audio segments in the live broadcast data stream that contain the anchor's voice signal.
In implementation, VAD (Voice Activity Detection) technology can be used to detect the boundaries where voice appears and disappears in the audio signal corresponding to the audio stream in the live broadcast data stream, so that the voice signals emitted by the anchor can be distinguished from the various background noise signals.
After distinguishing the voice signal sent by the anchor in the audio stream from various background noise signals, an audio segment containing the voice signal sent by the anchor in the audio stream can be determined as an effective audio segment, and then the audio stream and the video stream are respectively segmented based on the time stamp range corresponding to the determined effective audio segment to obtain a plurality of effective audio segments containing the voice signal sent by the anchor and video segments corresponding to the effective audio segments.
It should be noted that the detailed process of performing VAD (voice activity detection) on the audio stream in the live broadcast data stream is not described in this specification, and those skilled in the art may refer to the description in the related art.
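Since the specification does not prescribe a particular VAD algorithm, the following is only a toy stand-in that shows the shape of the output: timestamp ranges that can then be used to cut both the audio stream and the video stream. It uses a simple short-time energy threshold, which is an assumption made here for illustration and far cruder than production VAD; the function name `voiced_ranges` and its parameters are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def voiced_ranges(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames whose mean energy
    reaches the threshold. A trailing partial frame is ignored."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    ranges: List[Tuple[float, float]] = []
    start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        t = i * frame_len / sample_rate
        if energy >= threshold and start is None:
            start = t                  # voice appears
        elif energy < threshold and start is not None:
            ranges.append((start, t))  # voice disappears
            start = None
    if start is not None:
        ranges.append((start, n_frames * frame_len / sample_rate))
    return ranges
```

Each returned range corresponds to one effective audio segment, and the same range selects the matching video segment because the two streams share a time axis.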
In addition, a face detection technology can be used for detecting whether a face appears in a video frame corresponding to a video stream in the live broadcast data stream, and the video stream is divided into a part with the face and a part without the face.
When the part of the video stream containing the face is distinguished, the audio stream and the video stream can be respectively segmented based on the corresponding timestamp range when the face appears, so as to obtain the video segments containing the face and the audio segments corresponding to the video segments.
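The face-based segmentation can be pictured as grouping consecutive frames by whether a face was detected and converting each run of face-bearing frames into a timestamp range. In the sketch below, the per-frame boolean flags are assumed to come from some face detector that is not shown, and the name `face_ranges` and the constant frame rate are illustrative assumptions.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def face_ranges(flags: Sequence[bool], fps: float) -> List[Tuple[float, float]]:
    """Convert per-frame 'face present' flags into (start, end) times in seconds."""
    ranges: List[Tuple[float, float]] = []
    index = 0
    for has_face, run in groupby(flags):
        length = len(list(run))
        if has_face:
            ranges.append((index / fps, (index + length) / fps))
        index += length
    return ranges
```

The resulting ranges can be applied to both streams, yielding the video segments containing a face and their corresponding audio segments.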
After the video stream is subjected to the segmentation processing, video segments which do not have any effect on violation detection in the live data stream can be removed, and a plurality of video segments to be detected are reserved.
It should be noted that, besides the anchor, guests or the anchor's assistants may also appear in the live broadcast; therefore, the pictures of the video segments obtained after segmentation may show the anchor, and may also show guests or assistants.
The anchor, as the person responsible for the live broadcast room, needs to follow the rules of the live broadcast platform, take responsibility for the live broadcast content, and also keep the words and conduct of the other participants in check. Therefore, when detecting violations in the live broadcast room, in order to further improve detection efficiency, attention may be focused on the video segments in which the anchor appears, with the anchor taken as the main detection object.
In this embodiment, in order to further determine a video segment containing anchor information, after a video stream is segmented to obtain a plurality of video segments, features may be extracted from each video segment, and it is determined whether the extracted features match anchor features corresponding to a target live broadcast room, so that the video segment containing the anchor features is determined as a video segment with an anchor.
The anchor characteristics corresponding to the target live broadcast room can be anchor characteristics recorded by a live broadcast platform when the anchor applies for the live broadcast room, and the characteristics can be uniformly stored in an anchor characteristic library.
Specifically, a machine learning model may be trained in advance on a plurality of feature data samples labeled with the corresponding anchor, and the trained model may be used as the anchor feature matching model for features extracted from a video segment.
For the features extracted from the video segment, the features can be input into the anchor feature matching model, so as to obtain the result of feature matching.
In one illustrated embodiment, the features may include human face features.
Further, a video frame can be extracted from the video segment, and facial features can be extracted from the video frame; and determining whether the extracted face features are matched with the anchor face features corresponding to the target live broadcast room.
Specifically, the anchor feature matching model may be an anchor face feature matching model: a machine learning model may be trained in advance on a plurality of face feature samples labeled with the corresponding anchor, and the trained model is used to perform anchor face feature matching on the face features extracted from video frames.
For the face features extracted from the video frame, the face features can be input into the anchor face feature matching model, so that the result of face feature matching extracted from the video frame is obtained.
For example, when a plurality of persons appear in a video clip, extracting a video frame for the video clip and extracting facial features from the video frame; and matching the extracted multiple face features with the anchor face features corresponding to the target live broadcast room, and determining whether the anchor face features exist in the multiple face features extracted from the video frame.
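The specification performs this matching with a trained machine learning model. As a simplified stand-in, not the patented model, the sketch below assumes that each face has already been reduced to a numeric feature vector and compares vectors by cosine similarity against the enrolled anchor vector; the function names and the 0.8 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def frame_contains_anchor(
    face_vectors: Sequence[Sequence[float]],
    anchor_vector: Sequence[float],
    threshold: float = 0.8,
) -> bool:
    """True if any face extracted from the frame is close enough to the anchor's face."""
    return any(
        cosine_similarity(face, anchor_vector) >= threshold for face in face_vectors
    )
```

With several people in the frame, the check succeeds as soon as one of the extracted faces matches the anchor, which mirrors the multi-person example above.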
When feature matching is performed on a plurality of video segments to determine those containing the anchor features, the video segments that are more likely to contain violations can be identified and given priority in feature matching, so that violations are found more efficiently.
For example, viewers watching a live broadcast are easily stirred by the anchor's mood and induced into impulsive consumption. When selling goods, the anchor often uses time-limited offers; as the countdown is about to end, the anchor often stresses how large the discount is, emphasizes the functions of the goods, and gives consumers the feeling that they will lose out if they do not buy.
In this process, the anchor may, in order to sell more products, make exaggerated claims, violate advertising law, and so on. Large fluctuations in the anchor's emotion are therefore likely to be accompanied by violations, and during violation detection it is worth giving priority to checking whether the anchor's emotion fluctuates strongly, so as to improve the accuracy of violation detection in the live broadcast room.
Thus, in one embodiment shown, intonation emotion recognition may be performed based on the active audio segment; when the recognized emotion hits a preset emotion, the feature matching can be preferentially performed on the video segment corresponding to the effective audio segment.
The preset emotions may include strong emotions such as excitement, anger, sadness, and the like.
Specifically, a machine learning model may be trained in advance on a plurality of voice feature data samples labeled with different emotions, and the trained model may be used as the model for performing intonation emotion recognition on the effective audio segment.
For an effective audio segment, the sound features may first be extracted and then input into the intonation emotion recognition model to obtain the emotion recognition result.
Subsequently, if the recognized emotion hits a preset emotion, feature matching is performed preferentially on the video segment corresponding to the effective audio segment that meets this condition.
For example, suppose there are two video segments: in the first, the anchor calmly introduces merchandise; in the second, the anchor loudly urges viewers to hurry and place their orders. When intonation emotion recognition is performed on the effective audio segments corresponding to the two video segments, the emotion recognized for the second segment hits a preset emotion, so feature matching is performed on the second video segment first.
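The prioritization described above can be illustrated with a minimal sketch. It assumes that an upstream emotion recognition model has already assigned one emotion label to each effective audio segment; the names `Segment`, `PRESET_EMOTIONS`, and `prioritize` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Assumed label set; the application lists excitement, anger and sadness as examples.
PRESET_EMOTIONS = {"excitement", "anger", "sadness"}


@dataclass
class Segment:
    segment_id: int
    emotion: str  # label assumed to come from an upstream emotion recognition model


def prioritize(segments):
    """Return segments whose emotion hits a preset emotion first.

    sorted() is stable, so segments inside each group keep their original order.
    """
    return sorted(segments, key=lambda s: s.emotion not in PRESET_EMOTIONS)


segments = [
    Segment(1, "calm"),
    Segment(2, "excitement"),
    Segment(3, "calm"),
    Segment(4, "anger"),
]
ordered = prioritize(segments)
```

Here the segments labeled with a preset emotion are moved to the front of the queue for feature matching, while the remaining segments are still processed afterwards.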
In addition, after the video segments containing the anchor characteristics are determined, the intonation emotion recognition can be carried out according to the audio segments corresponding to the video segments; when the recognized emotion hits a preset emotion, subsequent processing can be preferentially performed on the video clip.
After the characteristics in the video clip are matched with the anchor characteristics corresponding to the target live broadcast room, the video clip corresponding to the anchor can be determined. Further, violation detection needs to be performed on the video segment to determine whether a violation occurs in the video segment.
In the live-stream selling scenario, the anchor promotes goods mainly through speech, so detection based only on the pictures in the video would miss a great deal of important information. Violation detection for live-stream selling therefore focuses on the audio information corresponding to the video, that is, on what the anchor says.
In this embodiment, if the features extracted from the video clip are matched with the anchor features corresponding to the target live broadcast room, the audio clip corresponding to the video clip is converted into text information, and violation detection is performed on the target live broadcast room based on the text information.
Specifically, ASR (Automatic Speech Recognition) may be used: the audio segment corresponding to the video segment is input into a preset speech recognition model, and the text information corresponding to the audio segment is obtained as the output.
Furthermore, text content identification can be carried out on the obtained text information, whether illegal contents appear in the words spoken by the anchor is detected, and therefore whether illegal behaviors happen in the target live broadcast room is determined.
For example, suppose the anchor's audio clip is input into the preset speech recognition model and converted into the text "high-imitation handbags, absolutely good enough to pass for the real thing, enjoy luxury goods at a low price … …". Content identification on this text shows that the anchor is selling high-imitation products, so it can be determined that a violation has occurred in the live broadcast room.
After the matching of the human face features of the anchor, if the features extracted from the video clip are matched with the anchor features corresponding to the target live broadcast room, which indicates that the anchor is present in the video clip, the audio clip corresponding to the video clip is further converted into text information, and violation detection is performed on the target live broadcast room based on the text information; and if not, indicating that the anchor is not present in the video segment.
However, even when the anchor does not appear in the video clip, the anchor may still be introducing merchandise, and a violation may still occur. Therefore, when violation detection for the live broadcast room calls for a stricter detection strategy, the detection range can be expanded by further examining video clips in which the anchor does not appear.
In one embodiment, if the extracted features do not match the anchor features corresponding to the target live broadcast room, voiceprint features are further extracted from the effective audio clip corresponding to the video clip; it is determined whether the extracted voiceprint features match the anchor voiceprint features corresponding to the target live broadcast room; if so, the effective audio clip corresponding to the video clip is converted into text information, and live broadcast violation detection is performed on the target live broadcast room based on the text information.
Specifically, when the features extracted from the video clip are not matched with the anchor features corresponding to the target live broadcast room, the voiceprint features can be further extracted from the effective audio clip corresponding to the video clip.
A machine learning model may be trained in advance on a plurality of voiceprint feature data samples labeled with the anchor, and the trained model used as the anchor voiceprint feature matching model for matching extracted voiceprints against the anchor voiceprint features.
For the voiceprint features extracted from the effective audio clip, the voiceprint features can be input into the anchor voiceprint feature matching model, so that a voiceprint feature matching result is obtained.
And if the extracted voiceprint features are matched with the corresponding anchor voiceprint features of the target live broadcast room, determining that the anchor is speaking although the anchor does not appear in the video segment. In order to meet a stricter detection strategy, an effective audio clip corresponding to the video clip needs to be converted into text information, and live broadcast violation detection is performed on a target live broadcast room based on the text information.
For example, after a video frame is extracted from a certain video segment and facial features are extracted from the video frame, if the extracted facial features do not match with the anchor facial features corresponding to the target live broadcast room, it is indicated that no anchor appears in the video segment.
Further, in order to satisfy a stricter detection strategy, voiceprint features can be extracted from the effective audio clip corresponding to the video clip and input into the preset anchor voiceprint feature matching model to determine whether the voice heard in the video clip is the anchor's voice.
If the sound appearing in the video segment is the sound of the anchor, it is shown that the anchor is speaking although the anchor is not present in the video segment. Obviously, the effective audio segment corresponding to the video segment needs to be converted into text information, and live broadcast violation detection needs to be performed on the target live broadcast room based on the text information.
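The decision of whether a clip's audio should be transcribed can be summarized in a short sketch. It assumes that the face matching and voiceprint matching results are already available as booleans from upstream models; the function name `should_transcribe` and the `strict` flag are hypothetical names introduced for illustration only.

```python
def should_transcribe(face_matches_anchor: bool,
                      voiceprint_matches_anchor: bool,
                      strict: bool = True) -> bool:
    """Decide whether the clip's effective audio is converted to text.

    - The anchor is visible in the clip: always transcribe.
    - The anchor is not visible: transcribe only under the stricter strategy,
      and only when the voiceprint matches the anchor's voiceprint.
    """
    if face_matches_anchor:
        return True
    return strict and voiceprint_matches_anchor


# Anchor off-screen but speaking, stricter strategy enabled.
off_screen_speaking = should_transcribe(False, True, strict=True)
# Same clip when the stricter strategy is not enabled.
off_screen_default = should_transcribe(False, True, strict=False)
```

Keeping the voiceprint check behind a flag reflects the text's statement that this extra step applies when a stricter detection strategy is required.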
When the violation detection is performed on the video segment containing the anchor characteristics, the effective audio segment corresponding to the video segment needs to be converted into text information, and the live violation detection is performed on the target live broadcast room based on the text information.
While promoting commodities, the anchor may talk continuously in order to hold the audience's attention, or may use a large volume of content to interfere with the audience's judgment of the commodities so that they place orders without careful consideration. In such cases, the text information converted from the audio clip is too long and too cluttered, which reduces the efficiency of violation detection on the text information.
In one illustrated embodiment, word segmentation is performed on the text information to obtain a plurality of keywords; the keywords are matched against the violation keywords in a preset violation keyword library, where each violation keyword is labeled with a corresponding violation type. If a keyword matches any violation keyword in the violation keyword library, the violation type corresponding to that violation keyword is determined as the violation type of the target live broadcast room.
Specifically, word segmentation processing can be performed on continuous text information according to a preset word segmentation rule to obtain a plurality of keywords; the preset word segmentation rule may be a dictionary set by a person skilled in the art as required.
For example, assuming that the text message is "XX milk powder is produced by the world-leading latest technology and is top-level milk powder available at the price", based on the preset word segmentation rule, the following keywords can be obtained: XX milk powder, world leading, latest technology, top grade.
The keywords are then matched against the violation keywords in a preset violation keyword library; the violation keyword library may contain keywords of multiple violation types, and each violation keyword is labeled with its corresponding violation type.
If a keyword obtained from the text information matches a violation keyword in the violation keyword library, the violation type corresponding to the matched keyword is determined as the violation type of the target live broadcast room.
Continuing the example, the keywords "XX milk powder, world leading, latest technology, top level" obtained from the text information are matched against the violation keywords in the preset violation keyword library. In the library, "world leading", "latest technology", and "top level" are labeled as limit terms (absolute or superlative expressions) under advertising law, so the anchor's use of these words in promotion constitutes a violation of the advertising limit-term rules.
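The keyword matching in this example can be sketched as follows. This is a simplified illustration, not the application's implementation: segmentation is reduced to looking up dictionary entries in the text, and the dictionary, the library contents, and the function names are all hypothetical.

```python
# Hypothetical segmentation dictionary and violation keyword library.
SEGMENT_DICTIONARY = ["XX milk powder", "world-leading", "latest technology", "top-level"]
VIOLATION_LIBRARY = {
    "world-leading": "advertising limit term",
    "latest technology": "advertising limit term",
    "top-level": "advertising limit term",
}


def extract_keywords(text, dictionary):
    """Return the dictionary entries found in the text, ordered by first appearance."""
    found = [(text.find(term), term) for term in dictionary if term in text]
    return [term for _, term in sorted(found)]


def match_violations(keywords, library):
    """Map each keyword present in the library to its labeled violation type."""
    return {kw: library[kw] for kw in keywords if kw in library}


text = ("XX milk powder is produced by the world-leading latest technology "
        "and is top-level milk powder available at the price")
keywords = extract_keywords(text, SEGMENT_DICTIONARY)
violations = match_violations(keywords, VIOLATION_LIBRARY)
```

The product name is extracted as a keyword but is not in the violation library, so only the three superlative expressions are reported, each with its labeled violation type.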
In addition, the violation types include selling prohibited goods, using prohibited words, performing false publicity, and the like, in addition to the violation of the advertisement limit word mentioned above.
Word segmentation of the text information yields keywords that can be matched against the violation keyword library, and the more entries the dictionary in the preset word segmentation rule contains, the more keywords the segmentation yields.
However, adding keywords improves violation detection accuracy only to a limited extent.
For example, keyword matching alone cannot determine whether statements such as "XX star strongly recommends XX milk powder" or "XX star endorses XX milk powder" are violations: if the star has not in fact endorsed the brand or promoted it, the anchor's statement is false publicity.
In this case, semantics in the text information can be recognized by NLU (Natural Language Understanding), and the accuracy of violation detection can be further improved.
In one embodiment, the text information is semantically recognized; and performing live broadcast violation detection on the target live broadcast room based on the semantic recognition result.
Continuing with the above example, semantic recognition reveals that the anchor is invoking XX star to promote the merchandise. Further, when the anchor mentions the star, the statement can be checked against a preset relationship between stars and the brands they endorse. When the brand does not match, the anchor is determined to have committed a false publicity violation.
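The check against the preset relationship between a star and a brand can be sketched as below. The sketch assumes that the semantic recognition step has already produced a (star, brand) pair from the anchor's statement; the table contents and the names `ENDORSEMENTS` and `is_false_endorsement` are hypothetical.

```python
# Hypothetical preset table: star -> set of brands the star is recorded as endorsing.
ENDORSEMENTS = {
    "Star A": {"Brand X"},
}


def is_false_endorsement(star, brand, endorsements):
    """Return True when the claimed star/brand pair is absent from the preset table."""
    return brand not in endorsements.get(star, set())


# The anchor claims that Star A recommends Brand Y, which is not in the table.
claimed_unlisted = is_false_endorsement("Star A", "Brand Y", ENDORSEMENTS)
# The anchor claims that Star A recommends Brand X, which is in the table.
claimed_listed = is_false_endorsement("Star A", "Brand X", ENDORSEMENTS)
```

A star that is missing from the table altogether is also treated as a mismatch in this sketch; whether that is the desired policy is a design choice the application does not specify.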
After the anchor violation is determined, the violation can be recorded and the target live broadcast room can be scored.
In one embodiment, if the target live broadcast room is determined to have violation, recording violation data to a violation database; wherein the violation data comprises a number of violations and/or the violation type; scoring the live broadcast room based on violation data corresponding to the target live broadcast room recorded in the violation database; wherein the scoring mechanism comprises a deduction or accumulation of a score; limiting the authority of the anchor when the score is below or above a threshold.
Specifically, when it is determined that a violation has occurred in the target live broadcast room, the timestamp at which the anchor committed the violation, the number of violations, and the violation type corresponding to each of the anchor's violations may be recorded in the violation database.
When the occurrence of the violation of the anchor is detected, an alarm prompt may be generated in the live broadcast room, where the alarm prompt may include a type corresponding to the violation of the anchor.
Once the violation database has been established, a scoring model for the live broadcast room can be built on it and used to score the live broadcast room; alternatively, when a violation detection result is disputed, the violation detection system can recheck it against the historical data recorded in the violation database.
The live broadcast room is scored according to the violation data recorded in the violation database for the target live broadcast room, which includes the number of violations and/or the violation type.
It is worth noting that the score corresponding to each violation may be the same, or may be increased according to the increase of the number of times; different violation types may be assigned different scores depending on the severity of the violation.
Further, the scoring mechanism may be a decreasing score or an increasing score.
When the scoring mechanism is a deduction system, a corresponding number of points is deducted each time a violation occurs; when the score falls below a threshold, the authority of the anchor is restricted.
For example, assuming each live broadcast room starts with a score of 100 points, 10 points may be deducted the first time the anchor commits false publicity and, after a warning, 15 points the second time; when the score falls below 60 points, the traffic of the live broadcast room or the authority of the anchor may be restricted.
When the scoring mechanism is an accumulation system, a corresponding number of points is added each time a violation occurs; when the score rises above a threshold, the authority of the anchor is restricted.
For example, assuming each live broadcast room starts with a score of 0 points, 10 points may be added the first time the anchor commits false publicity and, after a warning, 15 points the second time; when the score rises above 60 points, the traffic of the live broadcast room or the authority of the anchor may be restricted.
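Both scoring mechanisms can be expressed with one small data structure. The following is a sketch under the figures used in the examples above (initial scores of 100 and 0, a threshold of 60, penalties of 10 and 15 points); the class name `RoomScore`, its fields, and the third penalty of 20 points are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoomScore:
    score: float
    threshold: float
    deduct: bool  # True: deduction system; False: accumulation system

    def record_violation(self, points: float) -> None:
        """Deduct or accumulate the points assigned to one violation."""
        self.score += -points if self.deduct else points

    def restricted(self) -> bool:
        """Below the threshold (deduction) or above it (accumulation)."""
        if self.deduct:
            return self.score < self.threshold
        return self.score > self.threshold


# Deduction system: start at 100, restrict below 60.
deduction_room = RoomScore(score=100, threshold=60, deduct=True)
for penalty in (10, 15):
    deduction_room.record_violation(penalty)
after_two = (deduction_room.score, deduction_room.restricted())

deduction_room.record_violation(20)  # hypothetical third, heavier penalty
after_three = (deduction_room.score, deduction_room.restricted())

# Accumulation system: start at 0, restrict above 60.
accumulation_room = RoomScore(score=0, threshold=60, deduct=False)
for penalty in (10, 15):
    accumulation_room.record_violation(penalty)
```

With these figures the deduction room stands at 75 points after two violations and is not yet restricted; the hypothetical third penalty brings it to 55 points, below the threshold.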
The authority of the anchor may include adding purchase links for commodities to the live interface, the live broadcast duration, and the like, which is not limited in this application.
In the above technical solution, features are extracted from the video stream of the target live broadcast room and matched against the anchor features corresponding to the target live broadcast room, thereby determining the video clips that contain the anchor features; the audio clip corresponding to each such video clip is then converted into text information, and violation detection is performed on the target live broadcast room based on the text information. As the person responsible for the live broadcast room, the anchor is accountable for the live content, so determining the video clips in the live data stream that contain the anchor features makes violation detection more targeted. Further, because live-stream selling is characterized by the anchor promoting goods through speech, the audio clips carry far more information than the video clips; the present application therefore performs violation detection based on the audio clips corresponding to the video clips determined above, converting what the anchor says during live-stream selling into text information for violation detection, which improves the accuracy of violation detection.
Referring to fig. 2, fig. 2 is a flowchart illustrating another live broadcast violation detection method according to an exemplary embodiment of the present application.
As shown in fig. 2, in an embodiment, a live broadcast violation detection method includes the following steps:
S201: acquiring a live broadcast data stream of a target live broadcast room;
wherein the live data stream includes a video stream and an audio stream corresponding to the video stream.
S202: determining effective audio segments and the corresponding video segments through VAD (voice activity detection).
Specifically, VAD is performed on the audio stream in the obtained live data stream to determine the effective audio segments containing a voice signal; the video stream is then segmented based on the timestamp ranges of the determined effective audio segments to obtain the video segment corresponding to each effective audio segment.
For example, by determining an audio segment of a valid spoken utterance in a live data stream, a corresponding video segment can be determined based on a timestamp of the audio segment.
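The timestamp-based segmentation can be illustrated with a minimal sketch. It assumes that a VAD step has already produced speech ranges in seconds and that the video has a constant frame rate; the function names `frame_range` and `slice_video` and the sample values are hypothetical.

```python
import math


def frame_range(start_s: float, end_s: float, fps: float) -> range:
    """Frame indices covering the time span [start_s, end_s) at a constant frame rate."""
    first = math.floor(start_s * fps)
    last = math.ceil(end_s * fps)  # exclusive upper bound
    return range(first, last)


def slice_video(speech_ranges, fps):
    """Map each effective audio segment to the frames of its video segment."""
    return [frame_range(start, end, fps) for start, end in speech_ranges]


# Assumed VAD output: two speech ranges in seconds, video at 25 frames per second.
speech_ranges = [(2.0, 3.5), (10.0, 12.0)]
clips = slice_video(speech_ranges, fps=25)
```

Each resulting range identifies the frames that belong to one video segment, so frames outside any speech range are never passed to feature matching.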
S203: determining the video segments to be matched preferentially according to the intonation emotion recognition results of the effective audio segments.
Specifically, intonation emotion recognition is performed based on the effective audio segments; when the recognized emotion hits a preset emotion, feature matching is performed preferentially on the video segment corresponding to that effective audio segment.
For example, among a plurality of effective audio segments, when the intonation emotion recognition result of one effective audio segment hits a preset emotion, the video segment corresponding to that effective audio segment is matched first.
S204: matching the facial features in the video frames against the anchor facial features.
Specifically, video frames are extracted from the video segment and facial features are extracted from the video frames; it is then determined whether the extracted facial features match the anchor facial features corresponding to the target live broadcast room.
For example, the person speaking in a video segment may be a guest or an assistant rather than the anchor, who is the focus of detection, and such a segment may be ignored. Facial features can therefore be used to identify the video segments in which the anchor appears and to exclude the video segments unrelated to the anchor.
S205: determining whether the number of video frames in the video clip that match the anchor feature corresponding to the target live broadcast room is greater than a threshold.
If yes, go to step S208.
For example, when several persons appear in the video segment, a plurality of video frames may be extracted from the video segment; when the number of matched video frames is greater than the threshold, the video segment is determined to be a video segment to be detected.
If not, step S206 is performed.
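The branch taken at S205 can be sketched as follows, assuming that face matching has already produced one boolean per sampled video frame; the function name `next_step` and the sample values are hypothetical.

```python
def next_step(frame_matches, threshold):
    """Return the step to execute after S205.

    frame_matches holds one boolean per sampled frame, True when the frame
    matched the anchor features. More matches than the threshold leads to S208
    (speech-to-text); otherwise S206 (voiceprint extraction) is tried.
    """
    matched = sum(frame_matches)  # True counts as 1
    return "S208" if matched > threshold else "S206"


# Three of four sampled frames match the anchor; the threshold is 2.
mostly_anchor = next_step([True, False, True, True], threshold=2)
# Only one of four sampled frames matches.
mostly_others = next_step([False, True, False, False], threshold=2)
```

Counting frames rather than relying on a single frame reduces the chance that a brief appearance of the anchor, or a single false match, decides the outcome.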
S206: extracting voiceprint features from the effective audio clip corresponding to the video clip.
Specifically, if the extracted features are not matched with the anchor features corresponding to the target live broadcast room, voiceprint features are further extracted from an effective audio clip corresponding to the video clip.
For example, when the anchor is not present in the screen, there is still a possibility that an introduction is being made to the merchandise and there is a possibility that there is a violation. Therefore, when a stricter strategy is required to be adopted for the violation detection of the live broadcast room, the detection range can be expanded, and the segments without the anchor in the picture can be further detected.
S207: determining whether the extracted voiceprint features match the anchor voiceprint features corresponding to the target live broadcast room.
If so, the anchor is speaking although not in the picture, and step S208 is performed.
S208: converting the audio clip corresponding to the video clip into text information.
S209: performing word segmentation on the text information to obtain keywords, and matching the keywords against the violation keyword library.
Specifically, word segmentation is performed on the text information to obtain a plurality of keywords;
the keywords are matched against the violation keywords in a preset violation keyword library, where each violation keyword is labeled with a corresponding violation type.
If a keyword matches any violation keyword in the violation keyword library, the violation type corresponding to that violation keyword is determined as the violation type of the target live broadcast room.
For example, when the text information is too long, word segmentation may be performed on it to obtain keywords, which are matched against the preset violation keyword library. When a keyword matches, the violation type corresponding to that keyword is determined.
S210: performing semantic recognition on the text information.
Specifically, adding semantic recognition can further improve detection accuracy.
S211: determining whether a violation has occurred in the target live broadcast room based on the text information recognition result.
Specifically, whether a violation has occurred in the target live broadcast room can be determined from the result of matching the obtained keywords against the preset violation keyword library and/or from the semantic recognition result.
If so, go to step S212.
S212: the live room is scored based on data recorded in the violation database.
Specifically, if the target live broadcast room is determined to have violation behaviors, violation data is recorded into a violation database;
wherein the violation data comprises a number of violations and/or the violation type;
scoring the live broadcast room based on violation data corresponding to the target live broadcast room recorded in the violation database;
wherein the scoring mechanism comprises a deduction or accumulation of a score;
for example, when a violation is detected in the target live broadcast room, data related to the violation, including a violation timestamp, a number of violations, and a type of violation, may be recorded in the violation database. Meanwhile, the live room may be scored based on historical records in the violation database.
S213: determining whether the current score of the live broadcast room is below or above a threshold.
If not, S214 is executed to output an alarm prompt to the live broadcast room.
If so, S215 is executed to restrict the authority of the anchor.
For example, when the scoring mechanism is a deduction system, a corresponding number of points is deducted each time a violation occurs, and the authority of the anchor is restricted when the score falls below the threshold; when the scoring mechanism is an accumulation system, a corresponding number of points is added each time a violation occurs, and the authority of the anchor is restricted when the score rises above the threshold.
As can be seen from the above embodiment, features are extracted from the video stream of the target live broadcast room and matched against the anchor features corresponding to the target live broadcast room, thereby determining the video clips that contain the anchor features; the audio clip corresponding to each such video clip is then converted into text information, and violation detection is performed on the target live broadcast room based on the text information. As the person responsible for the live broadcast room, the anchor is accountable for the live content, so determining the video clips in the live data stream that contain the anchor features makes violation detection more targeted. Further, because live-stream selling is characterized by the anchor promoting goods through speech, the audio clips carry far more information than the video clips; the present application therefore performs violation detection based on the audio clips corresponding to the video clips determined above, converting what the anchor says during live-stream selling into text information for violation detection, which improves the accuracy of violation detection.
Corresponding to the embodiment of the method, the specification further provides an embodiment of a device for detecting the violation of the live broadcast room.
Referring to fig. 3, fig. 3 is a block diagram of a live broadcast room violation detection apparatus according to an exemplary embodiment of the present application, including:
an obtaining unit 301, configured to obtain a live data stream of a target live broadcast room; the live data stream comprises a video stream and an audio stream corresponding to the video stream;
a segmentation unit 302, configured to perform segmentation processing on the video stream to obtain a plurality of video segments;
a matching unit 303, configured to extract features from the video segment, and determine whether the extracted features match anchor features corresponding to the target live broadcast room;
a detecting unit 304, configured to, when the extracted feature matches with an anchor feature corresponding to the target live broadcast room, convert an audio clip corresponding to the video clip into text information, and perform violation detection on the target live broadcast room based on the text information.
Optionally, the segmentation unit 302 is configured for:
performing VAD voice activity detection on the audio stream to determine valid audio segments of the audio stream containing voice signals;
and segmenting the video stream based on the determined timestamp range corresponding to the effective audio segment to obtain the video segment corresponding to the effective audio segment.
Specifically, the features include human face features;
optionally, the matching unit 303 includes:
extracting video frames from the video clips, and extracting human face features from the video frames;
and determining whether the extracted face features are matched with the anchor face features corresponding to the target live broadcast room.
Optionally, before converting the audio clip corresponding to the video clip into text information, the apparatus further performs:
determining whether the number of video frames in the video clip matched with the anchor characteristics corresponding to the target live broadcast room is greater than a threshold value;
if yes, further converting the audio clip corresponding to the video clip into text information.
Optionally, the apparatus further comprises:
if the extracted features are not matched with the anchor features corresponding to the target live broadcast room, further extracting voiceprint features from the effective audio clip corresponding to the video clip;
determining whether the extracted voiceprint features are matched with anchor voiceprint features corresponding to the target live broadcast room;
if yes, converting the effective audio clip corresponding to the video clip into text information, and carrying out live broadcast violation detection on the target live broadcast room based on the text information.
Optionally, the apparatus further comprises:
performing intonation emotion recognition based on the effective audio segments;
and when the recognized emotion hits preset emotion, preferentially performing feature matching on the video segment corresponding to the effective audio segment.
Optionally, the detecting unit 304 includes:
performing word segmentation processing on the text information to obtain a plurality of keywords;
matching the keywords against the violation keywords in a preset violation keyword library, where each violation keyword is labeled with a corresponding violation type;
and, if a keyword matches any violation keyword in the violation keyword library, determining the violation type corresponding to that violation keyword as the violation type of the target live broadcast room.
Optionally, the detecting unit 304 includes:
performing semantic recognition on the text information;
and performing live broadcast violation detection on the target live broadcast room based on the semantic recognition result.
Optionally, the apparatus further comprises:
if the target live broadcast room is determined to have the violation behavior, recording violation data into a violation database; wherein the violation data comprises a number of violations and/or the violation type;
scoring the live broadcast room based on violation data corresponding to the target live broadcast room recorded in the violation database; wherein the scoring mechanism comprises a deduction or accumulation of a score;
limiting the authority of the anchor when the score is below or above a threshold.
The implementation process of the functions and actions of the above devices is specifically described in the implementation process of the corresponding steps in the above method, and is not described herein again.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present application will be readily apparent to those skilled in the art from consideration of the present application and practice of the invention as claimed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (10)

1. A live broadcast room violation detection method, comprising:
acquiring a live broadcast data stream of a target live broadcast room; the live data stream comprises a video stream and an audio stream corresponding to the video stream;
carrying out segmentation processing on the video stream to obtain a plurality of video segments;
extracting features from the video clips, and determining whether the extracted features are matched with anchor features corresponding to the target live broadcast room;
and if so, converting the audio clip corresponding to the video clip into text information, and carrying out violation detection on the target live broadcast room based on the text information.
2. The method of claim 1, wherein segmenting the video stream to obtain a plurality of video segments comprises:
performing voice activity detection (VAD) on the audio stream to determine valid audio segments of the audio stream that contain voice signals;
and segmenting the video stream based on the timestamp range corresponding to each determined valid audio segment to obtain the video segment corresponding to that valid audio segment.
3. The method of claim 1, wherein the features comprise face features;
and wherein extracting features from the video segments and determining whether the extracted features match anchor features corresponding to the target live broadcast room comprises:
extracting video frames from the video segments, and extracting face features from the video frames;
and determining whether the extracted face features match the anchor face features corresponding to the target live broadcast room.
4. The method of claim 3, further comprising, prior to converting the audio segment corresponding to the video segment into text information:
determining whether the number of video frames in the video segment that match the anchor features corresponding to the target live broadcast room is greater than a threshold;
and if so, converting the audio segment corresponding to the video segment into text information.
5. The method of claim 1, further comprising:
if the extracted features do not match the anchor features corresponding to the target live broadcast room, further extracting voiceprint features from the valid audio segment corresponding to the video segment;
determining whether the extracted voiceprint features match anchor voiceprint features corresponding to the target live broadcast room;
and if so, converting the valid audio segment corresponding to the video segment into text information, and performing violation detection on the target live broadcast room based on the text information.
6. The method of claim 2, further comprising:
performing intonation-based emotion recognition on the valid audio segments;
and when the recognized emotion matches a preset emotion, preferentially performing feature matching on the video segment corresponding to the valid audio segment.
7. The method of claim 1, wherein performing violation detection on the target live broadcast room based on the text information comprises:
performing word segmentation on the text information to obtain a plurality of keywords;
matching each of the keywords against the violation keywords in a preset violation keyword library, wherein each violation keyword is labeled with a corresponding violation type;
and if a keyword matches any violation keyword in the violation keyword library, determining the violation type corresponding to that violation keyword as the violation type of the target live broadcast room.
8. The method of claim 1, wherein performing violation detection on the target live broadcast room based on the text information comprises:
performing semantic recognition on the text information;
and performing violation detection on the target live broadcast room based on the semantic recognition result.
9. The method of claim 1, further comprising:
if it is determined that a violation has occurred in the target live broadcast room, recording violation data in a violation database, wherein the violation data comprises a number of violations and/or the violation type;
scoring the target live broadcast room based on the violation data corresponding to the target live broadcast room recorded in the violation database, wherein the scoring mechanism comprises deducting or accumulating points;
and restricting the permissions of the anchor when the score falls below or rises above a threshold.
10. A live broadcast room violation detection apparatus, the apparatus comprising:
an acquisition unit configured to acquire a live broadcast data stream of a target live broadcast room, wherein the live broadcast data stream comprises a video stream and an audio stream corresponding to the video stream;
a segmentation unit configured to segment the video stream to obtain a plurality of video segments;
a matching unit configured to extract features from the video segments and determine whether the extracted features match the anchor features corresponding to the target live broadcast room;
and a detection unit configured to, when the extracted features match the anchor features corresponding to the target live broadcast room, convert the audio segment corresponding to the video segment into text information and perform violation detection on the target live broadcast room based on the text information.
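As an illustration only, the gating logic recited in claims 1, 4, 5 and 7 could be sketched in Python as follows. Everything concrete here is an assumption rather than part of the claims: the `Segment` container, the use of cosine similarity, the `sim` and `min_frames` thresholds, the caller-supplied `transcribe` function standing in for speech-to-text, and whitespace splitting standing in for word segmentation (Chinese text would need a real word segmenter).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Segment:
    """Hypothetical container: one video segment and its matching audio segment."""
    frame_features: List[Sequence[float]]  # one face feature vector per sampled frame
    voiceprint: Sequence[float]            # voiceprint feature of the audio segment
    audio: bytes                           # raw audio passed to speech-to-text


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anchor_is_speaking(seg: Segment, anchor_face: Sequence[float],
                       anchor_voice: Sequence[float],
                       sim: float = 0.8, min_frames: int = 2) -> bool:
    """Face match first (claims 3 and 4), voiceprint as fallback (claim 5)."""
    matched = sum(1 for f in seg.frame_features if cosine(f, anchor_face) >= sim)
    if matched > min_frames:  # number of matching frames exceeds the threshold
        return True
    return cosine(seg.voiceprint, anchor_voice) >= sim


def match_violations(text: str, keyword_library: Dict[str, str]) -> List[str]:
    """Claim 7: look up each keyword in a library that maps keyword -> violation type."""
    types: List[str] = []
    for word in text.lower().split():  # assumption: whitespace word segmentation
        violation_type = keyword_library.get(word)
        if violation_type and violation_type not in types:
            types.append(violation_type)
    return types


def detect(segments: List[Segment], anchor_face: Sequence[float],
           anchor_voice: Sequence[float], transcribe: Callable[[bytes], str],
           keyword_library: Dict[str, str]) -> List[str]:
    """Claim 1: only segments attributed to the anchor are transcribed and checked."""
    found: List[str] = []
    for seg in segments:
        if not anchor_is_speaking(seg, anchor_face, anchor_voice):
            continue  # speech not attributed to the anchor is skipped
        for violation_type in match_violations(transcribe(seg.audio), keyword_library):
            if violation_type not in found:
                found.append(violation_type)
    return found
```

In this sketch the speech-to-text step runs only after a segment has been attributed to the anchor, so audio from other speakers is never transcribed or checked.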
CN202110424189.7A 2021-04-20 2021-04-20 Live broadcast room violation detection method and device Active CN112995696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110424189.7A CN112995696B (en) 2021-04-20 2021-04-20 Live broadcast room violation detection method and device

Publications (2)

Publication Number Publication Date
CN112995696A (en) 2021-06-18
CN112995696B (en) 2022-01-25

Family

ID=76341314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110424189.7A Active CN112995696B (en) 2021-04-20 2021-04-20 Live broadcast room violation detection method and device

Country Status (1)

Country Link
CN (1) CN112995696B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379557A (en) * 2021-06-30 2021-09-10 杭州东忠科技股份有限公司 Intelligent gift matching method and system based on space station and storage medium
CN113420677A (en) * 2021-06-25 2021-09-21 联仁健康医疗大数据科技股份有限公司 Method and device for determining reasonable image, electronic equipment and storage medium
CN113434561A (en) * 2021-06-24 2021-09-24 北京金山云网络技术有限公司 Live broadcast data verification method and system, electronic device and storage medium
CN113705370A (en) * 2021-08-09 2021-11-26 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behavior of live broadcast room, electronic equipment and storage medium
CN113824986A (en) * 2021-09-18 2021-12-21 北京云上曲率科技有限公司 Context-based live broadcast audio auditing method and device, storage medium and equipment
CN113850184A (en) * 2021-09-22 2021-12-28 支付宝(杭州)信息技术有限公司 Method, device, equipment and readable medium for detecting video content
CN113949887A (en) * 2021-09-24 2022-01-18 支付宝(杭州)信息技术有限公司 Method and device for processing network live broadcast data
CN114005079A (en) * 2021-12-31 2022-02-01 北京金茂教育科技有限公司 Multimedia stream processing method and device
CN114125494A (en) * 2021-09-29 2022-03-01 阿里巴巴(中国)有限公司 Content auditing auxiliary processing method and device and electronic equipment
CN114245205A (en) * 2022-02-23 2022-03-25 达维信息技术(深圳)有限公司 Video data processing method and system based on digital asset management
CN114786038A (en) * 2022-03-29 2022-07-22 慧之安信息技术股份有限公司 Low-custom live broadcast behavior monitoring method based on deep learning
CN115086721A (en) * 2022-08-22 2022-09-20 深圳市稻兴实业有限公司 Ultra-high-definition live system service supervision system based on data analysis
CN115209174A (en) * 2022-07-18 2022-10-18 忆月启函(盐城)科技有限公司 Audio processing method and system
CN115209188A (en) * 2022-09-07 2022-10-18 北京达佳互联信息技术有限公司 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts
CN115499678A (en) * 2022-09-20 2022-12-20 广州虎牙科技有限公司 Video live broadcast method and device and live broadcast server
WO2023045939A1 (en) * 2021-09-24 2023-03-30 北京沃东天骏信息技术有限公司 Live broadcast processing method, live broadcast platform, storage medium and electronic device
CN116109990A (en) * 2023-04-14 2023-05-12 南京锦云智开软件有限公司 Sensitive illegal content detection system for video
CN116822805A (en) * 2023-08-29 2023-09-29 深圳市纬亚森科技有限公司 Education video quality monitoring method based on big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932451A (en) * 2017-05-22 2018-12-04 北京金山云网络技术有限公司 Audio-video frequency content analysis method and device
CN109508402A (en) * 2018-11-15 2019-03-22 上海指旺信息科技有限公司 Violation term detection method and device
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN111586421A (en) * 2020-01-20 2020-08-25 全息空间(深圳)智能科技有限公司 Method, system and storage medium for auditing live broadcast platform information

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434561A (en) * 2021-06-24 2021-09-24 北京金山云网络技术有限公司 Live broadcast data verification method and system, electronic device and storage medium
CN113420677A (en) * 2021-06-25 2021-09-21 联仁健康医疗大数据科技股份有限公司 Method and device for determining reasonable image, electronic equipment and storage medium
CN113420677B (en) * 2021-06-25 2024-06-11 联仁健康医疗大数据科技股份有限公司 Method, device, electronic equipment and storage medium for determining reasonable image
CN113379557A (en) * 2021-06-30 2021-09-10 杭州东忠科技股份有限公司 Intelligent gift matching method and system based on space station and storage medium
CN113705370A (en) * 2021-08-09 2021-11-26 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behavior of live broadcast room, electronic equipment and storage medium
CN113705370B (en) * 2021-08-09 2023-06-30 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
CN113824986A (en) * 2021-09-18 2021-12-21 北京云上曲率科技有限公司 Context-based live broadcast audio auditing method and device, storage medium and equipment
CN113824986B (en) * 2021-09-18 2024-03-29 北京云上曲率科技有限公司 Method, device, storage medium and equipment for auditing live audio based on context
CN113850184A (en) * 2021-09-22 2021-12-28 支付宝(杭州)信息技术有限公司 Method, device, equipment and readable medium for detecting video content
WO2023045939A1 (en) * 2021-09-24 2023-03-30 北京沃东天骏信息技术有限公司 Live broadcast processing method, live broadcast platform, storage medium and electronic device
CN113949887A (en) * 2021-09-24 2022-01-18 支付宝(杭州)信息技术有限公司 Method and device for processing network live broadcast data
CN114125494A (en) * 2021-09-29 2022-03-01 阿里巴巴(中国)有限公司 Content auditing auxiliary processing method and device and electronic equipment
CN114005079A (en) * 2021-12-31 2022-02-01 北京金茂教育科技有限公司 Multimedia stream processing method and device
CN114245205A (en) * 2022-02-23 2022-03-25 达维信息技术(深圳)有限公司 Video data processing method and system based on digital asset management
CN114245205B (en) * 2022-02-23 2022-05-24 达维信息技术(深圳)有限公司 Video data processing method and system based on digital asset management
CN114786038A (en) * 2022-03-29 2022-07-22 慧之安信息技术股份有限公司 Low-custom live broadcast behavior monitoring method based on deep learning
CN115209174B (en) * 2022-07-18 2023-12-01 深圳时代鑫华科技有限公司 Audio processing method and system
CN115209174A (en) * 2022-07-18 2022-10-18 忆月启函(盐城)科技有限公司 Audio processing method and system
CN115086721B (en) * 2022-08-22 2022-10-25 深圳市稻兴实业有限公司 Ultra-high-definition live system service supervision system based on data analysis
CN115086721A (en) * 2022-08-22 2022-09-20 深圳市稻兴实业有限公司 Ultra-high-definition live system service supervision system based on data analysis
CN115209188B (en) * 2022-09-07 2023-01-20 北京达佳互联信息技术有限公司 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts
CN115209188A (en) * 2022-09-07 2022-10-18 北京达佳互联信息技术有限公司 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts
CN115499678A (en) * 2022-09-20 2022-12-20 广州虎牙科技有限公司 Video live broadcast method and device and live broadcast server
CN115499678B (en) * 2022-09-20 2024-04-09 广州虎牙科技有限公司 Video live broadcast method and device and live broadcast server
CN116109990A (en) * 2023-04-14 2023-05-12 南京锦云智开软件有限公司 Sensitive illegal content detection system for video
CN116822805A (en) * 2023-08-29 2023-09-29 深圳市纬亚森科技有限公司 Education video quality monitoring method based on big data
CN116822805B (en) * 2023-08-29 2023-12-15 北京菜鸟无忧教育科技有限公司 Education video quality monitoring method based on big data

Also Published As

Publication number Publication date
CN112995696B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
CN112995696B (en) Live broadcast room violation detection method and device
CN108566565B (en) Bullet screen display method and device
CN102222227B (en) Video identification based system for extracting film images
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
Hauptmann et al. Story segmentation and detection of commercials in broadcast news video
CN105938716B (en) A kind of sample copying voice automatic testing method based on the fitting of more precision
US20090326947A1 (en) System and method for spoken topic or criterion recognition in digital media and contextual advertising
JP4216190B2 (en) Method of using transcript information to identify and learn the commercial part of a program
KR20120038000A (en) Method and system for determining the topic of a conversation and obtaining and presenting related content
CN107591116A (en) A kind of intelligent advisement player and its method of work based on recognition of face analysis
CN109739354B (en) Voice-based multimedia interaction method and device
CN111797820B (en) Video data processing method and device, electronic equipment and storage medium
CN113315988B (en) Live video recommendation method and device
CN112153397A (en) Video processing method, device, server and storage medium
JP6208794B2 (en) Conversation analyzer, method and computer program
CN109545232A (en) Information-pushing method, information push-delivery apparatus and interactive voice equipment
US11978457B2 (en) Method for uniquely identifying participants in a recorded streaming teleconference
CN114708869A (en) Voice interaction method and device and electric appliance
Watanabe et al. Coco-nut: Corpus of japanese utterance and voice characteristics description for prompt-based control
CN114125506B (en) Voice auditing method and device
CN107767862B (en) Voice data processing method, system and storage medium
CN110992984B (en) Audio processing method and device and storage medium
US12002460B2 (en) Information processing device, information processing system, and information processing method, and program
CN109710735B (en) Reading content recommendation method based on multiple social channels and electronic equipment
CN110099332B (en) Audio environment display method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant