WO2007128185A1

WO2007128185A1 - A system and method of media stream censorship and a node apparatus for generating censorship code stream

Info

Publication number: WO2007128185A1
Application number: PCT/CN2007/000115
Authority: WO
Inventors: Zhong Luo
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2006-04-30
Filing date: 2007-01-11
Publication date: 2007-11-15
Also published as: CN1968137A

Abstract

A system and method for censoring a media stream from a streaming media source, and a censorship code stream generating node for the censorship system, in the field of multimedia communication technology. They address how to reduce system resource usage when performing deep content censorship for current streaming media services. The censorship system includes: a censorship code stream generating node, which delays forwarding of the media stream from the streaming media source and generates a censorship code stream from the media stream; and a censorship center, which is communicatively connected to the censorship code stream generating node, censors the censorship code stream received from that node, and, on determining that the censorship code stream contains harmful content, sends the node a control signal to stop forwarding the media stream. The censorship code stream generating node mainly contains video and audio code stream delay modules and video and audio censorship code stream generating modules.

Description

Media stream review system, method and review code stream generating node device

The present invention relates to multimedia communication technologies, and in particular to a system and method for reviewing media streams from a streaming media source during multimedia communication, and to a review code stream generating node for the review system.

Background technique

As a basic form of multimedia communication, streaming media has spawned many multimedia communication services: video conferencing and video telephony, IPTV (Internet Protocol Television), VOD (Video on Demand), instant messaging, and so on. Streaming media will therefore become the basic form of communication on the NGN (Next Generation Network). In recent years in particular, with the rapid rise of IPTV services in China and abroad, streaming media applications on the network have also been developing rapidly.

One class of streaming media services, such as IPTV and VOD, provides video and audio content. The content is very broad and includes film and television programs, news, sports events, concerts, and more. Countries around the world, China in particular, attach great importance to content security and monitoring, and all have relevant laws; they also have regulations aimed at protecting minors. Operators, ISPs (Internet Service Providers), and content providers have the same need. As IPTV is about to be launched on a large scale in China, the first question is how to ensure effective content monitoring so that harmful content is filtered out.

Content security is usually understood to have two aspects:

1. Content protection, which prevents content from being received by users without permission.

For example, preventing the theft of TV programs. Many mature technologies address this, such as encryption and scrambling, authentication and authorization, and Digital Rights Management (DRM).

2. Prevention of the intrusion of harmful and illegal content, where what is protected is the target of the content, usually the audience.

This requires real-time review of the content. The approach draws on current practice in the broadcast television industry, where inspection nodes for manual review are set up on the path of the TV program stream from the program source (such as a satellite) to the user's television or set-top box (STB). The stream is usually transmitted in accordance with DVB-T (Digital Video Broadcasting-Terrestrial), with video and audio compressed in MPEG-2 format (MPEG, the Moving Picture Experts Group, is an international standards organization). With the advancement of technology, some content review can also be done automatically by the system or semi-automatically (combining human and machine). Once manual review finds a problem with the program content, measures must be taken to stop transmission of the program stream and, in most cases, to replace the harmful program with a temporary substitute such as public service advertisements or subtitle announcements. Manual judgment needs to take the context of the content into account, and reacting and handling take a certain amount of time, so a delay device must provide this delay, for example 5 seconds.

Content filtering means processing and judging certain attributes of the content. These attributes may include the name of the content provider, the URL (Uniform Resource Locator, such as a web address) of the content, the IP address of the content server, and, when the media stream is encapsulated in packets, the packet header information, the information inside the packets, and so on. This processing and filtering is therefore carried out at a hierarchy of levels from shallow to deep.

Prior art 1 filters content mainly according to its external, or shallow, features. The most typical example is URL filtering, whose principle is shown in Figure 1. The content filtering device sits between the core network and the edge access network, so that it is a gateway that media streams from content sources must pass through to reach the receiving terminals. In practice, the device can be placed at the same network location as an enterprise network's proxy or NAT (Network Address Translator) / FW (firewall); for broadband home users, it can be placed at the same location as the BAS (Broadband Administration System) or DSLAM (Digital Subscriber Line Access Multiplexer), or at the ISP's POP (Point of Presence). The filtering device has an internal database holding information about many content source URLs. Based on this database, it can determine whether a content source is harmful, block harmful content sources, and pass harmless ones. There are also many content rating providers offering third-party services, whose databases are richer and more professional; content filtering devices can connect to such providers and use their services for URL filtering.

Prior art 1 has the following problems:

1. False blocking: URL filtering may filter out harmless content. For example, some websites offer video on demand in which some programs are harmful and others are healthy movies, and these cannot be distinguished by URL;

2. Missed blocking: sites whose URLs the rating system considers good may still have problems (they may be hacked and impersonated, or may themselves attempt something illegal);

3. URL filtering usually also requires a third-party rating system. Such rating systems are available, and some paid rating service providers specialize in providing them. But their ratings are not completely accurate and do not cover everything on the web. Moreover, content on the web changes constantly, and no rating system can keep up with these changes in a timely manner.

For very demanding application scenarios, such as IPTV for the general public, filtering should be as close to foolproof as possible, so it is best to use the deepest content filtering, that is, filtering of the video and audio data itself: for example, image recognition to identify harmful scenes (violence, pornography, etc.), harmful text information (subtitles), the faces of specific people, and so on.

To achieve high filtering accuracy, filtering must go to the deepest level, namely DPF (Deep Packet Filtering) of the content data itself.

Prior art 2, DPF, is based on manual deep content review. In this case, the content filtering device decodes the media stream and plays the content for manual review by a monitor (assuming that encryption is not an obstacle, because it can be handled through the lawful interception requirements placed on communication equipment). If there is a problem, the monitor immediately takes measures to cut off the harmful content and switch to harmless content such as public service advertisements. After the content filtering device there must be a delay device of relatively large capacity to delay the content, giving the monitor time to judge and react (for example, 5 seconds). This process can also be implemented automatically, or semi-automatically by combining human and machine. The implementation principle is shown in Figure 2, where the content filtering device is placed in the review path.

The basic idea of prior art 2 is correct; it has been applied in broadcast television for many years with good results. However, content review for streaming media content services on IP (Internet Protocol) networks requires considerable improvement. The main problems are:

1. IP networks are much more complicated than broadcast TV networks in structure and topology, and carry many more programs. If all programs under review are transmitted to the network center, they occupy too many communication resources.

2. IP networks have many content sources and many programs. If decoding is performed centrally during review, the decoding workload is too large and the capacity required of the review device is too high for existing equipment to meet.

Summary of the invention

The embodiments of the invention provide a review system and method for reviewing a media stream during multimedia communication, and a review code stream generating node for the review system, to solve the problem of how to reduce system resource usage in deep content review of existing streaming media services.

The media stream content review system according to the embodiment of the present invention includes:

a review code stream generating node, configured to delay forwarding the media stream from the streaming media source and to generate a review code stream according to the media stream;

a review center, communicatively connected to the review code stream generating node, configured to review the review code stream received from the review code stream generating node and, on identifying that the review code stream contains harmful content, to send the review code stream generating node a control signal to stop forwarding the media stream.

The review code stream generating node includes:

a first communication module, configured to communicate with the review center, send the review code stream to the review center, and receive the control signal;

and a main control module, configured to control operation of the review code stream generating node and to execute the control signal from the review center.

The review code stream generating node further includes:

a video code stream delay module, configured to delay forwarding the video code stream in the media stream;

a video review code stream generating module, configured to generate a corresponding video review code stream according to the video code stream and to send it to the review center for review through the first communication module;

a first switch module, connected to the output of the video code stream delay module, which the main control module triggers to stop forwarding the video code stream when the control signal is received.

The review code stream generating node further includes:

an audio code stream delay module, configured to delay forwarding the audio code stream in the media stream;

an audio review code stream generating module, configured to generate a corresponding audio review code stream according to the audio code stream and to send it to the review center for review through the first communication module;

a second switch module, connected to the output of the audio code stream delay module, which the main control module triggers to stop forwarding the audio code stream when the control signal is received.

The embodiments of the present invention further provide a media stream content review method, including the following steps:

the review code stream generating node generates a review code stream according to the media stream from the streaming media source, sends the review code stream to the review center, and forwards the media stream after delaying it for a set time;

the review center identifies the review code stream within the set time, and sends a control signal to the review code stream generating node when the review code stream contains harmful content;

the review code stream generating node stops forwarding the media stream when it receives the control signal.

In the technical solution provided by the embodiments of the present invention, the review center and the review code stream generating nodes form a streaming media review system. A review code stream generating node generates, from the large original code stream, a review code stream with a relatively small amount of data and sends it to the review center for deep content review, which addresses the content security problem of current streaming media services such as IPTV or digital television. The data volume of the review code stream generated by the node is only one part in several tens of the original code stream or less and takes up only a small amount of bandwidth, which reduces the communication resources occupied by the review code stream and does not burden existing systems.

Drawings

FIG. 1 is a schematic diagram of the principle of existing URL-based content filtering;

FIG. 2 is a schematic diagram of the principle of deep review of existing streaming media content;

FIG. 3 is a schematic structural diagram of a review system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram showing the relationship between frames and scenes in a video sequence;

FIG. 5 is a schematic diagram of the correspondence between scenes, frames, and data packets in a video code stream;

FIG. 6 is a schematic structural diagram of a review code stream generating node used in the review system of FIG. 3 according to an embodiment of the present invention.

Detailed description

The technical solution provided by the embodiments of the present invention is as follows. A review code stream generating node is set up to forward the media stream from the streaming media source after delaying it for a set time, to generate a review code stream according to that media stream, and to send the generated review code stream to the review center for review. The review center identifies the review code stream within the set time and, on identifying that it contains harmful content, sends a control signal to the review code stream generating node, which stops forwarding the media stream when it receives the control signal.
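The delay-then-forward behaviour described here can be illustrated with a small model. The following is only a sketch under stated assumptions: the class name `ReviewNode`, its methods, and the choice to express the delay as a packet count are hypothetical and do not come from the patent text, and the review copy is a placeholder rather than a real condensed stream.

```python
from collections import deque


class ReviewNode:
    """Toy model of a review code stream generating node (all names hypothetical)."""

    def __init__(self, delay_packets):
        self.delay_packets = delay_packets  # delay expressed as a number of packets
        self.buffer = deque()               # plays the role of the delay module
        self.switch_closed = True           # switch module: True means "forwarding"

    def make_review_packet(self, packet):
        # Placeholder: a real node would produce a condensed version here.
        return packet

    def on_packet(self, packet):
        """Accept one packet; return (review_copy, forwarded_packet_or_None)."""
        review_copy = self.make_review_packet(packet)
        self.buffer.append(packet)
        forwarded = None
        if len(self.buffer) > self.delay_packets:
            oldest = self.buffer.popleft()
            if self.switch_closed:
                forwarded = oldest
        return review_copy, forwarded

    def on_control_signal(self):
        """Review center reported harmful content: open the switch, drop buffered data."""
        self.switch_closed = False
        self.buffer.clear()
```

Because the review copy leaves the node as soon as a packet arrives while the original is held in the buffer, the review center has the whole delay interval in which to send its control signal before the corresponding content would reach the user.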

The principle is shown in Figure 3. The review center is set up in the center of the operator's core network and communicates with the surrounding review code stream generating nodes. A review code stream generating node can be deployed separately or merged with an existing network entity such as a media gateway or media resource server. The networks with which review code stream generating nodes may be associated are:

1. The operator's access networks, of which there may be several;

2. The core networks of other operators, of which there may be several;

3. The Internet.

The content source of a streaming media content service (composed of a streaming media server and peripheral devices such as a web server and load balancing devices) can access the operator's core network directly through the operator's access network, or through another operator's core network or the Internet. The content source may also be deployed in the center of the operator's core network, in the same network as the content review center.

Except in the last case described above, when the review code stream generating node is deployed separately, it can be connected before or after the media gateway; when the review code stream generating node and the media gateway are combined, the node can directly take on the media gateway function. The media stream from a content source must therefore pass through the corresponding review code stream generating node. In general, such a node conventionally performs only code conversion, which covers two cases:

1. Transcoding across protocols, such as between MPEG-2 and MPEG-4, or between MPEG-2 and H.264;

2. Code conversion within the same protocol, such as from a high bit rate to a low bit rate in H.264, when the access network requires a lower bit rate. For example, if the user is on ISDN access, 2B+D supports at most 128 kbps; if the program stream from the content source is 768 kbps, this conversion is required.

In the embodiments of the present invention, in addition to performing the conventional functions above, the review code stream generating node must be able to generate, from the original content media stream, a review media stream to be sent to the review center. The review code stream is necessary because the review code stream generating nodes are scattered at the edge of the network: professional reviewers cannot be posted at every node location and are instead concentrated in a few review centers. The node must therefore send a "condensed" or "summary" version of the content to the content review center. This review code stream must contain all the information needed for content review of the original content stream while having the lowest possible bit rate, to avoid burdening the network with extra overhead due to review. An overhead of only one part in several tens of the original content stream, or less, is acceptable.

Generating the review media stream can also be viewed as a special code conversion that must:

1. Keep as much of the information of the original content stream as possible, at least enough that the nature of the content can be determined when it is viewed manually;

2. Keep the bit rate as low as possible.

Therefore, in the embodiments of the present invention, the review code stream is generated from the original code stream according to a set code conversion method and sent to the review center. Depending on the case, generating the review code stream requires decoding the original stream and then encoding it specially; or partial decoding followed by special processing and encoding; or no decoding at all. Where decoding is required, it reuses all or most of the decoding already needed for the node's ordinary code conversion, to minimize the computational burden. After the review media streams from the review code stream generating nodes are gathered at the content review center, the review center can identify them manually or automatically. If content is harmful, the center orders the review code stream generating node handling the problem content to do the following:

1. Cut off the harmful content stream;

2. Replace the content stream with a harmless substitute so that the user still sees content. The most common way is to insert advertisements.

If the operator's network is large, review centers can also be set up hierarchically, with each review center responsible for part of the network. Still referring to Figure 3, there is one highest-level (level 1) review center, whose review experts have the highest authority. If a level 2 review center below it cannot decide about questionable content, the content can be handed to the highest review center for a decision by manual identification.

Review centers at level 2 or lower can conduct automatic or semi-automatic content review. When they cannot decide, the level 1 review center makes the final decision manually.

The media stream includes a video stream and a corresponding audio stream. The method for generating the audio review code stream is described first. Based on the prior art, it may take the following three forms:

1. Every period T, audio data packets covering a playback time of length L are extracted from the original audio stream and sent to the content review center. Because the number of audio samples per unit time is fixed (for example, a 44.1 kHz sampling rate), packets correspond strictly to playback time; for example, each data packet covers 20 ms or 30 ms, so this method is easy to implement. The reviewers hear sound of length L in every period T;

2. Because harmful content added to the sound may be short, periodic checking can easily miss it. In this case the entire audio stream can be sent, which is equivalent to forwarding a copy of the original audio stream to the content review center;

3. When everything is listened to, code conversion can be used to reduce bandwidth consumption. For example, the original audio coding is converted to G.723.1, which reduces the bit rate to 5.3/6.3 kbps. Although much wideband audio information (music, etc.) is lost, harmful content is mainly human voice, so the impact is small.
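The first, periodic method can be sketched as a simple packet selector. This is a minimal illustration, assuming fixed-duration packets indexed from the start of the stream; the function name `sample_audio_packets` and its parameters are hypothetical and not taken from the patent.

```python
def sample_audio_packets(packets, packet_ms, period_ms, length_ms):
    """Keep the packets that start within the first length_ms of every period_ms window.

    packets   -- audio packets in playback order, each covering packet_ms of sound
    period_ms -- the period T
    length_ms -- the extracted playback length L (L <= T)
    """
    kept = []
    for index, packet in enumerate(packets):
        start_ms = index * packet_ms  # playback time at which this packet begins
        if start_ms % period_ms < length_ms:
            kept.append(packet)
    return kept
```

With 20 ms packets, T = 100 ms and L = 40 ms, two packets out of every five are kept, so the review stream carries roughly L/T of the original audio bandwidth. The gaps between samples are exactly where short harmful segments could be missed, which is the weakness that motivates the second and third methods.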

Generating the video review code stream is more complicated and can use the following methods:

I. Method for generating the video review code stream based on scene segmentation

Ideally, one frame is taken from each shot, or scene, of the video to represent it, and these images make up the review video stream. Because the image changes little within a scene, such sampling is the best.

Referring to FIG. 4, which shows the relationship between frames and scenes in a video sequence: for a video stream passing through a filtering node, the video stream is first divided into different scenes, so that the original video sequence, composed of multiple frames, is divided into a sequence of scenes. Scenes contain differing numbers of frames. The frames within a scene have basically the same background and foreground, with some motion. A scene can be understood as a shot: when the shot switches, a new scene begins.

Regarding scene segmentation, it must be noted that scenes are originally created during video content capture (shot switching) and production (adding effects such as 3D transitions between two shots). Scene segmentation at the filtering node divides the video code stream into segments, each corresponding to an original scene. Because current scene recognition technology cannot achieve 100% accuracy, the scenes finally segmented at the filtering node may not coincide exactly with the scenes inherent in the video code stream, but this does not affect application of the embodiments of the present invention.

Referring to FIG. 5, which shows the correspondence between scenes, frames, and data packets in the video stream: the video stream sent from a device such as a streaming media server has been compressed and packetized (independently of the specific packetization protocol). Packets are sent in chronological order, and each carries a sequence number or time stamp. Using this information, the filtering node can reconstruct the original order of the packets and so map packets to scenes. The final result is that each scene corresponds to a series of video packets.

In fact, content review only needs to examine the first frame of each scene. Once the first frames are found, all scenes are segmented: all frames from the first frame of one scene up to the first frame of the next belong to that scene. In general a scene contains at least one I frame (intra-coded frame), as distinct from P frames (predictive coded frames) and B frames (bidirectionally predictive coded frames). An I frame is coded entirely from itself and does not depend on other frames, whereas a P frame needs its preceding reference frame to be decoded and a B frame needs reference frames both before and after it. Decoding an I frame is therefore the simplest. For any compression coding standard based on the DCT (Discrete Cosine Transform) plus entropy coding, such as the ITU H.26x series and the MPEG series, decoding an I frame requires only entropy decoding, inverse quantization, and the inverse DCT, with no motion compensation, so the amount of computation is the smallest. To decode a P frame from the video stream, by contrast, the preceding P frames back to the nearest I frame must be decoded first, whereas for an I frame only the I frame itself needs to be decoded; the decoding complexity of the two differs greatly. In practice, although the standards generally do not mandate it, encoders usually insert an I frame when the scene changes, so the first frame of a scene is often an I frame. With newer standards such as H.264, the video stream may contain no complete I frame, only frames that are partly intra coded; for example, a slice can be intra coded independently.

For the case where there may be no complete I frame, modified selection criteria can be defined, for example selecting the frame with the most intra-coded slices or macroblocks (MB). General coding protocols have mechanisms for identifying I frames or intra-coded slices; in the ITU H.264 standard, for example, they are identified by the Instantaneous Decoding Refresh (IDR) flag. The filtering node can therefore correctly extract I frames or intra-coded slices/macroblocks according to these identifiers.

A scene may also contain multiple I frames (when the shot is relatively long), so the rule is to select the first I frame. A copy of the packets corresponding to the first I frame in each scene is therefore forwarded to the content review center.
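The selection rule (first I frame of each scene) can be sketched as follows. The sketch assumes that scene segmentation and frame-type identification have already been done upstream; the tuple layout `(scene_id, frame_type, packets)` and the function name are hypothetical.

```python
def first_i_frames(frames):
    """Return (scene_id, packets) for the first I frame of every scene.

    frames -- iterable of (scene_id, frame_type, packets) tuples in stream order,
              where frame_type is "I", "P" or "B".
    """
    seen_scenes = set()
    selected = []
    for scene_id, frame_type, packets in frames:
        if frame_type == "I" and scene_id not in seen_scenes:
            seen_scenes.add(scene_id)
            selected.append((scene_id, packets))
    return selected
```

A scene whose frames include no I frame at all contributes nothing in this sketch; the modified criterion mentioned in the text (the frame with the most intra-coded slices or macroblocks) would be needed to cover that case.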

The scene segmentation technology used in the embodiments of the present invention generally includes the following two types:

1. Use structural information in the video data packets (such as motion vectors) to estimate the moving regions in the image and determine the motion area, direction, mode (one-way motion, reciprocating motion, etc.), magnitude, and so on, and thereby determine which frames have similar motion patterns; frames with similar motion patterns generally belong to the same scene;

2. Analyze the statistics of the video stream: regard the bit rate of the video stream as a random process over time, build a statistical model of it, and use the statistical model to estimate where scenes start and end.

Neither technique requires decoding, so both are highly efficient.
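The text does not specify the statistical model used in the second technique. As a stand-in, the sketch below uses a much simpler threshold heuristic on per-frame coded size: a frame far larger than the recent average is treated as a likely scene start, since encoders tend to spend many bits on the first frame after a cut. The function name, the window length, and the ratio are all assumptions chosen for illustration.

```python
def scene_starts(frame_sizes, window=5, ratio=3.0):
    """Guess scene start indices from coded frame sizes alone (no decoding).

    frame_sizes -- coded size of each frame in bytes, in stream order
    window      -- number of preceding frames used as the baseline
    ratio       -- how many times larger than the baseline counts as a spike
    """
    starts = [0] if frame_sizes else []
    for i in range(1, len(frame_sizes)):
        history = frame_sizes[max(0, i - window):i]
        baseline = sum(history) / len(history)
        if frame_sizes[i] > ratio * baseline:
            starts.append(i)
    return starts
```

Like the techniques in the text, this works on packet-level information only, but it is a crude approximation: periodic I frames inside a long scene would also be flagged, which a proper statistical model of the bit rate process would be expected to handle better.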

II. Method for generating the video review code stream from all I frames in the video code stream

General coding protocols have mechanisms for identifying I frames or intra-coded slices. In the ITU H.264 standard, for example, they are identified by the IDR (Instantaneous Decoding Refresh) flag. The review code stream generating node can therefore correctly extract I frames or intra-coded slices/macroblocks according to these identifiers. I frames in the original code stream are identified by the flag in their packet header information and are copied in order and forwarded to the content review center.
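For H.264 specifically, this kind of header-only identification can be sketched on an Annex B byte stream, where NAL units are separated by `0x000001` start codes and the low five bits of the first NAL byte give `nal_unit_type`; type 5 is a coded slice of an IDR picture. The sketch assumes the stream has already been reassembled into Annex B form (the patent speaks of packets, whose framing depends on the transport), and the function name is hypothetical.

```python
def find_idr_nal_units(data: bytes) -> list:
    """Return the IDR NAL units in an H.264 Annex B byte stream, start codes stripped."""
    start_code = b"\x00\x00\x01"  # a 4-byte 00 00 00 01 code ends with these 3 bytes
    payload_starts = []
    pos = data.find(start_code)
    while pos != -1:
        payload_starts.append(pos + len(start_code))
        pos = data.find(start_code, pos + len(start_code))

    idr_units = []
    for n, begin in enumerate(payload_starts):
        if n + 1 < len(payload_starts):
            end = payload_starts[n + 1] - len(start_code)
        else:
            end = len(data)
        # Drop zero bytes that belong to the next (4-byte) start code.
        nal = data[begin:end].rstrip(b"\x00")
        if nal and (nal[0] & 0x1F) == 5:  # nal_unit_type 5: IDR slice
            idr_units.append(nal)
    return idr_units
```

Only one header byte per NAL unit is inspected and nothing is decoded, which is what makes this method cheap enough to run at every edge node.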

After the video review code stream generated by either of the two methods above is sent to the review center, the review center can identify some harmful content automatically and can identify content further by combining manual identification.

The specific identification method is: take one I frame at a time from the video review code stream, decode it to restore the I frame image, and then identify the image by one of the following two methods:

1. Manual identification: the I frame image is displayed to human monitors, realizing manual filtering;

2. Automatic identification: the I frame image is input to an automatic identification module, which compares it against a harmful content database. If harmful content is found, playback of the video stream is cut off immediately and the case is reported to the human monitor for handling. The harmful content that the prior art can identify automatically includes the following:

a. Automatic recognition of harmful image content, such as obscenity and violence; this image recognition technology is mature prior art;

b. Recognition of harmful superimposed text or symbols. The image is first processed to locate the region containing the text or symbols and to determine whether it is vertical or horizontal; the text is then separated from the background, and the result is sent to an existing OCR (Optical Character Recognition) module for recognition. The recognition result is matched against the database, and if it matches a harmful-content condition in the database, the text or symbols are judged harmful. Superimposed text and symbol recognition is mature prior art;

c. Recognition of specific faces that may appear in the image. The frame image is sent directly to an existing face recognition module. The database of the face recognition module already holds data, built by the monitoring department itself, and can store various kinds of faces as needed: suspects, important people, terrorists, and so on. Face recognition is mature existing technology.

When manual identification and automatic identification are used together, the decision rule can be defined as one of:

1. The result of the automatic identification module prevails.

2. The result of the human monitors prevails.

3. In between: both results are considered and a joint judgment is given. One embodiment is a weighted average of scores. The automatic identification module and the human monitors not only decide whether content is harmful but also give a harm score, for example from 0 to 100; the greater the harm, the higher the score, with 0 meaning harmless. The score of the automatic identification module and the score of the human monitor are then combined as follows:

S = (W_M × S_M + W_H × S_H) / (W_M + W_H)

Where W_M and W_H are the weights of the automatic identification module and the human monitor, respectively; their relative size indicates whether the automatic identification module or the human is trusted more. S_M and S_H are the scores given by the automatic identification module and the human, respectively. If the resulting composite score S is greater than a given value, such as 50, the joint judgment is harmful; otherwise it is harmless. If only one party identifies the harmful content and scores it, the other party's score for that content can default to 0.
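
The joint judgment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names joint_score and is_harmful and the default weights are hypothetical, and the threshold of 50 is simply the example value given in the text.

```python
def joint_score(s_m, s_h, w_m=1.0, w_h=1.0):
    """Weighted average S = (W_M*S_M + W_H*S_H) / (W_M + W_H).

    s_m, s_h: harm scores (0-100) from the automatic module and the human
    monitor; None means that party gave no score, which defaults to 0.
    """
    s_m = 0.0 if s_m is None else s_m
    s_h = 0.0 if s_h is None else s_h
    if w_m + w_h <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_m * s_m + w_h * s_h) / (w_m + w_h)


def is_harmful(s_m, s_h, w_m=1.0, w_h=1.0, threshold=50.0):
    # Harmful only if the composite score is strictly greater than the threshold.
    return joint_score(s_m, s_h, w_m, w_h) > threshold
```

For example, with weights 1 and 3 and scores 80 (automatic) and 40 (human), the composite score is exactly 50, which is not greater than the threshold, so the joint judgment is harmless.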

Third, a video review code stream generation method based on requantized DCT coefficients

The DCT coefficients of each frame are re-quantized, and the DCT coefficients extracted from the data packets are quantized by a new quantization step size larger than the original quantization step size to achieve a result of reducing the amount of data. The specific steps include:

1. Extract the DCT coefficients in each data packet and perform inverse quantization according to the quantization step size in the packet structure information;

2. Requantize according to the new quantization step size to obtain a new DCT coefficient quantization result;

3. Packetize the requantized result and send it to the content review center, taking care to change the quantization step size recorded in the structure information of each data packet to the new quantization step size.
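
Steps 1 to 3 can be sketched as follows. This is a simplified illustration that assumes a plain uniform quantizer and a hypothetical packet layout (a dictionary with a "qstep" field and a list of quantized "levels"); real codecs use more elaborate quantizers and bitstream syntax.

```python
def requantize(levels, q_old, q_new):
    """Requantize quantized DCT levels from step q_old to a larger step q_new."""
    if q_new <= q_old:
        raise ValueError("new step must be larger than the original step")
    out = []
    for level in levels:
        coeff = level * q_old                      # step 1: inverse quantization
        sign = -1 if coeff < 0 else 1
        # step 2: requantize, rounding the magnitude to the nearest level
        out.append(sign * int(abs(coeff) / q_new + 0.5))
    return out


def requantize_packet(packet, q_new):
    """Step 3: build a new packet whose header records the new step size."""
    return {
        "qstep": q_new,
        "levels": requantize(packet["levels"], packet["qstep"], q_new),
    }
```

With an original step of 4 and a new step of 16, the levels [10, -3, 1, 0] become [3, -1, 0, 0]: small coefficients collapse to zero, which is where the data reduction comes from.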

At the same time, the amount of data can be reduced further by discarding, in steps 1-3 above, the DCT coefficients corresponding to the chrominance components (U, V / CbCr) of the original video. Calculated at the 4:2:2 sampling rate, discarding the chrominance component coefficients removes at least half of the data.

Fourth, a video review code stream generation method based on reduced image resolution

This technique reduces the resolution of the image without completely decoding the original media stream, for example reducing an original CIF (Common Intermediate Format) image to QCIF (Quarter CIF). In principle, the amount of data can be reduced to 1/r, where r is the resolution reduction factor, r = rh * rv, that is, the product of the reduction factors in the horizontal and vertical directions.
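
As a worked example of the factor r, the following sketch computes rh, rv, and r for the CIF to QCIF case, using the standard CIF (352 x 288) and QCIF (176 x 144) picture sizes. The helper name reduction_factor is hypothetical.

```python
def reduction_factor(src, dst):
    """Return (rh, rv, r) for a change from src=(width, height) to dst=(width, height)."""
    rh = src[0] / dst[0]   # horizontal reduction factor
    rv = src[1] / dst[1]   # vertical reduction factor
    return rh, rv, rh * rv


CIF = (352, 288)
QCIF = (176, 144)
rh, rv, r = reduction_factor(CIF, QCIF)   # rh = 2.0, rv = 2.0, r = 4.0
```

Here r = 4, so in principle the review code stream carries about one quarter of the original data.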

Fifth, a video review code stream generation method based on reduced frame rate

The frame rate of the original content stream is reduced, typically by dropping P frames and B frames and PB frames in the video sequence. By dropping these frames, the frame rate is lowered, thereby reducing the bit rate.
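
A minimal sketch of this frame dropping, assuming each frame is represented by a hypothetical dictionary with a "type" field ("I", "P", "B", or "PB") and a timestamp "ts":

```python
def drop_predicted_frames(frames, drop_types=("P", "B", "PB")):
    """Keep only the frames whose type is not listed in drop_types."""
    return [f for f in frames if f["type"] not in drop_types]


frames = [
    {"type": "I", "ts": 0},
    {"type": "B", "ts": 1},
    {"type": "P", "ts": 2},
    {"type": "I", "ts": 3},
]
i_only = drop_predicted_frames(frames)            # keeps the two I frames
no_b = drop_predicted_frames(frames, ("B",))      # drops only the B frame
```

Dropping every predicted frame leaves only independently decodable I frames. Dropping P frames selectively would need care in a real stream, because later predicted frames may reference them.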

Among the above five methods, the review code stream generated by any one method is only a few tenths or less of the size of the original code stream, so it occupies only a small amount of bandwidth and does not burden the existing system.

In method 1, method 2, and method 5 above, when the review center reviews the video review code stream, for example to determine the nature of a scene in the video stream or to locate subtitles or other superimposed text and symbol information, several successive frames must be analyzed accurately to achieve identification. After the video review code stream, obtained from the I frames or the first frame of each scene or by reducing the frame rate, is sent to the review center, the review center needs to feed back during the review whether the frames adjacent to a certain frame must be supplemented. Without these adjacent frames, the review center cannot accurately analyze and identify the content. To avoid this situation, the following solutions can be used:

1. For method 1 and method 2, based on experience, the I frame or the first frame together with a certain number of adjacent frames may be intercepted directly to generate the video review code stream, for example the 2 or 3 adjacent frames before and after, and generally no more than 5 frames;

2. For method 1, method 2, and method 5, during the review, for content that the review center cannot clearly identify, the review center notifies the review code stream generating node that a set number of adjacent-frame data packets must be sent to the review center for supplementary review.

After receiving the notification, the review code stream generating node may intercept the corresponding adjacent-frame data packets from the delayed original video stream and send them to the review center. Alternatively, a cache module is created in the review code stream generating node that, at each time t, buffers all video stream packets in the time window between time t-δ and time t. This time window is in fact a sliding window: as time t advances, outdated data is continually discarded and new data is included.

The cache module always holds data of a fixed time length δ; because the amount of compressed code stream data per unit time varies, the amount of data in the cache also changes over time. Suppose the video review code stream packet generated at time t_0 is received by the review center, and the review center finds that the data in the video review code stream cannot be accurately identified. The review center then requests, through the communication link, that the video review code stream generating node transmit the frames adjacent to the corresponding frame, for example the preceding k_a frames and the following k_p frames. Because frames correspond to times and the number of frames per unit time is controllable, the corresponding time range can be obtained: the required interval runs from t_0 - t_a to t_0 + t_p. If the video review code stream generating node receives the review center's request and starts processing it at time t_0 + t_d, then as long as:

t_a < δ - t_d and t_p < t_d

The review code stream generating node can find the data packets corresponding to the required adjacent frames in the cache and send them to the review center. The parameter δ determines the amount of buffered data (though not strictly, because the amount of compressed code stream data per unit time varies, so the amount of data in the cache also changes over time). The larger δ is, the higher the hit rate for requested adjacent frames, but the greater the impact on the cost and efficiency of the video review code stream generating node, so a compromise must be made. In addition, as the related recognition technology develops, the required k_a and k_p will become smaller and smaller, and so will the corresponding t_a and t_p, so the required size of δ will decline as the technology progresses. It can even reach 0, which means no caching at all. The cache module determines which packets need to be cached according to each time t and the parameter δ, that is, based on the timestamp information in the data packets. Of course, an implementation may instead use a number of frames before and after as the unit of measurement, for example by stipulating that the k_a frames before and the k_p frames after the currently processed frame are cached.
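
The sliding-window cache and the hit condition can be sketched as follows. This is an illustrative sketch only: the class name SlidingWindowCache, its methods, and the representation of packets as (timestamp, payload) pairs are assumptions, and packets are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCache:
    """Buffers (timestamp, packet) pairs covering the last `delta` time units."""

    def __init__(self, delta):
        self.delta = delta
        self.items = deque()

    def add(self, ts, packet):
        # Packets are assumed to arrive in non-decreasing timestamp order.
        self.items.append((ts, packet))
        while self.items and self.items[0][0] < ts - self.delta:
            self.items.popleft()          # discard data older than t - delta

    def fetch(self, t0, t_a, t_p):
        """Return cached packets with t0 - t_a <= ts <= t0 + t_p."""
        return [p for ts, p in self.items if t0 - t_a <= ts <= t0 + t_p]


def request_can_hit(delta, t_a, t_p, t_d):
    # Condition from the text: t_a < delta - t_d and t_p < t_d.
    return t_a < delta - t_d and t_p < t_d


cache = SlidingWindowCache(delta=10)
for ts in range(21):
    cache.add(ts, f"pkt{ts}")
```

After the packet with timestamp 20 has been added, the cache covers timestamps 10 to 20. A request for the frame at t0 = 15 with t_a = 3 and t_p = 2, processed at t0 + t_d = 20, satisfies the condition and is served completely from the cache. With t_a = 6 the condition fails, and the packet at timestamp 9 has already been discarded.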

In the above method, support for the following signaling is added to the communication between the video review code stream generating node and the review center: a request by which the review center tells the video review code stream generating node that adjacent-frame data before and after a frame must be transmitted, with the corresponding parameters t_a and t_p (or, equivalently, k_a and k_p).

The above five methods can be used alone or in combination. For example, the video review code stream generated by method 1 can be further processed by method 4 to reduce the resolution before it is sent. Many combinations are possible and are not described one by one here.

If the method by which the review code stream generating node processes the delayed-forwarded video code stream is the same as the method being used to generate the video review code stream, the video review code stream itself can be delayed and forwarded directly, and the processing is not repeated.

FIG. 6 is a schematic structural diagram of a review code stream generating node 600 for implementing media stream content review according to an embodiment of the present invention, which includes:

The main control module 601 is the central module of the device; it controls the operation of all other modules and is connected to all other modules and sub-modules;

The communication module 602: as a conventional node, the review code stream generating node 600 is controlled by its control device, and this module handles the exchange of information between the review code stream generating node 600 and that control device. The communication protocol used for control commands and data reports may be H.248/MGCP (Media Gateway Control Protocol) or the like. At the same time, because the review code stream generating node 600 has the content review function, it must also communicate with the media review center, accept the latter's control, and report information to it. Because signaling between content review devices has not been defined by international standards organizations (it is completely new), there are two ways to solve the signaling problem: define a completely new set of signaling; or extend protocols such as H.248/MGCP by defining new packages, profiles, and methods.

The review code stream generating node 600 also includes a first portion related to video review, the first portion specifically comprising:

1. The video stream delay module 6031 is configured to delay forwarding the video code stream from the streaming media server. The length of the delay may be set, or a default delay length may be used. The purpose is to reserve enough time for the entire process: generating the video review code stream, transmitting it to the content review center, the reviewer making a judgment and response and issuing a command, and the review code stream generating node 600 taking action after receiving the command;

2. The video review code stream generating module 6032 is connected to the communication module 602 and is configured to generate a corresponding video review code stream from the video code stream from the streaming media server and send it through the communication module 602;

3. The first switch module 6033 is connected to the main control module 601 and is configured to disconnect the video code stream according to the trigger signal of the main control module 601;

4. The video replacement source library 6034 is connected to the first switch module 6033. The first switch module 6033 switches in the video replacement source library 6034 at the same time as it disconnects the video stream. The video replacement source library 6034 holds a number of harmless pieces of content that can be used to replace harmful content, such as public service advertisements, notice subtitles, and animated shorts. The replacement command is issued by the main control module 601, which acts under the command of the content review center; the content review center's command reaches the main control module 601 through the communication module 602;

5. The video protocol conversion module 6035 is connected to the first switch module 6033 and performs coding protocol conversion on the video code stream output by the first switch module 6033. The module uses existing technology to perform the required coding conversion on the input video code stream, including but not limited to the following processing:

a. Conversion between compression coding protocols, such as MPEG-2 to H.264;

b. Conversion of internal coding parameters within the same coding protocol: frame rate, spatial resolution, etc.;

c. Error-resilient coding conversion: through the conversion performed by the review code stream generating node 600, an input video stream that originally had no transmission protection is given the required transmission protection, such as FEC; or the strength of the original protection is adjusted;

d. Other coding conversion, such as inserting superimposed text and graphic symbols, for example a local operator logo.

Referring to FIG. 7, according to the foregoing video review code stream generating methods, the video review code stream generating module 6032 may include at least one of the following sub-modules:

a first sub-module 60321, configured to: perform scene segmentation on the video code stream, extract the start image frame and end image frame of each scene, and then copy the data packets containing the first frame of each scene, or the first frame and a certain number of frames before and after it, the number being determined by factors such as recognition accuracy; the video review code stream is generated from the copied data packets. Special processing is needed where there is no complete I frame, for example selecting the frame with the largest number of intra-coded slice macroblocks;

a second sub-module 60322, configured to copy the data packets in the video code stream that contain each I frame, or each I frame and a certain number of frames before and after it, and then generate the video review code stream from the copied data packets;

a third sub-module 60323, configured to reduce the spatial resolution of the video code stream to form the video review code stream. Let r be the overall spatial resolution reduction factor, r = rh * rv, that is, the product of the horizontal and vertical reduction factors; the input video stream does not need to be completely decoded. In general rh and rv are integers, but they can also be arbitrary rational numbers (irrational factors such as sqrt(2) and sqrt(3) can be approximated by rational numbers, such as sqrt(2) ≈ 1.414 and sqrt(3) ≈ 1.732);

a fourth sub-module 60324, configured to reduce the frame rate of the video code stream to form the video review code stream; this module does not need to completely decode the input video stream, and the reduction factor r must be an integer;

a fifth sub-module 60325, configured to requantize the video code stream with an increased quantization step size to form the video review code stream.

FIG. 7 shows a schematic diagram in which the sub-modules are connected in parallel. In fact, the video review code stream generating module 6032 can achieve the purpose of the embodiment of the present invention as long as it includes one of the sub-modules.

Each of the above sub-modules stores a corresponding processing program and is connected to the main control module 601. The main control module 601 can call one of them to generate the video review code stream, or call a combination of sub-modules that can be applied together to process the video review code stream step by step, so as to minimize the amount of data contained in the video review code stream.
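
The step-by-step combination of sub-modules can be pictured as a simple processing pipeline. The sketch below is only an illustration: the function names, the packet dictionaries, and the stand-in halve_resolution step (which merely changes the recorded picture size) are all hypothetical and do not perform real video processing.

```python
from functools import reduce


def run_pipeline(stream, submodules):
    """Apply each selected sub-module in order; each takes and returns a list of packets."""
    return reduce(lambda data, step: step(data), submodules, stream)


def keep_i_frames(packets):
    # Stand-in for method 2: keep only the packets of I frames.
    return [p for p in packets if p["type"] == "I"]


def halve_resolution(packets):
    # Stand-in for method 4: only the recorded picture size is changed here.
    return [{**p, "width": p["width"] // 2, "height": p["height"] // 2} for p in packets]


stream = [
    {"type": "I", "width": 352, "height": 288},
    {"type": "P", "width": 352, "height": 288},
]
review_stream = run_pipeline(stream, [keep_i_frames, halve_resolution])
```

Passing a single sub-module in the list corresponds to the main control module calling one method alone; passing several corresponds to a combined application.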

Referring to FIG. 6 and FIG. 7, when the video review code stream is generated by the first sub-module 60321, the second sub-module 60322, or the fifth sub-module 60325, that sub-module is further connected to the video stream delay module 6031. On receiving a notification from the review center, it intercepts the data packets required for the supplementary review and sends them to the review center through the communication module 602.

Of course, as mentioned above, the video review code stream generation module 6032 may also include:

a data cache module, connected to the first sub-module 60321, the second sub-module 60322, or the fifth sub-module 60325 and configured to cache the most recent data of fixed time length δ. When the video review code stream is generated by the first sub-module 60321, the second sub-module 60322, or the fifth sub-module 60325 and the review center needs cached data, that sub-module retrieves the corresponding data packets from the data cache module.

If the third sub-module 60323, the fourth sub-module 60324, or the fifth sub-module 60325 is included in the review code stream generating node 600, the output of the included sub-module may also be connected directly to the input of the video stream delay module 6031. In this way, when the video review code stream is generated in the same manner as the conventional protocol conversion of the video code stream, the video protocol conversion module 6035 can stop working under the control of the main control module 601 to avoid repeated processing, and the video stream delay module 6031 delays the video review code stream and forwards it directly to the destination.

The review code stream generating node 600 may further include a second part related to the audio review, the second part specifically including:

6. The audio stream delay module 6041 is configured to delay forwarding the audio code stream from the streaming media server. The length of the delay may be set, or a default delay length may be used. The purpose is to reserve enough time for the entire process: generating the audio review code stream, transmitting it to the content review center, the reviewer making a judgment and response and issuing a command, and the review code stream generating node 600 taking action after receiving the command;

7. The audio review code stream generating module 6042 is connected to the communication module 602 and is configured to generate a corresponding audio review code stream from the audio code stream and send it through the communication module 602;

8. The second switch module 6043 is connected to the main control module 601 and is configured to disconnect the audio code stream according to the trigger signal of the main control module 601;

9. The audio replacement source library 6044 is connected to the second switch module 6043. The second switch module 6043 switches in the audio replacement source library 6044 at the same time as it disconnects the audio stream. The audio replacement source library 6044 holds a number of harmless pieces of audio content that can be used to replace harmful content, such as music and spoken public service advertisements. The replacement command is issued by the main control module 601, which acts under the command of the content review center; the content review center's command reaches the main control module 601 through the communication module 602;

10. The audio protocol conversion module 6045 is connected to the second switch module 6043 for performing encoding protocol conversion on the audio code stream output by the second switch module 6043.

According to the foregoing audio review code stream generating methods, the audio review code stream generating module 6042 includes at least one of the following sub-modules:

The audio stream copy sub-module 60421 is configured to directly copy the audio code stream into the audio review code stream;

The audio stream excerpt sub-module 60422 is configured to select segments of the audio code stream at the set time intervals and generate the audio review code stream from the selected segments;

The audio stream encoding conversion sub-module 60423 is configured to re-encode the audio stream into a low bit rate audio review code stream.
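
One possible reading of the excerpt sub-module is sketched below, assuming the set interval means keeping the first few seconds of every fixed period. The function name excerpt_audio, its parameters keep and period, and the packet dictionaries with a "ts" timestamp are hypothetical.

```python
def excerpt_audio(packets, keep, period):
    """Keep packets whose timestamp falls in the first `keep` seconds of each `period`."""
    if not 0 < keep <= period:
        raise ValueError("need 0 < keep <= period")
    return [p for p in packets if (p["ts"] % period) < keep]


packets = [{"ts": t} for t in range(10)]
selected = excerpt_audio(packets, keep=2, period=5)   # timestamps 0, 1, 5, 6
```

With keep equal to period, every packet is selected, which reduces to the behavior of the copy sub-module.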

FIG. 8 shows a connection diagram in which all three sub-modules are included at the same time. In fact, the audio review code stream generating module 6042 can achieve the purpose of the embodiment of the present invention as long as it includes one of the sub-modules.

Similarly, each of the above sub-modules stores a corresponding processing program and is connected to the main control module 601, which can call one of the sub-modules to perform the current processing. The output of each sub-module can be connected directly to the input of the audio stream delay module 6041. When the audio review code stream generation method and the audio protocol conversion module 6045 use the same processing method, the audio protocol conversion module 6045 stops working under the control of the main control module 601 to avoid repeated processing, and the audio stream delay module 6041 delays the audio review code stream and forwards it directly to the destination.

The review code stream generating node 600 of the embodiment of the present invention further includes:

11. The content recording module 605 is configured to record and save the media stream of a specified time period, that is, to record a specified time range of a specified video or audio stream for later retrieval (such as searching for evidence of crimes). The content review center can specify which video or audio stream to record, from when, and until when.

12. The log report module 606 is connected to the other modules and sub-modules of the review code stream generating node 600 and is configured to generate the operation log and reports of the review code stream generating node 600.

13. The video bottom layer processing common module 607 is connected to the video protocol conversion module 6035 and to each of the sub-modules included in the video review code stream generating module 6032, and stores the common processing programs. In video coding conversion, different techniques may use some common technical modules, such as downsampling, entropy coding and entropy decoding, quantization and inverse quantization, and DCT and inverse DCT. These common modules are also needed when generating the review code stream. Therefore, to improve efficiency, these techniques are extracted to form a common module.

14. The audio bottom layer processing common module 608 is connected to the audio protocol conversion module 6045 and the audio review code stream generating module 6042, and stores the common processing programs. In audio coding conversion, different techniques may use some common technical modules, such as downsampling, entropy coding and entropy decoding, quantization and inverse quantization, and prediction parameter estimation. These common modules are also needed when generating the review code stream. Therefore, to improve efficiency, these techniques are extracted to form a common module.

The review center can adopt a hierarchical setting method, and each level of review center generally includes the following main structures:

a main control module for controlling the operation of the review center;

a second communication module, configured to communicate with the first communication module of the review code stream generating node 600, receive the review code stream, and send the control signal, and also to communicate with the upper-level review center;

a decoding and restoration module, configured to decode and restore the review code stream; it includes a video review code stream decoding sub-module and an audio review code stream decoding sub-module;

a harmful content identification module, configured to identify whether the restored review code stream contains harmful content;

a decision module, connected to the harmful content identification module and configured to generate the control signal when harmful content is identified and send it through the second communication module to the review code stream generating node 600.

Correspondingly, the harmful content identification module may include a video harmful content recognition sub-module and an audio harmful content recognition sub-module.

Depending on the identification manner, the video harmful content recognition sub-module may include: an automatic identification sub-unit for automatically identifying harmful content; and/or a manual identification sub-unit for manually identifying harmful content, in which case it must include an operating unit for receiving the recognition result.

The audio harmful content recognition sub-module includes an audio playback device and an operating unit for receiving the recognition result.

For a content source inside the core network, the network location is closer to a content review center, and the media stream sent by the content source first passes through that content review center and only then reaches the review code stream generating node 600. In this case, having the downstream review code stream generating node 600 generate the review code stream is relatively inefficient. To solve this problem, a review code stream generating node 600 may be built into the review center connected to the content source, or a review code stream generating node 600 may be connected between the content source and the review center, to generate the review code stream of the content source and send it to the review center.

In the above method of the embodiment of the present invention, the specific grading standard of the harmful content and the corresponding identification standard are determined according to the actual application scenario, and the specific standard or the identification method does not limit the protection scope of the embodiment of the present invention.

The application of the embodiments of the present invention can solve the content security problem in multimedia services such as IPTV and digital television, ensure the security and reliability of the services provided, and protect the interests of telecom and broadcasting operators, content providers, and equipment manufacturers. It also ensures that society, especially minors, is not harmed by harmful video content, so the social benefits are even more significant.

It is apparent that those skilled in the art can make various modifications and changes to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Therefore, it is intended that the present invention cover the modifications and variations of the embodiments of the present invention.

Claims

1. A media stream content review system, comprising:

at least one review code stream generating node, configured to delay forwarding the media stream from the streaming media source, and generate a review code stream according to the media stream;

2. The system of claim 1, wherein the review code stream generating node comprises:

And a main control module, configured to control operation of the review code stream generating node and execute a control signal from the review center.

3. The system according to claim 2, wherein the review code stream generating node comprises:

a video code stream delay module, configured to delay forwarding a video code stream in the media stream;

a video review code stream generating module, configured to generate a corresponding video review code stream according to the video code stream, and send it through the first communication module to the review center for review;

a first switch module, connected to the output end of the video code stream delay module, wherein the main control module, on receiving the control signal, triggers the first switch module to disconnect forwarding of the video code stream.

4. The system according to claim 3, wherein the video review code stream generating module comprises at least one of the following:

a first sub-module, configured to: after performing scene segmentation on the video code stream, copy the data packets containing the first frame of each scene, or the first frame and a certain number of frames before and after it, and then combine the copied data packets in sequence into the video review code stream;

a second sub-module, configured to copy the data packets in the video code stream that contain each intra-coded frame, or each intra-coded frame and a certain number of frames before and after it, and then combine the copied data packets in sequence into the video review code stream;

a third sub-module, configured to reduce the spatial resolution of the video code stream to form the video review code stream;

a fourth sub-module, configured to process the video code stream to reduce the frame rate to form the video review code stream;

And a fifth submodule, configured to re-quantize the video code stream by using an increased quantization step size to form the video review code stream.

5. The system according to claim 4, wherein when the video review code stream generating module includes the first sub-module, the second sub-module, or the fifth sub-module, each included sub-module is also connected to the video code stream delay module; or each included sub-module is connected to a data buffer module for buffering the video code stream of a set period.

6. The system according to claim 4, wherein when the video review code stream generating module includes the third sub-module, the fourth sub-module, or the fifth sub-module, the review code stream generating node further comprises: a video bottom layer processing common module, connected to the third sub-module, the fourth sub-module, or the fifth sub-module and configured to store common video processing programs.

7. The system according to claim 2 or 3, wherein the review code stream generating node comprises:

an audio review code stream generating module, configured to generate a corresponding audio review code stream according to the audio code stream, and send it through the first communication module to the review center for review;

8. The system according to claim 7, wherein the review code stream generating node further comprises: an audio bottom layer processing common module, connected to the audio review code stream generating module and configured to store common audio processing programs.

9. The system according to claim 7, wherein the audio review code stream generating module comprises at least one of the following:

An audio stream copy sub-module, configured to directly copy the audio code stream into the audio review code stream;

An audio stream excerpt sub-module, configured to select the audio code stream according to the set interval time period, and sequentially combine the selected audio code stream into the audio review code stream;

An audio stream encoding conversion sub-module for re-encoding the audio stream into a low bit rate audio review code stream.

10. The system according to claim 2, wherein the review code stream generating node comprises: a content recording module, configured to record a media stream of a specified time period.

11. The system according to claim 3 or 4, wherein the review code stream generating node comprises:

a video replacement source library, connected to an input end of the first switch module, wherein the first switch module switches in the video replacement source library while disconnecting the video code stream; and/or

a video protocol conversion module, connected to the output end of the first switch module and configured to perform coding protocol conversion on the video code stream output by the first switch module.

12. The system according to claim 7, wherein the review code stream generating node comprises:

an audio replacement source library, connected to the input end of the second switch module, wherein the second switch module switches in the audio replacement source library while disconnecting the audio code stream; and/or

an audio protocol conversion module, connected to the output end of the second switch module and configured to perform coding protocol conversion on the audio code stream output by the second switch module.

The system according to claim 2, wherein the review code stream generating node further comprises: a log reporting module, connected to the other modules of the system and configured to generate a running log of the system.

The system according to claim 2, wherein the first communication module is further communicatively connected to a control device of the review code stream generating node.


15. The system of claim 2, wherein the review center comprises: a second communication module, configured to connect communicatively to the first communication module, receive the review code stream, and send the control signal;

a decoding and restoration module, configured to decode and restore the review code stream;

a harmful content identification module, configured to identify whether the restored review code stream contains harmful content;

a decision module, connected between the harmful content identification module and the second communication module, configured to generate the control signal when harmful content is identified and send it through the second communication module.
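
As an illustrative, non-limiting sketch, the chain of decoding, identification, and decision in the review center may be modelled as below. The names `ControlSignal` and `review`, and the `decode` and `is_harmful` callbacks, are hypothetical; the claims do not prescribe any particular identification algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ControlSignal:
    """Hypothetical message telling the node to stop forwarding a stream."""
    stream_id: str
    action: str = "STOP_FORWARDING"


def review(
    stream_id: str,
    review_packets: Iterable[bytes],
    decode: Callable[[bytes], str],
    is_harmful: Callable[[str], bool],
) -> Optional[ControlSignal]:
    """Decode each review packet; return a control signal on the first harmful hit."""
    for packet in review_packets:
        content = decode(packet)      # decoding and restoration module
        if is_harmful(content):       # harmful content identification module
            return ControlSignal(stream_id)  # decision module
    return None
```

Returning on the first hit keeps the decision latency low, which matters because the decision must reach the node before the delayed stream is forwarded.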

The system of claim 15, wherein the harmful content identification module comprises:

an automatic identification sub-module, for automatic identification of harmful content; and/or

a manual identification sub-module, for manual identification of harmful content.

The system according to claim 15 or 16, wherein review centers are arranged hierarchically so as to check, level by level, whether the media stream contains harmful content.

18. A review code stream generating node, comprising:

a communication module, configured to implement communication;

a video code stream delay module, configured to delay forwarding of the video code stream from a streaming media source;

a video review code stream generating module, configured to generate a corresponding video review code stream according to the video code stream from the streaming media source, and send the data through the communication module;

a main control module, configured to output a trigger signal upon receiving, through the communication module, a control signal to stop forwarding the video code stream;

a first switch module, connected to the output end of the video code stream delay module and configured to stop forwarding the video code stream according to the trigger signal of the main control module.
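
As an illustrative, non-limiting sketch, the interaction of the delay module, the main control module, and the first switch module may be modelled as below. The class name `DelayedForwarder`, its method names, and the use of explicit timestamps are assumptions made for illustration only.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedForwarder:
    """Hold each packet for `delay` seconds; stop forwarding once blocked."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.blocked = False
        self._queue: Deque[Tuple[float, bytes]] = deque()

    def push(self, now: float, packet: bytes) -> None:
        """Delay module: record when the packet becomes due for forwarding."""
        self._queue.append((now + self.delay, packet))

    def on_control_signal(self) -> None:
        """Main control module: the trigger signal opens the switch."""
        self.blocked = True

    def pop_due(self, now: float) -> List[bytes]:
        """Switch module: release due packets unless forwarding is blocked."""
        released: List[bytes] = []
        while self._queue and self._queue[0][0] <= now:
            _, packet = self._queue.popleft()
            if not self.blocked:
                released.append(packet)
        return released
```

A caller would invoke `push` on arrival of each packet and `pop_due` periodically; packets that become due after `on_control_signal` are discarded rather than forwarded.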

The review code stream generating node according to claim 18, wherein the review code stream generating node further comprises:

An audio stream delay module, configured to delay forwarding an audio stream from a streaming media source;

An audio review code stream generating module, configured to generate a corresponding audio review code stream according to the audio code stream and send the data through the communication module;

a second switch module, connected to an output end of the audio stream delay module, wherein the main control module, upon receiving the control signal, also outputs the trigger signal to the second switch module, and the second switch module stops forwarding the audio code stream according to the trigger signal.

The review code stream generating node according to claim 18, wherein the video review code stream generating module comprises at least one of the following:

a first submodule, configured to: after performing scene segmentation on the video code stream, copy, for each scene, the data packet of the first frame, or the data packets of the first frame and of a certain number of frames in that scene, and then sequentially combine the copied data packets into the video review code stream;

a second submodule, configured to copy the data packet of each intra-coded frame in the video code stream, or the data packets of the intra-coded frame and of a certain number of frames before and after it, and then sequentially combine the copied data packets into the video review code stream;

a third submodule, configured to reduce the spatial resolution of the video code stream to form the video review code stream;

a fourth submodule, configured to process the video code stream to reduce the frame rate to form the video review code stream;
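
As an illustrative, non-limiting sketch, the frame selection performed by the second and fourth submodules may be modelled on frame indices as below. The function names and the representation of frame types as the letters "I", "P", and "B" are assumptions for illustration. The sketch only chooses which frames to keep; forming a decodable stream from the kept frames (for example, re-encoding after frame-rate reduction) is outside its scope.

```python
from typing import List, Sequence


def select_intra_frames(frame_types: Sequence[str], neighbors: int = 0) -> List[int]:
    """Return indices of intra-coded frames plus `neighbors` frames on each side."""
    selected = set()
    last = len(frame_types) - 1
    for i, frame_type in enumerate(frame_types):
        if frame_type == "I":
            lo = max(0, i - neighbors)
            hi = min(last, i + neighbors)
            selected.update(range(lo, hi + 1))
    return sorted(selected)  # keep stream order when combining packets


def reduce_frame_rate(num_frames: int, keep_every: int) -> List[int]:
    """Return indices kept when only every `keep_every`-th frame is retained."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return list(range(0, num_frames, keep_every))
```
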

The review code stream generating node according to claim 20, wherein when the video review code stream generating module includes the first submodule, the second submodule or the fifth submodule, each included submodule is connected to the video code stream delay module; or each included submodule is preceded by a data buffer module for buffering the video code stream of a set period.

The review code stream generating node according to claim 20, wherein when the video review code stream generating module includes the third submodule, the fourth submodule or the fifth submodule, the review code stream generating node further comprises: a video bottom-layer processing common module, connected to the video code stream delay module and to each of the included submodules, and configured to store common video processing programs.

The review code stream generating node according to claim 19, wherein the audio review code stream generating module includes at least one of the following:

An audio stream copy submodule, configured to directly copy the audio code stream into the audio review code stream;

The review code stream generating node according to claim 19, wherein the review code stream generating node further comprises: an audio bottom-layer processing common module, connected to the audio code stream delay module and to the audio review code stream generating module, and configured to store common audio processing programs.

The review code stream generating node according to claim 18 or 19, wherein the review code stream generating node further comprises: a content recording module, configured to record the media stream of the specified time period.

a video replacement source library, connected to an input end of the first switch module, wherein the first switch module switches in the video replacement source library when disconnecting the video code stream; and/or

a video protocol conversion module, connected to the output end of the first switch module and configured to perform encoding protocol conversion on the video code stream output by the first switch module.

The review code stream generating node according to claim 19, wherein the review code stream generating node further comprises:

an audio replacement source library, connected to the input end of the second switch module, wherein the second switch module switches in the audio replacement source library when disconnecting the audio code stream; and/or

The review code stream generating node according to claim 18, wherein the review code stream generating node further comprises: a log reporting module, connected to the other modules or sub-modules of the review code stream generating node and configured to generate a running log of the review code stream generating node.

29. A media stream content review method, comprising the steps of:

The review code stream generating node forwards the media stream from the streaming media source after delaying it by a set time, generates a review code stream according to the media stream from the streaming media source, and transmits the review code stream to the review center;

the review center examines the review code stream within the set time and, when the review code stream contains harmful content, sends a control signal to the review code stream generating node;
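
As an illustrative, non-limiting sketch, the timing relationship behind "within the set time" may be expressed as below: a packet that arrives at the node at time `arrival` is due for forwarding at `arrival + delay`, so the control signal is effective for that packet only if it reaches the node earlier. The function name and the use of seconds as the unit are assumptions for illustration.

```python
def is_stopped_in_time(arrival: float, delay: float, control_signal_time: float) -> bool:
    """True if the control signal reaches the node before the packet is due."""
    forward_time = arrival + delay
    return control_signal_time < forward_time
```

For example, with a 5 second delay, content arriving at t = 10 s is withheld if the control signal arrives at t = 14 s, but has already been forwarded if the signal arrives at t = 16 s.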

30. The method of claim 29, wherein the media stream comprises a video code stream and an audio code stream, and the review code stream comprises a video review code stream and an audio review code stream.

The method according to claim 30, wherein the method for generating the video review code stream comprises:

After performing scene segmentation on the video code stream, copying, for each scene, the data packet of the first frame, or the data packets of the first frame and of a certain number of frames before and after it, and then sequentially combining the copied data packets into the video review code stream;

Copying, from the video code stream, the data packet of each intra-coded frame, or the data packets of the intra-coded frame and of a certain number of frames before and after it, and then sequentially combining the copied data packets into the video review code stream; or

The video code stream is processed by reducing the frame rate to form the video review code stream.

The method according to claim 31, wherein the method further comprises: when the review center, while examining the review code stream within the set time, finds a frame that requires supplementary review, the review center notifies the review code stream generating node to additionally send the data packets of a number of frames adjacent to that frame;

The review code stream generating node extracts the corresponding adjacent-frame data packets from the delayed video code stream and sends them to the review center.
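
As an illustrative, non-limiting sketch, the extraction of adjacent-frame data packets from the delayed video code stream may be modelled as below. The function name `adjacent_packets`, the parameter `span`, and the representation of the delay buffer as a mapping from frame number to packet are assumptions for illustration only.

```python
from typing import Dict, List


def adjacent_packets(delay_buffer: Dict[int, bytes], frame_no: int, span: int) -> List[bytes]:
    """Return buffered packets of frames within `span` of `frame_no`, excluding it.

    Frames that have already left the delay buffer are silently skipped.
    """
    wanted = [n for n in range(frame_no - span, frame_no + span + 1) if n != frame_no]
    return [delay_buffer[n] for n in wanted if n in delay_buffer]
```

Because the frames are taken from the delay buffer, a supplementary request can only be served while the requested frames are still being held for delayed forwarding.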

The method according to claim 31, wherein the method for generating the video review code stream further comprises:

Reducing the spatial resolution of the video code stream to form the video review code stream; or

Re-quantizing the video code stream using an increased quantization step size to form the video review code stream.
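
As an illustrative, non-limiting sketch, the two bit-rate reduction operations above may be modelled as below. Both functions are simplifications under stated assumptions: `downsample` decimates an already decoded frame, given as rows of pixel values, without any anti-alias filtering, and `requantize` operates on integer quantization levels with integer step sizes. Neither function name appears in the claims, and no particular codec is implied.

```python
from typing import List, Sequence


def downsample(frame: Sequence[Sequence[int]], factor: int) -> List[List[int]]:
    """Keep every `factor`-th pixel in each direction (plain decimation)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [list(row[::factor]) for row in frame[::factor]]


def requantize(levels: Sequence[int], old_step: int, new_step: int) -> List[int]:
    """Map levels quantized with `old_step` onto a coarser `new_step` grid."""
    if old_step < 1 or new_step < 1:
        raise ValueError("step sizes must be positive")
    out: List[int] = []
    for level in levels:
        value = level * old_step  # reconstruct the coefficient value
        magnitude = (abs(value) + new_step // 2) // new_step  # round half up
        out.append(magnitude if value >= 0 else -magnitude)
    return out
```

A larger step size maps more coefficients to small levels or zero, which is what lowers the bit rate of the resulting review code stream at the cost of fidelity.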

34. The method of claim 30, wherein the generating method of the audio review code stream comprises:

Copying the audio code stream directly into the audio review code stream;

Selecting portions of the audio code stream at a set time interval, and generating the audio review code stream from the selected portions; or

The audio code stream is re-encoded into a low bit rate audio review code stream.
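
As an illustrative, non-limiting sketch, the interval-based selection of the audio code stream may be modelled as below. The function name `excerpt_audio` and the parameters `period` and `keep`, both counted in audio frames, are assumptions for illustration; the claims specify only that selection follows a set interval period.

```python
from typing import List, Sequence


def excerpt_audio(frames: Sequence[bytes], period: int, keep: int) -> List[bytes]:
    """From every `period` consecutive audio frames, keep the first `keep` frames."""
    if period < 1 or keep < 1:
        raise ValueError("period and keep must be positive")
    selected: List[bytes] = []
    for start in range(0, len(frames), period):
        selected.extend(frames[start:start + keep])  # combined in stream order
    return selected
```

With `keep` equal to `period` the function degenerates to a direct copy, which corresponds to the first alternative listed above.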