CN110493615A

CN110493615A - Live video monitoring method and electronic equipment

Info

Publication number: CN110493615A
Application number: CN201810462293.3A
Authority: CN
Inventors: 李振华; 张文明; 陈少杰
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Wuhan Douyu Network Technology Co Ltd
Priority date: 2018-05-15
Filing date: 2018-05-15
Publication date: 2019-11-22

Abstract

The present disclosure provides a live video monitoring method and electronic equipment. The method includes: decoding a live video to obtain live video voice data and live video image data; recognizing the live video voice data to determine whether it includes sensitive content; comparing the live video image data with pre-stored image data to determine the similarity between the live video image data and the pre-stored image data; and, when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than a threshold, monitoring the live video.

Description

Live video monitoring method and electronic equipment

Technical field

The present disclosure relates to a live video monitoring method and electronic equipment.

Background

As live streaming grows, some live content may involve pornography, violence, gambling, politically sensitive material, and the like. Unhealthy live rooms need to be closed promptly to keep the Internet environment clean, whereas a large number of live rooms, such as game streams and teaching streams, do not need real-time monitoring. The concept of key monitored rooms was therefore proposed: while monitoring video, monitoring personnel mark key rooms, selecting from the massive number of live rooms the small number that may involve pornography, violence, gambling, or politically sensitive material, and then monitor those key rooms, which improves working efficiency and cleans up live content.

However, the above method of monitoring live video has the following drawbacks:

Manual monitoring not only incurs high labor costs, but, when there are many live rooms, the monitoring precision for the live video of each room is also low. Live videos containing harmful information are therefore less likely to be discovered, and the spread of harmful online information cannot be prevented effectively.

Summary of the invention

The disclosure combines voice-based recognition with video-image-based recognition to achieve a better recognition effect. When a live video is obtained, the voice data and a screenshot of the live video are obtained separately. It is determined whether the audio data involves sensitive content, and the similarity between the screenshot and a pre-stored image is calculated. If sensitive content is involved or the two images are highly similar, the live room is marked as a key room; if no sensitive content is involved and the image similarity is low, the live room is an ordinary live room. Ordinary live rooms are checked by monitoring personnel together with machine monitoring to discover whether any key rooms exist among them; any that are found are added to the key rooms. Finally, all key rooms receive focused monitoring.

One aspect of the present disclosure provides a live video monitoring method, including: decoding a live video to obtain live video voice data and live video image data; recognizing the live video voice data to determine whether the live video voice data includes sensitive content; comparing the live video image data with pre-stored image data to determine the similarity between the live video image data and the pre-stored image data; and, when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than a threshold, monitoring the live video.

Optionally, before the live video voice data is recognized, the method includes: decomposing the live video voice data using complete ensemble empirical mode decomposition with adaptive noise to obtain at least one intrinsic mode function; processing the at least one intrinsic mode function using independent component analysis to separate out at least one voice signal component; and reconstructing the at least one voice signal component to obtain enhanced live video voice data.

Optionally, comparing the live video image data with the pre-stored image data to determine their similarity includes: calculating a texture feature similarity and a Riemannian manifold feature similarity between the live video image data and the pre-stored image data, and determining a comprehensive similarity between the live video image data and the pre-stored image data from the texture feature similarity and the Riemannian manifold feature similarity.

Optionally, comparing the live video image data with the pre-stored image data to determine their similarity includes: obtaining a first texture feature of the live video image data Q and a second texture feature of the pre-stored image data I, respectively; calculating the similarity $S_{\mathrm{texture}}$ between the first texture feature and the second texture feature from the texture feature vector of the first texture feature and the texture feature vector of the second texture feature, where n is the number of feature vectors; calculating the Riemannian manifold feature similarity $S_{\mathrm{manifold}}$ between the live video image data and the pre-stored image data; and determining the comprehensive similarity S(Q, I) between the live video image data Q and the pre-stored image data I:

$$S(Q, I) = \omega_Z S_{\mathrm{manifold}}(Q, I) + \omega_t S_{\mathrm{texture}}(Q, I)$$

where $\omega_Z$ and $\omega_t$ are two adjustable weights that satisfy $\omega_Z + \omega_t = 1$.

Optionally, when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than the threshold, first-level monitoring is performed on the live video; when the live video voice data does not include sensitive content and the similarity between the live video image data and the pre-stored image data is less than or equal to the threshold, second-level monitoring is performed on the live video.

Another aspect of the present disclosure provides electronic equipment, including: a processor; and a memory storing a computer-executable program which, when executed by the processor, causes the processor to perform: decoding a live video to obtain live video voice data and live video image data; recognizing the live video voice data to determine whether the live video voice data includes sensitive content; comparing the live video image data with pre-stored image data to determine the similarity between the live video image data and the pre-stored image data; and, when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than a threshold, monitoring the live video.

Optionally, before recognizing the live video voice data, the processor further performs: decomposing the live video voice data using complete ensemble empirical mode decomposition with adaptive noise to obtain at least one intrinsic mode function; processing the at least one intrinsic mode function using independent component analysis to separate out at least one voice signal component; and reconstructing the at least one voice signal component to obtain enhanced live video voice data.

Optionally, the processor comparing the live video image data with the pre-stored image data to determine their similarity includes: calculating a texture feature similarity and a Riemannian manifold feature similarity between the live video image data and the pre-stored image data, and determining a comprehensive similarity between the live video image data and the pre-stored image data from the texture feature similarity and the Riemannian manifold feature similarity.

Optionally, the processor comparing the live video image data with the pre-stored image data to determine their similarity includes: obtaining a first texture feature of the live video image data Q and a second texture feature of the pre-stored image data I, respectively; calculating the similarity $S_{\mathrm{texture}}$ between the first texture feature and the second texture feature; calculating the Riemannian manifold feature similarity $S_{\mathrm{manifold}}$ between the live video image data and the pre-stored image data; and determining the comprehensive similarity S(Q, I) between the live video image data Q and the pre-stored image data I:

$$S(Q, I) = \omega_Z S_{\mathrm{manifold}}(Q, I) + \omega_t S_{\mathrm{texture}}(Q, I)$$

where $\omega_Z$ and $\omega_t$ are two adjustable weights that satisfy $\omega_Z + \omega_t = 1$.

Optionally, the processor performs: when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than the threshold, performing first-level monitoring on the live video; when the live video voice data does not include sensitive content and the similarity between the live video image data and the pre-stored image data is less than or equal to the threshold, performing second-level monitoring on the live video.

Brief description of the drawings

For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:

Fig. 1 schematically illustrates a flowchart of a live video monitoring method according to an embodiment of the present disclosure.

Fig. 2 schematically illustrates a block diagram of a live video monitoring system according to an embodiment of the present disclosure.

Fig. 3 schematically illustrates a block diagram of electronic equipment according to the present disclosure.

Detailed description

Other aspects, advantages, and salient features of the present disclosure will become apparent to those skilled in the art from the following detailed description of exemplary embodiments of the disclosure, taken in conjunction with the accompanying drawings.

In the present disclosure, the terms "include" and "contain" and their derivatives mean inclusion without limitation; the term "or" is inclusive, meaning and/or.

In this specification, the various embodiments described below to explain the principles of the disclosure are illustrative only and should not be construed in any way as limiting the scope of the disclosure. The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of the exemplary embodiments of the disclosure as defined by the claims and their equivalents. The description includes various specific details to assist in that understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the disclosure. In addition, descriptions of well-known functions and structures are omitted for clarity and conciseness. Throughout the drawings, the same reference numerals are used for similar functions and operations.

As shown in Fig. 1, the method specifically includes the following operations:

S1: Obtain the live video.

In this step, the video being streamed can be obtained in real time. For example, on the many live streaming platforms, an administrator can continuously capture the video stream of each live room from a back-end cache. When traffic is heavy or buffer space is insufficient, the live video of a live room can instead be extracted according to a time threshold. The acquisition intensity can also be varied according to the number of viewers. The above is an illustrative video acquisition method of the disclosure, and the disclosure is not limited to it.

S2: Decode the live video to obtain live video voice data.

In this step, the audio data in the live video is separated out by decoding. The resulting live video voice data is the audio data of the live stream, for example the streamer's voice or the background sound during the stream.

S3: Decode the live video to obtain live video image data.

In this step, the image data in the live video is separated out by decoding. The resulting live video image data is the image data of the live stream, for example a portrait of the streamer or a scene image in a game stream.

S4: Recognize the live video voice data to determine whether the live video voice data includes sensitive content.

In this step, the live video voice data is first decomposed using complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to obtain at least one intrinsic mode function, which effectively reduces computational cost and overcomes the mode mixing problem. Then, the at least one intrinsic mode function is processed using independent component analysis (ICA) to separate out at least one effective voice signal component. Finally, the at least one voice signal component is reconstructed to obtain enhanced live video voice data.
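The three-stage enhancement in S4 (decompose, separate, reconstruct) can be sketched as a small pipeline. This is only an illustration of the data flow under stated assumptions: the function name `enhance_speech` and its parameters are hypothetical, the decomposition and separation stages are passed in as callables rather than implemented (a real system would plug in a CEEMDAN routine and an ICA routine), and reconstruction is assumed here to be a sample-wise sum of the selected components, which the source does not specify.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def enhance_speech(
    signal: Sequence[float],
    decompose: Callable[[Sequence[float]], List[Signal]],
    separate: Callable[[List[Signal]], List[Signal]],
    is_voice: Callable[[Signal], bool],
) -> Signal:
    """Hypothetical sketch of the S4 enhancement flow.

    decompose: stand-in for CEEMDAN, returns intrinsic mode functions.
    separate:  stand-in for ICA, returns separated signal components.
    is_voice:  assumed selector for "effective" voice components.
    """
    # Stage 1: decompose the voice data into intrinsic mode functions.
    imfs = decompose(signal)
    # Stage 2: separate the IMFs into candidate signal components.
    components = separate(imfs)
    # Keep only the components judged to carry voice.
    voice = [c for c in components if is_voice(c)]
    if not voice:
        return [0.0] * len(signal)
    # Stage 3: reconstruct (assumption: sample-wise sum of kept components).
    return [sum(samples) for samples in zip(*voice)]
```

Passing the stages in as callables keeps the sketch independent of any particular signal-processing library, so each stage can be replaced or tested on its own.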

S5: Compare the live video image data with pre-stored image data to determine the similarity between the live video image data and the pre-stored image data.

In this step, the pre-stored image data are images containing violating content that are stored on a server in advance. The separated live video image data is compared with the pre-stored image data to determine whether the live video image data includes violating content.

Specifically, a first texture feature of the live video image data Q and a second texture feature of the pre-stored image data I are obtained, respectively. Because texture is a narrow-band signal and different textures generally have different center frequencies and bandwidths, a wavelet transform is used to extract the texture features of the two images to be compared: the input texture image I(x, y) is convolved with shearlet wavelet filters to obtain the image texture features;

The similarity $S_{\mathrm{texture}}$ between the first texture feature and the second texture feature is calculated from the texture feature vector of the first texture feature and the texture feature vector of the second texture feature, where n is the number of feature vectors;

The Riemannian manifold feature similarity $S_{\mathrm{manifold}}$ between the live video image data and the pre-stored image data is calculated. In this step, calculating the Riemannian manifold feature similarity of two images essentially compares the neighborhoods of the pixel at the same position (x, y), where $(x, y) \in R^2$ and $R^2$ is the image domain. The embodiment of the disclosure uses the Euclidean distance to compare whether the neighborhoods of the point (x, y) are similar, where $g_t$ is a given window function that is a Gaussian function of the variable t, and h is a coordinate value defined on $R^2$;

The image similarity comparison on the Riemannian manifold defined above is used to calculate the similarity between images. Evaluating it requires solving a degenerate PDE in the variables x and y, in which D is the Euclidean distance, $\Delta_x D$ is the derivative of D in the x direction, $\Delta_y D$ is the derivative of D in the y direction, Tr is the sum of squares of the matrix diagonal, and $D_{xy}$ denotes the partial derivative of D with respect to x and y;

The comprehensive similarity S(Q, I) between the live video image data Q and the pre-stored image data I is determined:

$$S(Q, I) = \omega_Z S_{\mathrm{manifold}}(Q, I) + \omega_t S_{\mathrm{texture}}(Q, I)$$

where $\omega_Z$ and $\omega_t$ are two adjustable weights that satisfy $\omega_Z + \omega_t = 1$.

In addition, in this step, a smaller value of S(Q, I) indicates a higher similarity. The weights $\omega_Z$ and $\omega_t$ are chosen according to the contribution of the corresponding components. The weight coefficients in the distance formula are adjusted based on feedback information to optimize the matching result, which gives higher accuracy than using a single image similarity measure alone.
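The weighted combination in S5 can be illustrated with a short sketch. The function names `combined_similarity` and `most_similar` and the default weight of 0.5 are assumptions made for illustration; only the weighted sum and the constraint that the two weights add up to 1 come from the text. Because the text states that a smaller S(Q, I) means a higher similarity, the helper picks the pre-stored image with the smallest value.

```python
from typing import Sequence


def combined_similarity(
    s_manifold: float, s_texture: float, w_manifold: float = 0.5
) -> float:
    """S(Q, I) = w_Z * S_manifold + w_t * S_texture, with w_Z + w_t = 1.

    The default weight is an illustrative assumption, not a value from the source.
    """
    if not 0.0 <= w_manifold <= 1.0:
        raise ValueError("w_manifold must lie in [0, 1]")
    w_texture = 1.0 - w_manifold  # enforces w_Z + w_t = 1
    return w_manifold * s_manifold + w_texture * s_texture


def most_similar(scores: Sequence[float]) -> int:
    """Index of the best match, given that a smaller S means more similar."""
    if not scores:
        raise ValueError("scores must not be empty")
    return min(range(len(scores)), key=lambda i: scores[i])
```

Deriving the texture weight from the manifold weight means the sum-to-one constraint cannot be violated when the weights are tuned from feedback.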

S6: Determine whether the live video voice data includes sensitive content and/or whether the similarity between the live video image data and the pre-stored image data is greater than a threshold.

In this step, if the recognized live video voice data includes sensitive content such as pornography, violence, gambling, or politically sensitive material, the live video is deemed a suspected violation; and/or, if the similarity between the live video image data and the pre-stored image data is greater than the threshold, the live video is deemed a suspected violation.

S7: If the determination in S6 holds, classify the live room of the live video as a key room. This step essentially classifies the live rooms of the whole live streaming platform into rooms with a higher likelihood of violation and rooms with a lower likelihood of violation. Specifically, if the determination in S6 holds, the live room of the live video is determined to be a room with a higher likelihood of violation; otherwise, it is determined to be a room with a lower likelihood of violation. The flow then proceeds to S8.

S8: Perform first-level monitoring on the live video. In this step, first-level monitoring is performed on the rooms with a higher likelihood of violation determined in S7. Specifically, monitoring personnel can manually review the live video of the room to determine whether a violation exists.

S9: If the determination in S6 does not hold, that is, the live video voice data does not include sensitive content and the similarity between the live video image data and the pre-stored image data is less than or equal to the threshold, the live room of the live video is considered a room with a lower likelihood of violation and is classified as a normal room. The flow then proceeds to S10.

S10: Perform second-level monitoring on the rooms with a lower likelihood of violation determined in S9. Specifically, the system can randomly sample these normal rooms, and monitoring personnel monitor the sampled rooms.
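The branching in S6 to S10 can be sketched as follows. All names here (`RoomType`, `MonitoringLevel`, `classify_room`, `pick_rooms_for_spot_check`) are hypothetical. The sketch also assumes that the image score passed in is a higher-is-more-similar value, as S6 is worded; a distance-like S(Q, I), where smaller means more similar, would need to be converted before this comparison.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class RoomType(Enum):
    KEY = "key room"        # S7: higher likelihood of violation
    NORMAL = "normal room"  # S9: lower likelihood of violation


class MonitoringLevel(Enum):
    FIRST = 1   # S8: manual review by monitoring personnel
    SECOND = 2  # S10: random spot checks


@dataclass(frozen=True)
class Decision:
    room_type: RoomType
    level: MonitoringLevel


def classify_room(
    has_sensitive_audio: bool, image_similarity: float, threshold: float
) -> Decision:
    """S6: the check holds if either condition is met (and/or)."""
    if has_sensitive_audio or image_similarity > threshold:
        return Decision(RoomType.KEY, MonitoringLevel.FIRST)
    return Decision(RoomType.NORMAL, MonitoringLevel.SECOND)


def pick_rooms_for_spot_check(
    normal_rooms: Sequence[str], count: int, rng: random.Random
) -> List[str]:
    """S10: randomly sample normal rooms for second-level monitoring."""
    return rng.sample(list(normal_rooms), min(count, len(normal_rooms)))
```

A similarity exactly equal to the threshold falls into the normal branch, matching the "less than or equal to" wording of S9.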

As shown in Fig. 2, the live video monitoring system 200 includes a video decoding module 210, a speech recognition module 220, an image recognition module 230, and a video monitoring module 240. The system 200 can perform the method described above with reference to Fig. 1 to implement live video monitoring.

Specifically, during recognition, the video decoding module 210 decodes the live video to obtain live video voice data and live video image data; the speech recognition module 220 recognizes the live video voice data to determine whether the live video voice data includes sensitive content; the image recognition module 230 compares the live video image data with the pre-stored image data to determine the similarity between the live video image data and the pre-stored image data; and the video monitoring module 240 monitors the live video when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than the threshold.
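How the four modules of system 200 hand data to one another can be sketched with a small class. The class name `LiveVideoMonitoringSystem`, its field names, and the use of plain callables for modules 210 to 230 are assumptions for illustration; the source describes the modules only at block-diagram level.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class LiveVideoMonitoringSystem:
    # Module 210 (hypothetical interface): live video -> (voice data, image data).
    decode: Callable[[bytes], Tuple[Any, Any]]
    # Module 220: voice data -> True if sensitive content is found.
    has_sensitive_content: Callable[[Any], bool]
    # Module 230: image data -> similarity to the pre-stored image data.
    image_similarity: Callable[[Any], float]
    # Threshold used by module 240.
    threshold: float

    def should_monitor(self, live_video: bytes) -> bool:
        """Module 240: decide whether the live video is to be monitored."""
        voice, image = self.decode(live_video)
        sensitive = self.has_sensitive_content(voice)
        similar = self.image_similarity(image) > self.threshold
        return sensitive or similar
```

Keeping each module behind its own callable mirrors the block diagram: the decoder, recognizer, and comparer can be swapped without touching the monitoring decision.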

As shown in Fig. 3, the electronic equipment 300 includes a processor 310 and a computer-readable storage medium 320. The electronic equipment 300 can perform the method described above with reference to Fig. 1 to implement live video monitoring.

Specifically, the processor 310 may include, for example, a general-purpose microprocessor, an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)). The processor 310 may also include on-board memory for caching purposes. The processor 310 may be a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiment of the disclosure described with reference to Fig. 1.

The computer-readable storage medium 320 may be, for example, any medium that can contain, store, communicate, propagate, or transport instructions. For example, the readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: a magnetic storage device, such as a magnetic tape or hard disk (HDD); an optical storage device, such as a compact disc (CD-ROM); a memory, such as random access memory (RAM) or flash memory; and/or a wired/wireless communication link.

The computer-readable storage medium 320 may include a computer program 321. The computer program 321 may include code/computer-executable instructions that, when executed by the processor 310, cause the processor 310 to perform, for example, the method flow described above in conjunction with Fig. 1 and any variations thereof.

The computer program 321 may be configured to have computer program code including, for example, computer program modules. For example, in an exemplary embodiment, the code in the computer program 321 may include one or more program modules, for example module 321A, module 321B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art can use suitable program modules or combinations of program modules according to the actual situation. When these combinations of program modules are executed by the processor 310, the processor 310 performs, for example, the method flow described above in conjunction with Fig. 1 and any variations thereof.

Although the disclosure has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made to the disclosure without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Therefore, the scope of the disclosure should not be limited to the above embodiments, but should be determined not only by the appended claims but also by their equivalents.

Claims

1. A live video monitoring method, comprising:

decoding a live video to obtain live video voice data and live video image data;

recognizing the live video voice data to determine whether the live video voice data includes sensitive content;

comparing the live video image data with pre-stored image data to determine the similarity between the live video image data and the pre-stored image data;

when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than a threshold, monitoring the live video.

2. The live video monitoring method according to claim 1, comprising, before the live video voice data is recognized:

decomposing the live video voice data using complete ensemble empirical mode decomposition with adaptive noise to obtain at least one intrinsic mode function;

processing the at least one intrinsic mode function using independent component analysis to separate out at least one voice signal component;

reconstructing the at least one voice signal component to obtain enhanced live video voice data.

3. The live video monitoring method according to claim 1, wherein comparing the live video image data with the pre-stored image data to determine the similarity between the live video image data and the pre-stored image data comprises:

calculating a texture feature similarity and a Riemannian manifold feature similarity between the live video image data and the pre-stored image data, and determining a comprehensive similarity between the live video image data and the pre-stored image data from the texture feature similarity and the Riemannian manifold feature similarity.

4. The live video monitoring method according to claim 3, wherein comparing the live video image data with the pre-stored image data to determine the similarity between the live video image data and the pre-stored image data comprises:

obtaining a first texture feature of the live video image data Q and a second texture feature of the pre-stored image data I, respectively;

calculating the similarity $S_{\mathrm{texture}}$ between the first texture feature and the second texture feature from the texture feature vector of the first texture feature and the texture feature vector of the second texture feature, where n is the number of feature vectors;

calculating the Riemannian manifold feature similarity $S_{\mathrm{manifold}}$ between the live video image data and the pre-stored image data;

determining the comprehensive similarity S(Q, I) between the live video image data Q and the pre-stored image data I:

$$S(Q, I) = \omega_Z S_{\mathrm{manifold}}(Q, I) + \omega_t S_{\mathrm{texture}}(Q, I)$$

where $\omega_Z$ and $\omega_t$ are two adjustable weights that satisfy $\omega_Z + \omega_t = 1$.

5. The live video monitoring method according to claim 1, wherein, when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than the threshold, first-level monitoring is performed on the live video;

when the live video voice data does not include sensitive content and the similarity between the live video image data and the pre-stored image data is less than or equal to the threshold, second-level monitoring is performed on the live video.

6. Electronic equipment, comprising:

a processor;

a memory storing a computer-executable program which, when executed by the processor, causes the processor to perform:

7. The electronic equipment according to claim 6, wherein, before recognizing the live video voice data, the processor further performs:

8. The electronic equipment according to claim 6, wherein the processor comparing the live video image data with the pre-stored image data to determine the similarity between the live video image data and the pre-stored image data comprises:

9. The electronic equipment according to claim 8, wherein the processor comparing the live video image data with the pre-stored image data to determine the similarity between the live video image data and the pre-stored image data comprises:

$$S(Q, I) = \omega_Z S_{\mathrm{manifold}}(Q, I) + \omega_t S_{\mathrm{texture}}(Q, I)$$

where $\omega_Z$ and $\omega_t$ are two adjustable weights that satisfy $\omega_Z + \omega_t = 1$.

10. The electronic equipment according to claim 6, wherein the processor performs:

when the live video voice data includes sensitive content and/or the similarity between the live video image data and the pre-stored image data is greater than the threshold, performing first-level monitoring on the live video;