CN111866605B - Video auditing method and server

Info

Publication number: CN111866605B
Application number: CN202010658724.0A
Authority: CN
Inventors: 范鑫; 钟湧睿
Original assignee: Beijing Cheerbright Technologies Co Ltd
Current assignee: Beijing Cheerbright Technologies Co Ltd
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2022-10-18
Anticipated expiration: 2040-07-09
Also published as: CN111866605A

Abstract

The invention discloses a video auditing method executed in a server, comprising the following steps: segmenting a video file to be audited into video subfiles of a plurality of time periods, wherein the video subfiles have a priority order; extracting an audio subfile from the video subfile with the first priority, calculating the violation probability of the text information corresponding to the audio subfile, and cutting frames of the video subfile of the same period using a frame cutting interval corresponding to that violation probability to obtain a picture subfile comprising a plurality of picture frames; determining the violation probability of the picture subfile by calculating the violation probability of each picture frame; combining the two violation probabilities to determine the violation probability of the first-priority video subfile, which is the middle period; and if that violation probability is greater than or equal to a first threshold, determining that the video file violates rules, otherwise continuing to determine the violation probability of the video subfiles with other priorities in priority order to determine whether the video file violates rules. The invention also discloses a server for executing the method.

Description

Video auditing method and server

Technical Field

The invention relates to the technical field of videos, in particular to a video auditing method and a server.

Background

Video is a commonly used form of presenting information content and occupies an increasingly important position. Under the relevant laws and regulations, the content of a video must be reviewed before the video is distributed to the public. Videos found to be non-compliant or illegal, including types such as spam, political involvement, violence, terrorism, abuse, pornography, and contraband, are prohibited from distribution so as to avoid adverse social influence.

Therefore, video auditing is an indispensable step of information auditing. At present, videos are either reviewed entirely by hand, or reviewed with the assistance of a machine learning algorithm: the algorithm first decomposes a video frame by frame and judges each frame for violations, and videos with a high violation probability are then handed to staff for review. These methods are time consuming, labor intensive, and of limited efficiency. Therefore, a more accurate and efficient video auditing method is needed.

Disclosure of Invention

In view of the above problems, the present invention proposes a video auditing method and server in an attempt to solve, or at least alleviate, the problems presented above.

According to an aspect of the present invention, there is provided a video auditing method adapted to be executed in a server, the method comprising the steps of: acquiring a video file to be audited, and dividing the video file into video subfiles in a plurality of time periods, wherein the video subfiles have priority sequences, and the priority is decreased from the middle time period to two ends; extracting an audio subfile from the video subfile with the first priority, and converting the audio subfile into text information; calculating the violation probability of the text information, determining a frame cutting interval corresponding to the violation probability, and cutting frames of the video subfiles of the same segment by adopting the frame cutting interval to obtain a picture subfile comprising a plurality of picture frames; determining the violation probability of the picture subfiles by calculating the violation probability of each picture frame; determining the violation probability of the video subfile with the first priority by integrating the text information and the violation probability of the picture subfile; and if the violation probability is larger than or equal to the first threshold, determining that the video file violates rules, otherwise, continuously determining the violation probability of the video subfiles with other priorities according to the priority order to determine whether the video file violates rules.

Optionally, in the video auditing method according to the present invention, the step of splitting the video file into video subfiles of a plurality of time periods includes: dividing the video file into video subfiles of three periods, the priority order of the three periods being: a second period, a third period, and a first period; or dividing the video file into video subfiles of five periods, the priority order of the five periods being: a third period, a fourth period, a second period, a fifth period, and a first period.

Optionally, in the video auditing method according to the present invention, the step of converting the audio subfile into text information includes: removing the environmental sound and the background sound from the audio subfile, extracting the human voice, and converting the voice into corresponding text information by performing speech recognition on it.

Optionally, in the video auditing method according to the present invention, a violation text library is stored in the server, the violation text library includes multiple violation texts, and the step of calculating the violation probability of the text information includes: segmenting the text information into a plurality of single sentences, calculating the matching degree of each single sentence with each of the violation texts, and determining the violation probability of the text information based on the matching degrees of the single sentences.

Optionally, in the video auditing method according to the present invention, the relationship between the violation probability of the text information and the frame cutting interval includes any one of the following modes: the frame cutting interval is inversely related to the violation probability of the text information; if the violation probability of the text information is smaller than a second threshold, a first frame cutting interval is adopted, otherwise a second frame cutting interval is adopted; if the violation probability of the text information is smaller than a second threshold or larger than a third threshold, a first frame cutting interval is adopted, otherwise a second frame cutting interval is adopted; wherein the first frame cutting interval is greater than the second frame cutting interval.

Optionally, in the video auditing method according to the present invention, the method further includes the steps of: and acquiring a time period of the single sentence with high matching degree in the audio subfile, marking the time period as a first key time period, and reducing the picture frame cutting interval of the first key time period.

Optionally, in the video auditing method according to the present invention, after obtaining a picture subfile including a plurality of picture frames, the method further includes the steps of: and calculating the similarity of two adjacent picture frames in the picture subfile, and if the similarity is greater than or equal to a fourth threshold value, removing one picture frame from the picture subfile.

Optionally, in the video auditing method according to the present invention, a sample picture library is stored in the server, the sample picture library contains multiple violation sample pictures, and the step of determining the violation probability of a picture subfile by calculating the violation probability of each picture frame includes: the method comprises the steps of respectively calculating the matching degrees of a picture frame and a plurality of sample pictures, determining the violation probability of the picture frame based on the matching degrees, and determining the violation probability of a picture subfile by integrating the violation probability of each picture frame.

Optionally, in the video auditing method according to the present invention, the multiple sample pictures belong to multiple violation categories, and the method further includes the steps of: for the picture frame with high violation probability, the violation category of the picture frame is determined by counting the matching degree of the picture frame and a plurality of sample pictures of the same class.

Optionally, in the video auditing method according to the present invention, the method further includes the steps of: acquiring a time period of the picture frame with high violation probability in the picture subfile, and marking the time period as a second key time period; and determining the key time period of the video subfile by integrating the first key time period and the second key time period so as to record information.

Optionally, in the video auditing method according to the present invention, the step of continuously determining the violation probability of video subfiles of other priorities in the order of priority comprises: and if the violation probability of the video subfiles of a certain priority is determined to be larger than or equal to a first threshold according to the priority sequence, determining that the video files violate rules, and otherwise, continuing to determine the violation probability of the video subfiles of the next priority.

According to another aspect of the present invention, there is provided a server comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs when executed by the processors implement the steps of the video review method as described above.

According to a further aspect of the invention there is provided a readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a server, implement the steps of a video review method as described above.

According to the technical scheme of the invention, video information is audited automatically. A video file is divided into a plurality of video subfiles, and the video of the middle period is processed first, considering that suspected violating content is generally less likely to appear in the front and rear sections of a video. In addition, considering that text recognition is more efficient than picture recognition, the method first performs speech recognition on the audio of the middle period of the video, converts it into text, and determines the violation probability of the text. If the text violation probability is high, the frame cutting interval for the pictures of the same period can be made appropriately smaller; otherwise it can be made appropriately larger, which effectively reduces the amount of picture processing and improves processing efficiency.

Furthermore, for the pictures obtained by frame cutting, the method can also calculate the similarity of adjacent pictures; if the similarity is high, one of the pictures can be removed directly, further reducing the subsequent image matching workload. In addition, the invention can record the period of a single sentence with high violation probability, and the frame cutting interval within that period can be appropriately reduced. The invention can also record both the periods of single sentences with high violation probability and the periods of picture frames with high violation probability, together with the video information of those periods, to facilitate subsequent checks.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.

FIG. 1 shows a block diagram of a server 100 according to one embodiment of the invention;

FIG. 2 shows a flow diagram of a video review method 200 according to one embodiment of the invention;

FIG. 3 shows a flow diagram of a video review method according to another embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 is a block diagram of a server 100 according to one embodiment of the invention. In a basic configuration 102, a server 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.

Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.

Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. The program data 124 comprises instructions, and in the server 100 according to the invention the program data 124 comprises instructions for performing the video review method 200.

Server 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.

A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.

Server 100 may be implemented as a server, such as a file server, a database server, an application server, a WEB server, etc., or as part of a small-sized portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless WEB-browsing device, a personal headset, an application-specific device, or a hybrid device that includes any of the above functions. The server 100 may also be implemented as a personal computer including desktop and notebook computer configurations. In some embodiments, the server 100 is configured to perform the video review method 200.

According to an embodiment of the invention, a violation text library and a sample picture library may also be stored in the server 100. The violation text library includes a plurality of violation texts, and the sample picture library includes a plurality of sample pictures of violations. The multiple illegal texts belong to different illegal categories, and the multiple sample pictures also belong to different illegal categories. Violation categories include, but are not limited to, spam, political involvement, violence, terrorism, abuse, pornography, contraband, and other types.

Fig. 2 shows a flow diagram of a video review method 200 according to an embodiment of the invention. The method 200 is performed in a server, such as the server 100. As shown in fig. 2, the method begins at step S210.

In step S210, a video file to be audited is obtained, and the video file is divided into video subfiles of multiple time periods, where the video subfiles have priority orders, and the priority orders decrease from the middle time period to the two ends.

Generally, the segmentation may be performed according to the duration of the video file, and a person skilled in the art may choose the number of segments as needed, which is not limited by the present invention. In one implementation, a video file may be divided by duration into a plurality of equal-length video subfiles, or sliced at random points in time. In another implementation, an audio file corresponding to the video file may be extracted, and the slicing duration of each video subfile may be determined based on the pauses in sound in the audio file. In this way a video subfile typically contains complete sentences, which avoids splitting a sentence or passage across different video subfiles.
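The equal-duration implementation can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name split_equal and the use of seconds as the time unit are assumptions made for the example.

```python
def split_equal(duration_s, n_parts):
    """Return n_parts contiguous (start, end) ranges covering [0, duration_s].

    Hypothetical helper: times are in seconds, and the last range is
    pinned to duration_s so rounding never leaves a gap at the end.
    """
    if n_parts < 1 or duration_s <= 0:
        raise ValueError("duration_s must be positive and n_parts >= 1")
    step = duration_s / n_parts
    ranges = []
    for i in range(n_parts):
        start = i * step
        end = duration_s if i == n_parts - 1 else (i + 1) * step
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 90-second video split into three periods of 30 seconds each.
    print(split_equal(90, 3))
```

Splitting at pauses in the audio, as in the second implementation, would replace the fixed step with boundaries taken from detected silences.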

Furthermore, considering that suspected violating content is generally less likely to be located in the front and rear sections of a video, the middle period of the video file can be divided into a larger number of video subfiles, that is, the segmentation duration in the middle period is shortened so that the middle period is examined with emphasis. Correspondingly, the segmentation duration can be appropriately lengthened at the front or rear section of the video, improving the overall efficiency of video auditing.

It should be understood that the closer a video subfile is to the middle of the video, the higher its priority; the closer it is to either end of the video, the lower its priority. If two subfiles are equally distant from the middle of the video, the later subfile has higher priority than the earlier one. Furthermore, the invention can assign a file code to each video subfile, numbering the subfiles sequentially from front to back in playback order. The closer a file code is to the median of all file codes, the higher the priority; and if an earlier subfile and a later subfile have codes equally distant from the median, the later subfile has the higher priority.

According to one embodiment, a video file is divided into video subfiles of three periods, in the following order of priority: a second period, a third period, and a first period. That is, the middle period is reviewed first, the period after the middle is reviewed next, and the period before the middle is reviewed last.

According to another embodiment, the video file is divided into video subfiles of five periods, the priority order of the five periods being: a third period, a fourth period, a second period, a fifth period, and a first period. That is, the video file may be split into an odd number of video subfiles, with the middlemost video subfile having the highest priority.
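The ordering rule described above (closest to the median file code first, with ties going to the later subfile) can be sketched as a single sort. The function name priority_order is hypothetical, and file codes are assumed to be the integers 1 to n in playback order.

```python
def priority_order(n_parts):
    """Return 1-based file codes sorted from highest to lowest priority."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    median = (n_parts + 1) / 2
    codes = range(1, n_parts + 1)
    # Primary key: distance from the median code (smaller is better).
    # Secondary key: negative code, so the later subfile wins a tie.
    return sorted(codes, key=lambda code: (abs(code - median), -code))


if __name__ == "__main__":
    print(priority_order(3))  # [2, 3, 1]
    print(priority_order(5))  # [3, 4, 2, 5, 1]
```

The same key also gives a defined order for an even number of subfiles, although the embodiments above only describe odd counts.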

Subsequently, in step S220, an audio subfile is extracted from the video subfile of the first priority, and the audio subfile is converted into text information.

Here, if the complete audio file of the video file was not extracted in step S210, the audio subfile of the first-priority video subfile is extracted separately in step S220. If the complete audio file of the video file was already extracted in step S210, the audio subfile corresponding to the first-priority period is cut from that complete audio file in step S220.

According to one embodiment of the present invention, the step of converting the audio subfile into text information comprises: and removing the environmental sound and the background sound in the audio subfile, extracting the voice, and converting the voice into corresponding text information after voice recognition of the voice. The elimination of the noise can reduce the data volume of the voice file to be processed; meanwhile, the speed of subsequent natural language processing can be improved, and the voice audio information can be quickly converted into text information.

It should be noted that the ambient sound and background sound elimination, the speech recognition and the text conversion are well-established technologies in the art, and those skilled in the art can reasonably select the implementation method according to the needs, which is not limited by the present invention. For example, a voice recognition engine performs natural language processing on the separated human voice data to convert a voice file into text information.

Subsequently, in step S230, the violation probability of the text information is calculated, the frame cutting interval corresponding to the violation probability is determined, and the frame cutting interval is adopted to cut the video subfiles of the same segment, so as to obtain a picture subfile including a plurality of picture frames.

According to one embodiment, the step of calculating the violation probability of the text information comprises: segmenting the text information into a plurality of single sentences; and, for each single sentence, calculating its matching degree with each of a plurality of violation texts, and determining the violation probability of the text information based on the matching degrees of the single sentences. A text auditing engine can also be used to audit the converted text information, mark the non-compliant content, and return an auditing result.

Single sentences can be segmented at punctuation marks. The matching degree between a single sentence and a violation text can be computed with any existing sentence matching algorithm, such as a longest common subsequence algorithm, which is not limited by the invention.

The matching degree of a single sentence with a violation text can be used as the violation probability of the single sentence with respect to that violation text, and the violation probability of the single sentence can be obtained by combining its violation probabilities with respect to the plurality of violation texts. For example, the violation probabilities of a single sentence with respect to the violation texts are sorted in descending order, and either the maximum value or the average of the top N values is used as the violation probability of the single sentence.

Once the violation probability of each single sentence is known, the violation probability of the text information can be obtained by combining the violation probabilities of all the single sentences. Likewise, the violation probabilities of the single sentences of the text information can be sorted in descending order, and either the maximum value or the average of the top N values is used as the violation probability of the text information.
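The text scoring steps above can be sketched as follows. This is one possible reading under stated assumptions: the matching degree is taken to be the longest common subsequence length divided by the length of the violation text, which the source does not specify, and all function names and the punctuation set are hypothetical.

```python
import re


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def matching_degree(sentence, violation_text):
    """Assumed definition: LCS length normalised by the violation text length."""
    if not violation_text:
        return 0.0
    return lcs_length(sentence, violation_text) / len(violation_text)


def top_n_mean(scores, n=1):
    """Mean of the n largest scores; n=1 is the maximum. Empty input gives 0."""
    if not scores:
        return 0.0
    top = sorted(scores, reverse=True)[:max(1, n)]
    return sum(top) / len(top)


def text_violation_probability(text, violation_texts, n=1):
    # Split into single sentences at common Western and Chinese punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?;。！？；]+", text) if s.strip()]
    sentence_scores = [
        top_n_mean([matching_degree(s, v) for v in violation_texts], n)
        for s in sentences
    ]
    return top_n_mean(sentence_scores, n)


if __name__ == "__main__":
    print(text_violation_probability("hello there. buy pills now!", ["buy pills"]))
```

Using the maximum makes a single strongly matching sentence decisive, while a top-N average is less sensitive to one accidental match.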

In a first implementation manner, the frame cutting interval of the video subfile is inversely related to the violation probability of the text information, that is, the higher the violation probability of the text information in the same period is, the smaller the corresponding frame cutting interval is. At this time, the frame cutting interval is a value that dynamically changes according to the violation probability.

In a second implementation, if the violation probability of the text information is smaller than a second threshold, a first frame cutting interval is adopted; otherwise, if it is greater than or equal to the second threshold, a second frame cutting interval is adopted, wherein the first frame cutting interval is longer than the second frame cutting interval. In this case the frame cutting interval takes one of two fixed values, selected according to which of the two probability ranges the violation probability falls in. The second threshold may take a value in [40%, 50%), for example 40% or 49%, although it is not limited thereto.

In a third implementation, if the violation probability of the text information is smaller than the second threshold or larger than the third threshold, the first frame cutting interval is adopted; otherwise, if the violation probability of the text information lies in the interval [second threshold, third threshold], the second frame cutting interval is adopted. The second threshold may take a value in [40%, 50%), for example 40% or 49%, although it is not limited thereto. The third threshold may take a value in [80%, 95%], for example 80% or 95%, although it is not limited thereto.

Here a finer-grained division is adopted, mainly considering that when the text violation probability is low (e.g., 30%), the violation probability of the picture frames is also likely to be low, so frames can be cut coarsely. When the text violation probability is high (e.g., 92%), the violation probability of the picture frames is also likely to be high, so picture frames captured at random are likely to be violating. When the text violation probability is neither high nor low, the text is only suspected of violation, and the picture frames need to be audited with emphasis to determine whether the video subfile really violates rules.
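The three ways of choosing the frame cutting interval can be sketched as below. The threshold defaults are taken from the example ranges in the text; the interval values in seconds, the linear form of the inverse relation, and the function names are assumptions made only for illustration.

```python
def interval_inverse(p, min_interval=1.0, max_interval=5.0):
    """First mode: interval falls as the text violation probability rises.

    A linear mapping is assumed here; the source only requires an
    inverse relation.
    """
    p = min(max(p, 0.0), 1.0)
    return max_interval - p * (max_interval - min_interval)


def interval_two_level(p, second_threshold=0.4,
                       first_interval=5.0, second_interval=1.0):
    """Second mode: coarse interval below the second threshold, fine otherwise."""
    return first_interval if p < second_threshold else second_interval


def interval_three_band(p, second_threshold=0.4, third_threshold=0.8,
                        first_interval=5.0, second_interval=1.0):
    """Third mode: fine interval only in the uncertain middle band."""
    if p < second_threshold or p > third_threshold:
        return first_interval
    return second_interval


if __name__ == "__main__":
    for p in (0.30, 0.60, 0.92):
        print(p, interval_two_level(p), interval_three_band(p))
```

With these defaults, a text probability of 92% yields the fine interval in the second mode but the coarse interval in the third mode, which reflects the reasoning that a clearly violating text needs fewer frames to confirm.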

According to an embodiment of the invention, a time period of the single sentence with high matching degree in the audio subfile can be further acquired, the time period is marked as a first key time period, and the picture frame cutting interval of the first key time period is narrowed. If the violation probability of a small section of text information is high, the picture frame corresponding to the section of text information needs to be checked in a focused manner. Therefore, the frame cutting interval of the first key time interval can be properly reduced on the basis of the original frame cutting interval, so that more picture frames in the interval can be obtained, and the accuracy of video review is improved.
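A minimal sketch of sampling with a reduced interval inside the first key periods is given below. The function name frame_timestamps, the default of halving the interval, and the rule of never stepping over the start of a key period are assumptions; the source only states that the interval is reduced within the key period.

```python
def frame_timestamps(start, end, interval, key_periods=(), key_interval=None):
    """Return the timestamps at which frames are cut in [start, end).

    key_periods is a sequence of (start, end) ranges that are sampled
    with key_interval instead of interval.
    """
    if key_interval is None:
        key_interval = interval / 2  # assumed default reduction
    if interval <= 0 or key_interval <= 0:
        raise ValueError("intervals must be positive")
    stamps = []
    t = start
    while t < end:
        stamps.append(t)
        in_key = any(ks <= t < ke for ks, ke in key_periods)
        nxt = t + (key_interval if in_key else interval)
        # Do not step over the beginning of a key period.
        for ks, _ in key_periods:
            if t < ks < nxt:
                nxt = ks
        t = nxt
    return stamps


if __name__ == "__main__":
    # Base interval 4 s, but 1 s inside the key period from 5 s to 7 s.
    print(frame_timestamps(0, 10, 4, [(5, 7)], 1))
```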

According to another embodiment of the present invention, after the picture subfile is obtained by frame cutting, the similarity between two adjacent picture frames in the picture subfile may be further calculated, and if the similarity is greater than or equal to a fourth threshold, one of the two picture frames is removed from the picture subfile. In this way, similar picture frames only need to be judged once, which reduces the picture auditing workload and improves auditing efficiency. The fourth threshold may take a value in [80%, 95%], for example 80% or 95%, although it is not limited thereto.
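The removal of near-duplicate adjacent frames can be sketched as follows. The similarity measure used here (one minus the mean absolute difference of 8-bit grayscale pixels) is an assumption chosen to keep the example dependency-free; the source does not name a similarity algorithm, and the function names are hypothetical.

```python
def similarity(frame_a, frame_b):
    """Similarity in [0, 1] between two equally sized flat grayscale frames."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    total_diff = sum(abs(x - y) for x, y in zip(frame_a, frame_b))
    return 1.0 - total_diff / (255 * len(frame_a))


def drop_similar_frames(frames, fourth_threshold=0.9):
    """Keep a frame only if it differs enough from the previously kept frame."""
    kept = []
    for frame in frames:
        if kept and similarity(kept[-1], frame) >= fourth_threshold:
            continue  # too similar to its neighbour, so judge it only once
        kept.append(frame)
    return kept


if __name__ == "__main__":
    frames = [[0, 0, 0, 0], [0, 0, 0, 10], [255, 255, 255, 255]]
    print(len(drop_similar_frames(frames)))
```

Each frame is compared with the last frame that was kept, so a long run of nearly identical frames collapses to a single representative.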

Subsequently, in step S240, the violation probability of the picture subfile is determined by calculating the violation probability of each picture frame.

Specifically, for each picture frame, the matching degree of the picture frame and a plurality of sample pictures can be respectively calculated, the violation probability of the picture frame is determined based on the matching degrees, and the violation probability of each picture frame is synthesized to determine the violation probability of the picture subfile.

Here, the sample pictures may first be used for training. For training, an indicator function $x_{ij}^{p} \in \{1, 0\}$ is defined, where $x_{ij}^{p} = 1$ indicates that the i-th sample picture is matched with the j-th labeled object picture of category p, and $x_{ij}^{p} = 0$ otherwise.

According to the matching policy, if $\sum_{i} x_{ij}^{p} \geq 1$, it means that for the j-th labeled object picture there may be multiple sample pictures matching it. Here, the category p is a violation category such as spam, political involvement, violence, and the like. There may be multiple labeled objects under each category, and these labeled objects generally mark the picture as belonging to the corresponding violation category.

The dimensions and aspect ratios are selected for the sample picture, which need not correspond to the picture of each frame in general, and the particular feature map is responsible for processing objects of a particular dimension in the image. On each feature map, the scale of the default sample picture is calculated as follows:

The matching degree of one picture frame and one sample picture can be used as the violation probability of the picture frame with respect to that sample picture, and the violation probability of the picture frame is obtained by integrating its violation probabilities with respect to a plurality of sample pictures. For example, the violation probabilities of the picture frame with respect to the plurality of sample pictures are sorted in descending order, and either the maximum value or the average of the top N values is used as the violation probability of the picture frame.

Once the violation probability of each picture frame is known, the violation probabilities of all the picture frames are integrated to obtain the violation probability of the picture subfile. Likewise, the violation probabilities of the picture frames of the picture subfile may be sorted in descending order, with either the maximum value or the average of the top N values used as the violation probability of the picture subfile.
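The two aggregation options described above (maximum, or average of the top N values) can be illustrated with a minimal Python sketch. The function name `aggregate_violation`, the `top_n` parameter, and the treatment of an empty input as probability 0 are assumptions made for illustration and are not specified by the invention.

```python
from typing import Sequence


def aggregate_violation(probabilities: Sequence[float], top_n: int = 1) -> float:
    """Combine several violation probabilities into a single value.

    top_n == 1 yields the maximum; top_n > 1 yields the mean of the
    top_n largest values (or of all values if fewer are available).
    """
    if not probabilities:
        return 0.0  # assumption: no evidence is treated as no violation
    ranked = sorted(probabilities, reverse=True)  # descending order
    top = ranked[:max(1, top_n)]
    return sum(top) / len(top)


# Hypothetical matching degrees of two picture frames against sample pictures.
frame_scores = [
    aggregate_violation([0.10, 0.80, 0.30]),           # maximum
    aggregate_violation([0.20, 0.40, 0.60], top_n=2),  # mean of 0.60 and 0.40
]
# The same rule applied one level up gives the picture subfile's probability.
subfile_score = aggregate_violation(frame_scores)
```

The same helper serves both levels, frame against sample pictures and subfile against frames, because the text describes the identical rule for each.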

According to an embodiment of the invention, the time period in which picture frames with a high violation probability occur in the picture subfile can be acquired and marked as a second key time period, and the key time period of the video subfile is determined by combining the first key time period and the second key time period, so as to record the information. That is, the union of the first key time period and the second key time period is taken as the key time period of the video subfile, and the video information of this period is recorded for subsequent auditing.
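Taking the union of the two sets of key time periods amounts to merging overlapping intervals. The following Python sketch shows one way to do this; representing a period as a `(start_seconds, end_seconds)` tuple and the name `union_key_periods` are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), an assumed representation


def union_key_periods(first: List[Interval], second: List[Interval]) -> List[Interval]:
    """Return the union of the first and second key time periods.

    Overlapping or touching intervals are merged so that each recorded
    period appears only once.
    """
    merged: List[Interval] = []
    for start, end in sorted(first + second):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if necessary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Periods flagged by the text audit and by the picture frame audit (made-up values).
key_periods = union_key_periods([(10, 20), (40, 50)], [(15, 25), (60, 70)])
```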

Subsequently, in step S250, the violation probability of the video subfile of the middle period is determined by integrating the violation probability of the text information and the violation probability of the picture subfile. The violation probability of the video subfile may be taken as the average or the maximum of the two.
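As a minimal sketch of step S250, the two probabilities can be combined by either rule named above. The function name `combine_probabilities` and the `mode` argument are assumptions for illustration.

```python
def combine_probabilities(text_p: float, picture_p: float, mode: str = "max") -> float:
    """Combine the text and picture-subfile violation probabilities.

    mode "max" takes the larger value; mode "mean" takes the average.
    """
    if mode == "max":
        return max(text_p, picture_p)
    if mode == "mean":
        return (text_p + picture_p) / 2
    raise ValueError(f"unknown mode: {mode!r}")
```

Taking the maximum flags a subfile when either channel is suspicious, whereas the average requires both channels to contribute; which is preferable depends on how strict the audit needs to be.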

Then, it is determined whether the violation probability is equal to or greater than a first threshold.

If yes, in step S260, it is determined that the video file is in violation.

Otherwise, in step S270, the violation probabilities of the video subfiles of the other priorities are determined in turn according to the priority order, so as to determine whether the video file is in violation.

Here, if the violation probability of the video subfile of a certain priority is determined, in priority order, to be greater than or equal to the first threshold, the video file is determined to be in violation; otherwise, the violation probability of the video subfile of the next priority is determined.

Specifically, if it is determined that the video subfile of the first priority is not in violation, it is then determined whether the video subfile of the second priority is in violation, using the same determination method as for the video subfile of the first priority. That is, an audio subfile is extracted from the video subfile of the second priority and converted into text information; the violation probability of the text information is calculated, a frame cutting interval corresponding to the violation probability is determined, and the video subfile of the same period is cut into frames at that interval to obtain a picture subfile comprising a plurality of picture frames; the violation probability of the picture subfile is determined by calculating the violation probability of each picture frame; the violation probability of the video subfile of the second priority is determined by integrating the violation probabilities of the text information and the picture subfile, and it is judged whether this violation probability is greater than or equal to the first threshold; if so, the video file is judged to be in violation; otherwise, the violation probability of the video subfile of the third priority is determined next according to the priority order, so as to determine whether the video file is in violation.

That is, as soon as the violation probability of the video subfile of any priority reaches the first threshold, the video file is determined to be in violation and the auditing process stops. If the violation probability of none of the video subfiles reaches the first threshold, the video file is determined not to be in violation. Of course, in order to improve auditing efficiency, the invention can skip the auditing of the video subfiles at the head and tail ends, and audit only one or more video subfiles of the middle period to obtain the auditing result.
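The priority-ordered, early-stopping audit can be sketched as follows. The helper `priority_order` reproduces the five-period order recited in the claims (third, fourth, second, fifth, first); its behaviour for an even number of periods, the `score` callback, and the function names are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


def priority_order(n: int) -> List[int]:
    """Return 1-based period numbers, middle period first, alternating outward."""
    if n <= 0:
        return []
    mid = (n + 1) // 2  # assumption: for even n the lower middle period comes first
    order = [mid]
    for d in range(1, n):
        if mid + d <= n:
            order.append(mid + d)
        if mid - d >= 1:
            order.append(mid - d)
    return order


def audit_video(subfiles: Sequence[object],
                score: Callable[[object], float],
                first_threshold: float) -> bool:
    """Return True (in violation) as soon as one subfile reaches the threshold."""
    for period in priority_order(len(subfiles)):
        if score(subfiles[period - 1]) >= first_threshold:
            return True  # stop auditing; remaining subfiles are not scored
    return False


# Usage with precomputed, made-up subfile probabilities standing in for real scoring.
in_violation = audit_video([0.1, 0.2, 0.9, 0.3, 0.1], lambda p: p, first_threshold=0.8)
```

Because the middle period is scored first, a video whose violating content sits in the middle is rejected after scoring a single subfile.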

Further, considering that the violation text and the sample picture have different violation types, if a video file is determined to be in violation, the violation type of the video file can be determined according to the video subfile with the highest violation probability.

Specifically, one or more single sentences with high violation probability in the text information corresponding to the video subfile are extracted, the matching degree of each single sentence and multiple violation texts in the same class is counted to determine the violation categories of the single sentences, and then the violation categories of the text information are determined by counting the violation categories of the multiple single sentences. Here, the probability that a single sentence or text information belongs to each violation category may also be calculated based on the degree of matching.

Similarly, one or more picture frames with high violation probability in the picture subfiles corresponding to the video subfile are extracted, the matching degree of the picture frames and the multiple sample pictures of the same class is counted to determine the violation category of the picture frames, and the violation category of the picture subfiles is further determined by counting the violation categories of the multiple picture frames. Meanwhile, the probability that the picture frame or the picture subfile belongs to each violation category can be calculated based on the matching degree.

Then, the violation category of the video subfile can be determined by combining the violation categories of the picture subfile and of the text information of the same period. For example, the violation category with the maximum probability across the picture subfile and the text information may be taken as the violation category of the video file; or, after the two probability values belonging to the same violation category are averaged, the violation category with the maximum average probability is selected as the violation category of the video file.
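Both selection rules can be sketched in Python as below. The per-category probability dictionaries, the function name `video_violation_category`, the choice to treat a missing category as probability 0, and the alphabetical tie-break are all assumptions for illustration.

```python
from typing import Dict


def video_violation_category(text_probs: Dict[str, float],
                             picture_probs: Dict[str, float],
                             mode: str = "mean") -> str:
    """Pick the violation category from per-category probabilities.

    mode "max": category holding the single highest probability in either source.
    mode "mean": category with the highest average of the two sources.
    """
    categories = set(text_probs) | set(picture_probs)
    if not categories:
        raise ValueError("no category probabilities supplied")
    if mode not in ("max", "mean"):
        raise ValueError(f"unknown mode: {mode!r}")
    scores = {}
    for c in categories:
        t = text_probs.get(c, 0.0)      # assumption: missing category counts as 0
        p = picture_probs.get(c, 0.0)
        scores[c] = max(t, p) if mode == "max" else (t + p) / 2
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)


text = {"violence": 0.7, "spam": 0.2}
pictures = {"violence": 0.3, "pornography": 0.9}
by_mean = video_violation_category(text, pictures, mode="mean")
by_max = video_violation_category(text, pictures, mode="max")
```

With these made-up inputs the two rules disagree, which illustrates why the choice between them matters when the text and picture channels point to different categories.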

It should be noted that the method 200 is not limited to the duration of the original video, and any video may be audited by using the method 200. Of course, videos of moderate or slightly longer duration may preferably be audited by way of method 200; when the video is short, for example, only a few seconds, the audio and picture frames can be directly extracted from the whole video file for subsequent review without segmenting the video file. In addition, various thresholds, durations and frame-slicing intervals are provided in the above contents, and those skilled in the art can set their value ranges or sizes as needed, which is not limited by the present invention.

Fig. 3 illustrates a video auditing method according to another embodiment of the present invention. As shown in fig. 3, in one implementation, under a synchronous audit rule, video auditing is performed using the method 200, in which audio auditing and picture frame auditing are carried out synchronously, and the two audit results are combined to determine the audit result of the video. In addition, the invention allows the logic for extracting the video's audio and the logic for extracting video picture frames to be customized, and can search the text converted from the audio and the groups of video picture frames for violating information with a more efficient search algorithm, thereby completing the audit more quickly.

In another implementation, only one of the two processes, audio auditing or picture frame auditing, is performed: either the audio is extracted from the video for speech recognition auditing, or the video frames are extracted for picture recognition auditing. The audit is complete once that single process finishes, and its result is the audit result of the video.

Here, audit rules can be preset. If the audit rule is "either audit", either audio auditing or picture frame auditing is selected for the video audit; if the audit rule is audio-only auditing, the audio audit flow is used; if the audit rule is picture-frame-only auditing, the picture frame audit flow is used; if the audit rule is synchronous auditing of both, the method 200 is used to carry out both audit modes and obtain the video audit result.

Alternatively, the audit mode can be selected automatically according to the video duration. If the video is very short, that is, shorter than a fifth threshold (for example, within 15 s), a single, unsegmented audit mode is used. If the video is of moderate or long duration, that is, greater than or equal to the fifth threshold (for example, 2 min), the segmented synchronous audit of the method 200 or a segmented single audit flow is used. The value of the fifth threshold may be set by a person skilled in the art as needed, and the present invention is not limited in this respect.

According to one embodiment, the unsegmented single audio audit flow includes: extracting the complete audio file, removing non-human sounds, recognizing the human voice and converting it into text information, and auditing the text information to obtain the violation probability of the complete audio file.

According to another embodiment, the segmented single audio audit flow includes: extracting an audio file from the video file, segmenting the audio file into a plurality of audio subfiles, taking the audio subfile of the middle period, removing its non-human sounds, recognizing the human voice and converting it into text information, and auditing the text information to obtain the violation probability of the audio subfile. If the violation probability is greater than or equal to the first threshold, the whole video is judged to be in violation; otherwise, the violation probabilities of the audio subfiles of the other periods are calculated in turn to determine whether the video file is in violation. Here, the audio file is divided into a plurality of audio subfiles according to the silent gaps and the time periods of the audio, and the resulting periods have the same priority order as the video subfiles.
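Duration-based mode selection can be sketched as a simple threshold test. The default of 15 seconds is taken from the example in the text; the function name, the mode labels, and the `prefer_synchronous` flag are assumptions for illustration.

```python
def select_audit_mode(duration_s: float,
                      fifth_threshold_s: float = 15.0,
                      prefer_synchronous: bool = True) -> str:
    """Choose an audit mode from the video duration in seconds.

    Videos shorter than the fifth threshold are audited whole with a single
    flow; longer videos are segmented and audited either synchronously
    (audio and picture frames) or with a single flow.
    """
    if duration_s < fifth_threshold_s:
        return "unsegmented_single"
    return "segmented_synchronous" if prefer_synchronous else "segmented_single"


mode_short = select_audit_mode(8.0)    # a clip of a few seconds
mode_long = select_audit_mode(120.0)   # a two-minute video
```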

According to one embodiment, the single picture frame auditing flow without segmentation comprises the following steps: the method comprises the steps of extracting a picture file from a video file through frame cutting, sequentially judging the violation probability of each picture frame, further determining the violation probability of the picture file as the violation probability of the video file, and determining whether the video file violates rules or not.

According to another embodiment, the segmented single picture frame audit flow includes: dividing the video file into a plurality of video subfiles, extracting a picture subfile from the video subfile of the first priority (namely, the middle period), judging the violation probability of each picture frame in turn, and then determining the violation probability of the picture subfile. If the violation probability is greater than or equal to the first threshold, the whole video is judged to be in violation; otherwise, the violation probabilities of the picture subfiles of the other periods are calculated in turn to determine whether the video file is in violation.

In addition, when the video's audio fails the audit, the failing text information and its position on the video timeline are returned and recorded for secondary verification when necessary. Likewise, when a video picture frame fails the audit, the failing picture frame and its position on the video timeline are returned and recorded for secondary verification.

According to the technical scheme of the invention, a method for quickly and efficiently auditing video information is provided, in which auditing of the video's audio and auditing of the video's picture frames are combined and performed synchronously to audit videos quickly. The invention supports a double-confirmation audit that combines the audio-track audit with the picture-frame audit, and also supports the logic under which the audit fails as soon as a single audit is not passed, so that a suitable trade-off between audit speed and audit accuracy can be chosen according to business requirements. Various audit modes and audit logics can be selected according to actual requirements, so that video files can be audited quickly and accurately, and the method has good applicability.

The invention can perform picture frame extraction processing on the dynamic picture, process the dynamic picture into a plurality of static pictures and then audit all the static pictures. Meanwhile, a fault tolerance processing mechanism is also arranged, and possible errors are processed in various modes in the process of checking the picture video information, the generated errors are refined and classified, so that the accuracy of checking the picture video information is ensured. Therefore, the problem that the traditional picture auditing method cannot quickly audit the dynamic picture can be avoided, and quick auditing can be realized even if a large amount of data is faced.

In addition, the invention can automatically select a segmented or unsegmented, single or synchronous audit mode according to the video duration, thereby improving the overall audit efficiency for massive data. Furthermore, in the segmented synchronous audit, the frame cutting interval of the picture frames can be selected according to the recognition result of the audio subfile: fewer picture frames are extracted when the text violation probability is clearly low or clearly high, and more picture frames are extracted when the text is suspicious. At the same time, the time periods with a high text violation probability and a high picture frame violation probability are marked and recorded for subsequent review.
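The threshold-based choice of frame cutting interval can be sketched as follows, covering the two-threshold variants recited in claim 1. All numeric defaults (thresholds of 0.3 and 0.8, intervals of 5 s and 1 s), the function name, and the `suspicious_band` flag are hypothetical values chosen only for illustration.

```python
def frame_cut_interval(text_p: float,
                       second_threshold: float = 0.3,
                       third_threshold: float = 0.8,
                       first_interval_s: float = 5.0,
                       second_interval_s: float = 1.0,
                       suspicious_band: bool = False) -> float:
    """Return the frame cutting interval in seconds for a text violation probability.

    suspicious_band=False: sparse sampling below the second threshold,
    dense sampling otherwise.
    suspicious_band=True: dense sampling only between the second and third
    thresholds, where the text is suspicious but not conclusive.
    """
    if text_p < second_threshold:
        return first_interval_s           # text looks clean: few frames
    if suspicious_band and text_p > third_threshold:
        return first_interval_s           # text already conclusive: few frames
    return second_interval_s              # suspicious: more frames


interval_clean = frame_cut_interval(0.1)
interval_suspicious = frame_cut_interval(0.5, suspicious_band=True)
interval_conclusive = frame_cut_interval(0.9, suspicious_band=True)
```

The band variant spends picture-analysis effort where the audio result is least decisive, since a clearly violating transcript already pushes the combined probability toward the first threshold.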

A9. The method of A8, wherein the sample pictures belong to a plurality of violation categories, the method further comprising: for a picture frame with a high violation probability, determining the violation category of the picture frame by counting the matching degrees of the picture frame with the sample pictures of each category.

A10. The method of A8, further comprising the steps of: acquiring the time period in which picture frames with a high violation probability occur in the picture subfile, and marking the time period as a second key time period; and determining the key time period of the video subfile by integrating the first key time period and the second key time period, so as to record the information.

A11. The method according to any one of A1 to A10, wherein the step of continuing to determine the violation probabilities of the video subfiles of other priorities in priority order includes: if the violation probability of the video subfile of a certain priority is determined, in priority order, to be greater than or equal to the first threshold, determining that the video file is in violation; otherwise, continuing to determine the violation probability of the video subfile of the next priority.

The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.

In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code, and the processor is configured to execute the video auditing method of the present invention according to instructions in the program code stored in the memory.

By way of example, and not limitation, readable media includes readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.

In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the device in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.

As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense with respect to the scope of the invention, as defined in the appended claims.

Claims

1. A video auditing method adapted to be executed in a server, the method comprising the steps of:

acquiring a video file to be audited, and segmenting the video file into video subfiles in a plurality of time periods, wherein the video subfiles have priority sequences, and the priority is decreased from the middle time period to two ends;

extracting an audio subfile from the video subfile with the first priority, and converting the audio subfile into text information;

calculating the violation probability of the text information, determining a frame cutting interval corresponding to the violation probability, and cutting the video subfiles of the same segment by adopting the frame cutting interval to obtain a picture subfile comprising a plurality of picture frames; determining a violation probability for the picture subfile by calculating a violation probability for each picture frame;

determining the violation probability of the video subfile with the first priority by integrating the violation probabilities of the text information and the picture subfile;

if the violation probability is larger than or equal to a first threshold, determining that the video file is in violation, otherwise, continuously determining the violation probability of video subfiles of other priorities according to the priority order to determine whether the video file is in violation;

the relation between the violation probability of the text information and the frame cutting interval comprises any one of the following modes:

the frame cutting interval is inversely related to the violation probability of the text information;

if the violation probability of the text information is smaller than a second threshold value, adopting a first frame cutting interval, otherwise adopting a second frame cutting interval, wherein the first frame cutting interval is larger than the second frame cutting interval;

and if the violation probability of the text information is smaller than a second threshold or larger than a third threshold, adopting a first frame cutting interval, otherwise adopting a second frame cutting interval.

2. The method of claim 1, wherein the step of segmenting the video file into video subfiles for a plurality of time periods comprises:

the video file is divided into video subfiles of three time periods, and the priority order of the three time periods is as follows: a second period, a third period, and a first period; or

the video file is divided into video subfiles of five time periods, and the priority order of the five time periods is as follows: a third period, a fourth period, a second period, a fifth period, and a first period.

3. The method of claim 1 or 2, wherein the step of converting the audio subfile into text information comprises:

removing the environmental sound and the background sound from the audio subfile, extracting the human voice, and converting the human voice into corresponding text information by speech recognition.

4. The method according to claim 1 or 2, wherein the server stores a violation text library, the violation text library comprising a plurality of violation texts, and the step of calculating the violation probability of the text information comprises:

segmenting the text information into a plurality of single sentences, respectively calculating the matching degrees of each single sentence with the plurality of violation texts, and determining the violation probability of the text information based on the matching degrees of the single sentences.

5. The method of claim 4, further comprising the steps of:

acquiring the time period in which a single sentence with a high matching degree occurs in the audio subfile, marking the time period as a first key time period, and reducing the picture frame cutting interval within the first key time period.

6. The method as claimed in claim 1 or 2, wherein after obtaining the picture subfile including a plurality of picture frames, further comprising the steps of:

calculating the similarity of two adjacent picture frames in the picture subfile, and if the similarity is greater than or equal to a fourth threshold, removing one of the two picture frames from the picture subfile.

7. The method of claim 1 or 2, wherein the server has stored therein a sample picture library containing a plurality of violation sample pictures, the determining the violation probability for the picture subfile by calculating the violation probability for each picture frame comprising:

respectively calculating the matching degrees of the picture frame and the sample pictures, determining the violation probability of the picture frame based on the matching degrees, and determining the violation probability of the picture subfile by integrating the violation probability of each picture frame.

8. The method of claim 7, wherein the plurality of sample pictures are categorized in a plurality of violation categories, the method further comprising the steps of: for the picture frame with high violation probability, the violation type of the picture frame is determined by counting the matching degree of the picture frame and a plurality of sample pictures of the same type.

9. The method of claim 5, further comprising the steps of: acquiring a time period of the picture frame with high violation probability in the picture subfile, and marking the time period as a second key time period; and determining the key time period of the video subfile by integrating the first key time period and the second key time period so as to record information.

10. The method of claim 1 or 2, wherein the step of continuing to determine the violation probabilities of video subfiles of other priorities in priority order comprises: if the violation probability of the video subfile of a certain priority is determined, in priority order, to be greater than or equal to the first threshold, determining that the video file is in violation; otherwise, continuing to determine the violation probability of the video subfile of the next priority.

11. A server, comprising:

a memory;

one or more processors;

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-10.

12. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a server, cause the server to perform any of the methods of claims 1-10.