CN110889351B - Video detection method, device, terminal equipment and readable storage medium - Google Patents

Video detection method, device, terminal equipment and readable storage medium

Info

Publication number
CN110889351B
Authority
CN
China
Prior art keywords
video
frame
video frame
predicted
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911128730.9A
Other languages
Chinese (zh)
Other versions
CN110889351A (en)
Inventor
乔宇 (Qiao Yu)
彭小江 (Peng Xiaojiang)
叶木超 (Ye Muchao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201911128730.9A priority Critical patent/CN110889351B/en
Publication of CN110889351A publication Critical patent/CN110889351A/en
Priority to PCT/CN2020/129171 priority patent/WO2021098657A1/en
Application granted granted Critical
Publication of CN110889351B publication Critical patent/CN110889351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Abstract

The application belongs to the technical field of computer vision and provides a video detection method, an apparatus, a terminal device and a readable storage medium. The method comprises: acquiring a video to be detected, the video comprising N video frames; sequentially acquiring predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame is obtained by inputting the error image of the ith video frame into a trained prediction network model for processing; and determining that the video to be detected is abnormal if the difference degree between the Nth video frame and the predicted frame of the Nth video frame meets a preset condition. Because each predicted frame is predicted from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video into account, the obtained predicted frame of the Nth video frame is more accurate, and when the Nth video frame and its predicted frame are used to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.

Description

Video detection method, device, terminal equipment and readable storage medium
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a video detection method, a video detection device, terminal equipment and a readable storage medium.
Background
Video anomaly detection determines whether a video contains an abnormal event, i.e., a special event that differs from normal behavior in a specific scene. Such events can endanger public safety or disrupt social order with serious consequences, so video anomaly detection, which determines whether an abnormal event exists, plays an important role in maintaining social order.
In the prior art, the N frames of a video to be detected are input into an auto-encoder as a whole; the auto-encoder performs convolution operations on the input N frames to obtain a video frame, and whether the video to be detected is abnormal is finally determined from that frame.
However, because the conventional technology does not consider the influence of temporal ordering on the detection of abnormal events, false detection is likely to occur, and the detection effect is not ideal.
Disclosure of Invention
The embodiments of the application provide a video detection method, an apparatus, a terminal device and a readable storage medium to solve the problem that false detection occurs easily when abnormal events are detected, so that the detection effect is not ideal.
In a first aspect, an embodiment of the present application provides a video detection method, including:
and acquiring a video to be detected, wherein the video to be detected comprises N video frames, and N is an integer greater than 1.
And sequentially acquiring predicted frames of the N video frames.
The calculation mode of the predicted frame of the (i+1) th video frame in the N video frames is as follows: inputting an error image of an ith video frame into a trained prediction network model for processing to obtain a predicted frame of an ith+1th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the predicted frame of the ith video frame, i is more than or equal to 1 and less than or equal to N-1, and i is an integer.
And calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame.
If the difference degree meets the preset condition, determining that the video to be detected is abnormal.
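As an informal illustration only, the loop described above can be sketched in a few lines of Python. The predict_next callable below is a hypothetical stand-in for the trained prediction network model, and using the 1st video frame as the preset error image is only one of the options this application permits.

    import numpy as np

    def detect_anomaly(frames, predict_next, threshold):
        # frames: list of N video frames as float arrays; N > 1.
        # predict_next: hypothetical stand-in for the trained prediction
        # network model; maps an error image to the next predicted frame.
        # Predicted frame of the 1st video frame: from a preset error image;
        # here the 1st frame itself serves as that image.
        predicted = predict_next(frames[0])
        for i in range(len(frames) - 1):
            # Error image of the ith frame: the frame minus its predicted frame.
            error = frames[i] - predicted
            # Predicted frame of the (i+1)th video frame.
            predicted = predict_next(error)
        # Difference degree: L2 norm of (Nth frame - predicted Nth frame).
        difference = float(np.linalg.norm((frames[-1] - predicted).ravel()))
        return difference > threshold  # True: the video to be detected is abnormal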
In a possible implementation manner of the first aspect, the execution subject of the video detection method is a terminal with image processing capability. The terminal may be a physical terminal, such as a desktop computer, a server, a notebook computer or a tablet computer, or a virtual terminal, such as a cloud server or a cloud computing service. It should be understood that the above execution subject is only an example and is not limited to the terminals listed.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the difference degree between the Nth video frame and the predicted frame of the Nth video frame may be calculated by subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm of the result.
In other embodiments, the predicted frame of the Nth video frame may first be repaired by a preset repair algorithm; the repaired predicted frame is then subtracted from the Nth video frame and the norm is taken to obtain the difference degree.
Optionally, determining that the video to be detected is abnormal when the difference degree meets the preset condition comprises: determining that the video is abnormal when the difference degree is greater than a first preset threshold.
Optionally, it may instead comprise: normalizing the reciprocal of the difference degree to obtain a normality score, and then determining that the video to be detected is abnormal when the normality score is smaller than a second preset threshold.
The prediction network model is a long short-term memory (Long Short-Term Memory, LSTM) model or a gated recurrent unit (Gated Recurrent Unit, GRU) model.
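For illustration, a time-sequence predictor of this general shape could be sketched in PyTorch as follows. The class name FramePredictor, the flattening of error images into vectors and all layer sizes are assumptions of this sketch, not the claimed architecture.

    import torch
    import torch.nn as nn

    class FramePredictor(nn.Module):
        # Illustrative GRU-based predictor: error image in, next predicted
        # frame out. The application only requires a time-sequence model
        # such as an LSTM or GRU; everything else here is assumed.
        def __init__(self, height=64, width=64, hidden=512):
            super().__init__()
            self.h, self.w = height, width
            self.cell = nn.GRUCell(height * width, hidden)
            self.decode = nn.Linear(hidden, height * width)
            self.state = None  # hidden state carries temporal context

        def forward(self, error_image):            # (B, H, W)
            x = error_image.flatten(start_dim=1)   # (B, H*W)
            self.state = self.cell(x, self.state)  # advance one time step
            return self.decode(self.state).view(-1, self.h, self.w)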
In a second aspect, an embodiment of the present application provides a video detection apparatus, including:
An acquisition module, configured to acquire a video to be detected, the video to be detected comprising N video frames, where N is an integer greater than 1. A prediction module, configured to sequentially acquire predicted frames of the N video frames, where the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting the error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer. A calculation module, configured to calculate the difference degree between the Nth video frame and the predicted frame of the Nth video frame. A determination module, configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition.
In a possible implementation manner of the second aspect, the video detection apparatus may be the execution subject described in the first aspect; its specific form is the same as described there and is not repeated here.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the calculating module is specifically configured to subtract the predicted frame of the Nth video frame from the Nth video frame and take the norm of the result to obtain the difference degree.
In other embodiments, the calculating module is specifically configured to repair the predicted frame of the Nth video frame by a preset repair algorithm, then subtract the repaired predicted frame from the Nth video frame and take the norm to obtain the difference degree.
Optionally, the determining module is specifically configured to determine that the video to be detected is abnormal when the difference degree is greater than a first preset threshold.
Optionally, the determining module is further configured to normalize the reciprocal of the difference degree to obtain a normality score. And then, when the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
The prediction network model is an LSTM model or a GRU model.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method as provided in the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which when executed by a processor performs a method as provided in the first aspect.
In a fifth aspect, an embodiment of the application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the method as provided in the first aspect.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiments of the application have the following beneficial effects. First, the predicted frames of the N video frames are acquired sequentially from the video to be detected: the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, and the predicted frame of the (i+1)th video frame is obtained by inputting that error image into a trained prediction network model for processing. Then the difference degree between the Nth video frame and the predicted frame of the Nth video frame is calculated. Finally, if the difference degree meets the preset condition, the video to be detected is determined to be abnormal. Because each predicted frame is predicted from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video into account, which makes the obtained predicted frame of the Nth video frame more accurate; therefore, when the Nth video frame and its predicted frame are compared to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a flowchart of a video detection method according to an embodiment of the present application;
fig. 3 is a flowchart of a video detection method according to another embodiment of the present application;
fig. 4 is a flowchart of a video detection method according to another embodiment of the present application;
fig. 5 is a schematic view of an application scenario provided in another embodiment of the present application;
fig. 6 is a schematic structural diagram of a video detection device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
As used in the present specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if ... then determining" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting that the preset condition is met" or "in response to detecting that the preset condition is met".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "a possible embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in various places throughout this specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The video detection method provided by the embodiments of the application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), security cameras, monitoring cameras and the like; the embodiments of the application do not limit the specific type of the terminal device.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application.
Referring to fig. 1, the scene includes at least one video capturing device 11 and at least one terminal device connected to the video capturing device 11.
In some embodiments, the video capture device 11 may be a camera of various forms, for example, a security camera, a surveillance camera, a camera integrated on a notebook, a camera integrated on a smart phone, etc.
By way of example only and not limitation, the terminal device may be at least one of the server 12, the personal computer 13, the smart phone 14 and the tablet computer 15, and it acquires the video to be detected captured by the video acquisition device 11 and detects it. For example, the video acquisition device 11 may be communicatively connected to the server 12: after capturing the video to be detected, it sends the video to the server 12 through a wired or wireless network, and the server 12 processes the video to obtain a detection result. The detection result may be stored in the server 12 or in a designated database, and the video and its detection result may later be retrieved from the server 12 or the database by other devices, such as a smart phone, a personal computer or a tablet computer, for review or further processing. Alternatively, the video acquisition device 11 may be communicatively connected to at least one of the personal computer 13, the smart phone 14 and the tablet computer 15 and send the video to be detected to the connected terminal device, which processes it to obtain the detection result and displays the video and the result on its screen.
In still other embodiments, the video capturing apparatus 11 may be integrated in a terminal device, so as to implement the solution provided by the present application.
By way of example only and not limitation, a camera of the smart phone 14 may serve as the video capture device 11: it captures the video to be detected, which is stored in the memory of the smart phone 14; the processor of the smart phone 14 then executes a corresponding executable program to process the video in the memory, obtains the detection result, and displays it on the screen of the smart phone 14.
It will be understood by those skilled in the art that fig. 1 is merely an example of an application scenario of the present application, and does not constitute a limitation of the application scenario for executing the video detection method provided in the present application, and may include more devices than those illustrated in practical application, for example, when the video capturing device 11 is communicatively connected to the server 12, the server 12 may also be communicatively connected to a database for storing the video to be detected and the detection result; or is in communication connection with a screen and is used for displaying the detection result; and the system can also be in communication connection with alarm equipment for reminding abnormal detection results and the like, and is not limited herein.
Among other things, the wireless network may include wireless local area networks (Wireless Local Area Network, WLAN) (e.g., Wi-Fi networks), Bluetooth, ZigBee, mobile communication networks, near field communication (Near Field Communication, NFC), infrared (IR) technology, and the like. The wired network may include a fiber-optic network, a telecommunications network, an intranet, etc., such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), a metropolitan area network (Metropolitan Area Network, MAN) or a public switched telephone network (Public Switched Telephone Network, PSTN). The types of wireless and wired networks are not limited here.
Fig. 2 is a flowchart of a video detection method according to an embodiment of the application. By way of example, but not limitation, the method may be applied to a terminal device in the above scenario, such as a server 12, a personal computer 13, a smart phone 14, a tablet computer 15, or a vehicle computer 16.
Referring to fig. 2, the video detection method includes:
s21, acquiring a video to be detected, wherein the video to be detected comprises N video frames.
Wherein N is an integer greater than 1.
In some embodiments, the video to be detected may be a video clip directly collected by the video collecting device 11, or may be a stored video clip that is collected and stored by the video collecting device 11 and then retrieved by a terminal device executing the method, which is not limited herein.
It should be noted that the video to be detected is usually a video clip in RGB color space and includes at least 2 video frames. Its frame count is determined by the duration of the video and its frames per second (Frames Per Second, FPS); for example, a video of 2 seconds at 30 FPS includes 60 video frames.
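The frame count is simply duration multiplied by FPS, as a quick check of the example confirms:

    duration_s, fps = 2, 30        # the 2-second, 30 FPS example above
    n_frames = duration_s * fps    # frame count = duration x FPS
    assert n_frames == 60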
When detecting a video, the duration of the video to be detected can serve as the detection step; the step length can be set according to the actual situation of the application and is not limited by this video detection method.
S22, sequentially obtaining predicted frames of the N video frames.
The predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting the error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer.
The prediction network model may be a time-sequence network model, which may include, by way of example and not limitation, an LSTM model, a GRU model, or the like.
After the error image at the ith moment (i.e., the error image of the ith video frame) is input into the prediction network model, the model generates the predicted frame at the next moment (i.e., the predicted frame of the (i+1)th video frame) from the input error image.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing; for example, the preset error image may be obtained by subtracting a blank image (i.e., all zeros) from the 1st video frame, or the 1st video frame may be used directly as the preset error image.
The prediction network model may be trained with preset samples and preset sample labels; the training procedure is a conventional means in the art and is not described here.
S23, calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame.
The difference degree characterizes the similarity between the Nth video frame and the predicted frame of the Nth video frame: the lower the similarity, the larger the difference between them. If a video frame in the video to be detected differs excessively from its predicted frame, an abnormality exists in the video.
By way of example and not limitation, the similarity between the Nth video frame and its predicted frame may be calculated by histogram matching, cosine similarity, a mean hash algorithm, and the like. Alternatively, the similarity between the two images may be represented by a distance between them, for example, but not limited to, a Euclidean distance, a Mahalanobis distance or a Manhattan distance.
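As a rough NumPy sketch of the measures named above; the function names and normalization choices are illustrative assumptions, and the histogram variant assumes pixel values scaled to [0, 1].

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between flattened frames; 1.0 = identical direction.
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def euclidean_distance(a, b):
        # L2 distance between frames; larger = more different.
        return float(np.linalg.norm((a.astype(float) - b.astype(float)).ravel()))

    def histogram_intersection(a, b, bins=64):
        # Compare normalized intensity histograms; assumes pixels in [0, 1].
        ha, _ = np.histogram(a, bins=bins, range=(0.0, 1.0))
        hb, _ = np.histogram(b, bins=bins, range=(0.0, 1.0))
        ha = ha / max(ha.sum(), 1)
        hb = hb / max(hb.sum(), 1)
        return float(np.minimum(ha, hb).sum())  # 1.0 = identical histograms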
S24, judging whether the difference degree meets preset conditions.
In some embodiments, the preset condition may be set in actual application according to factors such as the accuracy required of the video detection and the interference information present in the detected scene, so as to ensure that abnormalities are detected accurately and false detections caused by over- or under-sensitivity are avoided.
S25, if the difference degree meets the preset condition, determining that the video to be detected is abnormal.
S26, if the difference degree does not meet the preset condition, determining that no abnormality exists in the video to be detected.
It should be noted that the presence of an abnormality in the video to be detected means that a special event different from normal events exists in the video. For example, if the scene in the video is a street, special events may include a motor vehicle driving on a sidewalk, a pedestrian crossing a motor lane, or violent actions such as a robbery or a fight; if the scene is indoors, special events may include smoke, open fire, crowding, etc.
In the above embodiment, since each predicted frame is obtained by prediction from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video to be detected into account, which makes the obtained predicted frame of the Nth video frame more accurate; consequently, when the Nth video frame and its predicted frame are used to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.
In some embodiments, the difference degree between the Nth video frame and the predicted frame of the Nth video frame may be calculated by subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm of the result.
The subtraction and norm can be implemented with the $L_2$ norm: if the predicted frame of the Nth video frame is $\hat{I}_N$ and the Nth video frame is $I_N$, the difference degree is $d = \lVert I_N - \hat{I}_N \rVert_2$.
In other embodiments, the calculation of the difference degree between the Nth video frame and its predicted frame in S23 may also be implemented by the process shown in fig. 3. Referring to fig. 3, S23 may include:
s231, repairing the predicted frame of the Nth video frame through a preset repairing algorithm.
In some embodiments, the predicted frame $\hat{I}_N$ of the Nth video frame is repaired to correct problems such as noise and distortion introduced in the prediction process that would affect detection, yielding the repaired predicted frame $R_N$ of the Nth video frame. The preset repair algorithm may be a convolutional autoencoder; repairing $\hat{I}_N$ with a convolutional autoencoder is a routine procedure for those skilled in the art and is not described in detail here.
S232, subtracting the repaired predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
Continuing the above example, the $L_2$ norm may still be used to compute the difference degree between the repaired predicted frame $R_N$ and the Nth video frame $I_N$, i.e., $d = \lVert I_N - R_N \rVert_2$.
Because the predicted frame of the Nth video frame is repaired by the preset repair algorithm before the difference degree with the Nth video frame is calculated, the inflation of the difference degree caused by noise, distortion and similar conditions is effectively suppressed, which reduces the probability of false detection and improves the accuracy of video detection.
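Since a convolutional autoencoder is named as one possible preset repair algorithm, the following PyTorch sketch shows the general shape such a repair network could take; channel counts, depth and activations are illustrative assumptions.

    import torch.nn as nn

    class RepairAutoencoder(nn.Module):
        # Minimal convolutional autoencoder for repairing a predicted frame:
        # the bottleneck smooths prediction noise and distortion.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, predicted_frame):  # (B, 3, H, W), H and W divisible by 4
            return self.decoder(self.encoder(predicted_frame))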
Optionally, determining that the video to be detected is abnormal when the difference degree meets the preset condition may consist of determining that the video is abnormal when the difference degree is greater than a first preset threshold.
It should be noted that the larger the difference degree, the greater the possibility that an abnormality exists in the video to be detected.
By way of example and not limitation, when the difference degree is greater than 70%, it may be determined that an abnormality exists in the video to be detected; that is, the first preset threshold may be set to 70%. It should be understood that in scenarios with different accuracy requirements the first preset threshold may also be set to other values such as 63%, 72% or 86%.
Optionally, the process in S24 of determining whether the difference degree meets the preset condition may also be implemented by the flow shown in fig. 4. Referring to fig. 4, S24 may include:
s241, normalizing the reciprocal of the difference degree to obtain a normal degree score.
In some embodiments, after the inverse of the difference is normalized, a normal score of the video to be measured may be obtained, where the interval range of the normal score may be [0,1] or [0,100], which is not limited herein.
The higher the normality score, the greater the probability that the video to be detected is normal. For example, if the interval of the normality score is [0,100], a score of 0 may indicate that the video is abnormal and a score of 100 that it is normal.
S242, if the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
Referring to the example in S241, in one possible implementation, when the normality score is less than 30, it may be determined that an abnormality exists in the video to be detected; that is, the second preset threshold is 30. It should be understood that in scenarios with different precision requirements the second preset threshold may also be set to other values such as 35, 27 or 12, which is not limited here.
After the reciprocal of the difference degree is normalized, whether the video to be detected is abnormal is determined from its normality score. When the detection result is displayed, a normality score better matches user habits, so the user can judge the degree of abnormality of the video more intuitively, which improves the user experience.
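A small sketch of S241-S242 under one possible normalization; the application only requires normalizing the reciprocal of the difference degree, so the min-max scaling over a batch of difference degrees used below is an assumption.

    import numpy as np

    def normality_scores(difference_degrees, low=0.0, high=100.0):
        # Map difference degrees to normality scores on [low, high].
        # Min-max scaling of the reciprocals over a batch is assumed here.
        recip = 1.0 / (np.asarray(difference_degrees, dtype=float) + 1e-12)
        scaled = (recip - recip.min()) / (recip.max() - recip.min() + 1e-12)
        return low + scaled * (high - low)

    scores = normality_scores([0.8, 2.5, 9.0])
    abnormal = scores < 30  # second preset threshold from the example above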
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 5 is a schematic view of an application scenario provided in another embodiment of the present application.
Referring to fig. 5, fig. 5 illustrates a dangerous-behavior alert scenario in which the scheme of the present application is applied to automatic driving of an automobile. The application of the video detection method provided by the present application is explained here through this automatic-driving example; the following application is illustrative only and not limiting.
In this scenario, the video capturing device 11 (not shown) may be arranged at at least one position such as the air intake grille 16, the rear bumper 17 or the side rear-view mirrors 18 of the automobile, and may be a miniature camera, an infrared camera or the like; it may also be used to capture images of the driver in the driver's seat. The video capturing device 11 may be connected to a vehicle-mounted computer (not shown) by cable; while the vehicle is running, the vehicle-mounted computer receives at least one video to be detected captured by the video capturing device 11 and processes each video to obtain a detection result.
The camera on the air intake grille 16 captures video of the area in front of the automobile, the camera on the rear bumper 17 captures video behind it, and the cameras on the two side rear-view mirrors 18 capture video of its two sides. During automatic driving, the detection frequency of the camera facing the driving direction can be increased: for example, the camera on the air intake grille 16 can be set to capture at 120 FPS with a detection period of 0.1 seconds, so that a video to be detected comprising 12 video frames must be checked for abnormality every 0.1 seconds, where the predicted frame of the 1st video frame of each detection period can be carried over from the previous detection period and the predicted frame of the 12th video frame is used within the current period. If an abnormality is detected, such as a pedestrian 192 in the driving direction or another obstacle that may cause a safety problem, the driver can be reminded by voice or light to pay attention to safety, or functions such as automatic braking and automatic avoidance can be realized in cooperation with other sensors.
Likewise, the cameras arranged on the two side rear-view mirrors 18 can be configured in the same way to detect whether an approaching vehicle 191 or another condition that may pose a safety hazard exists on either side of the automobile; if so, the driver can be reminded by voice or light to pay attention to safety, or functions such as automatic braking and automatic avoidance can be realized in cooperation with other sensors.
It should be noted that when the automobile is moving forward, the area behind it is comparatively safe, so the detection frequency of the camera facing away from the driving direction can be reduced to lighten the load on the vehicle-mounted computer. For example, the camera on the rear bumper 17 can be set to capture at 30 FPS with a detection period of 0.2 seconds, so that a video to be detected comprising only 6 video frames needs to be checked for abnormality every 0.2 seconds.
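The per-window frame counts in this example follow directly from the capture rate multiplied by the detection period:

    front_frames = round(120 * 0.1)  # air intake grille camera: 12 frames per 0.1 s period
    rear_frames = round(30 * 0.2)    # rear bumper camera: 6 frames per 0.2 s period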
In another scenario, the settings of the camera on the rear bumper 17 and those of the camera on the air intake grille 16 in the above example can be interchanged, so that a pedestrian 192 or another obstacle behind the automobile is detected more sensitively while the automobile parks automatically.
Fig. 6 is a schematic structural diagram of a video detection device according to an embodiment of the present application, corresponding to the video detection method described in the above embodiments, and only the portions related to the embodiments of the present application are shown for convenience of explanation.
Referring to fig. 6, the apparatus includes:
the obtaining module 31 is configured to obtain a video to be tested, where the video to be tested includes N video frames, where N is an integer greater than 1.
The prediction module 32 is configured to sequentially obtain predicted frames of the N video frames. The calculation mode of the predicted frame of the (i+1) th video frame in the N video frames is as follows: inputting an error image of an ith video frame into a trained prediction network model for processing to obtain a predicted frame of the (i+1) th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the predicted frame of the ith video frame, i is more than or equal to 1 and less than or equal to N-1, and i is an integer.
A calculating module 33, configured to calculate a degree of difference between the nth video frame and the predicted frame of the nth video frame.
And the determining module 34 is configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the calculating module 33 is specifically configured to subtract the predicted frame of the Nth video frame from the Nth video frame and take the norm of the result to obtain the difference degree.
In other embodiments, the calculating module 33 is specifically configured to repair the predicted frame of the Nth video frame by a preset repair algorithm, then subtract the repaired predicted frame from the Nth video frame and take the norm to obtain the difference degree.
Optionally, the determining module 34 is specifically configured to determine that the video to be tested is abnormal when the difference degree is greater than a first preset threshold.
Optionally, the determining module 34 is further configured to normalize the inverse of the difference degree first to obtain a normality score. And then, when the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
The prediction network model is an LSTM model or a GRU model.
It should be noted that, because the content of information interaction and execution process between the modules and the embodiment of the method of the present application are based on the same concept, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Referring to fig. 7, the embodiment of the present application further provides a terminal device 4, where the terminal device 4 includes: at least one processor 41, a memory 42 and a computer program 43 stored in the memory and executable on the at least one processor 41, the processor 41 implementing the steps of any of the various method embodiments described above when the computer program 43 is executed.
It should be noted that fig. 7 does not constitute a limitation on the structure of the terminal device 4, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device 4 may also include a display screen, indicator lamps, a motor, controls (e.g., buttons), a gyroscope sensor, an acceleration sensor, and the like.
The processor 41 may be a central processing unit (Central Processing Unit, CPU), and the processor 41 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 42 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 42 may in other embodiments also be an external storage device of the terminal device 4, such as a plug-in hard disk provided on the terminal device 4, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like. Further, the memory 42 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 42 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, such as program code for the computer program. The memory 42 may also be used to temporarily store data that has been obtained or is to be obtained.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that enable the implementation of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
Each of the foregoing embodiments has its own emphasis; for parts that are not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A video detection method, comprising:
acquiring a video to be detected, wherein the video to be detected comprises N video frames, and N is an integer greater than 1;
sequentially acquiring predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting an error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer; wherein the prediction network model is a time-sequence network model; and the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing;
calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame;
if the difference degree meets a preset condition, determining that the video to be detected is abnormal comprises the following steps:
if the difference degree is larger than a first preset threshold value, determining that the video to be detected is abnormal.
2. The method of claim 1, wherein said calculating the difference degree between the Nth video frame and the predicted frame of the Nth video frame comprises:
subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
3. The method of claim 1, wherein said calculating the difference degree between the Nth video frame and the predicted frame of the Nth video frame comprises:
repairing the predicted frame of the Nth video frame through a preset repairing algorithm;
subtracting the repaired predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
4. A method according to any one of claims 1 to 3, wherein if the difference degree meets a preset condition, determining that the video to be tested is abnormal includes:
normalizing the reciprocal of the difference degree to obtain a normal degree score;
if the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
5. A method according to any one of claims 1-3, wherein the prediction network model is a long short-term memory (LSTM) network model or a gated recurrent unit (GRU) network model.
6. A video detection apparatus, comprising:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a video to be detected, the video to be detected comprises N video frames, wherein N is an integer greater than 1;
the prediction module is used for sequentially obtaining the predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting an error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer; wherein the prediction network model is a time-sequence network model; and the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing;
the computing module is used for computing the difference degree of the predicted frames of the Nth video frame and the Nth video frame;
the determining module is configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition, and includes:
if the difference degree is larger than a first preset threshold value, determining that the video to be detected is abnormal.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 5.
CN201911128730.9A 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium Active CN110889351B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911128730.9A CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium
PCT/CN2020/129171 WO2021098657A1 (en) 2019-11-18 2020-11-16 Video detection method and apparatus, terminal device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911128730.9A CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110889351A CN110889351A (en) 2020-03-17
CN110889351B true CN110889351B (en) 2023-09-26

Family

ID=69747861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911128730.9A Active CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN110889351B (en)
WO (1) WO2021098657A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889351B (en) * 2019-11-18 2023-09-26 中国科学院深圳先进技术研究院 Video detection method, device, terminal equipment and readable storage medium
CN113486853B (en) * 2021-07-29 2024-02-27 北京百度网讯科技有限公司 Video detection method and device, electronic equipment and medium
CN113705370B (en) * 2021-08-09 2023-06-30 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
CN113671917B (en) * 2021-08-19 2022-08-02 中国科学院自动化研究所 Detection method, system and equipment for abnormal state of multi-modal industrial process
CN113435432B (en) * 2021-08-27 2021-11-30 腾讯科技(深圳)有限公司 Video anomaly detection model training method, video anomaly detection method and device
CN113688925B (en) * 2021-08-31 2023-10-24 惠州学院 Attendance number identification method, electronic equipment and storage medium
CN113762134B (en) * 2021-09-01 2024-03-29 沈阳工业大学 Method for detecting surrounding obstacles in automobile parking based on vision
CN114040197B (en) * 2021-11-29 2023-07-28 北京字节跳动网络技术有限公司 Video detection method, device, equipment and storage medium
CN114782284B (en) * 2022-06-17 2022-09-23 广州三七极耀网络科技有限公司 Motion data correction method, device, equipment and storage medium
CN117079079B (en) * 2023-09-27 2024-03-15 中电科新型智慧城市研究院有限公司 Training method of video anomaly detection model, video anomaly detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109214253A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 A kind of video frame detection method and device
CN110414313A (en) * 2019-06-06 2019-11-05 平安科技(深圳)有限公司 Abnormal behaviour alarm method, device, server and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036250B (en) * 2014-06-16 2017-11-10 上海大学 Video pedestrian detection and tracking
US10740619B2 (en) * 2017-11-21 2020-08-11 Uber Technologies, Inc. Characterizing content with a predictive error representation
CN110298323B (en) * 2019-07-02 2021-10-15 中国科学院自动化研究所 Frame-fighting detection method, system and device based on video analysis
CN110889351B (en) * 2019-11-18 2023-09-26 中国科学院深圳先进技术研究院 Video detection method, device, terminal equipment and readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109214253A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 A kind of video frame detection method and device
CN110414313A (en) * 2019-06-06 2019-11-05 平安科技(深圳)有限公司 Abnormal behaviour alarm method, device, server and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Jin; Li Bo. Research status and prospects of moving object detection techniques in video sequences. Application Research of Computers, 2008, (12): 3534-3539. *

Also Published As

Publication number Publication date
CN110889351A (en) 2020-03-17
WO2021098657A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110889351B (en) Video detection method, device, terminal equipment and readable storage medium
US10977917B2 (en) Surveillance camera system and surveillance method
US10275670B1 (en) Image analysis technologies for identifying abnormal vehicle conditions
JP6888950B2 (en) Image processing device, external world recognition device
EP3026880B1 (en) Damage recognition assist system
CN112349144B (en) Monocular vision-based vehicle collision early warning method and system
US20150116493A1 (en) Method and system for estimating gaze direction of vehicle drivers
US9076340B2 (en) Vehicle detecting system and method
CN108162858B (en) Vehicle-mounted monitoring device and method thereof
WO2019223655A1 (en) Detection of non-motor vehicle carrying passenger
CN110738150B (en) Camera linkage snapshot method and device and computer storage medium
US11479260B1 (en) Systems and methods for proximate event capture
CN111010530B (en) Emergency vehicle detection
US10710537B2 (en) Method and system for detecting an incident , accident and/or scam of a vehicle
KR20220025997A (en) Apparatus for providing vehicle AVM image for collision prevention of approaching objects
US20160232415A1 (en) Detection detection of cell phone or mobile device use in motor vehicle
CN112818839A (en) Method, device, equipment and medium for identifying violation behaviors of driver
KR20160069685A (en) recognizing system of vehicle number for parking crossing gate
CN110706115A (en) Traffic accident fast claims settlement method, system and server
CN111241918B (en) Vehicle tracking prevention method and system based on face recognition
SE536729C2 (en) Procedure and system for monitoring a motor vehicle from the point of view of intrusion
CN112308723A (en) Vehicle detection method and system
CN115862167A (en) Gateway control method and device, computer equipment and storage medium
JP2021128796A (en) Object recognition system and object recognition method
KR20130031713A (en) Device for guiding safty distance for vehicle and method therefor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Qiao Yu

Inventor after: Peng Xiaojiang

Inventor after: Ye Muchao

Inventor before: Qiao Yu

Inventor before: Peng Xiaojiang

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant