CN110889351B - Video detection method, device, terminal equipment and readable storage medium - Google Patents

Video detection method, device, terminal equipment and readable storage medium

Info

Publication number
CN110889351B
Authority
CN
China
Prior art keywords
video
frame
video frame
predicted
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911128730.9A
Other languages
Chinese (zh)
Other versions
CN110889351A (en)
Inventor
乔宇 (Qiao Yu)
彭小江 (Peng Xiaojiang)
叶木超 (Ye Muchao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201911128730.9A priority Critical patent/CN110889351B/en
Publication of CN110889351A publication Critical patent/CN110889351A/en
Priority to PCT/CN2020/129171 priority patent/WO2021098657A1/en
Application granted granted Critical
Publication of CN110889351B publication Critical patent/CN110889351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Abstract

The application belongs to the technical field of computer vision and provides a video detection method, an apparatus, a terminal device and a readable storage medium. The method comprises: acquiring a video to be detected, the video comprising N video frames; sequentially acquiring predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame is obtained by inputting the error image of the ith video frame into a trained prediction network model for processing; and determining that the video to be detected is abnormal if the difference degree between the Nth video frame and the predicted frame of the Nth video frame meets a preset condition. Because each predicted frame is predicted from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video into account, the obtained predicted frame of the Nth video frame is more accurate, and when the Nth video frame and its predicted frame are used to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.

Description

Video detection method, device, terminal equipment and readable storage medium
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a video detection method, a video detection device, terminal equipment and a readable storage medium.
Background
Video anomaly detection determines whether a video contains an abnormal event, i.e., a special event that differs from normal behavior in a specific scene. Such events can endanger public safety or disrupt social order with serious consequences, so video anomaly detection, which determines whether an abnormal event exists, plays an important role in maintaining social order.
In the prior art, the N frames of a video to be detected are input into an auto-encoder as a whole; the auto-encoder performs convolution operations on the input N frames to obtain a video frame, and whether the video to be detected is abnormal is finally determined from that frame.
However, because the conventional technology does not consider the influence of temporal ordering on the detection of abnormal events, false detection is likely to occur, and the detection effect is not ideal.
Disclosure of Invention
The embodiments of the application provide a video detection method, an apparatus, a terminal device and a readable storage medium to solve the problem that false detection occurs easily when abnormal events are detected, so that the detection effect is not ideal.
In a first aspect, an embodiment of the present application provides a video detection method, including:
and acquiring a video to be detected, wherein the video to be detected comprises N video frames, and N is an integer greater than 1.
And sequentially acquiring predicted frames of the N video frames.
The calculation mode of the predicted frame of the (i+1) th video frame in the N video frames is as follows: inputting an error image of an ith video frame into a trained prediction network model for processing to obtain a predicted frame of an ith+1th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the predicted frame of the ith video frame, i is more than or equal to 1 and less than or equal to N-1, and i is an integer.
And calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame.
If the difference degree meets the preset condition, determining that the video to be detected is abnormal.
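As an informal illustration only, the loop described above can be sketched in a few lines of Python. The predict_next callable below is a hypothetical stand-in for the trained prediction network model, and using the 1st video frame as the preset error image is only one of the options this application permits.

    import numpy as np

    def detect_anomaly(frames, predict_next, threshold):
        # frames: list of N video frames as float arrays; N > 1.
        # predict_next: hypothetical stand-in for the trained prediction
        # network model; maps an error image to the next predicted frame.
        # Predicted frame of the 1st video frame: from a preset error image;
        # here the 1st frame itself serves as that image.
        predicted = predict_next(frames[0])
        for i in range(len(frames) - 1):
            # Error image of the ith frame: the frame minus its predicted frame.
            error = frames[i] - predicted
            # Predicted frame of the (i+1)th video frame.
            predicted = predict_next(error)
        # Difference degree: L2 norm of (Nth frame - predicted Nth frame).
        difference = float(np.linalg.norm((frames[-1] - predicted).ravel()))
        return difference > threshold  # True: the video to be detected is abnormal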
In a possible implementation manner of the first aspect, the execution subject of the video detection method is a terminal with image processing capability. The terminal may be a physical terminal, such as a desktop computer, a server, a notebook computer or a tablet computer, or a virtual terminal, such as a cloud server or a cloud computing service. It should be understood that the above execution subject is only an example and is not limited to the terminals listed.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the difference degree between the Nth video frame and the predicted frame of the Nth video frame may be calculated by subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm of the result.
In other embodiments, the predicted frame of the Nth video frame may first be repaired by a preset repair algorithm; the repaired predicted frame is then subtracted from the Nth video frame and the norm is taken to obtain the difference degree.
Optionally, determining that the video to be detected is abnormal when the difference degree meets the preset condition comprises: determining that the video is abnormal when the difference degree is greater than a first preset threshold.
Optionally, it may instead comprise: normalizing the reciprocal of the difference degree to obtain a normality score, and then determining that the video to be detected is abnormal when the normality score is smaller than a second preset threshold.
The prediction network model is a long short-term memory (Long Short-Term Memory, LSTM) model or a gated recurrent unit (Gated Recurrent Unit, GRU) model.
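For illustration, a time-sequence predictor of this general shape could be sketched in PyTorch as follows. The class name FramePredictor, the flattening of error images into vectors and all layer sizes are assumptions of this sketch, not the claimed architecture.

    import torch
    import torch.nn as nn

    class FramePredictor(nn.Module):
        # Illustrative GRU-based predictor: error image in, next predicted
        # frame out. The application only requires a time-sequence model
        # such as an LSTM or GRU; everything else here is assumed.
        def __init__(self, height=64, width=64, hidden=512):
            super().__init__()
            self.h, self.w = height, width
            self.cell = nn.GRUCell(height * width, hidden)
            self.decode = nn.Linear(hidden, height * width)
            self.state = None  # hidden state carries temporal context

        def forward(self, error_image):            # (B, H, W)
            x = error_image.flatten(start_dim=1)   # (B, H*W)
            self.state = self.cell(x, self.state)  # advance one time step
            return self.decode(self.state).view(-1, self.h, self.w)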
In a second aspect, an embodiment of the present application provides a video detection apparatus, including:
An acquisition module, configured to acquire a video to be detected, the video to be detected comprising N video frames, where N is an integer greater than 1. A prediction module, configured to sequentially acquire predicted frames of the N video frames, where the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting the error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer. A calculation module, configured to calculate the difference degree between the Nth video frame and the predicted frame of the Nth video frame. A determination module, configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition.
In a possible implementation manner of the second aspect, the video detection apparatus may be the execution subject described in the first aspect; its specific form is the same as described there and is not repeated here.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the calculating module is specifically configured to subtract the predicted frame of the Nth video frame from the Nth video frame and take the norm of the result to obtain the difference degree.
In other embodiments, the calculating module is specifically configured to repair the predicted frame of the Nth video frame by a preset repair algorithm, then subtract the repaired predicted frame from the Nth video frame and take the norm to obtain the difference degree.
Optionally, the determining module is specifically configured to determine that the video to be detected is abnormal when the difference degree is greater than a first preset threshold.
Optionally, the determining module is further configured to normalize the reciprocal of the difference degree to obtain a normality score. And then, when the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
The prediction network model is an LSTM model or a GRU model.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method as provided in the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which when executed by a processor performs a method as provided in the first aspect.
In a fifth aspect, an embodiment of the application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the method as provided in the first aspect.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiments of the application have the following beneficial effects. First, the predicted frames of the N video frames are acquired sequentially from the video to be detected: the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, and the predicted frame of the (i+1)th video frame is obtained by inputting that error image into a trained prediction network model for processing. Then the difference degree between the Nth video frame and the predicted frame of the Nth video frame is calculated. Finally, if the difference degree meets the preset condition, the video to be detected is determined to be abnormal. Because each predicted frame is predicted from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video into account, which makes the obtained predicted frame of the Nth video frame more accurate; therefore, when the Nth video frame and its predicted frame are compared to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a flowchart of a video detection method according to an embodiment of the present application;
fig. 3 is a flowchart of a video detection method according to another embodiment of the present application;
fig. 4 is a flowchart of a video detection method according to another embodiment of the present application;
fig. 5 is a schematic view of an application scenario provided in another embodiment of the present application;
fig. 6 is a schematic structural diagram of a video detection device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
As used in the present specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if ... then determining" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting that the preset condition is met" or "in response to detecting that the preset condition is met".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "a possible embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in various places throughout this specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The video detection method provided by the embodiments of the application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), security cameras, monitoring cameras and the like; the embodiments of the application do not limit the specific type of the terminal device.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application.
Referring to fig. 1, the scene includes at least one video capturing device 11 and at least one terminal device connected to the video capturing device 11.
In some embodiments, the video capture device 11 may be a camera of various forms, for example, a security camera, a surveillance camera, a camera integrated on a notebook, a camera integrated on a smart phone, etc.
By way of example only and not limitation, the terminal device may be at least one of the server 12, the personal computer 13, the smart phone 14 and the tablet computer 15, and it acquires the video to be detected captured by the video acquisition device 11 and detects it. For example, the video acquisition device 11 may be communicatively connected to the server 12: after capturing the video to be detected, it sends the video to the server 12 through a wired or wireless network, and the server 12 processes the video to obtain a detection result. The detection result may be stored in the server 12 or in a designated database, and the video and its detection result may later be retrieved from the server 12 or the database by other devices, such as a smart phone, a personal computer or a tablet computer, for review or further processing. Alternatively, the video acquisition device 11 may be communicatively connected to at least one of the personal computer 13, the smart phone 14 and the tablet computer 15 and send the video to be detected to the connected terminal device, which processes it to obtain the detection result and displays the video and the result on its screen.
In still other embodiments, the video capturing apparatus 11 may be integrated in a terminal device, so as to implement the solution provided by the present application.
By way of example only and not limitation, a camera of the smart phone 14 may serve as the video capture device 11: it captures the video to be detected, which is stored in the memory of the smart phone 14; the processor of the smart phone 14 then executes a corresponding executable program to process the video in the memory, obtains the detection result, and displays it on the screen of the smart phone 14.
It will be understood by those skilled in the art that fig. 1 is merely an example of an application scenario of the present application, and does not constitute a limitation of the application scenario for executing the video detection method provided in the present application, and may include more devices than those illustrated in practical application, for example, when the video capturing device 11 is communicatively connected to the server 12, the server 12 may also be communicatively connected to a database for storing the video to be detected and the detection result; or is in communication connection with a screen and is used for displaying the detection result; and the system can also be in communication connection with alarm equipment for reminding abnormal detection results and the like, and is not limited herein.
Among other things, the wireless network may include wireless local area networks (Wireless Local Area Network, WLAN) (e.g., Wi-Fi networks), Bluetooth, ZigBee, mobile communication networks, near field communication (Near Field Communication, NFC), infrared (IR) technology, and the like. The wired network may include a fiber-optic network, a telecommunications network, an intranet, etc., such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), a metropolitan area network (Metropolitan Area Network, MAN) or a public switched telephone network (Public Switched Telephone Network, PSTN). The types of wireless and wired networks are not limited here.
Fig. 2 is a flowchart of a video detection method according to an embodiment of the application. By way of example, but not limitation, the method may be applied to a terminal device in the above scenario, such as a server 12, a personal computer 13, a smart phone 14, a tablet computer 15, or a vehicle computer 16.
Referring to fig. 2, the video detection method includes:
s21, acquiring a video to be detected, wherein the video to be detected comprises N video frames.
Wherein N is an integer greater than 1.
In some embodiments, the video to be detected may be a video clip directly collected by the video collecting device 11, or may be a stored video clip that is collected and stored by the video collecting device 11 and then retrieved by a terminal device executing the method, which is not limited herein.
It should be noted that the video to be detected is usually a video clip in RGB color space and includes at least 2 video frames. Its frame count is determined by the duration of the video and its frames per second (Frames Per Second, FPS); for example, a video of 2 seconds at 30 FPS includes 60 video frames.
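The frame count is simply duration multiplied by FPS, as a quick check of the example confirms:

    duration_s, fps = 2, 30        # the 2-second, 30 FPS example above
    n_frames = duration_s * fps    # frame count = duration x FPS
    assert n_frames == 60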
When detecting a video, the duration of the video to be detected can serve as the detection step; the step length can be set according to the actual situation of the application and is not limited by this video detection method.
S22, sequentially obtaining predicted frames of the N video frames.
The predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting the error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer.
The prediction network model may be a time-sequence network model, which may include, by way of example and not limitation, an LSTM model, a GRU model, or the like.
After the error image at the ith moment (i.e., the error image of the ith video frame) is input into the prediction network model, the model generates the predicted frame at the next moment (i.e., the predicted frame of the (i+1)th video frame) from the input error image.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing; for example, the preset error image may be obtained by subtracting a blank image (i.e., all zeros) from the 1st video frame, or the 1st video frame may be used directly as the preset error image.
The prediction network model may be trained with preset samples and preset sample labels; the training procedure is a conventional means in the art and is not described here.
S23, calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame.
The difference degree characterizes the similarity between the Nth video frame and the predicted frame of the Nth video frame: the lower the similarity, the larger the difference between them. If a video frame in the video to be detected differs excessively from its predicted frame, an abnormality exists in the video.
By way of example and not limitation, the similarity between the Nth video frame and its predicted frame may be calculated by histogram matching, cosine similarity, a mean hash algorithm, and the like. Alternatively, the similarity between the two images may be represented by a distance between them, for example, but not limited to, a Euclidean distance, a Mahalanobis distance or a Manhattan distance.
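As a rough NumPy sketch of the measures named above; the function names and normalization choices are illustrative assumptions, and the histogram variant assumes pixel values scaled to [0, 1].

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between flattened frames; 1.0 = identical direction.
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def euclidean_distance(a, b):
        # L2 distance between frames; larger = more different.
        return float(np.linalg.norm((a.astype(float) - b.astype(float)).ravel()))

    def histogram_intersection(a, b, bins=64):
        # Compare normalized intensity histograms; assumes pixels in [0, 1].
        ha, _ = np.histogram(a, bins=bins, range=(0.0, 1.0))
        hb, _ = np.histogram(b, bins=bins, range=(0.0, 1.0))
        ha = ha / max(ha.sum(), 1)
        hb = hb / max(hb.sum(), 1)
        return float(np.minimum(ha, hb).sum())  # 1.0 = identical histograms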
S24, judging whether the difference degree meets preset conditions.
In some embodiments, the preset condition may be set in actual application according to factors such as the accuracy required of the video detection and the interference information present in the detected scene, so as to ensure that abnormalities are detected accurately and false detections caused by over- or under-sensitivity are avoided.
S25, if the difference degree meets the preset condition, determining that the video to be detected is abnormal.
S26, if the difference degree does not meet the preset condition, determining that no abnormality exists in the video to be detected.
It should be noted that the presence of an abnormality in the video to be detected means that a special event different from normal events exists in the video. For example, if the scene in the video is a street, special events may include a motor vehicle driving on a sidewalk, a pedestrian crossing a motor lane, or violent actions such as a robbery or a fight; if the scene is indoors, special events may include smoke, open fire, crowding, etc.
In the above embodiment, since each predicted frame is obtained by prediction from the error image of the previous frame, the generated predicted frames take the temporal ordering of the video to be detected into account, which makes the obtained predicted frame of the Nth video frame more accurate; consequently, when the Nth video frame and its predicted frame are used to confirm whether the video is abnormal, the probability of false detection is reduced and the detection accuracy is improved.
In some embodiments, the difference degree between the Nth video frame and the predicted frame of the Nth video frame may be calculated by subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm of the result.
The subtraction and norm can be implemented with the $L_2$ norm: if the predicted frame of the Nth video frame is $\hat{I}_N$ and the Nth video frame is $I_N$, the difference degree is $d = \lVert I_N - \hat{I}_N \rVert_2$.
In other embodiments, the calculation of the difference degree between the Nth video frame and its predicted frame in S23 may also be implemented by the process shown in fig. 3. Referring to fig. 3, S23 may include:
s231, repairing the predicted frame of the Nth video frame through a preset repairing algorithm.
In some embodiments, the predicted frame $\hat{I}_N$ of the Nth video frame is repaired to correct problems such as noise and distortion introduced in the prediction process that would affect detection, yielding the repaired predicted frame $R_N$ of the Nth video frame. The preset repair algorithm may be a convolutional autoencoder; repairing $\hat{I}_N$ with a convolutional autoencoder is a routine procedure for those skilled in the art and is not described in detail here.
S232, subtracting the repaired predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
Continuing the above example, the $L_2$ norm may still be used to compute the difference degree between the repaired predicted frame $R_N$ and the Nth video frame $I_N$, i.e., $d = \lVert I_N - R_N \rVert_2$.
Because the predicted frame of the Nth video frame is repaired by the preset repair algorithm before the difference degree with the Nth video frame is calculated, the inflation of the difference degree caused by noise, distortion and similar conditions is effectively suppressed, which reduces the probability of false detection and improves the accuracy of video detection.
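Since a convolutional autoencoder is named as one possible preset repair algorithm, the following PyTorch sketch shows the general shape such a repair network could take; channel counts, depth and activations are illustrative assumptions.

    import torch.nn as nn

    class RepairAutoencoder(nn.Module):
        # Minimal convolutional autoencoder for repairing a predicted frame:
        # the bottleneck smooths prediction noise and distortion.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, predicted_frame):  # (B, 3, H, W), H and W divisible by 4
            return self.decoder(self.encoder(predicted_frame))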
Optionally, determining that the video to be detected is abnormal when the difference degree meets the preset condition may consist of determining that the video is abnormal when the difference degree is greater than a first preset threshold.
It should be noted that the larger the difference degree, the greater the possibility that an abnormality exists in the video to be detected.
By way of example and not limitation, when the difference degree is greater than 70%, it may be determined that an abnormality exists in the video to be detected; that is, the first preset threshold may be set to 70%. It should be understood that in scenarios with different accuracy requirements the first preset threshold may also be set to other values such as 63%, 72% or 86%.
Optionally, the process in S24 of determining whether the difference degree meets the preset condition may also be implemented by the flow shown in fig. 4. Referring to fig. 4, S24 may include:
s241, normalizing the reciprocal of the difference degree to obtain a normal degree score.
In some embodiments, after the inverse of the difference is normalized, a normal score of the video to be measured may be obtained, where the interval range of the normal score may be [0,1] or [0,100], which is not limited herein.
The higher the normality score, the greater the probability that the video to be detected is normal. For example, if the interval of the normality score is [0,100], a score of 0 may indicate that the video is abnormal and a score of 100 that it is normal.
S242, if the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
Referring to the example in S241, in one possible implementation, when the normality score is less than 30, it may be determined that an abnormality exists in the video to be detected; that is, the second preset threshold is 30. It should be understood that in scenarios with different precision requirements the second preset threshold may also be set to other values such as 35, 27 or 12, which is not limited here.
After the reciprocal of the difference degree is normalized, whether the video to be detected is abnormal is determined from its normality score. When the detection result is displayed, a normality score better matches user habits, so the user can judge the degree of abnormality of the video more intuitively, which improves the user experience.
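A small sketch of S241-S242 under one possible normalization; the application only requires normalizing the reciprocal of the difference degree, so the min-max scaling over a batch of difference degrees used below is an assumption.

    import numpy as np

    def normality_scores(difference_degrees, low=0.0, high=100.0):
        # Map difference degrees to normality scores on [low, high].
        # Min-max scaling of the reciprocals over a batch is assumed here.
        recip = 1.0 / (np.asarray(difference_degrees, dtype=float) + 1e-12)
        scaled = (recip - recip.min()) / (recip.max() - recip.min() + 1e-12)
        return low + scaled * (high - low)

    scores = normality_scores([0.8, 2.5, 9.0])
    abnormal = scores < 30  # second preset threshold from the example above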
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 5 is a schematic view of an application scenario provided in another embodiment of the present application.
Referring to fig. 5, fig. 5 illustrates a dangerous-behavior alert scenario in which the scheme of the present application is applied to automatic driving of an automobile. The application of the video detection method provided by the present application is explained here through this automatic-driving example; the following application is illustrative only and not limiting.
In this scenario, the video capturing device 11 (not shown) may be arranged at at least one position such as the air intake grille 16, the rear bumper 17 or the side rear-view mirrors 18 of the automobile, and may be a miniature camera, an infrared camera or the like; it may also be used to capture images of the driver in the driver's seat. The video capturing device 11 may be connected to a vehicle-mounted computer (not shown) by cable; while the vehicle is running, the vehicle-mounted computer receives at least one video to be detected captured by the video capturing device 11 and processes each video to obtain a detection result.
The camera on the air intake grille 16 captures video of the area in front of the automobile, the camera on the rear bumper 17 captures video behind it, and the cameras on the two side rear-view mirrors 18 capture video of its two sides. During automatic driving, the detection frequency of the camera facing the driving direction can be increased: for example, the camera on the air intake grille 16 can be set to capture at 120 FPS with a detection period of 0.1 seconds, so that a video to be detected comprising 12 video frames must be checked for abnormality every 0.1 seconds, where the predicted frame of the 1st video frame of each detection period can be carried over from the previous detection period and the predicted frame of the 12th video frame is used within the current period. If an abnormality is detected, such as a pedestrian 192 in the driving direction or another obstacle that may cause a safety problem, the driver can be reminded by voice or light to pay attention to safety, or functions such as automatic braking and automatic avoidance can be realized in cooperation with other sensors.
Likewise, the cameras arranged on the two side rear-view mirrors 18 can be configured in the same way to detect whether an approaching vehicle 191 or another condition that may pose a safety hazard exists on either side of the automobile; if so, the driver can be reminded by voice or light to pay attention to safety, or functions such as automatic braking and automatic avoidance can be realized in cooperation with other sensors.
It should be noted that when the automobile is moving forward, the area behind it is comparatively safe, so the detection frequency of the camera facing away from the driving direction can be reduced to lighten the load on the vehicle-mounted computer. For example, the camera on the rear bumper 17 can be set to capture at 30 FPS with a detection period of 0.2 seconds, so that a video to be detected comprising only 6 video frames needs to be checked for abnormality every 0.2 seconds.
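The per-window frame counts in this example follow directly from the capture rate multiplied by the detection period:

    front_frames = round(120 * 0.1)  # air intake grille camera: 12 frames per 0.1 s period
    rear_frames = round(30 * 0.2)    # rear bumper camera: 6 frames per 0.2 s period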
In another scenario, the settings of the camera on the rear bumper 17 and those of the camera on the air intake grille 16 in the above example can be interchanged, so that a pedestrian 192 or another obstacle behind the automobile is detected more sensitively while the automobile parks automatically.
Fig. 6 is a schematic structural diagram of a video detection device according to an embodiment of the present application, corresponding to the video detection method described in the above embodiments, and only the portions related to the embodiments of the present application are shown for convenience of explanation.
Referring to fig. 6, the apparatus includes:
the obtaining module 31 is configured to obtain a video to be tested, where the video to be tested includes N video frames, where N is an integer greater than 1.
The prediction module 32 is configured to sequentially obtain predicted frames of the N video frames. The calculation mode of the predicted frame of the (i+1) th video frame in the N video frames is as follows: inputting an error image of an ith video frame into a trained prediction network model for processing to obtain a predicted frame of the (i+1) th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the predicted frame of the ith video frame, i is more than or equal to 1 and less than or equal to N-1, and i is an integer.
A calculating module 33, configured to calculate a degree of difference between the nth video frame and the predicted frame of the nth video frame.
And the determining module 34 is configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition.
It should be noted that the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing.
In some embodiments, the calculating module 33 is specifically configured to subtract the predicted frame of the Nth video frame from the Nth video frame and take the norm of the result to obtain the difference degree.
In other embodiments, the calculating module 33 is specifically configured to repair the predicted frame of the Nth video frame by a preset repair algorithm, then subtract the repaired predicted frame from the Nth video frame and take the norm to obtain the difference degree.
Optionally, the determining module 34 is specifically configured to determine that the video to be tested is abnormal when the difference degree is greater than a first preset threshold.
Optionally, the determining module 34 is further configured to normalize the inverse of the difference degree first to obtain a normality score. And then, when the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
The prediction network model is an LSTM model or a GRU model.
It should be noted that, because the content of information interaction and execution process between the modules and the embodiment of the method of the present application are based on the same concept, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Referring to fig. 7, the embodiment of the present application further provides a terminal device 4, where the terminal device 4 includes: at least one processor 41, a memory 42 and a computer program 43 stored in the memory and executable on the at least one processor 41, the processor 41 implementing the steps of any of the various method embodiments described above when the computer program 43 is executed.
It should be noted that fig. 7 does not constitute a limitation on the structure of the terminal device 4, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device 4 may also include a display screen, indicator lamps, a motor, controls (e.g., buttons), a gyroscope sensor, an acceleration sensor, and the like.
The processor 41 may be a central processing unit (Central Processing Unit, CPU), and the processor 41 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 42 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 42 may in other embodiments also be an external storage device of the terminal device 4, such as a plug-in hard disk provided on the terminal device 4, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like. Further, the memory 42 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 42 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, such as program code for the computer program. The memory 42 may also be used to temporarily store data that has been obtained or is to be obtained.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that enable the implementation of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
Each of the foregoing embodiments has its own emphasis; for parts that are not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A video detection method, comprising:
acquiring a video to be detected, wherein the video to be detected comprises N video frames, and N is an integer greater than 1;
sequentially acquiring predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting an error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer; wherein the prediction network model is a time-sequence network model; and the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing;
calculating the difference degree of the predicted frames of the Nth video frame and the Nth video frame;
if the difference degree meets a preset condition, determining that the video to be detected is abnormal comprises the following steps:
if the difference degree is larger than a first preset threshold value, determining that the video to be detected is abnormal.
2. The method of claim 1, wherein said calculating the difference degree between the Nth video frame and the predicted frame of the Nth video frame comprises:
subtracting the predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
3. The method of claim 1, wherein said calculating the difference degree between the Nth video frame and the predicted frame of the Nth video frame comprises:
repairing the predicted frame of the Nth video frame through a preset repairing algorithm;
subtracting the repaired predicted frame of the Nth video frame from the Nth video frame and taking the norm to obtain the difference degree.
4. A method according to any one of claims 1 to 3, wherein if the difference degree meets a preset condition, determining that the video to be tested is abnormal includes:
normalizing the reciprocal of the difference degree to obtain a normal degree score;
if the normality score is smaller than a second preset threshold, determining that the video to be detected is abnormal.
5. A method according to any one of claims 1-3, wherein the prediction network model is a long short-term memory (LSTM) network model or a gated recurrent unit (GRU) network model.
6. A video detection apparatus, comprising:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a video to be detected, the video to be detected comprises N video frames, wherein N is an integer greater than 1;
the prediction module is used for sequentially obtaining the predicted frames of the N video frames, wherein the predicted frame of the (i+1)th video frame among the N video frames is calculated as follows: inputting an error image of the ith video frame into a trained prediction network model for processing to obtain the predicted frame of the (i+1)th video frame, wherein the error image of the ith video frame is obtained by subtracting the predicted frame of the ith video frame from the ith video frame itself, 1 ≤ i ≤ N-1, and i is an integer; wherein the prediction network model is a time-sequence network model; and the predicted frame of the 1st video frame is obtained by inputting a preset error image into the prediction network model for processing;
the computing module is used for computing the difference degree of the predicted frames of the Nth video frame and the Nth video frame;
the determining module is configured to determine that the video to be detected is abnormal if the difference degree meets a preset condition, and includes:
if the difference degree is larger than a first preset threshold value, determining that the video to be detected is abnormal.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 5.
CN201911128730.9A 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium Active CN110889351B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911128730.9A CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium
PCT/CN2020/129171 WO2021098657A1 (en) 2019-11-18 2020-11-16 Video detection method and apparatus, terminal device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911128730.9A CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110889351A CN110889351A (en) 2020-03-17
CN110889351B true CN110889351B (en) 2023-09-26

Family

ID=69747861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911128730.9A Active CN110889351B (en) 2019-11-18 2019-11-18 Video detection method, device, terminal equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN110889351B (en)
WO (1) WO2021098657A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889351B (en) * 2019-11-18 2023-09-26 中国科学院深圳先进技术研究院 Video detection method, device, terminal equipment and readable storage medium
CN113486853B (en) * 2021-07-29 2024-02-27 北京百度网讯科技有限公司 Video detection method and device, electronic equipment and medium
CN113705370B (en) * 2021-08-09 2023-06-30 百度在线网络技术(北京)有限公司 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
CN113671917B (en) * 2021-08-19 2022-08-02 中国科学院自动化研究所 Detection method, system and equipment for abnormal state of multi-modal industrial process
CN113435432B (en) * 2021-08-27 2021-11-30 腾讯科技(深圳)有限公司 Video anomaly detection model training method, video anomaly detection method and device
CN113688925B (en) * 2021-08-31 2023-10-24 惠州学院 Attendance number identification method, electronic equipment and storage medium
CN113762134B (en) * 2021-09-01 2024-03-29 沈阳工业大学 Method for detecting surrounding obstacles in automobile parking based on vision
CN114040197B (en) * 2021-11-29 2023-07-28 北京字节跳动网络技术有限公司 Video detection method, device, equipment and storage medium
CN114782284B (en) * 2022-06-17 2022-09-23 广州三七极耀网络科技有限公司 Motion data correction method, device, equipment and storage medium
CN117079079B (en) * 2023-09-27 2024-03-15 中电科新型智慧城市研究院有限公司 Training method of video anomaly detection model, video anomaly detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109214253A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 A kind of video frame detection method and device
CN110414313A (en) * 2019-06-06 2019-11-05 平安科技(深圳)有限公司 Abnormal behaviour alarm method, device, server and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036250B (en) * 2014-06-16 2017-11-10 上海大学 Video pedestrian detection and tracking
US10740619B2 (en) * 2017-11-21 2020-08-11 Uber Technologies, Inc. Characterizing content with a predictive error representation
CN110298323B (en) * 2019-07-02 2021-10-15 中国科学院自动化研究所 Frame-fighting detection method, system and device based on video analysis
CN110889351B (en) * 2019-11-18 2023-09-26 中国科学院深圳先进技术研究院 Video detection method, device, terminal equipment and readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281858A (en) * 2014-09-15 2015-01-14 中安消技术有限公司 Three-dimensional convolutional neutral network training method and video anomalous event detection method and device
CN109214253A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 A kind of video frame detection method and device
CN110414313A (en) * 2019-06-06 2019-11-05 平安科技(深圳)有限公司 Abnormal behaviour alarm method, device, server and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Jin; Li Bo. Research status and prospects of moving object detection techniques in video sequences. Application Research of Computers, 2008, (12): 3534-3539. *

Also Published As

Publication number Publication date
CN110889351A (en) 2020-03-17
WO2021098657A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110889351B (en) Video detection method, device, terminal equipment and readable storage medium
US10977917B2 (en) Surveillance camera system and surveillance method
US10275670B1 (en) Image analysis technologies for identifying abnormal vehicle conditions
JP6888950B2 (en) Image processing device, external world recognition device
EP3026880B1 (en) Damage recognition assist system
CN112349144B (en) Monocular vision-based vehicle collision early warning method and system
US20150116493A1 (en) Method and system for estimating gaze direction of vehicle drivers
US9076340B2 (en) Vehicle detecting system and method
CN108162858B (en) Vehicle-mounted monitoring device and method thereof
WO2019223655A1 (en) Detection of non-motor vehicle carrying passenger
CN110738150B (en) Camera linkage snapshot method and device and computer storage medium
US11479260B1 (en) Systems and methods for proximate event capture
CN111010530B (en) Emergency vehicle detection
US10710537B2 (en) Method and system for detecting an incident , accident and/or scam of a vehicle
KR20220025997A (en) Apparatus for providing vehicle AVM image for collision prevention of approaching objects
US20160232415A1 (en) Detection detection of cell phone or mobile device use in motor vehicle
CN112818839A (en) Method, device, equipment and medium for identifying violation behaviors of driver
KR20160069685A (en) recognizing system of vehicle number for parking crossing gate
CN110706115A (en) Traffic accident fast claims settlement method, system and server
CN111241918B (en) Vehicle tracking prevention method and system based on face recognition
SE536729C2 (en) Procedure and system for monitoring a motor vehicle from the point of view of intrusion
CN112308723A (en) Vehicle detection method and system
CN115862167A (en) Gateway control method and device, computer equipment and storage medium
JP2021128796A (en) Object recognition system and object recognition method
KR20130031713A (en) Device for guiding safty distance for vehicle and method therefor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Qiao Yu

Inventor after: Peng Xiaojiang

Inventor after: Ye Muchao

Inventor before: Qiao Yu

Inventor before: Peng Xiaojiang

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant