CN112364692A - Image processing method and device based on monitoring video data and storage medium - Google Patents

Image processing method and device based on monitoring video data and storage medium

Info

Publication number
CN112364692A
Authority
CN
China
Prior art keywords
image
video data
training
target object
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011085900.2A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Terminus Technology Group Co Ltd
Original Assignee
Terminus Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Terminus Technology Group Co Ltd filed Critical Terminus Technology Group Co Ltd
Priority to CN202011085900.2A priority Critical patent/CN112364692A/en
Publication of CN112364692A publication Critical patent/CN112364692A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method and device based on monitoring video data, and a storage medium. The method comprises the following steps: inputting an image to be recognized into an image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises the recognized objects of each category and the corresponding candidate frames; and determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that it is not. With the embodiments of the application, the objects of each category in the image to be recognized can be recognized automatically, and whether a target object is present in the monitored video data can be judged automatically from the recognition result, achieving the following effect: videos containing a target object can be screened out quickly and accurately from the massive video data of a surveillance video library.

Description

Image processing method and device based on monitoring video data and storage medium
Technical Field
The invention relates to the technical field of video data processing, in particular to an image processing method, an image processing device and a storage medium based on monitoring video data.
Background
With the popularization of camera technology, more and more cameras are installed in every corner of users' lives, in shopping malls, residential areas and the like, producing massive amounts of surveillance data. However, existing surveillance data are often kept in isolation, and, in order to protect user privacy, it is difficult to retrieve surveillance data across different areas.
Existing surveillance data are too voluminous and too disorderly classified for users to screen out the data they need quickly and accurately from a massive surveillance database. In addition, existing approaches to processing massive surveillance video data are too cumbersome, and the processing results obtained, whether by manual screening or by random screening, are not accurate enough.
Disclosure of Invention
The embodiment of the application provides an image processing method and device based on monitoring video data and a storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides an image processing method based on surveillance video data, where the method includes:
selecting an image to be identified from a monitoring video database;
training according to each monitoring data in the monitoring database to obtain an image recognition model;
inputting the image to be recognized into an image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises recognized objects of various categories and corresponding candidate frames;
and determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that the target object is not present.
Optionally, the training according to each monitoring data in the monitoring database to obtain an image recognition model includes:
selecting a plurality of data sets from the surveillance video data, the data sets including a first data set, a second data set, and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
Optionally, the inputting the image to be recognized into an image recognition model for recognition, and outputting a recognition result includes:
extracting candidate regions of the image to be identified to obtain a plurality of candidate regions;
respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions;
and classifying each candidate region through a support vector machine to obtain the identification result.
Optionally, the classifying each candidate region by the support vector machine includes:
obtaining a support vector machine classifier corresponding to each category;
and classifying the objects of each candidate region of the image to be recognized through the support vector machine classifier corresponding to each category, and determining the classification category of the maximum probability corresponding to each candidate region.
Optionally, after the outputting the recognition result, the method further includes:
selecting any one object from the objects in each category;
obtaining coordinates of a bounding box used for determining the object, wherein the coordinates comprise an abscissa and an ordinate;
determining the position of the bounding box and the size of the bounding box according to each coordinate;
correcting the position of the boundary frame according to a linear regression correction model to obtain the corrected position of the boundary frame, and/or,
and correcting the size of the boundary frame according to the linear regression correction model to obtain the corrected size of the boundary frame.
Optionally, before the correcting the position of the bounding box according to the linear regression correction model, the method further includes:
and training corresponding bounding box regressors for the objects of all classes.
Optionally, the image recognition model is an image recognition model established based on a regional convolutional neural network.
In a second aspect, an embodiment of the present application provides an image processing apparatus based on surveillance video data, the apparatus including:
the selection unit is used for selecting an image to be identified from the monitoring video database;
the training unit is used for training according to each monitoring data in the monitoring database to obtain an image recognition model;
the identification unit is used for inputting the image to be identified selected by the selection unit into an image identification model for identification and outputting an identification result, wherein the identification result comprises the identified objects of each category and corresponding candidate frames;
and the processing unit is used for determining whether a target object is present in the monitored video data according to the identification result from the identification unit: if the identification result shows that the objects of the categories within the candidate frames include the target object, it determines that the target object is present in the monitored video data; otherwise, it determines that the target object is not present.
Optionally, the training unit is specifically configured to:
selecting a plurality of data sets from the surveillance video data, the data sets including a first data set, a second data set, and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, an image to be recognized is input into an image recognition model for recognition, and a recognition result is output, the recognition result comprising the recognized objects of each category and the corresponding candidate frames; whether a target object is present in the monitored video data is then determined according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, the target object is present in the monitored video data; otherwise, it is not. By adopting the embodiment of the application, the objects of each category in the image to be recognized can be recognized automatically, and whether a target object is present in the monitored video data can be judged automatically from the recognition result, achieving the following effect: videos containing a target object can be screened out quickly and accurately from the massive video data of a surveillance video library.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of an image processing method based on surveillance video data according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an image processing apparatus based on surveillance video data according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes in detail an image processing method based on surveillance video data according to an embodiment of the present application with reference to fig. 1. The image processing method may be implemented by relying on a computer program, and may be run on an image processing apparatus based on surveillance video data.
Referring to fig. 1, a schematic flow chart of an image processing method based on surveillance video data is provided in an embodiment of the present application. As shown in fig. 1, the image processing method based on surveillance video data according to the embodiment of the present application may include the following steps:
and S102, selecting an image to be identified from the monitoring video database.
In the embodiment of the application, a screenshot of a specific piece of video data in the surveillance video database may be selected as the image to be recognized according to a user's selection instruction; alternatively, a screenshot of some piece of video data may be selected at random as the image to be recognized. The image processing method provided in the embodiment of the present application does not specifically limit the image to be recognized.
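As a concrete illustration, such a frame can be grabbed from a stored video file with OpenCV. The sketch below is a minimal example under that assumption; the file path and frame index are illustrative, not prescribed by this application:

```python
import cv2

def grab_frame(video_path, frame_index=0):
    """Read one frame of a surveillance video to use as the image to be recognized."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)  # jump to the requested frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"could not read frame {frame_index} from {video_path}")
    return frame  # BGR image as a NumPy array

# e.g. a user-selected video and frame:
image = grab_frame("surveillance/cam01.mp4", frame_index=120)
```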
And S104, training according to each monitoring data in the monitoring database to obtain an image recognition model.
In the embodiment of the application, the training according to each monitoring data in the monitoring database to obtain the image recognition model comprises the following steps:
selecting a plurality of data sets from the monitoring video data, wherein the data sets comprise a first data set, a second data set and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
Here, this is merely an example; more data sets may be selected according to the training-accuracy requirement of the image recognition model. For example, five data sets may be configured, of which four are used as training samples and one as a detection sample; the initial image recognition model obtained from training is then corrected to finally obtain the corrected image recognition model.
In practical applications, in order to ensure a better detection result, several detection samples are often used: for example, each of the five data sets is taken in turn as the detection sample, yielding five corresponding detection results; the five detection results are then compared, and the image recognition model finally determined is the one that brings the predicted values closest to the true values.
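This rotation of the detection sample over the data sets is, in effect, k-fold cross-validation. A minimal sketch under that reading, where train_model and evaluate are hypothetical stand-ins for the training and correction steps described above:

```python
from sklearn.model_selection import KFold

def select_model(data, labels, train_model, evaluate, k=5):
    """Train on k-1 folds, detect on the held-out fold, keep the best model."""
    best_model, best_score = None, float("inf")
    for train_idx, detect_idx in KFold(n_splits=k, shuffle=True).split(data):
        model = train_model(data[train_idx], labels[train_idx])
        # Detection result: how far the predicted values are from the true values.
        score = evaluate(model, data[detect_idx], labels[detect_idx])
        if score < best_score:
            best_model, best_score = model, score
    return best_model
```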
In the embodiment of the application, the image recognition model is an image recognition model established based on a regional convolution neural network.
The following description is made for a training process of an image recognition model in the embodiment of the present application:
the specific training process adopts a gradient descent method.
For any parameter w of the regional convolutional neural network, the gradient-descent update rule used for training is:

w ← w − η · ∂LOSS/∂w

where η is the learning rate, a small positive number specified before training that plays the role of the step size of each gradient-descent step; ∂LOSS/∂w denotes the partial derivative of the loss with respect to w; and the vector of all such partial derivatives, ∇LOSS, is referred to as the gradient.

The specific loss used by the regional convolutional neural network is the squared error:

LOSS = (OUT − expected output)²

For a neuron weight w, the update formula is:

w ← w − η · ∂LOSS/∂w

and for a bias b, the update formula is:

b ← b − η · ∂LOSS/∂b

According to the definition of the derivative, for any differentiable function f:

f′(x) = lim(Δx→0) [f(x + Δx) − f(x)] / Δx

thus, differentiating the squared-error loss with respect to the output gives:

∂LOSS/∂OUT = 2 · (OUT − expected output)

and the partial derivatives with respect to each weight and bias are then obtained from this term by the chain rule, propagating it backward through the network layer by layer.
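To make the update rule concrete, the following sketch applies it to a single scalar parameter, using the definition-of-the-derivative approximation above; the toy one-parameter network is an illustrative assumption, not the patent's model:

```python
def loss(out, expected):
    # LOSS = (OUT - expected output)^2
    return (out - expected) ** 2

def numeric_grad(f, w, dw=1e-6):
    # From the definition of the derivative: f'(w) ~ (f(w + dw) - f(w)) / dw
    return (f(w + dw) - f(w)) / dw

# Toy one-parameter "network": OUT = w * x for a fixed input x.
x, expected = 2.0, 10.0
w = 0.5       # initial weight
eta = 0.01    # learning rate, fixed before training

for step in range(200):
    grad = numeric_grad(lambda p: loss(p * x, expected), w)
    w = w - eta * grad      # w <- w - eta * dLOSS/dw

print(w)  # converges toward 5.0, since 5.0 * 2.0 == 10.0
```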
the core idea of the area convolution neural network series is to convert the traditional image processing technology into processing by using a neural network and reuse the processing as much as possible to reduce the calculation amount.
The nonlinear activation function used in the training of the image recognition model in the embodiment of the present application may be any one of ReLU, sigmoid, and tanh.
The ReLU activation function is explained as follows:
ReLU is defined as:

f(x) = max(0, x), i.e. f(x) = x for x > 0 and f(x) = 0 otherwise.

The derivative of ReLU is:

f′(x) = 1 for x > 0, and f′(x) = 0 otherwise.
the nonlinear activation function ReLU has the following advantages:
ReLU is simple and fast to operate;
ReLU can avoid gradient disappearance;
the ReLU has better network performance.
sigmoid is defined as:

f(x) = 1 / (1 + e^(−x))

The derivative of sigmoid is: f′(x) = f(x)(1 − f(x)).

tanh is defined as:

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))

The derivative of tanh is: f′(x) = 1 − f(x)².
The following steps illustrate how an image recognition model is obtained by training on all the data of a specific application scenario, assuming 20,000 data samples and a batch size of 100 (a code sketch of this loop follows the steps):
Step a1: read all the data (20,000 samples);
Step a2: divide the data into a training set and a test set, for example the first 18,000 samples as the training set and the last 2,000 as the test set;
Step a3: start the first epoch (one forward and one backward propagation of all training samples through the neural network is called an epoch); shuffle the 18,000 training samples, keeping inputs and outputs in correct one-to-one correspondence;
Step a4: feed the 1st batch, i.e. samples 1-100 of the shuffled data, into the network;
Step a5: forward propagation; compare the output with the expected value and compute the loss;
Step a6: backward propagation; update the network parameters;
Step a7: feed the 2nd batch, i.e. samples 101-200 of the shuffled data, into the network, and repeat the above process;
Step a8: the 1st epoch is complete once the whole training set has been processed, i.e. after 18000/100 = 180 batches;
Step a9: feed the test set into the network; forward propagate; compare the test results with the expected values and compute the loss;
Step a10: compare the training-set loss with the test-set loss and adjust accordingly, for example by adjusting the learning rate, or by stopping training and modifying the hyper-parameters;
Step a11: begin the 2nd epoch: shuffle the 18,000 training samples again and repeat the process.
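A minimal sketch of the epoch/batch loop above, assuming NumPy arrays and a hypothetical model object whose forward, loss, backward and update methods stand in for steps a5-a6:

```python
import numpy as np

def train(model, x, y, epochs=10, batch_size=100, train_count=18000):
    # Step a2: first 18000 samples form the training set, the rest the test set.
    x_train, y_train = x[:train_count], y[:train_count]
    x_test, y_test = x[train_count:], y[train_count:]

    for epoch in range(epochs):
        # Steps a3/a11: shuffle, keeping inputs and labels aligned.
        order = np.random.permutation(len(x_train))
        x_train, y_train = x_train[order], y_train[order]

        for start in range(0, len(x_train), batch_size):
            xb = x_train[start:start + batch_size]   # steps a4/a7: one batch
            yb = y_train[start:start + batch_size]
            out = model.forward(xb)                  # step a5: forward pass
            train_loss = model.loss(out, yb)
            model.backward(out, yb)                  # step a6: backward pass,
            model.update()                           # then update parameters

        test_loss = model.loss(model.forward(x_test), y_test)  # step a9
        # Step a10: compare losses; adjust the learning rate or stop here.
        print(epoch, train_loss, test_loss)
```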
And S106, inputting the image to be recognized into the image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises recognized objects of various categories and corresponding candidate frames.
In the embodiment of the application, the image to be recognized is input into the image recognition model for recognition, and the output of the recognition result comprises the following steps:
step a: extracting candidate regions of an image to be identified to obtain a plurality of candidate regions;
specifically, the candidate region extraction usually adopts a classical target detection algorithm, and uses a sliding window to sequentially judge each possible region. In practical application, the optimization can be performed by using an object detection algorithm, a series of candidate regions of possible objects are extracted in advance by adopting selective search, and then, features are extracted only on the candidate regions, so that the calculation amount is greatly reduced.
Step b: respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions;
specifically, all candidate regions are cropped and scaled to a fixed size, and then, for each candidate region, multi-dimensional image features are extracted using 5 convolutional layers and 2 fully-connected layers.
Step c: and classifying each candidate region through a support vector machine to obtain an identification result.
Specifically, the classification of each candidate region by the support vector machine includes the following steps:
obtaining a support vector machine classifier corresponding to each category;
and classifying the objects of each candidate region of the image to be recognized through the support vector machine classifier corresponding to each category, and determining the classification category of the maximum probability corresponding to each candidate region.
Each category has its own support vector machine classifier; the objects (including background) in the 1,000 candidate regions are classified by the 21 classifiers, and the most probable category of each candidate region is determined, such as person, flower or pet.
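A scikit-learn sketch of this per-category classification, training one binary SVM per class and keeping the highest-scoring class for each candidate region; the 21 classes follow the text, while the feature arrays and helper names are assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_classifiers(features, labels, num_classes=21):
    """One binary SVM per category (class 0 conventionally the background)."""
    return [LinearSVC().fit(features, (labels == c).astype(int))
            for c in range(num_classes)]

def classify_regions(classifiers, region_features):
    """Score every region with every per-class SVM; keep the best class."""
    scores = np.stack([clf.decision_function(region_features)
                       for clf in classifiers], axis=1)
    return scores.argmax(axis=1), scores.max(axis=1)
```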
In practical application, in order to avoid the same object being covered by several different frames, after the classifiers have assigned each candidate region its possible category, non-maximum suppression is applied to the classification results to remove redundant frames. For example, where the same object has several different frames, the redundant ones are removed and only one frame is retained.
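A standard NumPy implementation of non-maximum suppression, assuming boxes in (x1, y1, x2, y2) format:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop boxes that overlap it too much."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]           # best box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop redundant boxes
    return keep
```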
In one possible implementation, after outputting the recognition result, the method further includes:
selecting any one object from the objects in each category;
acquiring coordinates of a bounding box for determining an object, wherein the coordinates comprise an abscissa and an ordinate;
determining the position of the bounding box and the size of the bounding box according to each coordinate;
correcting the position of the boundary frame according to the linear regression correction model to obtain the corrected position of the boundary frame, and/or,
and correcting the size of the boundary frame according to the linear regression correction model to obtain the corrected size of the boundary frame. By performing bounding-box regression with the linear regression correction model in this way, each candidate frame is fine-tuned and calibrated so that the object in each candidate region is framed accurately, which ultimately improves the accuracy of object recognition.
In one possible implementation, before the position of the bounding box is corrected according to the linear regression correction model, the method further includes the following steps:
and training corresponding bounding box regressors for the objects of all classes.
For example, there is a support vector machine classifier for each category, and the objects (including background) in the 1,000 candidate regions are classified by the 21 classifiers to determine the most probable category of each candidate region, such as person, flower or pet. Once the 21 categories are determined, a corresponding bounding-box regressor is trained for each of the 21 classes of objects. The method for training a bounding-box regressor is conventional and is not described again here.
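A sketch of per-class bounding-box regression using ordinary linear regression, in the spirit of the linear regression correction model above; the feature layout and the (dx, dy, dw, dh) offset parameterization are assumptions borrowed from the standard R-CNN formulation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_bbox_regressors(features_by_class, offsets_by_class):
    """One linear regressor per class, mapping region features to
    (dx, dy, dw, dh) corrections of the candidate box."""
    return {c: LinearRegression().fit(f, offsets_by_class[c])
            for c, f in features_by_class.items()}

def refine_box(regressor, feature, box):
    """Apply the predicted correction to an (x, y, w, h) candidate box."""
    dx, dy, dw, dh = regressor.predict(feature[None, :])[0]
    x, y, w, h = box
    return (x + w * dx, y + h * dy, w * np.exp(dw), h * np.exp(dh))
```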
And S108, determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that no target object is present.
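The decision in S108 then reduces to a membership test over the recognized categories; a minimal sketch with illustrative label names:

```python
def target_present(recognition_result, target_object):
    """recognition_result: list of (category, candidate_box) pairs."""
    return any(category == target_object for category, _ in recognition_result)

# e.g. keep only the videos whose sampled frame contains a person:
# matches = [v for v in videos if target_present(recognize(v), "person")]
```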
In the embodiment of the application, an image to be recognized is input into an image recognition model for recognition, and a recognition result is output, the recognition result comprising the recognized objects of each category and the corresponding candidate frames; whether a target object is present in the monitored video data is then determined according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, the target object is present in the monitored video data; otherwise, it is not. By adopting the embodiment of the application, the objects of each category in the image to be recognized can be recognized automatically, and whether a target object is present in the monitored video data can be judged automatically from the recognition result, achieving the following effect: videos containing a target object can be screened out quickly and accurately from the massive video data of a surveillance video library.
The following is an embodiment of an image processing apparatus of the present invention that can be used to perform an embodiment of an image processing method of the present invention. For details that are not disclosed in the embodiments of the image processing apparatus of the present invention, refer to the embodiments of the image processing method of the present invention.
Referring to fig. 2, a schematic structural diagram of an image processing apparatus based on surveillance video data according to an exemplary embodiment of the present invention is shown. The image processing apparatus may be implemented as all or a part of the terminal by software, hardware, or a combination of both. The image processing apparatus includes a selecting unit 10, a training unit 20, a recognizing unit 30, and a processing unit 40.
Specifically, the selecting unit 10 is configured to select an image to be identified from a surveillance video database;
the training unit 20 is used for training according to each monitoring data in the monitoring database to obtain an image recognition model;
the recognition unit 30 is used for inputting the image to be recognized selected by the selection unit 10 into the image recognition model for recognition and outputting a recognition result, wherein the recognition result comprises recognized objects of various categories and corresponding candidate frames;
and the processing unit 40 is configured to determine whether a target object is present in the monitored video data according to the identification result from the identification unit 30: if the identification result shows that the objects of the categories within the candidate frames include the target object, it determines that the target object is present in the monitored video data; otherwise, it determines that the target object is not present.
Optionally, the training unit 20 is specifically configured to:
selecting a plurality of data sets from the monitoring video data, wherein the data sets comprise a first data set, a second data set and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
Optionally, the identification unit 30 is configured to:
extracting candidate regions of an image to be identified to obtain a plurality of candidate regions;
respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions;
and classifying each candidate region through a support vector machine to obtain an identification result.
Optionally, the identification unit 30 is specifically configured to:
obtaining a support vector machine classifier corresponding to each category;
and classifying the objects of each candidate region of the image to be recognized through the support vector machine classifier corresponding to each category, and determining the classification category of the maximum probability corresponding to each candidate region.
Optionally, after the recognition unit 30 outputs the recognition result, the processing unit 40 is further specifically configured to:
selecting any one object from the objects in each category;
acquiring coordinates of a bounding box for determining an object, wherein the coordinates comprise an abscissa and an ordinate;
determining the position of the bounding box and the size of the bounding box according to each coordinate;
correcting the position of the boundary frame according to the linear regression correction model to obtain the corrected position of the boundary frame, and/or,
and correcting the size of the boundary frame according to the linear regression correction model to obtain the corrected size of the boundary frame.
Optionally, before the processing unit 40 corrects the position of the bounding box according to the linear regression correction model, the training unit 20 is further configured to:
and training corresponding bounding box regressors for the objects of all classes.
Optionally, the image recognition model is an image recognition model established based on a regional convolutional neural network.
It should be noted that, when the image processing apparatus provided in the foregoing embodiment executes the image processing method, only the division of the functional modules is illustrated, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and details of implementation processes thereof are referred to in the method embodiments and are not described herein again.
In the embodiment of the application, the identification unit inputs the image to be identified into the image identification model for identification and outputs an identification result, wherein the identification result comprises the identified objects of each category and the corresponding candidate frames; the processing unit then determines whether a target object is present in the monitored video data according to the identification result of the identification unit: if the identification result shows that the objects of the categories within the candidate frames include the target object, the target object is present in the monitored video data; otherwise, it is not. By adopting the embodiment of the application, the objects of each category in the image to be recognized can be recognized automatically, and whether a target object is present in the monitored video data can be judged automatically from the recognition result, achieving the following effect: videos containing a target object can be screened out quickly and accurately from the massive video data of a surveillance video library.
In one embodiment, a computer device is proposed, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: selecting an image to be recognized from a surveillance video database; training according to the monitoring data in the monitoring database to obtain an image recognition model; inputting the image to be recognized into the image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises the recognized objects of each category and the corresponding candidate frames; and determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that the target object is not present.
In one embodiment, the step of training, performed by the processor, according to each monitoring data in the monitoring database to obtain the image recognition model includes: selecting a plurality of data sets from the monitoring video data, wherein the data sets comprise a first data set, a second data set and a third data set; taking the first data set and the second data set as training samples, and taking the third data set as detection samples; training the training sample to obtain an initial image recognition model; and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
In one embodiment, the step of inputting the image to be recognized into the image recognition model for recognition and outputting the recognition result, which is executed by the processor, includes: extracting candidate regions of an image to be identified to obtain a plurality of candidate regions; respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions; and classifying each candidate region through a support vector machine to obtain an identification result.
In one embodiment, the step of classifying each candidate region by a support vector machine performed by the processor comprises: obtaining a support vector machine classifier corresponding to each category; and classifying the objects of each candidate region of the image to be recognized through the support vector machine classifier corresponding to each category, and determining the classification category of the maximum probability corresponding to each candidate region.
In one embodiment, a storage medium is provided that stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of: selecting an image to be recognized from a surveillance video database; training according to the monitoring data in the monitoring database to obtain an image recognition model; inputting the image to be recognized into the image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises the recognized objects of each category and the corresponding candidate frames; and determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that the target object is not present.
In one embodiment, the step of training, performed by the processor, according to each monitoring data in the monitoring database to obtain the image recognition model includes: selecting a plurality of data sets from the monitoring video data, wherein the data sets comprise a first data set, a second data set and a third data set; taking the first data set and the second data set as training samples, and taking the third data set as detection samples; training the training sample to obtain an initial image recognition model; and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
In one embodiment, the step of inputting the image to be recognized into the image recognition model for recognition and outputting the recognition result, which is executed by the processor, includes: extracting candidate regions of an image to be identified to obtain a plurality of candidate regions; respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions; and classifying each candidate region through a support vector machine to obtain an identification result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An image processing method based on monitoring video data, characterized in that the method comprises:
selecting an image to be identified from a monitoring video database;
training according to each monitoring data in the monitoring database to obtain an image recognition model;
inputting the image to be recognized into an image recognition model for recognition, and outputting a recognition result, wherein the recognition result comprises recognized objects of various categories and corresponding candidate frames;
and determining whether a target object is present in the monitored video data according to the recognition result: if the recognition result shows that the objects of the categories within the candidate frames include the target object, it is determined that the target object is present in the monitored video data; otherwise, it is determined that the target object is not present.
2. The method of claim 1, wherein the training according to each monitoring data in the monitoring database to obtain an image recognition model comprises:
selecting a plurality of data sets from the surveillance video data, the data sets including a first data set, a second data set, and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
3. The method according to claim 1, wherein the inputting the image to be recognized into an image recognition model for recognition and the outputting the recognition result comprises:
extracting candidate regions of the image to be identified to obtain a plurality of candidate regions;
respectively extracting the features of each candidate region, and extracting the image features of corresponding dimensions;
and classifying each candidate region through a support vector machine to obtain the identification result.
4. The method of claim 3, wherein the classifying the respective candidate regions by a support vector machine comprises:
obtaining a support vector machine classifier corresponding to each category;
and classifying the objects of each candidate region of the image to be recognized through the support vector machine classifier corresponding to each category, and determining the classification category of the maximum probability corresponding to each candidate region.
5. The method of claim 1, after said outputting a recognition result, further comprising:
selecting any one object from the objects in each category;
obtaining coordinates of a bounding box used for determining the object, wherein the coordinates comprise an abscissa and an ordinate;
determining the position of the bounding box and the size of the bounding box according to each coordinate;
correcting the position of the boundary frame according to a linear regression correction model to obtain the corrected position of the boundary frame, and/or,
and correcting the size of the boundary frame according to the linear regression correction model to obtain the corrected size of the boundary frame.
6. The method of claim 5, further comprising, prior to said modifying the position of the bounding box according to a linear regression modification model:
and training corresponding bounding box regressors for the objects of all classes.
7. The method according to any one of claims 1 to 6,
the image recognition model is established based on a regional convolution neural network.
8. An image processing apparatus based on surveillance video data, the apparatus comprising:
the selection unit is used for selecting an image to be identified from the monitoring video database;
the training unit is used for training according to each monitoring data in the monitoring database to obtain an image recognition model;
the identification unit is used for inputting the image to be identified selected by the selection unit into an image identification model for identification and outputting an identification result, wherein the identification result comprises the identified objects of each category and corresponding candidate frames;
and the processing unit is used for determining whether a target object is present in the monitored video data according to the identification result from the identification unit: if the identification result shows that the objects of the categories within the candidate frames include the target object, it determines that the target object is present in the monitored video data; otherwise, it determines that the target object is not present.
9. The apparatus of claim 8,
the training unit is specifically configured to:
selecting a plurality of data sets from the surveillance video data, the data sets including a first data set, a second data set, and a third data set;
taking the first data set and the second data set as training samples, and taking the third data set as detection samples;
training the training sample to obtain an initial image recognition model;
and correcting the initial image recognition model according to the detection sample to obtain the image recognition model.
10. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the method steps according to any one of claims 1 to 7.
CN202011085900.2A 2020-10-12 2020-10-12 Image processing method and device based on monitoring video data and storage medium Pending CN112364692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011085900.2A CN112364692A (en) 2020-10-12 2020-10-12 Image processing method and device based on monitoring video data and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011085900.2A CN112364692A (en) 2020-10-12 2020-10-12 Image processing method and device based on monitoring video data and storage medium

Publications (1)

Publication Number Publication Date
CN112364692A true CN112364692A (en) 2021-02-12

Family

ID=74507722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011085900.2A Pending CN112364692A (en) 2020-10-12 2020-10-12 Image processing method and device based on monitoring video data and storage medium

Country Status (1)

Country Link
CN (1) CN112364692A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106384345A (en) * 2016-08-31 2017-02-08 上海交通大学 RCNN based image detecting and flow calculating method
CN107247956A (en) * 2016-10-09 2017-10-13 成都快眼科技有限公司 A kind of fast target detection method judged based on grid
WO2019223582A1 (en) * 2018-05-24 2019-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Target detection method and system
CN109460753A (en) * 2018-05-25 2019-03-12 江南大学 A method of detection over-water floats
CN111488804A (en) * 2020-03-19 2020-08-04 山西大学 Labor insurance product wearing condition detection and identity identification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梦里寻梦: "通俗易懂理解—R-CNN", pages 1 - 13, Retrieved from the Internet <URL:《https://zhuanlan.zhihu.com/p/67928873》> *

Similar Documents

Publication Publication Date Title
CN109741292B (en) Method for detecting abnormal image in first image data set by using countermeasure self-encoder
CN105808610B (en) Internet picture filtering method and device
CN109271958B (en) Face age identification method and device
US20170364742A1 (en) Lip-reading recognition method and apparatus based on projection extreme learning machine
CN109376696B (en) Video motion classification method and device, computer equipment and storage medium
CN111814902A (en) Target detection model training method, target identification method, device and medium
KR20170091716A (en) Automatic defect classification without sampling and feature selection
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
US10929978B2 (en) Image processing apparatus, training apparatus, image processing method, training method, and storage medium
JP7058941B2 (en) Dictionary generator, dictionary generation method, and program
CN110096938B (en) Method and device for processing action behaviors in video
KR20160096460A (en) Recognition system based on deep learning including a plurality of classfier and control method thereof
US9842279B2 (en) Data processing method for learning discriminator, and data processing apparatus therefor
CN111008643B (en) Picture classification method and device based on semi-supervised learning and computer equipment
JP2017228224A (en) Information processing device, information processing method, and program
JP6924031B2 (en) Object detectors and their programs
CN109934129B (en) Face feature point positioning method, device, computer equipment and storage medium
CN111242176A (en) Computer vision task processing method and device and electronic system
CN116266387A (en) YOLOV4 image recognition algorithm and system based on re-parameterized residual error structure and coordinate attention mechanism
CN111310516A (en) Behavior identification method and device
CN109101858B (en) Action recognition method and device
CN112364692A (en) Image processing method and device based on monitoring video data and storage medium
CN115761842A (en) Automatic updating method and device for human face base
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium
CN114119970A (en) Target tracking method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination