WO2020151172A1 - Moving object detection method and apparatus, computer device, and storage medium

Info

Publication number: WO2020151172A1
Application number: PCT/CN2019/091905
Authority: WO
Inventors: 王健宗; 彭俊清
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-01-23
Filing date: 2019-06-19
Publication date: 2020-07-30
Also published as: CN109919008A

Abstract

A moving object detection method and apparatus, a device, and a storage medium. The method comprises: acquiring a real-time video, and firstly determining a moving object in the real-time video; extracting a bounding box of the moving object and data information corresponding to the bounding box; inputting the image in the bounding box into a pre-trained target recognition model according to the data information to carry out recognition detection so as to obtain a classification category corresponding to the moving object; and labeling the moving object in the real-time video according to the classification category.

Description

Moving target detection method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201910065021.4, filed with the Chinese Patent Office on January 23, 2019 and entitled "Moving target detection method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of image recognition technology, and in particular to a moving target detection method, device, computer equipment and storage medium.

Background technique

In traditional target detection methods, pictures or videos must be fed into the convolutional layers of a neural network for convolution, after which the detection targets are segmented and found one by one. This process finds the relevant targets by traversing the entire picture and therefore consumes considerable computing power. In some practical scenarios, such as traffic monitoring, vehicles are generally detected in real-time video, which demands very high efficiency that traditional target detection methods struggle to achieve. It is therefore necessary to provide a moving target detection method that solves the above problems.

Summary of the invention

This application provides a moving target detection method, device, computer equipment and storage medium to improve the detection speed and accuracy of moving targets.

In the first aspect, this application provides a method for detecting a moving target, the method including:

Acquiring real-time video, and determining the moving target in the real-time video;

Extracting a bounding box of the moving target and data information corresponding to the bounding box, where the data information includes position information and size information of the bounding box in the real-time video recording;

Inputting the image in the bounding box to a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

Marking the moving target in the real-time video recording according to the classification category.

In the second aspect, this application also provides a moving target detection device, the device including:

An obtaining and determining unit, configured to obtain real-time video, and determine the moving target in the real-time video;

An information extraction unit, configured to extract a bounding box of the moving target and data information corresponding to the bounding box, the data information including position information and size information of the bounding box in the real-time video;

A recognition detection unit, configured to input the image in the bounding box into a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

A target labeling unit, configured to label the moving target in the real-time video recording according to the classification category.

In the third aspect, the present application also provides a computer device, the computer device including a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and, when executing the computer program, to implement the above-mentioned moving target detection method.

In a fourth aspect, the present application also provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the processor realizes the above-mentioned moving target detection method.

This application discloses a moving target detection method, device, equipment and storage medium, which can quickly identify and classify moving targets, for example by identifying the car logos and car models corresponding to moving vehicles. The method reduces the amount of calculation needed for identification and classification, thereby improving the recognition efficiency for moving targets, and is suitable for real-time detection and recognition.

Description of the drawings

In order to describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative work.

FIG. 1 is a schematic flowchart of a method for training a target recognition model provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of an application scenario of a moving target detection method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a moving target detection method provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of sub-steps of the moving target detection method in FIG. 3;

FIG. 5 is a schematic flowchart of steps for determining a moving target provided by an embodiment of the present application;

FIG. 6 is a schematic block diagram of a model training device provided by an embodiment of the application;

FIG. 7 is a schematic block diagram of a moving target detection device provided by an embodiment of the application;

FIG. 8 is a schematic block diagram of another moving target detection device provided by an embodiment of the application;

FIG. 9 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

Detailed description

The following will clearly and completely describe the technical solutions in the embodiments of the present application in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The flowchart shown in the drawings is only an example, and does not necessarily include all contents and operations/steps, nor does it have to be executed in the described order. For example, some operations/steps can also be decomposed, combined or partially combined, so the actual execution order may be changed according to actual conditions.

The embodiments of the application provide a moving target detection method, device, computer equipment, and storage medium. Among them, the moving target detection method can be applied to a terminal or a server to quickly and accurately identify the classification information of the moving target.

For example, the moving target detection method is used to identify and classify moving vehicles on the road, and of course it can be used to identify other moving targets, such as non-motorized vehicles, animals, or pedestrians. However, for ease of understanding, the following embodiments will take a moving vehicle as a moving target for detailed introduction.

In the following, some embodiments of the present application will be described in detail with reference to the accompanying drawings. In the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a method for training a target recognition model provided by an embodiment of the present application. The target recognition model is obtained by model training based on a convolutional neural network. Of course, other networks can also be used for training.

It should be noted that in this embodiment, GoogLeNet is used for model training to obtain the target recognition model. Of course, other networks may also be used, such as AlexNet or VGGNet. The following will introduce GoogLeNet as an example.

As shown in Figure 1, the training method of the target recognition model is used to train the target recognition model for application in the moving target detection method. Wherein, the training method includes step S101 to step S105.

S101. Obtain a target picture.

Wherein, the target pictures are pictures of multiple target objects taken from different angles. In this embodiment, the target object is a vehicle, including vehicles of different models under the same car logo. Of course, it may also be a non-motorized vehicle, a pedestrian, or an animal. Selecting vehicles includes selecting cars with different logos and models and using pictures of those cars taken from different angles as the target pictures. The target pictures constitute the picture set used for training the target recognition model.

S102: Mark the target picture according to the category identifier corresponding to the classification category.

Among them, the classification category includes the car logo and the car model, and the corresponding category identifiers include a car logo identifier and a car model identifier. The car logos include: Ferrari, Lamborghini, Bentley, Aston Martin, Mercedes-Benz, BMW, Audi, Chevrolet, Volkswagen or BYD, etc.; the car models include: small cars, mini cars, compact cars, medium cars, high-end cars, luxury cars, sedans or SUVs.

Specifically, the target pictures are marked according to the vehicle logo identifier and the vehicle type identifier corresponding to the classification category, so that each target picture has marking information, that is, each target picture includes the vehicle logo and the vehicle model.

In one embodiment, in order to quickly train the target recognition model, after marking each target picture, sample data can be constructed, and step S105 is executed according to the constructed sample data to perform model training.

S103: Perform an image processing operation on the target picture to change the picture parameters of the target picture, and use the target picture whose picture parameters are changed as a new target picture.

In order to improve the accuracy of the target recognition model, after marking each target picture, it is necessary to perform an image processing operation on each target picture to change the picture parameters of the target picture.

Among them, the image processing operations include: size adjustment, cropping, rotation, image algorithm processing, etc.; the image algorithm processing includes: a color temperature adjustment algorithm, an exposure adjustment algorithm, a contrast adjustment algorithm, a highlight recovery algorithm, a low light compensation algorithm, a white balance algorithm, a sharpness adjustment algorithm, a fogging algorithm, and a natural saturation adjustment algorithm. These image processing operations increase the diversity of the sample data and bring the sample data closer to real pictures.

Correspondingly, the picture parameters include size information, pixel size, color temperature parameters, exposure, contrast, white balance, sharpness, fogging parameters, and natural saturation.

It should be noted that performing an image processing operation on the target picture to change its picture parameters, and using the changed picture as a new target picture, means applying one of the aforementioned image processing operations, or a combination of several of them, to the target picture. This increases the diversity of the samples and makes them more representative of the real environment, thereby improving the recognition accuracy of the model.

S104. Construct sample data according to the new target picture and the target picture.

Specifically, the target picture whose picture parameters are changed is saved as a new target picture, and the new target picture and the original target picture are combined to form sample data. This increases the number of samples and at the same time increases the diversity of samples.
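Steps S103 and S104 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: pictures are represented as nested lists of grayscale values, only two of the listed operations (rotation and contrast adjustment) are shown, and the helper names `rotate90`, `adjust_contrast` and `build_samples` as well as the contrast factor 1.5 are hypothetical.

```python
def rotate90(img):
    """Rotate a picture (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_contrast(img, factor):
    """Scale pixel values around mid-gray (128) and clamp them to 0..255."""
    return [[max(0, min(255, round(128 + factor * (p - 128)))) for p in row]
            for row in img]


def build_samples(labeled_pictures):
    """Combine each original picture with its augmented copies (labels unchanged)."""
    samples = []
    for img, label in labeled_pictures:
        samples.append((img, label))                        # original target picture
        samples.append((rotate90(img), label))              # new picture: rotation
        samples.append((adjust_contrast(img, 1.5), label))  # new picture: contrast
    return samples


# Hypothetical 2x2 picture marked with a (car logo, car model) pair.
pictures = [([[100, 150], [200, 50]], ("Audi", "sedan"))]
samples = build_samples(pictures)
```

Each marked picture yields three samples here, so the sample set grows in both size and variety while the marking information is reused unchanged.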

S105: Based on the convolutional neural network, perform model training according to the sample data to obtain a target recognition model, and use the obtained target recognition model as a pre-trained target recognition model.

Specifically, the constructed sample data is used for model training through GoogLeNet, and backpropagation training can be used. The convolutional layers and pooling layers of GoogLeNet extract features from the input sample data, and the fully connected layer is used as a classifier. The output of this classifier is the probability value of each car logo and car model.

All filters and parameters/weights are initialized with random values. The convolutional neural network takes the training sample data as input and goes through the forward propagation step (convolution, ReLU activation and pooling operations, followed by forward propagation in the fully connected layer) to obtain the output probability of each category.

Part of the pictures in the above sample data are taken as the calibration data (ground truth). Through large-scale iterative training on the prepared sample data, the convolutional neural network learns the semantic information of the pictures and outputs the probability of each category. A loss function is defined over the output probability and the calibration data (ground truth), and this loss function is minimized during model training to ensure the accuracy of the model and complete the training.
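The relationship between the output probabilities and the loss can be illustrated with a small sketch. The application does not name the loss function; softmax followed by cross-entropy is assumed here because it is a common choice for classification, and the scores and class names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Loss for one sample: negative log-probability of the ground-truth class."""
    return -math.log(probs[true_index])


# Hypothetical scores produced by the fully connected layer for three car logos.
classes = ["Audi", "BMW", "BYD"]
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, classes.index("Audi"))
```

The loss shrinks toward zero as the probability assigned to the ground-truth class approaches 1, which is what minimizing the loss during training drives the network toward.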

Since the moving target detection method can be applied to the terminal or server, the trained model needs to be stored in the terminal or server. Among them, the terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device; the server can be an independent server or a server cluster.

If it is applied to a terminal, in order to ensure the normal operation of the terminal and to quickly identify the category of the detected moving target, it is also necessary to compress the target recognition model obtained by training, and save the compressed model in the terminal.

Wherein, the compression processing specifically includes pruning processing, quantization processing, and Huffman encoding processing on the target recognition model, etc., to reduce the size of the target recognition model, and thereby facilitate storage in a terminal with a smaller capacity.
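As a rough illustration of two of the compression steps named above, the sketch below applies magnitude pruning and uniform quantization to a flat list of weights. The specific schemes, the threshold, the number of levels and the function names are assumptions made for the example; the application does not specify them, and Huffman encoding is omitted.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, levels):
    """Uniform quantization: replace each weight by the index of the nearest
    of `levels` evenly spaced values between the minimum and maximum weight."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate weights from the stored integer codes."""
    return [lo + c * step for c in codes]


# Hypothetical weights of one layer.
weights = [0.8, -0.05, 0.02, -0.6, 0.4]
pruned = prune(weights, threshold=0.1)
codes, lo, step = quantize(pruned, levels=4)
restored = dequantize(codes, lo, step)
```

With four levels each weight needs only a 2-bit code plus the shared `lo` and `step` values, at the cost of a reconstruction error of at most half a step per weight.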

The training method provided by the above embodiment uses target pictures of multiple target objects taken from different angles and applies image processing operations to them to increase the diversity of the sample data. Model training is then performed on the constructed sample data based on a convolutional neural network to obtain a target recognition model, which is used as the pre-trained target recognition model in the moving target detection method, thereby improving the recognition accuracy for moving targets.

Please refer to FIG. 2, which is a schematic diagram of an application scenario of the moving target detection method provided by an embodiment of the present application. This application scenario includes a server, a terminal, and traffic monitoring equipment, where the traffic monitoring equipment includes a camera. The server is used to train the target recognition model and to save the trained target recognition model in the terminal, either directly or after compression. The camera is used to collect real-time video of moving vehicles on the traffic road and to send the collected real-time video to the terminal. The terminal is used to implement the moving target detection method to identify the category of the detected moving vehicle.

Please refer to FIG. 3, which is a schematic flowchart of a method for detecting a moving target provided by an embodiment of the present application. The moving object detection method can be applied to a terminal or a server, and quickly identify the category of the detected moving object from the real-time video with a small amount of calculation.

As shown in FIG. 3, the moving target detection method specifically includes steps S201 to S204, which will be described in detail below in conjunction with FIG. 2.

S201. Acquire real-time video, and determine a moving target in the real-time video.

Specifically, the real-time video is, for example, video of moving vehicles on a traffic road captured in real time by a camera in a traffic monitoring device.

Among them, the moving target in the real-time video, such as a moving vehicle, can be determined by applying the inter-frame difference method to the real-time video. Of course, other detection methods can also be used to determine the moving vehicle, such as image recognition, for example shape recognition of moving vehicles in the real-time video.

S202: Extract a bounding box of the moving target and data information corresponding to the bounding box.

Wherein, the data information includes position information and size information of the bounding box in the real-time video. Extracting the bounding box of the moving target and the data information corresponding to the bounding box includes: determining the bounding box of the moving target in the video frame image of the real-time video; and extracting the position information and size information of the bounding box in the real-time video.

In one embodiment, the specific process of extracting the bounding box and data information is shown in FIG. 4, that is, step S202 includes sub-steps S202a and S202b.

S202a. Determine a bounding box corresponding to the moving target according to the horizontal width and vertical length of the moving target in the real-time video; S202b. Extract the horizontal width and vertical length as the size information, and the center coordinates of the bounding box as the position information.

Specifically, the corresponding bounding box is determined according to the maximum horizontal width and maximum vertical length of the moving target in the real-time video. The maximum horizontal width and vertical length are extracted as the size information, and the center coordinate value of the bounding box is obtained as the position information. The size information and position information together form the data information corresponding to the bounding box.
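A minimal sketch of sub-steps S202a and S202b is given below, assuming the moving target is available as a binary mask (1 for target pixels, 0 elsewhere) stored as nested lists. The function name `bounding_box`, the dictionary keys, and the convention that the center is the midpoint of the first and last occupied row and column indices are assumptions made for the example.

```python
def bounding_box(mask):
    """Return size and position information for the target pixels of a binary mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None                                   # no target pixels in this frame
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return {
        "width": right - left + 1,                    # maximum horizontal extent
        "height": bottom - top + 1,                   # maximum vertical extent
        "center": ((left + right) / 2, (top + bottom) / 2),   # (x, y)
    }


# Hypothetical 4x5 mask containing one target that is 3 pixels wide and 2 high.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)
```

Storing only the center and the size is enough to recover the corners of the box later, which keeps the data information per target small.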

It should be noted that one frame image of the real-time video may include multiple moving targets, such as multiple moving vehicles. Each moving vehicle corresponds to one bounding box, so a video frame of the real-time video may correspond to multiple bounding boxes.

S203. Input the image in the bounding box to a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target.

Specifically, the image in the bounding box can be determined according to the data information of the bounding box, and then the image in the bounding box is input to a pre-trained target recognition model for prediction, so as to output the classification category corresponding to the moving target.
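The way the data information selects the image passed to the model can be sketched as follows. The frame is a nested list of pixel values, the center is assumed to be the midpoint of the first and last pixel indices of the box, and `stub_model` is a hypothetical stand-in for the pre-trained target recognition model, which is not reproduced here.

```python
def crop_bounding_box(frame, center, width, height):
    """Cut the region described by the data information out of a video frame."""
    cx, cy = center
    left = round(cx - (width - 1) / 2)
    top = round(cy - (height - 1) / 2)
    return [row[left:left + width] for row in frame[top:top + height]]


def recognize(frame, box, model):
    """Run the model only on the image inside the bounding box."""
    patch = crop_bounding_box(frame, box["center"], box["width"], box["height"])
    return model(patch)


def stub_model(patch):
    """Hypothetical placeholder; a real model would classify the patch."""
    return ("Audi", "sedan")


# Hypothetical 4x5 frame and the data information of one bounding box.
frame = [
    [0, 1, 2, 3, 4],
    [10, 11, 12, 13, 14],
    [20, 21, 22, 23, 24],
    [30, 31, 32, 33, 34],
]
box = {"center": (2.0, 1.5), "width": 3, "height": 2}
patch = crop_bounding_box(frame, box["center"], box["width"], box["height"])
category = recognize(frame, box, stub_model)
```

Because the model sees only the cropped patch rather than the whole frame, the amount of data it processes per target scales with the box size, not the frame size.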

For example, if the moving target is a moving vehicle, the classification category recognized by the target recognition model may include information such as the car logo and the car model. Specifically, as shown in FIG. 2, the predicted car logo and car model of the moving vehicle are Audi and sedan, respectively.

S204: Mark the moving target in the real-time video recording according to the classification category.

Specifically, marking the moving targets in the real-time recording according to the classification category includes displaying the classification category output by the model at the moving target in the real-time recording. Of course, the bounding box can also be displayed in the real-time video, and then the classification category can be displayed in the bounding box. Alternatively, other labeling methods may also be used to label the moving target in the real-time video recording. Therefore, by marking the moving target, it is convenient for the user to locate or track the moving vehicle.

It should be noted that if multiple moving targets are included in the real-time recording, each moving target needs to be marked separately for the user to recognize.

The moving target detection method provided in the above embodiments can quickly recognize and classify moving targets, for example by recognizing the car logos and car models corresponding to moving vehicles. Specifically, after the moving target in the real-time video is determined, the bounding box of the moving target and the data information corresponding to the bounding box are extracted; the image in the bounding box is determined according to that data information and is then input to the pre-trained target recognition model, which outputs the classification category of the moving target. This realizes the recognition and classification of moving targets in real-time video. The method reduces the amount of calculation during classification, thereby improving the recognition efficiency for moving targets, and is suitable for real-time detection and recognition.

Please refer to FIG. 5, which is a schematic flowchart of steps for determining a moving target provided by an embodiment of the present application. In order to quickly and accurately determine the moving target in the real-time recording, as shown in Figure 5, the steps of determining the moving target specifically include the following:

S301: Determine a current frame image from the real-time video recording, and use the current frame image as a reference image.

Wherein, the current frame image may be determined from the real-time video according to a selection made by the user. For example, while the real-time video is playing, the user clicks the currently playing picture, and the video frame selected by the user is used as the current frame image. Of course, the user can also directly specify a video frame as the current frame image.

Specifically, the determined current frame image is taken as the reference image and is expressed as $f_k(i, j)$, where $k$ is a positive integer indicating that the current frame image is the $k$-th video frame in the image sequence of the real-time video, and $(i, j)$ are the discrete image coordinates in the video frame.

S302. Acquire the moving speed of the moving target to be determined.

In this embodiment, in order to improve the efficiency and accuracy of determining the moving target, the moving speed of the moving target can be determined first, and the corresponding preset number of frames is then selected according to the moving speed, where different moving speeds correspond to different preset numbers of frames.

Specifically, the movement speed is a range value, of course, it can also be a specific value. The movement speed range value is, for example, 90 to 110km/h; the specific movement speed value is, for example, 100km/h.

In one embodiment, the moving speed of the moving target to be determined may be measured by a speed measuring instrument, such as a laser speedometer. Of course, the moving speed of the moving target can also be calculated from two images separated by a certain number of frames.
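The calculation from two frames can be sketched as below. The application does not give a formula; this example assumes a known frame rate and a known calibration from pixels to metres on the road, and all names and numbers are hypothetical.

```python
def estimate_speed_kmh(pixel_shift, metres_per_pixel, frame_gap, fps):
    """Estimate speed from how far a target moved between two frames."""
    distance_m = pixel_shift * metres_per_pixel   # distance travelled on the road
    elapsed_s = frame_gap / fps                   # time between the two frames
    return distance_m / elapsed_s * 3.6           # convert m/s to km/h


# Hypothetical values: the target shifted 50 pixels between two frames that are
# 5 frames apart in a 25 fps video, with 0.1 m of road per pixel.
speed = estimate_speed_kmh(pixel_shift=50, metres_per_pixel=0.1, frame_gap=5, fps=25)
```

With these numbers the target covers about 5 m in 0.2 s, which comes out at roughly 90 km/h.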

In an embodiment, in order to reduce the calculation load of the terminal and improve the speed and accuracy of moving target recognition, the moving speed of the moving target to be determined can be determined according to the environmental parameters of the moving target.

For example, first determine which lane of the expressway the vehicle is in, so that the approximate speed range of the moving vehicle can be determined from the specific lane. If the vehicle is in the rightmost lane, whose speed limit range is 60 km/h to 90 km/h, the moving speed of the moving target can be taken as roughly 60 km/h to 90 km/h; correspondingly, the speed limit range of the middle lane is 90 km/h to 110 km/h; the leftmost lane is the overtaking lane, with a minimum speed above 110 km/h. As another example, a city road with only one motor vehicle lane in each direction has a speed limit of 50 km/h, so if the moving target is on such a city road, its moving speed can be taken as approximately 50 km/h.

S303: According to the preset correspondence between the motion speed range and the preset frame number, determine the preset frame number corresponding to the acquired motion speed range.

Specifically, the preset number of delayed frames is set according to the moving speed. For example, a vehicle in the leftmost lane of an expressway moves fastest, so its preset number of delayed frames is smallest, for example a delay of 1 or 2 frames. A vehicle in the middle lane of an expressway is also relatively fast, so the preset number of frames is set to a delay of 4 or 5 frames. A vehicle in the rightmost lane of an expressway is somewhat slower, so the preset number of frames is set to a delay of 7 or 8 frames. Vehicles on urban roads are relatively slow, so the preset number of delayed frames can be set larger, such as 9 or 10 frames.

Therefore, the preset frame number corresponding to the acquired moving speed range is determined according to the preset correspondence between moving speed ranges and preset frame numbers. The frame number thus adapts to the actual situation of the moving target, so that the moving target in the real-time video is determined quickly and accurately.

For example, if the vehicle is in the leftmost lane of the expressway, the moving speed of the moving vehicle is determined to be approximately 110 km/h or more, and according to the preset correspondence between moving speed ranges and preset frame numbers, the preset frame number corresponding to this speed range is determined to be 2 frames.
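The preset correspondence of steps S302 and S303 can be sketched as two lookup tables. The speed ranges are the ones quoted in the examples above; for each range, one value from the quoted pair of frame numbers has been picked arbitrarily (2 for the leftmost lane, matching the example), and the lane keys and function name are hypothetical.

```python
# Lane -> approximate speed range in km/h (None means no upper bound is given).
LANE_SPEED_RANGE = {
    "expressway_left": (110, None),    # overtaking lane
    "expressway_middle": (90, 110),
    "expressway_right": (60, 90),
    "city_road": (50, 50),
}

# Preset correspondence: speed range -> number of frames to delay.
FRAMES_FOR_RANGE = {
    (110, None): 2,
    (90, 110): 5,
    (60, 90): 8,
    (50, 50): 10,
}


def preset_frames_for_lane(lane):
    """Look up the speed range implied by the lane, then its preset frame number."""
    return FRAMES_FOR_RANGE[LANE_SPEED_RANGE[lane]]


delay = preset_frames_for_lane("expressway_left")
```

A faster target moves further between consecutive frames, so a short delay already produces a usable difference image, while a slow target needs a longer delay before its displacement becomes visible.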

S304. Extract a delayed frame image that is delayed by a preset number of frames relative to the reference image.

Specifically, the reference image is expressed as $f_k(i, j)$. For example, if the vehicle is in the leftmost lane of the expressway, the preset frame number is determined to be 2 frames, and the image that is 2 frames later than the reference image is extracted as the delayed frame image, expressed as $f_{k+2}(i, j)$.

S305: Subtracting the delayed frame image and the current frame image to obtain a difference image.

Specifically, the delayed frame image and the current frame image are subtracted by the difference method to obtain a difference image, which is expressed as:

$$D_k(i, j) = \left| f_{k+2}(i, j) - f_k(i, j) \right| \tag{1}$$

In formula (1), $D_k(i, j)$ represents the difference image, $f_k(i, j)$ represents the reference image, $f_{k+2}(i, j)$ represents the delayed frame image, and $(i, j)$ represents the discrete image coordinates.
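Formula (1) can be illustrated with a short sketch in which frames are nested lists of grayscale values. The function name `difference_image` and the sample frames are assumptions made for the example; the delay of 2 frames follows the example in the text.

```python
def difference_image(frames, k, delay):
    """Absolute per-pixel difference between frame k and the frame `delay` frames later."""
    reference, delayed = frames[k], frames[k + delay]
    return [[abs(d - r) for r, d in zip(ref_row, del_row)]
            for ref_row, del_row in zip(reference, delayed)]


# Three hypothetical 2x3 grayscale frames; a bright object appears in the last one.
frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[10, 10, 10], [10, 10, 10]],
    [[10, 200, 10], [10, 190, 10]],
]
diff = difference_image(frames, k=0, delay=2)
```

Static background pixels cancel to zero, so only the pixels that changed between the two frames carry large values in the difference image.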

S306: Perform threshold processing on the difference image to obtain a binary image corresponding to the difference image.

Specifically, performing threshold processing on the difference image to obtain the binary image corresponding to the difference image includes: determining the pixels in the difference image whose pixel values are greater than a preset threshold; and determining the binary image corresponding to the difference image according to those pixels.

Wherein, the binary image is expressed as:

$$S_k(i, j) = \begin{cases} 1, & D_k(i, j) \ge T \\ 0, & D_k(i, j) < T \end{cases}$$

Among them, $S_k(i, j)$ represents the binary image, $T$ is the preset threshold, $(i, j)$ represents the discrete image coordinates, and $D_k(i, j)$ represents the difference image; values greater than or equal to the preset threshold are represented as 1, and values less than the preset threshold are represented as 0.
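The thresholding rule described above (1 where the difference is at least the preset threshold, 0 otherwise) can be sketched in a few lines. The function name `binarize` and the threshold value 30 are assumptions made for the example.

```python
def binarize(diff, threshold):
    """Map each difference value to 1 if it reaches the threshold, else 0."""
    return [[1 if d >= threshold else 0 for d in row] for row in diff]


# Hypothetical difference image; small values are treated as background.
diff = [[0, 190, 0], [0, 30, 25]]
binary = binarize(diff, threshold=30)
```

The choice of threshold trades sensitivity against noise: a low value keeps faint motion but also keeps sensor noise, which is why further cleanup of the binary image is still needed afterwards.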

S307: Determine a moving target in the real-time video recording according to the binary image.

Wherein, determining the moving target in the real-time video according to the binary image includes: setting the area where $S_k(i, j)$ is 1 in the binary image as the motion area; and applying morphological processing and connectivity analysis to the motion area to remove noise, so as to determine the moving target in the real-time video.

Specifically, the area where $S_k(i, j)$ is 1 in the binary image is set as the motion area, and the motion area is then processed by morphological processing and connectivity analysis to remove noise, so as to obtain valid moving targets.
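One possible form of this noise removal is sketched below. The application does not say which morphological operation or connectivity rule is used; a 3x3 opening (erosion followed by dilation) and 4-connected region labeling with a minimum area are assumed here, and all function names are hypothetical.

```python
from collections import deque


def _window(mask, i, j):
    """Values in the 3x3 window around (i, j); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[y][x] if 0 <= y < h and 0 <= x < w else 0
            for y in (i - 1, i, i + 1) for x in (j - 1, j, j + 1)]


def erode(mask):
    return [[int(all(_window(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]


def dilate(mask):
    return [[int(any(_window(mask, i, j))) for j in range(len(mask[0]))]
            for i in range(len(mask))]


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 window."""
    return dilate(erode(mask))


def connected_regions(mask, min_area=1):
    """Group 4-connected foreground pixels; drop regions smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:                       # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                regions.append(pixels)
    return regions


# Hypothetical binary image: one 3x3 moving target and one isolated noise pixel.
binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(binary)
targets = connected_regions(cleaned, min_area=4)
```

Each remaining region corresponds to one candidate moving target, so a frame with several vehicles yields several regions.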

Please refer to FIG. 6. FIG. 6 is a schematic block diagram of a model training device provided by an embodiment of the present application. The model training device may be configured in a server and used to execute the aforementioned target recognition model training method.

As shown in FIG. 6, the model training device 400 includes: a picture acquisition unit 401, a picture labeling unit 402, a parameter changing unit 403, a data construction unit 404, and a model training unit 405.

The picture acquiring unit 401 is configured to acquire a target picture, where the target picture is a picture of multiple target objects taken from different angles.

The picture marking unit 402 is configured to mark the target picture according to the category identifier corresponding to the classification category.

The parameter changing unit 403 is configured to perform an image processing operation on the target picture to change the picture parameters of the target picture, and use the target picture whose picture parameters are changed as a new target picture.

Wherein, the image processing operations include: size adjustment, cropping, rotation, image algorithm processing, etc.; the image algorithm processing includes: a color temperature adjustment algorithm, an exposure adjustment algorithm, a contrast adjustment algorithm, a highlight recovery algorithm, a low light compensation algorithm, a white balance algorithm, a sharpness adjustment algorithm, a fogging algorithm, and a natural saturation adjustment algorithm.

The data construction unit 404 is configured to construct sample data according to the new target picture and the target picture.

The model training unit 405 is configured to perform model training on the sample data based on a convolutional neural network to obtain a target recognition model, and to use the obtained target recognition model as the pre-trained target recognition model.
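The work of the parameter changing unit 403 and the data construction unit 404 can be sketched as follows. This is only an illustrative example in pure Python on grayscale images stored as nested lists. The function names, the chosen operations (rotation, cropping, contrast adjustment), and the contrast factor are assumptions made for illustration; the application does not prescribe them, and the model training itself is not shown.

```python
def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img, top, left, height, width):
    """Cut a height x width window whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def adjust_contrast(img, factor, mid=128):
    """Scale pixel values around `mid` and clamp the result to 0..255."""
    return [[max(0, min(255, round(mid + factor * (p - mid)))) for p in row]
            for row in img]


def build_samples(labelled_pictures):
    """Build sample data from the labelled target pictures and their changed copies.

    labelled_pictures: list of (image, category_id) pairs.
    Each changed picture keeps the category identifier of its original.
    """
    samples = []
    for img, category_id in labelled_pictures:
        samples.append((img, category_id))  # original target picture
        for new_img in (rotate90(img), adjust_contrast(img, 1.5)):
            samples.append((new_img, category_id))  # new target picture
    return samples
```

With these hypothetical operations, each labelled target picture contributes three samples: the original and two pictures whose parameters were changed.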

Please refer to FIG. 7. FIG. 7 is a schematic block diagram of a moving target detection device provided in an embodiment of the present application, and the moving target detection device is used to execute the aforementioned moving target detection method. Wherein, the moving target detection device can be configured in a server or a terminal.

As shown in FIG. 7, the moving target detection device 500 includes: an acquisition and determination unit 501, an information extraction unit 502, a recognition and detection unit 503, and a target labeling unit 504.

The acquisition and determination unit 501 is configured to acquire a real-time video and determine the moving target in the real-time video.

The information extraction unit 502 is configured to extract a bounding box of the moving target and data information corresponding to the bounding box, the data information including position information and size information of the bounding box in the real-time video recording.

The information extraction unit 502 is specifically configured to determine the bounding box corresponding to the moving target according to the horizontal width and vertical length of the moving target in the real-time video recording, and to extract the horizontal width and vertical length as the size information and the center coordinates of the bounding box as the position information.
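The extraction of the data information can be illustrated with a short sketch. It assumes that a moving target is available as a collection of (row, column) pixel coordinates and that the center is expressed in pixel-index units; the function name and the dictionary layout are hypothetical.

```python
def bounding_box(region):
    """Compute the bounding box data for one moving target.

    region: iterable of (row, col) pixel coordinates belonging to the target.
    Returns the center (x, y), the horizontal width, and the vertical length.
    """
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    width = right - left + 1    # horizontal extent in pixels
    height = bottom - top + 1   # vertical extent in pixels
    center = (left + (width - 1) / 2, top + (height - 1) / 2)
    return {"center": center, "width": width, "height": height}
```

For a target that occupies rows 2 to 5 and columns 3 to 8, this sketch yields a width of 6, a height of 4, and a center of (5.5, 3.5).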

The recognition and detection unit 503 is configured to input the image in the bounding box into a pre-trained target recognition model for recognition and detection according to the data information, so as to output the classification category corresponding to the moving target.

The target labeling unit 504 is configured to label the moving target in the real-time video recording according to the classification category.
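The cooperation of units 502 to 504 can be sketched as below: only the image inside each bounding box is passed to the recognition model, rather than the whole frame. This is a hedged illustration. The box layout (center in pixel-index units plus width and height), the function names, and the `classify` callable that stands in for the pre-trained target recognition model are all assumptions.

```python
def crop_box(frame, box):
    """Cut the bounding box out of a frame given its center (x, y) and size."""
    (cx, cy), width, height = box["center"], box["width"], box["height"]
    left = round(cx - (width - 1) / 2)
    top = round(cy - (height - 1) / 2)
    return [row[left:left + width] for row in frame[top:top + height]]


def label_moving_targets(frame, boxes, classify):
    """Classify the image inside each bounding box and attach the category.

    classify: callable that takes an image patch and returns a category name;
    here it is a placeholder for the pre-trained target recognition model.
    """
    return [{"box": box, "category": classify(crop_box(frame, box))}
            for box in boxes]
```

Because the model only receives the cropped patches, its workload depends on the number and size of the moving targets, not on the size of the full frame.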

In one embodiment, as shown in FIG. 8, the acquisition and determination unit 501 includes: a reference determination unit 5011, a speed determination unit 5012, a frame number determination unit 5013, an image extraction unit 5014, an image subtraction unit 5015, and an image processing unit 5016.

The reference determining unit 5011 is configured to determine a current frame image from the real-time video recording, and use the current frame image as a reference image.

The speed determining unit 5012 is used to obtain the moving speed of the moving target to be determined, where different moving speeds correspond to different numbers of preset frames.

The frame number determining unit 5013 is configured to determine the preset frame number corresponding to the acquired motion speed range according to the preset correspondence between the motion speed range and the preset frame number.

The image extraction unit 5014 is configured to extract a delayed frame image that is delayed by a preset number of frames relative to the reference image.

The image subtraction unit 5015 is configured to subtract the delayed frame image and the current frame image to obtain a difference image.

The image processing unit 5016 is configured to perform threshold processing on the difference image to obtain a binary image corresponding to the difference image.
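The sub-units 5011 to 5016 together implement frame differencing with a speed-dependent frame delay. A minimal sketch follows; it is not the implementation of this application. The speed ranges and frame counts in the table are invented for illustration, the delayed frame is assumed to be taken from a buffer of earlier frames, and the absolute value of the difference is assumed to be compared with the threshold.

```python
import bisect

# Hypothetical correspondence between motion speed ranges and preset frame numbers:
# speeds below 2.0 use a delay of 8 frames, speeds from 2.0 up to (not including)
# 10.0 use 4 frames, and faster targets use 1 frame.
SPEED_UPPER_BOUNDS = [2.0, 10.0]
PRESET_FRAMES = [8, 4, 1]


def preset_frame_count(speed):
    """Look up the preset frame number for the speed range that contains `speed`."""
    return PRESET_FRAMES[bisect.bisect_right(SPEED_UPPER_BOUNDS, speed)]


def binary_difference(frames, speed, threshold):
    """Return the binary image obtained by thresholding the frame difference.

    frames: buffered grayscale frames (2D lists of equal size), newest last.
    """
    delay = preset_frame_count(speed)
    if len(frames) <= delay:
        raise ValueError("not enough buffered frames for the chosen delay")
    current = frames[-1]            # reference image (current frame)
    delayed = frames[-1 - delay]    # frame that lies `delay` frames away
    return [[1 if abs(c - d) > threshold else 0
             for c, d in zip(cur_row, del_row)]
            for cur_row, del_row in zip(current, delayed)]
```

In this sketch a slow target is given a longer delay so that it moves far enough between the two frames to produce a difference above the threshold; the actual correspondence would be preset for the application scenario.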

It should be noted that those skilled in the art can clearly understand that, for convenience and conciseness of description, the specific working processes of the device and of each unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

The above-mentioned apparatus may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 9.

Please refer to FIG. 9, which is a schematic block diagram of the structure of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal.

Referring to FIG. 9, the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.

The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions, and when the program instructions are executed, the processor is caused to execute any one of the moving target detection methods.

The processor is used to provide calculation and control capabilities and support the operation of the entire computer equipment.

The internal memory provides an environment for running the computer program stored in the non-volatile storage medium. When the computer program is executed by the processor, the processor is caused to execute any one of the moving target detection methods.

The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.

It should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program, the computer program includes program instructions, and a processor executes the program instructions to implement any one of the moving target detection methods provided by the embodiments of the present application.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, such as a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Anyone familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A moving target detection method, comprising:

Acquiring real-time video, and determining the moving target in the real-time video;

Extracting a bounding box of the moving target and data information corresponding to the bounding box, where the data information includes position information and size information of the bounding box in the real-time video recording;

Inputting the image in the bounding box to a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

Marking the moving target in the real-time video recording according to the classification category.
The detection method according to claim 1, wherein said determining the moving target in the real-time video recording comprises:

Determining a current frame image from the real-time video, and using the current frame image as a reference image;

Extracting a delayed frame image that is delayed by a preset number of frames relative to the reference image;

Subtracting the delayed frame image and the current frame image to obtain a difference image;

Performing threshold processing on the difference image to obtain a binary image corresponding to the difference image; and

Determining the moving target in the real-time video recording according to the binary image.
The detection method according to claim 2, wherein before extracting the delayed frame image that is delayed by a preset number of frames relative to the reference image, the method further comprises:

Acquiring the movement speed of the moving target to be determined, where different movement speeds correspond to different numbers of preset frames;

Determining, according to a preset correspondence between motion speed ranges and preset frame numbers, the preset frame number corresponding to the motion speed range in which the acquired movement speed falls.
The detection method according to claim 2 or 3, wherein performing threshold processing on the difference image to obtain the binary image corresponding to the difference image comprises:

Determining pixels in the difference image with pixel values greater than a preset threshold;

Determining the binary image corresponding to the difference image according to the pixels whose values are greater than the preset threshold.
The detection method according to claim 4, wherein the binary image is expressed as:

$$S_k(i, j) = \begin{cases} 1, & D_k(i, j) > T \\ 0, & \text{otherwise} \end{cases}$$

where S_k(i, j) represents the binary image, T is the preset threshold, (i, j) represents the coordinates of a discrete image, and D_k represents the difference image;

Determining the moving target in the real-time video recording according to the binary image comprises:

Setting the area of the binary image in which S_k(i, j) equals 1 as the motion area;

Performing morphological processing and connectivity analysis on the motion area to remove noise, so as to determine the moving target in the real-time video.
The detection method according to claim 1, wherein said extracting the bounding box of the moving target and the data information corresponding to the bounding box comprises:

Determining the bounding box corresponding to the moving target according to the horizontal width and vertical length of the moving target in the real-time video recording;

Extracting the horizontal width and vertical length as the size information, and the center coordinates of the bounding box as the position information.
The detection method according to claim 1, further comprising:

Acquiring a target picture, the target picture being pictures of multiple target objects taken from different angles;

Marking the target picture according to the category identifier corresponding to the classification category to construct sample data;

Performing model training on the sample data based on a convolutional neural network to obtain a target recognition model, and using the obtained target recognition model as the pre-trained target recognition model.
A moving target detection device, comprising:

An acquisition and determination unit, configured to acquire a real-time video and determine the moving target in the real-time video;

An information extraction unit, configured to extract a bounding box of the moving target and data information corresponding to the bounding box, the data information including position information and size information of the bounding box in the real-time video;

A recognition and detection unit, configured to input the image in the bounding box into a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

A target labeling unit, configured to label the moving target in the real-time video recording according to the classification category.
A computer device, wherein the computer device includes a memory and a processor;

The memory is used to store computer programs;

The processor is configured to execute the computer program and implement the following steps when executing the computer program:

Acquiring real-time video, and determining the moving target in the real-time video;

Extracting a bounding box of the moving target and data information corresponding to the bounding box, where the data information includes position information and size information of the bounding box in the real-time video recording;

Inputting the image in the bounding box to a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

Marking the moving target in the real-time video recording according to the classification category.
The computer device according to claim 9, wherein, when determining the moving target in the real-time video recording, the processor is configured to implement:

Determining a current frame image from the real-time video, and using the current frame image as a reference image;

Extracting a delayed frame image that is delayed by a preset number of frames relative to the reference image;

Subtracting the delayed frame image and the current frame image to obtain a difference image;

Performing threshold processing on the difference image to obtain a binary image corresponding to the difference image; and

Determining the moving target in the real-time video recording according to the binary image.
The computer device according to claim 10, wherein, before extracting the delayed frame image that is delayed by a preset number of frames relative to the reference image, the processor is further configured to implement:

Acquiring the movement speed of the moving target to be determined, where different movement speeds correspond to different numbers of preset frames;

Determining, according to a preset correspondence between motion speed ranges and preset frame numbers, the preset frame number corresponding to the motion speed range in which the acquired movement speed falls.
The computer device according to claim 10 or 11, wherein, when performing the threshold processing on the difference image to obtain the binary image corresponding to the difference image, the processor is configured to implement:

Determining pixels in the difference image with pixel values greater than a preset threshold;

Determining the binary image corresponding to the difference image according to the pixels whose values are greater than the preset threshold.
The computer device according to claim 12, wherein the binary image is expressed as:

$$S_k(i, j) = \begin{cases} 1, & D_k(i, j) > T \\ 0, & \text{otherwise} \end{cases}$$

where S_k(i, j) represents the binary image, T is the preset threshold, (i, j) represents the coordinates of a discrete image, and D_k represents the difference image;

When determining the moving target in the real-time video recording according to the binary image, the processor is configured to implement:

Setting the area of the binary image in which S_k(i, j) equals 1 as the motion area;

Performing morphological processing and connectivity analysis on the motion area to remove noise, so as to determine the moving target in the real-time video.
The computer device according to claim 9, wherein the processor is configured to implement the following when extracting the bounding box of the moving target and the data information corresponding to the bounding box:

Determining the bounding box corresponding to the moving target according to the horizontal width and vertical length of the moving target in the real-time video recording;

Extracting the horizontal width and vertical length as the size information, and the center coordinates of the bounding box as the position information.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements the following steps:

Acquiring real-time video, and determining the moving target in the real-time video;

Extracting a bounding box of the moving target and data information corresponding to the bounding box, where the data information includes position information and size information of the bounding box in the real-time video recording;

Inputting the image in the bounding box to a pre-trained target recognition model for recognition and detection according to the data information, so as to output a classification category corresponding to the moving target;

Marking the moving target in the real-time video recording according to the classification category.
The computer-readable storage medium according to claim 15, wherein, when determining the moving target in the real-time video recording, the processor is configured to implement:

Determining a current frame image from the real-time video, and using the current frame image as a reference image;

Extracting a delayed frame image that is delayed by a preset number of frames relative to the reference image;

Subtracting the delayed frame image and the current frame image to obtain a difference image;

Performing threshold processing on the difference image to obtain a binary image corresponding to the difference image; and

Determining the moving target in the real-time video recording according to the binary image.
The computer-readable storage medium according to claim 16, wherein, before extracting the delayed frame image that is delayed by a preset number of frames relative to the reference image, the processor is further configured to implement:

Acquiring the movement speed of the moving target to be determined, where different movement speeds correspond to different numbers of preset frames;

Determining, according to a preset correspondence between motion speed ranges and preset frame numbers, the preset frame number corresponding to the motion speed range in which the acquired movement speed falls.
The computer-readable storage medium according to claim 16 or 17, wherein, when performing the threshold processing on the difference image to obtain the binary image corresponding to the difference image, the processor is configured to implement:

Determining pixels in the difference image with pixel values greater than a preset threshold;

Determining the binary image corresponding to the difference image according to the pixels whose values are greater than the preset threshold.
The computer-readable storage medium according to claim 18, wherein the binary image is expressed as:

$$S_k(i, j) = \begin{cases} 1, & D_k(i, j) > T \\ 0, & \text{otherwise} \end{cases}$$

where S_k(i, j) represents the binary image, T is the preset threshold, (i, j) represents the coordinates of a discrete image, and D_k represents the difference image;

When determining the moving target in the real-time video recording according to the binary image, the processor is configured to implement:

Setting the area of the binary image in which S_k(i, j) equals 1 as the motion area;

Performing morphological processing and connectivity analysis on the motion area to remove noise, so as to determine the moving target in the real-time video.
The computer-readable storage medium according to claim 15, wherein, when extracting the bounding box of the moving target and the data information corresponding to the bounding box, the processor is configured to implement:

Determining the bounding box corresponding to the moving target according to the horizontal width and vertical length of the moving target in the real-time video recording;

Extracting the horizontal width and vertical length as the size information, and the center coordinates of the bounding box as the position information.