WO2021212659A1 - Video data processing method and apparatus, and computer device and storage medium - Google Patents

Video data processing method and apparatus, and computer device and storage medium

Info

Publication number
WO2021212659A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
image
frame image
license plate
human body
Prior art date
Application number
PCT/CN2020/099082
Other languages
French (fr)
Chinese (zh)
Inventor
黄小弟 (Huang Xiaodi)
Original Assignee
平安国际智慧城市科技股份有限公司 (Ping An International Smart City Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司 (Ping An International Smart City Technology Co., Ltd.)
Publication of WO2021212659A1 publication Critical patent/WO2021212659A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Definitions

  • This application relates to the field of artificial intelligence data processing, and in particular to a video data processing method, device, computer equipment, and storage medium.
  • This application provides a video data processing method, device, computer equipment, and storage medium, which can automatically, quickly, and accurately cut out video clips related to a target data set from a video set to be processed.
  • This application can be applied to the field of smart transportation, thereby promoting the construction of smart cities, improving the efficiency and accuracy of recognition, and greatly reducing input costs.
  • A video data processing method, including:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain a license plate area, a face area, and a human body feature area; extracting text features of the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting face features of the face area through the YOLO algorithm to obtain a face result of the face area; and extracting human body posture features of the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
  • determining a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A video data processing device, including:
  • a receiving module, configured to receive a video extraction instruction and obtain a target data set and a to-be-processed video set;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • an extraction module, configured to extract at least two video frame images from each of the original videos according to a preset extraction rule;
  • an acquisition module, configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • a recognition module, configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • an extraction module, configured to extract multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • a merging module configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • a determining module, configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area; extracting the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; extracting the face features of the face area through the YOLO algorithm to obtain the face result of the face area; and extracting the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A computer-readable storage medium having a computer program stored thereon, where the following steps are implemented when the computer program is executed by a processor:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area; extracting the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; extracting the face features of the face area through the YOLO algorithm to obtain the face result of the face area; and extracting the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • This application can automatically, quickly, and accurately cut out video clips related to the target data set from the to-be-processed video set, which improves recognition efficiency and accuracy, and greatly reduces input costs.
  • FIG. 1 is a schematic diagram of an application environment of a video data processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a video data processing method in an embodiment of the present application
  • Fig. 3 is a flowchart of step S10 of a video data processing method in an embodiment of the present application
  • FIG. 4 is a flowchart of step S20 of the video data processing method in an embodiment of the present application.
  • FIG. 5 is a flowchart of step S30 of the video data processing method in an embodiment of the present application.
  • Fig. 6 is a flowchart of step S30 of a video data processing method in another embodiment of the present application.
  • FIG. 7 is a flowchart of step S40 of the video data processing method in an embodiment of the present application.
  • Fig. 8 is a functional block diagram of a video data processing device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the video data processing method provided by this application can be applied in the application environment as shown in Fig. 1, in which the client (computer equipment) communicates with the server through the network.
  • the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a video data processing method is provided, and the technical solution mainly includes the following steps S10-S70:
  • S10 Receive a video extraction instruction, and obtain a target data set and a video set to be processed;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code.
  • The video extraction instruction is an instruction triggered after the target data set and the to-be-processed video set are selected;
  • the target data set is the data set related to the searched target person, where the target person is the person who needs to be searched for;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the license plate number is the unique license plate number of the vehicle driven by the target person;
  • the face image is an image of the target person's face;
  • the human body feature map is an image of a specific human posture of the target person; for example, the human body feature map is an upper-body image of the target person driving a vehicle;
  • the to-be-processed video set is the set of videos in which videos related to the target person need to be found;
  • the to-be-processed video set includes at least one of the original videos.
  • The original video may be an unprocessed video clip or a clipped video clip; for example, the original video is the surveillance video of a certain intersection on a certain day, or the surveillance video of a certain street from 19:00 to 21:00 on a certain day. Each of the original videos is associated with a unique identification code, which is the unique identification code assigned to that original video and can be set according to requirements.
  • In an embodiment, before step S10, that is, before obtaining the target data set, the method includes:
  • S101 Receive a target collection instruction, and obtain a sample video; the sample video includes a license plate, a human face, and a specific human posture.
  • The target collection instruction is an instruction triggered when information about the target person needs to be collected;
  • the sample video is a short video clip related to the target person;
  • the content of the sample video contains a license plate, the face of the target person, and the specific human posture of the target person, which means that the license plate, the face, and the specific human posture each appear at least once in the sample video.
  • S102 Split the sample video according to a preset splitting parameter; one sample image is obtained from the sample video at every interval of the splitting parameter, and the sample image is an image selected from the sample video.
  • The splitting parameter can be set according to requirements; for example, the splitting parameter can be set to 30 frames or 25 frames, so as to split the sample video into multiple sample images.
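  • As a non-authoritative illustration only, a minimal sketch of this splitting step (assuming OpenCV, which the application does not name, and a hypothetical helper `split_video`) might look like:

```python
import cv2

def split_video(video_path: str, splitting_parameter: int = 30) -> list:
    """Collect one sample image at every interval of the splitting
    parameter (e.g. 30 or 25 frames), as described above."""
    capture = cv2.VideoCapture(video_path)
    sample_images = []
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the sample video
            break
        if frame_index % splitting_parameter == 0:
            sample_images.append(frame)
        frame_index += 1
    capture.release()
    return sample_images
```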
  • S103 Input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepting from all the sample images the license plate area images containing the license plate, the face area images containing the face, and the human body feature area images containing both the face and the specific human posture.
  • The sample collection model refers to a neural network model that has been trained to identify the license plate area, the face area, and the human body feature area in an image; the image collection is to identify and cut out the license plate, face, and human body feature areas contained in the sample images.
  • The network structure of the sample collection model can be set according to requirements; for example, the network structure of the sample collection model can be a network structure of the Inception series or of the VGG series.
  • S104 Input the license plate area image into a license plate extraction model; the license plate extraction model performs license plate number recognition on the license plate area image, and the license plate number output by the license plate extraction model is obtained.
  • The license plate extraction model refers to a neural network model that has been trained to identify the license plate number in an image.
  • The network structure of the license plate extraction model can be set according to requirements; for example, the network structure of the license plate extraction model can be obtained by transfer learning from GoogLeNet.
  • The license plate extraction model can identify the Chinese characters, numbers, and letters in the license plate area image; the license plate number is the unique identification of the vehicle driven by the target person, and consists of Chinese characters, numbers, and letters.
  • S105 Input all the face area images into a face extraction model; the face extraction model screens all the face area images, and the face image selected by the face extraction model is obtained.
  • Through the face extraction model, the face area image that contains the eyes, eyebrows, mouth, nose, ears, and face contour and has the highest definition is screened out from all the face area images; the screening method can be set according to requirements.
  • For example, the screening method may be: obtain the average of the pixel values of the pixels at the same position across all the face area images, and record this average as the pixel average corresponding to that position; for each face area image, take the difference between each pixel and its corresponding pixel average and take the absolute value to obtain the absolute pixel difference; calculate the sum of the absolute pixel differences of all pixels in the face area image and determine this sum as the image difference value; and select the face area image with the smallest image difference value among all the image difference values as the face image, that is, determine the screened face area image as the face image.
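  • A minimal NumPy sketch of this example screening method (the function name and the assumption that all face area images share one shape are ours, not the application's):

```python
import numpy as np

def select_face_image(face_area_images: list) -> np.ndarray:
    """Pick the face area image with the smallest image difference value:
    the sum of absolute differences from the per-position pixel average."""
    stack = np.stack([img.astype(np.float64) for img in face_area_images])
    pixel_average = stack.mean(axis=0)  # pixel average per position
    absolute_pixel_difference = np.abs(stack - pixel_average)
    image_difference = absolute_pixel_difference.reshape(len(stack), -1).sum(axis=1)
    return face_area_images[int(np.argmin(image_difference))]
```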
  • S106 Input all the human body feature area images into a human body feature extraction model; the human body feature extraction model screens all the human body feature area images, and the human body feature map selected by the human body feature extraction model is obtained.
  • Through the human body feature extraction model, the human body feature area image that contains the face and the specific human posture and has the highest definition is screened out from all the human body feature area images; the screening method can be set according to requirements.
  • For example, the screening method may be: determine the difference between the pixel value of each pixel in the human body feature area image and the pixel values of the surrounding pixels adjacent to it as the local pixel difference; obtain the sum of all the local pixel differences in each human body feature area image; and select the human body feature area image with the smallest sum of local pixel differences among all the human body feature area images as the human body feature map, that is, determine the screened human body feature area image as the human body feature map.
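  • Likewise, a hedged sketch of the local pixel difference screening (we approximate "surrounding adjacent pixels" with the horizontal and vertical neighbours; the exact neighbourhood is not specified by the application):

```python
import numpy as np

def select_body_feature_map(body_area_images: list) -> np.ndarray:
    """Pick the human body feature area image whose sum of local pixel
    differences (pixel minus adjacent neighbours) is smallest."""
    def local_difference_sum(image: np.ndarray) -> float:
        img = image.astype(np.float64)
        horizontal = np.abs(np.diff(img, axis=1)).sum()
        vertical = np.abs(np.diff(img, axis=0)).sum()
        return float(horizontal + vertical)

    scores = [local_difference_sum(img) for img in body_area_images]
    return body_area_images[int(np.argmin(scores))]
```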
  • S107 Determine the license plate number, the face image, and the human body feature map as the target data set.
  • That is, the license plate number, the face image, and the human body feature map are determined and recorded as the target data set; the target data set is the information set containing the license plate number, the face, and the specific human posture related to the target person.
  • In this way, the sample video is collected through the sample collection model, and the license plate number, the face image, and the human body feature map are obtained from the collected images through the license plate extraction model, the face extraction model, and the human body feature extraction model. The license plate number is automatically recognized from the sample video, and the target person's face image and human body feature map are automatically intercepted, which improves the accuracy of recognition and the reliability of interception, reduces labor costs, and improves efficiency.
  • S20 Extract at least two video frame images from each of the original videos according to a preset extraction rule.
  • The extraction rule can be set according to requirements; for example, the extraction rule can be set to extract at least two video frame images evenly from the original video, or to extract one video frame image at every interval of a preset frame parameter in the original video.
  • The video frame image is the image corresponding to one frame in the original video.
  • In an embodiment, the step S20, that is, extracting at least two video frame images from each of the original videos according to a preset extraction rule, includes:
  • S201 Acquire the original video and the extraction parameter in the extraction rule.
  • The extraction rule can be set according to requirements, and the purpose of the extraction rule is to extract at least two video frame images from each of the original videos. For example, the extraction rule may be to obtain the start video frame image and the end video frame image in the original video plus the video frame image at the bisecting center position of the original video; the extraction rule may also be to obtain the start video frame image and the end video frame image in the original video and to select one video frame image at every interval of a preset extraction parameter in the middle part of the original video.
  • The extraction parameter can be set according to requirements; for example, the extraction parameter is 15 frames, 25 frames, 30 frames, and so on.
  • S202 Determine a starting video frame image in the original video as a starting frame image.
  • the video frame image corresponding to the first frame in the original video is determined as the start frame image.
  • S203 Determine the ending video frame image in the original video as the ending frame image.
  • the video frame image corresponding to the last frame in the original video is determined as the end frame image.
  • S204 Extract one video frame image at every interval of the extraction parameter, starting from the start frame image, until the end frame image is reached.
  • The extraction parameter can be set according to requirements; for example, the extraction parameter can be set to 25 frames (about 1 second), that is, starting from the start frame image, one video frame image is extracted every 25 frames, stopping when the remaining interval to the end frame image is less than 25 frames.
  • S205 Determine the extracted video frame images as process frame images; the process frame images are the extracted video frame images in the original video other than the start frame image and the end frame image.
  • S206 Compose all the video frame images in the original video from the start frame image, the end frame image, and the process frame images; that is, the start frame image, the end frame image, and all the process frame images are determined as all the video frame images in the original video.
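  • For illustration, the index arithmetic of this extraction rule can be sketched as follows (a hypothetical helper; the 25-frame default mirrors the example above):

```python
def frame_indices(total_frames: int, extraction_parameter: int = 25) -> list:
    """Indices of the start frame image, the process frame images taken at
    every interval of the extraction parameter, and the end frame image."""
    start, end = 0, total_frames - 1
    indices = [start]
    i = start + extraction_parameter
    while i < end:                  # stop once the remaining gap to the
        indices.append(i)           # end frame image is below the interval
        i += extraction_parameter
    if end > start:                 # avoid duplicating a one-frame video
        indices.append(end)
    return indices

# frame_indices(100) -> [0, 25, 50, 75, 99]
```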
  • The image recognition model is a trained neural network model that includes the image binarization processing, the edge detection processing, the contour tracking processing, the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm.
  • The image recognition model can identify the license plate area, the face area, and the human body feature area in the video frame image through the image binarization processing, the edge detection processing, and the contour tracking processing; the license plate number result of the license plate area can be recognized through the character segmentation algorithm; the face result of the face area can be recognized through the YOLO algorithm; and the human body feature result of the human body feature area can be recognized through the chain code curvature algorithm.
  • The YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN (Convolutional Neural Network) operation to directly predict the categories and regions of different targets.
  • In an embodiment, before step S30, that is, before inputting the extracted video frame images into the image recognition model, the method includes:
  • S301 Acquire a sample training image set; the sample training image set includes several sample training images, and the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and a number of negative sample images; each of the sample training images is associated with a sample training label.
  • The sample training image set is the collection of the sample training images; the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and at least one of the negative sample images. The negative sample images are collected images that contain none of the license plate, the face, and the specific human posture.
  • Each sample training image is associated with one sample training label; the sample training label can be set according to requirements. For example, the sample training label can be set to include correlated and non-correlated, or set to include correlated license plate, correlated face, correlated human body feature, and non-correlated, and so on.
  • S302 Input the sample training images into a deep convolutional neural network model containing initial parameters, extract text features, face features, and human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features.
  • The text features are features related to the color, shape, and text of the license plate;
  • the face features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face contour;
  • the human body posture features are features related to the target person's head, face, arms, shoulders, and so on;
  • the training result represents whether the sample training image contains any one of the license plate number, the face image, and the human body feature map.
  • That is, the deep convolutional neural network model extracts the text features from all the license plate area images and outputs the training results corresponding to the license plate area images according to the extracted text features; the deep convolutional neural network model extracts the face features from all the face area images and outputs the training results corresponding to the face area images according to the extracted face features; the deep convolutional neural network model extracts the human body posture features from all the human body feature area images and outputs the training results corresponding to the human body feature area images according to the extracted human body posture features; and the deep convolutional neural network model extracts the text features, the face features, and the human body posture features from all the negative sample images and outputs the training results corresponding to the negative sample images according to the extracted features.
  • S303 Match the training result with the sample training label to obtain a loss value. The loss function can be set according to requirements; it may be a multi-class cross-entropy loss function or a regression loss function, and the loss value is calculated through the loss function.
  • S304 When the loss value reaches a preset convergence condition, record the converged deep convolutional neural network model as the image recognition model. The preset convergence condition may be that the loss value is very small and no longer decreases after 7000 calculations; that is, when the loss value is very small and no longer decreases after 7000 calculations, training is stopped, and the converged deep convolutional neural network model is recorded as the image recognition model. The preset convergence condition may also be that the loss value is less than a set threshold; that is, when the loss value is less than the set threshold, training is stopped, and the converged deep convolutional neural network model is recorded as the image recognition model.
  • S305 When the loss value does not reach the preset convergence condition, iteratively update the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and record the converged deep convolutional neural network model as the image recognition model. In this way, the initial parameters of the deep convolutional neural network model are continuously updated to move closer to accurate recognition results, so that the accuracy of the recognition results becomes higher and higher.
  • The license plate area images, the face area images, and the human body feature area images output by inputting the sample video into the sample collection model are used as the sample training images, and the image recognition model is obtained by training on these sample training images. The image recognition model is therefore more sensitive to the data of the target data set, which makes the image recognition model more targeted and improves the recognition accuracy rate and the recognition reliability.
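  • A minimal training-loop sketch of this procedure, assuming PyTorch and the multi-class cross-entropy loss (the framework, optimizer, and threshold are our assumptions, not the application's):

```python
import torch
import torch.nn as nn

def train_image_recognition_model(model: nn.Module, loader,
                                  epochs: int = 50,
                                  threshold: float = 1e-3) -> nn.Module:
    """Iteratively update the initial parameters until the loss value
    reaches the preset convergence condition (here: below a threshold)."""
    criterion = nn.CrossEntropyLoss()   # multi-class cross-entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        for images, labels in loader:   # sample training images and labels
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()            # update the model parameters
            if loss.item() < threshold: # convergence condition reached
                return model
    return model
```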
  • In an embodiment, the step S30, that is, inputting the extracted video frame images into the image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area, includes:
  • S306 Binarize the video frame image by using the image recognition model to obtain a grayscale image.
  • That is, the grayscale value of each pixel in the video frame image is calculated through a grayscale algorithm; the grayscale values range from 0 to 255. This is the binarization processing.
  • The grayscale values are arranged according to the positions of their corresponding pixels to obtain the grayscale image.
  • S307 Perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image.
  • The Canny algorithm (Canny edge extraction algorithm) is an algorithm for extracting the edges of objects.
  • Because it uses a variational method, the Canny algorithm is less susceptible to noise interference and can detect true weak edges.
  • The Canny algorithm first convolves the image with a two-dimensional Gaussian filter function for noise reduction, reducing the noise of each pixel in the image until it causes no interference; secondly, for each point on the edge contour, it calculates the first-order partial derivatives in two directions and the gradient, obtaining the gradient direction of the point; then it determines the adjacent points according to the 0-degree, 45-degree, 90-degree, or 135-degree direction corresponding to the gradient direction; finally, it calculates the difference between the grayscale value of the point and those of the adjacent points, determines the edges according to the difference, and obtains the final edge image.
  • S308 Perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image.
  • The contour tracking processing starts from a starting point, follows the edge route near the starting point to find the points that meet the tracking criterion, and continues along those points until the points that meet the node criterion are found; the contours in the edge image are thereby clearly marked, and the contour image is obtained.
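  • Steps S306-S308 correspond closely to a standard grayscale / Canny / contour pipeline; a non-authoritative OpenCV 4 sketch (the thresholds are placeholder assumptions):

```python
import cv2

def extract_contours(video_frame):
    """Grayscale conversion (S306), Canny edge detection (S307), and
    contour extraction standing in for the tracking/node-criterion
    tracing (S308)."""
    gray = cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)  # grayscale image
    edges = cv2.Canny(gray, 100, 200)                     # edge image
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return gray, edges, contours
```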
  • S309 Analyze the contour image, and cut out the license plate area including the license plate, the face area including the human face, and the human body feature area including the human body posture.
  • That is, the contour image containing the license plate, the human face, or the human posture is analyzed, and the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the human body posture are determined from the analysis results and intercepted.
  • The image recognition model extracts the text features of the license plate area through the character segmentation algorithm and a character recognition method to obtain the license plate number result.
  • The character segmentation algorithm segments the license plate area image into images of individual characters; the character recognition method recognizes the letters, Chinese characters, and numbers in the segmented single-character images. That is, the image recognition model extracts the text features of the license plate area through the character segmentation algorithm and the character recognition method, and determines the license plate number result according to the text features.
  • The license plate number result indicates whether the license plate number recognized from the license plate area is the same as the license plate number in the target data set; the license plate number result can be set according to requirements. For example, the license plate number result may include being the same as the target person's license plate number and being different from the target person's license plate number.
  • The YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN operation to directly predict the categories and regions of different targets; the image recognition model extracts the face features of the face area through the YOLO algorithm.
  • The face features are features related to the eyes, eyebrows, mouth, nose, ears, and face contour of the target person, and the face result is determined according to the face features.
  • The face result represents whether the face area contains the target person's face; the face result can be set according to requirements. For example, the face result may include being correlated with the target person's face and being non-correlated with the target person's face.
  • The image recognition model recognizes the shoulder width scale, the neck scale, and the chest width scale of the human body feature area through the chain code curvature algorithm, and extracts the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
  • The chain code curvature algorithm is an algorithm that calculates the curvature values corresponding to the four-connected or eight-connected chain codes of the points on the contour edge of the human body feature area.
  • Each point on the contour edge is recognized by the image recognition model; the shoulder width scale, the neck scale, and the chest width scale are recognized, the human body posture features are extracted according to the ratios between the shoulder width scale, the neck scale, and the chest width scale, and the human body feature result is determined according to the vector values corresponding to the human body posture features.
  • The human body feature result indicates whether the human body feature area contains the human body posture of the target person; the human body feature result can be set according to requirements. For example, the human body feature result may include being correlated with the target person's human body posture and being non-correlated with the target person's human body posture.
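  • The application does not spell out its chain code computation; for orientation only, a generic eight-connected Freeman chain code and its curvature (direction change) can be sketched as:

```python
import numpy as np

# Eight-connected chain code directions as (dx, dy) unit steps.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(contour: np.ndarray) -> list:
    """Chain code of an ordered, 8-connected contour point sequence
    (adjacent points must differ by one pixel step)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (int(np.sign(x1 - x0)), int(np.sign(y1 - y0)))
        codes.append(DIRECTIONS.index(step))
    return codes

def curvature(codes: list) -> list:
    """Curvature value at each point: change in chain-code direction
    between successive steps, wrapped into [-4, 3]."""
    return [((b - a + 4) % 8) - 4 for a, b in zip(codes, codes[1:])]
```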
  • In this way, this application obtains the license plate area, the face area, and the human body feature area after binarization processing of the video frame image, edge detection processing based on the Canny algorithm, and contour tracking processing; it uses the character segmentation algorithm to recognize the license plate number result of the license plate area, the YOLO algorithm to recognize the face result of the face area, and the chain code curvature algorithm to recognize the human body feature result of the human body feature area, realizing rapid and accurate recognition of the license plate number result, the face result, and the human body feature result, and improving accuracy and reliability.
  • S40 Determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map.
  • If the license plate number result is different from the target person's license plate number, the face result is non-correlated with the target person's face, and the human body feature result is non-correlated with the target person's body posture, it is determined that the recognition result corresponding to the video frame image is non-correlated with the license plate number, the face image, and the human body feature map. If the license plate number result is the same as the target person's license plate number, or the face result is correlated with the target person's face, or the human body feature result is correlated with the target person's body posture, it is determined that the recognition result corresponding to the video frame image is correlated with the license plate number, the face image, and the human body feature map. In this way, a video frame image of the target person driving a vehicle with the face partially occluded can still be determined as a correlated image, and videos related to the target person can be identified more accurately.
  • In an embodiment, the step S40, that is, determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, includes:
  • S401 Determine the license plate number result, the face result, and the human body feature result of the video frame image as a recognition set of the video frame image.
  • The recognition set is the set that includes the license plate number result, the face result, and the human body feature result of the video frame image.
  • S402 If any item in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is correlated with the license plate number, the face image, and the human body feature map.
  • S403 If no item in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is non-correlated with the license plate number, the face image, and the human body feature map.
  • In this way, the present application can quickly and accurately identify, through the image recognition model, whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, obtaining a correlated or non-correlated recognition result, which improves the efficiency and reliability of recognition.
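  • The decision logic of S401-S403 reduces to a logical OR over the recognition set; a trivial sketch (the names are ours):

```python
def recognition_result(plate_matches: bool, face_matches: bool,
                       body_matches: bool) -> str:
    """Correlated if any item of the recognition set matches the license
    plate number, the face image, or the human body feature map."""
    if plate_matches or face_matches or body_matches:
        return "correlated"
    return "non-correlated"
```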
  • S50 According to the recognition results corresponding to the video frame images in each of the original videos, extract multiple matching segments from each of the original videos, and associate each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video.
  • That is, according to the video frame images whose recognition results are correlated and the video frame images whose recognition results are non-correlated in each of the original videos, multiple matching segments are extracted from each of the original videos, and the matching segments are associated with the corresponding unique identification codes.
  • The unique identification code can be set according to requirements; for example, the unique identification code may be a time value on the time axis accurate to the second, or a combination code combining the unique identification code of the photographing device with the time value.
  • The matching segment is a video segment that contains any one of the license plate number, the face image, and the human body feature map.
  • In an embodiment, the step S50, that is, extracting multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video and associating the matching segments with the unique identification code associated with the corresponding original video, includes:
  • S501 Acquire the original video.
  • S502 Acquire all the starting point frame images from the original video, and acquire all the ending point frame images from the original video in chronological order.
  • The starting point frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the ending point frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video. All the starting point frame images and all the ending point frame images are obtained from the original video.
  • S503 If neither a starting point frame image nor an ending point frame image exists in the original video, determine that there is no matching video segment in the original video; that is, mark the original video as containing no matching segment.
  • S504 If starting point frame images and ending point frame images exist in the original video, cut out the video segment between each starting point frame image and the corresponding ending point frame image, and mark the video segment as a matching segment of the original video.
  • The matching segment is associated with the unique identification code associated with the original video; the unique identification code can be set according to requirements. For example, the unique identification code can be a time value accurate to the second.
  • In this way, an interception method for obtaining matching segments is provided, which can accurately intercept the required video segments, improving efficiency and reducing costs.
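  • For illustration, the starting point / ending point logic can be expressed over a list of per-frame correlated flags (a sketch under the definitions above; the representation is an assumption):

```python
def matching_segments(correlated: list) -> list:
    """Return (start, end) index pairs: a segment starts at a correlated
    frame whose previous frame is non-correlated, and ends at the first
    following non-correlated frame, or at the last frame of the video."""
    segments, start = [], None
    for i, related in enumerate(correlated):
        if related and start is None:
            start = i                    # starting point frame image
        elif not related and start is not None:
            segments.append((start, i))  # ending point frame image
            start = None
    if start is not None:                # video ends while still correlated
        segments.append((start, len(correlated) - 1))
    return segments

# matching_segments([False, True, True, False, True]) -> [(1, 3), (4, 4)]
```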
  • S60 Splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that unique identification code.
  • S70 Sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • The unique identification code order rule is a sorting rule preset for the unique identification codes according to requirements; for example, the order rule may follow the order of the road route the target person drove, or the chronological order of the target person's driving. All the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • In this way, the target video of the target person is provided to the responsible person and used as evidence of responsibility, without manual viewing and interception and without being affected by the mental state of the staff; therefore, the investment cost is reduced and work efficiency is improved.
  • This application realizes that, by acquiring the target data set containing the license plate number, the face image, and the human body feature map and the to-be-processed video set containing multiple original videos, video frame images are extracted from the original videos and subjected to image binarization processing, edge detection processing, and contour tracking processing; the relevant features are extracted through the combination of the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm to obtain the recognition results, which characterize whether the video frame images contain any one of the license plate number, the face image, and the human body feature map; multiple matching segments are extracted according to the recognition results and spliced to obtain video synthesis segments; and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set. Thus, video segments related to the target data set are automatically, quickly, and accurately cut out from the to-be-processed video set, which improves recognition efficiency and accuracy and greatly reduces input costs.
  • a video data processing device is provided, and the video data processing device corresponds to the video data processing method in the above-mentioned embodiment one-to-one.
  • the video data processing device includes a receiving module 11, an extraction module 12, an acquisition module 13, an identification module 14, an extraction module 15, a merging module 16 and a determination module 17.
  • the detailed description of each functional module is as follows:
  • The receiving module 11 is configured to receive a video extraction instruction to obtain a target data set and a to-be-processed video set;
  • the target data set includes a license plate number, a face image, and a human body feature map; the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • the extraction module 12 is configured to extract at least two video frame images from each of the original videos according to a preset extraction rule
  • the acquisition module 13 is configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • the recognition module 14 is configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • the extraction module 15 is configured to extract multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • the merging module 16 is configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • the determining module 17 is configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • the receiving module 11 includes:
  • the receiving unit is configured to receive a target collection instruction and obtain a sample video;
  • the sample video includes a license plate, a human face, and a specific human posture;
  • the first input unit is configured to input all the sample images into a sample collection model, where the sample collection model performs image collection on all the sample images, intercepting from all the sample images the license plate area images containing the license plate, the face area images containing the face, and the human body feature area images containing both the face and the specific human posture;
  • the first acquiring unit is configured to input the license plate area image into a license plate extraction model, and the license plate extraction model performs license plate number recognition on the license plate area image to obtain a license plate number output by the license plate extraction model;
  • the second acquisition unit is configured to input all the face area images into a face extraction model, where the face extraction model screens all the face area images to obtain the face image selected by the face extraction model;
  • the third acquisition unit is configured to input all the human body feature area images into a human body feature extraction model, where the human body feature extraction model screens all the human body feature area images to obtain the human body feature map selected by the human body feature extraction model;
  • the first output unit is configured to determine the license plate number, the face image, and the human body feature map as the target data set.
  • the extraction module 12 includes:
  • a fourth acquiring unit configured to acquire the original video and the extraction parameters in the extraction rule
  • a first determining unit configured to determine a starting video frame image in the original video as a starting frame image
  • the second determining unit is configured to determine the ending video frame image in the original video as the ending frame image
  • An extracting unit configured to extract a video frame image from the start frame image at intervals of the extraction parameter until the end frame image
  • a third determining unit configured to determine the video frame image after extraction as a process frame image
  • the second output unit is configured to form all the video frame images in the original video from the start frame image, the end frame image, and the process frame images.
  • the acquisition module 13 includes:
  • the fifth acquisition unit is used to acquire a sample training image set; the sample training image set includes a number of sample training images, and the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and a number of negative sample images; each of the sample training images is associated with a sample training label;
  • the second input unit is configured to input the sample training images into a deep convolutional neural network model containing initial parameters, extract the text features, the face features, and the human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features;
  • a loss unit configured to match the training result with the sample training label to obtain a loss value
  • a first convergence unit configured to record the deep convolutional neural network model after convergence as an image recognition model when the loss value reaches a preset convergence condition
  • the second convergence unit is configured to iteratively update the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition,
  • the deep convolutional neural network model after convergence is recorded as an image recognition model.
• the acquisition module 13 further includes:
• a first processing unit configured to perform binarization processing on the video frame image through the image recognition model to obtain a grayscale image;
• a second processing unit configured to perform edge detection processing on the grayscale image through the Canny algorithm to obtain an edge image;
• a third processing unit configured to perform contour tracking processing on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
• an analysis unit configured to analyze the contour image and cut out the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the posture of the human body;
• a first extraction unit configured to extract, through the image recognition model, the text features of the license plate area by using a character segmentation algorithm and a character recognition method, to obtain the license plate number result;
• a second extraction unit configured to extract, through the image recognition model, the facial features of the face area by using the YOLO algorithm, to obtain the face result;
• a third output unit configured to identify, through the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature area by using the chain code curvature algorithm, extract the human body posture feature from these scales, and obtain the human body feature result.
  • the identification module 14 includes:
  • a fourth determining unit configured to determine the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
• the fifth determining unit is configured to determine, if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map;
• the sixth determining unit is configured to determine, if no item in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is not related to the license plate number, the face image, and the human body feature map.
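For illustration only, the following Python sketch shows the decision rule embodied by the fourth to sixth determining units: a video frame image is marked related when any item of its recognition set matches the target license plate number, face image, or human body feature map. The RecognitionSet fields, the similarity scores, and the thresholds are hypothetical stand-ins for the model outputs described above; the application itself does not prescribe any implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionSet:
    plate_result: Optional[str]  # license plate number read from the frame, if any
    face_score: float            # similarity of the detected face to the target face image
    body_score: float            # similarity of the detected posture to the target feature map

def is_related(rec: RecognitionSet, target_plate: str,
               face_thresh: float = 0.8, body_thresh: float = 0.8) -> bool:
    """A frame is related if any item matches one of the three target items."""
    if rec.plate_result is not None and rec.plate_result == target_plate:
        return True  # the license plate number result matches the target plate
    return rec.face_score >= face_thresh or rec.body_score >= body_thresh

print(is_related(RecognitionSet("ABC123", 0.1, 0.2), "ABC123"))  # True
```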
  • the extraction module 15 includes:
• a sixth acquiring unit configured to acquire the original video;
• a seventh acquiring unit configured to acquire, in chronological order, all the start point frame images and all the end point frame images from the original video;
• a seventh determining unit configured to determine that the original video is a non-matching segment if neither the start point frame image nor the end point frame image exists in the original video;
• an eighth determining unit configured to record the start point frame image as the matching segment of the original video if there is only one start point frame image and no end point frame image in the original video;
• a ninth determining unit configured to record the end point frame image as the matching segment of the original video if there is only one end point frame image and no start point frame image in the original video;
• a tenth determining unit configured to acquire, if an end point frame image exists adjacent to and after a start point frame image, the video segment between the start point frame image and the end point frame image, and record the video segment between the start point frame image and the end point frame image as a matching segment of the original video;
• an associating unit configured to associate all the matching segments with the unique identification code associated with the original video.
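The segment-pairing rule implemented by the seventh to tenth determining units can be sketched as follows; frame indices stand in for positions in the original video, and treating a related first frame as a start point frame image is an assumption, since that frame has no previous frame to compare against.

```python
from typing import List, Tuple

def matching_segments(related: List[bool]) -> List[Tuple[int, int]]:
    """related[i] is the recognition result of frame i; returns (start, end) pairs."""
    segments, start = [], None
    for i, rel in enumerate(related):
        prev = related[i - 1] if i > 0 else False
        if rel and not prev:
            start = i                          # start point frame image
        elif not rel and prev and start is not None:
            segments.append((start, i))        # end point frame image found
            start = None
    if start is not None:
        segments.append((start, len(related) - 1))  # video ends while still related
    return segments

print(matching_segments([False, True, True, False, True]))  # [(1, 3), (4, 4)]
```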
  • Each module in the above-mentioned video data processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
• the above-mentioned modules may be embedded, in hardware form, in or independently of the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a video data processing method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and capable of running on the processor.
  • the processor executes the computer program to implement the video data processing method in the foregoing embodiment.
  • a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the video data processing method in the above-mentioned embodiment.
• the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created based on the use of blockchain nodes, and the like.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), and electrically programmable ROM (EPROM).
  • Volatile memory may include random access memory (RAM) or external cache memory.
• RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), synchronous link DRAM (SLDRAM), memory bus dynamic RAM (RDRAM), and so on.
• the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
• blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Abstract

Disclosed are a video data processing method and apparatus, and a computer device and a storage medium. The method comprises: acquiring a set of target data and a set of videos to be processed; extracting at least two video frame images from each original video; an image identification model performing image binarization processing, edge detection processing and contour tracking processing on a video image to obtain a license plate area, a facial area and a human-body feature area; extracting, by means of a character segmentation algorithm, a text feature from the license plate area to obtain a license plate number result; extracting, by means of a YOLO algorithm, a facial feature from the facial area to obtain a facial result; extracting, by means of a chain-code curvature algorithm, a human-body posture feature from the human-body feature area to obtain a human-body feature result; determining an identification result; extracting matching fragments; and performing splicing to obtain video composite fragments, and ranking the video composite fragments to obtain a target video. In addition, the present application further relates to blockchain technology. The set of target data may be stored in a blockchain node.

Description

Video data processing method and apparatus, and computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 24, 2020 with application No. CN202010332455.9 and entitled "Video data processing method and apparatus, and computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence data processing, and in particular to a video data processing method, apparatus, computer device, and storage medium.
Background
At present, with the continuous development of traffic surveillance video, the amount of surveillance video data keeps increasing. In the prior art, the inventor realized that after a traffic accident occurs, surveillance video data needs to be retrieved for tracking and determining responsibility. For the most part this is done by manual viewing and interception with the naked eye, and the intercepted video clips are spliced together as evidence for determining responsibility. Because the license plates, faces, and human bodies captured in surveillance video data are scattered and occupy only a small area, they are difficult for staff to find; work efficiency is affected by the staff's mental state, and omissions that lead to improper determination of responsibility easily occur, resulting in high input costs and low work efficiency.
Summary
This application provides a video data processing method, apparatus, computer device, and storage medium, which can automatically, quickly, and accurately cut out the video segments related to a target data set from a video set to be processed. This application can be applied to the field of smart transportation, thereby promoting the construction of smart cities, improving recognition efficiency and accuracy, and greatly reducing input costs.
A video data processing method, including:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A video data processing apparatus, including:
a receiving module, configured to receive a video extraction instruction, and acquire a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
an extraction module, configured to extract at least two video frame images from each of the original videos according to a preset extraction rule;
an acquisition module, configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area, extracts text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area, extracts facial features from the face area through a YOLO algorithm to obtain a face result of the face area, and extracts human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
a recognition module, configured to determine, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
an extraction module, configured to extract, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associate the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
a merging module, configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that same unique identification code;
a determining module, configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the following steps are implemented when the processor executes the computer program:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A computer-readable storage medium storing a computer program, where the following steps are implemented when the computer program is executed by a processor:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
This application can automatically, quickly, and accurately cut out the video segments related to the target data set from the to-be-processed video set, which improves recognition efficiency and accuracy and greatly reduces input costs.
Description of the Drawings
FIG. 1 is a schematic diagram of an application environment of a video data processing method in an embodiment of this application;
FIG. 2 is a flowchart of a video data processing method in an embodiment of this application;
FIG. 3 is a flowchart of step S10 of the video data processing method in an embodiment of this application;
FIG. 4 is a flowchart of step S20 of the video data processing method in an embodiment of this application;
FIG. 5 is a flowchart of step S30 of the video data processing method in an embodiment of this application;
FIG. 6 is a flowchart of step S30 of the video data processing method in another embodiment of this application;
FIG. 7 is a flowchart of step S40 of the video data processing method in an embodiment of this application;
FIG. 8 is a functional block diagram of a video data processing apparatus in an embodiment of this application;
FIG. 9 is a schematic diagram of a computer device in an embodiment of this application;
The realization, functional characteristics, and advantages of the purpose of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only a part, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
The video data processing method provided by this application can be applied in the application environment shown in FIG. 1, in which a client (computer device) communicates with a server through a network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a video data processing method is provided, and its technical solution mainly includes the following steps S10 to S70:
S10: Receive a video extraction instruction, and acquire a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code.
Understandably, the video extraction instruction is an instruction triggered after the target data set and the to-be-processed video set are selected. The target data set is a data set related to the target person being searched for, where the target person is the person who needs to be found; it includes a license plate number, a face image, and a human body feature map. The license plate number is the unique license plate number of the vehicle driven by the target person; the face image is an image of the target person's face region; the human body feature map is an image of a specific human body posture of the target person, for example, an image of the target person's upper body while driving a vehicle. The to-be-processed video set is the collection of videos to be searched for content related to the target person, and it contains at least one original video. An original video may be an unprocessed video segment or an intercepted video segment, for example, the surveillance video of a certain intersection on a certain day, or the surveillance video of a certain street from 19:00 to 21:00 on a certain day. Each original video is associated with a unique identification code, which is the unique identifier assigned to that original video and can be set according to requirements.
In an embodiment, as shown in FIG. 3, before step S10, that is, before the acquiring of the target data set, the method includes:
S101: Receive a target collection instruction, and acquire a sample video; the sample video contains a license plate, a human face, and a specific human body posture.
Understandably, the target collection instruction is an instruction triggered when information about the target person needs to be collected. The sample video is a short video clip related to the target person, whose content contains a license plate, the target person's face, and a specific human body posture of the target person, which means that the license plate, the face, and the specific human body posture each appear at least once in the sample video.
S102: Split the sample video into several sample images.
Understandably, according to a preset splitting parameter, one sample image is obtained from the sample video at every interval of the splitting parameter; a sample image is an image selected from the sample video. The splitting parameter can be set according to requirements, for example, to 30 frames or 25 frames, so that the sample video is split into multiple sample images.
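A minimal sketch of this splitting step, assuming OpenCV as the video-reading library (the application does not name one); the file name and the 30-frame splitting parameter are illustrative.

```python
import cv2

def split_video(path: str, interval: int = 30) -> list:
    """Return one sample image (a BGR array) every `interval` frames of the video."""
    cap = cv2.VideoCapture(path)
    samples, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                  # end of the sample video
            break
        if index % interval == 0:   # keep one frame per splitting interval
            samples.append(frame)
        index += 1
    cap.release()
    return samples

# Usage with a hypothetical file name:
# sample_images = split_video("sample_video.mp4", interval=30)
```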
S103: Input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepts from all the sample images the license plate area images containing the license plate, intercepts from all the sample images the face region images containing the human face, and extracts from all the sample images the human body feature region images containing the human face and the specific human body posture.
Understandably, the sample collection model refers to a neural network model trained to identify the license plate area, the face area, and the human body feature area in an image. Image collection means identifying and cutting out, from a sample image, the rectangular license plate area image containing the license plate, the rectangular face region image containing the face, and the rectangular human body feature region image containing the face and the specific human body posture. The network structure of the sample collection model can be set according to requirements; for example, it can be a network structure of the Inception series or of the VGG series.
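The cropping half of the image collection step can be sketched as follows; the (x, y, width, height) box format and the box values are assumptions standing in for the rectangular regions that the sample collection model would detect.

```python
import numpy as np

def crop_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop a rectangular region (x, y, width, height) out of an image array."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

sample = np.zeros((720, 1280, 3), dtype=np.uint8)     # stand-in sample image
plate_img = crop_region(sample, (400, 500, 200, 60))  # license plate area image
face_img = crop_region(sample, (600, 100, 120, 150))  # face region image
body_img = crop_region(sample, (550, 80, 300, 400))   # human body feature region image
print(plate_img.shape, face_img.shape, body_img.shape)
```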
S104: Input the license plate area image into a license plate extraction model; the license plate extraction model performs license plate number recognition on the license plate area image to obtain a license plate number output by the license plate extraction model.
Understandably, the license plate extraction model refers to a trained neural network model for recognizing the license plate number in an image. Its network structure can be set according to requirements; for example, it can be obtained through transfer learning from GoogleNet. The license plate extraction model can recognize the Chinese characters, digits, and letters in the license plate area image; the license plate number is the unique identifier of the vehicle driven by the target person and consists of Chinese characters, digits, and letters.
S105: Input all the face region images into a face extraction model; the face extraction model screens all the face region images to obtain one face image selected by the face extraction model.
Understandably, through the face extraction model, the face region image that contains the eyes, eyebrows, mouth, nose, ears, and face contour and has the highest definition is selected from all the face region images. The screening method can be set according to requirements. Preferably, the screening method is: obtain the average of the pixel values of the pixels at the same position across all the face region images, and record this average as the pixel average corresponding to that position; for each face region image, take the absolute value of the difference between each pixel and the pixel average corresponding to that pixel to obtain an absolute pixel difference, and calculate the sum of the absolute pixel differences of all pixels in the image as the image difference value; then select the face region image with the smallest image difference value among all the images and determine it as the face image, that is, determine the screened face region image as the face image.
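A sketch of this preferred screening rule; resizing the face region images to a common shape beforehand is an assumption the application does not state explicitly.

```python
import numpy as np

def select_face_image(face_regions: list) -> np.ndarray:
    """Return the face region image whose pixels are closest to the per-position average."""
    stack = np.stack([r.astype(np.float64) for r in face_regions])
    mean_image = stack.mean(axis=0)                            # pixel average per position
    diffs = np.abs(stack - mean_image).reshape(len(stack), -1).sum(axis=1)
    return face_regions[int(np.argmin(diffs))]                 # smallest image difference value
```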
S106: Input all the human body feature region images into a human body feature extraction model; the human body feature extraction model screens all the human body feature region images to obtain one human body feature map selected by the human body feature extraction model.
Understandably, through the human body feature extraction model, the human body feature region image with the highest definition that contains the face and the specific human body posture is selected from all the human body feature region images. The screening method can be set according to requirements. Preferably, the screening method is: determine, for each pixel in each human body feature region image, the difference between the pixel value of that pixel and the sum of the pixel values of its surrounding neighboring pixels as a local pixel difference; obtain the sum of all the local pixel differences in each human body feature region image; then select the human body feature region image with the smallest sum of local pixel differences among all the images and determine it as the human body feature map, that is, determine the screened human body feature region image as the human body feature map.
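A sketch of this preferred screening rule; grayscale input, zero padding at the image borders, and taking the absolute value of each local difference are assumptions not stated above.

```python
import numpy as np

def local_difference_sum(gray: np.ndarray) -> float:
    """Sum, over all pixels, of |pixel value - sum of its four neighboring pixel values|."""
    g = gray.astype(np.float64)
    padded = np.pad(g, 1)  # zero-pad so every pixel has four neighbors
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:])
    return float(np.abs(g - neighbors).sum())

def select_body_feature_map(regions: list) -> np.ndarray:
    """Return the region image with the smallest sum of local pixel differences."""
    sums = [local_difference_sum(r) for r in regions]
    return regions[int(np.argmin(sums))]
```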
S107: Determine the license plate number, the face image, and the human body feature map as the target data set.
Understandably, the license plate number, the face image, and the human body feature map are recorded as the target data set; that is, the target data set contains a collection of information related to the target person's license plate number, face, and specific human body posture.
In this way, image collection is performed on the sample video through the sample collection model, and the license plate number, the face image, and the human body feature map are obtained from the collected images through the license plate extraction model, the face extraction model, and the human body feature extraction model, so that the license plate number is automatically recognized from the sample video and the target person's face and human body feature maps are automatically cut out, which improves the accuracy of recognition and the reliability of interception, reduces labor costs, and improves efficiency.
S20: Extract at least two video frame images from each of the original videos according to a preset extraction rule.
Understandably, the extraction rule can be set according to requirements; for example, it can be set to extract at least two video frame images evenly from the original video, or to extract one video frame image at every interval of a preset frame parameter in the original video. A video frame image is the image corresponding to a frame in the original video.
In an embodiment, as shown in FIG. 4, step S20, that is, the extracting of at least two video frame images from each of the original videos according to the preset extraction rule, includes:
S201: Acquire the original video and the extraction parameter in the extraction rule.
Understandably, the extraction rule can be set according to requirements, and its purpose is to extract at least two video frame images from each original video. For example, the extraction rule may be to acquire the starting video frame image and the ending video frame image of the original video, as well as the video frame image at the bisecting center position of the original video; the extraction rule may also be to acquire the starting video frame image and the ending video frame image of the original video, and to select one video frame image at every interval of a preset extraction parameter in the middle part of the original video. The extraction parameter can be set according to requirements, for example, to 15 frames, 25 frames, or 30 frames.
S202: Determine the starting video frame image in the original video as the start frame image.
Understandably, the video frame image corresponding to the first frame in the original video is determined as the start frame image.
S203: Determine the ending video frame image in the original video as the end frame image.
Understandably, the video frame image corresponding to the last frame in the original video is determined as the end frame image.
S204: Starting from the start frame image, extract one video frame image at every interval of the extraction parameter, until the end frame image.
Understandably, the extraction parameter can be set according to requirements; for example, it can be set to 25 frames (about 1 second), that is, starting from the start frame image, one video frame image is extracted every 25 frames, stopping when the end frame image is reached or fewer than 25 frames remain.
S205: Determine the extracted video frame images as process frame images.
Understandably, the process frame images are the extracted video frame images in the original video other than the start frame image and the end frame image.
S206: Compose all the video frame images in the original video from the start frame image, the end frame image, and the process frame images.
Understandably, the start frame image, the end frame image, and all the process frame images are determined as all the video frame images in the original video.
In this way, a method for extracting video frame images from an original video is provided, at least two video frame images are guaranteed to be extracted from each original video, and the extraction rule avoids the problem of insufficient extraction caused by omissions during recognition.
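A sketch of steps S201 to S206, again assuming OpenCV; the 25-frame extraction parameter follows the example above, and appending the last frame separately ensures the end frame image is always included.

```python
import cv2

def extract_video_frames(path: str, interval: int = 25) -> list:
    """Keep the start frame image, every `interval`-th frame, and the end frame image."""
    cap = cv2.VideoCapture(path)
    frames, index, last = [], 0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % interval == 0:  # start frame image and process frame images
            frames.append(frame)
        last = frame               # remember the ending video frame image
        index += 1
    cap.release()
    if last is not None and (index - 1) % interval != 0:
        frames.append(last)        # end frame image not yet included
    return frames
```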
S30: Input the extracted video frame images into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain the license plate area, the face area, and the human body feature area; extract the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; at the same time, extract the facial features of the face area through the YOLO algorithm to obtain the face result of the face area; and extract the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area.
Understandably, the image recognition model is a trained neural network model incorporating the image binarization processing, the edge detection processing, the contour tracking processing, the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm. The image recognition model can use the image binarization processing, the edge detection processing, and the contour tracking processing to identify the license plate area, the face area, and the human body feature area in a video image; it can then recognize the license plate number result of the license plate area through the character segmentation algorithm, recognize the face result of the face area through the YOLO algorithm, and finally recognize the human body feature result of the human body feature area through the chain code curvature algorithm.
The YOLO (You Only Look Once) algorithm is an algorithm that uses a single CNN (Convolutional Neural Network) operation to directly predict the categories and regions of different targets.
In an embodiment, as shown in FIG. 5, before step S30, that is, before the inputting of the extracted video frame images into the image recognition model, the method includes:
S301: Acquire a sample training image set; the sample training image set contains several sample training images, and the sample training images include all the license plate area images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label.
Understandably, the sample training image set is the collection of the sample training images, which include all the license plate area images, all the face region images, all the human body feature region images, and at least one negative sample image. A negative sample image is a collected image that contains neither the license plate, nor the face, nor the specific human body posture. Each sample training image is associated with a sample training label, which can be set according to requirements; for example, the labels can be set to include related and not related, or to include related license plate, related face, related human body feature, and not related, and so on.
S302: Input the sample training images into a deep convolutional neural network model containing initial parameters, extract text features, face features, and human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features.
Understandably, the text features are features related to the color, shape, and characters of the license plate; the face features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face contour; and the human body posture features are features related to the target person's head, face, arms, shoulders, and so on. The training result characterizes whether the sample training image contains any one of the license plate number, the face image, and the human body feature map. The deep convolutional neural network model extracts the text features from all the license plate area images and outputs the training results corresponding to the license plate area images according to the extracted text features; it extracts the face features from all the face region images and outputs the training results corresponding to the face region images according to the extracted face features; it extracts the human body posture features from all the human body feature region images and outputs the training results corresponding to the human body feature region images according to the extracted human body posture features; and it extracts the text features, face features, and human body posture features from each negative sample image and outputs the training result corresponding to the negative sample image according to the extracted features.
S303: Match the training result with the sample training label to obtain a loss value.
Understandably, the training result and the sample training label are input into the loss function of the deep convolutional neural network model. The loss function can be set according to requirements and may be a multi-class cross-entropy loss function or a regression loss function; the loss value is calculated through the loss function.
S304: When the loss value reaches a preset convergence condition, record the deep convolutional neural network model after convergence as the image recognition model.
The preset convergence condition may be the condition that, after 7000 calculations, the loss value is very small and no longer decreases; that is, when the loss value is very small and no longer decreases after 7000 calculations, the training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model. The preset convergence condition may also be the condition that the loss value is less than a set threshold; that is, when the loss value is less than the set threshold, the training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model.
S305: When the loss value does not reach the preset convergence condition, iteratively update the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and record the deep convolutional neural network model after convergence as the image recognition model.
Understandably, when the loss value does not reach the preset convergence condition, the initial parameters of the deep convolutional neural network model are iteratively updated so that the model keeps moving closer to accurate recognition results, making the accuracy of the recognition results higher and higher.
In this way, the license plate area images, the face sample images, and the human body feature region images output by inputting the sample video into the sample collection model are used as sample training images, and the image recognition model is obtained by training on these sample training images. The image recognition model is therefore more sensitive to the data of the target data set, which makes it more targeted, yields higher recognition accuracy, and improves recognition reliability.
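A sketch of the training loop of steps S302 to S305 in PyTorch (an assumed framework); the model, data loader, optimizer, learning rate, and loss threshold are placeholders for the deep convolutional neural network model, the sample training images, and the preset convergence condition described above.

```python
import torch
from torch import nn

def train_until_converged(model: nn.Module, loader, threshold: float = 0.01,
                          max_epochs: int = 100) -> nn.Module:
    criterion = nn.CrossEntropyLoss()  # e.g. a multi-class cross-entropy loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # match training result with label
            loss.backward()
            optimizer.step()                         # iteratively update the parameters
            if loss.item() < threshold:              # preset convergence condition reached
                return model                         # record as the image recognition model
    return model
```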
In one embodiment, as shown in FIG. 6, step S30, namely inputting the extracted video frame images into the image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region, extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain a face result of the face region, and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region, includes:
S306: Binarize the video frame image by using the image recognition model to obtain a grayscale image.
Understandably, the grayscale algorithm calculates the gray value of each pixel in the video frame image, the gray value ranging from 0 to 255; this constitutes the binarization processing, and arranging all the gray values according to the positions of their corresponding pixels yields the grayscale image.
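A minimal sketch of this conversion, assuming OpenCV; the file name is an assumption for the example:

    import cv2

    frame = cv2.imread("video_frame.jpg")  # assumed input frame
    # A weighted combination of the B, G, and R channels gives every
    # pixel a gray value in [0, 255], arranged at its own position.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)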
S307: Perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image.
Understandably, the Canny algorithm (Canny edge extraction algorithm) is an algorithm for extracting object edges. It uses a variational method so that it is not easily disturbed by noise and can detect the true edges of weak edges. The Canny algorithm first convolves the image with a two-dimensional Gaussian filter function for noise reduction, so that the noise at each pixel of the image is reduced until it no longer interferes; next, it computes the gradient direction at each point on the edge contour from the first-order partial derivatives in two directions and the gradient magnitude; it then determines the neighboring points according to the 0-degree, 45-degree, 90-degree, and 135-degree directions corresponding to the gradient direction; finally, it computes the difference between the gray value of the point and that of its neighboring points and determines the edges from the differences, ultimately obtaining the edge image.
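A sketch of this step with OpenCV's built-in Canny detector; the Gaussian kernel size and the two hysteresis thresholds are assumptions for the example:

    import cv2

    # `gray` is the grayscale image from step S306.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # 2-D Gaussian filtering
    edges = cv2.Canny(blurred, 50, 150)          # hysteresis edge tracing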
S308: Perform contour tracking processing on the edge image according to a tracking criterion and a node criterion to obtain a contour image.
Understandably, the contour tracking processing starts from a starting point and searches along the edge route near that starting point for points that satisfy the tracking criterion, following the points that satisfy the tracking criterion until a point that satisfies the node criterion is found, so that the contours in the edge image are clearly marked and the contour image is obtained.
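As a rough stand-in for the tracking and node criteria described above, OpenCV's contour follower can be sketched as follows; the retrieval mode and drawing parameters are assumptions, not the specific criteria of this application:

    import cv2

    # Follows each edge in `edges` from a starting point until the
    # contour closes or ends, marking the contours explicitly.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour_image = cv2.drawContours(frame.copy(), contours, -1,
                                     (0, 255, 0), 1)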
S309: Analyze the contour image, and crop out the license plate region containing a license plate, the face region containing a face, and the human body feature region containing content related to a human body posture.
Understandably, image analysis is performed on all the contour images, that is, each contour image is analyzed for license plates, faces, or human body postures, and the license plate region containing a license plate, the face region containing a face, and the human body feature region related to a human body posture are determined from the analysis results and cropped out.
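A minimal sketch of the cropping, assuming OpenCV and the contours found above; the area threshold and the downstream region classifier are assumptions for the example:

    import cv2

    def crop_candidate_regions(frame, contours, min_area=400):
        # Every sufficiently large contour is cropped; a separate
        # (assumed) classifier then decides whether the crop shows a
        # license plate, a face, or a posture-related body region.
        regions = []
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            x, y, w, h = cv2.boundingRect(c)
            regions.append(frame[y:y + h, x:x + w])
        return regions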
S310: Through a character segmentation algorithm and a character recognition method, the image recognition model extracts the text features of the license plate region to obtain the license plate number result.
Understandably, the character segmentation algorithm segments the license plate region into single-character images, and the character recognition method recognizes letters, Chinese characters, and digits in each segmented single-character image; that is, the text features of each single-character image are extracted, and the character value corresponding to that image is recognized from the extracted text features. Through the character segmentation algorithm and the character recognition method, the image recognition model extracts the text features of the license plate region and determines the license plate result from those text features. The license plate result indicates whether the license plate number recognized from the license plate region is the same as the license plate number in the target data set, and can be set as required; for example, the license plate result may be either identical to the target person's license plate number or different from the target person's license plate number.
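One common way to realize such character segmentation is a vertical-projection split; the sketch below, assuming OpenCV and NumPy, is illustrative only and is not the specific algorithm claimed here:

    import cv2
    import numpy as np

    def segment_characters(plate_region):
        # Binarize the plate, then split it at columns containing no
        # foreground pixels (vertical-projection segmentation).
        gray = cv2.cvtColor(plate_region, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        projection = binary.sum(axis=0)           # ink per column
        chars, start = [], None
        for x, ink in enumerate(projection):
            if ink and start is None:
                start = x                         # a character begins
            elif not ink and start is not None:
                chars.append(binary[:, start:x])  # one single-character image
                start = None
        if start is not None:
            chars.append(binary[:, start:])
        return chars  # each image then goes to character recognition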
S311: Through the YOLO algorithm, the image recognition model extracts the face features of the face region to obtain the face result.
Understandably, the YOLO (You Only Look Once) algorithm is an algorithm that uses a single CNN operation to directly predict the categories and regions of different targets. The image recognition model extracts the face features of the face region through the YOLO algorithm, the face features being features related to the target person's eyes, eyebrows, mouth, nose, ears, and facial contour, and determines the face result from those face features. The face result characterizes whether the face region contains the target person's face and can be set as required; for example, the face result may be either related to the target person's face or unrelated to the target person's face.
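A sketch of a YOLO-style detection call, using the ultralytics package as one possible implementation; the weights file name "face.pt" is an assumption standing for any face-trained YOLO weights and is not shipped by the library:

    from ultralytics import YOLO

    model = YOLO("face.pt")            # assumed face-trained weights
    results = model(face_region)       # one CNN forward pass (YOLO)
    for box in results[0].boxes:       # predicted face regions
        x1, y1, x2, y2 = box.xyxy[0].tolist()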
S312: Through the chain code curvature algorithm, the image recognition model recognizes a shoulder width scale, a neck scale, and a chest width scale of the human body feature region, and extracts the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
Understandably, the chain code curvature algorithm is an algorithm that calculates the curvature values corresponding to the four-connected or eight-connected chain codes of the points on the contour edges of the human body feature region. The image recognition model examines each contour edge point and recognizes the shoulder width scale, the neck scale, and the chest width scale; it extracts the vector values corresponding to the human body posture features from the ratios between the shoulder width scale, the neck scale, and the chest width scale, and determines the human body feature result from those vector values. The human body feature result characterizes whether the human body feature region contains the target person's body posture and can be set as required; for example, the human body feature result may be either related to the target person's body posture or unrelated to the target person's body posture.
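A minimal sketch of the ratio-based posture vector and the resulting decision, assuming the three scales have already been measured in pixels from the chain code analysis; the tolerance value is an assumption for the example:

    import numpy as np

    def posture_vector(shoulder, neck, chest):
        # Pairwise ratios of the three scales form the vector values
        # corresponding to the human body posture features.
        return np.array([shoulder / neck, chest / neck, shoulder / chest])

    def body_feature_result(vector, target_vector, tol=0.15):
        # "Related" when every ratio lies within a tolerance of the
        # target person's ratios, "unrelated" otherwise.
        return bool(np.all(np.abs(vector - target_vector) <= tol))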
In this application, the license plate region, the face region, and the human body feature region are obtained after the video frame image undergoes binarization processing, Canny-based edge detection processing, and contour tracking processing; the character segmentation algorithm recognizes the license plate number result of the license plate region, the YOLO algorithm recognizes the face result of the face region, and the chain code curvature algorithm recognizes the human body feature result of the human body feature region. The license plate number result, the face result, and the human body feature result are thus recognized quickly and accurately, improving accuracy and reliability.
S40: Determine a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result. The recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either related or unrelated to the license plate number, the face image, and the human body feature map.
Understandably, if the license plate number result is different from the target person's license plate number, the face result is unrelated to the target person's face, and the human body feature result is unrelated to the target person's body posture, the recognition result corresponding to the video frame image is determined to be unrelated to the license plate number, the face image, and the human body feature map. If the license plate number result is the same as the target person's license plate number, or the face result is related to the target person's face, or the human body feature result is related to the target person's body posture, the recognition result corresponding to the video frame image is determined to be related to the license plate number, the face image, and the human body feature map. In this way, a video frame image in which the target person's face is partially occluded while the target person is driving a vehicle can still be determined as a related image, so that videos related to the target person are recognized more accurately.
In one embodiment, as shown in FIG. 7, step S40, namely determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, includes:
S401: Determine the license plate number result, the face result, and the human body feature result of the video frame image as a recognition set of the video frame image.
Understandably, the recognition set is a set containing a plurality of the license plate number results, a plurality of the face results, and a plurality of the human body feature results.
S402: If the recognition set contains a match with any one of the license plate number, the face image, and the human body feature map, determine that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map.
Understandably, as long as any one element of the recognition set matches the license plate number, the face image, or the human body feature map, the recognition result corresponding to the video frame image is determined to be related to the license plate number, the face image, and the human body feature map.
S403: If nothing in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is unrelated to the license plate number, the face image, and the human body feature map.
Understandably, when none of the elements of the recognition set matches the license plate number, the face image, or the human body feature map, the recognition result corresponding to the video frame image is determined to be unrelated to the license plate number, the face image, and the human body feature map.
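The decision of S402 and S403 reduces to a single disjunction; a one-function sketch, with the argument names being illustrative:

    def recognition_result(plate_match, face_match, body_match):
        # S402: any match in the recognition set makes the frame
        # "related"; S403: no match at all makes it "unrelated".
        return "related" if (plate_match or face_match or body_match) \
               else "unrelated"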
In this way, through the image recognition model, this application can quickly and accurately identify whether the video frame image contains any one of the license plate number, the face image, and the human body feature map and obtain a related or unrelated recognition result, improving recognition efficiency and reliability.
S50: According to the recognition result corresponding to the video frame images in each original video, extract multiple matching segments from each original video, and associate the matching segments with the unique identification code corresponding to that original video. A matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video.
Understandably, according to the recognition result corresponding to each video frame image, that is, according to the video frame images whose recognition result is related and those whose recognition result is unrelated in each original video, multiple matching segments are extracted from each original video and associated with the corresponding unique identification code. The unique identification code can be set as required; for example, it may be a time value on a time axis accurate to the second, or a combination code that combines the unique identifier of the shooting device with a time value.
The matching segment is a video segment related to any one of the license plate number, the face image, and the human body feature map.
In one embodiment, step S50, namely extracting multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video and associating the matching segments with the unique identification code associated with the corresponding original video, includes:
S501: Obtain the original video.
S502: In chronological order, obtain all the start-point frame images from the original video, and obtain all the end-point frame images from the original video.
Understandably, the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video. All the start-point frame images and all the end-point frame images are obtained from the original video.
S503: If neither the start-point frame image nor the end-point frame image exists in the original video, determine that the original video has no matching segment.
Understandably, if neither the start-point frame image nor the end-point frame image exists in the original video, it is determined that the original video contains no matching video segment, and the original video is marked as having no matching segment.
S504: If only one start-point frame image exists in the original video and no end-point frame image exists, record the start-point frame image as the matching segment of the original video.
Understandably, if only one start-point frame image exists in the original video and no end-point frame image exists, the start-point frame image is marked as the matching segment of the original video.
S505: If only one end-point frame image exists in the original video and no start-point frame image exists, record the end-point frame image as the matching segment of the original video.
Understandably, if only one end-point frame image exists in the original video and no start-point frame image exists, the end-point frame image is marked as the matching segment of the original video.
S506: If an adjacent end-point frame image exists after the start-point frame image, obtain the video segment between the start-point frame image and the end-point frame image, and record that video segment as the matching segment of the original video.
Understandably, if an adjacent end-point frame image exists after the start-point frame image, the video segment between the start-point frame image and the end-point frame image is cut out and marked as the matching segment of the original video.
S507: Associate all the matching segments with the unique identification code associated with the original video.
Understandably, the matching segments are associated with the unique identification code associated with the original video. The unique identification code can be set as required; for example, it may be a time value on a time axis accurate to the second, or a combination code that combines the unique identifier of the shooting device with a time value.
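Taken together, S502 through S506 amount to scanning the per-frame recognition results for runs of related frames; a minimal sketch in pure Python, with the frame labels assumed to be booleans:

    def matching_segments(labels):
        # labels[i] is True when frame i's recognition result is
        # "related". Returns (start, end) index pairs per S502-S506:
        # a start-point frame is a related frame whose predecessor is
        # unrelated; an end-point frame is the next unrelated frame,
        # or the final frame when the run reaches the end of the video.
        segments, start = [], None
        for i, related in enumerate(labels):
            if related and start is None:
                start = i                    # start-point frame image
            elif not related and start is not None:
                segments.append((start, i))  # end-point frame image
                start = None
        if start is not None:
            segments.append((start, len(labels) - 1))
        return segments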
In this way, an interception method for obtaining matching segments is provided that can accurately cut out the required video segments, improving efficiency and reducing cost.
S60: Splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that same unique identification code.
Understandably, the matching segments associated with the same unique identification code are spliced in chronological order to obtain a video synthesis segment that develops along the time progression, and the video synthesis segment is associated with that unique identification code.
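A sketch of the grouping-and-splicing step in pure Python; the (identifier, timestamp, clip) tuple layout is an assumption for the example, and `clip` stands for whatever object a video library can concatenate:

    from collections import defaultdict

    def compose_videos(segments):
        # Group matching segments by their unique identification code,
        # then splice each group in chronological order (S60).
        by_id = defaultdict(list)
        for uid, timestamp, clip in segments:
            by_id[uid].append((timestamp, clip))
        composed = {}
        for uid, items in by_id.items():
            items.sort(key=lambda pair: pair[0])         # time order
            composed[uid] = [clip for _, clip in items]  # spliced sequence
        return composed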
S70: Sort all the video synthesis segments according to a preset unique-identification-code ordering rule, and determine all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set.
Understandably, the unique-identification-code ordering rule is a sorting rule preset for the unique identification codes according to requirements; for example, it may be the order of the road route along which the target person drove, or the chronological order of the target person's driving, and so on. All the sorted video synthesis segments are determined as the target video in the to-be-processed video set that is related to the target data set. In this way, after a traffic accident, the target video of the target person can be quickly provided to liability-determination personnel and used as evidence for determining liability, with no need for manual viewing and clipping and no dependence on the mental state of the staff; investment costs are therefore reduced and work efficiency is improved.
This application obtains a target data set containing a license plate number, a face image, and a human body feature map, and a to-be-processed video set containing multiple original videos; extracts video frame images from the original videos; after image binarization processing, edge detection processing, and contour tracking processing, extracts the relevant features through the combination of the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm to obtain recognition results, each recognition result characterizing whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; extracts multiple matching segments according to the recognition results; splices the matching segments into video synthesis segments; and determines all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set. Video segments related to the target data set are thus cut out of the to-be-processed video set automatically, quickly, and accurately, which improves recognition efficiency and accuracy and greatly reduces investment costs.
In one embodiment, a video data processing apparatus is provided, and the video data processing apparatus corresponds one-to-one to the video data processing method in the foregoing embodiments. As shown in FIG. 8, the video data processing apparatus includes a receiving module 11, an extraction module 12, an acquisition module 13, a recognition module 14, an extraction module 15, a merging module 16, and a determination module 17. The functional modules are described in detail as follows:
The receiving module 11 is configured to receive a video extraction instruction and obtain a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes several original videos, and each original video is associated with a unique identification code.
The extraction module 12 is configured to extract at least two video frame images from each original video according to a preset extraction rule.
The acquisition module 13 is configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region; extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region; extracts face features from the face region through the YOLO algorithm to obtain a face result of the face region; and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region.
The recognition module 14 is configured to determine a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and is either related or unrelated to the license plate number, the face image, and the human body feature map.
The extraction module 15 is configured to extract multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video, and to associate the matching segments with the corresponding unique identification code; the matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video.
The merging module 16 is configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and to associate the video synthesis segment with that same unique identification code.
The determination module 17 is configured to sort all the video synthesis segments according to a preset unique-identification-code ordering rule, and to determine all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set.
In one embodiment, the receiving module 11 includes:
a receiving unit, configured to receive a target collection instruction and obtain a sample video, the sample video containing one license plate, one face, and one specific human body posture;
a splitting unit, configured to split the sample video into several sample images;
a first input unit, configured to input all the sample images into a sample collection model, where the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human body posture from all the sample images;
a first acquisition unit, configured to input the license plate region images into a license plate extraction model, where the license plate extraction model performs license plate number recognition on the license plate region images, and to obtain one license plate number output by the license plate extraction model;
a second acquisition unit, configured to input all the face region images into a face extraction model, where the face extraction model screens all the face region images, and to obtain one face image screened out by the face extraction model;
a third acquisition unit, configured to input all the human body feature region images into a human body feature extraction model, where the human body feature extraction model screens all the human body feature region images, and to obtain one human body feature map screened out by the human body feature extraction model; and
a first output unit, configured to determine the license plate number, the face image, and the human body feature map as the target data set.
In one embodiment, the extraction module 12 includes:
a fourth acquisition unit, configured to obtain the original video and an extraction parameter in the extraction rule;
a first determination unit, configured to determine the first video frame image in the original video as a start frame image;
a second determination unit, configured to determine the last video frame image in the original video as an end frame image;
an extraction unit, configured to extract one video frame image at every interval of the extraction parameter, starting from the start frame image and continuing until the end frame image;
a third determination unit, configured to determine the extracted video frame images as process frame images; and
a second output unit, configured to compose all the video frame images of the original video from the start frame image, the end frame image, and the process frame images.
In one embodiment, the acquisition module 13 includes:
a fifth acquisition unit, configured to obtain a sample training image set, the sample training image set containing several sample training images, the sample training images including all the license plate region images, all the face sample images, all the human body feature region images, and several negative sample images, and each sample training image being associated with a sample training label;
a second input unit, configured to input the sample training images into a deep convolutional neural network model containing initial parameters, where the deep convolutional neural network model extracts text features, face features, and human body posture features from the sample training images and outputs training results according to the extracted text features, face features, and human body posture features;
a loss unit, configured to match the training results with the sample training labels to obtain a loss value;
a first convergence unit, configured to record the converged deep convolutional neural network model as an image recognition model when the loss value reaches a preset convergence condition; and
a second convergence unit, configured to iteratively update the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition, and then record the converged deep convolutional neural network model as the image recognition model.
In one embodiment, the acquisition module 13 further includes:
a first processing unit, configured to binarize the video frame image through the image recognition model to obtain a grayscale image;
a second processing unit, configured to perform edge detection processing on the grayscale image through the Canny algorithm to obtain an edge image;
a third processing unit, configured to perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image;
an analysis unit, configured to analyze the contour image and crop out the license plate region containing a license plate, the face region containing a face, and the human body feature region related to a human body posture;
a first extraction unit, configured to extract, by the image recognition model through a character segmentation algorithm and a character recognition method, the text features of the license plate region to obtain the license plate number result;
a second extraction unit, configured to extract, by the image recognition model through the YOLO algorithm, the face features of the face region to obtain the face result; and
a third output unit, configured to recognize, by the image recognition model through the chain code curvature algorithm, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region, and to extract the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
In one embodiment, the recognition module 14 includes:
a fourth determination unit, configured to determine the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
a fifth determination unit, configured to determine, if the recognition set contains a match with any one of the license plate number, the face image, and the human body feature map, that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map; and
a sixth determination unit, configured to determine, if nothing in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is unrelated to the license plate number, the face image, and the human body feature map.
In one embodiment, the extraction module 15 includes:
a sixth acquisition unit, configured to obtain the original video;
a seventh acquisition unit, configured to obtain, in chronological order, all the start-point frame images from the original video and all the end-point frame images from the original video;
a seventh determination unit, configured to determine that the original video has no matching segment if neither the start-point frame image nor the end-point frame image exists in the original video;
an eighth determination unit, configured to record the start-point frame image as the matching segment of the original video if only one start-point frame image exists in the original video and no end-point frame image exists;
a ninth determination unit, configured to record the end-point frame image as the matching segment of the original video if only one end-point frame image exists in the original video and no start-point frame image exists;
a tenth determination unit, configured to obtain, if an adjacent end-point frame image exists after the start-point frame image, the video segment between the start-point frame image and the end-point frame image, and to record that video segment as the matching segment of the original video; and
an association unit, configured to associate all the matching segments with the unique identification code associated with the original video.
For specific limitations on the video data processing apparatus, reference may be made to the foregoing limitations on the video data processing method, which are not repeated here. Each module in the video data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a video data processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the video data processing method in the foregoing embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the video data processing method in the foregoing embodiments.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.
A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments can be completed by instructing relevant hardware through a computer program, where the computer program may be stored in a non-volatile computer-readable storage medium, and the computer program, when executed, may include the processes of the foregoing method embodiments. Any reference to the memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), and electrically programmable ROM (EPROM). Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), synchlink DRAM (SLDRAM), and Rambus dynamic RAM (RDRAM).
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association by cryptographic methods; each data block contains the information of a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the functional units and modules above is used as an example; in practical applications, the functions above may be assigned to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.

Claims (20)

  1. A video data processing method, comprising:
    receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set, wherein the target data set comprises a license plate number, a face image, and a human body feature map, the to-be-processed video set comprises several original videos, and each original video is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region, extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region, extracts face features from the face region through a YOLO algorithm to obtain a face result of the face region, and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region;
    determining a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, wherein the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result is either related or unrelated to the license plate number, the face image, and the human body feature map;
    extracting multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video, and associating the matching segments with the corresponding unique identification code, wherein a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video, the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result, and the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code; and
    sorting all the video synthesis segments according to a preset unique-identification-code ordering rule, and determining all the sorted video synthesis segments as a target video in the to-be-processed video set that is related to the target data set.
  2. The video data processing method according to claim 1, wherein before the obtaining a target data set, the method comprises:
    receiving a target collection instruction and obtaining a sample video, wherein the sample video contains one license plate, one face, and one specific human body posture;
    splitting the sample video into several sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out license plate region images containing the license plate from all the sample images, crops out face region images containing the face from all the sample images, and extracts human body feature region images containing the face and the specific human body posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model screens all the face region images, and obtaining one face image screened out by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model screens all the human body feature region images, and obtaining one human body feature map screened out by the human body feature extraction model; and
    determining the license plate number, the face image, and the human body feature map as the target data set.
  3. The video data processing method according to claim 1, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and an extraction parameter in the extraction rule;
    determining the first video frame image in the original video as a start frame image;
    determining the last video frame image in the original video as an end frame image;
    extracting one video frame image at every interval of the extraction parameter, starting from the start frame image and continuing until the end frame image;
    determining the extracted video frame images as process frame images; and
    composing all the video frame images of the original video from the start frame image, the end frame image, and the process frame images.
  4. The video data processing method according to claim 1, wherein before the inputting the extracted video frame images into an image recognition model, the method comprises:
    obtaining a sample training image set, wherein the sample training image set contains several sample training images, the sample training images comprise all the license plate region images, all the face sample images, all the human body feature region images, and several negative sample images, and each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, wherein the deep convolutional neural network model extracts text features, face features, and human body posture features from the sample training images and outputs training results according to the extracted text features, face features, and human body posture features;
    matching the training results with the sample training labels to obtain a loss value;
    recording the converged deep convolutional neural network model as an image recognition model when the loss value reaches a preset convergence condition; and
    iteratively updating the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  5. The video data processing method according to claim 1, wherein the inputting the extracted video frame images into an image recognition model, the image recognition model obtaining a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracting text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracting face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracting human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region, comprises:
    binarizing the video frame image through the image recognition model to obtain a grayscale image;
    performing edge detection on the grayscale image through the Canny algorithm to obtain an edge image;
    performing contour tracking on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
    analyzing the contour image, and cropping out the license plate region containing the license plate, the face region containing the face, and the human body feature region related to human posture;
    extracting, by the image recognition model, the text features of the license plate region through the character segmentation algorithm and a character recognition method to obtain the license plate number result;
    extracting, by the image recognition model, the face features of the face region through the YOLO algorithm to obtain the face result;
    identifying, by the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region through the chain code curvature algorithm, and extracting the human posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
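The binarization, edge detection, and contour tracking steps correspond closely to standard OpenCV primitives. The sketch below is an approximation under that assumption; the area threshold and region filter are placeholders, the claimed tracking and node criteria are not reproduced, and the downstream character segmentation, YOLO, and chain code stages are omitted.

```python
import cv2

def locate_candidate_regions(frame):
    """Binarize, detect edges (Canny), trace contours, and crop candidate regions."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                 # grayscale
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU) # binarization
    edges = cv2.Canny(binary, 100, 200)                            # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)        # contour tracing
    regions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h > 400:                                            # drop tiny contours
            regions.append(frame[y:y + h, x:x + w])                # crop the region
    return regions
```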
  6. The video data processing method according to claim 1, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
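The relevance decision reduces to an any-match test over the per-frame recognition set. A minimal sketch, assuming the recognition set is a dict and that the two comparison helpers are hypothetical stand-ins for real face and body-feature matching:

```python
def faces_match(a, b) -> bool:
    """Hypothetical placeholder: a real system would compare face embeddings."""
    return a is not None and a == b

def bodies_match(a, b) -> bool:
    """Hypothetical placeholder: a real system would compare body feature maps."""
    return a is not None and a == b

def classify_frame(recognition_set: dict, target: dict) -> str:
    """Return 'relevant' if any recognized item matches any target item,
    else 'non-relevant', following the claim's two branches."""
    checks = [
        recognition_set.get("plate") == target["plate"],           # plate number match
        faces_match(recognition_set.get("face"), target["face"]),
        bodies_match(recognition_set.get("body"), target["body"]),
    ]
    return "relevant" if any(checks) else "non-relevant"
```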
  7. The video data processing method according to claim 1, wherein the extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with the unique identification code associated with the corresponding original video, comprises:
    obtaining the original video;
    obtaining, in chronological order, all the start-point frame images from the original video, and obtaining all the end-point frame images from the original video;
    if neither a start-point frame image nor an end-point frame image exists in the original video, determining that the original video has no matching segment;
    if only one start-point frame image exists in the original video and no end-point frame image exists, recording the start-point frame image as the matching segment of the original video;
    if only one end-point frame image exists in the original video and no start-point frame image exists, recording the end-point frame image as the matching segment of the original video;
    if an end-point frame image adjacent to a start-point frame image exists after that start-point frame image, obtaining the video segment between the start-point frame image and the end-point frame image, and recording the video segment between the start-point frame image and the end-point frame image as a matching segment of the original video;
    associating all the matching segments with the unique identification code associated with the original video.
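One way to picture the start-point/end-point pairing is over a chronological list of per-frame relevance flags. A sketch under that simplifying assumption (the claim operates on frame images; booleans stand in for their recognition results here):

```python
def extract_matching_segments(relevance):
    """Given a chronological list of booleans (True = relevant frame),
    return (start, end) index pairs for the matching segments.

    A start point is a relevant frame whose predecessor is non-relevant;
    an end point is a non-relevant frame whose predecessor is relevant,
    or the final frame of the video."""
    segments = []
    start = None
    for i, relevant in enumerate(relevance):
        prev = relevance[i - 1] if i > 0 else False
        if relevant and not prev:
            start = i                      # start-point frame
        elif not relevant and prev and start is not None:
            segments.append((start, i))    # adjacent end-point frame
            start = None
    if start is not None:                  # video ended while still relevant
        segments.append((start, len(relevance) - 1))
    return segments
```

For example, `extract_matching_segments([False, True, True, False, True])` yields `[(1, 3), (4, 4)]`: one closed segment and one segment that runs to the end of the video.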
  8. A video data processing apparatus, comprising:
    a receiving module, configured to receive a video extraction instruction and obtain a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    an extraction module, configured to extract at least two video frame images from each original video according to a preset extraction rule;
    an acquisition module, configured to input the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    a recognition module, configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    a segment extraction module, configured to extract a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and to associate the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    a merging module, configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and to associate the video synthesis segment with that same unique identification code;
    a determining module, configured to sort all the video synthesis segments according to a preset unique identification code ordering rule, and to determine all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
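The merging and determining modules group matching segments by unique identification code, splice each group chronologically, and sort the resulting composites. A rough sketch using the moviepy 1.x API, where the tuple layout and the lexicographic ordering rule are assumptions:

```python
from collections import defaultdict
from moviepy.editor import VideoFileClip, concatenate_videoclips

def merge_by_uid(matching_segments):
    """matching_segments: list of (uid, source_path, start_sec, end_sec) tuples,
    already in chronological order within each source video."""
    groups = defaultdict(list)
    for uid, path, start, end in matching_segments:
        groups[uid].append(VideoFileClip(path).subclip(start, end))
    composites = {}
    for uid, clips in groups.items():
        composites[uid] = concatenate_videoclips(clips)  # splice in time order
    # Sort by the preset unique-identification-code ordering rule
    # (assumed here to be plain lexicographic order of the uid).
    return [composites[uid] for uid in sorted(composites)]
```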
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    receiving a video extraction instruction, and obtaining a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
    sorting all the video synthesis segments according to a preset unique identification code ordering rule, and determining all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
  10. The computer device according to claim 9, wherein before the obtaining a target data set, the steps comprise:
    receiving a target collection instruction and obtaining a sample video; the sample video contains one license plate, one face, and one specific human posture;
    splitting the sample video into a number of sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining the one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model filters all the face region images, and obtaining the one face image selected by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model filters all the human body feature region images, and obtaining the one human body feature map selected by the human body feature extraction model;
    determining the license plate number, the face image, and the human body feature map as the target data set.
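As a data-flow illustration of this claim only: every model object below is a hypothetical stand-in, since the patent does not disclose the internals of the sample collection, license plate extraction, face extraction, or human body feature extraction models.

```python
import cv2

def build_target_data_set(sample_video_path, collector, plate_model,
                          face_model, body_model):
    """Split the sample video into images, then derive the three targets.

    collector, plate_model, face_model, body_model are hypothetical objects
    standing in for the four models named in the claim."""
    cap = cv2.VideoCapture(sample_video_path)
    samples = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        samples.append(frame)                              # sample images
    cap.release()
    plate_rois, face_rois, body_rois = collector.collect(samples)
    plate_number = plate_model.recognize(plate_rois)       # one plate number
    face_image = face_model.select(face_rois)              # one face image
    body_map = body_model.select(body_rois)                # one body feature map
    return {"plate": plate_number, "face": face_image, "body": body_map}
```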
  11. The computer device according to claim 9, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and the extraction parameter in the extraction rule;
    determining the starting video frame image of the original video as the start frame image;
    determining the ending video frame image of the original video as the end frame image;
    starting from the start frame image, extracting one video frame image at every interval of the extraction parameter, until the end frame image is reached;
    determining the video frame images extracted in this way as process frame images;
    the start frame image, the end frame image, and the process frame images together constituting all the video frame images of the original video.
  12. The computer device according to claim 9, wherein, before the inputting the extracted video frame images into an image recognition model, the steps comprise:
    obtaining a sample training image set; the sample training image set contains a number of sample training images, including all the license plate region images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, extracting text features, face features, and human posture features from the sample training images through the deep convolutional neural network model, and obtaining the training result output by the deep convolutional neural network model according to the extracted text features, face features, and human posture features;
    matching the training result against the sample training label to obtain a loss value;
    when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as the image recognition model;
    when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  13. The computer device according to claim 9, wherein the inputting the extracted video frame images into an image recognition model, the image recognition model obtaining a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracting text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracting face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracting human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region, comprises:
    binarizing the video frame image through the image recognition model to obtain a grayscale image;
    performing edge detection on the grayscale image through the Canny algorithm to obtain an edge image;
    performing contour tracking on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
    analyzing the contour image, and cropping out the license plate region containing the license plate, the face region containing the face, and the human body feature region related to human posture;
    extracting, by the image recognition model, the text features of the license plate region through the character segmentation algorithm and a character recognition method to obtain the license plate number result;
    extracting, by the image recognition model, the face features of the face region through the YOLO algorithm to obtain the face result;
    identifying, by the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region through the chain code curvature algorithm, and extracting the human posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
  14. The computer device according to claim 9, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
  15. The computer device according to claim 9, wherein the extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with the unique identification code associated with the corresponding original video, comprises:
    obtaining the original video;
    obtaining, in chronological order, all the start-point frame images from the original video, and obtaining all the end-point frame images from the original video;
    if neither a start-point frame image nor an end-point frame image exists in the original video, determining that the original video has no matching segment;
    if only one start-point frame image exists in the original video and no end-point frame image exists, recording the start-point frame image as the matching segment of the original video;
    if only one end-point frame image exists in the original video and no start-point frame image exists, recording the end-point frame image as the matching segment of the original video;
    if an end-point frame image adjacent to a start-point frame image exists after that start-point frame image, obtaining the video segment between the start-point frame image and the end-point frame image, and recording the video segment between the start-point frame image and the end-point frame image as a matching segment of the original video;
    associating all the matching segments with the unique identification code associated with the original video.
  16. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the following steps are implemented:
    receiving a video extraction instruction, and obtaining a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
    sorting all the video synthesis segments according to a preset unique identification code ordering rule, and determining all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
  17. The computer-readable storage medium according to claim 16, wherein before the obtaining a target data set, the steps comprise:
    receiving a target collection instruction and obtaining a sample video; the sample video contains one license plate, one face, and one specific human posture;
    splitting the sample video into a number of sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining the one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model filters all the face region images, and obtaining the one face image selected by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model filters all the human body feature region images, and obtaining the one human body feature map selected by the human body feature extraction model;
    determining the license plate number, the face image, and the human body feature map as the target data set.
  18. The computer-readable storage medium according to claim 16, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and the extraction parameter in the extraction rule;
    determining the starting video frame image of the original video as the start frame image;
    determining the ending video frame image of the original video as the end frame image;
    starting from the start frame image, extracting one video frame image at every interval of the extraction parameter, until the end frame image is reached;
    determining the video frame images extracted in this way as process frame images;
    the start frame image, the end frame image, and the process frame images together constituting all the video frame images of the original video.
  19. The computer-readable storage medium according to claim 16, wherein, before the inputting the extracted video frame images into an image recognition model, the steps comprise:
    obtaining a sample training image set; the sample training image set contains a number of sample training images, including all the license plate region images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, extracting text features, face features, and human posture features from the sample training images through the deep convolutional neural network model, and obtaining the training result output by the deep convolutional neural network model according to the extracted text features, face features, and human posture features;
    matching the training result against the sample training label to obtain a loss value;
    when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as the image recognition model;
    when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  20. The computer-readable storage medium according to claim 16, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
PCT/CN2020/099082 2020-04-24 2020-06-30 Video data processing method and apparatus, and computer device and storage medium WO2021212659A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332455.9A CN111626123A (en) 2020-04-24 2020-04-24 Video data processing method and device, computer equipment and storage medium
CN202010332455.9 2020-04-24

Publications (1)

Publication Number Publication Date
WO2021212659A1 (en)

Family

ID=72271775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099082 WO2021212659A1 (en) 2020-04-24 2020-06-30 Video data processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111626123A (en)
WO (1) WO2021212659A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257674B (en) * 2020-11-17 2022-05-27 珠海大横琴科技发展有限公司 Visual data processing method and device
CN112465691A (en) * 2020-11-25 2021-03-09 北京旷视科技有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN112650876A (en) * 2020-12-30 2021-04-13 北京嘀嘀无限科技发展有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product
CN113435330A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Micro-expression identification method, device, equipment and storage medium based on video
CN114286171B (en) * 2021-08-19 2023-04-07 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114004757B (en) * 2021-10-14 2024-04-05 大族激光科技产业集团股份有限公司 Method, system, device and storage medium for removing interference in industrial image
CN113688810B (en) * 2021-10-26 2022-03-08 深圳市安软慧视科技有限公司 Target capturing method and system of edge device and related device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300556A1 (en) * 2017-04-17 2018-10-18 Intel Corporation Person tracking and privacy and acceleration of data using autonomous machines
CN109672936A (en) * 2018-12-26 2019-04-23 上海众源网络有限公司 A kind of the determination method, apparatus and electronic equipment of video evaluations collection
CN110287778A (en) * 2019-05-15 2019-09-27 北京旷视科技有限公司 A kind of processing method of image, device, terminal and storage medium
CN110191324A (en) * 2019-06-28 2019-08-30 Oppo广东移动通信有限公司 Image processing method, device, server and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363660A (en) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
CN114363660B (en) * 2021-12-24 2023-09-08 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
CN115019008A (en) * 2022-05-30 2022-09-06 深圳市鸿普森科技股份有限公司 Intelligent 3D model design analysis service management platform
CN115037987A (en) * 2022-06-07 2022-09-09 厦门蝉羽网络科技有限公司 Method and system for watching back live video with goods
CN115858854A (en) * 2023-02-28 2023-03-28 北京奇树有鱼文化传媒有限公司 Video data sorting method and device, electronic equipment and storage medium
CN115858854B (en) * 2023-02-28 2023-05-26 北京奇树有鱼文化传媒有限公司 Video data sorting method and device, electronic equipment and storage medium
CN117437505A (en) * 2023-12-18 2024-01-23 杭州任性智能科技有限公司 Training data set generation method and system based on video

Also Published As

Publication number Publication date
CN111626123A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2021212659A1 (en) Video data processing method and apparatus, and computer device and storage medium
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
KR102554724B1 (en) Method for identifying an object in an image and mobile device for practicing the method
Tapia et al. Gender classification from iris images using fusion of uniform local binary patterns
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN111428604B (en) Facial mask recognition method, device, equipment and storage medium
CN111950424B (en) Video data processing method and device, computer and readable storage medium
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
WO2019033525A1 (en) Au feature recognition method, device and storage medium
CN105488468A (en) Method and device for positioning target area
CN112016464A (en) Method and device for detecting face shielding, electronic equipment and storage medium
CN106295547A (en) A kind of image comparison method and image comparison device
CN111274926A (en) Image data screening method and device, computer equipment and storage medium
US20230060211A1 (en) System and Method for Tracking Moving Objects by Video Data
CN111783681A (en) Large-scale face library recognition method, system, computer equipment and storage medium
CN110580507B (en) City texture classification and identification method
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN115862113A (en) Stranger abnormity identification method, device, equipment and storage medium
CN111539320A (en) Multi-view gait recognition method and system based on mutual learning network strategy
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
JP2013218605A (en) Image recognition device, image recognition method, and program
CN113762031A (en) Image identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1