WO2021212659A1 - Video data processing method and apparatus, and computer device and storage medium - Google Patents

Video data processing method and apparatus, and computer device and storage medium

Info

Publication number
WO2021212659A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
image
frame image
license plate
human body
Prior art date
Application number
PCT/CN2020/099082
Other languages
French (fr)
Chinese (zh)
Inventor
黄小弟 (Huang Xiaodi)
Original Assignee
平安国际智慧城市科技股份有限公司 (Ping An International Smart City Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司 (Ping An International Smart City Technology Co., Ltd.)
Publication of WO2021212659A1 publication Critical patent/WO2021212659A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Definitions

  • This application relates to the field of artificial intelligence data processing, and in particular to a video data processing method, device, computer equipment, and storage medium.
  • This application provides a video data processing method, device, computer equipment, and storage medium, which can automatically, quickly, and accurately cut out video clips related to a target data set from a video set to be processed.
  • This application can be applied to the field of smart transportation, thereby promoting the construction of smart cities, improving the efficiency and accuracy of recognition, and greatly reducing input costs.
  • A video data processing method, including:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain a license plate area, a face area, and a human body feature area; extracting text features of the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting face features of the face area through the YOLO algorithm to obtain a face result of the face area; and extracting human body posture features of the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
  • determining a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A video data processing device, including:
  • a receiving module, configured to receive a video extraction instruction and obtain a target data set and a to-be-processed video set;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • an extraction module, configured to extract at least two video frame images from each of the original videos according to a preset extraction rule;
  • an acquisition module, configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • a recognition module, configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • an extraction module, configured to extract multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • a merging module configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • a determining module, configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area; extracting the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; extracting the face features of the face area through the YOLO algorithm to obtain the face result of the face area; and extracting the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • A computer-readable storage medium having a computer program stored thereon, where the following steps are implemented when the computer program is executed by a processor:
  • receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • extracting at least two video frame images from each of the original videos according to a preset extraction rule;
  • inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area; extracting the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; extracting the face features of the face area through the YOLO algorithm to obtain the face result of the face area; and extracting the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • according to the recognition results corresponding to the video frame images in each of the original videos, extracting multiple matching segments from each of the original videos, and associating each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • splicing the matching segments associated with the same unique identification code in chronological order to obtain video synthesis segments; and sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • This application can automatically, quickly, and accurately cut out video clips related to the target data set from the to-be-processed video set, which improves recognition efficiency and accuracy, and greatly reduces input costs.
  • FIG. 1 is a schematic diagram of an application environment of a video data processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a video data processing method in an embodiment of the present application
  • Fig. 3 is a flowchart of step S10 of a video data processing method in an embodiment of the present application
  • FIG. 4 is a flowchart of step S20 of the video data processing method in an embodiment of the present application.
  • FIG. 5 is a flowchart of step S30 of the video data processing method in an embodiment of the present application.
  • Fig. 6 is a flowchart of step S30 of a video data processing method in another embodiment of the present application.
  • FIG. 7 is a flowchart of step S40 of the video data processing method in an embodiment of the present application.
  • Fig. 8 is a functional block diagram of a video data processing device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the video data processing method provided by this application can be applied in the application environment as shown in Fig. 1, in which the client (computer equipment) communicates with the server through the network.
  • the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a video data processing method is provided, and the technical solution mainly includes the following steps S10-S70:
  • S10 Receive a video extraction instruction, and obtain a target data set and a video set to be processed;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code.
  • The video extraction instruction is an instruction triggered after the target data set and the to-be-processed video set are selected;
  • the target data set is the data set related to the searched target person, where the target person is the person who needs to be searched for;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the license plate number is the unique license plate number of the vehicle driven by the target person;
  • the face image is an image of the target person's face;
  • the human body feature map is an image of a specific human posture of the target person; for example, the human body feature map is an upper-body image of the target person driving a vehicle;
  • the to-be-processed video set is the set of videos in which videos related to the target person need to be found;
  • the to-be-processed video set includes at least one of the original videos.
  • The original video may be an unprocessed video clip or a clipped video clip; for example, the original video is the surveillance video of a certain intersection on a certain day, or the surveillance video of a certain street from 19:00 to 21:00 on a certain day. Each of the original videos is associated with a unique identification code, which is the unique identification code assigned to that original video and can be set according to requirements.
  • In an embodiment, before step S10, that is, before obtaining the target data set, the method includes:
  • S101 Receive a target collection instruction, and obtain a sample video; the sample video includes a license plate, a human face, and a specific human posture.
  • The target collection instruction is an instruction triggered when information about the target person needs to be collected;
  • the sample video is a short video clip related to the target person;
  • the content of the sample video contains a license plate, the face of the target person, and the specific human posture of the target person, which means that the license plate, the face, and the specific human posture each appear at least once in the sample video.
  • S102 Split the sample video according to a preset splitting parameter; one sample image is obtained from the sample video at every interval of the splitting parameter, and the sample image is an image selected from the sample video.
  • The splitting parameter can be set according to requirements; for example, the splitting parameter can be set to 30 frames or 25 frames, so as to split the sample video into multiple sample images.
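  • As a non-authoritative illustration only, a minimal sketch of this splitting step (assuming OpenCV, which the application does not name, and a hypothetical helper `split_video`) might look like:

```python
import cv2

def split_video(video_path: str, splitting_parameter: int = 30) -> list:
    """Collect one sample image at every interval of the splitting
    parameter (e.g. 30 or 25 frames), as described above."""
    capture = cv2.VideoCapture(video_path)
    sample_images = []
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the sample video
            break
        if frame_index % splitting_parameter == 0:
            sample_images.append(frame)
        frame_index += 1
    capture.release()
    return sample_images
```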
  • S103 Input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepting from all the sample images the license plate area images containing the license plate, the face area images containing the face, and the human body feature area images containing both the face and the specific human posture.
  • The sample collection model refers to a neural network model that has been trained to identify the license plate area, the face area, and the human body feature area in an image; the image collection is to identify and cut out the license plate, face, and human body feature areas contained in the sample images.
  • The network structure of the sample collection model can be set according to requirements; for example, the network structure of the sample collection model can be a network structure of the Inception series or of the VGG series.
  • S104 Input the license plate area image into a license plate extraction model; the license plate extraction model performs license plate number recognition on the license plate area image, and the license plate number output by the license plate extraction model is obtained.
  • The license plate extraction model refers to a neural network model that has been trained to identify the license plate number in an image.
  • The network structure of the license plate extraction model can be set according to requirements; for example, the network structure of the license plate extraction model can be obtained by transfer learning from GoogLeNet.
  • The license plate extraction model can identify the Chinese characters, numbers, and letters in the license plate area image; the license plate number is the unique identification of the vehicle driven by the target person, and consists of Chinese characters, numbers, and letters.
  • S105 Input all the face area images into a face extraction model; the face extraction model screens all the face area images, and the face image selected by the face extraction model is obtained.
  • Through the face extraction model, the face area image that contains the eyes, eyebrows, mouth, nose, ears, and face contour and has the highest definition is screened out from all the face area images; the screening method can be set according to requirements.
  • For example, the screening method may be: obtain the average of the pixel values of the pixels at the same position across all the face area images, and record this average as the pixel average corresponding to that position; for each face area image, take the difference between each pixel and its corresponding pixel average and take the absolute value to obtain the absolute pixel difference; calculate the sum of the absolute pixel differences of all pixels in the face area image and determine this sum as the image difference value; and select the face area image with the smallest image difference value among all the image difference values as the face image, that is, determine the screened face area image as the face image.
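  • A minimal NumPy sketch of this example screening method (the function name and the assumption that all face area images share one shape are ours, not the application's):

```python
import numpy as np

def select_face_image(face_area_images: list) -> np.ndarray:
    """Pick the face area image with the smallest image difference value:
    the sum of absolute differences from the per-position pixel average."""
    stack = np.stack([img.astype(np.float64) for img in face_area_images])
    pixel_average = stack.mean(axis=0)  # pixel average per position
    absolute_pixel_difference = np.abs(stack - pixel_average)
    image_difference = absolute_pixel_difference.reshape(len(stack), -1).sum(axis=1)
    return face_area_images[int(np.argmin(image_difference))]
```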
  • S106 Input all the human body feature area images into a human body feature extraction model; the human body feature extraction model screens all the human body feature area images, and the human body feature map selected by the human body feature extraction model is obtained.
  • Through the human body feature extraction model, the human body feature area image that contains the face and the specific human posture and has the highest definition is screened out from all the human body feature area images; the screening method can be set according to requirements.
  • For example, the screening method may be: determine the difference between the pixel value of each pixel in the human body feature area image and the pixel values of the surrounding pixels adjacent to it as the local pixel difference; obtain the sum of all the local pixel differences in each human body feature area image; and select the human body feature area image with the smallest sum of local pixel differences among all the human body feature area images as the human body feature map, that is, determine the screened human body feature area image as the human body feature map.
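  • Likewise, a hedged sketch of the local pixel difference screening (we approximate "surrounding adjacent pixels" with the horizontal and vertical neighbours; the exact neighbourhood is not specified by the application):

```python
import numpy as np

def select_body_feature_map(body_area_images: list) -> np.ndarray:
    """Pick the human body feature area image whose sum of local pixel
    differences (pixel minus adjacent neighbours) is smallest."""
    def local_difference_sum(image: np.ndarray) -> float:
        img = image.astype(np.float64)
        horizontal = np.abs(np.diff(img, axis=1)).sum()
        vertical = np.abs(np.diff(img, axis=0)).sum()
        return float(horizontal + vertical)

    scores = [local_difference_sum(img) for img in body_area_images]
    return body_area_images[int(np.argmin(scores))]
```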
  • S107 Determine the license plate number, the face image, and the human body feature map as the target data set.
  • That is, the license plate number, the face image, and the human body feature map are determined and recorded as the target data set; the target data set is the information set containing the license plate number, the face, and the specific human posture related to the target person.
  • In this way, the sample video is collected through the sample collection model, and the license plate number, the face image, and the human body feature map are obtained from the collected images through the license plate extraction model, the face extraction model, and the human body feature extraction model. The license plate number is automatically recognized from the sample video, and the target person's face image and human body feature map are automatically intercepted, which improves the accuracy of recognition and the reliability of interception, reduces labor costs, and improves efficiency.
  • S20 Extract at least two video frame images from each of the original videos according to a preset extraction rule.
  • The extraction rule can be set according to requirements; for example, the extraction rule can be set to extract at least two video frame images evenly from the original video, or to extract one video frame image at every interval of a preset frame parameter in the original video.
  • The video frame image is the image corresponding to one frame in the original video.
  • In an embodiment, the step S20, that is, extracting at least two video frame images from each of the original videos according to a preset extraction rule, includes:
  • S201 Acquire the original video and the extraction parameter in the extraction rule.
  • The extraction rule can be set according to requirements, and the purpose of the extraction rule is to extract at least two video frame images from each of the original videos. For example, the extraction rule may be to obtain the start video frame image and the end video frame image in the original video plus the video frame image at the bisecting center position of the original video; the extraction rule may also be to obtain the start video frame image and the end video frame image in the original video and to select one video frame image at every interval of a preset extraction parameter in the middle part of the original video.
  • The extraction parameter can be set according to requirements; for example, the extraction parameter is 15 frames, 25 frames, 30 frames, and so on.
  • S202 Determine a starting video frame image in the original video as a starting frame image.
  • the video frame image corresponding to the first frame in the original video is determined as the start frame image.
  • S203 Determine the ending video frame image in the original video as the ending frame image.
  • the video frame image corresponding to the last frame in the original video is determined as the end frame image.
  • S204 Extract one video frame image at every interval of the extraction parameter, starting from the start frame image, until the end frame image is reached.
  • The extraction parameter can be set according to requirements; for example, the extraction parameter can be set to 25 frames (about 1 second), that is, starting from the start frame image, one video frame image is extracted every 25 frames, stopping when the remaining interval to the end frame image is less than 25 frames.
  • S205 Determine the extracted video frame images as process frame images; the process frame images are the extracted video frame images in the original video other than the start frame image and the end frame image.
  • S206 Compose all the video frame images in the original video from the start frame image, the end frame image, and the process frame images; that is, the start frame image, the end frame image, and all the process frame images are determined as all the video frame images in the original video.
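  • For illustration, the index arithmetic of this extraction rule can be sketched as follows (a hypothetical helper; the 25-frame default mirrors the example above):

```python
def frame_indices(total_frames: int, extraction_parameter: int = 25) -> list:
    """Indices of the start frame image, the process frame images taken at
    every interval of the extraction parameter, and the end frame image."""
    start, end = 0, total_frames - 1
    indices = [start]
    i = start + extraction_parameter
    while i < end:                  # stop once the remaining gap to the
        indices.append(i)           # end frame image is below the interval
        i += extraction_parameter
    if end > start:                 # avoid duplicating a one-frame video
        indices.append(end)
    return indices

# frame_indices(100) -> [0, 25, 50, 75, 99]
```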
  • The image recognition model is a trained neural network model that includes the image binarization processing, the edge detection processing, the contour tracking processing, the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm.
  • The image recognition model can identify the license plate area, the face area, and the human body feature area in the video frame image through the image binarization processing, the edge detection processing, and the contour tracking processing; the license plate number result of the license plate area can be recognized through the character segmentation algorithm; the face result of the face area can be recognized through the YOLO algorithm; and the human body feature result of the human body feature area can be recognized through the chain code curvature algorithm.
  • The YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN (Convolutional Neural Network) operation to directly predict the categories and regions of different targets.
  • In an embodiment, before step S30, that is, before inputting the extracted video frame images into the image recognition model, the method includes:
  • S301 Acquire a sample training image set; the sample training image set includes several sample training images, and the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and a number of negative sample images; each of the sample training images is associated with a sample training label.
  • The sample training image set is the collection of the sample training images; the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and at least one of the negative sample images. The negative sample images are collected images that contain none of the license plate, the face, and the specific human posture.
  • Each sample training image is associated with one sample training label; the sample training label can be set according to requirements. For example, the sample training label can be set to include correlated and non-correlated, or set to include correlated license plate, correlated face, correlated human body feature, and non-correlated, and so on.
  • S302 Input the sample training images into a deep convolutional neural network model containing initial parameters, extract text features, face features, and human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features.
  • The text features are features related to the color, shape, and text of the license plate;
  • the face features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face contour;
  • the human body posture features are features related to the target person's head, face, arms, shoulders, and so on;
  • the training result represents whether the sample training image contains any one of the license plate number, the face image, and the human body feature map.
  • That is, the deep convolutional neural network model extracts the text features from all the license plate area images and outputs the training results corresponding to the license plate area images according to the extracted text features; the deep convolutional neural network model extracts the face features from all the face area images and outputs the training results corresponding to the face area images according to the extracted face features; the deep convolutional neural network model extracts the human body posture features from all the human body feature area images and outputs the training results corresponding to the human body feature area images according to the extracted human body posture features; and the deep convolutional neural network model extracts the text features, the face features, and the human body posture features from all the negative sample images and outputs the training results corresponding to the negative sample images according to the extracted features.
  • S303 Match the training result with the sample training label to obtain a loss value. The loss function can be set according to requirements; it may be a multi-class cross-entropy loss function or a regression loss function, and the loss value is calculated through the loss function.
  • S304 When the loss value reaches a preset convergence condition, record the converged deep convolutional neural network model as the image recognition model. The preset convergence condition may be that the loss value is very small and no longer decreases after 7000 calculations; that is, when the loss value is very small and no longer decreases after 7000 calculations, training is stopped, and the converged deep convolutional neural network model is recorded as the image recognition model. The preset convergence condition may also be that the loss value is less than a set threshold; that is, when the loss value is less than the set threshold, training is stopped, and the converged deep convolutional neural network model is recorded as the image recognition model.
  • S305 When the loss value does not reach the preset convergence condition, iteratively update the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and record the converged deep convolutional neural network model as the image recognition model. In this way, the initial parameters of the deep convolutional neural network model are continuously updated to move closer to accurate recognition results, so that the accuracy of the recognition results becomes higher and higher.
  • The license plate area images, the face area images, and the human body feature area images output by inputting the sample video into the sample collection model are used as the sample training images, and the image recognition model is obtained by training on these sample training images. The image recognition model is therefore more sensitive to the data of the target data set, which makes the image recognition model more targeted and improves the recognition accuracy rate and the recognition reliability.
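  • A minimal training-loop sketch of this procedure, assuming PyTorch and the multi-class cross-entropy loss (the framework, optimizer, and threshold are our assumptions, not the application's):

```python
import torch
import torch.nn as nn

def train_image_recognition_model(model: nn.Module, loader,
                                  epochs: int = 50,
                                  threshold: float = 1e-3) -> nn.Module:
    """Iteratively update the initial parameters until the loss value
    reaches the preset convergence condition (here: below a threshold)."""
    criterion = nn.CrossEntropyLoss()   # multi-class cross-entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        for images, labels in loader:   # sample training images and labels
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()            # update the model parameters
            if loss.item() < threshold: # convergence condition reached
                return model
    return model
```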
  • In an embodiment, the step S30, that is, inputting the extracted video frame images into the image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area, includes:
  • S306 Binarize the video frame image by using the image recognition model to obtain a grayscale image.
  • That is, the grayscale value of each pixel in the video frame image is calculated through a grayscale algorithm; the grayscale values range from 0 to 255. This is the binarization processing.
  • The grayscale values are arranged according to the positions of their corresponding pixels to obtain the grayscale image.
  • S307 Perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image.
  • The Canny algorithm (Canny edge extraction algorithm) is an algorithm for extracting the edges of objects.
  • Because it uses a variational method, the Canny algorithm is less susceptible to noise interference and can detect true weak edges.
  • The Canny algorithm first convolves the image with a two-dimensional Gaussian filter function for noise reduction, reducing the noise of each pixel in the image until it causes no interference; secondly, for each point on the edge contour, it calculates the first-order partial derivatives in two directions and the gradient, obtaining the gradient direction of the point; then it determines the adjacent points according to the 0-degree, 45-degree, 90-degree, or 135-degree direction corresponding to the gradient direction; finally, it calculates the difference between the grayscale value of the point and those of the adjacent points, determines the edges according to the difference, and obtains the final edge image.
  • S308 Perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image.
  • The contour tracking processing starts from a starting point, follows the edge route near the starting point to find the points that meet the tracking criterion, and continues along those points until the points that meet the node criterion are found; the contours in the edge image are thereby clearly marked, and the contour image is obtained.
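  • Steps S306-S308 correspond closely to a standard grayscale / Canny / contour pipeline; a non-authoritative OpenCV 4 sketch (the thresholds are placeholder assumptions):

```python
import cv2

def extract_contours(video_frame):
    """Grayscale conversion (S306), Canny edge detection (S307), and
    contour extraction standing in for the tracking/node-criterion
    tracing (S308)."""
    gray = cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)  # grayscale image
    edges = cv2.Canny(gray, 100, 200)                     # edge image
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return gray, edges, contours
```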
  • S309 Analyze the contour image, and cut out the license plate area including the license plate, the face area including the human face, and the human body feature area including the human body posture.
  • That is, the contour image containing the license plate, the human face, or the human posture is analyzed, and the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the human body posture are determined from the analysis results and intercepted.
  • The image recognition model extracts the text features of the license plate area through the character segmentation algorithm and a character recognition method to obtain the license plate number result.
  • The character segmentation algorithm segments the license plate area image into images of individual characters; the character recognition method recognizes the letters, Chinese characters, and numbers in the segmented single-character images. That is, the image recognition model extracts the text features of the license plate area through the character segmentation algorithm and the character recognition method, and determines the license plate number result according to the text features.
  • The license plate number result indicates whether the license plate number recognized from the license plate area is the same as the license plate number in the target data set; the license plate number result can be set according to requirements. For example, the license plate number result may include being the same as the target person's license plate number and being different from the target person's license plate number.
  • The YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN operation to directly predict the categories and regions of different targets; the image recognition model extracts the face features of the face area through the YOLO algorithm.
  • The face features are features related to the eyes, eyebrows, mouth, nose, ears, and face contour of the target person, and the face result is determined according to the face features.
  • The face result represents whether the face area contains the target person's face; the face result can be set according to requirements. For example, the face result may include being correlated with the target person's face and being non-correlated with the target person's face.
  • The image recognition model recognizes the shoulder width scale, the neck scale, and the chest width scale of the human body feature area through the chain code curvature algorithm, and extracts the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
  • The chain code curvature algorithm is an algorithm that calculates the curvature values corresponding to the four-connected or eight-connected chain codes of the points on the contour edge of the human body feature area.
  • Each point on the contour edge is recognized by the image recognition model; the shoulder width scale, the neck scale, and the chest width scale are recognized, the human body posture features are extracted according to the ratios between the shoulder width scale, the neck scale, and the chest width scale, and the human body feature result is determined according to the vector values corresponding to the human body posture features.
  • The human body feature result indicates whether the human body feature area contains the human body posture of the target person; the human body feature result can be set according to requirements. For example, the human body feature result may include being correlated with the target person's human body posture and being non-correlated with the target person's human body posture.
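  • The application does not spell out its chain code computation; for orientation only, a generic eight-connected Freeman chain code and its curvature (direction change) can be sketched as:

```python
import numpy as np

# Eight-connected chain code directions as (dx, dy) unit steps.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(contour: np.ndarray) -> list:
    """Chain code of an ordered, 8-connected contour point sequence
    (adjacent points must differ by one pixel step)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (int(np.sign(x1 - x0)), int(np.sign(y1 - y0)))
        codes.append(DIRECTIONS.index(step))
    return codes

def curvature(codes: list) -> list:
    """Curvature value at each point: change in chain-code direction
    between successive steps, wrapped into [-4, 3]."""
    return [((b - a + 4) % 8) - 4 for a, b in zip(codes, codes[1:])]
```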
  • In this way, this application obtains the license plate area, the face area, and the human body feature area after binarization processing of the video frame image, edge detection processing based on the Canny algorithm, and contour tracking processing; it uses the character segmentation algorithm to recognize the license plate number result of the license plate area, the YOLO algorithm to recognize the face result of the face area, and the chain code curvature algorithm to recognize the human body feature result of the human body feature area, realizing rapid and accurate recognition of the license plate number result, the face result, and the human body feature result, and improving accuracy and reliability.
  • S40 Determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map.
  • If the license plate number result is different from the target person's license plate number, the face result is non-correlated with the target person's face, and the human body feature result is non-correlated with the target person's body posture, it is determined that the recognition result corresponding to the video frame image is non-correlated with the license plate number, the face image, and the human body feature map. If the license plate number result is the same as the target person's license plate number, or the face result is correlated with the target person's face, or the human body feature result is correlated with the target person's body posture, it is determined that the recognition result corresponding to the video frame image is correlated with the license plate number, the face image, and the human body feature map. In this way, a video frame image of the target person driving a vehicle with the face partially occluded can still be determined as a correlated image, and videos related to the target person can be identified more accurately.
  • In an embodiment, the step S40, that is, determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, includes:
  • S401 Determine the license plate number result, the face result, and the human body feature result of the video frame image as a recognition set of the video frame image.
  • The recognition set is the set that includes the license plate number result, the face result, and the human body feature result of the video frame image.
  • S402 If any item in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is correlated with the license plate number, the face image, and the human body feature map.
  • S403 If no item in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is non-correlated with the license plate number, the face image, and the human body feature map.
  • In this way, the present application can quickly and accurately identify, through the image recognition model, whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, obtaining a correlated or non-correlated recognition result, which improves the efficiency and reliability of recognition.
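  • The decision logic of S401-S403 reduces to a logical OR over the recognition set; a trivial sketch (the names are ours):

```python
def recognition_result(plate_matches: bool, face_matches: bool,
                       body_matches: bool) -> str:
    """Correlated if any item of the recognition set matches the license
    plate number, the face image, or the human body feature map."""
    if plate_matches or face_matches or body_matches:
        return "correlated"
    return "non-correlated"
```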
  • S50 According to the recognition results corresponding to the video frame images in each of the original videos, extract multiple matching segments from each of the original videos, and associate each matching segment with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video.
  • That is, according to the video frame images whose recognition results are correlated and the video frame images whose recognition results are non-correlated in each of the original videos, multiple matching segments are extracted from each of the original videos, and the matching segments are associated with the corresponding unique identification codes.
  • The unique identification code can be set according to requirements; for example, the unique identification code may be a time value on the time axis accurate to the second, or a combination code combining the unique identification code of the photographing device with the time value.
  • The matching segment is a video segment that contains any one of the license plate number, the face image, and the human body feature map.
  • In an embodiment, the step S50, that is, extracting multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video and associating the matching segments with the unique identification code associated with the corresponding original video, includes:
  • S501 Acquire the original video.
  • S502 Acquire all the starting point frame images from the original video, and acquire all the ending point frame images from the original video in chronological order.
  • The starting point frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the ending point frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video. All the starting point frame images and all the ending point frame images are obtained from the original video.
  • S503 If neither a starting point frame image nor an ending point frame image exists in the original video, determine that there is no matching video segment in the original video; that is, mark the original video as containing no matching segment.
  • S504 If starting point frame images and ending point frame images exist in the original video, cut out the video segment between each starting point frame image and the corresponding ending point frame image, and mark the video segment as a matching segment of the original video.
  • The matching segment is associated with the unique identification code associated with the original video; the unique identification code can be set according to requirements. For example, the unique identification code can be a time value accurate to the second.
  • In this way, an interception method for obtaining matching segments is provided, which can accurately intercept the required video segments, improving efficiency and reducing costs.
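  • For illustration, the starting point / ending point logic can be expressed over a list of per-frame correlated flags (a sketch under the definitions above; the representation is an assumption):

```python
def matching_segments(correlated: list) -> list:
    """Return (start, end) index pairs: a segment starts at a correlated
    frame whose previous frame is non-correlated, and ends at the first
    following non-correlated frame, or at the last frame of the video."""
    segments, start = [], None
    for i, related in enumerate(correlated):
        if related and start is None:
            start = i                    # starting point frame image
        elif not related and start is not None:
            segments.append((start, i))  # ending point frame image
            start = None
    if start is not None:                # video ends while still correlated
        segments.append((start, len(correlated) - 1))
    return segments

# matching_segments([False, True, True, False, True]) -> [(1, 3), (4, 4)]
```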
  • S60 Splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that unique identification code.
  • S70 Sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • The unique identification code order rule is a sorting rule preset for the unique identification codes according to requirements; for example, the order rule may follow the order of the road route the target person drove, or the chronological order of the target person's driving. All the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • In this way, the target video of the target person is provided to the responsible person and used as evidence of responsibility, without manual viewing and interception and without being affected by the mental state of the staff; therefore, the investment cost is reduced and work efficiency is improved.
  • This application realizes that, by acquiring the target data set containing the license plate number, the face image, and the human body feature map and the to-be-processed video set containing multiple original videos, video frame images are extracted from the original videos and subjected to image binarization processing, edge detection processing, and contour tracking processing; the relevant features are extracted through the combination of the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm to obtain the recognition results, which characterize whether the video frame images contain any one of the license plate number, the face image, and the human body feature map; multiple matching segments are extracted according to the recognition results and spliced to obtain video synthesis segments; and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set. Thus, video segments related to the target data set are automatically, quickly, and accurately cut out from the to-be-processed video set, which improves recognition efficiency and accuracy and greatly reduces input costs.
  • a video data processing device is provided, and the video data processing device corresponds to the video data processing method in the above-mentioned embodiment one-to-one.
  • the video data processing device includes a receiving module 11, an extraction module 12, an acquisition module 13, an identification module 14, an extraction module 15, a merging module 16 and a determination module 17.
  • the detailed description of each functional module is as follows:
  • The receiving module 11 is configured to receive a video extraction instruction to obtain a target data set and a to-be-processed video set;
  • the target data set includes a license plate number, a face image, and a human body feature map; the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • the extraction module 12 is configured to extract at least two video frame images from each of the original videos according to a preset extraction rule
  • the acquisition module 13 is configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame images to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the face features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • the recognition module 14 is configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either correlated or non-correlated with the license plate number, the face image, and the human body feature map;
  • the extraction module 15 is configured to extract multiple matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is correlated and whose adjacent previous video frame image has a non-correlated recognition result; the end frame image refers to a video frame image whose recognition result is non-correlated and whose adjacent previous video frame image has a correlated recognition result, or refers to the last video frame image in the original video;
  • the merging module 16 is configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • the determining module 17 is configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • the receiving module 11 includes:
  • the receiving unit is configured to receive a target collection instruction and obtain a sample video;
  • the sample video includes a license plate, a human face, and a specific human posture;
  • the first input unit is configured to input all the sample images into a sample collection model, where the sample collection model performs image collection on all the sample images, intercepting from all the sample images the license plate area images containing the license plate, the face area images containing the face, and the human body feature area images containing both the face and the specific human posture;
  • the first acquiring unit is configured to input the license plate area image into a license plate extraction model, and the license plate extraction model performs license plate number recognition on the license plate area image to obtain a license plate number output by the license plate extraction model;
  • the second acquisition unit is configured to input all the face area images into a face extraction model, where the face extraction model screens all the face area images to obtain the face image selected by the face extraction model;
  • the third acquisition unit is configured to input all the human body feature area images into a human body feature extraction model, where the human body feature extraction model screens all the human body feature area images to obtain the human body feature map selected by the human body feature extraction model;
  • the first output unit is configured to determine the license plate number, the face image, and the human body feature map as the target data set.
  • the extraction module 12 includes:
  • a fourth acquiring unit configured to acquire the original video and the extraction parameters in the extraction rule
  • a first determining unit configured to determine a starting video frame image in the original video as a starting frame image
  • the second determining unit is configured to determine the ending video frame image in the original video as the ending frame image
  • An extracting unit configured to extract a video frame image from the start frame image at intervals of the extraction parameter until the end frame image
  • a third determining unit configured to determine the video frame image after extraction as a process frame image
  • the second output unit is configured to form all the video frame images in the original video from the start frame image, the end frame image, and the process frame images.
  • the acquisition module 13 includes:
  • the fifth acquisition unit is used to acquire a sample training image set; the sample training image set includes a number of sample training images, and the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and a number of negative sample images; each of the sample training images is associated with a sample training label;
  • the second input unit is configured to input the sample training images into a deep convolutional neural network model containing initial parameters, extract the text features, the face features, and the human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features;
  • a loss unit configured to match the training result with the sample training label to obtain a loss value
  • a first convergence unit configured to record the deep convolutional neural network model after convergence as an image recognition model when the loss value reaches a preset convergence condition
  • the second convergence unit is configured to iteratively update the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition,
  • the deep convolutional neural network model after convergence is recorded as an image recognition model.
• the acquisition module 13 further includes:
• a first processing unit configured to perform binarization processing on the video frame image through the image recognition model to obtain a grayscale image;
• a second processing unit configured to perform edge detection processing on the grayscale image through the Canny algorithm to obtain an edge image;
• a third processing unit configured to perform contour tracking processing on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
• an analysis unit configured to analyze the contour image and cut out the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the posture of the human body;
• a first extraction unit configured to extract, through the image recognition model, the text features of the license plate area by using a character segmentation algorithm and a character recognition method, to obtain the license plate number result;
• a second extraction unit configured to extract, through the image recognition model, the facial features of the face area by using the YOLO algorithm, to obtain the face result;
• a third output unit configured to identify, through the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature area by using the chain code curvature algorithm, extract the human body posture feature from these scales, and obtain the human body feature result.
  • the identification module 14 includes:
  • a fourth determining unit configured to determine the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
• the fifth determining unit is configured to determine, if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map;
• the sixth determining unit is configured to determine, if no item in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is not related to the license plate number, the face image, and the human body feature map.
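For illustration only, the following Python sketch shows the decision rule embodied by the fourth to sixth determining units: a video frame image is marked related when any item of its recognition set matches the target license plate number, face image, or human body feature map. The RecognitionSet fields, the similarity scores, and the thresholds are hypothetical stand-ins for the model outputs described above; the application itself does not prescribe any implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionSet:
    plate_result: Optional[str]  # license plate number read from the frame, if any
    face_score: float            # similarity of the detected face to the target face image
    body_score: float            # similarity of the detected posture to the target feature map

def is_related(rec: RecognitionSet, target_plate: str,
               face_thresh: float = 0.8, body_thresh: float = 0.8) -> bool:
    """A frame is related if any item matches one of the three target items."""
    if rec.plate_result is not None and rec.plate_result == target_plate:
        return True  # the license plate number result matches the target plate
    return rec.face_score >= face_thresh or rec.body_score >= body_thresh

print(is_related(RecognitionSet("ABC123", 0.1, 0.2), "ABC123"))  # True
```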
  • the extraction module 15 includes:
• a sixth acquiring unit configured to acquire the original video;
• a seventh acquiring unit configured to acquire, in chronological order, all the start point frame images and all the end point frame images from the original video;
• a seventh determining unit configured to determine that the original video is a non-matching segment if neither the start point frame image nor the end point frame image exists in the original video;
• an eighth determining unit configured to record the start point frame image as the matching segment of the original video if there is only one start point frame image and no end point frame image in the original video;
• a ninth determining unit configured to record the end point frame image as the matching segment of the original video if there is only one end point frame image and no start point frame image in the original video;
• a tenth determining unit configured to acquire, if an end point frame image exists adjacent to and after a start point frame image, the video segment between the start point frame image and the end point frame image, and record the video segment between the start point frame image and the end point frame image as a matching segment of the original video;
• an associating unit configured to associate all the matching segments with the unique identification code associated with the original video.
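The segment-pairing rule implemented by the seventh to tenth determining units can be sketched as follows; frame indices stand in for positions in the original video, and treating a related first frame as a start point frame image is an assumption, since that frame has no previous frame to compare against.

```python
from typing import List, Tuple

def matching_segments(related: List[bool]) -> List[Tuple[int, int]]:
    """related[i] is the recognition result of frame i; returns (start, end) pairs."""
    segments, start = [], None
    for i, rel in enumerate(related):
        prev = related[i - 1] if i > 0 else False
        if rel and not prev:
            start = i                          # start point frame image
        elif not rel and prev and start is not None:
            segments.append((start, i))        # end point frame image found
            start = None
    if start is not None:
        segments.append((start, len(related) - 1))  # video ends while still related
    return segments

print(matching_segments([False, True, True, False, True]))  # [(1, 3), (4, 4)]
```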
  • Each module in the above-mentioned video data processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
• the above-mentioned modules may be embedded, in hardware form, in or independently of the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a video data processing method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and capable of running on the processor.
  • the processor executes the computer program to implement the video data processing method in the foregoing embodiment.
  • a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the video data processing method in the above-mentioned embodiment.
• the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created based on the use of blockchain nodes, and the like.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), and electrically programmable ROM (EPROM).
  • Volatile memory may include random access memory (RAM) or external cache memory.
• RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), synchronous link DRAM (SLDRAM), memory bus dynamic RAM (RDRAM), and so on.
• the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
• blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Abstract

Disclosed are a video data processing method and apparatus, and a computer device and a storage medium. The method comprises: acquiring a set of target data and a set of videos to be processed; extracting at least two video frame images from each original video; an image identification model performing image binarization processing, edge detection processing and contour tracking processing on a video image to obtain a license plate area, a facial area and a human-body feature area; extracting, by means of a character segmentation algorithm, a text feature from the license plate area to obtain a license plate number result; extracting, by means of a YOLO algorithm, a facial feature from the facial area to obtain a facial result; extracting, by means of a chain-code curvature algorithm, a human-body posture feature from the human-body feature area to obtain a human-body feature result; determining an identification result; extracting matching fragments; and performing splicing to obtain video composite fragments, and ranking the video composite fragments to obtain a target video. In addition, the present application further relates to blockchain technology. The set of target data may be stored in a blockchain node.

Description

Video data processing method and apparatus, and computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 24, 2020 with application No. CN202010332455.9 and entitled "Video data processing method and apparatus, and computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence data processing, and in particular to a video data processing method, apparatus, computer device, and storage medium.
Background
At present, with the continuous development of traffic surveillance video, the amount of surveillance video data keeps increasing. In the prior art, the inventor realized that after a traffic accident occurs, surveillance video data needs to be retrieved for tracking and determining responsibility. For the most part this is done by manual viewing and interception with the naked eye, and the intercepted video clips are spliced together as evidence for determining responsibility. Because the license plates, faces, and human bodies captured in surveillance video data are scattered and occupy only a small area, they are difficult for staff to find; work efficiency is affected by the staff's mental state, and omissions that lead to improper determination of responsibility easily occur, resulting in high input costs and low work efficiency.
Summary
This application provides a video data processing method, apparatus, computer device, and storage medium, which can automatically, quickly, and accurately cut out the video segments related to a target data set from a video set to be processed. This application can be applied to the field of smart transportation, thereby promoting the construction of smart cities, improving recognition efficiency and accuracy, and greatly reducing input costs.
A video data processing method, including:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A video data processing apparatus, including:
a receiving module, configured to receive a video extraction instruction, and acquire a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
an extraction module, configured to extract at least two video frame images from each of the original videos according to a preset extraction rule;
an acquisition module, configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area, extracts text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area, extracts facial features from the face area through a YOLO algorithm to obtain a face result of the face area, and extracts human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
a recognition module, configured to determine, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
an extraction module, configured to extract, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associate the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
a merging module, configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that same unique identification code;
a determining module, configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the following steps are implemented when the processor executes the computer program:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
A computer-readable storage medium storing a computer program, where the following steps are implemented when the computer program is executed by a processor:
receiving a video extraction instruction, and acquiring a target data set and a to-be-processed video set, where the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
extracting at least two video frame images from each of the original videos according to a preset extraction rule;
inputting the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain a license plate area, a face area, and a human body feature area; extracting text features from the license plate area through a character segmentation algorithm to obtain a license plate number result of the license plate area; extracting facial features from the face area through a YOLO algorithm to obtain a face result of the face area; and extracting human body posture features from the human body feature area through a chain code curvature algorithm to obtain a human body feature result of the human body feature area;
determining, according to the license plate number result, the face result, and the human body feature result, a recognition result corresponding to the video frame image, where the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result includes being related or not related to the license plate number, the face image, and the human body feature map;
extracting, according to the recognition results corresponding to the video frame images in each of the original videos, multiple matching segments from each of the original videos, and associating the matching segments with the corresponding unique identification code, where a matching segment is a video segment between a start point frame image and an end point frame image in the original video; the start point frame image refers to a video frame image whose recognition result is related and whose adjacent previous video frame image has a recognition result of not related; the end point frame image refers to a video frame image whose recognition result is not related and whose adjacent previous video frame image has a recognition result of related, or to the ending video frame image of the original video;
splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
sorting all the video synthesis segments according to a preset unique identification code order rule, and determining all the sorted video synthesis segments as the target video, in the to-be-processed video set, related to the target data set.
This application can automatically, quickly, and accurately cut out the video segments related to the target data set from the to-be-processed video set, which improves recognition efficiency and accuracy and greatly reduces input costs.
Description of the Drawings
FIG. 1 is a schematic diagram of an application environment of a video data processing method in an embodiment of this application;
FIG. 2 is a flowchart of a video data processing method in an embodiment of this application;
FIG. 3 is a flowchart of step S10 of the video data processing method in an embodiment of this application;
FIG. 4 is a flowchart of step S20 of the video data processing method in an embodiment of this application;
FIG. 5 is a flowchart of step S30 of the video data processing method in an embodiment of this application;
FIG. 6 is a flowchart of step S30 of the video data processing method in another embodiment of this application;
FIG. 7 is a flowchart of step S40 of the video data processing method in an embodiment of this application;
FIG. 8 is a functional block diagram of a video data processing apparatus in an embodiment of this application;
FIG. 9 is a schematic diagram of a computer device in an embodiment of this application;
The realization, functional characteristics, and advantages of the purpose of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only a part, rather than all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
The video data processing method provided by this application can be applied in the application environment shown in FIG. 1, in which a client (computer device) communicates with a server through a network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a video data processing method is provided, and its technical solution mainly includes the following steps S10 to S70:
S10: Receive a video extraction instruction, and acquire a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code.
Understandably, the video extraction instruction is an instruction triggered after the target data set and the to-be-processed video set are selected. The target data set is a data set related to the target person being searched for, where the target person is the person who needs to be found; it includes a license plate number, a face image, and a human body feature map. The license plate number is the unique license plate number of the vehicle driven by the target person; the face image is an image of the target person's face region; the human body feature map is an image of a specific human body posture of the target person, for example, an image of the target person's upper body while driving a vehicle. The to-be-processed video set is the collection of videos to be searched for content related to the target person, and it contains at least one original video. An original video may be an unprocessed video segment or an intercepted video segment, for example, the surveillance video of a certain intersection on a certain day, or the surveillance video of a certain street from 19:00 to 21:00 on a certain day. Each original video is associated with a unique identification code, which is the unique identifier assigned to that original video and can be set according to requirements.
In an embodiment, as shown in FIG. 3, before step S10, that is, before the acquiring of the target data set, the method includes:
S101: Receive a target collection instruction, and acquire a sample video; the sample video contains a license plate, a human face, and a specific human body posture.
Understandably, the target collection instruction is an instruction triggered when information about the target person needs to be collected. The sample video is a short video clip related to the target person, whose content contains a license plate, the target person's face, and a specific human body posture of the target person, which means that the license plate, the face, and the specific human body posture each appear at least once in the sample video.
S102: Split the sample video into several sample images.
Understandably, according to a preset splitting parameter, one sample image is obtained from the sample video at every interval of the splitting parameter; a sample image is an image selected from the sample video. The splitting parameter can be set according to requirements, for example, to 30 frames or 25 frames, so that the sample video is split into multiple sample images.
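A minimal sketch of this splitting step, assuming OpenCV as the video-reading library (the application does not name one); the file name and the 30-frame splitting parameter are illustrative.

```python
import cv2

def split_video(path: str, interval: int = 30) -> list:
    """Return one sample image (a BGR array) every `interval` frames of the video."""
    cap = cv2.VideoCapture(path)
    samples, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                  # end of the sample video
            break
        if index % interval == 0:   # keep one frame per splitting interval
            samples.append(frame)
        index += 1
    cap.release()
    return samples

# Usage with a hypothetical file name:
# sample_images = split_video("sample_video.mp4", interval=30)
```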
S103: Input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepts from all the sample images the license plate area images containing the license plate, intercepts from all the sample images the face region images containing the human face, and extracts from all the sample images the human body feature region images containing the human face and the specific human body posture.
Understandably, the sample collection model refers to a neural network model trained to identify the license plate area, the face area, and the human body feature area in an image. Image collection means identifying and cutting out, from a sample image, the rectangular license plate area image containing the license plate, the rectangular face region image containing the face, and the rectangular human body feature region image containing the face and the specific human body posture. The network structure of the sample collection model can be set according to requirements; for example, it can be a network structure of the Inception series or of the VGG series.
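The cropping half of the image collection step can be sketched as follows; the (x, y, width, height) box format and the box values are assumptions standing in for the rectangular regions that the sample collection model would detect.

```python
import numpy as np

def crop_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop a rectangular region (x, y, width, height) out of an image array."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

sample = np.zeros((720, 1280, 3), dtype=np.uint8)     # stand-in sample image
plate_img = crop_region(sample, (400, 500, 200, 60))  # license plate area image
face_img = crop_region(sample, (600, 100, 120, 150))  # face region image
body_img = crop_region(sample, (550, 80, 300, 400))   # human body feature region image
print(plate_img.shape, face_img.shape, body_img.shape)
```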
S104: Input the license plate area image into a license plate extraction model; the license plate extraction model performs license plate number recognition on the license plate area image to obtain a license plate number output by the license plate extraction model.
Understandably, the license plate extraction model refers to a trained neural network model for recognizing the license plate number in an image. Its network structure can be set according to requirements; for example, it can be obtained through transfer learning from GoogleNet. The license plate extraction model can recognize the Chinese characters, digits, and letters in the license plate area image; the license plate number is the unique identifier of the vehicle driven by the target person and consists of Chinese characters, digits, and letters.
S105: Input all the face region images into a face extraction model; the face extraction model screens all the face region images to obtain one face image selected by the face extraction model.
Understandably, through the face extraction model, the face region image that contains the eyes, eyebrows, mouth, nose, ears, and face contour and has the highest definition is selected from all the face region images. The screening method can be set according to requirements. Preferably, the screening method is: obtain the average of the pixel values of the pixels at the same position across all the face region images, and record this average as the pixel average corresponding to that position; for each face region image, take the absolute value of the difference between each pixel and the pixel average corresponding to that pixel to obtain an absolute pixel difference, and calculate the sum of the absolute pixel differences of all pixels in the image as the image difference value; then select the face region image with the smallest image difference value among all the images and determine it as the face image, that is, determine the screened face region image as the face image.
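A sketch of this preferred screening rule; resizing the face region images to a common shape beforehand is an assumption the application does not state explicitly.

```python
import numpy as np

def select_face_image(face_regions: list) -> np.ndarray:
    """Return the face region image whose pixels are closest to the per-position average."""
    stack = np.stack([r.astype(np.float64) for r in face_regions])
    mean_image = stack.mean(axis=0)                            # pixel average per position
    diffs = np.abs(stack - mean_image).reshape(len(stack), -1).sum(axis=1)
    return face_regions[int(np.argmin(diffs))]                 # smallest image difference value
```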
S106: Input all the human body feature region images into a human body feature extraction model; the human body feature extraction model screens all the human body feature region images to obtain one human body feature map selected by the human body feature extraction model.
Understandably, through the human body feature extraction model, the human body feature region image with the highest definition that contains the face and the specific human body posture is selected from all the human body feature region images. The screening method can be set according to requirements. Preferably, the screening method is: determine, for each pixel in each human body feature region image, the difference between the pixel value of that pixel and the sum of the pixel values of its surrounding neighboring pixels as a local pixel difference; obtain the sum of all the local pixel differences in each human body feature region image; then select the human body feature region image with the smallest sum of local pixel differences among all the images and determine it as the human body feature map, that is, determine the screened human body feature region image as the human body feature map.
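A sketch of this preferred screening rule; grayscale input, zero padding at the image borders, and taking the absolute value of each local difference are assumptions not stated above.

```python
import numpy as np

def local_difference_sum(gray: np.ndarray) -> float:
    """Sum, over all pixels, of |pixel value - sum of its four neighboring pixel values|."""
    g = gray.astype(np.float64)
    padded = np.pad(g, 1)  # zero-pad so every pixel has four neighbors
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:])
    return float(np.abs(g - neighbors).sum())

def select_body_feature_map(regions: list) -> np.ndarray:
    """Return the region image with the smallest sum of local pixel differences."""
    sums = [local_difference_sum(r) for r in regions]
    return regions[int(np.argmin(sums))]
```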
S107: Determine the license plate number, the face image, and the human body feature map as the target data set.
Understandably, the license plate number, the face image, and the human body feature map are recorded as the target data set; that is, the target data set contains a collection of information related to the target person's license plate number, face, and specific human body posture.
In this way, image collection is performed on the sample video through the sample collection model, and the license plate number, the face image, and the human body feature map are obtained from the collected images through the license plate extraction model, the face extraction model, and the human body feature extraction model, so that the license plate number is automatically recognized from the sample video and the target person's face and human body feature maps are automatically cut out, which improves the accuracy of recognition and the reliability of interception, reduces labor costs, and improves efficiency.
S20: Extract at least two video frame images from each of the original videos according to a preset extraction rule.
Understandably, the extraction rule can be set according to requirements; for example, it can be set to extract at least two video frame images evenly from the original video, or to extract one video frame image at every interval of a preset frame parameter in the original video. A video frame image is the image corresponding to a frame in the original video.
In an embodiment, as shown in FIG. 4, step S20, that is, the extracting of at least two video frame images from each of the original videos according to the preset extraction rule, includes:
S201: Acquire the original video and the extraction parameter in the extraction rule.
Understandably, the extraction rule can be set according to requirements, and its purpose is to extract at least two video frame images from each original video. For example, the extraction rule may be to acquire the starting video frame image and the ending video frame image of the original video, as well as the video frame image at the bisecting center position of the original video; the extraction rule may also be to acquire the starting video frame image and the ending video frame image of the original video, and to select one video frame image at every interval of a preset extraction parameter in the middle part of the original video. The extraction parameter can be set according to requirements, for example, to 15 frames, 25 frames, or 30 frames.
S202: Determine the starting video frame image in the original video as the start frame image.
Understandably, the video frame image corresponding to the first frame in the original video is determined as the start frame image.
S203: Determine the ending video frame image in the original video as the end frame image.
Understandably, the video frame image corresponding to the last frame in the original video is determined as the end frame image.
S204: Starting from the start frame image, extract one video frame image at every interval of the extraction parameter, until the end frame image.
Understandably, the extraction parameter can be set according to requirements; for example, it can be set to 25 frames (about 1 second), that is, starting from the start frame image, one video frame image is extracted every 25 frames, stopping when the end frame image is reached or fewer than 25 frames remain.
S205: Determine the extracted video frame images as process frame images.
Understandably, the process frame images are the extracted video frame images in the original video other than the start frame image and the end frame image.
S206: Compose all the video frame images in the original video from the start frame image, the end frame image, and the process frame images.
Understandably, the start frame image, the end frame image, and all the process frame images are determined as all the video frame images in the original video.
In this way, a method for extracting video frame images from an original video is provided, at least two video frame images are guaranteed to be extracted from each original video, and the extraction rule avoids the problem of insufficient extraction caused by omissions during recognition.
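A sketch of steps S201 to S206, again assuming OpenCV; the 25-frame extraction parameter follows the example above, and appending the last frame separately ensures the end frame image is always included.

```python
import cv2

def extract_video_frames(path: str, interval: int = 25) -> list:
    """Keep the start frame image, every `interval`-th frame, and the end frame image."""
    cap = cv2.VideoCapture(path)
    frames, index, last = [], 0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % interval == 0:  # start frame image and process frame images
            frames.append(frame)
        last = frame               # remember the ending video frame image
        index += 1
    cap.release()
    if last is not None and (index - 1) % interval != 0:
        frames.append(last)        # end frame image not yet included
    return frames
```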
S30: Input the extracted video frame images into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video images to obtain the license plate area, the face area, and the human body feature area; extract the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area; at the same time, extract the facial features of the face area through the YOLO algorithm to obtain the face result of the face area; and extract the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area.
Understandably, the image recognition model is a trained neural network model incorporating the image binarization processing, the edge detection processing, the contour tracking processing, the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm. The image recognition model can use the image binarization processing, the edge detection processing, and the contour tracking processing to identify the license plate area, the face area, and the human body feature area in a video image; it can then recognize the license plate number result of the license plate area through the character segmentation algorithm, recognize the face result of the face area through the YOLO algorithm, and finally recognize the human body feature result of the human body feature area through the chain code curvature algorithm.
The YOLO (You Only Look Once) algorithm is an algorithm that uses a single CNN (Convolutional Neural Network) operation to directly predict the categories and regions of different targets.
In an embodiment, as shown in FIG. 5, before step S30, that is, before the inputting of the extracted video frame images into the image recognition model, the method includes:
S301: Acquire a sample training image set; the sample training image set contains several sample training images, and the sample training images include all the license plate area images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label.
Understandably, the sample training image set is the collection of the sample training images, which include all the license plate area images, all the face region images, all the human body feature region images, and at least one negative sample image. A negative sample image is a collected image that contains neither the license plate, nor the face, nor the specific human body posture. Each sample training image is associated with a sample training label, which can be set according to requirements; for example, the labels can be set to include related and not related, or to include related license plate, related face, related human body feature, and not related, and so on.
S302: Input the sample training images into a deep convolutional neural network model containing initial parameters, extract text features, face features, and human body posture features from the sample training images through the deep convolutional neural network model, and obtain the training results output by the deep convolutional neural network model according to the extracted text features, face features, and human body posture features.
Understandably, the text features are features related to the color, shape, and characters of the license plate; the face features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face contour; and the human body posture features are features related to the target person's head, face, arms, shoulders, and so on. The training result characterizes whether the sample training image contains any one of the license plate number, the face image, and the human body feature map. The deep convolutional neural network model extracts the text features from all the license plate area images and outputs the training results corresponding to the license plate area images according to the extracted text features; it extracts the face features from all the face region images and outputs the training results corresponding to the face region images according to the extracted face features; it extracts the human body posture features from all the human body feature region images and outputs the training results corresponding to the human body feature region images according to the extracted human body posture features; and it extracts the text features, face features, and human body posture features from each negative sample image and outputs the training result corresponding to the negative sample image according to the extracted features.
S303: Match the training result with the sample training label to obtain a loss value.
Understandably, the training result and the sample training label are input into the loss function of the deep convolutional neural network model. The loss function can be set according to requirements and may be a multi-class cross-entropy loss function or a regression loss function; the loss value is calculated through the loss function.
S304: When the loss value reaches a preset convergence condition, record the deep convolutional neural network model after convergence as the image recognition model.
The preset convergence condition may be the condition that, after 7000 calculations, the loss value is very small and no longer decreases; that is, when the loss value is very small and no longer decreases after 7000 calculations, the training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model. The preset convergence condition may also be the condition that the loss value is less than a set threshold; that is, when the loss value is less than the set threshold, the training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model.
S305: When the loss value does not reach the preset convergence condition, iteratively update the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and record the deep convolutional neural network model after convergence as the image recognition model.
Understandably, when the loss value does not reach the preset convergence condition, the initial parameters of the deep convolutional neural network model are iteratively updated so that the model keeps moving closer to accurate recognition results, making the accuracy of the recognition results higher and higher.
In this way, the license plate area images, the face sample images, and the human body feature region images output by inputting the sample video into the sample collection model are used as sample training images, and the image recognition model is obtained by training on these sample training images. The image recognition model is therefore more sensitive to the data of the target data set, which makes it more targeted, yields higher recognition accuracy, and improves recognition reliability.
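A sketch of the training loop of steps S302 to S305 in PyTorch (an assumed framework); the model, data loader, optimizer, learning rate, and loss threshold are placeholders for the deep convolutional neural network model, the sample training images, and the preset convergence condition described above.

```python
import torch
from torch import nn

def train_until_converged(model: nn.Module, loader, threshold: float = 0.01,
                          max_epochs: int = 100) -> nn.Module:
    criterion = nn.CrossEntropyLoss()  # e.g. a multi-class cross-entropy loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # match training result with label
            loss.backward()
            optimizer.step()                         # iteratively update the parameters
            if loss.item() < threshold:              # preset convergence condition reached
                return model                         # record as the image recognition model
    return model
```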
In one embodiment, as shown in FIG. 6, step S30, namely inputting the extracted video frame images into the image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region, extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain a face result of the face region, and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region, includes:
S306: Binarize the video frame image by using the image recognition model to obtain a grayscale image.
Understandably, the grayscale algorithm calculates the gray value of each pixel in the video frame image, the gray value ranging from 0 to 255; this constitutes the binarization processing, and arranging all the gray values according to the positions of their corresponding pixels yields the grayscale image.
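A minimal sketch of this conversion, assuming OpenCV; the file name is an assumption for the example:

    import cv2

    frame = cv2.imread("video_frame.jpg")  # assumed input frame
    # A weighted combination of the B, G, and R channels gives every
    # pixel a gray value in [0, 255], arranged at its own position.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)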
S307: Perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image.
Understandably, the Canny algorithm (Canny edge extraction algorithm) is an algorithm for extracting object edges. It uses a variational method so that it is not easily disturbed by noise and can detect the true edges of weak edges. The Canny algorithm first convolves the image with a two-dimensional Gaussian filter function for noise reduction, so that the noise at each pixel of the image is reduced until it no longer interferes; next, it computes the gradient direction at each point on the edge contour from the first-order partial derivatives in two directions and the gradient magnitude; it then determines the neighboring points according to the 0-degree, 45-degree, 90-degree, and 135-degree directions corresponding to the gradient direction; finally, it computes the difference between the gray value of the point and that of its neighboring points and determines the edges from the differences, ultimately obtaining the edge image.
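A sketch of this step with OpenCV's built-in Canny detector; the Gaussian kernel size and the two hysteresis thresholds are assumptions for the example:

    import cv2

    # `gray` is the grayscale image from step S306.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # 2-D Gaussian filtering
    edges = cv2.Canny(blurred, 50, 150)          # hysteresis edge tracing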
S308: Perform contour tracking processing on the edge image according to a tracking criterion and a node criterion to obtain a contour image.
Understandably, the contour tracking processing starts from a starting point and searches along the edge route near that starting point for points that satisfy the tracking criterion, following the points that satisfy the tracking criterion until a point that satisfies the node criterion is found, so that the contours in the edge image are clearly marked and the contour image is obtained.
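As a rough stand-in for the tracking and node criteria described above, OpenCV's contour follower can be sketched as follows; the retrieval mode and drawing parameters are assumptions, not the specific criteria of this application:

    import cv2

    # Follows each edge in `edges` from a starting point until the
    # contour closes or ends, marking the contours explicitly.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour_image = cv2.drawContours(frame.copy(), contours, -1,
                                     (0, 255, 0), 1)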
S309: Analyze the contour image, and crop out the license plate region containing a license plate, the face region containing a face, and the human body feature region containing content related to a human body posture.
Understandably, image analysis is performed on all the contour images, that is, each contour image is analyzed for license plates, faces, or human body postures, and the license plate region containing a license plate, the face region containing a face, and the human body feature region related to a human body posture are determined from the analysis results and cropped out.
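A minimal sketch of the cropping, assuming OpenCV and the contours found above; the area threshold and the downstream region classifier are assumptions for the example:

    import cv2

    def crop_candidate_regions(frame, contours, min_area=400):
        # Every sufficiently large contour is cropped; a separate
        # (assumed) classifier then decides whether the crop shows a
        # license plate, a face, or a posture-related body region.
        regions = []
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue
            x, y, w, h = cv2.boundingRect(c)
            regions.append(frame[y:y + h, x:x + w])
        return regions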
S310: Through a character segmentation algorithm and a character recognition method, the image recognition model extracts the text features of the license plate region to obtain the license plate number result.
Understandably, the character segmentation algorithm segments the license plate region into single-character images, and the character recognition method recognizes letters, Chinese characters, and digits in each segmented single-character image; that is, the text features of each single-character image are extracted, and the character value corresponding to that image is recognized from the extracted text features. Through the character segmentation algorithm and the character recognition method, the image recognition model extracts the text features of the license plate region and determines the license plate result from those text features. The license plate result indicates whether the license plate number recognized from the license plate region is the same as the license plate number in the target data set, and can be set as required; for example, the license plate result may be either identical to the target person's license plate number or different from the target person's license plate number.
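One common way to realize such character segmentation is a vertical-projection split; the sketch below, assuming OpenCV and NumPy, is illustrative only and is not the specific algorithm claimed here:

    import cv2
    import numpy as np

    def segment_characters(plate_region):
        # Binarize the plate, then split it at columns containing no
        # foreground pixels (vertical-projection segmentation).
        gray = cv2.cvtColor(plate_region, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        projection = binary.sum(axis=0)           # ink per column
        chars, start = [], None
        for x, ink in enumerate(projection):
            if ink and start is None:
                start = x                         # a character begins
            elif not ink and start is not None:
                chars.append(binary[:, start:x])  # one single-character image
                start = None
        if start is not None:
            chars.append(binary[:, start:])
        return chars  # each image then goes to character recognition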
S311: Through the YOLO algorithm, the image recognition model extracts the face features of the face region to obtain the face result.
Understandably, the YOLO (You Only Look Once) algorithm is an algorithm that uses a single CNN operation to directly predict the categories and regions of different targets. The image recognition model extracts the face features of the face region through the YOLO algorithm, the face features being features related to the target person's eyes, eyebrows, mouth, nose, ears, and facial contour, and determines the face result from those face features. The face result characterizes whether the face region contains the target person's face and can be set as required; for example, the face result may be either related to the target person's face or unrelated to the target person's face.
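A sketch of a YOLO-style detection call, using the ultralytics package as one possible implementation; the weights file name "face.pt" is an assumption standing for any face-trained YOLO weights and is not shipped by the library:

    from ultralytics import YOLO

    model = YOLO("face.pt")            # assumed face-trained weights
    results = model(face_region)       # one CNN forward pass (YOLO)
    for box in results[0].boxes:       # predicted face regions
        x1, y1, x2, y2 = box.xyxy[0].tolist()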
S312: Through the chain code curvature algorithm, the image recognition model recognizes a shoulder width scale, a neck scale, and a chest width scale of the human body feature region, and extracts the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
Understandably, the chain code curvature algorithm is an algorithm that calculates the curvature values corresponding to the four-connected or eight-connected chain codes of the points on the contour edges of the human body feature region. The image recognition model examines each contour edge point and recognizes the shoulder width scale, the neck scale, and the chest width scale; it extracts the vector values corresponding to the human body posture features from the ratios between the shoulder width scale, the neck scale, and the chest width scale, and determines the human body feature result from those vector values. The human body feature result characterizes whether the human body feature region contains the target person's body posture and can be set as required; for example, the human body feature result may be either related to the target person's body posture or unrelated to the target person's body posture.
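A minimal sketch of the ratio-based posture vector and the resulting decision, assuming the three scales have already been measured in pixels from the chain code analysis; the tolerance value is an assumption for the example:

    import numpy as np

    def posture_vector(shoulder, neck, chest):
        # Pairwise ratios of the three scales form the vector values
        # corresponding to the human body posture features.
        return np.array([shoulder / neck, chest / neck, shoulder / chest])

    def body_feature_result(vector, target_vector, tol=0.15):
        # "Related" when every ratio lies within a tolerance of the
        # target person's ratios, "unrelated" otherwise.
        return bool(np.all(np.abs(vector - target_vector) <= tol))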
In this application, the license plate region, the face region, and the human body feature region are obtained after the video frame image undergoes binarization processing, Canny-based edge detection processing, and contour tracking processing; the character segmentation algorithm recognizes the license plate number result of the license plate region, the YOLO algorithm recognizes the face result of the face region, and the chain code curvature algorithm recognizes the human body feature result of the human body feature region. The license plate number result, the face result, and the human body feature result are thus recognized quickly and accurately, improving accuracy and reliability.
S40: Determine a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result. The recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either related or unrelated to the license plate number, the face image, and the human body feature map.
Understandably, if the license plate number result is different from the target person's license plate number, the face result is unrelated to the target person's face, and the human body feature result is unrelated to the target person's body posture, the recognition result corresponding to the video frame image is determined to be unrelated to the license plate number, the face image, and the human body feature map. If the license plate number result is the same as the target person's license plate number, or the face result is related to the target person's face, or the human body feature result is related to the target person's body posture, the recognition result corresponding to the video frame image is determined to be related to the license plate number, the face image, and the human body feature map. In this way, a video frame image in which the target person's face is partially occluded while the target person is driving a vehicle can still be determined as a related image, so that videos related to the target person are recognized more accurately.
In one embodiment, as shown in FIG. 7, step S40, namely determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, includes:
S401: Determine the license plate number result, the face result, and the human body feature result of the video frame image as a recognition set of the video frame image.
Understandably, the recognition set is a set containing a plurality of the license plate number results, a plurality of the face results, and a plurality of the human body feature results.
S402: If the recognition set contains a match with any one of the license plate number, the face image, and the human body feature map, determine that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map.
Understandably, as long as any one element of the recognition set matches the license plate number, the face image, or the human body feature map, the recognition result corresponding to the video frame image is determined to be related to the license plate number, the face image, and the human body feature map.
S403: If nothing in the recognition set matches the license plate number, the face image, or the human body feature map, determine that the recognition result corresponding to the video frame image is unrelated to the license plate number, the face image, and the human body feature map.
Understandably, when none of the elements of the recognition set matches the license plate number, the face image, or the human body feature map, the recognition result corresponding to the video frame image is determined to be unrelated to the license plate number, the face image, and the human body feature map.
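The decision of S402 and S403 reduces to a single disjunction; a one-function sketch, with the argument names being illustrative:

    def recognition_result(plate_match, face_match, body_match):
        # S402: any match in the recognition set makes the frame
        # "related"; S403: no match at all makes it "unrelated".
        return "related" if (plate_match or face_match or body_match) \
               else "unrelated"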
In this way, through the image recognition model, this application can quickly and accurately identify whether the video frame image contains any one of the license plate number, the face image, and the human body feature map and obtain a related or unrelated recognition result, improving recognition efficiency and reliability.
S50: According to the recognition result corresponding to the video frame images in each original video, extract multiple matching segments from each original video, and associate the matching segments with the unique identification code corresponding to that original video. A matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video.
Understandably, according to the recognition result corresponding to each video frame image, that is, according to the video frame images whose recognition result is related and those whose recognition result is unrelated in each original video, multiple matching segments are extracted from each original video and associated with the corresponding unique identification code. The unique identification code can be set as required; for example, it may be a time value on a time axis accurate to the second, or a combination code that combines the unique identifier of the shooting device with a time value.
The matching segment is a video segment related to any one of the license plate number, the face image, and the human body feature map.
In one embodiment, step S50, namely extracting multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video and associating the matching segments with the unique identification code associated with the corresponding original video, includes:
S501: Obtain the original video.
S502: In chronological order, obtain all the start-point frame images from the original video, and obtain all the end-point frame images from the original video.
Understandably, the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video. All the start-point frame images and all the end-point frame images are obtained from the original video.
S503: If neither the start-point frame image nor the end-point frame image exists in the original video, determine that the original video has no matching segment.
Understandably, if neither the start-point frame image nor the end-point frame image exists in the original video, it is determined that the original video contains no matching video segment, and the original video is marked as having no matching segment.
S504: If only one start-point frame image exists in the original video and no end-point frame image exists, record the start-point frame image as the matching segment of the original video.
Understandably, if only one start-point frame image exists in the original video and no end-point frame image exists, the start-point frame image is marked as the matching segment of the original video.
S505: If only one end-point frame image exists in the original video and no start-point frame image exists, record the end-point frame image as the matching segment of the original video.
Understandably, if only one end-point frame image exists in the original video and no start-point frame image exists, the end-point frame image is marked as the matching segment of the original video.
S506: If an adjacent end-point frame image exists after the start-point frame image, obtain the video segment between the start-point frame image and the end-point frame image, and record that video segment as the matching segment of the original video.
Understandably, if an adjacent end-point frame image exists after the start-point frame image, the video segment between the start-point frame image and the end-point frame image is cut out and marked as the matching segment of the original video.
S507: Associate all the matching segments with the unique identification code associated with the original video.
Understandably, the matching segments are associated with the unique identification code associated with the original video. The unique identification code can be set as required; for example, it may be a time value on a time axis accurate to the second, or a combination code that combines the unique identifier of the shooting device with a time value.
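Taken together, S502 through S506 amount to scanning the per-frame recognition results for runs of related frames; a minimal sketch in pure Python, with the frame labels assumed to be booleans:

    def matching_segments(labels):
        # labels[i] is True when frame i's recognition result is
        # "related". Returns (start, end) index pairs per S502-S506:
        # a start-point frame is a related frame whose predecessor is
        # unrelated; an end-point frame is the next unrelated frame,
        # or the final frame when the run reaches the end of the video.
        segments, start = [], None
        for i, related in enumerate(labels):
            if related and start is None:
                start = i                    # start-point frame image
            elif not related and start is not None:
                segments.append((start, i))  # end-point frame image
                start = None
        if start is not None:
            segments.append((start, len(labels) - 1))
        return segments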
In this way, an interception method for obtaining matching segments is provided that can accurately cut out the required video segments, improving efficiency and reducing cost.
S60: Splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that same unique identification code.
Understandably, the matching segments associated with the same unique identification code are spliced in chronological order to obtain a video synthesis segment that develops along the time progression, and the video synthesis segment is associated with that unique identification code.
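A sketch of the grouping-and-splicing step in pure Python; the (identifier, timestamp, clip) tuple layout is an assumption for the example, and `clip` stands for whatever object a video library can concatenate:

    from collections import defaultdict

    def compose_videos(segments):
        # Group matching segments by their unique identification code,
        # then splice each group in chronological order (S60).
        by_id = defaultdict(list)
        for uid, timestamp, clip in segments:
            by_id[uid].append((timestamp, clip))
        composed = {}
        for uid, items in by_id.items():
            items.sort(key=lambda pair: pair[0])         # time order
            composed[uid] = [clip for _, clip in items]  # spliced sequence
        return composed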
S70: Sort all the video synthesis segments according to a preset unique-identification-code ordering rule, and determine all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set.
Understandably, the unique-identification-code ordering rule is a sorting rule preset for the unique identification codes according to requirements; for example, it may be the order of the road route along which the target person drove, or the chronological order of the target person's driving, and so on. All the sorted video synthesis segments are determined as the target video in the to-be-processed video set that is related to the target data set. In this way, after a traffic accident, the target video of the target person can be quickly provided to liability-determination personnel and used as evidence for determining liability, with no need for manual viewing and clipping and no dependence on the mental state of the staff; investment costs are therefore reduced and work efficiency is improved.
This application obtains a target data set containing a license plate number, a face image, and a human body feature map, and a to-be-processed video set containing multiple original videos; extracts video frame images from the original videos; after image binarization processing, edge detection processing, and contour tracking processing, extracts the relevant features through the combination of the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm to obtain recognition results, each recognition result characterizing whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; extracts multiple matching segments according to the recognition results; splices the matching segments into video synthesis segments; and determines all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set. Video segments related to the target data set are thus cut out of the to-be-processed video set automatically, quickly, and accurately, which improves recognition efficiency and accuracy and greatly reduces investment costs.
In one embodiment, a video data processing apparatus is provided, and the video data processing apparatus corresponds one-to-one to the video data processing method in the foregoing embodiments. As shown in FIG. 8, the video data processing apparatus includes a receiving module 11, an extraction module 12, an acquisition module 13, a recognition module 14, an extraction module 15, a merging module 16, and a determination module 17. The functional modules are described in detail as follows:
The receiving module 11 is configured to receive a video extraction instruction and obtain a target data set and a to-be-processed video set; the target data set includes a license plate number, a face image, and a human body feature map, the to-be-processed video set includes several original videos, and each original video is associated with a unique identification code.
The extraction module 12 is configured to extract at least two video frame images from each original video according to a preset extraction rule.
The acquisition module 13 is configured to input the extracted video frame images into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region; extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region; extracts face features from the face region through the YOLO algorithm to obtain a face result of the face region; and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region.
The recognition module 14 is configured to determine a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and is either related or unrelated to the license plate number, the face image, and the human body feature map.
The extraction module 15 is configured to extract multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video, and to associate the matching segments with the corresponding unique identification code; the matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result; the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video.
The merging module 16 is configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and to associate the video synthesis segment with that same unique identification code.
The determination module 17 is configured to sort all the video synthesis segments according to a preset unique-identification-code ordering rule, and to determine all the sorted video synthesis segments as the target video in the to-be-processed video set that is related to the target data set.
In one embodiment, the receiving module 11 includes:
a receiving unit, configured to receive a target collection instruction and obtain a sample video, the sample video containing one license plate, one face, and one specific human body posture;
a splitting unit, configured to split the sample video into several sample images;
a first input unit, configured to input all the sample images into a sample collection model, where the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human body posture from all the sample images;
a first acquisition unit, configured to input the license plate region images into a license plate extraction model, where the license plate extraction model performs license plate number recognition on the license plate region images, and to obtain one license plate number output by the license plate extraction model;
a second acquisition unit, configured to input all the face region images into a face extraction model, where the face extraction model screens all the face region images, and to obtain one face image screened out by the face extraction model;
a third acquisition unit, configured to input all the human body feature region images into a human body feature extraction model, where the human body feature extraction model screens all the human body feature region images, and to obtain one human body feature map screened out by the human body feature extraction model; and
a first output unit, configured to determine the license plate number, the face image, and the human body feature map as the target data set.
In one embodiment, the extraction module 12 includes:
a fourth acquisition unit, configured to obtain the original video and an extraction parameter in the extraction rule;
a first determination unit, configured to determine the first video frame image in the original video as a start frame image;
a second determination unit, configured to determine the last video frame image in the original video as an end frame image;
an extraction unit, configured to extract one video frame image at every interval of the extraction parameter, starting from the start frame image and continuing until the end frame image;
a third determination unit, configured to determine the extracted video frame images as process frame images; and
a second output unit, configured to compose all the video frame images of the original video from the start frame image, the end frame image, and the process frame images.
In one embodiment, the acquisition module 13 includes:
a fifth acquisition unit, configured to obtain a sample training image set, the sample training image set containing several sample training images, the sample training images including all the license plate region images, all the face sample images, all the human body feature region images, and several negative sample images, and each sample training image being associated with a sample training label;
a second input unit, configured to input the sample training images into a deep convolutional neural network model containing initial parameters, where the deep convolutional neural network model extracts text features, face features, and human body posture features from the sample training images and outputs training results according to the extracted text features, face features, and human body posture features;
a loss unit, configured to match the training results with the sample training labels to obtain a loss value;
a first convergence unit, configured to record the converged deep convolutional neural network model as an image recognition model when the loss value reaches a preset convergence condition; and
a second convergence unit, configured to iteratively update the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition, and then record the converged deep convolutional neural network model as the image recognition model.
In one embodiment, the acquisition module 13 further includes:
a first processing unit, configured to binarize the video frame image through the image recognition model to obtain a grayscale image;
a second processing unit, configured to perform edge detection processing on the grayscale image through the Canny algorithm to obtain an edge image;
a third processing unit, configured to perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image;
an analysis unit, configured to analyze the contour image and crop out the license plate region containing a license plate, the face region containing a face, and the human body feature region related to a human body posture;
a first extraction unit, configured to extract, by the image recognition model through a character segmentation algorithm and a character recognition method, the text features of the license plate region to obtain the license plate number result;
a second extraction unit, configured to extract, by the image recognition model through the YOLO algorithm, the face features of the face region to obtain the face result; and
a third output unit, configured to recognize, by the image recognition model through the chain code curvature algorithm, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region, and to extract the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
In one embodiment, the recognition module 14 includes:
a fourth determination unit, configured to determine the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
a fifth determination unit, configured to determine, if the recognition set contains a match with any one of the license plate number, the face image, and the human body feature map, that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map; and
a sixth determination unit, configured to determine, if nothing in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is unrelated to the license plate number, the face image, and the human body feature map.
In one embodiment, the extraction module 15 includes:
a sixth acquisition unit, configured to obtain the original video;
a seventh acquisition unit, configured to obtain, in chronological order, all the start-point frame images from the original video and all the end-point frame images from the original video;
a seventh determination unit, configured to determine that the original video has no matching segment if neither the start-point frame image nor the end-point frame image exists in the original video;
an eighth determination unit, configured to record the start-point frame image as the matching segment of the original video if only one start-point frame image exists in the original video and no end-point frame image exists;
a ninth determination unit, configured to record the end-point frame image as the matching segment of the original video if only one end-point frame image exists in the original video and no start-point frame image exists;
a tenth determination unit, configured to obtain, if an adjacent end-point frame image exists after the start-point frame image, the video segment between the start-point frame image and the end-point frame image, and to record that video segment as the matching segment of the original video; and
an association unit, configured to associate all the matching segments with the unique identification code associated with the original video.
For specific limitations on the video data processing apparatus, reference may be made to the foregoing limitations on the video data processing method, which are not repeated here. Each module in the video data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a video data processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the video data processing method in the foregoing embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the video data processing method in the foregoing embodiments.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.
A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments can be completed by instructing relevant hardware through a computer program, where the computer program may be stored in a non-volatile computer-readable storage medium, and the computer program, when executed, may include the processes of the foregoing method embodiments. Any reference to the memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), and electrically programmable ROM (EPROM). Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), synchlink DRAM (SLDRAM), and Rambus dynamic RAM (RDRAM).
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association by cryptographic methods; each data block contains the information of a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the functional units and modules above is used as an example; in practical applications, the functions above may be assigned to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.

Claims (20)

  1. A video data processing method, comprising:
    receiving a video extraction instruction, and obtaining a target data set and a to-be-processed video set, wherein the target data set comprises a license plate number, a face image, and a human body feature map, the to-be-processed video set comprises several original videos, and each original video is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain a license plate region, a face region, and a human body feature region, extracts text features from the license plate region through a character segmentation algorithm to obtain a license plate number result of the license plate region, extracts face features from the face region through a YOLO algorithm to obtain a face result of the face region, and extracts human body posture features from the human body feature region through a chain code curvature algorithm to obtain a human body feature result of the human body feature region;
    determining a recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result, wherein the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map, and the recognition result is either related or unrelated to the license plate number, the face image, and the human body feature map;
    extracting multiple matching segments from each original video according to the recognition result corresponding to the video frame images in each original video, and associating the matching segments with the corresponding unique identification code, wherein a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video, the start-point frame image is a video frame image whose recognition result is related and whose adjacent preceding video frame image has an unrelated recognition result, and the end-point frame image is a video frame image whose recognition result is unrelated and whose adjacent preceding video frame image has a related recognition result, or the final video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code; and
    sorting all the video synthesis segments according to a preset unique-identification-code ordering rule, and determining all the sorted video synthesis segments as a target video in the to-be-processed video set that is related to the target data set.
  2. The video data processing method according to claim 1, wherein before the obtaining a target data set, the method comprises:
    receiving a target collection instruction and obtaining a sample video, wherein the sample video contains one license plate, one face, and one specific human body posture;
    splitting the sample video into several sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out license plate region images containing the license plate from all the sample images, crops out face region images containing the face from all the sample images, and extracts human body feature region images containing the face and the specific human body posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model screens all the face region images, and obtaining one face image screened out by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model screens all the human body feature region images, and obtaining one human body feature map screened out by the human body feature extraction model; and
    determining the license plate number, the face image, and the human body feature map as the target data set.
  3. The video data processing method according to claim 1, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and an extraction parameter in the extraction rule;
    determining the first video frame image in the original video as a start frame image;
    determining the last video frame image in the original video as an end frame image;
    extracting one video frame image at every interval of the extraction parameter, starting from the start frame image and continuing until the end frame image;
    determining the extracted video frame images as process frame images; and
    composing all the video frame images of the original video from the start frame image, the end frame image, and the process frame images.
  4. The video data processing method according to claim 1, wherein before the inputting the extracted video frame images into an image recognition model, the method comprises:
    obtaining a sample training image set, wherein the sample training image set contains several sample training images, the sample training images comprise all the license plate region images, all the face sample images, all the human body feature region images, and several negative sample images, and each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, wherein the deep convolutional neural network model extracts text features, face features, and human body posture features from the sample training images and outputs training results according to the extracted text features, face features, and human body posture features;
    matching the training results with the sample training labels to obtain a loss value;
    recording the converged deep convolutional neural network model as an image recognition model when the loss value reaches a preset convergence condition; and
    iteratively updating the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  5. The video data processing method according to claim 1, wherein the inputting the extracted video frame images into an image recognition model, the image recognition model obtaining a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracting text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracting face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracting human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region, comprises:
    binarizing the video frame image through the image recognition model to obtain a grayscale image;
    performing edge detection on the grayscale image through the Canny algorithm to obtain an edge image;
    performing contour tracking on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
    analyzing the contour image, and cropping out the license plate region containing the license plate, the face region containing the face, and the human body feature region related to human posture;
    extracting, by the image recognition model, the text features of the license plate region through the character segmentation algorithm and a character recognition method to obtain the license plate number result;
    extracting, by the image recognition model, the face features of the face region through the YOLO algorithm to obtain the face result;
    identifying, by the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region through the chain code curvature algorithm, and extracting the human posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
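The binarization, edge detection, and contour tracking steps correspond closely to standard OpenCV primitives. The sketch below is an approximation under that assumption; the area threshold and region filter are placeholders, the claimed tracking and node criteria are not reproduced, and the downstream character segmentation, YOLO, and chain code stages are omitted.

```python
import cv2

def locate_candidate_regions(frame):
    """Binarize, detect edges (Canny), trace contours, and crop candidate regions."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                 # grayscale
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU) # binarization
    edges = cv2.Canny(binary, 100, 200)                            # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)        # contour tracing
    regions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h > 400:                                            # drop tiny contours
            regions.append(frame[y:y + h, x:x + w])                # crop the region
    return regions
```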
  6. The video data processing method according to claim 1, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
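The relevance decision reduces to an any-match test over the per-frame recognition set. A minimal sketch, assuming the recognition set is a dict and that the two comparison helpers are hypothetical stand-ins for real face and body-feature matching:

```python
def faces_match(a, b) -> bool:
    """Hypothetical placeholder: a real system would compare face embeddings."""
    return a is not None and a == b

def bodies_match(a, b) -> bool:
    """Hypothetical placeholder: a real system would compare body feature maps."""
    return a is not None and a == b

def classify_frame(recognition_set: dict, target: dict) -> str:
    """Return 'relevant' if any recognized item matches any target item,
    else 'non-relevant', following the claim's two branches."""
    checks = [
        recognition_set.get("plate") == target["plate"],           # plate number match
        faces_match(recognition_set.get("face"), target["face"]),
        bodies_match(recognition_set.get("body"), target["body"]),
    ]
    return "relevant" if any(checks) else "non-relevant"
```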
  7. The video data processing method according to claim 1, wherein the extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with the unique identification code associated with the corresponding original video, comprises:
    obtaining the original video;
    obtaining, in chronological order, all the start-point frame images from the original video, and obtaining all the end-point frame images from the original video;
    if neither a start-point frame image nor an end-point frame image exists in the original video, determining that the original video has no matching segment;
    if only one start-point frame image exists in the original video and no end-point frame image exists, recording the start-point frame image as the matching segment of the original video;
    if only one end-point frame image exists in the original video and no start-point frame image exists, recording the end-point frame image as the matching segment of the original video;
    if an end-point frame image adjacent to a start-point frame image exists after that start-point frame image, obtaining the video segment between the start-point frame image and the end-point frame image, and recording the video segment between the start-point frame image and the end-point frame image as a matching segment of the original video;
    associating all the matching segments with the unique identification code associated with the original video.
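One way to picture the start-point/end-point pairing is over a chronological list of per-frame relevance flags. A sketch under that simplifying assumption (the claim operates on frame images; booleans stand in for their recognition results here):

```python
def extract_matching_segments(relevance):
    """Given a chronological list of booleans (True = relevant frame),
    return (start, end) index pairs for the matching segments.

    A start point is a relevant frame whose predecessor is non-relevant;
    an end point is a non-relevant frame whose predecessor is relevant,
    or the final frame of the video."""
    segments = []
    start = None
    for i, relevant in enumerate(relevance):
        prev = relevance[i - 1] if i > 0 else False
        if relevant and not prev:
            start = i                      # start-point frame
        elif not relevant and prev and start is not None:
            segments.append((start, i))    # adjacent end-point frame
            start = None
    if start is not None:                  # video ended while still relevant
        segments.append((start, len(relevance) - 1))
    return segments
```

For example, `extract_matching_segments([False, True, True, False, True])` yields `[(1, 3), (4, 4)]`: one closed segment and one segment that runs to the end of the video.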
  8. A video data processing apparatus, comprising:
    a receiving module, configured to receive a video extraction instruction and obtain a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    an extraction module, configured to extract at least two video frame images from each original video according to a preset extraction rule;
    an acquisition module, configured to input the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    a recognition module, configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    a segment extraction module, configured to extract a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and to associate the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    a merging module, configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and to associate the video synthesis segment with that same unique identification code;
    a determining module, configured to sort all the video synthesis segments according to a preset unique identification code ordering rule, and to determine all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
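The merging and determining modules group matching segments by unique identification code, splice each group chronologically, and sort the resulting composites. A rough sketch using the moviepy 1.x API, where the tuple layout and the lexicographic ordering rule are assumptions:

```python
from collections import defaultdict
from moviepy.editor import VideoFileClip, concatenate_videoclips

def merge_by_uid(matching_segments):
    """matching_segments: list of (uid, source_path, start_sec, end_sec) tuples,
    already in chronological order within each source video."""
    groups = defaultdict(list)
    for uid, path, start, end in matching_segments:
        groups[uid].append(VideoFileClip(path).subclip(start, end))
    composites = {}
    for uid, clips in groups.items():
        composites[uid] = concatenate_videoclips(clips)  # splice in time order
    # Sort by the preset unique-identification-code ordering rule
    # (assumed here to be plain lexicographic order of the uid).
    return [composites[uid] for uid in sorted(composites)]
```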
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    receiving a video extraction instruction, and obtaining a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
    sorting all the video synthesis segments according to a preset unique identification code ordering rule, and determining all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
  10. The computer device according to claim 9, wherein before the obtaining a target data set, the steps comprise:
    receiving a target collection instruction and obtaining a sample video; the sample video contains one license plate, one face, and one specific human posture;
    splitting the sample video into a number of sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining the one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model filters all the face region images, and obtaining the one face image selected by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model filters all the human body feature region images, and obtaining the one human body feature map selected by the human body feature extraction model;
    determining the license plate number, the face image, and the human body feature map as the target data set.
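As a data-flow illustration of this claim only: every model object below is a hypothetical stand-in, since the patent does not disclose the internals of the sample collection, license plate extraction, face extraction, or human body feature extraction models.

```python
import cv2

def build_target_data_set(sample_video_path, collector, plate_model,
                          face_model, body_model):
    """Split the sample video into images, then derive the three targets.

    collector, plate_model, face_model, body_model are hypothetical objects
    standing in for the four models named in the claim."""
    cap = cv2.VideoCapture(sample_video_path)
    samples = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        samples.append(frame)                              # sample images
    cap.release()
    plate_rois, face_rois, body_rois = collector.collect(samples)
    plate_number = plate_model.recognize(plate_rois)       # one plate number
    face_image = face_model.select(face_rois)              # one face image
    body_map = body_model.select(body_rois)                # one body feature map
    return {"plate": plate_number, "face": face_image, "body": body_map}
```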
  11. The computer device according to claim 9, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and the extraction parameter in the extraction rule;
    determining the starting video frame image of the original video as the start frame image;
    determining the ending video frame image of the original video as the end frame image;
    starting from the start frame image, extracting one video frame image at every interval of the extraction parameter, until the end frame image is reached;
    determining the video frame images extracted in this way as process frame images;
    the start frame image, the end frame image, and the process frame images together constituting all the video frame images of the original video.
  12. The computer device according to claim 9, wherein, before the inputting the extracted video frame images into an image recognition model, the steps comprise:
    obtaining a sample training image set; the sample training image set contains a number of sample training images, including all the license plate region images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, extracting text features, face features, and human posture features from the sample training images through the deep convolutional neural network model, and obtaining the training result output by the deep convolutional neural network model according to the extracted text features, face features, and human posture features;
    matching the training result against the sample training label to obtain a loss value;
    when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as the image recognition model;
    when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  13. The computer device according to claim 9, wherein the inputting the extracted video frame images into an image recognition model, the image recognition model obtaining a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracting text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracting face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracting human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region, comprises:
    binarizing the video frame image through the image recognition model to obtain a grayscale image;
    performing edge detection on the grayscale image through the Canny algorithm to obtain an edge image;
    performing contour tracking on the edge image according to a tracking criterion and a node criterion to obtain a contour image;
    analyzing the contour image, and cropping out the license plate region containing the license plate, the face region containing the face, and the human body feature region related to human posture;
    extracting, by the image recognition model, the text features of the license plate region through the character segmentation algorithm and a character recognition method to obtain the license plate number result;
    extracting, by the image recognition model, the face features of the face region through the YOLO algorithm to obtain the face result;
    identifying, by the image recognition model, the shoulder width scale, the neck scale, and the chest width scale of the human body feature region through the chain code curvature algorithm, and extracting the human posture features according to the shoulder width scale, the neck scale, and the chest width scale to obtain the human body feature result.
  14. The computer device according to claim 9, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
  15. The computer device according to claim 9, wherein the extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with the unique identification code associated with the corresponding original video, comprises:
    obtaining the original video;
    obtaining, in chronological order, all the start-point frame images from the original video, and obtaining all the end-point frame images from the original video;
    if neither a start-point frame image nor an end-point frame image exists in the original video, determining that the original video has no matching segment;
    if only one start-point frame image exists in the original video and no end-point frame image exists, recording the start-point frame image as the matching segment of the original video;
    if only one end-point frame image exists in the original video and no start-point frame image exists, recording the end-point frame image as the matching segment of the original video;
    if an end-point frame image adjacent to a start-point frame image exists after that start-point frame image, obtaining the video segment between the start-point frame image and the end-point frame image, and recording the video segment between the start-point frame image and the end-point frame image as a matching segment of the original video;
    associating all the matching segments with the unique identification code associated with the original video.
  16. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the following steps are implemented:
    receiving a video extraction instruction, and obtaining a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, each of which is associated with a unique identification code;
    extracting at least two video frame images from each original video according to a preset extraction rule;
    inputting the extracted video frame images into an image recognition model, wherein the image recognition model obtains a license plate region, a face region, and a human body feature region after performing image binarization, edge detection, and contour tracking on the video frame image, extracts text features from the license plate region through a character segmentation algorithm to obtain the license plate number result of the license plate region, extracts face features from the face region through the YOLO algorithm to obtain the face result of the face region, and extracts human posture features from the human body feature region through a chain code curvature algorithm to obtain the human body feature result of the human body feature region;
    determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result is either relevant or non-relevant to the license plate number, the face image, and the human body feature map;
    extracting a plurality of matching segments from each original video according to the recognition results corresponding to the video frame images in each original video, and associating the matching segments with their corresponding unique identification codes; a matching segment is a video segment between a start-point frame image and an end-point frame image in the original video; a start-point frame image is a video frame image whose recognition result is relevant and whose adjacent preceding video frame image has a non-relevant recognition result; an end-point frame image is a video frame image whose recognition result is non-relevant and whose adjacent preceding video frame image has a relevant recognition result, or the ending video frame image of the original video;
    splicing the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associating the video synthesis segment with that same unique identification code;
    sorting all the video synthesis segments according to a preset unique identification code ordering rule, and determining all the sorted video synthesis segments as the target video, in the video set to be processed, that is relevant to the target data set.
  17. The computer-readable storage medium according to claim 16, wherein before the obtaining a target data set, the steps comprise:
    receiving a target collection instruction and obtaining a sample video; the sample video contains one license plate, one face, and one specific human posture;
    splitting the sample video into a number of sample images;
    inputting all the sample images into a sample collection model, wherein the sample collection model performs image collection on all the sample images, crops out the license plate region images containing the license plate from all the sample images, crops out the face region images containing the face from all the sample images, and extracts the human body feature region images containing the face and the specific human posture from all the sample images;
    inputting the license plate region images into a license plate extraction model, wherein the license plate extraction model performs license plate number recognition on the license plate region images, and obtaining the one license plate number output by the license plate extraction model;
    inputting all the face region images into a face extraction model, wherein the face extraction model filters all the face region images, and obtaining the one face image selected by the face extraction model;
    inputting all the human body feature region images into a human body feature extraction model, wherein the human body feature extraction model filters all the human body feature region images, and obtaining the one human body feature map selected by the human body feature extraction model;
    determining the license plate number, the face image, and the human body feature map as the target data set.
  18. The computer-readable storage medium according to claim 16, wherein the extracting at least two video frame images from each original video according to a preset extraction rule comprises:
    obtaining the original video and the extraction parameter in the extraction rule;
    determining the starting video frame image of the original video as the start frame image;
    determining the ending video frame image of the original video as the end frame image;
    starting from the start frame image, extracting one video frame image at every interval of the extraction parameter, until the end frame image is reached;
    determining the video frame images extracted in this way as process frame images;
    the start frame image, the end frame image, and the process frame images together constituting all the video frame images of the original video.
  19. The computer-readable storage medium according to claim 16, wherein, before the inputting the extracted video frame images into an image recognition model, the steps comprise:
    obtaining a sample training image set; the sample training image set contains a number of sample training images, including all the license plate region images, all the face sample images, all the human body feature region images, and a number of negative sample images; each sample training image is associated with a sample training label;
    inputting the sample training images into a deep convolutional neural network model containing initial parameters, extracting text features, face features, and human posture features from the sample training images through the deep convolutional neural network model, and obtaining the training result output by the deep convolutional neural network model according to the extracted text features, face features, and human posture features;
    matching the training result against the sample training label to obtain a loss value;
    when the loss value reaches a preset convergence condition, recording the converged deep convolutional neural network model as the image recognition model;
    when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and then recording the converged deep convolutional neural network model as the image recognition model.
  20. The computer-readable storage medium according to claim 16, wherein the determining the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result comprises:
    determining the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
    if any item in the recognition set matches any one of the license plate number, the face image, and the human body feature map, determining that the recognition result corresponding to the video frame image is relevant to the license plate number, the face image, and the human body feature map;
    if no item in the recognition set matches the license plate number, the face image, or the human body feature map, determining that the recognition result corresponding to the video frame image is non-relevant to the license plate number, the face image, and the human body feature map.
PCT/CN2020/099082 2020-04-24 2020-06-30 Video data processing method and apparatus, and computer device and storage medium WO2021212659A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332455.9A CN111626123A (en) 2020-04-24 2020-04-24 Video data processing method and device, computer equipment and storage medium
CN202010332455.9 2020-04-24

Publications (1)

Publication Number Publication Date
WO2021212659A1 (en)

Family

ID=72271775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099082 WO2021212659A1 (en) 2020-04-24 2020-06-30 Video data processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111626123A (en)
WO (1) WO2021212659A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257674B (en) * 2020-11-17 2022-05-27 珠海大横琴科技发展有限公司 Visual data processing method and device
CN112465691A (en) * 2020-11-25 2021-03-09 北京旷视科技有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN112650876A (en) * 2020-12-30 2021-04-13 北京嘀嘀无限科技发展有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product
CN113435330A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Micro-expression identification method, device, equipment and storage medium based on video
CN114286171B (en) * 2021-08-19 2023-04-07 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114004757B (en) * 2021-10-14 2024-04-05 大族激光科技产业集团股份有限公司 Method, system, device and storage medium for removing interference in industrial image
CN113688810B (en) * 2021-10-26 2022-03-08 深圳市安软慧视科技有限公司 Target capturing method and system of edge device and related device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300556A1 (en) * 2017-04-17 2018-10-18 Intel Corporation Person tracking and privacy and acceleration of data using autonomous machines
CN109672936A (en) * 2018-12-26 2019-04-23 上海众源网络有限公司 A kind of the determination method, apparatus and electronic equipment of video evaluations collection
CN110287778A (en) * 2019-05-15 2019-09-27 北京旷视科技有限公司 A kind of processing method of image, device, terminal and storage medium
CN110191324A (en) * 2019-06-28 2019-08-30 Oppo广东移动通信有限公司 Image processing method, device, server and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363660A (en) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
CN114363660B (en) * 2021-12-24 2023-09-08 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
CN115019008A (en) * 2022-05-30 2022-09-06 深圳市鸿普森科技股份有限公司 Intelligent 3D model design analysis service management platform
CN115037987A (en) * 2022-06-07 2022-09-09 厦门蝉羽网络科技有限公司 Method and system for watching back live video with goods
CN115858854A (en) * 2023-02-28 2023-03-28 北京奇树有鱼文化传媒有限公司 Video data sorting method and device, electronic equipment and storage medium
CN115858854B (en) * 2023-02-28 2023-05-26 北京奇树有鱼文化传媒有限公司 Video data sorting method and device, electronic equipment and storage medium
CN117437505A (en) * 2023-12-18 2024-01-23 杭州任性智能科技有限公司 Training data set generation method and system based on video

Also Published As

Publication number Publication date
CN111626123A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2021212659A1 (en) Video data processing method and apparatus, and computer device and storage medium
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
KR102554724B1 (en) Method for identifying an object in an image and mobile device for practicing the method
Tapia et al. Gender classification from iris images using fusion of uniform local binary patterns
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN111428604B (en) Facial mask recognition method, device, equipment and storage medium
CN111950424B (en) Video data processing method and device, computer and readable storage medium
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
WO2019033525A1 (en) Au feature recognition method, device and storage medium
CN105488468A (en) Method and device for positioning target area
CN112016464A (en) Method and device for detecting face shielding, electronic equipment and storage medium
CN106295547A (en) A kind of image comparison method and image comparison device
CN111274926A (en) Image data screening method and device, computer equipment and storage medium
US20230060211A1 (en) System and Method for Tracking Moving Objects by Video Data
CN111783681A (en) Large-scale face library recognition method, system, computer equipment and storage medium
CN110580507B (en) City texture classification and identification method
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN115862113A (en) Stranger abnormity identification method, device, equipment and storage medium
CN111539320A (en) Multi-view gait recognition method and system based on mutual learning network strategy
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
JP2013218605A (en) Image recognition device, image recognition method, and program
CN113762031A (en) Image identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1