WO2021212659A1 - Method and apparatus for processing video data, computer device, and storage medium

Info

Publication number
WO2021212659A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
image
frame image
license plate
human body
Prior art date
Application number
PCT/CN2020/099082
Other languages
English (en)
Chinese (zh)
Inventor
黄小弟
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021212659A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Definitions

  • This application relates to the field of artificial intelligence data processing, and in particular to a video data processing method, device, computer equipment, and storage medium.
  • This application provides a video data processing method, apparatus, computer device, and storage medium, which can automatically, quickly, and accurately extract video clips related to the target data set from the video set to be processed.
  • This application can be applied to the field of smart transportation, thereby promoting the construction of smart cities, improving recognition efficiency and accuracy, and greatly reducing input costs.
  • a video data processing method including:
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the video set to be processed includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • at least two video frame images are extracted from each of the original videos according to a preset extraction rule;
  • the extracted video frame images are input into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through a chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • according to the license plate number result, the face result, and the human body feature result, the recognition result corresponding to the video frame image is determined; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map;
  • according to the recognition result corresponding to the video frame image in each of the original videos, multiple matching segments in each of the original videos are extracted, and each matching segment is associated with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video;
  • the matching segments associated with the same unique identification code are spliced in chronological order to obtain a video synthesis segment, and the video synthesis segment is associated with that unique identification code;
  • all the video synthesis segments are sorted according to a preset unique identification code order rule, and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • a video data processing device including:
  • the receiving module is used to receive a video extraction instruction to obtain a target data set and a to-be-processed video set;
  • the target data set includes a license plate number, a face image, and a human body feature map; the to-be-processed video set includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • An extraction module configured to extract at least two video frame images from each of the original videos according to a preset extraction rule
  • the acquisition module is used to input the extracted video frame images into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through a chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • the recognition module is used to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map;
  • the extraction module is configured to extract multiple matching segments in each original video according to the recognition result corresponding to the video frame images in each original video, and to associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result; the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video;
  • a merging module configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • the determining module is configured to sort all the video synthesis segments according to a preset unique identification code order rule, and determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, and the processor implements the following steps when executing the computer program:
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the video set to be processed includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • at least two video frame images are extracted from each of the original videos according to a preset extraction rule;
  • the extracted video frame images are input into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through a chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • according to the license plate number result, the face result, and the human body feature result, the recognition result corresponding to the video frame image is determined; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map;
  • according to the recognition result corresponding to the video frame image in each of the original videos, multiple matching segments in each of the original videos are extracted, and each matching segment is associated with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video;
  • the matching segments associated with the same unique identification code are spliced in chronological order to obtain a video synthesis segment, and the video synthesis segment is associated with that unique identification code;
  • all the video synthesis segments are sorted according to a preset unique identification code order rule, and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • a computer-readable storage medium having a computer program stored on the computer-readable storage medium, wherein, when the computer program is executed by a processor, the following steps are implemented:
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the video set to be processed includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • at least two video frame images are extracted from each of the original videos according to a preset extraction rule;
  • the extracted video frame images are input into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through a chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • according to the license plate number result, the face result, and the human body feature result, the recognition result corresponding to the video frame image is determined; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map;
  • according to the recognition result corresponding to the video frame image in each of the original videos, multiple matching segments in each of the original videos are extracted, and each matching segment is associated with the corresponding unique identification code;
  • the matching segment is the video segment between the start frame image and the end frame image in the original video;
  • the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result;
  • the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video;
  • the matching segments associated with the same unique identification code are spliced in chronological order to obtain a video synthesis segment, and the video synthesis segment is associated with that unique identification code;
  • all the video synthesis segments are sorted according to a preset unique identification code order rule, and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • This application can automatically, quickly, and accurately extract video clips related to the target data set from the video set to be processed, which improves recognition efficiency and accuracy and greatly reduces input costs.
  • FIG. 1 is a schematic diagram of an application environment of a video data processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a video data processing method in an embodiment of the present application
  • Fig. 3 is a flowchart of step S10 of a video data processing method in an embodiment of the present application
  • FIG. 4 is a flowchart of step S20 of the video data processing method in an embodiment of the present application.
  • FIG. 5 is a flowchart of step S30 of the video data processing method in an embodiment of the present application.
  • Fig. 6 is a flowchart of step S30 of a video data processing method in another embodiment of the present application.
  • FIG. 7 is a flowchart of step S40 of the video data processing method in an embodiment of the present application.
  • Fig. 8 is a functional block diagram of a video data processing device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the video data processing method provided by this application can be applied in the application environment as shown in Fig. 1, in which the client (computer equipment) communicates with the server through the network.
  • the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a video data processing method is provided, and the technical solution mainly includes the following steps S10-S70:
  • S10 Receive a video extraction instruction, and obtain a target data set and a video set to be processed;
  • the target data set includes a license plate number, a face image, and a human body feature map;
  • the video set to be processed includes a number of original videos, each of which is associated with a unique identification code.
  • the video extraction instruction is an instruction triggered after selecting the target data set and the to-be-processed video set
  • the target data set is a data set related to the searched target
  • the target person is the person who needs to be searched
  • the target data set includes a license plate number, a face image and a human body feature map
  • the license plate number is the unique license plate number of the vehicle driven by the target
  • the face image is an image of the target person's face
  • the human body feature map is an image of a specific human posture of the target person, for example, the human body feature map is the upper body image of the target person driving a vehicle
  • the to-be-processed video set is the set of videos in which videos related to the target person are to be found
  • the set of videos to be processed includes at least one of the original videos.
  • the original video may be an unprocessed video clip or a clipped video clip; for example, the original video is the surveillance video of a certain intersection on a certain day, or the surveillance video of a certain street from 19:00 to 21:00 on a certain day; each of the original videos is associated with a unique identification code, the unique identification code is the unique identification code assigned to the original video, and the unique identification code can be set according to requirements.
  • before step S10, that is, before the target data set is acquired, the method includes:
  • S101 Receive a target collection instruction, and obtain a sample video; the sample video includes a license plate, a human face, and a specific human posture.
  • the target collection instruction is an instruction that is triggered when the target person needs to collect information
  • the sample video is a short video clip related to the target person
  • the content of the sample video contains a license plate, the target person's face, and the target person's specific body posture, which means that the license plate, the face, and the specific body posture each appear at least once in the sample video.
  • S102 Split the sample video at every interval of a splitting parameter to obtain sample images; the sample image is an image selected from the sample video. The splitting parameter can be set according to requirements; for example, the splitting parameter can be set to 30 frames or 25 frames to split the sample video into multiple sample images (see the sketch below).
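  • As an illustration of this splitting step, here is a minimal sketch in Python using OpenCV; the function name and the 25-frame default are assumptions for the example, not values fixed by the application.

```python
import cv2

def split_sample_video(video_path, splitting_parameter=25):
    """Take one sample image every `splitting_parameter` frames (hypothetical helper)."""
    capture = cv2.VideoCapture(video_path)
    sample_images, frame_index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the sample video
            break
        if frame_index % splitting_parameter == 0:
            sample_images.append(frame)
        frame_index += 1
    capture.release()
    return sample_images
```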
  • S103 Input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepts the license plate region images containing the license plate from all the sample images, intercepts the face region images containing the face from all the sample images, and intercepts the human body feature region images containing the face and the specific human posture from all the sample images.
  • the sample collection model refers to a neural network model that has been trained to identify the license plate area, face area, and human feature area in the image.
  • the image collection is to identify and cut out the license plate, the face, and the specific human posture contained in the sample image. The network structure of the sample collection model can be set according to requirements; for example, the network structure of the sample collection model can be the network structure of the Inception series or of the VGG series.
  • S104 Input the license plate area image into a license plate extraction model, and the license plate extraction model performs license plate number recognition on the license plate area image, and obtains a license plate number output by the license plate extraction model.
  • the license plate extraction model refers to a neural network model that has been trained to identify the license plate number in the image.
  • the network structure of the license plate extraction model can be set according to requirements; for example, it can be obtained by transfer learning from GoogLeNet.
  • the license plate extraction model can identify the Chinese characters, numbers, and letters in the license plate area image; the license plate number is the unique identification code of the vehicle driven by the target person and consists of Chinese characters, numbers, and letters.
  • S105 Input all the face region images into a face extraction model; the face extraction model screens all the face region images, and the face image selected by the face extraction model is obtained.
  • the face region image that contains the eyes, eyebrows, mouth, nose, ears, and face profile and has the highest definition is screened out, and the screening method can be set according to requirements.
  • for example, the screening method is: obtain the average of the pixel values of the pixels at the same position across all the face region images, and record this average as the pixel average corresponding to that position; for each face region image, take the absolute value of the difference between each pixel and its corresponding pixel average to obtain the absolute pixel difference, and sum the absolute pixel differences of all pixels in the image to obtain the image difference value; select the face region image with the smallest image difference value among all the face region images and determine it as the face image, that is, the screened face region image is determined as the face image (see the sketch below).
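  • A compact reading of this screening rule in Python/NumPy; it assumes the face region images have already been resized to a common shape, which the application does not state explicitly.

```python
import numpy as np

def select_face_image(face_regions):
    """Pick the face region image closest to the per-pixel average (the screening rule above)."""
    stack = np.stack([img.astype(np.float64) for img in face_regions])
    pixel_average = stack.mean(axis=0)              # pixel average per position
    # image difference value = sum of absolute pixel differences against the average
    diffs = [np.abs(img - pixel_average).sum() for img in stack]
    return face_regions[int(np.argmin(diffs))]      # smallest difference wins
```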
  • S106 Input all the human body feature region images into a human body feature extraction model; the human body feature extraction model screens all the human body feature region images, and the human body feature map selected by the human body feature extraction model is obtained.
  • the human body feature region image that contains the face and the specific human posture and has the highest definition is screened from all the human body feature region images through the human body feature extraction model, and the screening method can be set according to requirements.
  • for example, the screening method is: for each pixel in the human body feature region image, determine the sum of the absolute differences between the pixel value of that pixel and the pixel values of the surrounding pixels adjacent to it as the local pixel difference; obtain the sum of all the local pixel differences in each human body feature region image; and select the human body feature region image with the smallest sum of local pixel differences among all the human body feature region images as the human body feature map, that is, the screened human body feature region image is determined as the human body feature map (see the sketch below).
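  • A sketch of this local-pixel-difference screening, again in NumPy; using the four direct neighbours is an assumption, since the application only says "surrounding pixels adjacent to the pixel".

```python
import numpy as np

def select_body_feature_map(region_images):
    """Pick the human body feature region image with the smallest sum of local pixel differences."""
    def local_difference_sum(img):
        f = img.astype(np.float64)
        total = 0.0
        # compare each pixel with its four direct neighbours (assumed neighbourhood)
        total += np.abs(f[1:, :] - f[:-1, :]).sum()   # vertical neighbours
        total += np.abs(f[:, 1:] - f[:, :-1]).sum()   # horizontal neighbours
        return total
    scores = [local_difference_sum(img) for img in region_images]
    return region_images[int(np.argmin(scores))]
```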
  • S107 Determine the license plate number, the face image, and the human body feature map as the target data set.
  • the license plate number, the face image, and the human body feature map are determined and recorded as the target data set; that is, the target data set is a collection of information about the license plate number, the face, and the specific human posture related to the target person.
  • the sample video is collected through the sample collection model, and the license plate number, the face image, and the human body feature map are obtained from the collected images through the license plate extraction model, the face extraction model, and the human body feature extraction model; the license plate number is recognized automatically from the sample video, and the target person's face and human body feature maps are intercepted automatically, which improves the accuracy of recognition and the reliability of interception, reduces labor costs, and improves efficiency.
  • S20 Extract at least two video frame images from each of the original videos according to a preset extraction rule.
  • the extraction rules can be set according to requirements.
  • for example, the extraction rules can be set to extract at least two video frame images evenly from the original video, or to extract one video frame image at every interval of a preset frame parameter in the original video; the video frame image is an image corresponding to a frame in the original video.
  • the step S20 that is, extracting at least two video frame images from each of the original videos according to a preset extraction rule, includes:
  • the extraction rule can be set according to requirements, and the purpose of the extraction rule is to extract at least two video frame images from each of the original videos.
  • S201 Acquire the original video and the extraction parameters in the extraction rule. For example, the extraction rule can be to obtain the start video frame image and the end video frame image of the original video together with the video frame image at the bisecting center position of the original video; the extraction rule may also be to obtain the start video frame image and the end video frame image of the original video and to select one video frame image at every interval of a preset extraction parameter in the middle part of the original video. The extraction parameter can be set according to requirements; for example, the extraction parameter is 15 frames, 25 frames, 30 frames, and so on.
  • S202 Determine a starting video frame image in the original video as a starting frame image.
  • the video frame image corresponding to the first frame in the original video is determined as the start frame image.
  • S203 Determine the ending video frame image in the original video as the ending frame image.
  • the video frame image corresponding to the last frame in the original video is determined as the end frame image.
  • S204 Extract one video frame image from the start frame image at every interval of the extraction parameter until the end frame image. The extraction parameter can be set according to requirements; for example, the extraction parameter can be set to 25 frames (about 1 second), that is, starting from the start frame image, one video frame image is extracted every 25 frames, stopping when the remaining interval to the end frame image is less than 25 frames.
  • S205 Determine the extracted video frame images, other than the start frame image and the end frame image, as process frame images.
  • all the video frame images extracted from the original video consist of the start frame image, the end frame image, and the process frame images; that is, the start frame image, the end frame image, and all the process frame images are determined as all the video frame images in the original video (see the sketch below).
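  • The frame-index arithmetic behind steps S202 to S205, as a small Python sketch; the function and its return format are illustrative assumptions.

```python
def select_frame_indices(total_frames, extraction_parameter=25):
    """Return the indices of the start frame, process frames, and end frame (steps S202-S205)."""
    start, end = 0, total_frames - 1                                        # S202, S203
    process = list(range(extraction_parameter, end, extraction_parameter))  # S204
    return [start] + process + [end]                                        # S205: all extracted frames

# e.g. a 100-frame original video with a 25-frame extraction parameter
print(select_frame_indices(100))  # [0, 25, 50, 75, 99]
```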
  • the image recognition model is a trained neural network model that includes the image binarization processing, the edge detection processing, the contour tracking processing, the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm; the image recognition model can use the image binarization processing, the edge detection processing, and the contour tracking processing to identify the license plate area, the face area, and the human body feature area in the video frame image; the license plate number result of the license plate area can be recognized by the character segmentation algorithm; the face result of the face area can be recognized by the YOLO algorithm; and the human body feature result of the human body feature area can be identified by the chain code curvature algorithm.
  • the YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN (Convolutional Neural Networks, convolutional neural network) operation to directly predict the categories and regions of different targets.
  • the method before step S30, that is, before inputting the extracted video frame image into an image recognition model, the method includes:
  • S301 Acquire a sample training image set; the sample training image set includes several sample training images, and the sample training images include all the license plate region images, all the face region images, all the human body feature region images, and a number of negative sample images; each of the sample training images is associated with a sample training label.
  • the sample training image set is a collection of the sample training images
  • the sample training images include all the license plate area images, all the face area images, all the human body feature area images, and at least one of the negative sample images; the negative sample images are collected images that do not contain the license plate, the face, or the specific human posture; each of the sample training images is associated with one of the sample training labels, and the sample training label can be set according to requirements.
  • for example, the sample training label can be set to include relevant and non-relevant, or set to include relevant license plate, relevant face, relevant human body feature, and non-relevant, and so on.
  • the text features are features related to the color, shape, and text of the license plate; the facial features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face profile; the human body posture features are features related to the target person's head, face, arms, shoulders, and so on.
  • the training result represents whether the sample training image contains the license plate number, the face image, and the human feature map.
  • the deep convolutional neural network model extracts the text features from all the license plate region images and outputs the training results corresponding to the license plate region images according to the extracted text features; the deep convolutional neural network model extracts the facial features from all the face region images and outputs the training results corresponding to the face region images according to the extracted facial features; the deep convolutional neural network model extracts the human body posture features from all the human body feature region images and outputs the training results corresponding to the human body feature region images according to the extracted human body posture features; and the deep convolutional neural network model extracts the text features, the facial features, and the human body posture features from all the negative sample images and outputs the training results corresponding to the negative sample images according to the extracted features.
  • the loss function can be set according to requirements; it can be a multi-class cross-entropy loss function or a regression loss function, and the loss value is calculated by the loss function.
  • the preset convergence condition may be the condition that the loss value is very small and no longer drops after 7000 calculations, that is, when the loss value is very small and does not drop after 7000 calculations, training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model; the preset convergence condition can also be the condition that the loss value is less than a set threshold, that is, when the loss value is less than the set threshold, training is stopped and the converged deep convolutional neural network model is recorded as the image recognition model.
  • by iteratively updating the initial parameters of the deep convolutional neural network model, the model moves ever closer to accurate recognition results, so that the accuracy of the recognition results becomes higher and higher (a minimal training-loop sketch follows).
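  • To make the train-until-convergence logic concrete, here is a minimal PyTorch-style sketch; the model, the labels, the learning rate, and the 0.01 threshold are assumptions for illustration, not values from the application.

```python
import torch
import torch.nn as nn

def train_until_convergence(model, loader, threshold=0.01, max_steps=7000):
    """Iterate parameter updates until the loss value reaches the convergence condition."""
    criterion = nn.CrossEntropyLoss()   # the multi-class cross-entropy option
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for step, (images, labels) in enumerate(loader):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)   # match training result to sample label
        loss.backward()
        optimizer.step()                          # iteratively update the parameters
        if loss.item() < threshold or step >= max_steps:
            break                                 # convergence condition reached
    return model                                  # recorded as the image recognition model
```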
  • the license plate area images, the face region images, and the human body feature area images output by inputting the sample video into the sample collection model are used as sample training images, and the image recognition model is obtained by training on these sample training images; the image recognition model is therefore more sensitive to the data of the target data set, which makes it more targeted, raises the recognition accuracy rate, and improves recognition reliability.
  • the step S30, that is, inputting the extracted video frame image into an image recognition model, where the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through the chain code curvature algorithm to obtain the human body feature result of the human body feature area, includes:
  • S306 Binarize the video frame image by using the image recognition model to obtain a grayscale image.
  • the gray-scale value of each pixel in the video frame image is calculated through the gray-scale algorithm; the gray-scale value ranges from 0 to 255, which constitutes the binarization process. The gray values are arranged according to the positions of the corresponding pixels to obtain the grayscale image (see the sketch below).
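  • A one-call version of this step using OpenCV; note that the application uses "binarization" loosely for the 0-255 grayscale conversion, so the sketch follows that reading.

```python
import cv2

def to_grayscale(video_frame):
    """Step S306: map each pixel of the video frame to a 0-255 gray value."""
    return cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)
```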
  • S307 Perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image.
  • the Canny algorithm is an edge extraction algorithm; because it uses a variational method, it is less susceptible to noise interference and can detect the true edges of weak edges.
  • the Canny algorithm first convolves the image with a two-dimensional Gaussian filter function to perform noise reduction, so that the noise of each pixel in the image no longer interferes; secondly, for each point on the edge contour, it computes the first-order partial derivatives in two directions and the gradient, and calculates the gradient direction of the point; then, according to the 0-degree, 45-degree, 90-degree, and 135-degree directions corresponding to the gradient direction, it determines the adjacent points; finally, it calculates the difference between the gray value of the point and that of the adjacent points, determines the edge according to the difference, and obtains the final edge image (see the sketch below).
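  • The same pipeline with OpenCV's built-in Gaussian blur and Canny detector; the kernel size and the 50/150 thresholds are conventional defaults chosen for the example, not values given in the application.

```python
import cv2

def detect_edges(gray_image):
    """Step S307: Gaussian noise reduction followed by Canny edge extraction."""
    denoised = cv2.GaussianBlur(gray_image, (5, 5), 1.4)   # 2-D Gaussian filtering
    return cv2.Canny(denoised, 50, 150)                    # hysteresis thresholds (assumed)
```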
  • S308 Perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image.
  • the contour tracking process starts from a starting point, searches along the edge near the starting point for points that meet the tracking criterion, and follows those points until points that meet the node criterion are found, so that the contour in the edge image is clearly marked and the contour image is obtained (see the sketch below).
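  • OpenCV's border-following contour extraction can stand in for the tracking and node criteria, which the application does not define further; treating the two as equivalent is an assumption of this sketch.

```python
import cv2
import numpy as np

def trace_contours(edge_image):
    """Step S308: trace the contours in the edge image and draw them into a contour image."""
    contours, _ = cv2.findContours(edge_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour_image = np.zeros_like(edge_image)
    cv2.drawContours(contour_image, contours, -1, 255, 1)  # mark every traced contour
    return contour_image, contours
```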
  • S309 Analyze the contour image, and cut out the license plate area including the license plate, the face area including the human face, and the human body feature area including the human body posture.
  • that is, the contour images containing a license plate, a human face, or a human posture are analyzed, and the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the human posture are determined from the analysis result and intercepted.
  • the image recognition model extracts the text features of the license plate area through the character segmentation algorithm and the character recognition method to obtain the license plate number result.
  • the character segmentation algorithm segments the license plate area image into images of individual characters, and the character recognition method recognizes letters, Chinese characters, and numbers from the segmented single-character images; that is, the image recognition model extracts the text features of the license plate area through the character segmentation algorithm and the character recognition method, and the license plate number result is determined according to the text features.
  • the license plate number result indicates whether the license plate number recognized from the license plate area is the same as the target person's license plate number, and the license plate number result can be set according to requirements; for example, the license plate number result may include the same as the target person's license plate number and different from the target person's license plate number (a segmentation sketch follows).
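  • One common way to realize the character segmentation step is a vertical projection of a binarized plate image; this concrete method is an assumption, since the application names the algorithm without specifying it.

```python
import cv2

def segment_characters(plate_image, min_width=4):
    """Split a license plate image into single-character images via vertical projection."""
    gray = cv2.cvtColor(plate_image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    column_ink = (binary > 0).sum(axis=0)        # foreground pixels per column
    characters, start = [], None
    for x, ink in enumerate(column_ink):
        if ink > 0 and start is None:
            start = x                            # a character column run begins
        elif ink == 0 and start is not None:
            if x - start >= min_width:
                characters.append(binary[:, start:x])
            start = None
    if start is not None and binary.shape[1] - start >= min_width:
        characters.append(binary[:, start:])
    return characters                            # each entry feeds character recognition
```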
  • the YOLO (You Only Look Once) algorithm is an algorithm that uses a CNN operation to directly predict the categories and regions of different targets, and the image recognition model uses the YOLO algorithm to extract the face regions.
  • the facial features are features related to the target person's eyes, eyebrows, mouth, nose, ears, and face profile; the face result is determined according to the facial features; the face result represents whether the face area contains the target person's face, and the face result can be set according to requirements; for example, the face result includes is the target person's face and is not the target person's face (see the sketch below).
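  • As a stand-in for the YOLO step, here is a sketch using the open-source ultralytics package; the package choice, the yolov8n.pt weights, and the downstream matching step are assumptions, since the application only names the YOLO algorithm.

```python
from ultralytics import YOLO

# load a generic pretrained YOLO detector (assumed weights, for illustration only)
model = YOLO("yolov8n.pt")

def detect_candidate_regions(video_frame):
    """Predict object categories and regions in one pass, as the YOLO algorithm does."""
    results = model(video_frame)
    boxes = results[0].boxes.xyxy.tolist()   # candidate regions to match against the face image
    return boxes
```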
  • the image recognition model uses the chain code curvature algorithm to recognize the shoulder width scale, the neck scale, and the chest width scale of the human body feature area, extracts the human body posture features according to the shoulder width scale, the neck scale, and the chest width scale, and obtains the human body feature result.
  • the chain code curvature algorithm is an algorithm that calculates the curvature value corresponding to the four-connected chain code or the eight-connected chain code of the points on the contour edge of the human body feature area; the points of each contour edge are recognized by the image recognition model, the shoulder width scale, the neck scale, and the chest width scale are recognized, the human body posture features are extracted according to the ratios between the shoulder width scale, the neck scale, and the chest width scale, and the human body feature result is determined according to the vector values corresponding to the human body posture features.
  • the human body feature result indicates whether the human body feature area contains the target person's human body posture, and the human body feature result can be set according to requirements; for example, the human body feature result includes related to the target person's human body posture and not related to the target person's human body posture (a chain code sketch follows).
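  • To illustrate the chain code part of this step, here is a small sketch that builds an 8-connected Freeman chain code along a contour and derives a crude curvature value from direction changes; it assumes the contour points are ordered and 8-connected, and the curvature formula and the link to the body scales are simplifying assumptions.

```python
import numpy as np

# 8-connected Freeman chain code directions (dx, dy)
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def chain_code(contour_points):
    """Encode an ordered contour as an 8-connected chain code."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour_points, contour_points[1:]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes

def curvature_values(codes):
    """Curvature proxy: change of chain-code direction between successive links."""
    diffs = np.diff(codes)
    return ((diffs + 4) % 8) - 4   # wrap direction changes into [-4, 3]
```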
  • This application obtains the license plate area, the face area, and the human body feature area after binarization processing of the video frame image, Canny-based edge detection processing, and contour tracking processing; it uses the character segmentation algorithm to recognize the license plate number result of the license plate area, the YOLO algorithm to recognize the face result of the face area, and the chain code curvature algorithm to recognize the human body feature result of the human body feature area, realizing rapid and accurate recognition of the license plate number result, the face result, and the human body feature result and improving accuracy and reliability.
  • S40 Determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map.
  • if the license plate number result is different from the target person's license plate number, the face result is not related to the target person's face, and the human body feature result is not related to the target person's body posture, it is determined that the recognition result corresponding to the video frame image is not related to the license plate number, the face image, and the human body feature map; if the license plate number result is the same as the target person's license plate number, or the face result is related to the target person's face, or the human body feature result is related to the target person's body posture, it is determined that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map. In this way, a video frame image in which the target person drives a vehicle with the face partially occluded can still be determined as a relevant image, so videos related to the target person are identified more accurately.
  • the determining of the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result includes:
  • S401 Determine the license plate number result, the face result, and the human body feature result of the video frame image as a recognition set of the video frame image.
  • the recognition set is a set that includes a plurality of the license plate number results, a plurality of the face results, and a plurality of the human body feature results.
  • S402 If any result in the recognition set matches the license plate number, the face image, or the human body feature map, it is determined that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map.
  • S403 If no result in the recognition set matches the license plate number, the face image, or the human body feature map, it is determined that the recognition result corresponding to the video frame image is not related to the license plate number, the face image, and the human body feature map.
  • through the image recognition model, the present application can quickly and accurately identify whether the video frame image contains any one of the license plate number, the face image, and the human body feature map and obtain the relevant or non-relevant recognition result, which improves the efficiency and reliability of recognition (see the sketch below).
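  • The relevance decision of S401 to S403 reduces to a logical OR over the three results; a tiny sketch, with the three boolean inputs standing in for the match tests the application describes.

```python
def recognition_result(plate_matches, face_matches, body_matches):
    """S401-S403: the frame is 'relevant' if any result in the recognition set matches."""
    recognition_set = (plate_matches, face_matches, body_matches)  # S401
    return "relevant" if any(recognition_set) else "non-relevant"  # S402 / S403

# e.g. occluded face but matching license plate -> still a relevant frame
print(recognition_result(True, False, False))  # relevant
```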
  • S50 According to the recognition result corresponding to the video frame images in each of the original videos, extract multiple matching segments in each of the original videos, and associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result; the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video.
  • that is, according to which video frame images in each of the original videos have a relevant recognition result and which have a non-relevant recognition result, multiple matching segments are extracted from each of the original videos, and each matching segment is associated with the corresponding unique identification code.
  • the unique identification code can be set according to requirements; for example, the unique identification code may be a time value on the time axis accurate to the second level, or a combination code that combines the unique identification code of the photographing device with the time value.
  • the matching segment is a video segment that includes any one of the license plate number, the face image, and the human body feature map.
  • the extracting of the multiple matching segments in each original video according to the recognition result corresponding to the video frame images in each original video, and the associating of the matching segments with the unique identification code associated with the corresponding original video, include:
  • S501 Acquire the original video.
  • S502 Acquire all the start point frame images from the original video, and acquire all the end point frame images from the original video in chronological order.
  • the starting point frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result; the ending point frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video; all the starting point frame images and all the ending point frame images are obtained from the original video in chronological order.
  • if the starting point frame image and the ending point frame image do not exist in the original video, it is determined that there is no matching video segment in the original video, that is, the original video is marked as a non-matching segment.
  • otherwise, the video segment between each starting point frame image and the ending point frame image that follows it is cut out, and that video segment is marked as a matching segment of the original video.
  • the matching segment is associated with the unique identification code associated with the original video.
  • the unique identification code can be set according to requirements; for example, the unique identification code can be a time value accurate to the second level.
  • an interception method for obtaining matching segments is thus provided, which can accurately intercept the required video segments, improving efficiency and reducing costs (see the sketch below).
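  • The start-point/end-point scan of S502 to S504 over per-frame relevance flags, as a Python sketch; representing the recognition results as a boolean list is an assumption of the example.

```python
def extract_matching_segments(relevance):
    """Scan per-frame recognition results and return (start, end) index pairs (S502-S504)."""
    segments, start = [], None
    for i, relevant in enumerate(relevance):
        previous = relevance[i - 1] if i > 0 else False
        if relevant and not previous:
            start = i                      # starting point frame image
        elif not relevant and previous and start is not None:
            segments.append((start, i))    # ending point frame image
            start = None
    if start is not None:                  # relevant run reaches the last frame
        segments.append((start, len(relevance) - 1))
    return segments or None                # None marks a non-matching original video

# e.g. frames 2-4 and 7-9 relevant
print(extract_matching_segments([0, 0, 1, 1, 1, 0, 0, 1, 1, 1]))  # [(2, 5), (7, 9)]
```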
  • S60 Splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with that unique identification code.
  • S70 Sort all the video synthesis fragments according to a preset unique identification code order rule, and determine all the video synthesis fragments after sorting as target videos in the to-be-processed video set that are related to the target data set .
  • the unique identification code order rule is a sorting rule preset for the unique identification codes according to requirements; for example, the unique identification code order rule may follow the order of the road route the target person drove, or the time sequence of the target person's driving, and so on; all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set.
  • the target video of the target person is provided to the responsible person and used as evidence for determining responsibility, without manual viewing and interception and without being affected by the mental state of the staff; therefore, the input cost is reduced and work efficiency is improved (a splice-and-sort sketch follows).
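  • Steps S60 and S70 amount to grouping matching clips by unique identification code, ordering them, and concatenating; this sketch writes an ffmpeg concat list, and the input tuple format, file names, and the sorted-code order rule are assumptions.

```python
import subprocess
from collections import defaultdict

def splice_and_sort(matching_clips):
    """matching_clips: list of (unique_id, timestamp, clip_path) tuples (assumed format)."""
    by_id = defaultdict(list)
    for uid, ts, path in matching_clips:
        by_id[uid].append((ts, path))
    # S70: iterate synthesis segments in unique identification code order
    for uid in sorted(by_id):
        clips = [p for _, p in sorted(by_id[uid])]   # S60: chronological splice order
        with open(f"{uid}.txt", "w") as f:
            f.writelines(f"file '{c}'\n" for c in clips)
        subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0",
                        "-i", f"{uid}.txt", "-c", "copy", f"{uid}_synthesis.mp4"],
                       check=True)
```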
  • This application realizes that, by acquiring the target data set containing the license plate number, the face image, and the human body feature map together with the to-be-processed video set containing multiple original videos, video frame images are extracted from the original videos and subjected to image binarization processing, edge detection processing, and contour tracking processing; the relevant features are extracted through the combination of the character segmentation algorithm, the YOLO algorithm, and the chain code curvature algorithm to obtain the recognition result, which characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; multiple matching segments are extracted according to the recognition result, the matching segments are spliced to obtain video synthesis segments, and all the sorted video synthesis segments are determined as the target videos in the to-be-processed video set that are related to the target data set. The application therefore automatically, quickly, and accurately extracts the video segments related to the target data set from the to-be-processed video set, which improves recognition efficiency and accuracy and greatly reduces input costs.
  • a video data processing device is provided, and the video data processing device corresponds to the video data processing method in the above-mentioned embodiment one-to-one.
  • the video data processing device includes a receiving module 11, an extraction module 12, an acquisition module 13, an identification module 14, an extraction module 15, a merging module 16 and a determination module 17.
  • the detailed description of each functional module is as follows:
  • the receiving module 11 is configured to receive a video extraction instruction to obtain a target data set and a video set to be processed; the target data set includes a license plate number, a face image, and a human body feature map; the video set to be processed includes a number of original videos, and each of the original videos is associated with a unique identification code;
  • the extraction module 12 is configured to extract at least two video frame images from each of the original videos according to a preset extraction rule
  • the acquisition module 13 is configured to input the extracted video frame images into an image recognition model; the image recognition model performs image binarization processing, edge detection processing, and contour tracking processing on the video frame image to obtain the license plate area, the face area, and the human body feature area, extracts the text features of the license plate area through the character segmentation algorithm to obtain the license plate number result of the license plate area, extracts the facial features of the face area through the YOLO algorithm to obtain the face result of the face area, and extracts the human body posture features of the human body feature area through a chain code curvature algorithm to obtain the human body feature result of the human body feature area;
  • the recognition module 14 is configured to determine the recognition result corresponding to the video frame image according to the license plate number result, the face result, and the human body feature result; the recognition result characterizes whether the video frame image contains any one of the license plate number, the face image, and the human body feature map; the recognition result includes relevance or non-relevance to the license plate number, the face image, and the human body feature map;
  • the extraction module 15 is configured to extract multiple matching segments in each original video according to the recognition result corresponding to the video frame images in each original video, and to associate each matching segment with the corresponding unique identification code; the matching segment is the video segment between the start frame image and the end frame image in the original video; the start frame image refers to a video frame image whose recognition result is relevant and whose adjacent previous video frame image has a non-relevant recognition result; the end frame image refers to a video frame image whose recognition result is non-relevant and whose adjacent previous video frame image has a relevant recognition result, or refers to the last video frame image in the original video;
  • the merging module 16 is configured to splice the matching segments associated with the same unique identification code in chronological order to obtain a video synthesis segment, and associate the video synthesis segment with the same unique identification code;
  • the determining module 17 is configured to sort all the video synthesis segments according to a preset unique identification code order rule, and to determine all the sorted video synthesis segments as the target videos in the to-be-processed video set that are related to the target data set.
  • the receiving module 11 includes:
  • the receiving unit is configured to receive a target collection instruction and obtain a sample video;
  • the sample video includes a license plate, a human face, and a specific human posture;
  • the first input unit is configured to input all the sample images into a sample collection model; the sample collection model performs image collection on all the sample images, intercepts the license plate region images containing the license plate from all the sample images, intercepts the face region images containing the face from all the sample images, and intercepts the human body feature region images containing the face and the specific human posture from all the sample images;
  • the first acquiring unit is configured to input the license plate area image into a license plate extraction model, and the license plate extraction model performs license plate number recognition on the license plate area image to obtain a license plate number output by the license plate extraction model;
  • the second acquisition unit is configured to input all the face region images into a face extraction model; the face extraction model screens all the face region images, and the face image selected by the face extraction model is obtained;
  • the third acquisition unit is configured to input all the human body feature region images into a human body feature extraction model; the human body feature extraction model screens all the human body feature region images, and the human body feature map selected by the human body feature extraction model is obtained;
  • the first output unit is configured to determine the license plate number, the face image, and the human body feature map as the target data set.
  • the extraction module 12 includes:
  • a fourth acquiring unit configured to acquire the original video and the extraction parameters in the extraction rule
  • a first determining unit configured to determine a starting video frame image in the original video as a starting frame image
  • the second determining unit is configured to determine the ending video frame image in the original video as the ending frame image
  • An extracting unit configured to extract a video frame image from the start frame image at intervals of the extraction parameter until the end frame image
  • a third determining unit configured to determine the video frame image after extraction as a process frame image
  • the second output unit is configured to compose all the video frame images in the original video from the start frame image, the end frame image, and the process frame images.
  • the acquisition module 13 includes:
  • the fifth acquisition unit is used to acquire a sample training image set;
  • the sample training image set includes a number of sample training images, and the sample training images include all the license plate area images, all the face sample images, and all the A human body feature region image and a number of negative sample images; each of the sample training images is associated with a sample training label;
  • the second input unit is configured to input the sample training image into a deep convolutional neural network model containing initial parameters, extract text features, facial features, and human body posture features from the sample training image through the deep convolutional neural network model, and obtain the training result output by the deep convolutional neural network model according to the extracted text features, facial features, and human body posture features;
  • a loss unit configured to match the training result with the sample training label to obtain a loss value
  • a first convergence unit configured to record the deep convolutional neural network model after convergence as an image recognition model when the loss value reaches a preset convergence condition
  • the second convergence unit is configured to iteratively update the initial parameters of the deep convolutional neural network model when the loss value does not reach the preset convergence condition, until the loss value reaches the preset convergence condition, and to record the deep convolutional neural network model after convergence as the image recognition model.
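The loop below is a hedged PyTorch sketch of this convergence procedure. The cross-entropy loss, Adam optimizer, learning rate, and numeric convergence threshold are assumptions; the embodiment only requires that the initial parameters be updated iteratively until the loss value satisfies the preset convergence condition.

    import torch
    import torch.nn as nn

    def train_until_convergence(model, loader, threshold=1e-3,
                                max_epochs=100, lr=1e-4):
        criterion = nn.CrossEntropyLoss()          # assumed loss function
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for epoch in range(max_epochs):
            epoch_loss = 0.0
            for images, labels in loader:          # sample training images/labels
                optimizer.zero_grad()
                outputs = model(images)            # training result
                loss = criterion(outputs, labels)  # match result against label
                loss.backward()
                optimizer.step()                   # update the initial parameters
                epoch_loss += loss.item()
            epoch_loss /= len(loader)
            if epoch_loss < threshold:             # preset convergence condition
                break
        return model  # recorded as the image recognition model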
  • the acquiring module 13 further includes:
  • the first processing unit is configured to perform binarization processing on the video frame image through the image recognition model to obtain a grayscale image
  • the second processing unit is configured to perform edge detection processing on the grayscale image by using the Canny algorithm to obtain an edge image;
  • the third processing unit is configured to perform contour tracking processing on the edge image according to the tracking criterion and the node criterion to obtain a contour image;
  • An analysis unit configured to analyze the contour image, and cut out the license plate area containing the license plate, the face area containing the human face, and the human body feature area related to the posture of the human body;
  • the first extraction unit is configured to extract, through the image recognition model, the text features of the license plate area by using a character segmentation algorithm and a character recognition method, so as to obtain the license plate number result;
  • the second extraction unit is configured to use the YOLO algorithm and the image recognition model to extract the facial features of the facial region to obtain the facial result;
  • the third output unit is configured to use the chain code curvature algorithm, through the image recognition model, to identify the shoulder width scale, the neck scale, and the chest width scale of the human body feature area, to extract the human body posture feature from these scales, and to obtain the human body feature result.
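As a concrete illustration of the binarization, Canny edge detection, and contour tracking steps, the OpenCV sketch below locates candidate regions in a video frame image. The Otsu threshold, the Canny hysteresis values (50, 150), and the minimum-area filter are assumptions; deciding whether a crop is a license plate area, a face area, or a human body feature area is left to the extraction steps described above.

    import cv2

    def locate_candidate_regions(video_frame):
        # Binarization: grayscale conversion followed by Otsu thresholding.
        gray = cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Edge detection with the Canny operator.
        edges = cv2.Canny(binary, 50, 150)
        # Contour tracking on the edge image.
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Cut out the bounding box of every sufficiently large contour.
        regions = []
        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            if w * h > 400:  # discard tiny contours that are likely noise
                regions.append(video_frame[y:y + h, x:x + w])
        return regions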
  • the identification module 14 includes:
  • a fourth determining unit configured to determine the license plate number result, the face result, and the human body feature result of the video frame image as the recognition set of the video frame image;
  • the fifth determining unit is configured to determine, if any item in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is related to the license plate number, the face image, and the human body feature map;
  • the sixth determining unit is configured to determine, if no item in the recognition set matches the license plate number, the face image, or the human body feature map, that the recognition result corresponding to the video frame image is not related to the license plate number, the face image, and the human body feature map.
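A minimal sketch of this matching rule follows. The matches helper is hypothetical: it assumes exact comparison for license plate strings and a cosine-similarity threshold for feature vectors of equal length, since the embodiment does not fix a particular similarity measure.

    import numpy as np

    def matches(result, target, threshold=0.8):
        # Hypothetical comparison: exact match for plate number strings,
        # cosine similarity for flattened image feature vectors.
        if isinstance(result, str) or isinstance(target, str):
            return result == target
        a = np.asarray(result, dtype=float).ravel()
        b = np.asarray(target, dtype=float).ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return denom > 0 and float(a @ b) / denom >= threshold

    def classify_frame(recognition_set, target_data_set):
        # recognition_set: (plate result, face result, body feature result);
        # target_data_set: (license plate number, face image, body feature map).
        related = any(matches(result, target)
                      for result, target in zip(recognition_set, target_data_set))
        return "related" if related else "unrelated"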
  • the extraction module 15 includes:
  • the sixth acquiring unit is configured to acquire the original video
  • a seventh acquiring unit configured to acquire all the start point frame images from the original video, and acquire all the end point frame images from the original video in chronological order;
  • a seventh determining unit configured to determine that the original video is a non-matching segment if the start point frame image and the end point frame image do not exist in the original video
  • An eighth determining unit configured to, if there is only one start point frame image and no end point frame image in the original video, record the start point frame image as the matching segment of the original video;
  • a ninth determining unit configured to, if there is only one end point frame image in the original video and no start point frame image, record the end point frame image as the matching segment of the original video;
  • the tenth determining unit is configured to, if an end point frame image exists adjacent to and after a start point frame image, obtain the video segment between the start point frame image and the end point frame image, and record the video segment between the start point frame image and the end point frame image as the matching segment of the original video;
  • the associating unit is used for associating all the matching segments with the unique identification code associated with the original video.
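The segment rules above can be condensed into the following sketch. Encoding each frame as "start", "end", or None, and pairing each start point with the nearest following end point, are assumptions made for illustration.

    def extract_matching_segments(frame_flags):
        # frame_flags: one entry per video frame image, set to "start" for a
        # start point frame image, "end" for an end point frame image, else None.
        starts = [i for i, flag in enumerate(frame_flags) if flag == "start"]
        ends = [i for i, flag in enumerate(frame_flags) if flag == "end"]

        if not starts and not ends:
            return [], True                         # non-matching segment
        if len(starts) == 1 and not ends:
            return [(starts[0], starts[0])], False  # lone start frame kept
        if len(ends) == 1 and not starts:
            return [(ends[0], ends[0])], False      # lone end frame kept

        segments = []
        for s in starts:
            following = [e for e in ends if e > s]
            if following:
                # Record the segment between the start point frame image and
                # the nearest following end point frame image.
                segments.append((s, min(following)))
        return segments, not segments  # (matching segments, is_non_matching)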
  • Each module in the above-mentioned video data processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded, in hardware form, in or independently of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a video data processing method.
  • a computer device including a memory, a processor, and a computer program stored on the memory and capable of running on the processor.
  • the processor executes the computer program to implement the video data processing method in the foregoing embodiment.
  • a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the video data processing method in the above-mentioned embodiment.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), and electrically programmable ROM (EPROM).
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), synchronous link DRAM (SLDRAM), and Rambus dynamic RAM (RDRAM).
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Abstract

Disclosed are a video data processing method and apparatus, a computer device, and a storage medium. The method comprises: acquiring a target data set and a video set to be processed; extracting at least two video frame images from each original video; performing, by means of an image recognition model, image binarization processing, edge detection processing, and contour tracking processing on a video frame image to obtain a license plate area, a face area, and a human body feature area; extracting, by means of a character segmentation algorithm, a text feature from the license plate area to obtain a license plate number result; extracting, by means of a YOLO algorithm, a facial feature from the face area to obtain a face result; extracting, by means of a chain code curvature algorithm, a human body posture feature from the human body feature area to obtain a human body feature result; determining a recognition result; extracting matching segments; splicing the matching segments to obtain composite video segments; and classifying the composite video segments to obtain a target video. The present application further relates to blockchain technology: the target data set can be stored in a blockchain node.
PCT/CN2020/099082 2020-04-24 2020-06-30 Video data processing method and apparatus, computer device, and storage medium WO2021212659A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332455.9 2020-04-24
CN202010332455.9A CN111626123A (zh) Video data processing method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021212659A1 (fr)

Family

ID=72271775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099082 WO2021212659A1 (fr) 2020-04-24 2020-06-30 Video data processing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111626123A (fr)
WO (1) WO2021212659A1 (fr)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257674B (zh) * 2020-11-17 2022-05-27 珠海大横琴科技发展有限公司 Visual data processing method and apparatus
CN112465691A (zh) * 2020-11-25 2021-03-09 北京旷视科技有限公司 Image processing method and apparatus, electronic device, and computer-readable medium
CN112650876A (zh) * 2020-12-30 2021-04-13 北京嘀嘀无限科技发展有限公司 Image processing method and apparatus, electronic device, storage medium, and program product
CN113435330A (zh) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Video-based micro-expression recognition method, apparatus, device, and storage medium
CN114286171B (zh) * 2021-08-19 2023-04-07 腾讯科技(深圳)有限公司 Video processing method, apparatus, device, and storage medium
CN114004757B (zh) * 2021-10-14 2024-04-05 大族激光科技产业集团股份有限公司 Method, system, device, and storage medium for removing interference from industrial images
CN113688810B (zh) * 2021-10-26 2022-03-08 深圳市安软慧视科技有限公司 Target capture method and system for edge devices, and related devices
CN114880517A (zh) * 2022-05-27 2022-08-09 支付宝(杭州)信息技术有限公司 Method and apparatus for video retrieval


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300556A1 (en) * 2017-04-17 2018-10-18 Intel Corporation Person tracking and privacy and acceleration of data using autonomous machines
CN109672936A (zh) * 2018-12-26 2019-04-23 上海众源网络有限公司 Method and apparatus for determining a video evaluation set, and electronic device
CN110287778A (zh) * 2019-05-15 2019-09-27 北京旷视科技有限公司 Image processing method and apparatus, terminal, and storage medium
CN110191324A (zh) * 2019-06-28 2019-08-30 Oppo广东移动通信有限公司 Image processing method and apparatus, server, and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363660A (zh) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 Video collection determination method and apparatus, electronic device, and storage medium
CN114363660B (zh) * 2021-12-24 2023-09-08 腾讯科技(武汉)有限公司 Video collection determination method and apparatus, electronic device, and storage medium
CN115019008A (zh) * 2022-05-30 2022-09-06 深圳市鸿普森科技股份有限公司 Intelligent 3D model design and analysis service management platform
CN115037987A (zh) * 2022-06-07 2022-09-09 厦门蝉羽网络科技有限公司 Method and system for replaying live-streaming e-commerce videos
CN115037987B (zh) * 2022-06-07 2024-05-07 厦门蝉羽网络科技有限公司 Method and system for replaying live-streaming e-commerce videos
CN115858854A (zh) * 2023-02-28 2023-03-28 北京奇树有鱼文化传媒有限公司 Video data organization method and apparatus, electronic device, and storage medium
CN115858854B (zh) * 2023-02-28 2023-05-26 北京奇树有鱼文化传媒有限公司 Video data organization method and apparatus, electronic device, and storage medium
CN117437505A (zh) * 2023-12-18 2024-01-23 杭州任性智能科技有限公司 Video-based training data set generation method and system

Also Published As

Publication number Publication date
CN111626123A (zh) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2021212659A1 (fr) Video data processing method and apparatus, computer device, and storage medium
CN110569721B (zh) Recognition model training method, image recognition method, apparatus, device, and medium
WO2020253629A1 (fr) Detection model training method and apparatus, computer device, and storage medium
CN111310624B (zh) Occlusion recognition method and apparatus, computer device, and storage medium
KR102554724B1 (ko) Method for identifying an object in an image, and mobile device for executing the method
Tapia et al. Gender classification from iris images using fusion of uniform local binary patterns
CN111428604B (zh) Method, apparatus, device, and storage medium for recognizing the wearing of a face mask
WO2021139324A1 (fr) Image recognition method and apparatus, computer-readable storage medium, and electronic device
CN112801057B (zh) Image processing method and apparatus, computer device, and storage medium
WO2019033525A1 (fr) Action unit feature recognition method, device, and storage medium
CN111680672B (zh) Face liveness detection method, system, apparatus, computer device, and storage medium
CN112016464A (zh) Method and apparatus for detecting face occlusion, electronic device, and storage medium
CN111950424A (zh) Video data processing method and apparatus, computer, and readable storage medium
CN106295547A (zh) Image comparison method and image comparison apparatus
CN111274926A (zh) Image data screening method and apparatus, computer device, and storage medium
US20230060211A1 (en) System and Method for Tracking Moving Objects by Video Data
CN111783681A (zh) Large-scale face database recognition method, system, computer device, and storage medium
CN112101195A (zh) Crowd density estimation method and apparatus, computer device, and storage medium
CN110580507B (zh) Urban texture classification and recognition method
CN113706481A (zh) Sperm quality detection method and apparatus, computer device, and storage medium
CN111539320A (зh) Multi-view gait recognition method and system based on a mutual learning network strategy
CN113780145A (зh) Sperm morphology detection method and apparatus, computer device, and storage medium
JP2013218605A (ja) Image recognition apparatus, image recognition method, and program
CN113762031A (зh) Image recognition method, apparatus, device, and storage medium
CN114038045A (зh) Cross-modal face recognition model construction method and apparatus, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20932057

Country of ref document: EP

Kind code of ref document: A1