WO2021017289A1 - Method and apparatus for locating an object in a video, computer device, and storage medium - Google Patents

Method and apparatus for locating an object in a video, computer device, and storage medium

Info

Publication number
WO2021017289A1
WO2021017289A1 (PCT/CN2019/117702)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
candidate
video
face
Prior art date
Application number
PCT/CN2019/117702
Other languages
English (en)
Chinese (zh)
Inventor
张磊
宋晨
李雪冰
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021017289A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/785Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/7854Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using shape
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • This application belongs to the field of artificial intelligence, and in particular relates to a method, device, computer equipment, and storage medium for locating an object in a video.
  • To maintain public safety, security systems use large numbers of video capture devices to monitor scenes in real time and to record video data for later review.
  • This application provides a method, device, computer equipment, and storage medium for locating an object in a video, so as to address the problem that locating an object is time-consuming.
  • this application proposes a method for locating an object in a video, which includes the following steps:
  • the first image feature of the object to be located includes the image contour feature and/or the image color feature;
  • the facial feature of the object to be located is compared with the images of the candidate objects, and the candidate object that matches the facial feature of the object to be located is determined to be the object to be located.
  • an embodiment of the present application also provides an apparatus for locating an object in a video, including:
  • the first acquisition module is configured to acquire a first image feature of the object to be positioned, the first image feature including image contour and/or image color feature;
  • a retrieval module configured to retrieve a preset video database according to the first image feature of the object to be located, and obtain an image of a candidate object that matches the first image feature of the object to be located;
  • the second acquisition module is used to acquire the facial features of the object to be located;
  • a processing module configured to compare the facial features of the object to be located with the images of the candidate objects, and determine that the object among the candidate objects that matches the facial feature of the object to be located is the object to be located.
  • an embodiment of the present application further provides a computer device including a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the processor performs the steps of the method for locating an object in a video.
  • the embodiments of the present application further provide one or more non-volatile readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the method for locating an object in a video.
  • FIG. 1 is a schematic diagram of the basic flow of a method for locating an object in a video according to an embodiment of this application;
  • FIG. 2 is a schematic diagram of a process of acquiring a first image feature of an object to be positioned according to an embodiment of the application;
  • FIG. 3 is a schematic diagram of a process for determining a candidate object according to an embodiment of the application
  • FIG. 4 is a schematic diagram of a training process of a convolutional neural network model according to an embodiment of the application
  • FIG. 5 is a schematic diagram of a process of determining an object to be located according to an embodiment of the application
  • FIG. 6 is a block diagram of the basic structure of an apparatus for locating an object in a video according to an embodiment of this application;
  • FIG. 7 is a block diagram of the basic structure of the computer equipment implemented in this application.
  • The terms “terminal” and “terminal equipment” used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware.
  • That is, they include devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link.
  • Such equipment may include: cellular or other communication equipment, with or without a single-line or multi-line display; PCS (Personal Communications Service) equipment, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver.
  • The terminals and terminal equipment used here may be portable, transportable, installed in vehicles (air, sea, and/or land), or suitable and/or configured to operate locally and/or, in distributed form, at any location on earth and/or in space.
  • The “terminal” and “terminal device” used here may also be communication terminals, Internet terminals, or music/video playback terminals, such as PDAs, MIDs (Mobile Internet Devices), and/or mobile phones with music/video playback functions, or devices such as smart TVs and set-top boxes.
  • the terminals involved in this embodiment are the aforementioned terminals.
  • FIG. 1 is a schematic flowchart of a method for locating an object in a video according to this embodiment.
  • a method for positioning an object in a video includes the following steps:
  • The first image feature of the object to be located is received through an interactive interface, where the object to be located refers to a specific person, and the first image feature refers to the contour feature or the color feature of an image containing the object to be located, or a combination of the two.
  • Contour features describe a person's build, such as tall, short, fat, or thin;
  • color features include a person's skin color, hair color, and clothing color.
  • the aforementioned features can be input by the user through an interactive interface.
  • the contour feature extraction algorithm and the color feature extraction algorithm are used to obtain the first image feature of the object to be located. Specifically, please refer to FIG. 2.
  • S102 Search a preset video database according to the first image feature of the object to be located, and obtain an image of the candidate object that matches the first image feature of the object to be located;
  • the preset video database is retrieved according to the first image feature, where the preset video database refers to the storage space where the video collected by the video surveillance device is saved.
  • Existing semantic-based retrieval requires the semantic attributes of the images to be labeled in advance.
  • Because the images come from real-time collection by the video surveillance equipment, such pre-labeling is not applicable.
  • Therefore, similar-feature comparison is used instead.
  • For the specific algorithm, please refer to FIG. 3.
  • the face feature of the object to be located is acquired through an interactive interface, where the face feature is an n-dimensional vector representing the feature of the face image.
  • Image features are the (essential) characteristics that distinguish one class of objects from other classes, or a collection of such characteristics.
  • Features are data that can be extracted through measurement or processing. Each image has characteristics that distinguish it from other images: some are natural features that can be perceived intuitively, such as brightness, edges, texture, and color; others can only be obtained through transformation or processing, such as moments, histograms, and principal components.
  • For grayscale, the image can be regarded as a three-dimensional function of x, y, and z (the gray value).
  • a pre-trained convolutional neural network is used to extract the feature vectors of a face image.
  • The convolutional neural network extracts image features that are less prone to over-fitting; different convolution and pooling configurations and the size of the final output feature vector can be used to control the fitting capacity of the overall model, which is more flexible. See FIG. 4 for the training steps.
  • By comparing the facial features of the object to be located with the images of the candidate objects obtained in step S102, the candidate object having the same facial features as the object to be located is determined to be the final object to be located.
  • The face image of the candidate object is cropped out, the face feature vector of the candidate object is obtained in the same manner as in step S103, and the similarity between the two vectors is compared.
  • Cosine similarity refers to the cosine of the angle between two vectors, which ranges over [-1, 1]: the closer the value is to 1, the closer the directions of the two vectors and the more similar they are; the closer to -1, the more opposite their directions; and the closer to 0, the more nearly orthogonal the two vectors are.
  • The specific calculation formula is \cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}},
  • where A_i and B_i are the components of vectors A and B, respectively.
  • The facial features of the candidate objects are obtained with the preset facial feature extraction model, and the facial feature of the object to be located is then compared for similarity with the facial features of the candidate objects. Please refer to FIG. 5 for details. A minimal code sketch of the cosine similarity computation is given below.
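The following is an illustrative sketch only of the cosine similarity computation described above; it is not part of the patent disclosure, and the function name and the use of NumPy are assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b; the result lies in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Sum of component products divided by the product of the vector norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: two 4-dimensional face feature vectors that point in similar directions.
print(cosine_similarity([0.1, 0.8, 0.3, 0.5], [0.2, 0.7, 0.4, 0.5]))  # close to 1
```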
  • Step S101 further includes the following steps:
  • S112: Process the image of the object to be located according to the image contour feature extraction algorithm and/or the color feature extraction algorithm to obtain the first image feature of the object to be located.
  • Image contour feature extraction is performed on the image of the object to be positioned.
  • The contour features of the image can be extracted with an image gradient algorithm.
  • The gradient of the image function f(x, y) at the point (x, y) is a vector with magnitude and direction; let G_x and G_y denote the gradients in the x direction and the y direction, respectively.
  • The gradient can be expressed as \nabla f(x, y) = (G_x, G_y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right), with magnitude |\nabla f(x, y)| = \sqrt{G_x^2 + G_y^2}.
  • The magnitude of the gradient can be approximately expressed as |\nabla f(x, y)| \approx |G_x| + |G_y|, with G_x \approx f(x, y) - f(x-1, y) and G_y \approx f(x, y) - f(x, y-1),
  • where f(x, y) is the image function of the image whose contour is to be calculated,
  • f(x-1, y) and f(x, y-1) are the values of the image function at the neighboring pixels, and
  • G_x and G_y are the gradients of the image function f(x, y) in the x direction and the y direction, respectively.
  • The direction of the gradient is the direction in which the function f(x, y) changes fastest.
  • Where there are edges in the image, the gradient value is large; conversely, in relatively smooth parts of the image the gray value changes little and the corresponding gradient is small.
  • The image gradient algorithm considers the grayscale change in a neighborhood of each pixel of the image and exploits the behavior of the first-order or second-order derivative near an edge: a gradient operator, such as the Sobel operator, Robinson operator, or Laplace operator, is defined over a neighborhood of each pixel, and the original image is convolved with the gradient operator to obtain the contour of the target object image.
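As a hedged illustration only (not the patent's implementation), the Sobel operator mentioned above can be applied with OpenCV as follows; the file names and the threshold are placeholder assumptions.

```python
import cv2
import numpy as np

# Load the image containing the object to be located and convert it to grayscale.
img = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)

# Convolve with the Sobel operator to approximate Gx and Gy.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude; large values correspond to edges, i.e. the object's contour.
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# Keep only strong gradients as a simple contour map (threshold chosen arbitrarily).
contour_map = (magnitude > 100).astype(np.uint8) * 255
cv2.imwrite("contour.png", contour_map)
```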
  • the color is a global feature that describes the surface properties of the scene corresponding to the image or image area.
  • In general, color features are pixel-based features.
  • the target detection algorithm is realized by the cascaded convolutional neural network model.
  • The color features of the target image are obtained by calculating the color histogram of the cropped image.
  • The color histogram can be calculated with the calcHist API function provided by OpenCV for computing image histograms.
  • The images in the video stream are down-sampled by the same factor, and the down-sampled images are matched against the contour features or color features of the extracted target object to obtain the candidate target objects.
  • Down-sampling reduces the amount of pixel data, which reduces the amount of computation and speeds up processing. A short OpenCV sketch follows.
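A minimal sketch of histogram computation and down-sampling with OpenCV; the file names, scale factor, and bin counts are illustrative assumptions rather than values from the patent.

```python
import cv2

def color_histogram(image_bgr, bins=(8, 8, 8)):
    """Color histogram over the B, G, and R channels, normalized for comparison."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, list(bins),
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

# Crop of the target object (path is a placeholder).
target = cv2.imread("target_crop.jpg")
target_hist = color_histogram(target)

# Down-sample a video frame by the same factor before matching,
# which reduces the amount of pixel data to process.
frame = cv2.imread("frame.jpg")
small = cv2.resize(frame, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
frame_hist = color_histogram(small)
```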
  • Step S102 further includes the following steps:
  • A video image frame is a decomposition of the video; third-party software can be used to decompose the video into video image frames.
  • S122: Input the video image frame into a preset target detection model, and obtain the image of the target object output by the target detection model in response to the video image frame, wherein the preset target detection model is based on a pre-trained deep learning neural network;
  • the deep learning neural network performs target detection on the input video image frame, and the output image of the target object is a human body image;
  • The obtained video image frames often contain content other than the target image.
  • the target detection is performed on the video image frame.
  • The target detection in this application detects the human body; its purpose is to remove everything other than the human body image.
  • the image of the target object obtained after detection is a human body image.
  • a pre-trained deep learning neural network is used to detect the target object.
  • The video image frame is first divided into equal parts:
  • the input image is divided into a 7×7 grid of cells.
  • The gridded image is input into the deep learning neural network, and for each grid cell the deep learning neural network predicts 2 prediction boxes.
  • Each prediction box contains 5 values: x, y, w, h, and confidence.
  • x and y are the center coordinates of the prediction box, and w and h are the width and height of the prediction box.
  • The convolutional neural network then outputs a 7×7×(2×5+1) prediction tensor used in the next step to determine the target object's prediction boxes. After the prediction tensor is obtained, it is filtered by setting a confidence threshold.
  • Prediction boxes with a confidence below the threshold are filtered out, leaving only the higher-confidence boxes as the remaining boxes. Then, for each remaining prediction box, the IOU (overlap) value between that box and the other remaining boxes is calculated in turn; if the IOU value is greater than a preset threshold, the overlapping prediction box is eliminated. The remaining prediction boxes repeat the above process until all prediction boxes have been processed and the image of the target object is obtained, as sketched in the example below.
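The confidence-threshold filtering and IOU-based elimination described above can be sketched as follows. This is an illustrative example only: the video path is a placeholder, `detect()` merely stands in for the pre-trained deep learning neural network, and the thresholds are assumptions rather than values from the patent.

```python
import cv2

def detect(frame):
    """Stand-in for the detection network; returns fixed example boxes
    of the form (x, y, w, h, confidence), with (x, y) the box centre."""
    return [(120.0, 80.0, 40.0, 90.0, 0.92),
            (122.0, 82.0, 42.0, 88.0, 0.85),
            (300.0, 200.0, 50.0, 100.0, 0.30)]

def iou(a, b):
    """Intersection-over-union of two centre-format boxes (x, y, w, h, ...)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(predictions, conf_thresh=0.5, iou_thresh=0.5):
    """Discard low-confidence boxes, then suppress boxes that overlap a kept box."""
    boxes = sorted((p for p in predictions if p[4] >= conf_thresh),
                   key=lambda p: p[4], reverse=True)
    kept = []
    for box in boxes:
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept

# Decompose the video into frames and keep the surviving human-body boxes per frame.
cap = cv2.VideoCapture("surveillance.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    people = filter_boxes(detect(frame))
cap.release()
```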
  • S123 Calculate the first image feature of the target object according to the image contour feature extraction algorithm and/or the color feature extraction algorithm on the image of the target object;
  • The first image feature of the target object is calculated from the image of the target object with the image contour feature extraction algorithm and/or the color feature extraction algorithm; the specific algorithms are the same as in step S112.
  • contour feature matching is realized by the contour moment matching method.
  • Contour moments can be spatial moments, central moments, and so on.
  • The spatial moment of order (p, q) can be expressed as m_{pq} = \sum_{i=1}^{n} I(x_i, y_i)\, x_i^{p}\, y_i^{q},
  • where I(x, y) is the value of the contour pixel at point (x, y),
  • n is the number of points on the contour, and
  • p and q are the orders in the x and y dimensions, giving the moments m00, m10, m01, ..., m03.
  • The zero-order moment m00 is a simple accumulation of the points on the contour, i.e., how many points lie on the contour.
  • The first-order moments m10 and m01 are the accumulations in the x and y directions, respectively.
  • the spatial moment can be calculated by the OpenCV function cvGetSpatialMoment().
  • Color feature matching uses the histogram comparison function compareHist() provided by OpenCV to compare similarity; a combined sketch of contour-moment and histogram matching is given below.
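A combined matching sketch using OpenCV's Python API: cv2.moments is the Python counterpart of cvGetSpatialMoment(), cv2.compareHist of compareHist(), while cv2.matchShapes is used here merely as one convenient moment-based shape comparison; the file names and thresholds are illustrative assumptions, not values from the patent.

```python
import cv2

def largest_contour(gray):
    """Largest external contour of a grayscale image after a simple threshold."""
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def hist(img):
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(h, h).flatten()

query = cv2.imread("query.jpg")
candidate = cv2.imread("candidate.jpg")

# Contour matching: cv2.moments returns the spatial moments m00, m10, m01, ..., m03;
# cv2.matchShapes compares the two contours (smaller value = more similar shapes).
cq = largest_contour(cv2.cvtColor(query, cv2.COLOR_BGR2GRAY))
cc = largest_contour(cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY))
print(cv2.moments(cq)["m00"], cv2.moments(cc)["m00"])  # zero-order moments
shape_distance = cv2.matchShapes(cq, cc, cv2.CONTOURS_MATCH_I1, 0.0)

# Color matching: compare normalized histograms with cv2.compareHist.
color_similarity = cv2.compareHist(hist(query), hist(candidate), cv2.HISTCMP_CORREL)

# Treat the candidate as matching if both measures pass illustrative thresholds.
is_candidate = shape_distance < 0.3 and color_similarity > 0.8
```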
  • the training of the pre-trained convolutional neural network model includes the following steps:
  • The training samples are face images marked with identity identifiers.
  • The training samples are input into the convolutional neural network model, and the convolutional neural network model outputs the identity prediction result for each sample.
  • The loss function takes the form L = -\frac{1}{N}\sum_{i=1}^{N} y_i \cdot \log(h_i),
  • where N is the number of training samples,
  • y_i is the marked result corresponding to the i-th sample, and
  • h = (h_1, h_2, \ldots, h_i, \ldots) is the prediction result for sample i.
  • the loss function is used to compare whether the prediction result of the identity of the training sample is consistent with the marked identity, and the Softmax cross-entropy loss function is used in the embodiment of the application.
  • When the value of the loss function no longer decreases but instead increases, it is considered that training of the convolutional neural network can end.
  • The gradient descent method is an optimization algorithm commonly used in machine learning and artificial intelligence to recursively approximate the minimum-deviation model; a toy sketch of the loss and update is given below.
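A toy illustration of the Softmax cross-entropy loss, the stopping criterion, and a gradient-descent update; it uses a single linear layer on random data as an assumption for brevity, not the patent's convolutional network over face images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N samples of D-dimensional features, each labelled with one of C identities.
N, D, C = 32, 128, 10
X = rng.normal(size=(N, D)).astype(np.float32)
y = rng.integers(0, C, size=N)              # marked identity for each sample
W = np.zeros((D, C), dtype=np.float32)      # weights of a single linear layer

lr = 0.1
prev_loss = np.inf
for step in range(1000):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)                       # numerical stability
    h = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)    # softmax predictions
    loss = -np.log(h[np.arange(N), y]).mean()                         # softmax cross-entropy

    # Stop when the loss no longer decreases (the convergence criterion in the text).
    if loss > prev_loss:
        break
    prev_loss = loss

    # Gradient of the loss with respect to W, followed by a gradient-descent update.
    grad_logits = h.copy()
    grad_logits[np.arange(N), y] -= 1.0
    W -= lr * (X.T @ grad_logits) / N
```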
  • Step S104 further includes the following steps:
  • In step S102 an image of the candidate object is obtained; face detection is performed on the candidate object image, and the face image of the candidate object is cropped out.
  • the face detection method is the same as the method described in step S122.
  • S142 Input the face image of the candidate object into the preset face feature extraction model, and obtain the face feature of the candidate object.
  • the face image of the candidate object is input to the preset face feature extraction model.
  • the preset face feature extraction model uses a pre-trained convolutional neural network model, and the training steps are the same as in FIG. 4.
  • As before, cosine similarity refers to the cosine of the angle between two vectors, which ranges over [-1, 1]: the closer the value is to 1, the more similar the two vectors; the closer to -1, the more opposite their directions; and the closer to 0, the more nearly orthogonal they are.
  • The calculation formula is the same as in step S103: \cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}},
  • where A_i and B_i are the components of vectors A and B, respectively.
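Tying the pieces together, a hedged sketch of the candidate comparison: `extract_face_feature()` merely stands in for the preset face feature extraction model (the pre-trained CNN), all face crops are assumed to be resized to the same shape, and the 0.8 threshold is an assumption rather than a value given in the patent.

```python
import numpy as np

def extract_face_feature(face_image):
    """Stand-in for the pre-trained CNN: map a face crop to an n-dimensional vector.
    For demonstration, the pixels are simply flattened and L2-normalized."""
    v = np.asarray(face_image, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locate(target_face, candidate_faces, threshold=0.8):
    """Indices of candidate objects whose facial feature matches the target's."""
    target_vec = extract_face_feature(target_face)
    return [i for i, face in enumerate(candidate_faces)
            if cosine(target_vec, extract_face_feature(face)) > threshold]
```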
  • FIG. 6 is a basic structural block diagram of an apparatus for positioning an object in a video in this embodiment.
  • an apparatus for locating an object in a video includes a first acquisition module 210, a retrieval module 220, a second acquisition module 230, and a processing module 240.
  • The first acquisition module 210 is configured to acquire the first image feature of the object to be located,
  • where the first image feature includes the image contour feature and/or the image color feature;
  • the retrieval module 220 is configured to retrieve the preset video database according to the first image feature of the object to be located and obtain the images of the candidate objects that match the first image feature of the object to be located;
  • the second acquisition module 230 is configured to acquire the facial feature of the object to be located;
  • the processing module 240 is configured to compare the facial feature of the object to be located with the images of the candidate objects and determine that the candidate object matching the facial feature of the object to be located is the object to be located.
  • The embodiment of the application acquires the first image feature of the object to be located, where the first image feature includes the image contour feature and/or the image color feature; retrieves the preset video database according to the first image feature of the object to be located to obtain the images of the candidate objects that match the first image feature; acquires the facial feature of the object to be located; and compares the facial feature of the object to be located with the images of the candidate objects to determine that the candidate object matching the facial feature is the object to be located. Searching the video database with the first image feature allows the candidate objects to be located quickly, after which the object to be located is identified from the facial feature, which greatly reduces the amount of computation and improves the timeliness of object locating.
  • The first acquisition module 210 further includes: a first acquisition sub-module, configured to acquire an image of the object to be located; and a first processing sub-module, configured to process the image of the object to be located according to the image contour feature extraction algorithm and/or the color feature extraction algorithm to obtain the first image feature of the object to be located.
  • The second acquisition module 230 further includes: a second acquisition sub-module, configured to acquire a face image of the object to be located; and a second processing sub-module, configured to input the face image of the object to be located into a preset face feature extraction model and obtain the facial feature of the object to be located.
  • The retrieval module 220 further includes: a third acquisition sub-module, configured to acquire video image frames, where the video image frames are decompositions of the videos stored in the preset video database; a detection sub-module, configured to input the video image frames into a preset target detection model and obtain the image of the target object output by the target detection model in response to the video image frames, wherein the preset target detection model is based on a pre-trained deep learning neural network, the image of the target object includes a human body image, and the deep learning neural network performs target detection on the input video image frames to output the human body image; a first calculation sub-module, configured to calculate the first image feature of the target object from the image of the target object according to the image contour feature extraction algorithm and/or the color feature extraction algorithm; and a third processing sub-module, configured to determine that the target object is a candidate object when the degree of matching between the first image feature of the object to be located and the first image feature of the target object is greater than a preset first threshold.
  • The processing module 240 further includes: a fourth acquisition sub-module, configured to acquire the face image of the candidate object, where the face image of the candidate object is cropped from the image of the candidate object;
  • a second calculation sub-module, configured to input the face image of the candidate object into the preset facial feature extraction model and obtain the facial features of the candidate object; and
  • a fourth processing sub-module, configured to calculate the degree of matching between the facial feature of the object to be located and the facial features of the candidate object and, when the degree of matching is greater than a preset second threshold, determine that the candidate object is the object to be located.
  • The preset facial feature extraction model is based on a pre-trained convolutional neural network model, wherein the second calculation sub-module further includes: a fifth acquisition sub-module, configured to acquire training samples marked with identity identifiers, the training samples being face images marked with different identities; a first prediction sub-module, configured to input the training samples into the convolutional neural network model and obtain the identity prediction results of the training samples; a first comparison sub-module, configured to compare, according to a loss function, whether the identity prediction result of a training sample is consistent with its identity identifier, wherein the loss function is the Softmax cross-entropy loss described above,
  • where N is the number of training samples and
  • y_i is the marked result corresponding to the i-th sample; and
  • a fifth processing sub-module, configured to iteratively update the weights in the convolutional neural network model when the identity prediction result is inconsistent with the identity identifier, until the loss function converges.
  • The image contour feature extraction algorithm adopts an image gradient algorithm, and the gradient is expressed as \nabla f(x, y) = (G_x, G_y), with magnitude approximately |G_x| + |G_y|, where G_x \approx f(x, y) - f(x-1, y) and G_y \approx f(x, y) - f(x, y-1),
  • where f(x, y) is the image function of the image whose contour is to be calculated,
  • f(x-1, y) and f(x, y-1) are the values of the image function at the neighboring pixels, and
  • G_x and G_y are the gradients of the image function f(x, y) in the x direction and the y direction, respectively.
  • FIG. 7 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus.
  • The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • The database may store sequences of control information.
  • When the computer-readable instructions are executed by the processor, the processor can implement a method for locating an object in a video.
  • the processor of the computer equipment is used to provide calculation and control capabilities, and supports the operation of the entire computer equipment.
  • Computer-readable instructions may be stored in the memory of the computer device; when the computer-readable instructions are executed by the processor, the processor is caused to execute a method for locating an object in a video.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • FIG. 7 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • The specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • The processor is used to execute the specific content of the first acquisition module 210, the retrieval module 220, the second acquisition module 230, and the processing module 240 in FIG. 6, and the memory stores the computer-readable instructions and the various types of data required to execute the above modules.
  • the network interface is used for data transmission between user terminals or servers.
  • the memory in this embodiment stores the computer-readable instructions and data required to execute all sub-modules in the method for locating objects in the video, and the server can call the computer-readable instructions and data of the server to perform the functions of all the sub-modules.
  • The computer device acquires the first image feature of the object to be located, where the first image feature includes the image contour feature and/or the image color feature; retrieves the preset video database according to the first image feature of the object to be located to obtain the images of the candidate objects that match the first image feature; acquires the facial feature of the object to be located; and compares the facial feature of the object to be located with the images of the candidate objects to determine that the candidate object matching the facial feature is the object to be located. Retrieving the video database with the first image feature allows the candidate objects to be located quickly, after which the object to be located is identified from the facial feature, which greatly reduces the amount of computation and improves the timeliness of object locating.
  • the present application also provides one or more non-volatile storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the method for locating an object in a video described in any of the foregoing embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of artificial intelligence, and discloses a method and apparatus for locating an object in a video, a computer device, and a storage medium. The method comprises the following steps: acquiring a first image feature of an object to be located, the first image feature comprising an image contour and/or an image color feature; searching a preset video database according to the first image feature of the object to be located, and acquiring images of candidate objects matching the first image feature of the object to be located; acquiring a facial feature of the object to be located; and comparing the facial feature of the object to be located with the images of the candidate objects, and determining the object among the candidate objects that matches the facial feature of the object to be located as the object to be located. Objects are searched for in the video database according to the first image feature, so that the candidate objects can be located quickly, and the object to be located is then located according to the facial feature, which greatly reduces the amount of computation and improves the timeliness of locating an object.
PCT/CN2019/117702 2019-08-01 2019-11-12 Method and apparatus for locating an object in a video, computer device, and storage medium WO2021017289A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910707924.8A CN110633627A (zh) 2019-08-01 2019-08-01 在视频中定位对象的方法、装置、计算机设备及存储介质
CN201910707924.8 2019-08-01

Publications (1)

Publication Number Publication Date
WO2021017289A1 (fr)

Family

ID=68969147

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117702 WO2021017289A1 (fr) 2019-08-01 2019-11-12 Method and apparatus for locating an object in a video, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110633627A (fr)
WO (1) WO2021017289A1 (fr)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170015639A (ko) * 2015-07-29 2017-02-09 대한민국(관리부서: 행정자치부 국립과학수사연구원장) 디지털 영상 내의 얼굴 인식을 통한 개인 식별 시스템 및 방법
CN106845385A (zh) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 视频目标跟踪的方法和装置
CN109299642A (zh) * 2018-06-08 2019-02-01 嘉兴弘视智能科技有限公司 基于人像识别的逻辑布控预警系统及方法
CN109344713A (zh) * 2018-08-31 2019-02-15 电子科技大学 一种姿态鲁棒的人脸识别方法
CN109190561A (zh) * 2018-09-04 2019-01-11 四川长虹电器股份有限公司 一种视频播放中的人脸识别方法及系统
CN109308463A (zh) * 2018-09-12 2019-02-05 北京奇艺世纪科技有限公司 一种视频目标识别方法、装置及设备

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023024749A1 (fr) * 2021-08-24 2023-03-02 腾讯科技(深圳)有限公司 Video retrieval method and apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN110633627A (zh) 2019-12-31

Similar Documents

Publication Publication Date Title
CN109151501B (zh) 一种视频关键帧提取方法、装置、终端设备及存储介质
US10515275B2 (en) Intelligent digital image scene detection
WO2021139324A1 (fr) Procédé et appareil de reconnaissance d'image, support de stockage lisible par ordinateur et dispositif électronique
US10891465B2 (en) Methods and apparatuses for searching for target person, devices, and media
CN109284733B (zh) 一种基于yolo和多任务卷积神经网络的导购消极行为监控方法
WO2019001481A1 (fr) Procédé et appareil de recherche de véhicule et d'identification de caractéristique d'aspect de véhicule, support de stockage et dispositif électronique
CN110532970B (zh) 人脸2d图像的年龄性别属性分析方法、系统、设备和介质
CN109101602A (zh) 图像检索模型训练方法、图像检索方法、设备及存储介质
WO2020228181A1 (fr) Procédé et appareil de cadrage d'image de paume, dispositif informatique et support de stockage
WO2022247539A1 (fr) Procédé de détection de corps vivant, procédé et appareil de traitement par réseau d'estimation, dispositif informatique et produit programme lisible par ordinateur
CN114550053A (zh) 一种交通事故定责方法、装置、计算机设备及存储介质
CN109711443A (zh) 基于神经网络的户型图识别方法、装置、设备及存储介质
CN113392866A (zh) 一种基于人工智能的图像处理方法、装置及存储介质
WO2022161302A1 (fr) Procédé et appareil de reconnaissance d'actions, dispositif, support de stockage et produit programme d'ordinateur
Werner et al. DeepMoVIPS: Visual indoor positioning using transfer learning
CN110852327A (zh) 图像处理方法、装置、电子设备及存储介质
CN112651381A (zh) 基于卷积神经网络的视频图像中家畜识别方法及装置
CN112232422A (zh) 一种目标行人的重识别方法、装置、电子设备和存储介质
WO2019100348A1 (fr) Procédé et dispositif de récupération d'images, ainsi que procédé et dispositif de génération de bibliothèques d'images
Singh et al. Performance enhancement of salient object detection using superpixel based Gaussian mixture model
CN117854156A (zh) 一种特征提取模型的训练方法和相关装置
  • WO2021017289A1 (fr) Method and apparatus for locating an object in a video, computer device, and storage medium
CN108694411A (zh) 一种识别相似图像的方法
CN114299539B (zh) 一种模型训练方法、行人重识别方法和装置
CN115393755A (zh) 视觉目标跟踪方法、装置、设备以及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19940104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19940104

Country of ref document: EP

Kind code of ref document: A1