WO2019153830A1 - Pedestrian re-identification method and apparatus, electronic device and storage medium - Google Patents
Pedestrian re-identification method and apparatus, electronic device and storage medium
- Publication number: WO2019153830A1
- Application: PCT/CN2018/116600 (CN2018116600W)
- Authority: WO (WIPO (PCT))
- Prior art keywords: candidate, target, feature vector, video, target video
- Prior art date
Classifications
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V10/469—Contour-based spatial representations, e.g. vector-coding
- G06V20/48—Matching video sequences
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- H04N19/172—Adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
- H04N21/2343—Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Definitions
- The embodiments of the present application relate to the field of image processing technologies, and in particular to a pedestrian re-identification method and apparatus, an electronic device, and a storage medium.
- Pedestrian re-identification is a key technology in intelligent video surveillance systems. It aims to find, among candidate videos, the same pedestrian that appears in a given target video by measuring the similarity between the target video and each candidate video.
- Current pedestrian re-identification methods mainly encode a complete video and use the encoding result to measure the similarity between the entire target video and the entire candidate video, which yields poor recognition performance.
- The embodiments of the present application provide a pedestrian re-identification technical solution.
- In a first aspect, a pedestrian re-identification method is provided, including: acquiring a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate video separately; determining a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The encoding of each target video segment and each candidate video segment in the at least one candidate video includes: acquiring a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and acquiring a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment; generating attention weight vectors according to the index feature vector, the first target feature vectors, and the first candidate feature vectors; and obtaining the encoding result of each target video segment and the encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors, and the second candidate feature vectors.
- The acquiring of the first target feature vector and the second target feature vector of each target video frame in each target video segment and the index feature vector of each target video segment, and of the first candidate feature vector and the second candidate feature vector of each candidate video frame in each candidate video segment, includes: extracting an image feature vector of each target video frame and an image feature vector of each candidate video frame respectively; generating the first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment according to the image feature vector of each target video frame, and generating the first candidate feature vector and the second candidate feature vector of each candidate video frame according to the image feature vector of each candidate video frame.
- The generating of the attention weight vectors according to the index feature vector, the first target feature vectors, and the first candidate feature vectors includes: generating a target attention weight vector of each target video frame according to the index feature vector and the first target feature vector, and generating a candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector.
- The generating of the target attention weight vector of each target video frame according to the index feature vector and the first target feature vector includes: generating a target heat map of each target video frame according to the index feature vector and the first target feature vector of each target video frame, and normalizing the target heat map to obtain the target attention weight vector of each target video frame; and/or the generating of the candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector includes: generating a candidate heat map of each candidate video frame according to the index feature vector and the first candidate feature vector of each candidate video frame, and normalizing the candidate heat map to obtain the candidate attention weight vector of each candidate video frame.
- The obtaining of the encoding result of each target video segment and each candidate video segment according to the attention weight vectors, the second target feature vectors, and the second candidate feature vectors includes: obtaining the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame, and obtaining the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame.
- The obtaining of the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame includes: multiplying the target attention weight vector of each target video frame by the second target feature vector of the respective target video frame, and adding the multiplication results of the target video frames over the time dimension to obtain the encoding result of each target video segment; and/or the obtaining of the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame includes: multiplying the candidate attention weight vector of each candidate video frame by the second candidate feature vector of the respective candidate video frame, and adding the multiplication results of the candidate video frames over the time dimension to obtain the encoding result of each candidate video segment.
- The determining of the similarity score between each target video segment and each candidate video segment according to the encoding results includes: performing a subtraction operation between the encoding result of each target video segment and the encoding result of each candidate video segment in turn; squaring the result of the subtraction in each dimension; performing a fully connected operation on the feature vector obtained by the squaring to obtain a two-dimensional feature vector; and normalizing the two-dimensional feature vector to obtain the similarity score between each target video segment and each candidate video segment.
- The performing of pedestrian re-identification on the at least one candidate video according to the similarity scores includes: for each candidate video segment of the at least one candidate video, adding the highest-scoring similarity scores within a preset proportion threshold as the similarity score of each candidate video; sorting the similarity scores of the candidate videos in descending order; and determining the first one or several candidate videos as videos containing the same target pedestrian as the target video.
- In a second aspect, a pedestrian re-identification apparatus is provided, comprising: an acquisition module configured to acquire a target video containing a target pedestrian and at least one candidate video; an encoding module configured to encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately; a determining module configured to determine a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and an identification module configured to perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The encoding module includes: a feature vector acquisition module configured to acquire a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and to acquire a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment; a weight vector generation module configured to generate attention weight vectors according to the index feature vector, the first target feature vectors, and the first candidate feature vectors; and an encoding result acquisition module configured to obtain the encoding result of each target video segment and the encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors, and the second candidate feature vectors.
- The feature vector acquisition module is configured to extract an image feature vector of each target video frame and an image feature vector of each candidate video frame respectively; to generate the first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment according to the image feature vector of each target video frame; and to generate the first candidate feature vector and the second candidate feature vector of each candidate video frame according to the image feature vector of each candidate video frame.
- The weight vector generation module is configured to generate a target attention weight vector of each target video frame according to the index feature vector and the first target feature vector, and to generate a candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector.
- The weight vector generation module is configured to generate a target heat map of each target video frame according to the index feature vector and the first target feature vector of each target video frame, and to normalize the target heat map to obtain the target attention weight vector of each target video frame; and/or to generate a candidate heat map of each candidate video frame according to the index feature vector and the first candidate feature vector of each candidate video frame, and to normalize the candidate heat map to obtain the candidate attention weight vector of each candidate video frame.
- The encoding result acquisition module is configured to obtain the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame, and to obtain the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame.
- The encoding result acquisition module is configured to multiply the target attention weight vector of each target video frame by the second target feature vector of the respective target video frame, and to add the multiplication results of the target video frames over the time dimension to obtain the encoding result of each target video segment; and/or to multiply the candidate attention weight vector of each candidate video frame by the second candidate feature vector of the respective candidate video frame, and to add the multiplication results of the candidate video frames over the time dimension to obtain the encoding result of each candidate video segment.
- The determining module is configured to perform a subtraction operation between the encoding result of each target video segment and the encoding result of each candidate video segment in turn; to square the result of the subtraction in each dimension; to perform a fully connected operation on the feature vector obtained by the squaring to obtain a two-dimensional feature vector; and to normalize the two-dimensional feature vector to obtain the similarity score between each target video segment and each candidate video segment.
- The identification module is configured to, for each candidate video segment of the at least one candidate video, add the highest-scoring similarity scores within a preset proportion threshold as the similarity score of each candidate video; to sort the similarity scores of the candidate videos in descending order; and to determine the first one or several candidate videos as videos containing the same target pedestrian as the target video.
- In a third aspect, an electronic device is provided, including a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the pedestrian re-identification method described in the first aspect.
- In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program that, when executed by a processor, implements the pedestrian re-identification method described in the first aspect.
- In a fifth aspect, a computer program product is provided, comprising at least one executable instruction that, when executed by a processor, implements the pedestrian re-identification method described in the first aspect.
- The embodiments of the present application acquire a target video containing a target pedestrian and at least one candidate video, encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately, determine a similarity score between each target video segment and each candidate video segment according to the encoding results, and perform pedestrian re-identification on the at least one candidate video according to the similarity scores. Because a video segment contains far fewer frames than the entire video, pedestrian appearance information changes far less within a segment than across the entire video. Encoding each target video segment and each candidate video segment therefore effectively reduces the variation of pedestrian appearance information; exploiting the diversity of appearance information across different video frames and the dynamic correlation between frames improves the utilization of pedestrian appearance information, improves the accuracy of the similarity scores computed between each target video segment and each candidate video segment, and thereby improves the accuracy of pedestrian re-identification.
- FIG. 1 is a schematic flow chart of an embodiment of a pedestrian re-identification method according to an embodiment of the present application.
- FIG. 2 is a schematic diagram of the computation framework of an embodiment of a pedestrian re-identification method according to an embodiment of the present application.
- FIG. 3 is a schematic flow chart of another embodiment of a pedestrian re-identification method according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of the attention encoding mechanism in a pedestrian re-identification method according to an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an embodiment of a pedestrian re-identification apparatus according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of another embodiment of a pedestrian re-identification apparatus according to an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
- Referring to FIG. 1, a flow diagram of one embodiment of a pedestrian re-identification method according to an embodiment of the present application is shown.
- The pedestrian re-identification method of the embodiment of the present application performs the following steps through the processor of an electronic device calling relevant instructions stored in a memory.
- Step S100: Acquire a target video containing a target pedestrian and at least one candidate video.
- The target video in the embodiment of the present application may contain one or more target pedestrians, and a candidate video may contain one or more candidate pedestrians or none.
- The target video and the at least one candidate video in the embodiment of the present application may be video captured by a video capture device, or may originate from other devices. One of the purposes of the embodiments of the present application is to find, among the at least one candidate video, the candidate videos in which a candidate pedestrian and the target pedestrian are the same person.
- the step S100 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 50 executed by the processor.
- Step S102: Encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately.
- Video segmentation is performed on the target video and the candidate videos to generate the target video segments of the target video and the candidate video segments of each candidate video. Each target video segment has a fixed duration and each candidate video segment has a fixed duration; the duration of the target video segments may or may not be the same as the duration of the candidate video segments.
- Each target video segment and each candidate video segment is then encoded separately to obtain the encoding result of each target video segment and the encoding result of each candidate video segment.
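As an illustration only, the slicing step described above can be sketched as follows; the segment length `seg_len` is a hypothetical parameter, and dropping a trailing partial segment is an assumption of this sketch, not something the embodiment prescribes.

```python
import numpy as np

def slice_video(frames: np.ndarray, seg_len: int) -> list:
    """Cut a video of shape (T, H, W, C) into fixed-length segments.

    A trailing partial segment is dropped; this is an illustrative choice
    (hypothetical), not a requirement of the described method.
    """
    num_segments = len(frames) // seg_len
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]

# Example: a 100-frame video cut into 12-frame segments yields 8 segments.
video = np.zeros((100, 128, 64, 3))
segments = slice_video(video, seg_len=12)
```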
- step S102 may be performed by a processor invoking a corresponding instruction stored in the memory or by an encoding module 52 executed by the processor.
- Step S104: Determine a similarity score between each target video segment and each candidate video segment according to the encoding results.
- The encoding result of each target video segment can be regarded as a representation of the pedestrian feature vector in that target video segment, and the encoding result of each candidate video segment can be regarded as a representation of the pedestrian feature vector in that candidate video segment.
- If the pedestrian feature vectors of a target video segment and a candidate video segment are the same or similar, the two segments are more likely to contain the same target pedestrian, that is, the similarity score between the target video segment and the candidate video segment is higher; if the pedestrian feature vectors of a target video segment and a candidate video segment differ, the two segments are less likely to contain the same target pedestrian, that is, the similarity score between them is lower.
- step S104 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the determining module 54 executed by the processor.
- Step S106: Perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The similarity score of each candidate video may be obtained from the similarity scores of its candidate video segments, and a candidate video with a higher similarity score is determined to contain the same target pedestrian as the target video.
- step S106 may be performed by the processor invoking a corresponding instruction stored in the memory or by the identification module 56 being executed by the processor.
- The pedestrian re-identification method proposed in the embodiment of the present application can be executed under the computation framework shown in FIG. 2. First, the videos (the target video and the at least one candidate video) are cut into video segments of fixed length, where p denotes the target video, g denotes one of the at least one candidate video, p_n is a target video segment of the target video p, and g_k is a candidate video segment of the candidate video g. Then, a deep network with a cooperative attention mechanism is used: the network takes the target video segment p_n and the candidate video segment g_k as input, and its output m(p_n, g_k) is the similarity score between the target video segment p_n and the candidate video segment g_k. After the similarity scores between the video segments are obtained, a competitive mechanism can be used to select the partial similarity scores with higher similarity, and the similarity score between the target video p and the candidate video g is obtained by adding these selected scores.
- The embodiments of the present application acquire a target video containing a target pedestrian and at least one candidate video, encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately, determine a similarity score between each target video segment and each candidate video segment according to the encoding results, and perform pedestrian re-identification on the at least one candidate video according to the similarity scores. Because a video segment contains far fewer frames than the entire video, pedestrian appearance information changes far less within a segment than across the entire video. Encoding each target video segment and each candidate video segment therefore effectively reduces the variation of pedestrian appearance information; exploiting the diversity of appearance information across different video frames and the dynamic correlation between frames improves the utilization of pedestrian appearance information, improves the accuracy of the similarity scores computed between each target video segment and each candidate video segment, and thereby improves the accuracy of pedestrian re-identification.
- Referring to FIG. 3, a flow diagram of another embodiment of a pedestrian re-identification method according to an embodiment of the present application is shown.
- Step S300: Acquire a target video containing a target pedestrian and at least one candidate video.
- Step S302: Encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately.
- This step S302 may include the following steps:
- Step S3020: Acquire a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and acquire a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment.
- The image feature vector of each target video frame and of each candidate video frame may be extracted with a neural network; the image feature vector reflects the image features in the video frame, such as pedestrian features and background features.
- The first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment are generated according to the image feature vector of each target video frame; the index feature vector contains information of the target video segment and can effectively distinguish useful information from noise information. Similarly, the first candidate feature vector and the second candidate feature vector of each candidate video frame are generated according to the image feature vector of each candidate video frame.
- Optionally, the first target feature vector (the "key" feature vector) and the first candidate feature vector (the "key" feature vector) may be generated by one linear transformation of each frame feature, and the second target feature vector (the "value" feature vector) and the second candidate feature vector (the "value" feature vector) may be generated by another linear transformation of each frame feature. A Long Short-Term Memory (LSTM) network may be used with the image feature vectors of the target video frames in each target video segment to generate the index feature vector of that target video segment; the index feature vector is generated from the target video segment and acts on the target video segment itself and on all candidate video segments.
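To make the roles of these vectors concrete, the following is a minimal NumPy sketch of this step, under stated assumptions: the per-frame image features are assumed to be precomputed (e.g., by a CNN), `W_key` and `W_val` stand for the two linear transformations, and the LSTM that produces the index feature vector is replaced by a mean-then-project stand-in (`W_idx`) purely for brevity; all three matrices and both dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, D_ATT = 2048, 256        # CNN feature size and attention size (illustrative)

W_key = rng.normal(scale=0.01, size=(D, D_ATT))  # linear map -> "key" vectors
W_val = rng.normal(scale=0.01, size=(D, D_ATT))  # linear map -> "value" vectors
W_idx = rng.normal(scale=0.01, size=(D, D_ATT))  # stand-in for the LSTM summarizer

def frame_features_to_vectors(feats: np.ndarray):
    """feats: (T, D) per-frame image features of one video segment."""
    keys = feats @ W_key    # (T, D_ATT) first ("key") feature vectors
    vals = feats @ W_val    # (T, D_ATT) second ("value") feature vectors
    return keys, vals

def index_vector(target_feats: np.ndarray) -> np.ndarray:
    """Index feature vector of a target segment. The described method uses an
    LSTM over the frame features; a mean followed by a projection is used
    here only as a simplifying stand-in."""
    return target_feats.mean(axis=0) @ W_idx     # (D_ATT,)
```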
- Step S3022: Generate attention weight vectors according to the index feature vector, the first target feature vectors, and the first candidate feature vectors.
- The first target feature vectors and the first candidate feature vectors are used to generate the attention weight vectors.
- Optionally, the target attention weight vector of each target video frame may be generated according to the index feature vector and the first target feature vector. Specifically, an inner product operation is performed between the index feature vector and the first target feature vector of each target video frame to obtain a target heat map of each target video frame, and the target heat map is normalized with the softmax function over the time dimension to obtain the target attention weight vector of each target video frame.
- Optionally, the candidate attention weight vector of each candidate video frame may be generated according to the index feature vector and the first candidate feature vector. Specifically, an inner product operation is performed between the index feature vector and the first candidate feature vector of each candidate video frame to obtain a candidate heat map of each candidate video frame, and the candidate heat map is normalized with the softmax function over the time dimension to obtain the candidate attention weight vector of each candidate video frame.
- The attention weight vector is used to enhance effective pedestrian features during encoding; it is a weight vector carrying discriminative information and can reduce the influence of noise information.
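Continuing the sketch above, the inner product and the softmax over the time dimension can be written as follows; representing the heat map as one scalar per frame is a simplifying assumption of this sketch.

```python
def attention_weights(index_vec: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Inner product of the index feature vector with each frame's "key"
    vector gives a per-frame score; softmax over the time dimension turns
    the scores into attention weights that sum to 1."""
    scores = keys @ index_vec          # (T,)
    scores = scores - scores.max()     # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()     # (T,)
```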
- Step S3024: Obtain the encoding result of each target video segment and the encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors, and the second candidate feature vectors.
- The second target feature vector reflects the image features of each frame in the target video segment, and the second candidate feature vector reflects the image features of each frame in the candidate video segment.
- Optionally, the encoding result of each target video segment is obtained according to the target attention weight vector and the second target feature vector of each target video frame. Specifically, the target attention weight vector of each target video frame is multiplied by the second target feature vector of the respective target video frame, and the multiplication results of the target video frames are added over the time dimension to obtain the encoding result of each target video segment.
- Optionally, the encoding result of each candidate video segment is obtained according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame. Specifically, the candidate attention weight vector of each candidate video frame is multiplied by the second candidate feature vector of the respective candidate video frame, and the multiplication results of the candidate video frames are added over the time dimension to obtain the encoding result of each candidate video segment.
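Reusing the definitions from the sketches above, the weighted sum over the time dimension and a full target/candidate segment pair might look as follows; the 12-frame segments and random features are illustrative only.

```python
def encode_segment(weights: np.ndarray, vals: np.ndarray) -> np.ndarray:
    """Multiply each frame's "value" vector by its attention weight and add
    the results over the time dimension to get the segment encoding."""
    return (weights[:, None] * vals).sum(axis=0)   # (D_ATT,)

t_feats = rng.normal(size=(12, D))   # features of 12 target frames (illustrative)
c_feats = rng.normal(size=(12, D))   # features of 12 candidate frames
idx = index_vector(t_feats)          # the index vector guides both sides
t_keys, t_vals = frame_features_to_vectors(t_feats)
c_keys, c_vals = frame_features_to_vectors(c_feats)
enc_t = encode_segment(attention_weights(idx, t_keys), t_vals)
enc_c = encode_segment(attention_weights(idx, c_keys), c_vals)
```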
- Step S302 of the embodiment of the present application can be implemented by an attention encoding mechanism, that is, the encoding result of a video segment (a target video segment or a candidate video segment) is obtained by refining the features of the different frames in the segment; the process is shown in FIG. 4. A convolutional neural network feature is extracted for each target video frame of the target video segment and each candidate video frame of the candidate video segment, and the "key" feature vector and "value" feature vector corresponding to each target video frame or each candidate video frame are generated from the convolutional neural network features. The "key" feature vector of each target video frame or candidate video frame is combined by inner product with the index feature vector of the target video segment to form a heat map, which reflects the correlation between each feature within the frame and the global information. The heat map is normalized by the softmax function over the time dimension to form the attention weight vector, which is multiplied by the "value" feature vector of each video frame in each dimension; the results of the different video frames are summed over the time dimension to obtain the encoding result of each video segment.
- Step S304: Determine a similarity score between each target video segment and each candidate video segment according to the encoding results.
- Optionally, the encoding result of each target video segment and the encoding result of each candidate video segment are sequentially subjected to a subtraction operation, a squaring operation, a fully connected operation, and a normalization operation to obtain the similarity score between each target video segment and each candidate video segment. Specifically, the encoding result of each target video segment is subtracted from the encoding result of each candidate video segment in turn, and the result is squared in each image dimension, including but not limited to the pedestrian image dimensions and the background image dimensions, where the pedestrian image dimensions include a head image dimension, an upper-body image dimension, a lower-body image dimension, and the like, and the background image dimensions include a building image dimension, a street image dimension, and the like. The feature vector obtained after the squaring operation is passed through a fully connected layer to obtain a two-dimensional feature vector, and the similarity score between each target video segment and each candidate video segment is finally obtained by nonlinear normalization with the Sigmoid function.
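A hedged sketch of this scoring head follows, reusing `rng`, `D_ATT`, `enc_t`, and `enc_c` from above. `W_fc` and `b_fc` are hypothetical fully connected parameters, and reading the second component of the normalized two-dimensional output as the score is one plausible interpretation of the final normalization, not necessarily the exact Sigmoid formulation used in the embodiment.

```python
W_fc = rng.normal(scale=0.01, size=(D_ATT, 2))   # hypothetical FC parameters
b_fc = np.zeros(2)

def similarity_score(enc_t: np.ndarray, enc_c: np.ndarray) -> float:
    """Subtract, square per dimension, fully connect to 2-D, then normalize."""
    diff2 = (enc_t - enc_c) ** 2                   # subtraction then squaring
    logits = diff2 @ W_fc + b_fc                   # two-dimensional feature vector
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()  # normalization step
    return float(probs[1])                         # "same pedestrian" component

score = similarity_score(enc_t, enc_c)             # m(p_n, g_k) in FIG. 2's notation
```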
- Step S306: Perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- Optionally, for each candidate video, the similarity scores that are greater than or equal to a preset threshold, or the similarity scores with higher values, are added as the similarity score of that candidate video; the similarity scores of the candidate videos are sorted in descending order; and the first one or several candidate videos are determined as videos containing the same target pedestrian as the target video.
- The preset threshold can be set according to actual conditions, and "higher" is relative.
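The competitive aggregation and the ranking can be sketched as follows (NumPy as above); the preset proportion threshold `ratio = 0.2` is an illustrative value, not one given by the embodiment.

```python
def video_score(segment_scores: np.ndarray, ratio: float = 0.2) -> float:
    """Add the highest-scoring fraction `ratio` of the segment-pair scores to
    get the candidate video's similarity score."""
    k = max(1, int(len(segment_scores) * ratio))
    return float(np.sort(segment_scores)[-k:].sum())

def rank_candidates(all_scores: dict) -> list:
    """all_scores maps a candidate-video id to the array of similarity scores
    of its segment pairs; returns the ids in descending order of video score."""
    return sorted(all_scores, key=lambda vid: video_score(all_scores[vid]),
                  reverse=True)
```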
- The embodiments of the present application acquire a target video containing a target pedestrian and at least one candidate video, encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately, determine a similarity score between each target video segment and each candidate video segment according to the encoding results, and perform pedestrian re-identification on the at least one candidate video according to the similarity scores. Because a video segment contains far fewer frames than the entire video, pedestrian appearance information changes far less within a segment than across the entire video. Encoding each target video segment and each candidate video segment therefore effectively reduces the variation of pedestrian appearance information; exploiting the diversity of appearance information across different video frames and the dynamic correlation between frames improves the utilization of pedestrian appearance information, improves the accuracy of the similarity scores computed between each target video segment and each candidate video segment, and thereby improves the accuracy of pedestrian re-identification.
- In the embodiment of the present application, the encoding result of a candidate video segment is obtained from the index feature vector of the target video segment and the "key" feature vectors of the candidate video segment; the index feature vector of the target video segment serves as guiding information, which improves the accuracy of the encoding result of the candidate video used to determine the similarity score. Estimating the attention weight vector of each candidate video frame with the index feature vector of the target video segment reduces the influence of abnormal candidate video frames on the encoding result of the candidate video segments, and improves the pertinence of pedestrian re-identification in the candidate videos. Moreover, the target video and the candidate videos are sliced, the target video segments and the candidate video segments are encoded, the candidate video segments with higher similarity scores are selected as the valid candidate video segments of a candidate video, and the candidate video segments with lower similarity scores are ignored.
- Referring to FIG. 5, a block diagram of one embodiment of a pedestrian re-identification apparatus according to an embodiment of the present application is shown.
- The pedestrian re-identification apparatus includes: an acquisition module 50 configured to acquire a target video containing a target pedestrian and at least one candidate video; an encoding module 52 configured to encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately; a determining module 54 configured to determine a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and an identification module 56 configured to perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- the pedestrian re-identification device of the embodiment of the present invention is used to implement the corresponding pedestrian re-identification method in the above embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
- Referring to FIG. 6, a block diagram of another embodiment of a pedestrian re-identification apparatus according to an embodiment of the present application is shown.
- The pedestrian re-identification apparatus includes: an acquisition module 60 configured to acquire a target video containing a target pedestrian and at least one candidate video; an encoding module 62 configured to encode each target video segment in the target video and each candidate video segment in the at least one candidate video separately; a determining module 64 configured to determine a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and an identification module 66 configured to perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The encoding module 62 includes: a feature vector acquisition module 620 configured to acquire a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and to acquire a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment; a weight vector generation module 622 configured to generate attention weight vectors according to the index feature vector, the first target feature vectors, and the first candidate feature vectors; and an encoding result acquisition module 624 configured to obtain the encoding result of each target video segment and the encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors, and the second candidate feature vectors.
- The feature vector acquisition module 620 is configured to extract an image feature vector of each target video frame and an image feature vector of each candidate video frame respectively; to generate the first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment according to the image feature vector of each target video frame; and to generate the first candidate feature vector and the second candidate feature vector of each candidate video frame according to the image feature vector of each candidate video frame.
- The weight vector generation module 622 is configured to generate a target attention weight vector of each target video frame according to the index feature vector and the first target feature vector, and to generate a candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector.
- The weight vector generation module 622 is configured to generate a target heat map of each target video frame according to the index feature vector and the first target feature vector of each target video frame, and to normalize the target heat map to obtain the target attention weight vector of each target video frame; and/or to generate a candidate heat map of each candidate video frame according to the index feature vector and the first candidate feature vector of each candidate video frame, and to normalize the candidate heat map to obtain the candidate attention weight vector of each candidate video frame.
- The encoding result acquisition module 624 is configured to obtain the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame, and to obtain the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame.
- The encoding result acquisition module 624 is configured to multiply the target attention weight vector of each target video frame by the second target feature vector of the respective target video frame, and to add the multiplication results of the target video frames over the time dimension to obtain the encoding result of each target video segment; and/or to multiply the candidate attention weight vector of each candidate video frame by the second candidate feature vector of the respective candidate video frame, and to add the multiplication results of the candidate video frames over the time dimension to obtain the encoding result of each candidate video segment.
- The determining module 64 is configured to perform a subtraction operation between the encoding result of each target video segment and the encoding result of each candidate video segment in turn; to square the result of the subtraction in each dimension; to perform a fully connected operation on the feature vector obtained by the squaring to obtain a two-dimensional feature vector; and to normalize the two-dimensional feature vector to obtain the similarity score between each target video segment and each candidate video segment.
- The identification module 66 is configured to, for each candidate video segment of the at least one candidate video, add the highest-scoring similarity scores within a preset proportion threshold as the similarity score of each candidate video; to sort the similarity scores of the candidate videos in descending order; and to determine the first one or several candidate videos as videos containing the same target pedestrian as the target video.
- the pedestrian re-identification device of the embodiment of the present invention is used to implement the corresponding pedestrian re-identification method in the above embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
- The embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
- Referring to FIG. 7, a schematic structural diagram of an electronic device 700 suitable for implementing the pedestrian re-identification apparatus of the embodiment of the present application is shown.
- The electronic device 700 may include a memory and a processor. For example, the electronic device 700 includes one or more processors and communication elements, such as one or more central processing units (CPUs) 701 and/or one or more graphics processing units (GPUs) 713; the processor may perform various appropriate actions according to executable instructions stored in a read-only memory (ROM) 702, or executable instructions loaded into a random access memory (RAM) 703 from a storage portion 708.
- the communication component includes a communication component 712 and/or a communication interface 709.
- the communication component 712 can include, but is not limited to, a network card, and the network card can include, but is not limited to, an IB (Infiniband) network card, and the communication interface 709 includes a communication interface of a network interface card such as a local area network (LAN) card, a modem, or the like.
- the communication interface 709 performs communication processing via a network such as the Internet.
- The processor can communicate with the read-only memory 702 and/or the random access memory 703 to execute executable instructions, connect to the communication component 712 via the communication bus 704, and communicate with other target devices via the communication component 712, thereby completing the operations corresponding to any pedestrian re-identification method provided by the embodiments of the present application, for example: acquiring a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate video separately; determining a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the similarity scores.
- In addition, the RAM 703 can store various programs and data required for the operation of the device.
- the CPU 701 or the GPU 713, the ROM 702, and the RAM 703 are connected to each other through a communication bus 704.
- The ROM 702 is an optional module. The RAM 703 stores executable instructions, or executable instructions are written into the ROM 702 at runtime, and the executable instructions cause the processor to perform the operations corresponding to the above-described pedestrian re-identification method.
- An input/output (I/O) interface 705 is also coupled to communication bus 704.
- The communication component 712 can be integrated, or can be configured to have multiple sub-modules (e.g., multiple IB network cards) and be linked to the communication bus.
- the following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, etc.; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage portion 708 including a hard disk or the like And a communication interface 709 including a network interface card such as a LAN card, modem, or the like.
- Driver 710 is also connected to I/O interface 705 as needed.
- a removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 710 as needed so that a computer program read therefrom is installed into the storage portion 708 as needed.
- The architecture shown in FIG. 7 is only an optional implementation. In practice, the number and types of the components in FIG. 7 may be selected, reduced, increased, or replaced according to actual needs; different functional components may be deployed separately or integrated, for example, the GPU and the CPU may be deployed separately, or the GPU may be integrated on the CPU, and the communication component may be deployed separately or integrated on the CPU or the GPU, and so on.
- The electronic device in the embodiment of the present application may be used to implement the corresponding pedestrian re-identification method in the foregoing embodiments, and each device in the electronic device may be used to perform the steps of the foregoing method embodiments; for example, the pedestrian re-identification method described above can be implemented by the processor of the electronic device calling the relevant instructions stored in the memory. For brevity, details are not described herein again.
- Embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: acquiring a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate video separately; determining a similarity score between each target video segment and each candidate video segment according to the encoding results, where the similarity score is used to characterize the degree of similarity between pedestrian features in the target video segment and the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the similarity scores. In such embodiments, the computer program can be downloaded and installed from a network through the communication component, and/or installed from the removable medium 711.
- the methods and apparatus, electronic devices, and storage media of the embodiments of the present application may be implemented in many ways.
- the methods and apparatus, electronic devices, and storage media of the embodiments of the present application can be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the embodiments of the present application are not limited to the order specifically described above unless otherwise specifically stated.
- The present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing a method according to the embodiments of the present application.
- the present application also covers a recording medium storing a program for executing the method according to an embodiment of the present application.
Abstract
Description
Claims (21)
- A pedestrian re-identification method, comprising: acquiring a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate video respectively; determining, according to the encoding results, a similarity score between each target video segment and each candidate video segment, the similarity score being used to characterize the degree of similarity between the pedestrian features in the target video segment and those in the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The method according to claim 1, wherein encoding each target video segment in the target video and each candidate video segment in the at least one candidate video respectively comprises: acquiring a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and acquiring a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment; generating attention weight vectors according to the index feature vector, the first target feature vectors and the first candidate feature vectors; and obtaining an encoding result of each target video segment and an encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors and the second candidate feature vectors.
- The method according to claim 2, wherein acquiring the first target feature vector and the second target feature vector of each target video frame in each target video segment and the index feature vector of each target video segment, and acquiring the first candidate feature vector and the second candidate feature vector of each candidate video frame in each candidate video segment, comprises: extracting an image feature vector of each target video frame and an image feature vector of each candidate video frame respectively; generating the first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment according to the image feature vector of each target video frame, and generating the first candidate feature vector and the second candidate feature vector of each candidate video frame according to the image feature vector of each candidate video frame.
- The method according to claim 2 or 3, wherein generating the attention weight vectors according to the index feature vector, the first target feature vectors and the first candidate feature vectors comprises: generating a target attention weight vector of each target video frame according to the index feature vector and the first target feature vector, and generating a candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector.
- The method according to claim 4, wherein generating the target attention weight vector of each target video frame according to the index feature vector and the first target feature vector comprises: generating a target heat map of each target video frame according to the index feature vector and the first target feature vector of each target video frame, and normalizing the target heat map to obtain the target attention weight vector of each target video frame; and/or generating the candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector comprises: generating a candidate heat map of each candidate video frame according to the index feature vector and the first candidate feature vector of each candidate video frame, and normalizing the candidate heat map to obtain the candidate attention weight vector of each candidate video frame.
- The method according to any one of claims 2-5, wherein obtaining the encoding result of each target video segment and the encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors and the second candidate feature vectors comprises: obtaining the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame, and obtaining the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame.
- The method according to claim 6, wherein obtaining the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame comprises: multiplying the target attention weight vector of each target video frame by the second target feature vector of the respective target video frame, and adding the multiplication results of the target video frames over the temporal dimension to obtain the encoding result of each target video segment; and/or obtaining the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame comprises: multiplying the candidate attention weight vector of each candidate video frame by the second candidate feature vector of the respective candidate video frame, and adding the multiplication results of the candidate video frames over the temporal dimension to obtain the encoding result of each candidate video segment.
- The method according to any one of claims 1-7, wherein determining, according to the encoding results, the similarity score between each target video segment and each candidate video segment comprises: performing a subtraction operation between the encoding result of each target video segment and the encoding result of each candidate video segment in turn; squaring the result of the subtraction operation in every dimension; performing a fully connected operation on the feature vector obtained by the squaring operation to obtain a two-dimensional feature vector; and performing a normalization operation on the two-dimensional feature vector to obtain the similarity score between each target video segment and each candidate video segment.
- The method according to any one of claims 1-8, wherein performing pedestrian re-identification on the at least one candidate video according to the similarity scores comprises: for each candidate video segment in the at least one candidate video, adding up the highest similarity scores within a preset proportion threshold to serve as the similarity score of each candidate video; arranging the similarity scores of the candidate videos in descending order; and determining the one or more top-ranked candidate videos as videos containing the same target pedestrian as the target video.
- A pedestrian re-identification apparatus, comprising: an acquisition module configured to acquire a target video containing a target pedestrian and at least one candidate video; an encoding module configured to encode each target video segment in the target video and each candidate video segment in the at least one candidate video respectively; a determination module configured to determine, according to the encoding results, a similarity score between each target video segment and each candidate video segment, the similarity score being used to characterize the degree of similarity between the pedestrian features in the target video segment and those in the candidate video segment; and an identification module configured to perform pedestrian re-identification on the at least one candidate video according to the similarity scores.
- The apparatus according to claim 10, wherein the encoding module comprises: a feature vector acquisition module configured to acquire a first target feature vector and a second target feature vector of each target video frame in each target video segment and an index feature vector of each target video segment, and to acquire a first candidate feature vector and a second candidate feature vector of each candidate video frame in each candidate video segment; a weight vector generation module configured to generate attention weight vectors according to the index feature vector, the first target feature vectors and the first candidate feature vectors; and an encoding result acquisition module configured to obtain an encoding result of each target video segment and an encoding result of each candidate video segment according to the attention weight vectors, the second target feature vectors and the second candidate feature vectors.
- The apparatus according to claim 11, wherein the feature vector acquisition module is configured to extract an image feature vector of each target video frame and an image feature vector of each candidate video frame respectively; to generate the first target feature vector and the second target feature vector of each target video frame and the index feature vector of each target video segment according to the image feature vector of each target video frame; and to generate the first candidate feature vector and the second candidate feature vector of each candidate video frame according to the image feature vector of each candidate video frame.
- The apparatus according to claim 11 or 12, wherein the weight vector generation module is configured to generate a target attention weight vector of each target video frame according to the index feature vector and the first target feature vector, and to generate a candidate attention weight vector of each candidate video frame according to the index feature vector and the first candidate feature vector.
- The apparatus according to claim 13, wherein the weight vector generation module is configured to generate a target heat map of each target video frame according to the index feature vector and the first target feature vector of each target video frame, and to normalize the target heat map to obtain the target attention weight vector of each target video frame; and/or to generate a candidate heat map of each candidate video frame according to the index feature vector and the first candidate feature vector of each candidate video frame, and to normalize the candidate heat map to obtain the candidate attention weight vector of each candidate video frame.
- The apparatus according to any one of claims 11-14, wherein the encoding result acquisition module is configured to obtain the encoding result of each target video segment according to the target attention weight vector and the second target feature vector of each target video frame, and to obtain the encoding result of each candidate video segment according to the candidate attention weight vector and the second candidate feature vector of each candidate video frame.
- The apparatus according to claim 15, wherein the encoding result acquisition module is configured to multiply the target attention weight vector of each target video frame by the second target feature vector of the respective target video frame, and to add the multiplication results of the target video frames over the temporal dimension to obtain the encoding result of each target video segment; and/or to multiply the candidate attention weight vector of each candidate video frame by the second candidate feature vector of the respective candidate video frame, and to add the multiplication results of the candidate video frames over the temporal dimension to obtain the encoding result of each candidate video segment.
- The apparatus according to any one of claims 10-16, wherein the determination module is configured to perform a subtraction operation between the encoding result of each target video segment and the encoding result of each candidate video segment in turn; to square the result of the subtraction operation in every dimension; to perform a fully connected operation on the feature vector obtained by the squaring operation to obtain a two-dimensional feature vector; and to perform a normalization operation on the two-dimensional feature vector to obtain the similarity score between each target video segment and each candidate video segment.
- The apparatus according to any one of claims 10-17, wherein the identification module is configured to, for each candidate video segment in the at least one candidate video, add up the highest similarity scores within a preset proportion threshold to serve as the similarity score of each candidate video; to arrange the similarity scores of the candidate videos in descending order; and to determine the one or more top-ranked candidate videos as videos containing the same target pedestrian as the target video.
- An electronic device, comprising: a processor and a memory; the memory being configured to store at least one executable instruction that causes the processor to execute the pedestrian re-identification method according to any one of claims 1-9.
- A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pedestrian re-identification method according to any one of claims 1-9.
- A computer program product comprising at least one executable instruction which, when executed by a processor, implements the pedestrian re-identification method according to any one of claims 1-9.

(Illustrative, non-authoritative code sketches of the segment encoding of claims 2-7 and the similarity computation of claim 8 follow; the aggregation and ranking of claim 9 is sketched after the description above.)
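As an illustration of the segment encoding defined in claims 2-7, the sketch below generates a per-frame heat map from the index feature vector and the first feature vectors, normalizes the heat maps into per-frame attention weight vectors, multiplies each frame's weight vector by its second feature vector, and adds the products over the temporal dimension. The element-wise product for the heat map, the softmax normalization, and all shapes and names are assumptions for illustration; the claims only require that a heat map be generated and normalized.

```python
import numpy as np

def encode_segment(index_vec, first_feats, second_feats):
    """Attention-weighted temporal encoding of one video segment (sketch).

    index_vec:    (d,)   index feature vector of the segment
    first_feats:  (T, d) first feature vector of each of the T frames
    second_feats: (T, d) second feature vector of each frame
    """
    # per-frame heat map from the index vector and the first feature vectors
    # (an element-wise product is assumed here)
    heat = first_feats * index_vec                        # (T, d)
    # normalize across the temporal dimension to obtain per-frame attention
    # weight vectors (softmax is an assumption)
    heat = heat - heat.max(axis=0, keepdims=True)         # numerical stability
    weights = np.exp(heat) / np.exp(heat).sum(axis=0, keepdims=True)
    # multiply each frame's attention weight vector by its second feature
    # vector and add the products over the temporal dimension
    return (weights * second_feats).sum(axis=0)           # (d,)
```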
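Similarly, a minimal sketch of the similarity computation of claim 8: subtract the two encoding results, square the difference in every dimension, apply a fully connected operation to obtain a two-dimensional feature vector, and normalize it. The learned parameters `W` and `b`, the softmax normalization, and the choice of the second component as the "same pedestrian" score are all assumptions.

```python
import numpy as np

def similarity_score(target_enc, candidate_enc, W, b):
    """Similarity score between one target and one candidate segment (sketch).

    target_enc, candidate_enc: (d,) segment encoding results
    W: (2, d) weights of the fully connected operation (assumed learned)
    b: (2,)   bias of the fully connected operation
    """
    diff = target_enc - candidate_enc       # subtraction of encoding results
    squared = diff ** 2                     # square in every dimension
    logits = W @ squared + b                # fully connected -> 2-D feature vector
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # normalization
    return float(probs[1])                  # assumed "same pedestrian" component
```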
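A hypothetical usage example tying the sketches together, assuming `re_identify`, `encode_segment`, and `similarity_score` above are defined in the same module; random vectors stand in for the frame features that a convolutional network would produce.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 128, 8  # assumed feature dimension and frames per segment

def random_segment():
    # (index_vec, first_feats, second_feats), matching encode_segment above
    return (rng.normal(size=d), rng.normal(size=(T, d)), rng.normal(size=(T, d)))

W, b = rng.normal(size=(2, d)), np.zeros(2)       # stand-in learned parameters
encode = lambda seg: encode_segment(*seg)
score = lambda t, c: similarity_score(t, c, W, b)

target = [random_segment() for _ in range(3)]
candidates = {f"video_{i}": [random_segment() for _ in range(4)] for i in range(5)}
print(re_identify(target, candidates, encode, score))  # ranked candidate video ids
```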
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020197038764A KR102348002B1 (ko) | 2018-02-12 | 2018-11-21 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
SG11201913733QA SG11201913733QA (en) | 2018-02-12 | 2018-11-21 | Pedestrian re-identification method and apparatus, electronic device, and storage medium |
JP2019570048A JP6905601B2 (ja) | 2018-02-12 | 2018-11-21 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
US16/726,878 US11301687B2 (en) | 2018-02-12 | 2019-12-25 | Pedestrian re-identification methods and apparatuses, electronic devices, and storage media |
PH12020500050A PH12020500050A1 (en) | 2018-02-12 | 2020-01-06 | Pedestrian re-identification method and apparatus, electronic device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810145717.3 | 2018-02-12 | ||
CN201810145717.3A CN108399381B (zh) | 2018-02-12 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/726,878 Continuation US11301687B2 (en) | 2018-02-12 | 2019-12-25 | Pedestrian re-identification methods and apparatuses, electronic devices, and storage media |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019153830A1 true WO2019153830A1 (zh) | 2019-08-15 |
Family
ID=63096438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/116600 WO2019153830A1 (zh) | 2018-11-21 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
Country Status (7)
Country | Link |
---|---|
US (1) | US11301687B2 (zh) |
JP (1) | JP6905601B2 (zh) |
KR (1) | KR102348002B1 (zh) |
CN (1) | CN108399381B (zh) |
PH (1) | PH12020500050A1 (zh) |
SG (1) | SG11201913733QA (zh) |
WO (1) | WO2019153830A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827312A (zh) * | 2019-11-12 | 2020-02-21 | 北京深境智能科技有限公司 | A learning method based on a collaborative visual attention neural network |
CN111538861A (zh) * | 2020-04-22 | 2020-08-14 | 浙江大华技术股份有限公司 | Method, apparatus, device and medium for image retrieval based on surveillance video |
CN111723645A (zh) * | 2020-04-24 | 2020-09-29 | 浙江大学 | Multi-camera high-precision pedestrian re-identification method for supervised scenes within the same camera |
CN115150663A (zh) * | 2022-07-01 | 2022-10-04 | 北京奇艺世纪科技有限公司 | Heat curve generation method, apparatus, electronic device and storage medium |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399381B (zh) | 2018-02-12 | 2020-10-30 | 北京市商汤科技开发有限公司 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
JP7229698B2 (ja) * | 2018-08-20 | 2023-02-28 | キヤノン株式会社 | Information processing apparatus, information processing method and program |
CN111523569B (zh) * | 2018-09-04 | 2023-08-04 | 创新先进技术有限公司 | User identity determination method, apparatus and electronic device |
CN109543537B (zh) * | 2018-10-23 | 2021-03-23 | 北京市商汤科技开发有限公司 | Incremental training method and apparatus for a re-identification model, electronic device and storage medium |
CN110083742B (zh) * | 2019-04-29 | 2022-12-06 | 腾讯科技(深圳)有限公司 | Video query method and apparatus |
CN110175527B (zh) * | 2019-04-29 | 2022-03-25 | 北京百度网讯科技有限公司 | Pedestrian re-identification method and apparatus, computer device and readable medium |
US11062455B2 (en) * | 2019-10-01 | 2021-07-13 | Volvo Car Corporation | Data filtering of image stacks and video streams |
CN111339849A (zh) * | 2020-02-14 | 2020-06-26 | 北京工业大学 | Pedestrian re-identification method fusing pedestrian attributes |
CN111339360B (zh) * | 2020-02-24 | 2024-03-26 | 北京奇艺世纪科技有限公司 | Video processing method, apparatus, electronic device and computer-readable storage medium |
CN111539341B (zh) * | 2020-04-26 | 2023-09-22 | 香港中文大学(深圳) | Target localization method, apparatus, electronic device and medium |
CN112001243A (zh) * | 2020-07-17 | 2020-11-27 | 广州紫为云科技有限公司 | Pedestrian re-identification data annotation method, apparatus and device |
CN111897993A (zh) * | 2020-07-20 | 2020-11-06 | 杭州叙简科技股份有限公司 | Efficient target person trajectory generation method based on pedestrian re-identification |
CN112069952B (zh) | 2020-08-25 | 2024-10-15 | 北京小米松果电子有限公司 | Video segment extraction method, video segment extraction apparatus and storage medium |
CN112150514A (zh) * | 2020-09-29 | 2020-12-29 | 上海眼控科技股份有限公司 | Pedestrian trajectory tracking method, apparatus, device and storage medium for video |
CN112906483B (zh) * | 2021-01-25 | 2024-01-23 | 中国银联股份有限公司 | Target re-identification method, apparatus and computer-readable storage medium |
CN113221641B (zh) * | 2021-04-01 | 2023-07-07 | 哈尔滨工业大学(深圳) | Video pedestrian re-identification method based on a generative adversarial network and an attention mechanism |
CN113011395B (zh) * | 2021-04-26 | 2023-09-01 | 深圳市优必选科技股份有限公司 | Single-stage dynamic pose recognition method, apparatus and terminal device |
CN113255598B (zh) * | 2021-06-29 | 2021-09-28 | 南京视察者智能科技有限公司 | Transformer-based pedestrian re-identification method |
CN113780066B (zh) * | 2021-07-29 | 2023-07-25 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, apparatus, electronic device and readable storage medium |
CN117522454B (zh) * | 2024-01-05 | 2024-04-16 | 北京文安智能技术股份有限公司 | Staff identification method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150131858A1 (en) * | 2013-11-13 | 2015-05-14 | Fujitsu Limited | Tracking device and tracking method |
CN105518744A (zh) * | 2015-06-29 | 2016-04-20 | 北京旷视科技有限公司 | Pedestrian re-identification method and device |
CN106022220A (zh) * | 2016-05-09 | 2016-10-12 | 西安北升信息科技有限公司 | Method for multi-face tracking of competing athletes in sports video |
CN107346409A (zh) * | 2016-05-05 | 2017-11-14 | 华为技术有限公司 | Pedestrian re-identification method and apparatus |
CN108399381A (zh) * | 2018-02-12 | 2018-08-14 | 北京市商汤科技开发有限公司 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6567116B1 (en) * | 1998-11-20 | 2003-05-20 | James A. Aman | Multiple object tracking system |
JP5837484B2 (ja) * | 2010-05-26 | 2015-12-24 | Panasonic Intellectual Property Corporation of America | Image information processing apparatus |
KR20140090795A (ko) * | 2013-01-10 | 2014-07-18 | 한국전자통신연구원 | Method and apparatus for object tracking in a multi-camera environment |
CN103810476B (zh) * | 2014-02-20 | 2017-02-01 | 中国计量学院 | Pedestrian re-identification method in video surveillance networks based on small-group information association |
CN105095475B (zh) * | 2015-08-12 | 2018-06-19 | 武汉大学 | Pedestrian re-identification method and system for incomplete attribute labeling based on two-level fusion |
CN105354548B (zh) * | 2015-10-30 | 2018-10-26 | 武汉大学 | Surveillance video pedestrian re-identification method based on ImageNet retrieval |
JP2017167970A (ja) * | 2016-03-17 | 2017-09-21 | 株式会社リコー | Image processing apparatus, object recognition apparatus, device control system, image processing method and program |
JP6656987B2 (ja) * | 2016-03-30 | 2020-03-04 | 株式会社エクォス・リサーチ | Image recognition apparatus, mobile body apparatus, and image recognition program |
EP3549063A4 (en) * | 2016-12-05 | 2020-06-24 | Avigilon Corporation | APPEARANCE SEARCH SYSTEM AND METHOD |
2018
- 2018-02-12 CN CN201810145717.3A patent/CN108399381B/zh active Active
- 2018-11-21 SG SG11201913733QA patent/SG11201913733QA/en unknown
- 2018-11-21 KR KR1020197038764A patent/KR102348002B1/ko active IP Right Grant
- 2018-11-21 JP JP2019570048A patent/JP6905601B2/ja active Active
- 2018-11-21 WO PCT/CN2018/116600 patent/WO2019153830A1/zh active Application Filing
2019
- 2019-12-25 US US16/726,878 patent/US11301687B2/en active Active
2020
- 2020-01-06 PH PH12020500050A patent/PH12020500050A1/en unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150131858A1 (en) * | 2013-11-13 | 2015-05-14 | Fujitsu Limited | Tracking device and tracking method |
CN105518744A (zh) * | 2015-06-29 | 2016-04-20 | 北京旷视科技有限公司 | Pedestrian re-identification method and device |
CN107346409A (zh) * | 2016-05-05 | 2017-11-14 | 华为技术有限公司 | Pedestrian re-identification method and apparatus |
CN106022220A (zh) * | 2016-05-09 | 2016-10-12 | 西安北升信息科技有限公司 | Method for multi-face tracking of competing athletes in sports video |
CN108399381A (zh) * | 2018-02-12 | 2018-08-14 | 北京市商汤科技开发有限公司 | Pedestrian re-identification method, apparatus, electronic device and storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827312A (zh) * | 2019-11-12 | 2020-02-21 | 北京深境智能科技有限公司 | A learning method based on a collaborative visual attention neural network |
CN110827312B (zh) * | 2019-11-12 | 2023-04-28 | 北京深境智能科技有限公司 | A learning method based on a collaborative visual attention neural network |
CN111538861A (zh) * | 2020-04-22 | 2020-08-14 | 浙江大华技术股份有限公司 | Method, apparatus, device and medium for image retrieval based on surveillance video |
CN111538861B (zh) * | 2020-04-22 | 2023-08-15 | 浙江大华技术股份有限公司 | Method, apparatus, device and medium for image retrieval based on surveillance video |
CN111723645A (zh) * | 2020-04-24 | 2020-09-29 | 浙江大学 | Multi-camera high-precision pedestrian re-identification method for supervised scenes within the same camera |
CN111723645B (zh) * | 2020-04-24 | 2023-04-18 | 浙江大学 | Multi-camera high-precision pedestrian re-identification method for supervised scenes within the same camera |
CN115150663A (zh) * | 2022-07-01 | 2022-10-04 | 北京奇艺世纪科技有限公司 | Heat curve generation method, apparatus, electronic device and storage medium |
CN115150663B (zh) * | 2022-07-01 | 2023-12-15 | 北京奇艺世纪科技有限公司 | Heat curve generation method, apparatus, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
KR102348002B1 (ko) | 2022-01-06 |
PH12020500050A1 (en) | 2020-11-09 |
KR20200015610A (ko) | 2020-02-12 |
SG11201913733QA (en) | 2020-01-30 |
US11301687B2 (en) | 2022-04-12 |
CN108399381B (zh) | 2020-10-30 |
US20200134321A1 (en) | 2020-04-30 |
JP6905601B2 (ja) | 2021-07-21 |
JP2020525901A (ja) | 2020-08-27 |
CN108399381A (zh) | 2018-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019153830A1 (zh) | Pedestrian re-identification method, apparatus, electronic device and storage medium | |
US10810748B2 (en) | Multiple targets—tracking method and apparatus, device and storage medium | |
WO2018157735A1 (zh) | Target tracking method, system and electronic device | |
CN108154222B (zh) | Deep neural network training method and system, and electronic device | |
US11586842B2 (en) | System and method for machine learning based video quality assessment | |
CN107273458B (zh) | Deep model training method and apparatus, and image retrieval method and apparatus | |
WO2022062344A1 (zh) | Salient object detection method, system, device and storage medium for compressed video | |
CN109118420B (zh) | Watermark recognition model building and recognition method, apparatus, medium and electronic device | |
CN115294332B (zh) | Image processing method, apparatus, device and storage medium | |
US11282179B2 (en) | System and method for machine learning based video quality assessment | |
CN108108769B (zh) | Data classification method, apparatus and storage medium | |
CN114429577B (zh) | Flag detection method, system and device based on a high-confidence annotation strategy | |
WO2020151300A1 (zh) | Gender recognition method, apparatus, medium and device based on a deep residual network | |
Zhang et al. | A review of small target detection based on deep learning | |
US20230252683A1 (en) | Image processing device, image processing method, and computer-readable recording medium storing image processing program | |
CN115035463B (zh) | Behavior recognition method, apparatus, device and storage medium | |
CN111523399A (zh) | Sensitive video detection and apparatus | |
CN114724144B (zh) | Text recognition method, model training method, apparatus, device and medium | |
Meng et al. | Structure preservation adversarial network for visual domain adaptation | |
Schulz et al. | Identity documents image quality assessment | |
CN113642443A (zh) | Model testing method, apparatus, electronic device and storage medium | |
Guesdon et al. | Multitask Metamodel for Keypoint Visibility Prediction in Human Pose Estimation | |
CN113780268B (zh) | Trademark recognition method, apparatus and electronic device | |
CN113627341B (zh) | Video sample comparison method, system, device and storage medium | |
CN115331062B (zh) | Image recognition method, apparatus, electronic device and computer-readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18905247 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019570048 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20197038764 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/11/2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18905247 Country of ref document: EP Kind code of ref document: A1 |