CN111598067A - Re-recognition training method, re-recognition method and storage device in video - Google Patents

Re-recognition training method, re-recognition method and storage device in video

Info

Publication number
CN111598067A
CN111598067A (application CN202010723115.9A; granted publication CN111598067B)
Authority
CN
China
Prior art keywords
animal
picture sequence
feature map
features
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010723115.9A
Other languages
Chinese (zh)
Other versions
CN111598067B (en)
Inventor
张迪
潘华东
罗时现
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010723115.9A
Publication of CN111598067A
Application granted
Publication of CN111598067B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a re-recognition training method, a re-recognition method and a storage device in a video. The re-recognition training method comprises the following steps: detecting an animal picture sequence in a video by using an animal detection and animal tracking method; extracting the time domain features and spatial features of the animal picture sequence, fusing the time domain features and the spatial features to obtain a feature map of the animal picture sequence; partitioning the feature map into blocks of different sizes along the horizontal dimension, and respectively calculating the losses between the local block feature maps and the real animal and between the global feature map and the real animal; and optimizing the losses for training until the training converges, to obtain the optimal animal re-identification result. In this way, the time domain features and spatial features of the animal picture sequence are fused, fine-grained learning is performed on different parts of the animal while the learning of global features is also taken into account, and the accuracy and robustness of animal re-identification are improved.

Description

Re-recognition training method, re-recognition method and storage device in video
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a video re-recognition training method, a video re-recognition method, and a storage device.
Background
Surveillance video is widely used in public places such as subways, airports and traffic roads to maintain public safety: pedestrians are detected in the video, and suspects or lost children are found by pedestrian re-identification technology. Traditional pedestrian re-identification methods mostly retrieve with a single pedestrian picture; in a surveillance video, however, the pose, occlusion and environmental background of a pedestrian may differ from one moment to another, so retrieval from a single picture is not robust. Video-based pedestrian re-identification methods use multiple pictures from the video sequence for recognition and achieve better results.
Current approaches to video pedestrian re-identification take a sequence of pedestrian pictures from the surveillance video as input, extract the spatio-temporal information of the pictures with a convolutional neural network and a recurrent neural network, encode the feature information into a feature vector, and identify pedestrians by computing distances between the feature vectors of pedestrians. These methods usually focus only on the global features of the pedestrian sequence and ignore the salient features of key parts such as the face and torso, so the accuracy of video-based pedestrian re-identification is not high enough.
Disclosure of Invention
The application provides a re-recognition training method, a re-recognition method and a storage device in a video, which can improve the accuracy and robustness of animal re-recognition.
In order to solve the technical problem, the application adopts a technical scheme that: the method for re-recognition training in the video comprises the following steps:
detecting an animal picture sequence in a video by using an animal detection and animal tracking method;
extracting time domain features and space features of the animal picture sequence, fusing the time domain features and the space features and obtaining a feature map of the animal picture sequence;
partitioning the feature map into blocks of different sizes along the horizontal dimension, and respectively calculating the losses between the local block feature maps and the real animal and between the global feature map and the real animal;
and optimizing the losses for training until the training converges, to obtain the optimal animal re-identification result.
In order to solve the above technical problem, another technical solution adopted by the present application is: the method for re-identifying in the video comprises the following steps:
detecting a picture sequence of an animal to be detected in a video to be detected by using an animal detection and animal tracking method;
selecting a plurality of pictures to be detected from the picture sequence of the animal to be detected, and performing time domain feature and spatial feature fusion processing and blocking processing on the pictures to be detected to obtain a feature vector of the picture sequence of the animal to be detected;
and comparing the characteristic vector of the animal picture sequence to be detected with the characteristic vector of the animal picture sequence in a preset search base library, searching out a target picture with the highest similarity, and outputting a re-identification matching result.
In order to solve the above technical problem, the present application adopts another technical solution that: a storage device is provided, which stores a program file capable of realizing the re-identification method.
The beneficial effect of this application is: in this way, the poor recognition that a single picture may suffer because of animal pose, environmental background and occlusion is avoided; the frames are correlated with each other in time and every position is correlated with the others in space; fine-grained learning is performed on different parts of the animal while global feature learning is also taken into account, so the accuracy and robustness of animal re-identification are improved.
Drawings
FIG. 1 is a schematic flow chart of a method for re-recognition training in video according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a non-local attention module in an embodiment of the invention;
FIG. 4 is a flowchart illustrating a video re-recognition method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating the establishment of a default search base according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a video re-recognition training device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a video re-recognition apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a memory device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear … …) in the embodiments of the present application are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Fig. 1 is a schematic flow chart of a video re-recognition training method according to an embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:
step S101: and detecting an animal picture sequence in the video by using an animal detection and animal tracking method.
In step S101, a surveillance video is first acquired, animal pictures are then extracted from the surveillance video by using an animal detection and animal tracking method, and the extracted animal pictures are made into an animal picture sequence. The animal of the present embodiment is a dynamic object, including but not limited to pedestrians and vehicles.
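Purely as an illustration and not part of the patent text, grouping tracked detections into per-animal picture sequences could be sketched as follows; the `detect_and_track` callable and the record fields are hypothetical placeholders for whatever detection and tracking method is used.

```python
from collections import defaultdict

def build_picture_sequences(frames, detect_and_track):
    """Group tracked detections into one picture sequence per animal identity.

    `detect_and_track(frame)` is a hypothetical callable returning a list of
    (track_id, (x1, y1, x2, y2)) pairs; any detection/tracking method may be used.
    `frames` are assumed to be numpy-style image arrays indexed as [y, x].
    """
    sequences = defaultdict(list)
    for frame_index, frame in enumerate(frames):
        for track_id, (x1, y1, x2, y2) in detect_and_track(frame):
            crop = frame[y1:y2, x1:x2]  # crop the detected animal region
            sequences[track_id].append({"frame": frame_index, "image": crop})
    return dict(sequences)  # {animal track id: list of cropped pictures}
```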
Step S102: and extracting the time domain features and the space features of the animal picture sequence, fusing the time domain features and the space features and obtaining a feature map of the animal picture sequence.
In step S102, several pictures are selected from the animal picture sequence by random sampling and sorted by shooting time; the sorted pictures are divided into several sub-picture sequences, one picture is randomly selected from each sub-picture sequence, and scaling and random horizontal flipping are applied in turn. The processed pictures are then input into a convolutional neural network: a first convolution is applied to obtain input features with a reduced number of channels, the frame dimension of the input features is correlated by matrix multiplication, and the width and height dimensions of the input features are correlated, giving output features in which the time domain features and spatial features are fused; a second convolution and a third convolution are then applied in turn to extract the feature map of the animal picture sequence.
Specifically, taking pedestrians as an example, several pictures are selected from the pedestrian picture sequence by random sampling, sorted by shooting time and divided into four sub-picture sequences. During training, one picture is randomly selected from each sub-picture sequence, resized to (384, 128), and then randomly horizontally flipped.
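Purely as an illustration (not taken from the patent), this sampling and preprocessing step could be sketched as follows; the use of torchvision transforms and the `timestamp`/`path` fields of each picture record are assumptions.

```python
import random
from PIL import Image
import torchvision.transforms as T

# Resize to (384, 128) and random horizontal flip, as described above.
train_transform = T.Compose([
    T.Resize((384, 128)),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

def sample_frames(picture_sequence, num_chunks=4):
    """Sort a tracked picture sequence by shooting time, split it into
    `num_chunks` sub-sequences and randomly pick one picture from each."""
    ordered = sorted(picture_sequence, key=lambda p: p["timestamp"])  # assumed field
    chunk = max(1, len(ordered) // num_chunks)
    groups = [ordered[i * chunk:(i + 1) * chunk] for i in range(num_chunks)]
    picked = [random.choice(g) for g in groups if g]
    return [train_transform(Image.open(p["path"])) for p in picked]   # assumed field
```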
The convolutional neural network of the present embodiment employs a ResNet50 network, into which a non-local attention module for fusing the temporal and spatial features is inserted. Further, as shown in fig. 2, the convolutional neural network includes a 1 × 1 convolution, a 3 × 3 convolution and another 1 × 1 convolution; the first 1 × 1 convolution reduces the amount of computation by reducing the number of channels of the input picture, and the non-local attention module is placed after this first 1 × 1 convolution.
Further, the non-local attention module performs temporal feature and spatial feature fusion according to the following formula,
y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)
where x and y denote the input features and output features respectively, i is the coordinate of the current position, j enumerates the coordinates of all space-time positions, f computes the feature correlation between positions i and j, g is a linear embedding of the feature at position j, and C(x) is a normalization factor.
The non-local attention module correlates the frame dimension of the input features as well as their width and height dimensions, which is computationally expensive; for pedestrian re-identification, where the input features come from a four-frame picture sequence, the computation of a standard non-local attention module grows sharply. In this embodiment, the non-local attention module is simplified and the redundant spatio-temporal features are compressed, greatly reducing its computation.
Specifically, referring to fig. 2, the processed pictures are input into the convolutional neural network as the initial input. After the 1 × 1 convolution, a C/4 × H × W matrix is obtained, where C is the number of channels, H is the height of the feature map and W is its width. This matrix is fed into the non-local attention module as the input features, where the time domain features and spatial features are fused to give the output features. The output features then pass through one 3 × 3 convolution and one 1 × 1 convolution in turn to extract the feature map of the pedestrian picture sequence, and the output result is a C × H × W matrix.
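As a hedged sketch only (not the patent's exact implementation), the C → C/4 → C flow around the attention module could look like this in PyTorch, treating the T-frame sequence as a (B, C, T, H, W) tensor and leaving the attention module pluggable (for example the simplified non-local module sketched after the next paragraph):

```python
import torch.nn as nn

class NonLocalBottleneck(nn.Module):
    """1x1 conv reducing C to C/4, attention fusing temporal/spatial features,
    then a 3x3 and a 1x1 convolution back to C channels, as described above."""
    def __init__(self, channels, attention=None):
        super().__init__()
        reduced = channels // 4
        self.reduce = nn.Conv3d(channels, reduced, kernel_size=1, bias=False)
        self.attention = attention if attention is not None else nn.Identity()
        self.conv3x3 = nn.Conv3d(reduced, reduced, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        self.expand = nn.Conv3d(reduced, channels, kernel_size=1, bias=False)

    def forward(self, x):               # x: (B, C, T, H, W)
        out = self.reduce(x)            # C/4 channels per space-time position
        out = self.attention(out)       # temporal and spatial feature fusion
        out = self.conv3x3(out)
        return self.expand(out)         # back to C channels
```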
Further, referring to fig. 3, the non-local attention module performs the following operations. The input features undergo a global pooling operation followed by a 1 × 1 × 1 convolution to give a matrix A of size 1 × C'. The input features also undergo a maximum pooling operation followed by a 1 × 1 × 1 convolution to give a matrix B of size C' × THW/4, where T is the number of frames in the sequence, H is the height of the feature map, W is its width and C is the number of channels; another maximum pooling operation and 1 × 1 × 1 convolution give a matrix D of size THW/4 × C'. Matrix A is multiplied with matrix B to obtain a matrix C of size 1 × THW/4; matrix D and matrix C are multiplied and passed through a 1 × 1 × 1 convolution to obtain a matrix E of size 1 × C, and matrix E is added to the input features to give the output features.
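A minimal PyTorch-style sketch of this simplified non-local attention is given below, under the assumptions that the max pooling halves the spatial size (so T·H·W becomes T·H·W/4), that C' = C/2, and that the attention weights are normalized with a softmax; none of these values are stated in the patent text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedNonLocal(nn.Module):
    """Sketch of the simplified non-local attention described above.
    Input: (B, C, T, H, W); a global descriptor attends over the max-pooled
    space-time positions and the resulting channel excitation (matrix E)
    is added back to the input features."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2                         # C' (assumed)
        self.to_a = nn.Conv3d(channels, reduced, kernel_size=1)    # matrix A path
        self.to_b = nn.Conv3d(channels, reduced, kernel_size=1)    # matrix B path
        self.to_d = nn.Conv3d(channels, reduced, kernel_size=1)    # matrix D path
        self.out = nn.Conv1d(reduced, channels, kernel_size=1)     # back to C channels
        self.pool = nn.MaxPool3d(kernel_size=(1, 2, 2))  # T*H*W -> T*H*W/4 (H, W even)

    def forward(self, x):
        b, c, t, h, w = x.shape
        n = (t * h * w) // 4
        # A: global pooling then 1x1x1 conv -> (B, 1, C')
        a = self.to_a(F.adaptive_avg_pool3d(x, 1)).view(b, -1, 1).transpose(1, 2)
        # B: max pooling then 1x1x1 conv -> (B, C', THW/4)
        bm = self.to_b(self.pool(x)).view(b, -1, n)
        # D: another max pooling then 1x1x1 conv -> (B, THW/4, C')
        d = self.to_d(self.pool(x)).view(b, -1, n).transpose(1, 2)
        # C = A x B -> (B, 1, THW/4); softmax normalization is an assumption
        cm = torch.softmax(torch.bmm(a, bm), dim=-1)
        # E = (C x D) projected back to C channels -> (B, C, 1)
        e = self.out(torch.bmm(cm, d).transpose(1, 2))
        # add the excitation back to the input features
        return x + e.view(b, c, 1, 1, 1)
```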
Step S103: and carrying out blocking processing on the feature map in different sizes in the horizontal dimension, and respectively calculating the local blocking feature map and the loss between the global feature map and the real animal.
In step S103, the feature map is partitioned into blocks of different sizes along the horizontal dimension; maximum pooling and convolution dimension reduction are applied to the partition results in turn to obtain the local block feature maps and the global feature maps; the cross-entropy loss of each local block feature map is then calculated, and the triplet loss and cross-entropy loss between the global feature maps and the real animal are calculated.
Specifically, the feature map is divided into three branches. Along the height dimension, the first branch is kept as one block, the second branch is divided into one block and into two blocks, and the third branch is divided into one block and into three blocks. More specifically, the feature map of the first branch has a size of (12, 4, 2048), and the feature maps of the second and third branches each have a size of (24, 8, 2048). The feature map of the first branch is pooled with a maximum pooling of kernel (12, 4). For the feature map of the second branch, a maximum pooling of kernel (24, 8) gives one block and a maximum pooling of kernel (12, 8) gives two blocks. For the feature map of the third branch, a maximum pooling of kernel (24, 8) gives one block and a maximum pooling of kernel (8, 8) gives three blocks. Each 2048-dimensional feature obtained by the block processing is then reduced to 256 dimensions by convolution, which reduces the amount of computation. A block whose result covers the whole feature map is a global feature map; otherwise it is a local block feature map. The cross-entropy loss of each local block feature map is then calculated, and the triplet loss and cross-entropy loss between the global feature maps and the real animal are calculated.
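A hedged sketch of this three-branch partitioning head follows. Adaptive max pooling is used in place of the fixed pooling kernels (it yields the same stripes for the sizes quoted above), the number of identity classes `num_ids` is a placeholder, and the branch feature maps are assumed to be produced upstream by the backbone.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularityHead(nn.Module):
    """Three branches pooled into 1, {1, 2} and {1, 3} horizontal stripes,
    each 2048-d pooled feature reduced to 256 dimensions by a 1x1 convolution."""
    def __init__(self, in_channels=2048, out_channels=256, num_ids=1000):
        super().__init__()
        self.reduce = nn.ModuleList()    # one reduction per block (8 blocks in total)
        self.classify = nn.ModuleList()  # one cross-entropy classifier per block
        for _ in range(8):               # 1 + (1 + 2) + (1 + 3) blocks
            self.reduce.append(nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True)))
            self.classify.append(nn.Linear(out_channels, num_ids))

    def forward(self, branch1, branch2, branch3):
        # branch1: (B, 2048, 12, 4); branch2, branch3: (B, 2048, 24, 8)
        blocks, idx = [], 0
        for feat, parts_list in ((branch1, [1]), (branch2, [1, 2]), (branch3, [1, 3])):
            for parts in parts_list:
                pooled = F.adaptive_max_pool2d(feat, (parts, 1))   # horizontal stripes
                for p in range(parts):
                    v = self.reduce[idx](pooled[:, :, p:p + 1])    # (B, 256, 1, 1)
                    blocks.append((v.flatten(1), self.classify[idx]))
                    idx += 1
        return blocks  # list of (256-d feature, its identity classifier)
```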
Step S104: and optimizing the loss for training until the training is converged to obtain an optimal animal re-identification result.
In step S104, the Adam optimization algorithm is used to optimize the losses during training until the training converges, yielding the optimal animal re-identification result.
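As a hedged sketch of this optimization (cross-entropy on every block, triplet loss on the global blocks, Adam as optimizer), one training step might look like the following; `model`, `mine_triplets`, the learning rate, the margin and the indices of the global blocks are assumptions, not values taken from the patent.

```python
import torch

def train_step(model, sequences, labels, optimizer, mine_triplets,
               global_indices=(0, 1, 4)):
    """One optimization step: cross-entropy loss on every block feature,
    triplet loss on the global blocks, then an Adam update."""
    ce = torch.nn.CrossEntropyLoss()
    tri = torch.nn.TripletMarginLoss(margin=0.3)       # margin is an assumption
    blocks = model(sequences)                          # [(feature, classifier), ...]
    loss = sum(ce(clf(feat), labels) for feat, clf in blocks)
    for i in global_indices:                           # indices of the global blocks
        anchor, positive, negative = mine_triplets(blocks[i][0], labels)
        loss = loss + tri(anchor, positive, negative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)   # lr is an assumption
```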
According to the video re-identification training method above, multiple pictures from the animal's video sequence are used for recognition, matching the actual surveillance scenario, which avoids the poor recognition that a single picture may suffer because of animal pose, environmental background and occlusion. At the same time, fine-grained learning is performed on different parts of the animal body while the global features are also taken into account, and the whole model is learned end to end. A non-local attention module added to the convolutional neural network correlates the input frames with each other and correlates every spatial position with all other positions, which improves the accuracy and robustness of animal re-identification.
Fig. 4 is a flowchart illustrating a video re-recognition method according to an embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 4 if the results are substantially the same. As shown in fig. 4, the method includes the steps of:
step S401: and detecting the picture sequence of the animal to be detected in the video to be detected by using an animal detection and animal tracking method.
In step S401, a surveillance video is first acquired, animal pictures are then extracted from the surveillance video by using an animal detection and animal tracking method, and the extracted animal pictures are made into an animal picture sequence. The animal of the present embodiment includes, but is not limited to, a pedestrian.
Step S402: selecting a plurality of pictures to be detected from the picture sequence of the animal to be detected, and carrying out time domain feature and space feature fusion processing and blocking processing on the pictures to be detected to obtain a feature vector of the picture sequence of the animal to be detected.
In step S402, the step of selecting a plurality of pictures to be tested from the picture sequence of the animal to be tested specifically includes: several pictures are selected from the picture sequence of the animal to be tested by random sampling, sorted by shooting time and divided into several sub-picture sequences; one picture is randomly selected from each sub-picture sequence, and scaling and random horizontal flipping are applied in turn. The fusion of time domain and spatial features and the block processing of the pictures to be tested, which yield the feature vector of the picture sequence of the animal to be tested, proceed as follows: the processed pictures are input into the convolutional neural network, a first convolution is applied to obtain an input feature map with a reduced number of channels, the frame dimension of the input features is correlated by matrix multiplication, and the width and height dimensions of the input features are correlated to obtain an output feature map in which the time domain and spatial features are fused; the feature map is then partitioned into blocks of different sizes along the horizontal dimension, maximum pooling and convolution dimension reduction are applied to the partition results in turn, and a second convolution and a third convolution are applied in turn to extract the feature vector of the animal picture sequence.
Specifically, in the block processing, the feature map is divided into three branches. Along the height dimension, the first branch is kept as one block, the second branch is divided into one block and into two blocks, and the third branch is divided into one block and into three blocks. More specifically, the feature map of the first branch has a size of (12, 4, 2048), and the feature maps of the second and third branches each have a size of (24, 8, 2048). The feature map of the first branch is pooled with a maximum pooling of kernel (12, 4). For the feature map of the second branch, a maximum pooling of kernel (24, 8) gives one block and a maximum pooling of kernel (12, 8) gives two blocks. For the feature map of the third branch, a maximum pooling of kernel (24, 8) gives one block and a maximum pooling of kernel (8, 8) gives three blocks. Each 2048-dimensional feature obtained by the block processing is then reduced to 256 dimensions by convolution, which reduces the amount of computation.
Step S403: and comparing the characteristic vector of the animal picture sequence to be detected with the characteristic vector of the animal picture sequence in a preset search base library, searching out a target picture with the highest similarity, and outputting a re-identification matching result.
In step S403, the Euclidean distances between the feature vector of the animal picture sequence to be tested and the feature vectors of the animal picture sequences in the preset search base are calculated; the Euclidean distances are then sorted, and the animal picture sequence in the preset search base corresponding to the minimum Euclidean distance is output.
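A minimal sketch of this matching step, assuming the gallery feature vectors and their identity labels have already been extracted into a tensor and a list, could be:

```python
import torch

def match_query(query_vec, gallery_vecs, gallery_ids):
    """Compare a query sequence feature with the preset search base: compute
    Euclidean distances, sort them, and return the closest registered identity
    together with the full ranking."""
    dists = torch.cdist(query_vec.unsqueeze(0), gallery_vecs).squeeze(0)  # (N,)
    order = torch.argsort(dists)              # ascending: smallest distance first
    ranking = [(gallery_ids[i], dists[i].item()) for i in order.tolist()]
    return gallery_ids[order[0].item()], ranking
```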
In this embodiment, the re-identification method further includes establishing the preset search base. As shown in fig. 5, the step of establishing the preset search base includes the following steps (a minimal code sketch of these steps is given after the list):
step S501: the method comprises the steps of collecting registered animals in a monitoring video by using an animal detection and animal tracking method, detecting and extracting a registration picture of each registered animal, and forming a section of registration animal picture sequence for each registered animal;
step S502: marking corresponding animal identity labels for each registered animal picture sequence;
step S503: inputting the registered animal picture sequence into a re-recognition training model to obtain a feature vector of the registered animal picture sequence;
step S504: and establishing a preset search base according to the characteristic vector of the registered animal picture sequence.
The in-video re-identification method of the embodiment of the invention uses the re-identification model obtained by training with the above in-video re-identification training method, which not only reduces the amount of computation during re-identification but also improves the accuracy and robustness of animal re-identification.
Fig. 6 is a schematic structural diagram of a video re-recognition training device according to an embodiment of the present invention. As shown in fig. 6, the apparatus 60 includes a picture sequence acquiring module 61, a feature fusion and extraction module 62, a block processing module 63, and an optimization module 64.
The image sequence acquiring module 61 is configured to detect an animal image sequence in the video by using an animal detection and animal tracking method.
The feature fusion and extraction module 62 is coupled to the picture sequence acquisition module 61, and is configured to extract time-domain features and spatial features of the animal picture sequence, fuse the time-domain features and the spatial features, and obtain a feature map of the animal picture sequence.
The block processing module 63 is coupled to the feature fusion and extraction module 62, and is configured to partition the feature map into blocks of different sizes along the horizontal dimension and to respectively calculate the losses between the local block feature maps and the real animal and between the global feature map and the real animal.
The optimization module 64 is coupled to the block processing module 63, and is configured to optimize the loss for training until the training converges to obtain an optimal animal re-recognition result.
Fig. 7 is a schematic structural diagram of a video re-recognition apparatus according to an embodiment of the present invention. As shown in fig. 7, the apparatus 70 includes a picture sequence acquiring module 71, a feature extracting module 72, and a re-identifying module 73.
The image sequence acquiring module 71 is configured to detect an animal image sequence in a video to be detected by using an animal detection and animal tracking method.
The feature extraction module 72 is coupled to the image sequence acquisition module 71, and is configured to select multiple images to be detected from the image sequence of the animal to be detected, input the images to be detected into the re-recognition training model, and obtain feature vectors of the image sequence of the animal to be detected.
The re-recognition training model is obtained by adopting the above-mentioned re-recognition training method in the video, and for the sake of brevity, the re-recognition training method in the video is not repeated herein.
The re-recognition module 73 is coupled to the feature extraction module 72, and configured to compare the feature vector of the animal picture sequence to be detected with the feature vector of the animal picture sequence in the preset search base, search out a target picture with the highest similarity, and output a re-recognition matching result.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a storage device according to an embodiment of the invention. The storage device of the embodiment of the present invention stores a program file 81 capable of implementing all the methods described above, wherein the program file 81 may be stored in the storage device in the form of a software product, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes: various media capable of storing program codes, such as a USB flash disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices such as a computer, a server, a mobile phone, and a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims (10)

1. A re-recognition training method in video is characterized by comprising the following steps:
detecting an animal picture sequence in a video by using an animal detection and animal tracking method;
extracting time domain features and space features of the animal picture sequence, fusing the time domain features and the space features and obtaining a feature map of the animal picture sequence;
partitioning the feature map into blocks of different sizes along the horizontal dimension, and respectively calculating the losses between the local block feature maps and the real animal and between the global feature map and the real animal;
and optimizing the losses for training until the training converges, to obtain the optimal animal re-identification result.
2. The re-recognition training method according to claim 1, wherein the step of partitioning the feature map into blocks of different sizes along the horizontal dimension and respectively calculating the losses between the local block feature maps and the real animal and between the global feature map and the real animal comprises:
partitioning the feature map into blocks of different sizes along the horizontal dimension;
performing maximum pooling and convolution dimension reduction on the block processing result in sequence to obtain the local block feature map and the global feature map;
and calculating the cross entropy loss of the local block feature maps, and calculating the triplet loss and the cross entropy loss between the global feature map and the real animal.
3. The re-recognition training method of claim 1, wherein the step of extracting the time-domain features and the spatial features of the animal picture sequence, fusing the time-domain features and the spatial features, and obtaining the feature map of the animal picture sequence comprises:
selecting a plurality of pictures from the animal picture sequence;
and inputting the picture into a convolutional neural network, extracting time domain features and space features of the picture, fusing the time domain features and the space features, and obtaining a feature map of the animal picture sequence.
4. The re-recognition training method of claim 3, wherein the step of inputting the picture into a convolutional neural network, extracting temporal features and spatial features of the picture, fusing the temporal features and the spatial features, and obtaining a feature map of the animal picture sequence comprises:
performing first convolution on the picture to obtain input features with the number of channels reduced;
correlating frame dimensions in the input features by using a matrix multiplication mode, correlating width dimensions and height dimensions in the input features, and obtaining output features after feature fusion;
and sequentially carrying out second convolution and third convolution on the output features to extract a feature map of the animal picture sequence.
5. The re-recognition training method of claim 3, wherein the step of selecting a plurality of pictures from the sequence of animal pictures comprises:
selecting a plurality of pictures from the animal picture sequence by adopting a random sampling mode;
and sorting the plurality of pictures according to the sequence of the shooting time, dividing the pictures into a plurality of sub-picture sequences, randomly selecting one picture from each sub-picture sequence, and sequentially carrying out scaling processing and random horizontal flipping processing.
6. A re-recognition method in video, characterized by comprising the following steps:
detecting a picture sequence of an animal to be detected in a video to be detected by using an animal detection and animal tracking method;
selecting a plurality of pictures to be detected from the picture sequence of the animal to be detected, and performing time domain feature and spatial feature fusion processing and blocking processing on the pictures to be detected to obtain a feature vector of the picture sequence of the animal to be detected;
and comparing the characteristic vector of the animal picture sequence to be detected with the characteristic vector of the animal picture sequence in a preset search base library, searching out a target picture with the highest similarity, and outputting a re-identification matching result.
7. The re-recognition method of claim 6, wherein the step of comparing the feature vector of the image sequence of the animal to be tested with the feature vector of the image sequence of the animal in a preset search base, searching out the target image with the highest similarity, and outputting the re-recognition matching result comprises:
calculating the Euclidean distances between the feature vector of the animal picture sequence to be detected and the feature vectors of the animal picture sequences in a preset search base;
and sorting the Euclidean distances, and outputting the animal picture sequence in the preset search base corresponding to the minimum Euclidean distance.
8. The re-recognition method of claim 6, further comprising:
and establishing a preset search base.
9. The re-recognition method of claim 8, wherein the step of establishing a pre-set search base comprises:
the method comprises the steps of collecting registered animals in a monitoring video by using an animal detection and animal tracking method, detecting and extracting a registration picture of each registered animal, and forming a section of registration animal picture sequence for each registered animal;
marking corresponding animal identity labels for each registered animal picture sequence;
inputting the registered animal picture sequence into a re-recognition training model to obtain a feature vector of the registered animal picture sequence;
and establishing the preset search base according to the characteristic vector of the registered animal picture sequence.
10. A storage device in which a program file capable of implementing the re-recognition method according to any one of claims 6 to 9 is stored.
CN202010723115.9A 2020-07-24 2020-07-24 Re-recognition training method, re-recognition method and storage device in video Active CN111598067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723115.9A CN111598067B (en) 2020-07-24 2020-07-24 Re-recognition training method, re-recognition method and storage device in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723115.9A CN111598067B (en) 2020-07-24 2020-07-24 Re-recognition training method, re-recognition method and storage device in video

Publications (2)

Publication Number Publication Date
CN111598067A true CN111598067A (en) 2020-08-28
CN111598067B CN111598067B (en) 2020-11-10

Family

ID=72191884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723115.9A Active CN111598067B (en) 2020-07-24 2020-07-24 Re-recognition training method, re-recognition method and storage device in video

Country Status (1)

Country Link
CN (1) CN111598067B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015149030A (en) * 2014-02-07 2015-08-20 国立大学法人名古屋大学 Video content violence degree evaluation device, video content violence degree evaluation method, and video content violence degree evaluation program
CN108764096A (en) * 2018-05-21 2018-11-06 华中师范大学 A kind of pedestrian weight identifying system and method
CN110110601A (en) * 2019-04-04 2019-08-09 深圳久凌软件技术有限公司 Video pedestrian weight recognizer and device based on multi-space attention model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220058396A1 (en) * 2019-11-19 2022-02-24 Tencent Technology (Shenzhen) Company Limited Video Classification Model Construction Method and Apparatus, Video Classification Method and Apparatus, Device, and Medium
US11967152B2 (en) * 2019-11-19 2024-04-23 Tencent Technology (Shenzhen) Company Limited Video classification model construction method and apparatus, video classification method and apparatus, device, and medium
CN113177920A (en) * 2021-04-29 2021-07-27 宁波智能装备研究院有限公司 Target re-identification method and system of model biological tracking system
CN113177920B (en) * 2021-04-29 2022-08-09 宁波智能装备研究院有限公司 Target re-identification method and system of model biological tracking system
CN113221776A (en) * 2021-05-19 2021-08-06 彭东乔 Method for identifying general behaviors of ruminant based on artificial intelligence
CN113221776B (en) * 2021-05-19 2024-05-28 彭东乔 Method for identifying general behaviors of ruminants based on artificial intelligence
CN113177528A (en) * 2021-05-27 2021-07-27 南京昊烽信息科技有限公司 License plate recognition method and system based on multi-task learning strategy training network model
CN113177528B (en) * 2021-05-27 2024-05-03 南京昊烽信息科技有限公司 License plate recognition method and system based on multi-task learning strategy training network model

Also Published As

Publication number Publication date
CN111598067B (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN111598067B (en) Re-recognition training method, re-recognition method and storage device in video
Qu et al. RGBD salient object detection via deep fusion
Zheng et al. Partial person re-identification
CN107624189B (en) Method and apparatus for generating a predictive model
CN109829398B (en) Target detection method in video based on three-dimensional convolution network
CN108090435B (en) Parking available area identification method, system and medium
CN109325964B (en) Face tracking method and device and terminal
CN109635686B (en) Two-stage pedestrian searching method combining human face and appearance
CN111046752B (en) Indoor positioning method, computer equipment and storage medium
CN112541448B (en) Pedestrian re-identification method and device, electronic equipment and storage medium
CN111310728B (en) Pedestrian re-identification system based on monitoring camera and wireless positioning
Cheng et al. Person re-identification by articulated appearance matching
Tian et al. Scene Text Detection in Video by Learning Locally and Globally.
CN110765903A (en) Pedestrian re-identification method and device and storage medium
CN110796074A (en) Pedestrian re-identification method based on space-time data fusion
Mansour et al. Video background subtraction using semi-supervised robust matrix completion
CN115049731B (en) Visual image construction and positioning method based on binocular camera
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
CN108109164B (en) Information processing method and electronic equipment
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN112949539A (en) Pedestrian re-identification interactive retrieval method and system based on camera position
CN111814624B (en) Gait recognition training method, gait recognition method and storage device for pedestrian in video
CN115018886B (en) Motion trajectory identification method, device, equipment and medium
CN113221922B (en) Image processing method and related device
CN115082854A (en) Pedestrian searching method oriented to security monitoring video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant