WO2021243947A1 - Object re-identification method and apparatus, terminal and storage medium

Object re-identification method and apparatus, terminal and storage medium

Info

Publication number
WO2021243947A1
WO2021243947A1 PCT/CN2020/126269 CN2020126269W WO2021243947A1 WO 2021243947 A1 WO2021243947 A1 WO 2021243947A1 CN 2020126269 W CN2020126269 W CN 2020126269W WO 2021243947 A1 WO2021243947 A1 WO 2021243947A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
clustered
cluster
network
clustering
Prior art date
Application number
PCT/CN2020/126269
Other languages
English (en)
Chinese (zh)
Inventor
葛艺潇
陈大鹏
朱烽
赵瑞
李鸿升
Original Assignee
商汤集团有限公司
Priority date
Filing date
Publication date
Application filed by 商汤集团有限公司 filed Critical 商汤集团有限公司
Priority to JP2021549335A priority Critical patent/JP2022548187A/ja
Priority to KR1020217025979A priority patent/KR20210151773A/ko
Publication of WO2021243947A1 publication Critical patent/WO2021243947A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an object re-identification method, device, storage medium and computer equipment.
  • the present disclosure relates in particular to the re-identification of objects (such as pedestrians and vehicles).
  • pseudo-labelling (Pseudo-Labelling) is a common approach in the related technology: a pre-trained network clusters the image data of the target domain to generate pseudo-labels, and the image data with pseudo-labels is then used to optimize the network to obtain the final network.
  • the present disclosure provides an object re-identification method, device, storage medium and computer equipment.
  • the present disclosure provides an object re-identification method, including: obtaining a pre-trained re-identification network; obtaining an image to be recognized; and performing re-identification processing on the image to be recognized through the re-identification network to obtain a re-identification result of a target object in the image to be recognized; wherein the training image data of the re-identification network includes at least first clustered image data and non-clustered instance image data, the first clustered image data and the non-clustered instance image data are obtained by performing clustering processing on a first image data set through an initial network corresponding to the re-identification network, and the image data in the first image data set does not contain real cluster labels.
  • the embodiments of the present disclosure perform network training by combining outliers that are not in any cluster, which helps to improve the clustering performance of the re-identification network, and further improves the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.
  • the training image data of the re-identification network further includes a second image data set, and the second clustered image data in the second image data set includes a true cluster label; the image data domain where the second image data set is located is different from the image data domain where the first image data set is located.
  • the embodiments of the present disclosure, by providing supervision from the first clustered image data that does not contain real cluster labels, the non-clustered instance image data, and the second clustered image data that contains real cluster labels, help to improve the clustering performance of the re-identification network, and further improve the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.
  • before acquiring the pre-trained re-identification network, the method further includes: acquiring the initial network; acquiring the training image data; and training the initial network through the training image data to obtain the re-identification network.
  • the embodiments of the present disclosure train the initial network through the acquired training image data to obtain the re-identification network, which can improve the image classification and object recognition capabilities of the re-identification network.
  • the acquiring the training image data includes: acquiring an initial clustering result obtained by performing clustering processing on the first image data set through the initial network; and performing re-clustering processing on the initial clustering result to obtain the first clustered image data and the non-clustered instance image data.
  • the processing flow for processing the target domain image data in the embodiments of the present disclosure can be understood as a self-paced contrastive learning strategy, that is, according to the principle of "from simple to difficult", the most credible clusters are obtained first, and then the credible clusters are gradually increased through the re-clustering processing, thereby improving the quality of the learning target and reducing errors by increasing the credible clusters.
  • the initial clustering result includes initial clustered image data; the performing re-clustering processing on the initial clustering result to obtain the first clustered image data and the non-clustered instance image data includes: reducing the number of image data of a first current cluster in the initial clustered image data according to image feature distance to obtain a second current cluster; determining a density index of the second current cluster, where the density index is the ratio of the number of image data of the second current cluster to the number of image data of the first current cluster; when the density index reaches a first preset threshold, replacing the first current cluster with the second current cluster to obtain the first clustered image data; and updating the removed image data to belong to the non-clustered instance image data.
  • re-clustering is performed by evaluating the density of clusters to gradually increase credible clusters, thereby improving the quality of the learning target, and reducing errors by increasing credible clusters.
  • the initial clustering result further includes initial non-clustered image data; the performing re-clustering processing on the initial clustering result to obtain the first clustered image data and the non-clustered instance image data includes: adding image data of other clusters and/or image data in the initial non-clustered image data to a third current cluster of the initial clustered image data according to image feature distance to obtain a fourth current cluster, and, when an independence index of the fourth current cluster reaches a preset threshold, replacing the third current cluster with the fourth current cluster to obtain the first clustered image data.
  • in this way, the discriminability of the feature representation can be gradually increased, and more non-clustered data can be added to the new clusters, so as to gradually increase the credible clusters.
  • the training the initial network through the training image data to obtain the re-identification network includes: determining an image data center based on the training image data; determining a contrast loss based on the training image data and the image data center, and optimizing the parameters of the initial network based on the contrast loss to obtain an optimized network; clustering the non-clustered instance image data in the training image data through the optimized network, and updating the first clustered image data and the non-clustered instance image data according to the clustering result to obtain new training image data; and determining a new image data center based on the new training image data, and returning to the step of determining a new contrast loss based on the new training image data and the new image data center, until the training is completed and the re-identification network is obtained.
  • the embodiments of the present disclosure dynamically optimize the network, update the training data, and update the image data center, so as to improve the training performance of the re-identification network, thereby improving the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.
  • the image data center includes a first cluster center corresponding to the first clustered image data and an instance center corresponding to the non-clustered instance image data; or, the image data center includes the first cluster center corresponding to the first clustered image data, the instance center corresponding to the non-clustered instance image data, and a second cluster center corresponding to the second clustered image data.
  • in this way, the network training can be performed through unsupervised learning, or the second clustered image data can be introduced for training using semi-supervised learning, which provides flexibility and diversity for network training.
  • the re-identification network includes a residual network.
  • the residual network is a network composed of residual blocks; the residual blocks inside the network use skip connections, which helps to alleviate the vanishing-gradient and exploding-gradient problems, makes the residual network easy to optimize, and at the same time improves the performance of image classification and object recognition.
  • the present disclosure provides an object re-identification device, which includes: a network acquisition module configured to acquire a pre-trained re-identification network; an image acquisition module configured to acquire an image to be recognized; and a re-identification module configured to perform re-identification processing on the image to be recognized through the re-identification network to obtain a re-identification result of a target object in the image to be recognized; wherein the training image data of the re-identification network includes at least first clustered image data and non-clustered instance image data, the first clustered image data and the non-clustered instance image data are obtained by clustering a first image data set through an initial network corresponding to the re-identification network, and the image data in the first image data set does not contain true cluster labels.
  • the present disclosure provides a computer device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the above object re-identification method is implemented when the processor executes the program.
  • the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored, and the computer-executable instructions are configured to implement the above-mentioned object re-identification method when executed by a processor.
  • the embodiments of the present disclosure provide a computer program product, wherein the above-mentioned computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the above-mentioned computer program is operable to cause a computer to perform some or all of the steps described in the object re-identification method in the embodiments of the present disclosure.
  • the computer program product may be a software installation package.
  • FIG. 1 is a schematic diagram of a re-identification network obtained through network training in an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of processing target domain image data in an embodiment of the disclosure
  • FIG. 3 is a schematic diagram of performing re-clustering processing on initial clustering results in an embodiment of the disclosure to obtain first clustered image data and non-clustered instance image data;
  • FIG. 4 is an example diagram of calculating a density index in an embodiment of the disclosure
  • FIG. 5 is a schematic diagram of performing re-clustering processing on initial clustering results in an embodiment of the present disclosure to obtain first clustered image data and non-clustering instance image data;
  • FIG. 6 is an example diagram of calculating an independence index in an embodiment of the disclosure.
  • FIG. 7 is a schematic diagram of training the initial network through training image data in an embodiment of the disclosure to obtain a re-identification network
  • FIG. 8 is a schematic diagram of object re-identification through a re-identification network in an embodiment of the disclosure.
  • FIG. 9 is a schematic diagram of a method for re-identification network training in an embodiment of the disclosure.
  • FIG. 10a is a schematic diagram of a method of re-clustering processing according to an embodiment of the disclosure.
  • FIG. 10b is a schematic diagram of another re-clustering processing method according to an embodiment of the disclosure.
  • FIG. 11 is a schematic diagram of a re-identification network training device in an embodiment of the disclosure.
  • FIG. 12 is a schematic diagram of an object re-identification device in an embodiment of the disclosure.
  • the word "if" as used herein can be interpreted as "when", "in response to determining", or "in response to detecting".
  • similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science. It attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology and machine learning/deep learning.
  • Computer Vision is a science that studies how to make machines "see". In some embodiments of this disclosure, it refers to using cameras and computers instead of human eyes to identify, track, and measure objects, and to further perform graphics processing so that the result is more suitable for human observation or for transmission to an instrument for detection.
  • computer vision studies related theories and technologies, trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as facial recognition and fingerprint recognition.
  • Machine Learning is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. It studies how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Target re-identification is an important problem in the fields of computer vision and security monitoring; it requires retrieving images of the corresponding target from a data set.
  • the target can be a pedestrian, a vehicle, etc.
  • the network shows an unavoidable performance degradation, which is caused by differences between image domains, such as camera environment, lighting, background, and shooting equipment.
  • it is unrealistic to label different training data for each monitoring scene for network training, because labeling requires a lot of manpower and time.
  • the method based on pseudo-labels is a common method.
  • This method aims at self-training by continuously clustering on unlabeled target domains to generate pseudo-labels, and can achieve state-of-the-art performance.
  • the clustering process will produce certain abnormal points, that is, edge samples that cannot be classified into any cluster. In order to ensure the quality of the clustering, these abnormal points are directly discarded and not included in the training set.
  • in the process of self-training of the network, only the image data with pseudo-labels in the target domain is used, and outliers that are not included in any cluster are discarded.
  • however, these outliers may be difficult but valuable image samples; discarding them limits the clustering performance of the network and may have a certain impact on the clustering results of the network.
  • the present disclosure proposes an object re-identification method.
  • the re-identification network used in the method is trained based on at least the first clustered image data and the non-clustered instance image data. Therefore, by combining the outliers that are not in any cluster into network training, the present disclosure helps to improve the clustering performance of the re-identification network, and further improves the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.
  • the object re-recognition method proposed in the embodiments of the present disclosure can be divided into two parts, including a network training part and a network application part; among them, the network training part relates to the technical field of machine learning.
  • in the network training part, machine learning techniques are used to train the initial network to obtain a trained re-identification network; in the network application part, the re-identification network trained in the network training part is used to obtain the re-identification result of the target object in the image to be recognized.
  • the method steps of the network training part of the present disclosure can be implemented by a terminal or a server.
  • Fig. 1 is a schematic diagram of a re-identification network obtained through network training in an embodiment of the disclosure. As shown in Fig. 1, the processing flow includes the following steps:
  • the initial network is the initial network to be trained, and the initial network has certain object re-identification capabilities.
  • the initial network can be, for example, a residual network (Residual Network, ResNet), etc.
  • the residual network is a network composed of residual blocks (Residual blocks), and the residual blocks inside the network use skip connections, which helps to alleviate the vanishing-gradient and exploding-gradient problems, makes the residual network easy to optimize, and at the same time improves the performance of image classification and object recognition.
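  • as a concrete illustration of the skip connection described above, the following is a minimal sketch of a basic residual block; the use of PyTorch and the specific layer configuration are assumptions for illustration and are not taken from the present disclosure.

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # the skip connection keeps the identity path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # add the identity back before the final activation
        return self.relu(out)
```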
  • the network training method may adopt unsupervised learning.
  • Unsupervised learning refers to a process of network training using only unlabeled image data in a target domain, and the target domain may be the first surveillance scene.
  • the training image data of the re-identification network includes the first clustered image data and the non-clustered instance image data.
  • the first clustered image data and the non-clustered instance image data are obtained by clustering the first image data set through the initial network corresponding to the re-identification network, and the image data in the first image data set does not contain true cluster labels; the first image data set corresponds to the image data of the target domain.
  • the network training method may adopt semi-supervised learning.
  • Semi-supervised learning refers to a process of network training using both labeled image data in the source domain and unlabeled image data in the target domain.
  • the source domain may be a second surveillance scene.
  • the labeled image data in the source domain has a ground-truth (true value) label.
  • the ground-truth can be manually labeled, and the ground-truth can provide valuable supervision during the network training process.
  • the training image data of the re-identification network includes at least the first clustered image data, the non-clustered instance image data, and the second image data set.
  • the first clustered image data and the non-clustered instance image data are obtained by clustering the first image data set through the initial network corresponding to the re-identification network, and the image data in the first image data set does not contain true cluster labels; the first image data set corresponds to the image data of the target domain.
  • the second clustered image data in the second image data set contains the true cluster label, and the second image data set corresponds to the image data of the source domain; the image data domain where the second image data set is located is different from the image data domain where the first image data set is located.
  • when semi-supervised learning is adopted, the step of obtaining training image data includes the steps of obtaining labeled source domain image data, obtaining unlabeled target domain image data, and processing the target domain image data.
  • the labeled source domain image data can be directly acquired.
  • the step of acquiring training image data includes the steps of acquiring unlabeled target domain image data and processing the target domain image data.
  • S220 Obtain an initial clustering result obtained by performing clustering processing on the first image data set through the initial network
  • S240 Perform re-clustering processing on the initial clustering result to obtain first clustered image data and non-clustered instance image data.
  • the first image data set corresponds to the target domain image data.
  • the first image data set is initially clustered through the initial network to obtain the initial clustering result corresponding to the first image data set, and then re-clustering processing is performed on the initial clustering result to obtain the first clustered image data and the non-clustered instance image data.
  • the above processing flow for processing the image data of the target domain can be understood as a self-paced contrastive learning strategy, that is, according to the principle of "from simple to difficult", the most credible clusters are obtained first, and then the credible clusters are gradually increased through the re-clustering processing, thereby improving the quality of the learning objectives and reducing errors by increasing the credible clusters.
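  • the present disclosure does not prescribe a specific clustering algorithm for the initial clustering; as one hedged illustration, a density-based algorithm such as DBSCAN could be used, with un-clustered samples (label -1) kept as non-clustered instance image data. The sketch below, including the parameter values, is an assumption for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def initial_clustering(features: np.ndarray, eps: float = 0.6, min_samples: int = 4):
    """Cluster L2-normalized target-domain features and split the result into
    initially clustered image data and initially non-clustered image data."""
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    clustered = {k: np.where(labels == k)[0] for k in set(labels) if k != -1}
    non_clustered = np.where(labels == -1)[0]   # outliers kept as individual instances
    return clustered, non_clustered
```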
  • the initial clustering result includes initial clustered image data
  • FIG. 3 is a schematic diagram of performing re-clustering processing on the initial clustering results in an embodiment of the present disclosure to obtain first clustered image data and non-clustering instance image data. As shown in FIG. 3, the processing flow includes the following steps:
  • S242A according to the image feature distance, reduce the number of image data of the first current cluster in the initial clustered image data to obtain the second current cluster;
  • S244A Determine the density index of the second current cluster, where the density index is the ratio of the number of image data of the second current cluster to the number of image data of the first current cluster;
  • S248A Update the reduced image data to the image data belonging to the non-clustering instance.
  • for the image data in the first current cluster, the image feature distance of each image data meets the clustering standard, namely △d≤d1, where △d is the image feature distance, and d1 is the distance corresponding to the clustering standard.
  • after tightening the clustering standard (reducing the distance corresponding to the clustering standard), for example when the clustering standard becomes d2 with d2<d1, it may happen that the image feature distance of some image data is greater than the clustering standard, namely △d>d2; at this time, the image data with △d≤d2 is retained and the image data with △d>d2 is removed from the first current cluster, so that the number of image data in the first current cluster is reduced and a new second current cluster is obtained.
  • the density index of the second current cluster is calculated, and the density index is used to evaluate the density of the cluster.
  • Fig. 4 is an example diagram of calculating the density index.
  • the dots represent image data
  • the black dots represent the retained image data
  • the white dots represent the removed image data
  • the solid line area represents the first current cluster clu1
  • the dotted area represents the second current cluster clu2.
  • the density index P is calculated, the density index P is compared with the corresponding first preset threshold θ_P, and it is determined whether to retain the new cluster (that is, the second current cluster) according to the comparison result.
  • when the density index P of the second current cluster clu2 reaches the preset density requirement, the first current cluster is disbanded, the second current cluster is retained, and the second current cluster is used to update the first clustered image data.
  • the image data removed from the cluster is updated to belong to the non-clustered instance image data. For example, referring to Fig. 4, when P is 5/7 and θ_P is 0.5, P>θ_P.
  • the first current cluster is replaced by the second current cluster, and the image data of the first cluster is updated.
  • re-clustering is performed by evaluating the density of clusters, so as to gradually increase credible clusters, thereby improving the quality of the learning target, and reducing errors by increasing credible clusters.
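  • the compactness check above can be illustrated with the following minimal sketch; measuring the image feature distance to a cluster centroid and the default threshold value are assumptions for illustration, since the present disclosure only specifies the ratio P and the threshold θ_P.

```python
import numpy as np

def tighten_cluster(cluster_feats: np.ndarray, centroid: np.ndarray,
                    d2: float, theta_p: float = 0.5):
    """Tighten the distance criterion from d1 to d2 (d2 < d1) for one cluster.

    Keeps the second current cluster only if the density index
    P = |second cluster| / |first cluster| reaches theta_p."""
    dists = np.linalg.norm(cluster_feats - centroid, axis=1)   # image feature distances
    keep = np.where(dists <= d2)[0]
    removed = np.where(dists > d2)[0]
    density_index = len(keep) / len(cluster_feats)             # e.g. 5/7 in the Fig. 4 example
    if density_index >= theta_p:
        return keep, removed, True      # replace clu1 with clu2; removed data become instances
    return np.arange(len(cluster_feats)), np.array([], dtype=int), False   # keep clu1 unchanged
```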
  • the embodiments of the present disclosure further provide a clustering credibility evaluation criterion, which re-clusters the initial clustering results by evaluating the independence of the clusters, thereby increasing the number of credible clusters.
  • the initial clustering result includes initial clustered image data and initial non-clustered image data.
  • FIG. 5 is a schematic diagram of performing re-clustering processing on initial clustering results in an embodiment of the disclosure to obtain first clustered image data and non-clustering instance image data. As shown in FIG. 5, the processing flow includes the following steps:
  • S244B Determine the independence index of the fourth current cluster; the independence index is the ratio of the number of image data of the third current cluster to the number of image data of the fourth current cluster;
  • re-clustering is performed by lowering the clustering standard to verify whether the independence of clustering meets the preset requirements.
  • for the image data in the third current cluster, the image feature distance of each image data meets the clustering standard, namely △d≤d1, where △d is the image feature distance, and d1 is the distance corresponding to the clustering standard.
  • after lowering the clustering standard (increasing the distance corresponding to the clustering standard) to d3, the non-current-cluster image data with △d'≤d3 is added to the third current cluster, the number of image data in the third current cluster increases, and a new fourth current cluster is obtained.
  • the added image data may include only image data of other clusters that meets the requirement, may include only image data in the initial non-clustered image data that meets the requirement, or may include both image data of other clusters and image data in the initial non-clustered image data that meet the requirement.
  • the independence index of the fourth current cluster is calculated, and the independence index is used to evaluate the independence of the cluster.
  • Figure 6 is an example diagram for calculating the independence index.
  • the solid line areas represent the clusters that existed before re-clustering, that is, the clusters in the initial clustered image data, including the third current cluster clu3 and the other clusters clui
  • the dots represent the image data
  • the black dots represent the image data in the initial clustered image data
  • the white dots represent the image data in the initial non-clustered image data
  • the independence index Q is compared with the corresponding second preset threshold θ_Q, and it is determined whether to retain the new cluster (that is, the fourth current cluster) according to the comparison result.
  • when the independence index Q of the fourth current cluster clu4 meets the preset independence requirement, the third current cluster is disbanded, the fourth current cluster is retained, and the fourth current cluster is used to update the first clustered image data.
  • in the case where the added image data includes the image data of other clusters, the other clusters are dissolved; for example, when the independence index Q of the fourth current cluster clu4 reaches the preset independence requirement, the other clusters clui (i is an integer representing the cluster label) are dissolved.
  • in the case where the added image data includes image data from the initial non-clustered image data, the added image data is updated so that it no longer belongs to the non-clustered instance image data.
  • conversely, when the independence index Q of the fourth current cluster clu4 does not meet the preset independence requirement, the other clusters clui are retained.
  • in this case, if the added image data includes image data from the initial non-clustered image data, the added image data is updated so that it still belongs to the non-clustered instance image data.
  • re-clustering is performed by evaluating the independence of clusters, which can gradually increase the discriminability of the feature representations and add more non-clustered data to the new clusters, so as to gradually increase the credible clusters, thereby improving the quality of the learning objectives and reducing errors by increasing the credible clusters.
  • the corresponding preset thresholds can be set according to the actual situation, for example, both θ_P and θ_Q are set to 0.5.
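  • the independence check above can likewise be illustrated with a minimal sketch; as before, measuring the image feature distance to a cluster centroid and the default threshold value are illustrative assumptions, since the present disclosure only specifies the ratio Q and the threshold θ_Q.

```python
import numpy as np

def loosen_cluster(clu3_idx: np.ndarray, all_feats: np.ndarray, centroid: np.ndarray,
                   d3: float, theta_q: float = 0.5):
    """Loosen the distance criterion from d1 to d3 (d3 > d1) for one cluster.

    Candidate image data (from other clusters and/or non-clustered instances) whose
    feature distance is <= d3 is merged into the third current cluster, giving a
    fourth current cluster; the merge is kept only if the independence index
    Q = |third cluster| / |fourth cluster| reaches theta_q."""
    dists = np.linalg.norm(all_feats - centroid, axis=1)
    clu4_idx = np.union1d(np.where(dists <= d3)[0], clu3_idx)   # enlarged fourth current cluster
    independence_index = len(clu3_idx) / len(clu4_idx)          # close to 1 => already independent
    if independence_index >= theta_q:
        return clu4_idx, True    # replace clu3 with clu4 and dissolve the absorbed clusters
    return clu3_idx, False       # reject the merge; the added data stays non-clustered
```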
  • FIG. 7 is a schematic diagram of training the initial network through training image data to obtain the re-identification network in an embodiment of the disclosure. As shown in FIG. 7, the processing flow includes the following steps:
  • S320 Determine an image data center based on the training image data
  • S380 Determine a new image data center based on the new training image data, and return to the step of determining a new contrast loss based on the new training image data and the new image data center, until the training is completed, and the re-identification network is obtained.
  • when semi-supervised learning is used for network training, the training data includes the first clustered image data, the non-clustered instance image data, and the second clustered image data.
  • the image data center includes a first cluster center corresponding to the first clustered image data, an instance center corresponding to the non-clustered instance image data, and a second cluster center corresponding to the second clustered image data.
  • define X^s to represent the second clustered image data in the second image data set (i.e. source domain data), X^t to represent the first image data set (i.e. target domain data), X^t_c to represent the first clustered image data, and X^t_o to represent the non-clustered instance image data.
  • the contrast loss can then be calculated by the following formula (1), and the parameters of the initial network can be optimized based on the contrast loss to obtain an optimized network:
  • L_f = -log( exp(<f, z^+>/τ) / Σ_{k=1}^{n_s+n_t^c+n_t^o} exp(<f, z_k>/τ) )    (1)
  • where f is the feature extracted from a training image, z^+ is the image data center that f belongs to, {z_k} = {w_1, ..., w_{n_s}, c_1, ..., c_{n_t^c}, v_1, ..., v_{n_t^o}} is the set of all image data centers, and τ is a temperature coefficient that is set to 0.05;
  • <a, b> represents the inner product between the two feature vectors a and b, which is used to measure the similarity of feature vectors;
  • n_s represents the number of clusters in the second clustered image data, n_t^c represents the number of clusters in the first clustered image data, and n_t^o represents the number of non-clustered instances;
  • w_k represents the second cluster center corresponding to the second clustered image data;
  • c_k represents the first cluster center corresponding to the first clustered image data;
  • v_k represents the instance center corresponding to the non-clustered instance image data.
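  • for illustration, formula (1) for a single feature can be sketched as follows; stacking all image data centers into one matrix is an assumption made only for this sketch.

```python
import numpy as np

def unified_contrastive_loss(f: np.ndarray, centers: np.ndarray,
                             positive_index: int, tau: float = 0.05) -> float:
    """Contrast loss of formula (1) for one L2-normalized feature f.

    `centers` stacks all image data centers {w_k} + {c_k} + {v_k} as rows
    (L2-normalized); `positive_index` points to the center that f belongs to."""
    logits = centers @ f / tau                  # inner products <f, z_k> / tau
    logits -= logits.max()                      # subtract the max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return float(-log_prob[positive_index])
```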
  • the non-clustered instance image data is clustered through the optimized network, and the first clustered image data and the non-clustered instance image data are updated according to the clustering result.
  • a hybrid memory can be used to store the first clustered image data, the non-clustered instance image data and the second clustered image data, as well as the image data centers corresponding to them.
  • a new image data center is determined based on the new training image data, that is, the image data center stored in the hybrid memory is updated and adjusted.
  • the update of the second cluster center can be adjusted on the basis of the original center, while the first cluster center and the instance center are recalculated based on the updated first clustered image data and non-clustered instance image data.
  • the update of the second cluster center w_k can be achieved by the following formula (2), a moving average of the original center and the mean feature of the corresponding class in the current batch B_k, with momentum coefficient m^s ∈ [0, 1]: w_k ← m^s·w_k + (1−m^s)·(1/|B_k|)·Σ_{f_i∈B_k} f_i    (2)
  • the update of the first cluster center c_k can be achieved by the following formula (3): c_k = (1/|I_k|)·Σ_{f_i∈I_k} f_i    (3)
  • where I_k is the k-th cluster in the first clustered image data and |I_k| represents the number of features in the cluster.
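  • a minimal sketch of one hybrid-memory update is given below; the value of the momentum coefficient, the contiguous integer class labels, and the L2 re-normalization of the centers are assumptions for illustration.

```python
import numpy as np

def update_hybrid_memory(w: np.ndarray, source_feats: np.ndarray, source_labels: np.ndarray,
                         v: np.ndarray, clusters: dict, momentum: float = 0.2):
    """One hybrid-memory update step.

    - second cluster centers w[k]: moving average towards the mean source feature of
      class k in the current batch (formula (2));
    - first cluster centers c_k: recomputed as the mean of the instance features v in
      cluster I_k (formula (3)); `clusters` maps cluster id k -> indices into v."""
    for k in np.unique(source_labels):          # classes assumed to be labeled 0..n_s-1
        batch_mean = source_feats[source_labels == k].mean(axis=0)
        w[k] = momentum * w[k] + (1.0 - momentum) * batch_mean
        w[k] /= np.linalg.norm(w[k])            # keep the centers L2-normalized
    c = {k: v[idx].mean(axis=0) for k, idx in clusters.items()}
    c = {k: ck / np.linalg.norm(ck) for k, ck in c.items()}
    return w, c
```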
  • after updating the hybrid memory, the flow returns to step (2) to perform iterative network training until the network converges, and the re-identification network is obtained.
  • the method steps of the network application part of the present disclosure may be implemented by a terminal or a server, and the execution subject of the method steps of the network application part may be the same as or different from the execution subject of the method steps of the network training part.
  • FIG. 8 is a schematic diagram of object re-identification through a re-identification network in an embodiment of the disclosure. As shown in FIG. 8, the processing flow includes the following steps:
  • the re-identification network is obtained by training through the method steps of the network training part in the above embodiments of the present disclosure.
  • the training image data of the re-identification network includes at least the first clustered image data and non-clustered instance image data; the first clustered image data and the non-clustered instance image data are obtained by clustering the first image data set through the initial network corresponding to the re-identification network, and the image data in the first image data set does not contain the true cluster labels.
  • the training image data of the re-recognition network further includes a second image data set, and the second cluster image data in the second image data set contains the true cluster label;
  • the image data domain where the second image data set is located is different from the image data domain where the first image data set is located.
  • This embodiment provides an object re-identification method.
  • the re-identification network used in the method is trained based on at least the first clustered image data and the non-clustered instance image data; therefore, by combining the outliers that are not in any cluster into network training, the present disclosure helps to improve the clustering performance of the re-identification network, and further improves the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.
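  • the present disclosure does not detail how the re-identification result is formed from the network output; as a hedged illustration, a typical application step is to extract features with the trained re-identification network and rank gallery images by cosine similarity, as sketched below (the function and parameter names are assumptions).

```python
import numpy as np

def re_identify(query_feat: np.ndarray, gallery_feats: np.ndarray,
                gallery_ids: list, top_k: int = 5):
    """Rank gallery images of the target object by similarity to the image to be recognized.

    `query_feat` and `gallery_feats` are features extracted by the trained
    re-identification network; `gallery_ids` are the corresponding image identifiers."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = g @ q                               # cosine similarity to the query
    order = np.argsort(-scores)[:top_k]          # best matches first
    return [(gallery_ids[i], float(scores[i])) for i in order]
```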
  • the purely unsupervised problem aims to learn discriminative features without any labeled data, that is, without the aid of labeled data in the source domain, target re-identification can be performed directly and effectively on the target domain in an unsupervised manner.
  • the method based on pseudo-label is the most effective.
  • This type of method aims at self-training by continuously clustering on unlabeled target domains to generate pseudo-labels, and can achieve state-of-the-art performance.
  • this type of method has the following shortcomings, which limit its performance improvement: first, the clustering process will produce certain clustering abnormal samples, that is, edge samples that cannot be classified into any cluster; to ensure the quality of clustering, existing methods directly discard these clustering abnormal samples and do not include them in the training set.
  • second, cluster-based unsupervised domain adaptation algorithms often use source domain data for pre-training, then load the pre-trained model and train it with the pseudo-labels generated by clustering and the unlabeled target domain samples, so as to migrate it to the target domain.
  • the algorithm discards valuable source domain data during the training process of the target domain, wastes data with real labels in the source domain, and causes loss of source domain performance.
  • third, the related contrastive learning loss functions only consider instance-level supervision.
  • the method proposed in the embodiments of the present disclosure achieves state-of-the-art recognition accuracy on the unsupervised domain-adaptive pedestrian and vehicle re-identification problems, and can effectively improve source domain performance without manual labeling.
  • the method of the embodiments of the present disclosure can be simply extended to the problem of unsupervised target re-identification, that is, by removing source domain data in training and source domain class-level supervision, the performance is significantly improved compared to related methods.
  • the self-paced contrastive learning strategy proposed in the embodiments of the present disclosure is based on the principle of "from simple to difficult": the most credible clusters are learned first, and the credible clusters are then gradually increased to improve the quality of the learning objectives, thereby reducing errors by increasing the credible clusters.
  • This strategy provides a cluster credibility evaluation criterion: by evaluating the independence and compactness of the clusters, the most credible clusters are selected for retention, and the remaining clusters are dissolved back into un-clustered samples that provide instance-level supervision.
  • the image encoder of the algorithm of the embodiments of the present disclosure can be used to extract the feature information of a target image; the features extracted by the algorithm can be used to retrieve pedestrians or vehicles in security monitoring scenes; and the algorithm can be used to improve the capability of image encoders without supervision.
  • FIG. 9 is a schematic diagram of a method for training a re-identification network using semi-supervised learning according to an embodiment of the disclosure.
  • the training method of the re-identification network includes the following steps:
  • Step S901 Obtain a residual network (initial network) 901;
  • Step S903: Perform clustering processing on the target domain image data X^t in the first image data set through the residual network 901 to obtain an initial clustering result, where the initial clustering result includes the initial clustered image data and the initial non-clustered image data;
  • Step S905 Determine an image data center based on the training image data
  • the training image data includes the first clustered image data, the non-clustered instance image data, and the second clustered image data;
  • the image data center includes the first cluster center corresponding to the first clustered image data, the instance center corresponding to the non-clustered instance image data, and the second cluster center corresponding to the second clustered image data; the determined first cluster centers, second cluster centers and instance centers are all stored in the hybrid memory 902.
  • step S905 may include the following steps:
  • Step S9051: Determine a contrast loss based on the training image data and the image data center, and perform parameter optimization on the residual network 901 based on the contrast loss to obtain an optimized network;
  • Step S9052: cluster the non-clustered instance image data in the training image data through the optimized network, and update the first clustered image data and the non-clustered instance image data in the hybrid memory 902 according to the clustering result to obtain new training image data f^s and f^t,
  • where f^s includes the second clustered image data, and f^t includes the updated first clustered image data and the non-clustered instance image data;
  • Step S9053: Determine a new image data center based on the new training image data, and return to the step of determining a new contrast loss based on the new training image data and the new image data center, until the training is completed, to obtain the re-identification network.
  • the hybrid memory 902 can be updated according to the new training data f^s and f^t.
  • in step S904, re-clustering processing is performed on the initial clustering result; as shown in Fig. 10a, this may include the following steps:
  • Step S9041 According to the image feature distance, reduce the number of image data of the first current cluster in the initial clustered image data to obtain a second current cluster;
  • Step S9042 Determine the density index of the second current cluster, where the density index is the ratio of the number of image data of the second current cluster to the number of image data of the first current cluster;
  • the number of image data in the second current cluster is 5, and the number of image data in the first current cluster is 7, and the density index of the second current cluster is 5/7.
  • Step S9043 When the density index reaches a first preset threshold, replace the first current cluster with the second current cluster to obtain the first cluster image data 90211;
  • the first cluster image data may be image data in the second current cluster 102a.
  • Step S9044 update the reduced image data to belong to the non-clustering instance image data 90212.
  • the reduced image data 1011a and image data 1012a can be updated to belong to the non-clustering instance image data 90212.
  • the non-clustered instance image data includes the initial non-clustered image data represented by the gray dots, as well as the image data 1011a and the image data 1012a.
  • re-clustering processing is performed on the initial clustering result in step S904, which can be seen in FIG. 10b, including the following steps:
  • Step S9045 Add image data of other clusters and/or image data in the initial non-clustered image data to the third current cluster of the initial clustered image data according to the image feature distance to obtain a fourth current cluster,
  • the other clusters are clusters that are different from the third current cluster in the initial clustered image data;
  • the dots can represent image data
  • the white dots can represent the initial clustered image data
  • the gray dots can represent the initial non-clustered image data; the solid line areas represent the third current cluster 101b and the other cluster 102b that existed before the re-clustering processing.
  • the distance corresponding to the clustering standard changes from d1 to d3, and d3>d1.
  • the image feature distances of the initial non-clustered image data 1011b, the initial non-clustered image data 1012b, and the initial non-clustered image data 1013b are all less than d3 .
  • the initial non-clustered image data 1011b, the initial non-clustered image data 1012b, the initial non-clustered image data 1013b, and the image data in other clusters 102b are added to the third current cluster 101b, and the images in the third current cluster 101b The data increases, and a new fourth current cluster 103b is obtained.
  • Step S9046 Determine the independence index of the fourth current cluster; the independence index is the ratio of the number of image data of the third current cluster to the number of image data of the fourth current cluster;
  • Step S9047 When the independence index reaches a first preset threshold, replace the third current cluster with the fourth current cluster to obtain the first cluster image data;
  • the first clustered image data 90211 may be image data in the fourth current cluster 103b.
  • a re-identification network training device is provided.
  • FIG. 11 is a schematic diagram of a re-identification network training device in an embodiment of the disclosure. As shown in FIG. 11, the device includes the following modules:
  • the first obtaining module 100 is configured to obtain an initial network
  • the network training module 300 is configured to train the initial network through training image data to obtain the re-identification network.
  • Each module in the above-mentioned re-identification network training device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • an object re-identification device is provided.
  • FIG. 12 is a schematic diagram of an object re-identification device in an embodiment of the disclosure. As shown in FIG. 12, the device includes the following modules:
  • the network acquisition module 400 is configured to acquire a pre-trained re-identification network
  • the image acquisition module 500 is configured to acquire an image to be recognized
  • the re-recognition module 600 is configured to perform re-recognition processing on the image to be recognized through the re-recognition network to obtain the re-recognition result of the target object in the image to be recognized;
  • the training image data of the re-identification network includes at least the first clustered image data and the non-clustered instance image data; the first clustered image data and the non-clustered instance image data are obtained by performing clustering processing on the first image data set through the initial network corresponding to the re-identification network, and the image data in the first image data set does not contain real cluster labels.
  • the training image data of the re-identification network further includes a second image data set, and the second clustered image data in the second image data set includes a true cluster label; the image data domain where the second image data set is located is different from the image data domain where the first image data set is located.
  • the device further includes: an initial network acquisition module configured to acquire the initial network; a data acquisition module configured to acquire the training image data; and a training module configured to train the initial network through the training image data to obtain the re-identification network.
  • the initial clustering result includes initial clustered image data; the clustering processing unit is configured to: reduce the number of image data of the first current cluster in the initial clustered image data according to the image feature distance to obtain the second current cluster; determine the density index of the second current cluster, where the density index is the ratio of the number of image data of the second current cluster to the number of image data of the first current cluster; when the density index reaches the first preset threshold, replace the first current cluster with the second current cluster to obtain the first clustered image data; and update the removed image data to belong to the non-clustered instance image data.
  • the initial clustering result further includes initial non-clustered image data;
  • the clustering processing unit is further configured to: add image data of other clusters and/or image data in the initial non-clustered image data to the third current cluster of the initial clustered image data according to the image feature distance to obtain a fourth current cluster, where the other clusters are clusters in the initial clustered image data that are different from the third current cluster; determine the independence index of the fourth current cluster, the independence index being the ratio of the number of image data of the third current cluster to the number of image data of the fourth current cluster; when the independence index reaches the first preset threshold, replace the third current cluster with the fourth current cluster to obtain the first clustered image data; in the case where the added image data includes the image data of the other clusters, dissolve the other clusters; and/or, in the case where the added image data includes the image data in the initial non-clustered image data, update the added image data so that it no longer belongs to the non-clustered instance image data.
  • the training module includes: a first determining unit configured to determine an image data center based on the training image data; an optimization unit configured to determine a contrast loss based on the training image data and the image data center, and to optimize the parameters of the initial network based on the contrast loss to obtain an optimized network; a clustering unit configured to cluster the non-clustered instance image data in the training image data through the optimized network, and to update the first clustered image data and the non-clustered instance image data according to the clustering result to obtain new training image data; and a second determining unit configured to determine a new image data center based on the new training image data, and to return to the step of determining a new contrast loss based on the new training image data and the new image data center, until the training is completed and the re-identification network is obtained.
  • the image data center includes a first cluster center corresponding to the first clustered image data and an instance center corresponding to the non-clustered instance image data; or, the image data center includes the first cluster center corresponding to the first clustered image data, the instance center corresponding to the non-clustered instance image data, and a second cluster center corresponding to the second clustered image data.
  • the re-identification network includes a residual network.
  • Each module in the above-mentioned object re-identification device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device including: a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • when the processor executes the program, the method steps of the network training part of the above embodiments and/or the method steps of the network application part are implemented.
  • the present disclosure combines outliers that are not in any cluster for network training, which helps to improve the clustering performance of the re-identification network, and further improves the accuracy of the target object re-identification result obtained by the object re-identification method of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

According to embodiments, the present invention relates to an object re-identification method and apparatus, and a storage medium and a computer device. The method comprises: obtaining a pre-trained re-identification network; obtaining an image to be recognized; and performing re-identification processing on said image by means of the re-identification network to obtain a re-identification result of a target object in said image. Training image data of the re-identification network comprises at least first clustered image data and non-clustered instance image data. The first clustered image data and the non-clustered instance image data are obtained by clustering a first image data set by means of an initial network corresponding to the re-identification network, and image data in the first image data set does not contain true cluster labels.
PCT/CN2020/126269 2020-06-04 2020-11-03 Procédé et appareil de réidentification d'objet, terminal et support d'enregistrement WO2021243947A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021549335A JP2022548187A (ja) 2020-06-04 2020-11-03 対象再識別方法および装置、端末並びに記憶媒体
KR1020217025979A KR20210151773A (ko) 2020-06-04 2020-11-03 대상 재인식 방법 및 장치, 단말 및 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010499288.7A CN111612100B (zh) 2020-06-04 2020-06-04 对象再识别方法、装置、存储介质及计算机设备
CN202010499288.7 2020-06-04

Publications (1)

Publication Number Publication Date
WO2021243947A1 true WO2021243947A1 (fr) 2021-12-09

Family

ID=72202637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126269 WO2021243947A1 (fr) 2020-06-04 2020-11-03 Procédé et appareil de réidentification d'objet, terminal et support d'enregistrement

Country Status (5)

Country Link
JP (1) JP2022548187A (fr)
KR (1) KR20210151773A (fr)
CN (1) CN111612100B (fr)
TW (1) TWI780567B (fr)
WO (1) WO2021243947A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612100B (zh) * 2020-06-04 2023-11-03 商汤集团有限公司 对象再识别方法、装置、存储介质及计算机设备
CN112965890B (zh) * 2021-03-10 2024-06-07 中国民航信息网络股份有限公司 一种数据处理方法及相关设备
CN113221820B (zh) * 2021-05-28 2022-07-19 杭州网易智企科技有限公司 一种对象识别方法、装置、设备及介质
CN116682043B (zh) * 2023-06-13 2024-01-26 西安科技大学 基于SimCLR无监督深度对比学习异常视频清洗方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288051B (zh) * 2018-02-14 2020-11-27 北京市商汤科技开发有限公司 行人再识别模型训练方法及装置、电子设备和存储介质
US10176405B1 (en) * 2018-06-18 2019-01-08 Inception Institute Of Artificial Intelligence Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations
US11537817B2 (en) * 2018-10-18 2022-12-27 Deepnorth Inc. Semi-supervised person re-identification using multi-view clustering
CN110263697A (zh) * 2019-06-17 2019-09-20 哈尔滨工业大学(深圳) 基于无监督学习的行人重识别方法、装置及介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090034791A1 (en) * 2006-12-04 2009-02-05 Lockheed Martin Corporation Image processing for person and object Re-identification
CN106022293A (zh) * 2016-05-31 2016-10-12 华南农业大学 一种基于自适应共享小生境进化算法的行人再识别方法
CN108921107A (zh) * 2018-07-06 2018-11-30 北京市新技术应用研究所 基于排序损失和Siamese网络的行人再识别方法
CN109740653A (zh) * 2018-12-25 2019-05-10 北京航空航天大学 一种融合视觉表观与时空约束的车辆再识别方法
CN109961051A (zh) * 2019-03-28 2019-07-02 湖北工业大学 一种基于聚类和分块特征提取的行人重识别方法
CN111210269A (zh) * 2020-01-02 2020-05-29 平安科技(深圳)有限公司 基于大数据的对象识别方法、电子装置及存储介质
CN111612100A (zh) * 2020-06-04 2020-09-01 商汤集团有限公司 对象再识别方法、装置、存储介质及计算机设备

Also Published As

Publication number Publication date
JP2022548187A (ja) 2022-11-17
KR20210151773A (ko) 2021-12-14
TW202147156A (zh) 2021-12-16
CN111612100A (zh) 2020-09-01
TWI780567B (zh) 2022-10-11
CN111612100B (zh) 2023-11-03

Similar Documents

Publication Publication Date Title
CN111814854B (zh) 一种无监督域适应的目标重识别方法
Han et al. A unified metric learning-based framework for co-saliency detection
CN110414432B (zh) 对象识别模型的训练方法、对象识别方法及相应的装置
CN111709409B (zh) 人脸活体检测方法、装置、设备及介质
WO2021243947A1 (fr) Procédé et appareil de réidentification d'objet, terminal et support d'enregistrement
CN111754596B (zh) 编辑模型生成、人脸图像编辑方法、装置、设备及介质
CN108960080B (zh) 基于主动防御图像对抗攻击的人脸识别方法
CN109948475B (zh) 一种基于骨架特征和深度学习的人体动作识别方法
CN109558823B (zh) 一种以图搜图的车辆识别方法及系统
CN107133569B (zh) 基于泛化多标记学习的监控视频多粒度标注方法
CN112395951B (zh) 一种面向复杂场景的域适应交通目标检测与识别方法
CN112418032B (zh) 一种人体行为识别方法、装置、电子设备及存储介质
CN111046732A (zh) 一种基于多粒度语义解析的行人重识别方法及存储介质
CN110111365B (zh) 基于深度学习的训练方法和装置以及目标跟踪方法和装置
Xiong et al. Contrastive learning for automotive mmWave radar detection points based instance segmentation
CN113065409A (zh) 一种基于摄像分头布差异对齐约束的无监督行人重识别方法
CN110751005B (zh) 融合深度感知特征和核极限学习机的行人检测方法
Jemilda et al. Moving object detection and tracking using genetic algorithm enabled extreme learning machine
CN115223020A (zh) 图像处理方法、装置、电子设备以及可读存储介质
CN113095199A (zh) 一种高速行人识别方法及装置
CN110222772B (zh) 一种基于块级别主动学习的医疗图像标注推荐方法
CN113076963B (zh) 一种图像识别方法、装置和计算机可读存储介质
CN117152851B (zh) 基于大模型预训练的人脸、人体协同聚类方法
CN112487927B (zh) 一种基于物体关联注意力的室内场景识别实现方法及系统
CN113420608A (zh) 一种基于密集时空图卷积网络的人体异常行为识别方法

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021549335

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20939442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20939442

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/05/2023)