CN113761263A - Similarity determination method and device and computer readable storage medium

Similarity determination method and device and computer readable storage medium

Info

Publication number
CN113761263A
Authority
CN
China
Prior art keywords: image, camera, similarity, determining, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111050793.4A
Other languages
Chinese (zh)
Inventor
邓潇
李林森
莫致良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111050793.4A
Publication of CN113761263A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval using metadata automatically derived from the content
    • G06F16/5866 - Retrieval using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures


Abstract

The embodiments of this application disclose a similarity determination method and apparatus and a computer-readable storage medium, belonging to the technical field of image processing. In the embodiments of this application, for two images in which target objects are present, the image similarity between the two images is determined, together with the transition probability that a target object transfers from the shooting field of view of the second camera to the shooting field of view of the first camera within a reference duration; the similarity between the target objects present in the two images is then determined based on the image similarity and the transition probability. The transition probability is a spatio-temporal feature, not a feature in an image. That is, this scheme determines the similarity not only from features in the image but also from spatio-temporal features, so the accuracy of the determined similarity is high even when image quality is low.

Description

Similarity determination method and device and computer readable storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a similarity determination method and device and a computer readable storage medium.
Background
In the field of image processing technology, it is often necessary to determine the similarity between the target objects in any two images and then process the images according to that similarity. For example, the similarity between the target objects in any two images is determined, and target clustering is performed on multiple images based on the similarity between the target objects.
In the related art, suppose target clustering is performed according to the similarity between target objects. Taking persons as target objects as an example, a first similarity between the persons in a plurality of images is determined from the person features in those images, and the images are initially clustered based on the first similarity to obtain at least one initial image set. The age and gender of the person in each image are then obtained through an information extraction model, a second similarity between the persons in the images of each initial image set is determined based on those ages and genders, and the final image clustering result is determined based on the second similarity.
However, the portrait features and the person's age and gender are all determined from features in the image; once image quality is degraded by occlusion, illumination changes, overexposure, and the like, the accuracy of the similarity between target objects determined from in-image features is low.
Disclosure of Invention
The embodiment of the application provides a similarity determination method, a similarity determination device and a computer readable storage medium, which can improve the accuracy of the determined similarity. The technical scheme is as follows:
in one aspect, a method for determining similarity is provided, where the method includes:
determining image similarity between a first image and a second image, wherein the first image has a first target object, the second image has a second target object, the first image is obtained by shooting through a first camera, and the second image is obtained by shooting through a second camera;
determining a transition probability that the first target object transfers from the shooting field of view of the second camera to the shooting field of view of the first camera within a reference duration;
determining a similarity between the first target object and the second target object based on the image similarity and the transition probability.
Optionally, the determining the transition probability includes:
determining a time difference between a first shooting time and a second shooting time, wherein the first shooting time is the shooting time of the first image, and the second shooting time is the shooting time of the second image;
determining a time length range in which the time difference is positioned from a plurality of time length ranges;
and inputting the mark value of the time length range in which the time difference is positioned into a specified probability distribution model to obtain the transition probability output by the specified probability distribution model, wherein the mark value is used for indicating the corresponding time length range.
Optionally, before determining the transition probability, the method further includes:
determining the designated probability distribution model from a plurality of probability distribution models based on the identity of the first camera and the identity of the second camera;
wherein the specified probability distribution model corresponds to a reference camera pair, the reference camera pair comprising the first camera and the second camera, the probability distribution model characterizing a temporal probability distribution of a transition of the same target object between two cameras comprised by the corresponding one of the camera pairs, different probability distribution models corresponding to different camera pairs.
Optionally, before determining the specific probability distribution model from the plurality of probability distribution models according to the identifier of the first camera and the identifier of the second camera, the method further includes:
acquiring a plurality of observation sample pairs corresponding to each camera pair in the plurality of camera pairs, wherein each observation sample pair comprises shooting time of two observation images which are shot by the corresponding camera pair and have the same target object;
determining a plurality of time differences corresponding to the corresponding camera pairs based on the shooting time included in the plurality of observation sample pairs respectively corresponding to the camera pairs;
counting, among the plurality of time differences respectively corresponding to each camera pair, the number of time differences located in each of the plurality of time length ranges, to obtain a plurality of statistical frequency numbers corresponding to the corresponding camera pair;
and determining the probability distribution model corresponding to each camera pair based on the plurality of statistical frequency numbers respectively corresponding to each camera pair and the mark values of the plurality of time length ranges.
Optionally, the determining the similarity between the first target object and the second target object based on the image similarity and the transition probability includes:
determining a comprehensive feature vector based on the image similarity and the transition probability;
and inputting the comprehensive characteristic vector into an image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
Optionally, before determining the comprehensive feature vector based on the image similarity and the transition probability, the method further includes:
acquiring first attribute information and second attribute information, wherein the first attribute information represents the attribute of the first target object, and the second attribute information represents the attribute of the second target object;
determining an attribute feature vector based on the first attribute information and the second attribute information, the attribute feature vector characterizing attribute similarity between the first target object and the second target object;
the determining a synthetic feature vector based on the image similarity and the transition probability comprises:
determining the synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability.
Optionally, before determining the comprehensive feature vector based on the image similarity, the attribute feature vector, and the transition probability, the method further includes:
acquiring a distance between the first camera and the second camera and a time difference between the first shooting time and the second shooting time;
determining a moving speed of the first target object based on the distance and the time difference;
the determining the synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability includes:
and combining the image similarity, the attribute feature vector, the transition probability, the distance, the time difference and the moving speed to obtain the comprehensive feature vector.
Optionally, before the inputting the comprehensive feature vector into the image classification model, the method further includes:
acquiring a training sample set, wherein the training sample set comprises a plurality of positive samples and a plurality of negative samples, each positive sample comprises two sample images with the same target object, each negative sample comprises two sample images with different target objects, and the training sample set further comprises an identifier of a camera for shooting the sample images and shooting time;
determining a plurality of sample image similarities based on sample images included in the plurality of positive samples and the plurality of negative samples;
determining a plurality of sample transition probabilities based on the identities and capture times of cameras included in the plurality of positive samples and the plurality of negative samples, and a plurality of probability distribution models;
determining a plurality of sample synthetic feature vectors based on the plurality of sample image similarities and the plurality of sample transition probabilities;
and training an initial classification model through the multiple sample comprehensive characteristic vectors to obtain the image classification model.
Optionally, before determining the transition probability, the method further includes:
if the image similarity is greater than a first threshold and less than a second threshold, then the step of determining a transition probability is performed.
In another aspect, a similarity determination apparatus is provided, the apparatus including:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining the image similarity between a first image and a second image, the first image has a first target object, the second image has a second target object, the first image is obtained by shooting through a first camera, and the second image is obtained by shooting through a second camera;
a second determination module for determining a transition probability, the transition probability being a probability that the first target object transitions from the photographing view of the second camera to the photographing view of the first camera within a reference time period;
a third determining module, configured to determine a similarity between the first target object and the second target object based on the image similarity and the transition probability.
Optionally, the second determining module includes:
the first determining submodule is used for determining the time difference between first shooting time and second shooting time, wherein the first shooting time is the shooting time of the first image, and the second shooting time is the shooting time of the second image;
the second determining submodule is used for determining a time length range in which the time difference is positioned from a plurality of time length ranges;
and the first processing submodule is used for inputting the mark value of the time length range in which the time difference is positioned into a specified probability distribution model to obtain the transition probability output by the specified probability distribution model, and the mark value is used for indicating the corresponding time length range.
Optionally, the apparatus further comprises:
a fourth determining module configured to determine the assigned probability distribution model from a plurality of probability distribution models according to the identifier of the first camera and the identifier of the second camera;
wherein the specified probability distribution model corresponds to a reference camera pair, the reference camera pair comprising the first camera and the second camera, the probability distribution model characterizing a temporal probability distribution of a transition of the same target object between two cameras comprised by the corresponding one of the camera pairs, different probability distribution models corresponding to different camera pairs.
Optionally, the apparatus further comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a plurality of observation sample pairs corresponding to each camera pair in a plurality of camera pairs respectively, and each observation sample pair comprises the shooting time of two observation images which are shot by the corresponding camera pair and have the same target object;
a fifth determining module, configured to determine, based on shooting times included in a plurality of observation sample pairs respectively corresponding to each camera pair, a plurality of time differences corresponding to the corresponding camera pair;
the counting module is used for counting, among the plurality of time differences respectively corresponding to each camera pair, the number of time differences located in each of the plurality of time length ranges, to obtain a plurality of statistical frequency numbers corresponding to the corresponding camera pair;
and the sixth determining module is used for determining the probability distribution model corresponding to the corresponding camera pair based on the plurality of statistical frequency numbers respectively corresponding to each camera pair and the mark values of the plurality of time length ranges.
Optionally, the third determining module includes:
a third determining submodule, configured to determine a comprehensive feature vector based on the image similarity and the transition probability;
and the second processing submodule is used for inputting the comprehensive characteristic vector into an image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
Optionally, the third determining module further includes:
the first obtaining submodule is used for obtaining first attribute information and second attribute information, the first attribute information represents the attribute of the first target object, and the second attribute information represents the attribute of the second target object;
a fourth determining submodule, configured to determine an attribute feature vector based on the first attribute information and the second attribute information, where the attribute feature vector represents an attribute similarity between the first target object and the second target object;
the third determining submodule is specifically configured to:
determining the synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability.
Optionally, the third determining module further includes:
a second obtaining sub-module, configured to obtain a distance between the first camera and the second camera, and a time difference between the first shooting time and the second shooting time;
a fifth determination submodule for determining a moving speed of the first target object based on the distance and the time difference;
the third determining submodule is specifically configured to:
and combining the image similarity, the attribute feature vector, the transition probability, the distance, the time difference and the moving speed to obtain the comprehensive feature vector.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring a training sample set, wherein the training sample set comprises a plurality of positive samples and a plurality of negative samples, each positive sample comprises two sample images with the same target object, each negative sample comprises two sample images with different target objects, and the training sample set further comprises an identifier of a camera for shooting the sample images and shooting time;
a seventh determining module, configured to determine a plurality of sample image similarities based on sample images included in the plurality of positive samples and the plurality of negative samples;
an eighth determining module for determining a plurality of sample transition probabilities based on an identification and a shooting time of a camera included in the plurality of positive samples and the plurality of negative samples, and a plurality of probability distribution models;
a ninth determining module, configured to determine a plurality of sample comprehensive feature vectors based on the plurality of sample image similarities and the plurality of sample transition probabilities;
and the processing module is used for training an initial classification model through the multiple sample comprehensive characteristic vectors to obtain the image classification model.
Optionally, the apparatus further comprises:
a triggering module, configured to trigger the second determining module to execute the step of determining the transition probability if the image similarity is greater than a first threshold and smaller than a second threshold.
In another aspect, a computer device is provided, where the computer device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus, the memory is used to store a computer program, and the processor is used to execute the program stored in the memory to implement the steps of the similarity determination method.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, implements the steps of the similarity determination method described above.
In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the similarity determination method described above.
The technical scheme provided by the embodiment of the application can at least bring the following beneficial effects:
in the embodiments of this application, for two images in which target objects are present, the image similarity between the two images is determined, together with the transition probability that a target object transfers from the shooting field of view of the second camera to the shooting field of view of the first camera within a reference duration; the similarity between the target objects present in the two images is then determined based on the image similarity and the transition probability. The transition probability is a spatio-temporal feature, not a feature in an image. That is, this scheme determines the similarity not only from features in the image but also from spatio-temporal features, so the accuracy of the determined similarity is high even when image quality is low.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a similarity determining method according to an embodiment of the present disclosure;
FIG. 2 is a statistical histogram of transition times between a pair of cameras according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a probabilistic observation of transition times between a pair of cameras provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of a similarity determination apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
It should be noted that the network architecture and the service scenarios described in the embodiments of the present application are intended to illustrate the technical solutions more clearly and do not limit them. A person skilled in the art will appreciate that, as network architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.
For convenience of understanding, some terms or nouns referred to in the embodiments of the present application are first explained.
Image archiving: and classifying the images acquired by the camera according to the target objects in the images to obtain an image set, wherein the same image set comprises the images with the same target object.
Target clustering: and clustering the images with the same target object in one group according to the similarity of the target objects in the images by a clustering algorithm.
Probability density estimation: and estimating the probability density of the whole body by using part of observation samples to obtain the relation between the observation value and the probability thereof. The resulting relationship is characterized by a probability distribution model.
Transition probability: the transition probability referred to herein is the spatiotemporal transition probability, i.e., the probability that a target object takes some time to transition between any two cameras.
Machine learning: using data or past experience to optimize the performance criterion of a computer program. Machine learning techniques are applied herein to the task of associating observations of the same target object.
Visual space-time fusion model: namely, the image classification model in the embodiment of the present application, which is a machine learning model combining visual features (i.e., image features) and spatiotemporal information (i.e., temporal spatial features).
Comprehensive similarity: represents the degree of closeness between the target objects in two images. First, the image similarity between the two images is obtained by extracting their visual features. The image similarity is then combined with the spatio-temporal features, and the similarity between the target objects present in the two images, i.e., the comprehensive similarity, is obtained through the visual spatio-temporal fusion model. The greater the comprehensive similarity, the greater the probability that the target objects present in the two images are the same target object.
The similarity determination method provided by the embodiments of this application can be applied to various scenarios, and various processing can be performed on images based on the determined similarity, such as target clustering based on the similarity, image classification based on the similarity, and the like.
Illustratively, in an image archiving scenario, multiple cameras in the environment each capture images within their respective shooting fields of view, producing image streams that the cameras send to a computer device. The computer device receives the image streams from the cameras. Each time the computer device receives an image, it can determine the similarity between the target in that image and the targets in the images of the archived image sets by the similarity determination method provided in the embodiments of this application, and archive the image based on that similarity, so that the images in each archived image set contain the same target object. Alternatively, after receiving multiple images, the computer device may perform target clustering on them using the similarity determination method provided in the embodiments of this application, so as to archive the multiple images. In an image archiving scenario, archiving images by the similarity determination method provided in the embodiments of this application can be regarded as trajectory association for a target object.
In the embodiment of the present application, the target object may be a person, a vehicle, a robot, or the like, which is not limited in the embodiment of the present application. The data acquisition device for acquiring the image may be any type of camera, monitoring camera, and the like, and the data acquisition device is taken as the camera in the embodiment of the application for description.
In addition, an execution subject of the similarity determination method provided by the embodiment of the present application may be a computer device, and the computer device may be a terminal or a server. The terminal can be a mobile phone, a desktop computer, a notebook computer, a tablet computer and the like. The server may be an independent server, or a server cluster composed of a plurality of servers, or a cloud computing center.
Next, a detailed explanation is given of the similarity determination method provided in the embodiments of the present application.
Fig. 1 is a flowchart of a similarity determining method according to an embodiment of the present application. Taking the method applied to a computer device as an example, please refer to fig. 1, the method includes the following steps.
Step 101: determining the image similarity between a first image and a second image, wherein the first image has a first target object, the second image has a second target object, the first image is obtained by shooting through a first camera, and the second image is obtained by shooting through a second camera.
In an embodiment of the present application, a computer device first determines a similarity between a first image and a second image. The first image is shot by a first camera, and the second image is shot by a second camera.
One implementation in which the computer device determines the image similarity between the first image and the second image is as follows: the computer device obtains a first image feature and a second image feature, and determines the similarity between the first image and the second image based on the first image feature and the second image feature. The first image feature is the feature of the first image, and the second image feature is the feature of the second image.
Alternatively, the computer device may input the first image and the second image into the feature extraction model respectively, so as to obtain a first image feature and a second image feature output by the feature extraction model respectively. The feature extraction model may be a machine learning model, or may be another model capable of extracting image features, which is not limited in this embodiment of the present application. Illustratively, the feature extraction model is a deep learning model, such as a convolutional neural network model. It should be noted that the feature extraction model is a model obtained by pre-training, and the structure and the training mode of the feature extraction model are not limited in the embodiments of the present application.
Optionally, the first image feature and the second image feature are both represented in a vector form, and one implementation of the computer device for determining the similarity between the first image and the second image based on the first image feature and the second image feature is as follows: the computer device calculates a distance, such as a Euclidean distance, a cosine distance, etc., between the first image feature and the second image feature, and determines an image similarity based on the calculated distance.
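For illustration only, the following minimal Python sketch shows how such a similarity might be computed from two feature vectors; the cosine-based scoring and the rescaling to [0, 1] are assumptions, not part of the original disclosure.

```python
import numpy as np

def image_similarity(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Cosine similarity between two image feature vectors, rescaled to [0, 1]."""
    cos = float(np.dot(feat_a, feat_b)
                / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-12))
    return (cos + 1.0) / 2.0  # assumption: map [-1, 1] onto [0, 1]
```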
Step 102: a transition probability is determined, which is a probability that the first target object is transitioning from the capture field of view of the second camera to the capture field of view of the first camera within a reference time period.
In an embodiment of the application, after determining the image similarity, the computer device may further determine a transition probability, where the transition probability is a probability that the first target object is transferred from the shooting view of the second camera to the shooting view of the first camera within the reference time period. Wherein the transition probabilities can characterize spatiotemporal features that are features other than image features.
Optionally, one implementation of the computer device determining the transition probability is: the computer device determines a time difference between a first photographing time and a second photographing time, wherein the first photographing time is a photographing time of the first image and the second photographing time is a photographing time of the second image. The computer device determines a duration range from the plurality of duration ranges in which the time difference is located. And the computer equipment inputs the mark value of the time length range of the time difference into the appointed probability distribution model to obtain the transition probability output by the appointed probability distribution model. Wherein the flag value is used to indicate a corresponding duration range, and different duration ranges correspond to different flag values. It should be noted that the assigned probability distribution model can represent the time probability distribution of the transition of the same target object between the first camera and the second camera.
The plurality of time length ranges in the embodiments of the present application may also be referred to as a plurality of time bins. The embodiments of the present application do not limit the method for dividing the plurality of time bins. Illustratively, the duration of one day or of two days is divided into a plurality of time bins (i.e., a plurality of time length ranges), and the duration of each time bin is equal. For example, dividing one day into time bins of 5 minutes each divides 24 hours into 288 time bins, whose mark values may be 1, 2, 3, …, 288 in turn. Assuming that the time difference between the first shooting time and the second shooting time is 17 minutes, the computer device determines that the time bin where the time difference is located is the 4th time bin, the mark value of the 4th time bin is 4, and the computer device inputs 4 into the designated probability distribution model to obtain the transition probability output by the designated probability distribution model.
The assigned probability distribution model can be regarded as a function, the mark value of the time range can be regarded as an independent variable of the function, and the obtained transition probability can be regarded as a dependent variable of the function.
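A non-authoritative sketch of this lookup follows; the 5-minute bin width, the 288-bin count, and the model being exposed as a callable are assumptions carried over from the example above.

```python
import math
from typing import Callable

DT_MINUTES = 5    # duration of each time bin, per the example above
NUM_BINS = 288    # 24 hours / 5 minutes

def transition_probability(first_time_min: float, second_time_min: float,
                           model: Callable[[int], float]) -> float:
    """Map the capture-time difference to a time-bin mark value and query
    the designated probability distribution model."""
    time_diff = abs(first_time_min - second_time_min)
    tag = min(NUM_BINS, max(1, math.ceil(time_diff / DT_MINUTES)))
    return model(tag)
```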
Optionally, in this embodiment of the application, one camera pair corresponds to one probability distribution model, different camera pairs may correspond to different probability distribution models, a plurality of probability distribution models corresponding to the plurality of camera pairs respectively may be stored in the computer device, and the computer device determines the specified probability distribution model from the plurality of probability distribution models according to the identifier of the first camera and the identifier of the second camera before determining the transition probability based on the specified probability distribution model. The designated probability distribution model corresponds to a reference camera pair, the reference camera pair comprises a first camera and a second camera, the probability distribution model represents the time probability distribution of the same target object transferred between the two cameras included in the corresponding camera pair, and different probability distribution models correspond to different camera pairs. That is, the computer device first selects a probability distribution model corresponding to the reference camera pair from the plurality of probability distribution models, thereby determining the transition probability based on the selected probability distribution model.
It should be noted that, in the embodiment of the present application, one or more cameras for acquiring images may be provided. If there is one camera for capturing the image, the first camera and the second camera are the same camera. If there are multiple cameras for acquiring images, the first camera and the second camera may be the same camera or different cameras. If the first camera and the second camera are different cameras, the two cameras included in the camera pair corresponding to any probability distribution model have no sequence, for example, { first camera, second camera } and { second camera, first camera } represent the same camera pair, and the camera pair corresponds to one probability distribution model. Or, if the first camera and the second camera are different cameras, the two cameras included in the camera pair corresponding to any probability distribution model have a sequential order, for example, { first camera, second camera } and { second camera, first camera } represent two different camera pairs, and the two camera pairs correspond to the two probability distribution models. Under the condition that two cameras included in the camera pair corresponding to any probability distribution model are in sequence, the probability distribution models can more accurately represent the time probability distribution of the same target object transferred between the cameras. In addition, in the embodiment of the present application, the probability distribution models corresponding to different camera pairs are generally different, but it is needless to say that the probability distribution models corresponding to different camera pairs may be the same.
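One possible way to keep one model per camera pair is a keyed registry; the sketch below is an assumption about storage, not the patent's implementation. The ordered variant distinguishes {first camera, second camera} from {second camera, first camera}; a frozenset key would give the unordered variant.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry of per-camera-pair probability distribution models.
MODELS: Dict[Tuple[str, str], Callable[[int], float]] = {}

def designated_model(first_cam: str, second_cam: str) -> Callable[[int], float]:
    """Ordered lookup: transfer from the second camera's view to the first's."""
    return MODELS[(second_cam, first_cam)]
```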
In this embodiment, the plurality of probability distribution models are predetermined models, and may be determined by any computer device, which is not limited in this embodiment. An implementation of determining the plurality of probability distribution models by a computer device is next described, the implementation comprising the following steps.
Step 1021: a plurality of observation sample pairs respectively corresponding to the respective camera pairs in the plurality of camera pairs are acquired, each observation sample pair including the shooting time of two observation images shot by the respective camera pair and in which the same target object exists.
In order to obtain the probability that the same target object takes a certain time to transition between a camera pair, and thus obtain the probability distribution model, the embodiments of this application estimate the overall probability density from observation samples; that is, the plurality of probability distribution models are determined through probability density estimation.
In the embodiments of this application, observation samples need to be obtained first. Optionally, multiple images in which target objects are present are obtained through camera shooting, and any two of these images that are shot by the same camera pair and contain the same target object are taken as one group, so as to obtain multiple groups of observation images corresponding to the corresponding camera pair; that is, each group of observation images includes two observation images shot by the same camera pair in which the same target object is present. The shooting times of the two observation images included in each group of observation images corresponding to each camera pair are taken as one observation sample pair, thereby obtaining a plurality of observation sample pairs corresponding to the corresponding camera pair. It should be noted that obtaining sufficiently many observation sample pairs can improve the accuracy of the subsequently obtained probability distribution model.
It should be noted that the time difference between the capturing times of the two observation images included in each observation sample pair cannot exceed the maximum time length range among the plurality of time length ranges. Illustratively, the 24 hours are divided into 288 time length ranges, and the maximum time length range is [1336,1440] minutes, so that the time difference of the shooting time of the two observation images included in each observation sample pair cannot exceed 1440 minutes (i.e., 24 hours). In short, the shooting time interval of each group of observation images cannot exceed the maximum time bin.
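For illustration, a sketch of gathering observation sample pairs under the maximum-gap constraint; the flat (camera_id, object_id, capture_time) detection schema is an assumption.

```python
from collections import defaultdict
from itertools import combinations

MAX_GAP_MINUTES = 1440  # capture-time gap must not exceed the largest time bin

def observation_sample_pairs(detections):
    """detections: iterable of (camera_id, object_id, capture_time_minutes).
    Returns {camera_pair: [(t1, t2), ...]} over same-object observation images."""
    by_object = defaultdict(list)
    for cam, obj, t in detections:
        by_object[obj].append((cam, t))
    pairs = defaultdict(list)
    for sightings in by_object.values():
        for (cam_a, t_a), (cam_b, t_b) in combinations(sightings, 2):
            if abs(t_a - t_b) <= MAX_GAP_MINUTES:
                pairs[(cam_a, cam_b)].append((t_a, t_b))
    return pairs
```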
Step 1022: and determining a plurality of time differences corresponding to the corresponding camera pairs based on the shooting time included by the plurality of observation sample pairs respectively corresponding to the camera pairs.
In the embodiment of the application, after the computer device acquires a plurality of observation sample pairs corresponding to each of the plurality of camera pairs, a plurality of time differences corresponding to the corresponding camera pairs are determined based on shooting times included in the plurality of observation sample pairs corresponding to each of the camera pairs. It should be noted that the time difference in the embodiment of the present application is a non-negative value.
Illustratively, taking one observation sample pair corresponding to a camera pair as an example, the computer device calculates the absolute value of the difference between the two shooting times included in the observation sample pair to obtain a time difference. Assuming that the observation sample pair includes two shooting times of 20 minutes and 36 minutes past the same hour, the time difference obtained based on the two shooting times is 16 minutes.
In the embodiments of the present application, the time precision of the shooting time may be minutes, seconds, hours, or the like, which is not limited. The division precision of the time bins may likewise be minutes, seconds, hours, and so on. Generally, the time precision of the shooting time of images captured by a camera is seconds; if the division precision of the time bins is minutes, the computer device removes the seconds from the shooting times so as to calculate the time difference with minute precision, keeping the precision of the time difference consistent with the division precision of the time bins. Alternatively, the computer device first obtains an initial time difference based on the shooting times and removes the seconds of the initial time difference, or, if the seconds are greater than 30, adds 1 to the minutes; the principle is similar to rounding.
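The second alternative (compute the raw difference first, then drop or round the seconds) might look like this; a sketch only:

```python
def minute_time_difference(t_a_seconds: int, t_b_seconds: int) -> int:
    """Reduce a second-precision difference to minutes, adding 1 when the
    leftover seconds exceed 30 (the rounding-like rule described above)."""
    minutes, seconds = divmod(abs(t_a_seconds - t_b_seconds), 60)
    return minutes + 1 if seconds > 30 else minutes
```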
Step 1023: and counting the number of the time differences positioned in each time length range in the time length ranges in the time differences respectively corresponding to each camera pair to obtain a plurality of statistical frequency numbers corresponding to the corresponding camera pairs.
In this embodiment of the application, after obtaining a plurality of time differences corresponding to each camera pair, the computer device counts the number of time differences located in each of a plurality of time length ranges in the plurality of time differences corresponding to each camera pair, and obtains a plurality of statistical frequency counts (which may also be referred to as bin frequency counts) corresponding to the corresponding camera pairs.
Illustratively, for one camera pair, the computer device counts the number of time differences located in each of the time length ranges in the plurality of time length ranges corresponding to the camera pair to obtain a plurality of statistical frequency numbers corresponding to the camera pair. The plurality of statistical frequencies correspond to the plurality of time length ranges.
Assuming that the duration of each time length range is dt, i.e., time is discretized with step dt, the mark value of a time length range is tag = time_max / dt, where time_max represents the cutoff value of that time length range. For example, the time length range [1,5] has a cutoff value of 5. Assuming that one time difference corresponding to a camera pair is time_diff, the computer device discretizes the time difference to obtain the corresponding bin number

hist_num = ⌈time_diff / dt⌉,

where ⌈·⌉ indicates rounding up. The computer device determines that the time difference lies within the time length range whose mark value tag is equal to the bin number hist_num. The computer device judges in this manner which time length range each time difference is located in, thereby counting the plurality of statistical frequency counts corresponding to each camera pair.
Alternatively, the computer device judges which time length range a time difference falls in by comparing the time difference with the starting value and the ending value of each time length range, thereby counting the plurality of statistical frequencies corresponding to each camera pair.
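A sketch of step 1023 under the 288-bin example above (the bin width and count are assumptions carried over from that example):

```python
import math
from collections import Counter

DT = 5          # minutes per time length range
NUM_BINS = 288  # mark values run from 1 to 288

def statistical_frequencies(time_diffs_minutes):
    """Count how many time differences fall into each time length range,
    using hist_num = ceil(time_diff / dt) as in the text."""
    counts = Counter()
    for diff in time_diffs_minutes:
        tag = max(1, math.ceil(diff / DT))
        if tag <= NUM_BINS:
            counts[tag] += 1
    return [counts.get(tag, 0) for tag in range(1, NUM_BINS + 1)]
```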
Illustratively, fig. 2 is a statistical histogram of transition times between camera pairs provided by an embodiment of the present application. In fig. 2, 24 hours are divided into 288 time bins with mark values 1, 2, 3, … and 288 in turn, and the duration of each time bin is 5 minutes. Each curve in fig. 2 corresponds to one camera pair; the horizontal axis is the mark value obtained by time discretization, and the vertical axis is the statistical frequency. Read horizontally, each curve shows that, for the same target object, the frequency of transitions between the same camera pair differs across durations. Read vertically, fig. 2 also shows that the frequency of transitions of the same target object between different camera pairs within a given time differs. Note that only some camera pairs are shown in fig. 2, for example 10 pairs such as cam_1->cam_2 and cam_1->cam_3. The topmost curve corresponds to cam_1->cam_2; the remaining curves overlap heavily because their statistical frequencies are relatively close, and these 9 heavily overlapping curves correspond to the remaining 9 camera pairs. Although they overlap, it can be seen that the statistical frequencies corresponding to different camera pairs differ.
Step 1024: and determining a probability distribution model corresponding to each camera pair based on the plurality of statistical frequency numbers respectively corresponding to each camera pair and the marking values of the plurality of time ranges.
In the embodiment of the present application, after obtaining the plurality of statistical frequency counts corresponding to each camera pair, the computer device estimates the overall probability density based on the plurality of statistical frequency counts corresponding to each camera pair, that is, determines the plurality of probability distribution models by probability density estimation.
Optionally, the probability density estimation method in the embodiments of the present application is a Parzen window estimation method, a k-nearest-neighbor estimation method, or another probability estimation method, which is not limited by the embodiments of the present application.
Optionally, in order to alleviate the sparsity of distribution of the observation samples in a large-scale scene, the computer device may perform smoothing processing on the plurality of statistical frequency counts corresponding to the respective camera pairs obtained through statistics, so as to obtain a plurality of probability observation values corresponding to the respective camera pairs. The computer device determines a probability distribution model corresponding to each camera pair based on the plurality of probabilistic observations corresponding to each camera pair and the marker values for the plurality of time ranges.
The smoothing process may be gaussian smoothing or other smoothing processes, which is not limited in this embodiment.
Taking Gaussian smoothing as an example, assume that the statistical frequency of one time bin obtained based on the observation sample pairs is m_k, the total number of observation sample pairs corresponding to one camera pair is M, the total number of time bins is D, the mark value of each time bin is d_k, and λ is a smoothing parameter. Y = d_k denotes the event that a time difference, among the plurality of time differences corresponding to one camera pair, lies in the time length range whose mark value is d_k, and p_λ denotes a probability observation value. The computer device may smooth the plurality of statistical frequency counts corresponding to each camera pair according to the following formula (1) to obtain the plurality of probability observation values corresponding to each camera pair:

p_λ(Y = d_k) = (1/M) · Σ_{j=1}^{D} m_j · (1/(√(2π)·λ)) · exp(−(d_k − d_j)² / (2λ²))    (1)
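A sketch of the smoothing step, assuming the Gaussian-kernel form of formula (1) reconstructed above (that reconstruction is itself an assumption); the bandwidth value is arbitrary:

```python
import math

def probability_observations(freqs, lam=2.0):
    """Smooth per-bin statistical frequencies m_k into probability observation
    values p_lambda with a Gaussian kernel of bandwidth lam."""
    M = sum(freqs)   # total observation sample pairs for this camera pair
    D = len(freqs)   # total number of time bins; mark values are 1..D
    if M == 0:
        return [0.0] * D
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * lam)
    return [
        sum(m_j * norm * math.exp(-((k - j) ** 2) / (2.0 * lam * lam))
            for j, m_j in enumerate(freqs, start=1)) / M
        for k in range(1, D + 1)
    ]
```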
Illustratively, fig. 3 is a schematic diagram of the probability observation values of transition times between camera pairs provided by an embodiment of the present application, obtained by smoothing the statistical frequencies shown in fig. 2. In fig. 3, each curve corresponds to one camera pair; the horizontal axis is the mark value obtained by time discretization, and the vertical axis is the probability observation value (i.e., probability density). Read horizontally, each curve shows that the probability observation values of the same target object transitioning between the same camera pair differ across durations. Read vertically, fig. 3 also shows that the probability observation values of the same target object transitioning between different camera pairs over a given time differ. Like fig. 2, fig. 3 only shows some of the camera pairs, for example 10 pairs such as cam_1->cam_2 and cam_1->cam_3. The curve with the widest distribution of probability observation values corresponds to cam_1->cam_2; the other 9 curves have narrower distributions and correspond to the other 9 camera pairs. It can be seen that the probability observation values corresponding to different camera pairs differ.
Step 103: based on the image similarity and the transition probability, a similarity between the first target object and the second target object is determined.
In the embodiment of the present application, after obtaining the image similarity and the transition probability between the first image and the second image through steps 101 and 102, the computer device determines the similarity between the first target object and the second target object based on the image similarity and the transition probability.
Illustratively, the computer device calculates a weighted average of the image similarity and the transition probability by a weighted average method, taking the weighted average as the similarity between the first target object and the second target object.
Or the computer device processes the image similarity and the transition probability through an image classification model, so as to obtain the similarity between the first target object and the second target object. In an embodiment of the application, the computer device determines a synthetic feature vector based on the image similarity and the transition probability. And the computer equipment inputs the comprehensive characteristic vector into the image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
For example, the computer device combines the image similarity and the transition probability to obtain a comprehensive feature vector, inputs the comprehensive feature vector into an image classification model, and obtains the similarity between a first target object and a second target object output by the image classification model.
The image classification model is a machine learning model or a deep learning model trained in advance. For example, the image classification model may be a logistic regression model, a decision tree model, or a support vector machine-based classification model, among others. The embodiment of the present application does not limit the type, structure, and the like of the image classification model. In addition, the comprehensive feature vector in the embodiment of the present application includes two part features, the first part feature is an image feature, the second part feature is a spatio-temporal feature, where the image feature includes an image similarity, and the spatio-temporal feature includes a transition probability.
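For illustration, a minimal two-feature combination and a hypothetical classifier call follow; the scikit-learn-style predict_proba interface is an assumption, not the patent's model.

```python
import numpy as np

def synthetic_feature_vector(image_sim: float, transition_prob: float) -> np.ndarray:
    """Minimal combination: one image feature plus one spatio-temporal feature."""
    return np.array([image_sim, transition_prob])

# Hypothetical usage with a pre-trained binary classifier:
# vec = synthetic_feature_vector(0.82, 0.07).reshape(1, -1)
# similarity = classifier.predict_proba(vec)[0, 1]  # positive-class probability
```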
If the computer device obtains the similarity between the first target object and the second target object through the image classification model, the feature vector input into the image classification model may optionally include more features in addition to the image similarity and the transition probability, so as to enrich the features represented by the feature vector and thereby improve the accuracy of the similarity obtained through the image classification model.
Optionally, the computer device obtains first attribute information and second attribute information, where the first attribute information represents an attribute of the first target object, and the second attribute information represents an attribute of the second target object. The computer device determines an attribute feature vector characterizing attribute similarity between the first target object and the second target object based on the first attribute information and the second attribute information. The computer device determines a synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability. That is, in the embodiment of the present application, the similarity between the target objects (i.e., the comprehensive similarity) may also be determined by combining the attribute features of the target objects.
The attribute information may be referred to as an attribute feature, a structured attribute feature, or the like. Taking the target object as a person as an example, the attribute information of one target object may include a face attribute and/or a body attribute. The face attributes may include one or more of eye spacing, gray scale number, sharpness score, front face score, visibility score, and face quality score, among others. The body attributes may include one or more of a top color, a bottom color, an age, a gender, whether to bike, whether to carry a bag, whether to wear a hat, etc.
In the embodiment of the present application, various attributes included in the attribute information are all represented by numerical values, which is convenient for a computer device to perform calculation.
Optionally, one implementation manner of determining, by the computer device, the attribute feature vector based on the first attribute information and the second attribute information is as follows: and the computer equipment performs specified operation processing on the corresponding attributes in the first attribute information and the second attribute information to obtain the characteristic values of the corresponding attributes. And the computer equipment combines the characteristic values of the attributes included in the first attribute information and the second attribute information to obtain an attribute characteristic vector. Wherein the specified operation may comprise one or more of an addition, a subtraction, a division, or the like. The assigned operations corresponding to different attributes may be the same or different.
Illustratively, taking the gray scale number attribute in the first attribute information and the second attribute information as an example, suppose the specified operations corresponding to the gray scale number are addition, subtraction, and division, and the two corresponding gray scale numbers in the first attribute information and the second attribute information are h1 and h2, respectively. The computer device adds, subtracts, and divides the two gray scale numbers to obtain three feature values of the gray scale number, denoted hf1, hf2, and hf3. Taking the age attribute as another example, suppose the specified operation corresponding to the age is subtraction, and the two corresponding ages in the first attribute information and the second attribute information are o1 and o2, respectively. The computer device subtracts the two ages to obtain one feature value of the age, denoted of1.
Optionally, the computer device may combine the feature values of the attributes included in the first attribute information and the second attribute information according to a fixed attribute order to obtain the attribute feature vector. For example, if the attribute order is eye spacing, then gray scale number, then age, the computer device splices the feature value of the eye spacing, the feature values of the gray scale number, and the feature value of the age, in that order, to obtain the attribute feature vector, as illustrated in the sketch below.
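For illustration only, the following Python sketch carries out such per-attribute operations and concatenation. The attribute names, the choice of operations per attribute, and the combine_attributes helper are assumptions made for this example rather than details fixed by this embodiment.

```python
def combine_attributes(attrs1: dict, attrs2: dict) -> list:
    """Apply the specified operation(s) per attribute, then concatenate
    the resulting feature values in a fixed attribute order."""
    features = []
    # Eye spacing: subtraction (assumed operation), one feature value.
    features.append(attrs1["eye_spacing"] - attrs2["eye_spacing"])
    # Gray scale number: addition, subtraction, division -> hf1, hf2, hf3.
    h1, h2 = attrs1["gray_scale"], attrs2["gray_scale"]
    features.extend([h1 + h2, h1 - h2, h1 / h2 if h2 else 0.0])
    # Age: subtraction only -> of1.
    features.append(attrs1["age"] - attrs2["age"])
    return features

attr_vec = combine_attributes(
    {"eye_spacing": 62.0, "gray_scale": 128.0, "age": 34},
    {"eye_spacing": 60.0, "gray_scale": 120.0, "age": 31},
)
print(attr_vec)  # [2.0, 248.0, 8.0, 1.0666..., 3]
```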
In the embodiment of the present application, the first attribute information and the second attribute information are determined based on the first image and the second image, respectively, so the attribute information may be regarded as an image feature; that is, the image feature in the present scheme further includes the attribute information. Illustratively, the computer device inputs the first image and the second image into an image analysis model and obtains the first attribute information and the second attribute information respectively output by the image analysis model. The image analysis model is a machine learning model or a deep learning model, trained in advance, that can analyze an image to obtain the attribute information of a target object in the image. The embodiment of the present application does not limit the type, structure, training method, and the like of the image analysis model. In addition, the image analysis model can be deployed in any computer device to analyze an image to obtain the attribute information of a target object. The image analysis model can also be deployed in a camera that acquires images, in which case the camera extracts the attribute information of the acquired images through the image analysis model and sends not only the acquired image but also the attribute information of the image to the computer device. Optionally, the attribute information is a structured attribute feature, and the camera simultaneously sends the acquired image and the analyzed structured attribute feature to the computer device.
Optionally, in the embodiment of the present application, in addition to the attribute features of the target objects, still more features may be combined to determine the similarity between the target objects. Illustratively, the computer device obtains the distance between the first camera and the second camera and the time difference between the first shooting time and the second shooting time. The computer device determines the moving speed of the first target object based on the distance and the time difference, and then combines the image similarity, the attribute feature vector, the transition probability, the distance, the time difference, and the moving speed to obtain the comprehensive feature vector. That is, the computer device may also determine the similarity between the target objects in combination with the distance between the cameras and the moving speed of the target object. It should be noted that the distance, the time difference, and the moving speed may all be considered spatio-temporal features; that is, the spatio-temporal feature in the present scheme further includes the distance between cameras, the time difference between shooting times, and the moving speed of the target object.
Optionally, the computer device may determine the position coordinates of the two cameras from a mapping between camera identifiers and camera position coordinates based on the identifiers of the cameras, and determine the distance between the cameras based on those position coordinates, as in the sketch below. The computer device may also determine the distance between the cameras by other methods, which is not limited in this embodiment.
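A minimal sketch of such a lookup follows; the coordinate table, the camera identifiers, and the use of a planar Euclidean distance are all assumptions for illustration.

```python
import math

# Assumed mapping of camera identifiers to planar position coordinates (meters).
CAMERA_COORDS = {"cam_01": (120.0, 45.0), "cam_02": (620.0, 505.0)}

def camera_distance(cam_a: str, cam_b: str) -> float:
    """Look up both cameras' coordinates and return their distance."""
    xa, ya = CAMERA_COORDS[cam_a]
    xb, yb = CAMERA_COORDS[cam_b]
    return math.hypot(xb - xa, yb - ya)

print(camera_distance("cam_01", "cam_02"))  # about 679.4
```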
Illustratively, assuming that the image similarity between the first image and the second image is pic_sim, the attribute feature vector is [f1, f2, f3, …, fn], the transition probability is prob, the distance is dist, the time difference is time_diff, and the moving speed is speed, the comprehensive feature vector obtained by the computer device through combination is [pic_sim, f1, f2, f3, …, fn, prob, dist, time_diff, speed].
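The following sketch assembles such a comprehensive feature vector in the order shown above; all numeric values are made up for illustration and carry no meaning from this embodiment.

```python
pic_sim = 0.76                            # image similarity between the two images
attr_vec = [2.0, 248.0, 8.0, 1.07, 3.0]   # attribute feature vector [f1..fn]
prob = 0.32                               # transition probability
dist = 679.4                              # distance between cameras (assumed meters)
time_diff = 600.0                         # seconds between the two shooting times
speed = dist / time_diff                  # moving speed of the first target object

# Concatenate in the documented order: image feature first, then the
# spatio-temporal features.
feature_vector = [pic_sim, *attr_vec, prob, dist, time_diff, speed]
print(feature_vector)
```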
It should be noted that the attribute information, the distance between the cameras, the time difference between the shooting times, and the moving speed of the target object used in the above embodiments are all optional. That is, the computer device may determine the similarity between the target objects in combination with one or more of the attribute information, the distance between the cameras, the time difference between the shooting times, and the moving speed of the target object. Illustratively, the computer device determines the similarity between the target objects in combination with the attribute information, the distance, and the time difference. In addition, the embodiments of the present application are not limited to the above-described features; the computer device may also combine other features to determine the similarity between the target objects.
In the embodiment of the application, after obtaining the comprehensive feature vector, the computer device inputs the comprehensive feature vector into the image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
As can be seen from the foregoing, the image classification model is trained in advance; the plurality of training data required to train the image classification model are introduced next. It should be noted that each piece of training data is similar to the comprehensive feature vector in the above embodiments, that is, the training data is determined in a manner similar to that of the comprehensive feature vector, and is referred to as a sample comprehensive feature vector in the embodiments of the present application. In addition, the embodiment of the present application does not limit which device trains the image classification model; the following description takes training by the computer device as an example.
In the embodiment of the present application, the implementation process of obtaining an image classification model by training a computer device includes the following steps:
Step 1031: obtain a training sample set, where the training sample set includes a plurality of positive samples and a plurality of negative samples, each positive sample includes two sample images in which the same target object exists, each negative sample includes two sample images in which different target objects exist, and the training sample set further includes the identifiers and shooting times of the cameras that captured the sample images.
In an embodiment of the present application, the computer device first obtains the training sample set, which includes obtaining the plurality of positive samples and the plurality of negative samples. Illustratively, the computer device acquires a plurality of image sets, each image set including a plurality of images in which the same target object exists, and combines the images in each image set pairwise to obtain the plurality of positive samples. The computer device selects one image from each of any two different image sets and combines them to obtain the plurality of negative samples.
For example, suppose the set of images in which target object a exists is Pa = {Pa1, Pa2, Pa3} and the set of images in which target object b exists is Pb = {Pb1, Pb2, Pb3}. Any two images in Pa can be combined into one positive sample, and likewise any two images in Pb. Any image in Pa combined with any image in Pb constitutes one negative sample. For these two image sets, 3 + 3 = 6 positive samples and 3 × 3 = 9 negative samples can therefore be obtained, as in the sketch below.
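A minimal Python sketch of this pairing rule follows; the image names and set labels are the illustrative ones from the example above.

```python
from itertools import combinations, product

image_sets = {
    "a": ["Pa1", "Pa2", "Pa3"],  # images in which target object a exists
    "b": ["Pb1", "Pb2", "Pb3"],  # images in which target object b exists
}

# Positive samples: every pair of images drawn from the same image set.
positives = [pair for imgs in image_sets.values()
             for pair in combinations(imgs, 2)]

# Negative samples: one image from each of two different image sets.
negatives = [pair
             for (_, imgs_a), (_, imgs_b) in combinations(image_sets.items(), 2)
             for pair in product(imgs_a, imgs_b)]

print(len(positives), len(negatives))  # 6 9
```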
In the embodiment of the present application, since the comprehensive feature vector needs to be determined based on the transition probability, which is related to both the camera and the shooting time, the training sample set further includes the identification of the camera that shoots the sample image and the shooting time. That is, each positive sample further includes an identification and a photographing time of a camera that photographed the corresponding sample image, and each negative sample also includes an identification and a photographing time of a camera that photographed the corresponding sample image.
Step 1032: based on sample images included in the plurality of positive samples and the plurality of negative samples, a plurality of sample image similarities are determined.
In an embodiment of the present application, the computer device determines the plurality of sample image similarities based on the sample images included in the plurality of positive samples and the plurality of negative samples. Illustratively, for the two sample images included in each positive sample, the computer device extracts their image features to obtain a first sample image feature and a second sample image feature, and determines the sample image similarity between the two sample images based on these features; that is, the computer device obtains one corresponding image similarity for each positive sample. Similarly, for the two sample images included in each negative sample, the computer device extracts their image features to obtain a third sample image feature and a fourth sample image feature, and determines the sample image similarity between the two sample images based on these features; that is, the computer device obtains one corresponding image similarity for each negative sample. In this way, the computer device obtains the plurality of sample image similarities based on the plurality of positive samples and the plurality of negative samples, one sample image similarity corresponding to one positive sample or one negative sample.
Wherein the computer device may extract image features of the sample image through the feature extraction model. Illustratively, the computer device inputs each sample image into the feature extraction model, resulting in image features output by the feature extraction model. It should be noted that the feature extraction model in step 1032 is the same as the feature extraction model in step 101, and reference may be made to the foregoing embodiment for the description of the feature extraction model.
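For illustration, the sketch below computes one sample image similarity from two extracted feature vectors. The random vectors stand in for outputs of the feature extraction model, and cosine similarity is an assumed (common) choice of measure; the patent does not fix the similarity function.

```python
import numpy as np

def cosine_similarity(f1: np.ndarray, f2: np.ndarray) -> float:
    """Cosine similarity between two image feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

rng = np.random.default_rng(0)
feat1 = rng.normal(size=256)  # stand-in for the first sample image feature
feat2 = rng.normal(size=256)  # stand-in for the second sample image feature
print(cosine_similarity(feat1, feat2))
```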
Step 1033: a plurality of sample transition probabilities are determined based on the identities and capture times of the cameras included in the plurality of positive samples and the plurality of negative samples, and a plurality of probability distribution models.
In an embodiment of the present application, the computer device further determines the plurality of sample transition probabilities based on the camera identifiers and shooting times included in the plurality of positive samples and the plurality of negative samples, together with the plurality of probability distribution models. Illustratively, for the two camera identifiers and two shooting times included in each positive or negative sample, the computer device determines the time difference between the two shooting times and determines, from the plurality of time length ranges, the time length range in which the time difference lies. The computer device determines, based on the identifiers of the two cameras, the probability distribution model corresponding to the two cameras from the plurality of probability distribution models, and inputs the mark value of the time length range in which the time difference lies into the determined probability distribution model to obtain the sample transition probability corresponding to the positive or negative sample. That is, the computer device obtains one corresponding sample transition probability for each positive sample and one for each negative sample.
The descriptions of the plurality of time length ranges and the plurality of probability distribution models may refer to the relevant descriptions in step 102. The principle by which the computer device determines the sample transition probability is the same as that of determining the transition probability, and the specific implementation process may likewise refer to the related description in step 102.
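The sketch below illustrates this lookup under stated assumptions: a fixed 60-second bin width for the time length ranges and a toy per-camera-pair probability table standing in for the trained probability distribution models, neither of which is prescribed by the patent.

```python
BIN_SECONDS = 60  # assumed width of each time length range

# Assumed models: camera pair -> {mark value of time length range: probability}.
prob_models = {("cam_01", "cam_02"): {0: 0.05, 1: 0.40, 2: 0.30, 3: 0.15}}

def sample_transition_prob(cam_a: str, cam_b: str, t_a: float, t_b: float) -> float:
    """Map the shooting-time difference to a duration-range mark value and
    read the transition probability from the pair's distribution model."""
    mark = int(abs(t_a - t_b) // BIN_SECONDS)        # mark value of the range
    model = prob_models[tuple(sorted((cam_a, cam_b)))]
    return model.get(mark, 0.0)

print(sample_transition_prob("cam_02", "cam_01", 1000, 1130))  # bin 2 -> 0.30
```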
Step 1034: based on the plurality of sample image similarities and the plurality of sample transition probabilities, a plurality of sample synthetic feature vectors are determined.
In an embodiment of the application, after determining the plurality of sample image similarities and the plurality of sample transition probabilities, the computer device determines a plurality of sample comprehensive feature vectors based on them. One sample image similarity corresponds to one positive sample or one negative sample, and one sample transition probability likewise corresponds to one positive sample or one negative sample.

Illustratively, the computer device obtains a corresponding sample comprehensive feature vector based on the sample image similarity and the sample transition probability corresponding to each positive sample, and likewise obtains a corresponding sample comprehensive feature vector for each negative sample. In this way, the computer device obtains the plurality of sample comprehensive feature vectors, one per positive or negative sample.

It should be noted that the principle and implementation process of determining a sample comprehensive feature vector in step 1034 are the same as those of determining the comprehensive feature vector in step 102. That is, whichever features are used in determining the comprehensive feature vector, the same features must be used in determining the sample comprehensive feature vectors.

For example, in the case where the comprehensive feature vector is determined based on the image similarity and the transition probability, the computer device combines the sample image similarity and the sample transition probability corresponding to each positive sample to obtain a corresponding sample comprehensive feature vector, and does the same for each negative sample.

As another example, where the comprehensive feature vector is determined based on the image similarity, the attribute information, and the transition probability, the computer device also needs to determine the sample comprehensive feature vectors in combination with sample attribute information. Illustratively, each positive sample acquired by the computer device further includes the sample attribute information corresponding to its two sample images, and so does each negative sample. The computer device determines a plurality of sample attribute feature vectors based on the sample attribute information included in the plurality of positive samples and the plurality of negative samples, and then determines the plurality of sample comprehensive feature vectors based on the plurality of sample image similarities, the plurality of sample attribute feature vectors, and the plurality of sample transition probabilities. Illustratively, the computer device combines the sample image similarity, the sample attribute feature vector, and the sample transition probability corresponding to each positive sample to obtain a corresponding sample comprehensive feature vector, and does the same for each negative sample; one sample comprehensive feature vector corresponds to one positive sample or one negative sample. The implementation of determining the plurality of sample attribute feature vectors is the same as that of determining the attribute feature vector in step 102.

As another example, in the case where the comprehensive feature vector is determined based on the image similarity, the attribute information, the transition probability, the distance between cameras, the time difference between shooting times, and the moving speed of the target object, the computer device also needs to determine the sample comprehensive feature vectors in combination with the sample attribute information, the distance between cameras, the time difference between shooting times, and the moving speed of the target object. Illustratively, the computer device obtains the distance between the two cameras corresponding to each positive sample and the distance between the two cameras corresponding to each negative sample to obtain a plurality of sample distances, each sample distance corresponding to one positive sample or one negative sample. The computer device calculates the time difference between the two shooting times included in each positive sample and in each negative sample to obtain a plurality of sample time differences, each sample time difference corresponding to one positive sample or one negative sample. The computer device determines a plurality of sample moving speeds based on the plurality of sample distances and the plurality of sample time differences, each sample moving speed corresponding to one positive sample or one negative sample. The computer device then determines the plurality of sample comprehensive feature vectors based on the plurality of sample image similarities, the plurality of sample attribute feature vectors, the plurality of sample transition probabilities, the plurality of sample distances, the plurality of sample time differences, and the plurality of sample moving speeds. For example, the computer device combines the sample image similarity, the sample attribute feature vector, the sample transition probability, the sample distance, the sample time difference, and the sample moving speed corresponding to each positive sample to obtain a corresponding sample comprehensive feature vector, and does the same for each negative sample. The principle of this implementation is the same as that of the related content in step 102, so details are not described here again.
Step 1035: and training an initial classification model through the multiple sample comprehensive characteristic vectors to obtain an image classification model.
In the embodiment of the application, after obtaining the plurality of sample comprehensive feature vectors, the computer device trains an initial classification model with them to obtain the image classification model. It should be noted that, as can be seen from the foregoing, one sample comprehensive feature vector corresponds to one positive sample or one negative sample, and both positive and negative samples may carry label information; the plurality of sample comprehensive feature vectors therefore also carry corresponding label information, and the computer device may train the initial classification model with the sample comprehensive feature vectors and their label information to obtain the image classification model. That is, the computer device trains the image classification model in a supervised manner. In other embodiments, the computer device may also train the image classification model in an unsupervised or semi-supervised manner, i.e., the label information is optional. Optionally, in this embodiment of the present application, the label information of a positive sample is "1" and the label information of a negative sample is "0".
The initial classification model and the finally obtained image classification model may have the same structure, or the image classification model may be a part of the initial classification model; this is not limited in the embodiments of the present application. The embodiment of the present application also does not limit the way the initial classification model is trained. The trained image classification model may be tested or verified with a test set or a verification set so as to optimize it and finally obtain an image classification model with better performance.
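A minimal supervised training sketch for step 1035 follows, using logistic regression as the initial classification model. The patent does not fix the model type, so this choice, the toy two-feature vectors, and the labels are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is a toy sample comprehensive feature vector, here just
# [sample image similarity, sample transition probability].
X = np.array([[0.92, 0.35], [0.88, 0.41], [0.30, 0.02], [0.25, 0.05]])
y = np.array([1, 1, 0, 0])  # label information: positive sample 1, negative 0

model = LogisticRegression().fit(X, y)

# At inference time, the predicted probability of the positive class can
# serve as the similarity between the first and second target objects.
similarity = model.predict_proba([[0.75, 0.20]])[0, 1]
print(similarity)
```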
Optionally, in this embodiment of the application, after determining the image similarity between the first image and the second image and before determining the transition probability, if the image similarity is greater than a first threshold and less than a second threshold, the computer device performs the step of determining the transition probability. If the image similarity is less than or equal to the first threshold, or greater than or equal to the second threshold, the computer device determines the image similarity as the similarity between the first target object and the second target object without performing the step of determining the transition probability. That is, the computer device may decide, by setting thresholds, whether to further incorporate spatio-temporal features when determining the similarity between the target objects.
The first threshold and the second threshold may be preset values, and the first threshold is smaller than the second threshold, and the embodiment of the present application does not limit specific values of the first threshold and the second threshold, and the first threshold and the second threshold are both within a value range of the image similarity. Illustratively, assuming that the image similarity has a value range of [0,1], the first threshold is 0.7, and the second threshold is 0.8, if the image similarity between the first image and the second image is greater than 0.7 and less than 0.8, the computer device further determines the transition probability.
That is, when the image similarity is very low or very high, it may be considered relatively reliable: when the image similarity is low, the target object in the first image and the target object in the second image may be directly considered not to be the same target object, and when the image similarity is high, they may be directly considered to be the same target object. In other cases, the image similarity may be considered insufficiently reliable, and the similarity between the target objects needs to be determined by further combining the transition probability.
Optionally, after determining the image similarity between the first image and the second image, the computer device performs the step of determining the transition probability if the image similarity is greater than a third threshold. If the image similarity is less than or equal to the third threshold, the computer device determines the image similarity as the similarity between the first target object and the second target object without performing the step of determining the transition probability. The third threshold is a preset numerical value, the embodiment of the application does not limit the specific numerical value of the third threshold, and the third threshold is only within the value range of the image similarity. Illustratively, assuming that the image similarity ranges from [0,1], the third threshold is 0.75, and if the image similarity between the first image and the second image is greater than 0.75, the computer device further determines the transition probability.
As can be seen from the above, the computer device may determine the similarity between the target objects based on the image similarity and the transition probability, or may determine the transition probability only when the image similarity satisfies a preset threshold condition and then determine the similarity between the target objects based on both. The way the preset threshold condition is set is not limited in the embodiment of the application. Through such a multi-threshold strategy, the computer device performs the step of determining the transition probability and the subsequent steps only when the image similarity satisfies the preset threshold condition, which ensures accuracy while improving operating efficiency.
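The sketch below illustrates the two-threshold variant of this gating. The 0.7/0.8 values follow the illustrative thresholds above, and the two helper functions are hypothetical stand-ins for the transition-probability step and the image classification model.

```python
FIRST_THRESHOLD, SECOND_THRESHOLD = 0.7, 0.8

def determine_transition_probability() -> float:
    return 0.4                        # stand-in for the step-102 computation

def classify_with_model(pic_sim: float, prob: float) -> float:
    return 0.5 * (pic_sim + prob)     # stand-in for the image classification model

def final_similarity(pic_sim: float) -> float:
    if FIRST_THRESHOLD < pic_sim < SECOND_THRESHOLD:
        # Ambiguous zone: fall back to the spatio-temporal features.
        return classify_with_model(pic_sim, determine_transition_probability())
    # Clearly low or clearly high: use the image similarity directly.
    return pic_sim

print(final_similarity(0.75), final_similarity(0.95))  # 0.575 (gated) 0.95 (direct)
```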
In addition, in the present scheme the moving pattern of the target object is modeled based on a probability density estimation method to obtain the plurality of probability distribution models, where each probability distribution model characterizes the time probability distribution of the same target object transferring between the two cameras of the corresponding camera pair. The scheme can also combine spatio-temporal features such as the distance between cameras, the time difference between shooting times, and the moving speed of the target object. In cases where the image similarity is low because the image quality is degraded by occlusion, illumination change, overexposure, and the like, these spatio-temporal features make up for the deficiency of the image features, so the accuracy of the finally determined similarity between the target objects (i.e., the comprehensive similarity) is high.
Optionally, in this embodiment of the application, after determining the similarity between the first target object and the second target object, the computer device may cluster the images based on the similarity, thereby improving the clustering accuracy.
Illustratively, if the similarity between the first target object and the second target object is greater than a fourth threshold, indicating that the first target object and the second target object are the same target object, the computer device determines that the first image and the second image belong to the same class; if the similarity is less than or equal to the fourth threshold, indicating that the first target object and the second target object are different target objects, the computer device determines that the first image and the second image do not belong to the same class.
In one image archiving scenario, each time the computer device receives an image, the image needs to be archived, i.e., classified into a certain image set based on similarity. The computer device may perform the similarity calculation between the image and all or part of the images in each image set, thereby performing clustering.
If the computer device performs similarity calculation between the image and all images in each image set, the computer device may calculate similarity between a target object existing in the image and a target object existing in each image in any one image set, so as to obtain a plurality of similarities. The computer device takes an average of the plurality of similarities as the similarity between the image and the set of images. After obtaining the similarity between the image and each image set, the computer device classifies the image into the image set with the highest similarity.
If the computer device performs the similarity calculation between the image and part of the images in each image set, for example at least one representative image of each image set, the computer device may calculate the similarity between the target object in the image and the target object in each of the at least one representative image of any image set, thereby obtaining at least one similarity. The computer device takes the average of the at least one similarity as the similarity between the image and that image set. After obtaining the similarity between the image and each image set, the computer device classifies the image into the image set with the highest similarity, as in the sketch below. The number of representative images in each image set may be the same or different, and the representative images in each image set may be updated. For example, the computer device selects a representative image based on the quality score of the image; if a newly archived image has a higher quality score, the computer device takes it as a representative image. The quality score of an image may be determined based on one or more of the sharpness score, front face score, visibility score, and face quality score in the structured attributes of the image described in the previous embodiments.
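A toy sketch of this archiving step follows; the similarity helper, archive names, and image identifiers are stand-ins invented for illustration, with the pairwise similarity in practice coming from the method described above.

```python
def similarity(img_a: str, img_b: str) -> float:
    """Toy stand-in for the comprehensive similarity between two images."""
    return 0.9 if img_a[0] == img_b[0] else 0.1

# Assumed archives: image set name -> its representative images.
archives = {"person_a": ["a1", "a2"], "person_b": ["b1"]}

def archive(new_image: str) -> str:
    """File the new image into the set with the highest average similarity
    against that set's representative images."""
    scores = {name: sum(similarity(new_image, rep) for rep in reps) / len(reps)
              for name, reps in archives.items()}
    best = max(scores, key=scores.get)
    archives[best].append(new_image)
    return best

print(archive("a3"))  # -> "person_a"
```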
It should be noted that the above application, clustering images based on similarity after determining the similarity between the target objects, is only an exemplary description, and the above example is not intended to limit the embodiments of the present application.
In summary, in the embodiment of the present application, for two images in which a target object exists, an image similarity between the two images is determined, and a transition probability that one target object is transferred from the photographing field of view of the second camera to the photographing field of view of the first camera within a reference time period is determined, so that the similarity between the target objects existing in the two images is determined based on the image similarity and the transition probability. Wherein the transition probability is a spatio-temporal feature, not a feature in an image. That is, the present scheme determines the similarity not only from the features in the image but also from the spatio-temporal features, and thus the accuracy of the determined similarity is high even in the case where the image quality is low.
All the above optional technical solutions can be combined arbitrarily to form an optional embodiment of the present application, and the present application embodiment is not described in detail again.
Fig. 4 is a schematic structural diagram of a similarity determination apparatus 400 provided in an embodiment of the present application, where the similarity determination apparatus 400 may be implemented as part of or all of a computer device by software, hardware, or a combination of the software and the hardware. Referring to fig. 4, the apparatus 400 includes: a first determining module 401, a second determining module 402 and a third determining module 403.
A first determining module 401, configured to determine an image similarity between a first image and a second image, where the first image has a first target object, the second image has a second target object, the first image is obtained by shooting with a first camera, and the second image is obtained by shooting with a second camera;
a second determining module 402, configured to determine a transition probability, where the transition probability is a probability that the first target object is transferred from the shooting view of the second camera to the shooting view of the first camera within the reference time length;
a third determining module 403, configured to determine a similarity between the first target object and the second target object based on the image similarity and the transition probability.
Optionally, the second determining module 402 includes:
the first determining submodule is used for determining the time difference between first shooting time and second shooting time, wherein the first shooting time is the shooting time of the first image, and the second shooting time is the shooting time of the second image;
the second determining submodule is used for determining a time length range in which the time difference is positioned from the plurality of time length ranges;
and the first processing submodule is used for inputting the marking value of the duration range where the time difference is positioned into the specified probability distribution model to obtain the transition probability output by the specified probability distribution model, and the marking value is used for indicating the corresponding duration range.
Optionally, the apparatus 400 further comprises:
a fourth determining module, configured to determine a specified probability distribution model from the plurality of probability distribution models according to the identifier of the first camera and the identifier of the second camera;
the designated probability distribution model corresponds to a reference camera pair, the reference camera pair comprises a first camera and a second camera, the probability distribution model represents the time probability distribution of the same target object transferred between the two cameras included in the corresponding camera pair, and different probability distribution models correspond to different camera pairs.
Optionally, the apparatus 400 further comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a plurality of observation sample pairs corresponding to each camera pair in a plurality of camera pairs respectively, and each observation sample pair comprises the shooting time of two observation images which are shot by the corresponding camera pair and have the same target object;
a fifth determining module, configured to determine, based on shooting times included in a plurality of observation sample pairs respectively corresponding to each camera pair, a plurality of time differences corresponding to the corresponding camera pair;
the counting module is used for counting the number of time differences positioned in each time length range in the time length ranges in the time differences respectively corresponding to each camera pair to obtain a plurality of counting frequency numbers corresponding to the corresponding camera pairs;
and the sixth determining module is used for determining the probability distribution model corresponding to the corresponding camera pair based on the plurality of statistical frequency numbers respectively corresponding to the camera pairs and the marking values of the plurality of time ranges.
Optionally, the third determining module 403 includes:
the third determining submodule is used for determining a comprehensive characteristic vector based on the image similarity and the transition probability;
and the second processing submodule is used for inputting the comprehensive characteristic vector into the image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
Optionally, the third determining module 403 further includes:
the first obtaining submodule is used for obtaining first attribute information and second attribute information, the first attribute information represents the attribute of the first target object, and the second attribute information represents the attribute of the second target object;
the fourth determining submodule is used for determining an attribute feature vector based on the first attribute information and the second attribute information, and the attribute feature vector represents the attribute similarity between the first target object and the second target object;
the third determination submodule is specifically configured to:
and determining a comprehensive characteristic vector based on the image similarity, the attribute characteristic vector and the transition probability.
Optionally, the third determining module 403 further includes:
the second acquisition submodule is used for acquiring the distance between the first camera and the second camera and the time difference between the first shooting time and the second shooting time;
a fifth determination submodule for determining a moving speed of the first target object based on the distance and the time difference;
the third determination submodule is specifically configured to:
and combining the image similarity, the attribute feature vector, the transition probability, the distance, the time difference and the moving speed to obtain a comprehensive feature vector.
Optionally, the apparatus 400 further comprises:
the second acquisition module is used for acquiring a training sample set, wherein the training sample set comprises a plurality of positive samples and a plurality of negative samples, each positive sample comprises two sample images with the same target object, each negative sample comprises two sample images with different target objects, and the training sample set further comprises an identifier of a camera for shooting the sample images and shooting time;
a seventh determining module, configured to determine a plurality of sample image similarities based on sample images included in the plurality of positive samples and the plurality of negative samples;
an eighth determining module for determining a plurality of sample transition probabilities based on an identification and a shooting time of a camera included in the plurality of positive samples and the plurality of negative samples, and the plurality of probability distribution models;
a ninth determining module, configured to determine a plurality of sample comprehensive feature vectors based on the plurality of sample image similarities and the plurality of sample transition probabilities;
and the processing module is used for training the initial classification model through a plurality of sample comprehensive characteristic vectors to obtain an image classification model.
Optionally, the apparatus 400 further comprises:
and the triggering module is used for triggering the second determining module to execute the step of determining the transition probability if the image similarity is greater than the first threshold and less than the second threshold.
In the embodiment of the application, for two images in which a target object exists, the image similarity between the two images is determined, and the transition probability that one target object is transferred from the shooting view of the second camera to the shooting view of the first camera within the reference time length is determined, so that the similarity between the target objects existing in the two images is determined based on the image similarity and the transition probability. Wherein the transition probability is a spatio-temporal feature, not a feature in an image. That is, the present scheme determines the similarity not only from the features in the image but also from the spatio-temporal features, and thus the accuracy of the determined similarity is high even in the case where the image quality is low.
It should be noted that: in the similarity determination apparatus provided in the above embodiment, when determining the similarity, only the division of the functional modules is illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the similarity determination apparatus provided in the above embodiments and the similarity determination method embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application. The server may be a computer device in the foregoing embodiments, so as to execute the similarity determination method provided in the embodiments of the present application. The server 500 includes a Central Processing Unit (CPU)501, a system memory 504 including a Random Access Memory (RAM)502 and a Read Only Memory (ROM)503, and a system bus 505 connecting the system memory 504 and the central processing unit 501. The server 500 also includes a basic input/output system (I/O system) 506, which facilitates transfer of information between devices within the computer, and a mass storage device 507, which stores an operating system 513, application programs 514, and other program modules 515.
The basic input/output system 506 comprises a display 508 for displaying information and an input device 509, such as a mouse, keyboard, etc., for user input of information. Wherein a display 508 and an input device 509 are connected to the central processing unit 501 through an input output controller 510 connected to the system bus 505. The basic input/output system 506 may also include an input/output controller 510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 510 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 507 is connected to the central processing unit 501 through a mass storage controller (not shown) connected to the system bus 505. The mass storage device 507 and its associated computer-readable media provide non-volatile storage for the server 500. That is, the mass storage device 507 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 504 and mass storage device 507 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 500 may also be run through a remote computer connected over a network such as the Internet. That is, the server 500 may be connected to the network 512 through the network interface unit 511 connected to the system bus 505, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 511.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the similarity determination method in the above embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the similarity determination method described above.
It is to be understood that reference herein to "at least one" means one or more and "a plurality" means two or more. In the description of the embodiments of the present application, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects, meaning that there may be three relationships; for example, A and/or B may mean that A exists alone, A and B exist simultaneously, or B exists alone. In addition, to facilitate clear description of the technical solutions of the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," and the like do not denote any quantity, order, or importance.
The above-mentioned embodiments are provided not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method for similarity determination, the method comprising:
determining image similarity between a first image and a second image, wherein the first image has a first target object, the second image has a second target object, the first image is obtained by shooting through a first camera, and the second image is obtained by shooting through a second camera;
determining a transition probability that the first target object transitions from the capture view of the second camera to the capture view of the first camera within a reference duration;
determining a similarity between the first target object and the second target object based on the image similarity and the transition probability.
2. The method of claim 1, wherein determining the transition probability comprises:
determining a time difference between a first shooting time and a second shooting time, wherein the first shooting time is the shooting time of the first image, and the second shooting time is the shooting time of the second image;
determining a time length range in which the time difference is positioned from a plurality of time length ranges;
and inputting the mark value of the time length range in which the time difference is positioned into a specified probability distribution model to obtain the transition probability output by the specified probability distribution model, wherein the mark value is used for indicating the corresponding time length range.
3. The method of claim 2, wherein prior to determining the transition probability, further comprising:
determining the specified probability distribution model from a plurality of probability distribution models based on the identity of the first camera and the identity of the second camera;
wherein the specified probability distribution model corresponds to a reference camera pair, the reference camera pair comprising the first camera and the second camera, the probability distribution model characterizing a temporal probability distribution of a transition of the same target object between two cameras comprised by the corresponding one of the camera pairs, different probability distribution models corresponding to different camera pairs.
4. The method of claim 3, wherein prior to determining the specified probability distribution model from a plurality of probability distribution models based on the identity of the first camera and the identity of the second camera, further comprising:
acquiring a plurality of observation sample pairs corresponding to each camera pair in the plurality of camera pairs, wherein each observation sample pair comprises shooting time of two observation images which are shot by the corresponding camera pair and have the same target object;
determining a plurality of time differences corresponding to the corresponding camera pairs based on the shooting time included in the plurality of observation sample pairs respectively corresponding to the camera pairs;
counting, among the plurality of time differences respectively corresponding to each camera pair, the number of time differences located in each of the plurality of time length ranges to obtain a plurality of statistical frequency numbers corresponding to the corresponding camera pair;
and determining the probability distribution model corresponding to each camera pair based on the plurality of statistical frequency numbers respectively corresponding to each camera pair and the mark values of the plurality of time length ranges.
5. The method of any of claims 1-4, wherein determining the similarity between the first target object and the second target object based on the image similarity and the transition probability comprises:
determining a comprehensive feature vector based on the image similarity and the transition probability;
and inputting the comprehensive feature vector into an image classification model to obtain the similarity between the first target object and the second target object output by the image classification model.
6. The method of claim 5, wherein prior to determining a composite feature vector based on the image similarity and the transition probability, further comprising:
acquiring first attribute information and second attribute information, wherein the first attribute information represents the attribute of the first target object, and the second attribute information represents the attribute of the second target object;
determining an attribute feature vector based on the first attribute information and the second attribute information, the attribute feature vector characterizing attribute similarity between the first target object and the second target object;
the determining a synthetic feature vector based on the image similarity and the transition probability comprises:
determining the synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability.
7. The method of claim 6, wherein prior to determining the composite feature vector based on the image similarity, the attribute feature vector, and the transition probability, further comprising:
acquiring a distance between the first camera and the second camera and a time difference between the first shooting time and the second shooting time;
determining a moving speed of the first target object based on the distance and the time difference;
the determining the synthetic feature vector based on the image similarity, the attribute feature vector, and the transition probability includes:
and combining the image similarity, the attribute feature vector, the transition probability, the distance, the time difference and the moving speed to obtain the comprehensive feature vector.
8. The method of claim 5, wherein before inputting the synthesized feature vector into an image classification model, further comprising:
acquiring a training sample set, wherein the training sample set comprises a plurality of positive samples and a plurality of negative samples, each positive sample comprises two sample images with the same target object, each negative sample comprises two sample images with different target objects, and the training sample set further comprises an identifier of a camera for shooting the sample images and shooting time;
determining a plurality of sample image similarities based on sample images included in the plurality of positive samples and the plurality of negative samples;
determining a plurality of sample transition probabilities based on the identities and capture times of cameras included in the plurality of positive samples and the plurality of negative samples, and a plurality of probability distribution models;
determining a plurality of sample comprehensive feature vectors based on the plurality of sample image similarities and the plurality of sample transition probabilities;
and training an initial classification model through the plurality of sample comprehensive feature vectors to obtain the image classification model.
9. The method of any of claims 1-4, wherein prior to determining the transition probability, further comprising:
if the image similarity is greater than a first threshold and less than a second threshold, then the step of determining a transition probability is performed.
10. A similarity determination apparatus, characterized in that the apparatus comprises:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining the image similarity between a first image and a second image, the first image has a first target object, the second image has a second target object, the first image is obtained by shooting through a first camera, and the second image is obtained by shooting through a second camera;
a second determination module for determining a transition probability, the transition probability being a probability that the first target object transitions from the photographing view of the second camera to the photographing view of the first camera within a reference time period;
a third determining module, configured to determine a similarity between the first target object and the second target object based on the image similarity and the transition probability.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN202111050793.4A 2021-09-08 2021-09-08 Similarity determination method and device and computer readable storage medium Pending CN113761263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111050793.4A CN113761263A (en) 2021-09-08 2021-09-08 Similarity determination method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113761263A 2021-12-07

Family

ID=78794038

Country Status (1)

Country Link
CN (1) CN113761263A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743135A (en) * 2022-03-30 2022-07-12 阿里云计算有限公司 Object matching method, computer-readable storage medium and computer device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414949A (en) * 2020-03-13 2020-07-14 杭州海康威视系统技术有限公司 Picture clustering method and device, electronic equipment and storage medium
CN112052771A (en) * 2020-08-31 2020-12-08 腾讯科技(深圳)有限公司 Object re-identification method and device

Similar Documents

Publication Publication Date Title
Liu et al. Overview and methods of correlation filter algorithms in object tracking
CN110235138B (en) System and method for appearance search
Sun et al. Benchmark data and method for real-time people counting in cluttered scenes using depth sensors
CN105469029B (en) System and method for object re-identification
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN104244113B (en) A kind of video abstraction generating method based on depth learning technology
CN108549846B (en) Pedestrian detection and statistics method combining motion characteristics and head-shoulder structure
EP2774119B1 (en) Improving image matching using motion manifolds
Parham et al. Animal population censusing at scale with citizen science and photographic identification
Huang et al. Soft-margin mixture of regressions
Rabiee et al. Crowd behavior representation: an attribute-based approach
CN111723773A (en) Remnant detection method, device, electronic equipment and readable storage medium
Elharrouss et al. FSC-set: counting, localization of football supporters crowd in the stadiums
US20230095533A1 (en) Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling
Hammam et al. Real-time multiple spatiotemporal action localization and prediction approach using deep learning
Yang et al. A method of pedestrians counting based on deep learning
KR101492059B1 (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
CN113761263A (en) Similarity determination method and device and computer readable storage medium
CN111950507B (en) Data processing and model training method, device, equipment and medium
CN109447112A (en) A kind of portrait clustering method, electronic equipment and storage medium
CN112749605A (en) Identity recognition method, system and equipment
US20220375202A1 (en) Hierarchical sampling for object identification
CN116229512A (en) Pedestrian re-identification model building method based on cross-camera self-distillation and application thereof
CN116071569A (en) Image selection method, computer equipment and storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination