CN113516690A - Image detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113516690A
Authority
CN
China
Prior art keywords: images, space, image, time, sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011157305.5A
Other languages
Chinese (zh)
Inventor
刘俊龙
沈旭
黄建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202011157305.5A priority Critical patent/CN113516690A/en
Publication of CN113516690A publication Critical patent/CN113516690A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/292 Multi-camera tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/587 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image detection method, an image detection apparatus, image detection equipment and a storage medium, wherein the method comprises the following steps: determining, from the images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with a target object, and determining a space-time point sequence according to the space-time points corresponding to the plurality of images. The space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time. Gaussian process modeling is performed on the space-time point sequence to determine the path smoothing index corresponding to the plurality of images, and the images that do not match the target object are filtered out according to the path smoothing index, so that the images remaining after filtering are those that match the target object.

Description

Image detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image detection method, an image detection apparatus, an image detection device, and a storage medium.
Background
Cameras are now deployed in many scenarios to capture video of the surrounding environment, for purposes such as video surveillance.
For example, cameras are installed on both sides of urban roads. When the movement track of a target object (such as a person or a vehicle) over a certain period of time needs to be found, the object can be searched for in the images obtained by sampling frames from the video captured by these cameras, and its movement track can then be determined from those images.
However, in practical applications, cameras are often mounted at a considerable height and capture images at relatively low pixel resolution, so directly detecting the target object in the captured images rarely yields accurate and reliable results, which in turn degrades the determination of the target object's real movement track.
Disclosure of Invention
The embodiment of the invention provides an image detection method, an image detection device, image detection equipment and a storage medium, which can detect an image actually containing a target object.
In a first aspect, an embodiment of the present invention provides an image detection method, where the method includes:
determining, from images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with a target object;
determining a space-time point sequence according to the space-time points corresponding to the plurality of images, wherein the space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time;
performing Gaussian process modeling on the space-time point sequence to determine a path smoothing index corresponding to the space-time point sequence;
filtering, according to the path smoothing index, the images of the plurality of images that do not match the target object.
In a second aspect, an embodiment of the present invention provides an image detection apparatus, including:
an image primary selection module, configured to determine, from the images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with a target object;
a smoothing processing module, configured to determine a space-time point sequence according to the space-time points corresponding to the plurality of images and to perform Gaussian process modeling on the sequence to determine a path smoothing index corresponding to the plurality of images, wherein the space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time;
and an image filtering module, configured to filter, according to the path smoothing index, the images of the plurality of images that do not match the target object.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the image detection method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the image detection method according to the first aspect.
In a fifth aspect, an embodiment of the present invention provides an image detection method, including:
in response to a request for invoking a target service, determining a processing resource corresponding to the target service;
executing the following steps by using the processing resource corresponding to the target service:
determining, from images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with a target object;
determining a space-time point sequence according to the space-time points corresponding to the plurality of images, wherein the space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time;
performing Gaussian process modeling on the space-time point sequence to determine the path smoothing index corresponding to the plurality of images;
filtering, according to the path smoothing index, the images of the plurality of images that do not match the target object.
In the scheme provided by the embodiment of the invention, after a plurality of images whose visual features satisfy the similarity requirement with the target object are determined from the images acquired by a plurality of cameras, a space-time point sequence can be determined from the space-time points corresponding to those images. The space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time. To determine whether all of the images actually match the target object, i.e. whether the set contains abnormal images that do not contain the target object, Gaussian process modeling is performed on the space-time point sequence to determine its path smoothing index, which reflects how smoothly the user positions in the sequence vary. It can be understood that if one of the images does not match the target object and its space-time point is included in the sequence, the path smoothing index of the sequence will be relatively low. In other words, the index measures how reasonable the appearance of each user position in the sequence is; if a certain user position appears unreasonably, the corresponding image is one that does not match the target object. Accordingly, the images that do not match the target object can be filtered out of the plurality of images according to the path smoothing index.
In this way, the images remaining after filtering all match the target object (i.e. contain it), and the real movement track of the target object in the corresponding time period can then be derived from them.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of an image detection method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a path smoothness index value taking situation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image detection scenario according to an embodiment of the present invention;
FIG. 4 is a flowchart of an image detection method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device corresponding to the image detection apparatus provided in the embodiment shown in fig. 5.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
The word "if" as used herein may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "when it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)".
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The image detection method provided by the embodiment of the invention can be executed by an electronic device, and the electronic device can be a terminal device such as a PC (personal computer), a notebook computer and the like, and can also be a server at the cloud end. The server may be a physical server comprising a stand-alone host, or may be a virtual server, or may be a cloud server.
Fig. 1 is a flowchart of an image detection method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. Determine, from the images acquired by the plurality of cameras, a plurality of images whose visual features satisfy the similarity requirement with the target object.
102. Determine a space-time point sequence according to the space-time points corresponding to the plurality of images, where the space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the space-time points in the sequence are arranged in order of acquisition time.
103. Perform Gaussian process modeling on the space-time point sequence to determine the path smoothing index corresponding to the sequence.
104. Filter, according to the path smoothing index, the images of the plurality of images that do not match the target object.
In the embodiment of the invention, the target object may be, for example, a specific person or a specific vehicle.
In practical applications, a typical need is the following: when the movement track of a specific person over a certain period of time must be determined, the scheme provided by the embodiment of the invention enables that track to be determined accurately.
A plurality of cameras are arranged on two sides of a road in a city, so that pedestrians and vehicles on the road can be monitored through the cameras.
When the movement track of the target object in a certain time period needs to be detected, a reference image of the target object can be supplied, in which the visual features of the target object are clearly presented. For example, when the target object is a person, the visual features may be characteristics of the person such as facial features, hair style, sex, and clothing; when the target object is a vehicle, the visual features may be the license plate number, vehicle type, color, or a specific object placed somewhere in the vehicle.
Based on the reference image of the target object, a plurality of images whose visual features satisfy the similarity requirement with the target object are determined from all the images acquired by the plurality of cameras in that time period.
The plurality of cameras may be all cameras deployed in a city or, when the approximate movement range of the target object in the time period is known, only the cameras deployed in that range.
It can be understood that a camera generally works by shooting video. Taking the video captured by any one of the plurality of cameras in the time period as an example, sampling its image frames yields M1 images for that camera; assuming the number of cameras is M2, M1 × M2 images are finally obtained, where M1 and M2 are integers greater than 1.
Then, the plurality of images whose visual features satisfy the similarity requirement with the target object are determined among these M1 × M2 images. In brief, for any one of these images, visual features are extracted and their similarity to the visual features extracted from the reference image is computed; if the similarity is greater than a set threshold, that image is taken as one of the plurality of images.
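As a concrete illustration of this primary selection step, the sketch below keeps the images whose feature vectors are cosine-similar enough to the reference feature. The feature extractor, the function name, and the threshold value are illustrative assumptions and not part of the original disclosure.

```python
import numpy as np

def select_candidate_images(ref_feature, image_features, threshold=0.8):
    """Return indices of images whose visual feature is similar enough
    to the reference feature of the target object (hypothetical helper)."""
    ref = np.asarray(ref_feature, dtype=float)
    selected = []
    for idx, feat in enumerate(image_features):
        feat = np.asarray(feat, dtype=float)
        # Cosine similarity between the candidate feature and the reference.
        sim = ref @ feat / (np.linalg.norm(ref) * np.linalg.norm(feat))
        if sim > threshold:
            selected.append(idx)
    return selected
```

In practice the feature vectors would come from a person or vehicle re-identification model; here they are given directly.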
After obtaining the plurality of images, optionally, determining a space-time point sequence according to the space-time points corresponding to the plurality of images may be implemented as: a sequence of space-time points is determined, which is made up of a plurality of space-time points corresponding to the plurality of images.
The embodiment of the invention introduces a concept: the space-time point. A space-time point consists of a time and a place and corresponds to one of the plurality of images: the time in the space-time point is the acquisition time of that image, and the place is the user position of the target object at that acquisition time. Thus, the space-time point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that time.
It can be understood that when a certain image is captured by a certain camera, the camera records the captured timestamp information, so that the capture time of the image can be known. For the user position, different determination methods can be provided according to the requirement of positioning accuracy, for example, if the requirement on positioning accuracy is low, the position of a camera for acquiring an image can be used as the user position; if the requirement on the positioning accuracy is high, the user position can be positioned by combining the relevant parameters of the camera for acquiring the image and the pixel position of the target object in the image.
After the time-space points corresponding to the multiple images are obtained, sequencing the obtained multiple time-space points according to the sequence of the acquisition time, and obtaining a time-space point sequence corresponding to the multiple images.
It will be appreciated that the sequence of space-time points is actually a discrete space-time point trajectory: a trajectory consisting of a discrete number of space-time points.
For convenience of description, the space-time point sequence, i.e. the discrete space-time point trajectory, is denoted as:
τ = (t_1, l_1), (t_2, l_2), …, (t_n, l_n), with t_i ≤ t_{i+1}
Here the plurality of images is assumed to be n images, n greater than 1, and (t_k, l_k) indicates that the target object appears at location l_k at time t_k.
Each l_k contains a longitude and a latitude and can be written as (l_kx, l_ky).
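The construction of the sequence from unordered detections can be sketched as follows; the tuple layout and the function name are illustrative assumptions:

```python
def build_spatiotemporal_sequence(points):
    """Arrange (t_k, (l_kx, l_ky)) space-time points by acquisition time,
    producing the sequence tau with t_i <= t_{i+1}."""
    return sorted(points, key=lambda p: p[0])
```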
In the embodiment of the invention, assuming that a plurality of acquisition times and a plurality of user positions corresponding to a plurality of space-time points are in accordance with Gaussian distribution, path smoothing indexes corresponding to a plurality of images can be determined by performing Gaussian process modeling on the space-time point sequence. In summary, in the process of performing gaussian process modeling, a plurality of acquisition times included in a spatiotemporal point sequence are used as independent variables, and a plurality of user positions are used as dependent variables to perform gaussian process modeling on the spatiotemporal point sequence.
In the embodiment of the invention, a path smoothing index is designed for measuring the smoothness corresponding to the discrete space-time point locus.
In short, in practical applications, when the target object appears at two locations in succession, the smaller the time difference between the two appearances, the closer the two locations should be. In other words, if the target object appears at location A at 8:00 and at location B, far from A, at 8:01, this is clearly unreasonable; the image corresponding to location B should then be filtered out, since it very likely does not match the target object, i.e. it does not contain the target object but only an object similar to it.
It can be seen that the path smoothness index can be used to reflect the reasonableness of the occurrence of each user position in the sequence of space-time points. In the calculation process of the path smoothing index, the above characteristics of time and distance are combined to perform calculation, which will be described later.
In summary, optionally, the sequence of spatio-temporal points is modeled by a gaussian process comprising:
magnitude elimination processing is carried out on a plurality of user positions contained in the time-space point sequence;
and performing Gaussian process modeling on the time-space point sequence by taking a plurality of acquisition times contained in the time-space point sequence as independent variables and taking magnitude elimination processing results of a plurality of user positions as dependent variables.
As can be seen from the above example, one user position is composed of longitude and latitude, and therefore, the magnitude elimination process for the user position includes a magnitude elimination process for longitude and a magnitude elimination process for latitude, so that, finally, the gaussian process modeling can be performed on the spatiotemporal point sequence with a plurality of acquisition times included in the spatiotemporal point sequence as independent variables and with magnitude elimination process results for longitude and magnitude elimination process results for latitude in the plurality of user positions as dependent variables.
In other words, in short, a Gaussian process is established (i.e. Gaussian process modeling is performed) for the two types of location information, longitude and latitude, so that the marginal probability of the multivariate Gaussian distribution corresponding to longitude and the marginal probability of the multivariate Gaussian distribution corresponding to latitude can be obtained.
For a space-time point sequence containing n user positions, the dimension of these multivariate distributions is the number of user positions, n.
Based on this modeling result, the path smoothing index corresponding to the plurality of images is finally defined as the sum of two terms: the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to longitude, and the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to latitude.
In the above, the magnitude elimination process is performed on a plurality of user locations included in the time-space point sequence, where the magnitude elimination process may be understood as performing a certain transformation or preprocessing on the user locations to eliminate the influence of the magnitude, and finally making the path smoothing index magnitude-independent.
Optionally, the magnitude elimination processing is performed on a plurality of user positions included in the time-space point sequence, and includes:
determining a center of gravity point for a plurality of user locations;
determining an average distance from a plurality of user locations to a center of gravity point;
and for any user position in a plurality of user positions, carrying out magnitude elimination processing on the any user position according to the any user position, the average distance and the gravity center point.
Since one user location is composed of longitude and latitude, determining the center of gravity of a plurality of user locations may be specifically implemented as: the average value of the longitudes in the plurality of user positions is taken as a longitude center of gravity point, and the average value of the latitudes in the plurality of user positions is taken as a latitude center of gravity point, so that the center of gravity points of the plurality of user positions are composed of the longitude center of gravity point and the latitude center of gravity point.
Determining the average distance of the plurality of user positions to the center of gravity point refers to: the distances from the positions of the users to the gravity center point are respectively calculated to obtain a plurality of distances, and then the plurality of distances are subjected to average calculation to obtain the average distance.
Optionally, according to the any user position, the average distance, and the center of gravity point, magnitude elimination processing is performed on the any user position, which may specifically be implemented as:
determining a first difference of a longitude and a longitude center of gravity point in the any user location and a second difference of a latitude and a latitude center of gravity point in the any user location;
determining a first ratio of a first difference to the average distance and a second ratio of a second difference to the average distance;
the first ratio is a magnitude elimination result of longitude in any user position, and the second ratio is a magnitude elimination result of latitude in any user position.
For ease of understanding, the following uses the space-time point sequence assumed above, τ = (t_1, l_1), (t_2, l_2), …, (t_n, l_n) with t_i ≤ t_{i+1}, for illustration.
Let l_k be any user position in the sequence; it contains a longitude and a latitude and can be written as (l_kx, l_ky), where l_kx denotes the longitude and l_ky the latitude.
The center-of-gravity point (l̄_x, l̄_y) is expressed as:
l̄_x = (1/n) Σ_{k=1}^{n} l_kx,  l̄_y = (1/n) Σ_{k=1}^{n} l_ky
Based on this center-of-gravity point, the average distance of the n user positions to it is determined; assume this average distance is denoted μ(l). The distances here are Euclidean distances.
Taking l_k as an example, the magnitude elimination for its longitude is:
l̂_kx = (l_kx − l̄_x) / μ(l)
and for its latitude:
l̂_ky = (l_ky − l̄_y) / μ(l)
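A minimal sketch of this magnitude-elimination step (the function name and array layout are assumptions):

```python
import numpy as np

def eliminate_magnitude(lons, lats):
    """Subtract the longitude/latitude center of gravity and divide by the
    mean Euclidean distance mu(l) of the points to that center."""
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    cx, cy = lons.mean(), lats.mean()           # center-of-gravity point
    mu = np.hypot(lons - cx, lats - cy).mean()  # average distance mu(l)
    return (lons - cx) / mu, (lats - cy) / mu
```

After this step the normalized positions are dimensionless, so the path smoothing index is independent of the coordinate magnitude.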
Based on the above example, the path smoothing index corresponding to the n images, i.e. to the space-time point sequence, can finally be defined as the following formula (1):
s(τ) = log N(l̂_x; 0, Σ_{l_t}) + log N(l̂_y; 0, Σ_{l_t})    (1)
wherein l̂_x denotes the vector of the n magnitude-eliminated longitudes in the sequence, l̂_y the vector of the n magnitude-eliminated latitudes, and l_t the vector of the n acquisition times; N(·; 0, Σ_{l_t}) is the marginal probability density of the multivariate Gaussian distribution with zero mean and covariance Σ_{l_t}. Σ_{l_t} is an n × n covariance matrix over l_t whose element in row i, column j is k(t_i, t_j), where k(·,·) is the Gaussian process kernel function; when i = j, the variance of the Gaussian noise distribution is additionally added to the value.
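Formula (1) can be sketched as follows, assuming an RBF kernel over acquisition time; the patent text does not fix a particular kernel, so the kernel choice, length scale, and noise variance here are illustrative assumptions:

```python
import numpy as np

def path_smoothing_index(times, lons_n, lats_n, length_scale=1.0, noise_var=1e-2):
    """Formula (1): sum of Gaussian-process log marginal likelihoods of the
    magnitude-eliminated longitudes and latitudes, with covariance k(t_i, t_j)
    plus Gaussian noise variance on the diagonal."""
    t = np.asarray(times, dtype=float)[:, None]
    K = np.exp(-0.5 * (t - t.T) ** 2 / length_scale ** 2) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)

    def log_marginal(y):
        # log N(y; 0, K) via the Cholesky factor of K.
        y = np.asarray(y, dtype=float)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return (-0.5 * y @ alpha
                - np.log(np.diag(L)).sum()
                - 0.5 * len(y) * np.log(2.0 * np.pi))

    return log_marginal(lons_n) + log_marginal(lats_n)
```

A smooth trajectory scores higher than one containing an abrupt jump, which is exactly the property the filtering step exploits.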
If l̂_k is expressed in polar coordinates with the origin at the center-of-gravity point as (ρ_k cos θ_k, ρ_k sin θ_k), then
l̂_x^T Σ_{l_t}^{-1} l̂_x + l̂_y^T Σ_{l_t}^{-1} l̂_y = Σ_{i,j} a_ij ρ_i ρ_j cos(θ_i − θ_j)
and the above calculation formula for the path smoothing index is equivalent to the following formula (2):
s(τ) = −(1/2) Σ_{i,j} a_ij ρ_i ρ_j cos(θ_i − θ_j) + C    (2)
wherein a_ij is the element in row i, column j of the inverse matrix of Σ_{l_t}, and C collects the terms (the log-determinant and normalization constants) that do not depend on the user positions.
Based on the above formula (1) or (2), a path smoothing index corresponding to the spatio-temporal point sequence, that is, the plurality of images, is determined, and the path smoothing index reflects the smoothness of the movement trajectory corresponding to the plurality of user positions arranged in front of and behind the spatio-temporal point sequence.
In summary, the theoretical basis for filtering out images that do not match the target object from the plurality of images according to the path smoothing index is: if one of the images does not match the target object, then after that image is filtered out, the path smoothing index corresponding to the remaining images is obviously improved. Based on this feature, images that do not match the target object can be filtered out.
The process of filtering images is actually a process of finding outliers: searching the spatio-temporal point sequence for unreasonable spatio-temporal points. An example of such unreasonableness: the time difference between the acquisition times of the current spatio-temporal point and the next one is small, yet the next user position is far from the current one, which is obviously unreasonable.
It can be understood that the greater the number of outliers, the smaller the path smoothing index; the fewer the outliers, the larger the index.
For ease of understanding, the difference in path smoothing indexes for several different numbers of outliers is illustrated in conjunction with fig. 2.
In fig. 2, the horizontal axis represents time and the vertical axis represents longitude position. Illustrated are the path smoothing indexes for a spatio-temporal point sequence consisting of 11 time-longitude positions. The whole spatio-temporal point trajectory is fitted and predicted by Gaussian process regression, yielding the several path smoothing indexes illustrated in fig. 2.
Specifically, in the upper left diagram, it is assumed that there are no outliers among the 11 spatio-temporal points; the smoothness of the 11 spatio-temporal points is good, and the path smoothing index is the largest.
In the upper right diagram, it is assumed that there is one outlier among the 11 spatio-temporal points, namely the position corresponding to t = 2.
In the lower left diagram, it is assumed that there are two outliers among the 11 spatio-temporal points, namely the positions corresponding to t = 2 and t = 8.
In the lower right diagram, it is assumed that there are three outliers among the 11 spatio-temporal points, namely the positions corresponding to t = 2, t = 6 and t = 8.
As can be seen from the example in fig. 2, as the number of outliers increases, the path smoothing index tends to decrease.
It should be noted that fig. 2 illustrates only the path smoothing index corresponding to the longitude positions. In fact, as can be seen from the above definition, the path smoothing index corresponding to the spatio-temporal point sequence is the sum of two parts: the path smoothing index corresponding to the longitude positions and the path smoothing index corresponding to the latitude positions.
After the path smoothing index corresponding to the plurality of images is obtained, optionally, if the index is greater than a set threshold, it may be determined that no image mismatching the target object exists among the plurality of images; otherwise, such an image exists, and image filtering is required to filter out the abnormal image (the image that does not match the target object).
In order to realize the filtering of abnormal images, in the process of determining the spatio-temporal point sequence from the spatio-temporal points corresponding to the plurality of images, optionally, in addition to the spatio-temporal point sequence formed by the spatio-temporal points of all the images, a sub-spatio-temporal point sequence corresponding to each image can also be determined, where the sub-spatio-temporal point sequence corresponding to any image is obtained by arranging the spatio-temporal points corresponding to the other images in acquisition time order.
Therefore, in calculating the path smoothing index, in addition to performing Gaussian process modeling on the spatio-temporal point sequence formed by the spatio-temporal points of the plurality of images to determine the path smoothing index corresponding to the plurality of images as a whole, Gaussian process modeling can also be performed on the sub-spatio-temporal point sequence corresponding to each image, so as to determine a path smoothing index corresponding to each image.
Finally, when it is determined that the degree of improvement of the path smoothing index corresponding to a target image in the plurality of images, relative to the path smoothing index corresponding to the plurality of images, meets a set condition, the target image is determined to be an image that does not match the target object.
For convenience of understanding, for example, assume that there are 20 images. After the spatio-temporal points corresponding to the 20 images are determined, the spatio-temporal points are sorted in acquisition time order to obtain a spatio-temporal point sequence C0 containing 20 spatio-temporal points.
Then, for any image 1 in the 20 images, after excluding the image 1 from the 20 images, the spatiotemporal points corresponding to the remaining 19 other images form a sub-spatiotemporal point sequence C1 corresponding to the image 1, and it is understood that the spatiotemporal points in the sub-spatiotemporal point sequence C1 are also sorted in the order of acquisition time.
Similarly, for another image 2 in the 20 images, after the image 2 is excluded from the 20 images, the spatiotemporal points corresponding to the remaining 19 other images form a sub-spatiotemporal point sequence C2 corresponding to the image 2, and it can be understood that the spatiotemporal points in the sub-spatiotemporal point sequence C2 are also sorted according to the acquisition time sequence.
By analogy, for each of the 20 images, a corresponding sequence of sub-spatio-temporal points is obtained, which eventually results in 20 sequences of sub-spatio-temporal points, which are assumed to be denoted as C1, C2 … C20.
Further, the path smoothing index corresponding to each of the 20 sub spatio-temporal point sequences and the path smoothing index corresponding to the spatio-temporal point sequence C0 are calculated according to the above-described path smoothing index calculation formula.
Then, optionally, the path smoothing index corresponding to each of the 20 sub-spatio-temporal point sequences may be compared one by one with the path smoothing index corresponding to the spatio-temporal point sequence C0. If the path smoothing index corresponding to a certain sub-spatio-temporal point sequence is greater than that corresponding to C0 and the difference between the two is greater than a set threshold, the image corresponding to that sub-spatio-temporal point sequence may be considered an image that does not match the target object. For example, if the path smoothing index corresponding to the sub-spatio-temporal point sequence C1 corresponding to image 1 meets the above condition, image 1 is considered an image that does not match the target object, and may be referred to as an abnormal image.
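The leave-one-out comparison just described can be sketched as follows. For brevity this sketch scores smoothness with a simple stand-in (negative sum of squared second differences of position) instead of the full Gaussian-process index; the structure of the check, removing each spatio-temporal point in turn and looking for a large improvement over the full-sequence score, is the same. All names and the threshold value are illustrative assumptions.

```python
def smoothness(seq):
    """Stand-in smoothness score (higher = smoother): negative sum of
    squared second differences of position along the sequence. seq is a
    time-ordered list of (time, position) spatio-temporal points."""
    pos = [p for _, p in seq]
    return -sum((pos[i - 1] - 2 * pos[i] + pos[i + 1]) ** 2
                for i in range(1, len(pos) - 1))

def find_abnormal(seq, threshold=1.0):
    """Remove each point in turn (forming the sub-sequences C_i); if the
    best removal improves the score by more than `threshold` over the
    full-sequence score, report that point as abnormal."""
    base = smoothness(seq)
    gains = [smoothness(seq[:i] + seq[i + 1:]) - base for i in range(len(seq))]
    best = max(range(len(seq)), key=gains.__getitem__)
    if gains[best] > threshold:
        return best
    return None

points = [(t, 0.5 * t) for t in range(7)]
points[4] = (4, 9.0)              # outlier: far from the smooth trajectory
print(find_abnormal(points))      # 4
```

Note that the comparison picks the removal with the largest improvement rather than the first one exceeding the threshold, since removing a neighbor of an outlier can also improve the score slightly.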
The following describes an implementation of the above image detection method in a practical application scenario, with reference to fig. 3.
Assume that in the road scene illustrated in fig. 3, the three cameras illustrated in the drawing are deployed, and that the target object is a specific person, called user A. The current task is: find the images containing user A from the pictures captured by the cameras within a certain time period.
At this time, taking the three cameras illustrated in fig. 3 as an example, first, through visual feature matching, a plurality of images similar to user A are sampled from the videos acquired by the three cameras, and the acquisition time and user position corresponding to each image are determined to form a spatio-temporal point, so that the spatio-temporal point sequence corresponding to the plurality of images can be obtained. In fig. 3, it is assumed that 7 images are sampled, and the acquisition times and user positions corresponding to the 7 images are shown in fig. 3.
Then, Gaussian process modeling is performed on the spatio-temporal point sequence formed by the 7 spatio-temporal points, and the path smoothing index corresponding to the plurality of images is determined according to the modeling result. The path smoothing index reflects the smoothness of the user positions in the spatio-temporal point sequence.
Optionally, if the path smoothing index is greater than the preset threshold, the multiple images may be considered images matching user A, that is, images containing user A; otherwise, it is considered that there exists an image among them that does not match user A, and the images need to be filtered.
In fig. 3, it is assumed that the real movement trajectory of user A corresponds to the six location points L1, L2, L3, L4, L5 and L7, and that the user corresponding to location L6 is not user A but merely someone similar to user A in physical characteristics.
At this time, spatio-temporal points are excluded from the spatio-temporal point sequence one by one, and the path smoothing index of the sub-spatio-temporal point sequence corresponding to each image is determined. It is finally found that after the spatio-temporal point corresponding to location L6 is removed, the path smoothing index corresponding to the spatio-temporal point sequence is significantly improved; it is then determined that the user corresponding to location L6 is not user A, and the corresponding image (i.e., the abnormal image) is removed.
The abnormal image filtering may also be implemented in manners other than the above, for example, the scheme provided by the embodiment shown in fig. 4.
Fig. 4 is a flowchart of an image detection method according to an embodiment of the present invention, and as shown in fig. 4, the method includes the following steps:
401. Determine, from the images acquired by the plurality of cameras, a plurality of images whose visual features meet the similarity requirement with the target object.
402. Determine the sub-space-time point sequences corresponding to the plurality of images, where the sub-space-time point sequence corresponding to any image is obtained by arranging the space-time points corresponding to the other images in the plurality of images in acquisition time order.
403. Perform Gaussian process modeling on the sub-space-time point sequences corresponding to the plurality of images respectively, so as to determine the path smoothing indexes corresponding to the plurality of images.
404. According to the path smoothing indexes corresponding to the plurality of images, determine an image whose path smoothing index meets a set condition to be an image that does not match the target object.
In this embodiment, in order to determine an image that does not match the target object from the plurality of images, after obtaining the spatio-temporal points corresponding to the plurality of images, a sub-spatio-temporal point sequence corresponding to each image may be determined.
As in the above example, assuming that there are 20 images in total, after the spatio-temporal points corresponding to the 20 images are determined, for any image 1 in the 20 images, after the image 1 is excluded from the 20 images, the spatio-temporal points corresponding to the remaining 19 images form a sub-spatio-temporal point sequence C1 corresponding to the image 1, and it can be understood that the spatio-temporal points in the sub-spatio-temporal point sequence C1 are ordered according to the acquisition time sequence.
Similarly, for another image 2 in the 20 images, after the image 2 is excluded from the 20 images, the spatiotemporal points corresponding to the remaining 19 other images form a sub-spatiotemporal point sequence C2 corresponding to the image 2, and it can be understood that the spatiotemporal points in the sub-spatiotemporal point sequence C2 are also sorted according to the acquisition time sequence.
By analogy, for each of the 20 images, a corresponding sequence of sub-spatio-temporal points is obtained, which eventually results in 20 sequences of sub-spatio-temporal points, which are assumed to be denoted as C1, C2 … C20.
Furthermore, according to the above formula for calculating the path smoothing index, the path smoothing index corresponding to each of the 20 sub-spatio-temporal point sequences is calculated, that is, the path smoothing index corresponding to each of the 20 images is obtained. This finally results in 20 path smoothing indexes.
Then, optionally, the mean of the 20 path smoothing indexes may be calculated, and a value range set according to the mean, such as the mean ± 3. If a certain path smoothing index lies outside this value range, that is, does not fall into it, the image corresponding to that path smoothing index is considered an abnormal image that does not match the target object.
That is, according to the path smoothing indexes corresponding to the plurality of images, determining that the image whose path smoothing index meets the set condition is an image that does not match the target object may be implemented as:
determining the mean value of path smoothing indexes corresponding to the plurality of images;
determining a target value range according to the mean value;
and determining the image corresponding to the path smoothing index which does not belong to the target value range as the image which is not matched with the target object according to the attribution relationship between the path smoothing index corresponding to each of the plurality of images and the target value range.
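The three steps above can be sketched as a mean-based value range check. The half-width of the range (here matching the "mean ± 3" example) is a configurable assumption, as are all names.

```python
from statistics import mean

def out_of_range(indexes, half_width=3.0):
    """Return positions of path smoothing indexes falling outside the
    target value range [mean - half_width, mean + half_width]."""
    m = mean(indexes)
    return [i for i, s in enumerate(indexes)
            if not (m - half_width <= s <= m + half_width)]

scores = [10.1, 10.4, 9.8, 25.0, 10.0, 9.9]   # one image stands out
print(out_of_range(scores))  # [3]
```

Any image whose index is flagged here would be treated as not matching the target object.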
It should be noted that, continuing with the example of the 20 images, assuming image 1 is determined to be an abnormal image, image 1 is filtered out, leaving 19 images. The above filtering processing continues on the 19 images to further eliminate any abnormal images they may contain, and so on, until no abnormal image meeting the set condition can be found; the remaining images are then considered images matching the target object.
Thus, the space-time points corresponding to the final remaining images are the space-time points corresponding to the real movement trajectory of the target object.
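The repeat-until-stable filtering described above can be sketched as a loop. Here `score_fn` is a hypothetical callable standing in for "compute the path smoothing index of this set of images", and the toy score in the example is only for demonstration; neither is from the source.

```python
def iterative_filter(score_fn, images, threshold):
    """Repeatedly drop the image whose removal most improves the overall
    score, until no removal improves it by more than `threshold`."""
    while len(images) > 2:
        base = score_fn(images)
        gains = [score_fn(images[:i] + images[i + 1:]) - base
                 for i in range(len(images))]
        best = max(range(len(images)), key=gains.__getitem__)
        if gains[best] <= threshold:
            break                     # no abnormal image left
        images = images[:best] + images[best + 1:]
    return images

# Toy score: negative spread of values, so removing extremes helps.
spread = lambda xs: -(max(xs) - min(xs))
print(iterative_filter(spread, [1.0, 1.1, 9.0, 1.2, 7.5], threshold=0.5))
# [1.0, 1.1, 1.2]
```

The surviving elements correspond to the images whose spatio-temporal points form the real movement trajectory.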
In summary, based on the path smoothing index defined in the embodiment of the present invention for a discrete spatio-temporal point trajectory, after a plurality of images visually similar to the target object are preliminarily screened out, the images among them that do not actually match the target object are further filtered out based on the path smoothing index, so that the finally obtained images are all images matching the target object. This not only realizes the screening of images matching the target object, but also allows the movement trajectory of the target object in the corresponding time period to be accurately obtained from the remaining filtered images.
It should be further noted that the path smoothing index provided by the embodiment of the present invention satisfies the following index requirements:
First, rationality, mainly reflected in rotational invariance and magnitude independence. Rotational invariance means, simply, that a spatio-temporal point trajectory that is smooth in the east-west direction remains equally smooth when rotated to the north-south direction; from the above formula (2) it can be shown that the path smoothing index depends on the spatio-temporal points' angles relative to the center of gravity point only through their differences, and is therefore rotation-invariant. Magnitude independence refers to the magnitude elimination through the average distance, so that the determination of the final path smoothing index is not affected by the magnitude scale of the spatio-temporal point sequence.
Second, consistency, mainly manifested as monotonicity. As the number of spatio-temporal points increases, the path smoothing index generally exhibits a decreasing trend. However, it is not strictly monotonic in the number of spatio-temporal points. For example, when adding a spatio-temporal point joins the front and rear trajectories (there are two separate spatio-temporal point trajectories before the point is added, and after it is added the two trajectories join together), the path smoothing index should increase rather than decrease.
Third, sensitivity, mainly referring to sensitivity to anomalies. If noise is added to the spatio-temporal point sequence, that is, outliers are added, the path smoothing index drops significantly.
Fourth, stability, referring to practical stability. The above definition of the path smoothing index is stable and usable, and its computational complexity is not high.
As described above, the image detection method provided by the present invention can be executed in the cloud, where a plurality of computing nodes may be deployed, each having processing resources such as computing and storage resources. In the cloud, a plurality of computing nodes may be organized to provide a service, and of course one computing node may also provide one or more services.
According to the scheme provided by the present invention, the cloud may provide a service for completing image detection, called a target service. When a user needs to use the target service, the target service is called, thereby triggering a request to the cloud to invoke the target service; the request may carry the videos acquired by the cameras. The cloud determines the computing node that responds to the request, and performs the following steps using the processing resources of that computing node (i.e., the processing resources corresponding to the target service):
determining a plurality of images of which the visual characteristics and the target object meet the similarity requirement from the images acquired by the plurality of cameras;
determining a spatio-temporal point sequence according to the spatio-temporal points corresponding to the plurality of images, wherein the spatio-temporal point corresponding to any image is composed of the acquisition time corresponding to that image and the user position of the target object corresponding to the acquisition time, and the spatio-temporal points in the sequence are arranged in acquisition time order;
performing Gaussian process modeling on the time-space point sequence to determine path smoothing indexes corresponding to the plurality of images;
filtering images of the plurality of images that do not match the target object according to the path smoothing index.
The specific implementation of the above steps may refer to the related descriptions in the foregoing other embodiments, which are not described herein again.
An image detection apparatus according to one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these means can each be constructed using commercially available hardware components and by performing the steps taught in this disclosure.
Fig. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes: an image primary selection module 11, a smoothing processing module 12 and an image filtering module 13.
And the image primary selection module 11 is configured to determine a plurality of images, of which the visual features and the target object meet the similarity requirement, from the images acquired by the plurality of cameras.
A smoothing module 12, configured to determine a spatio-temporal point sequence formed by a plurality of spatio-temporal points corresponding to the plurality of images, and perform gaussian process modeling on the spatio-temporal point sequence to determine a path smoothing index corresponding to the spatio-temporal point sequence; the space-time point corresponding to any image is composed of the acquisition time corresponding to any image and the user position of the target object corresponding to the acquisition time, and the plurality of space-time points are arranged according to the acquisition time sequence to obtain the space-time point sequence.
An image filtering module 13, configured to filter, according to the path smoothness index, an image that does not match the target object in the plurality of images.
Optionally, the smoothing module 12 may be specifically configured to: carrying out magnitude elimination processing on a plurality of user positions contained in the space-time point sequence; and performing Gaussian process modeling on the space-time point sequence by taking a plurality of acquisition times contained in the space-time point sequence as independent variables and taking magnitude elimination processing results of the plurality of user positions as dependent variables.
Optionally, one user location is composed of a longitude and a latitude, and the smoothing module 12 may be specifically configured to: perform Gaussian process modeling on the space-time point sequence by taking the plurality of acquisition times contained in the sequence as independent variables, and taking the magnitude elimination processing results of the longitudes and of the latitudes in the plurality of user positions respectively as dependent variables.
Optionally, the smoothing module 12 may be specifically configured to: determine the path smoothing index corresponding to the plurality of images as the sum of the following two terms: the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to the longitudes and the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to the latitudes.
Optionally, in the process of performing magnitude elimination processing on a plurality of user positions included in the space-time point sequence, the smoothing processing module 12 may be specifically configured to: determining a center of gravity point for the plurality of user locations; determining an average distance of the plurality of user locations to the center of gravity point; and for any user position in the plurality of user positions, carrying out magnitude elimination processing on the any user position according to the any user position, the average distance and the gravity center point.
Wherein, one user location is composed of longitude and latitude, and in the process of determining the center of gravity points of the plurality of user locations, the smoothing processing module 12 may specifically be configured to: taking an average value of longitudes in the plurality of user positions as a longitude center of gravity point; taking an average value of the latitudes in the plurality of user positions as a latitude center of gravity point, wherein the center of gravity point is composed of the longitude center of gravity point and the latitude center of gravity point.
Optionally, in the process of magnitude elimination processing on any user location, the smoothing processing module 12 may specifically be configured to: determining a first difference of the longitude in the any user location and the longitude center of gravity point, and a second difference of the latitude in the any user location and the latitude center of gravity point; determining a first ratio of the first difference to the average distance and a second ratio of the second difference to the average distance; wherein the first ratio is a magnitude elimination result of longitude in any user position, and the second ratio is a magnitude elimination result of latitude in any user position.
Optionally, the image filtering module 13 may be specifically configured to: determining sub-space-time point sequences corresponding to the plurality of images, wherein the sub-space-time point sequence corresponding to any image is obtained by arranging space-time points corresponding to other images in the plurality of images according to an acquisition time sequence; respectively carrying out Gaussian process modeling on the sub-space-time point sequences corresponding to the plurality of images so as to determine path smoothing indexes corresponding to the plurality of images; and determining that the lifting degree of the path smoothing indexes corresponding to the target images in the plurality of images relative to the path smoothing indexes corresponding to the plurality of images meets a set condition, and determining that the target images are images which are not matched with the target objects.
Optionally, the smoothing module 12 may be specifically configured to: determining sub-space-time point sequences corresponding to the plurality of images, wherein the sub-space-time point sequence corresponding to any image is obtained by arranging space-time points corresponding to other images in the plurality of images according to an acquisition time sequence; and respectively carrying out Gaussian process modeling on the sub-space-time point sequences corresponding to the plurality of images so as to determine path smoothing indexes corresponding to the plurality of images. Thus, the image filtering module 13 may be specifically configured to: and determining the image with the path smoothing index meeting the set condition as the image which is not matched with the target object according to the path smoothing indexes corresponding to the plurality of images.
The apparatus shown in fig. 5 can execute the image detection method provided in the embodiments shown in fig. 1 to fig. 4, and the detailed execution process and technical effect are described in the embodiments, and are not described herein again.
In one possible design, the structure of the image detection apparatus shown in fig. 5 may be implemented as an electronic device, as shown in fig. 6, which may include: a processor 21 and a memory 22. Wherein the memory 22 has stored thereon executable code which, when executed by the processor 21, makes the processor 21 at least implement the image detection method as provided in the aforementioned embodiments illustrated in fig. 1 to 4.
Optionally, the electronic device may further include a communication interface 23 for communicating with other devices.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the image detection method provided in the foregoing embodiments shown in fig. 1 to 4.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the essence of the above technical solutions, or the part thereof contributing to the prior art, may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) having computer-usable program code embodied therein.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. An image detection method, comprising:
determining a plurality of images of which the visual characteristics and the target object meet the similarity requirement from the images acquired by the plurality of cameras;
determining a spatio-temporal point sequence according to spatio-temporal points corresponding to the plurality of images, wherein the spatio-temporal point corresponding to any image is composed of the acquisition time corresponding to the image and the user position of the target object corresponding to the acquisition time, and the spatio-temporal points in the spatio-temporal point sequence are arranged according to the acquisition time sequence;
performing Gaussian process modeling on the time-space point sequence to determine path smoothing indexes corresponding to the plurality of images;
filtering images of the plurality of images that do not match the target object according to the path smoothing index.
2. The method of claim 1, wherein the performing Gaussian process modeling on the spatio-temporal point sequence comprises:
carrying out magnitude elimination processing on a plurality of user positions contained in the space-time point sequence;
and performing Gaussian process modeling on the space-time point sequence by taking a plurality of acquisition times contained in the space-time point sequence as independent variables and taking magnitude elimination processing results of the plurality of user positions as dependent variables.
3. The method of claim 2, wherein a user location consists of longitude and latitude;
the performing gaussian process modeling on the spatiotemporal point sequence by taking a plurality of acquisition times contained in the spatiotemporal point sequence as independent variables and taking magnitude elimination processing results of a plurality of user positions as dependent variables comprises the following steps:
and performing Gaussian process modeling on the space-time point sequence by taking a plurality of acquisition times contained in the space-time point sequence as independent variables, and taking the magnitude elimination processing results of the longitudes and of the latitudes in the plurality of user positions respectively as dependent variables.
4. The method of claim 3, wherein the path smoothing index corresponding to the plurality of images is determined as a sum of: the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to the longitudes and the logarithm of the marginal probability of the multivariate Gaussian distribution corresponding to the latitudes.
5. The method of claim 2, wherein the performing magnitude-elimination processing on the plurality of user positions contained in the spatio-temporal point sequence comprises:
determining a centroid of the plurality of user positions;
determining the average distance from the plurality of user positions to the centroid;
and, for any user position among the plurality of user positions, performing magnitude-elimination processing on that user position according to the user position, the average distance, and the centroid.
6. The method of claim 5, wherein a user position consists of a longitude and a latitude; and
the determining a centroid of the plurality of user positions comprises:
taking the average of the longitudes in the plurality of user positions as a longitude centroid;
and taking the average of the latitudes in the plurality of user positions as a latitude centroid, the centroid consisting of the longitude centroid and the latitude centroid.
7. The method of claim 6, wherein the performing magnitude-elimination processing on the user position according to the user position, the average distance, and the centroid comprises:
determining a first difference between the longitude of the user position and the longitude centroid, and a second difference between the latitude of the user position and the latitude centroid;
and determining a first ratio of the first difference to the average distance and a second ratio of the second difference to the average distance, wherein the first ratio is the magnitude-elimination result for the longitude of the user position, and the second ratio is the magnitude-elimination result for the latitude of the user position.
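Claims 5-7 describe a standard centering-and-scaling step: subtract the centroid (the per-coordinate mean) and divide each coordinate difference by the average distance of the points to that centroid. A minimal sketch, assuming positions are given as (longitude, latitude) pairs and that at least two positions differ:

```python
import numpy as np

def eliminate_magnitude(positions):
    """Claims 5-7: center positions at their centroid and scale by the
    average distance to the centroid, so raw longitude/latitude magnitudes
    do not dominate the downstream Gaussian process fit."""
    pos = np.asarray(positions, dtype=float)                  # shape (n, 2)
    centroid = pos.mean(axis=0)                               # claim 6: mean lon, mean lat
    avg_dist = np.linalg.norm(pos - centroid, axis=1).mean()  # claim 5: average distance
    # Claim 7: per-coordinate difference divided by the average distance.
    return (pos - centroid) / avg_dist
```

After this step the points have zero mean and unit average distance to the origin, so a single kernel length scale can serve both coordinates.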
8. The method according to any one of claims 1 to 7, wherein the determining a spatio-temporal point sequence according to the spatio-temporal points corresponding to the plurality of images comprises:
determining the spatio-temporal point sequence formed by the spatio-temporal points corresponding to the plurality of images, and determining a sub-sequence of spatio-temporal points for each of the plurality of images, wherein the sub-sequence corresponding to any image is obtained by arranging the spatio-temporal points corresponding to the other images among the plurality of images in order of acquisition time;
the performing Gaussian process modeling on the spatio-temporal point sequence to determine a path smoothness index corresponding to the plurality of images comprises:
performing Gaussian process modeling on the spatio-temporal point sequence formed by the spatio-temporal points corresponding to the plurality of images to determine the path smoothness index corresponding to the plurality of images, and performing Gaussian process modeling on the sub-sequence corresponding to each of the plurality of images to determine the path smoothness index corresponding to each sub-sequence;
and the filtering out, according to the path smoothness index, the images that do not match the target object comprises:
determining that the degree of improvement of the path smoothness index corresponding to the sub-sequence of a target image among the plurality of images, relative to the path smoothness index corresponding to the plurality of images, satisfies a set condition, and determining that the target image is an image that does not match the target object.
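The leave-one-out scheme of claim 8 can be sketched independently of the Gaussian-process details: score the full sequence, score each sub-sequence with one image's point removed, and flag an image when removing it improves the score by more than a set threshold. For illustration only, a simple proxy score (the negative sum of squared displacements between consecutive points) stands in for the log marginal likelihood, and the threshold value is an assumption:

```python
import numpy as np

def proxy_smoothness(points):
    """Illustrative stand-in for the GP path smoothness index:
    smoother paths (smaller consecutive jumps) score higher."""
    pts = np.asarray(points, dtype=float)
    return -float(np.sum(np.diff(pts, axis=0) ** 2))

def filter_unmatched(points, threshold):
    """Claim 8 sketch: flag images whose removal improves the smoothness
    index by more than `threshold` relative to the full sequence."""
    base = proxy_smoothness(points)
    flagged = []
    for i in range(len(points)):
        sub = np.delete(np.asarray(points, dtype=float), i, axis=0)
        if proxy_smoothness(sub) - base > threshold:
            flagged.append(i)
    return flagged
```

Removing a point that lies far off the trajectory raises the score sharply, while removing an on-path point changes it little, so only genuine outliers exceed the threshold.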
9. The method according to any one of claims 1 to 7, wherein the determining a spatio-temporal point sequence according to the spatio-temporal points corresponding to the plurality of images comprises:
determining a sub-sequence of spatio-temporal points for each of the plurality of images, wherein the sub-sequence corresponding to any image is obtained by arranging the spatio-temporal points corresponding to the other images among the plurality of images in order of acquisition time;
the performing Gaussian process modeling on the spatio-temporal point sequence to determine a path smoothness index comprises:
performing Gaussian process modeling on the sub-sequence corresponding to each of the plurality of images to determine the path smoothness indexes corresponding to the plurality of images;
and the filtering out, according to the path smoothness indexes, the images that do not match the target object comprises:
determining, according to the path smoothness indexes corresponding to the plurality of images, an image whose path smoothness index satisfies a set condition to be an image that does not match the target object.
10. An image detection apparatus, comprising:
an image pre-selection module, configured to determine, from the images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with respect to a target object;
a smoothing module, configured to determine a spatio-temporal point sequence according to the spatio-temporal points corresponding to the plurality of images, and to perform Gaussian process modeling on the spatio-temporal point sequence to determine a path smoothness index corresponding to the plurality of images, wherein the spatio-temporal point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the spatio-temporal points in the sequence are arranged in order of acquisition time;
and an image filtering module, configured to filter out, according to the path smoothness index, the images among the plurality of images that do not match the target object.
11. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the image detection method of any one of claims 1 to 9.
12. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the image detection method of any one of claims 1 to 9.
13. An image detection method, comprising:
in response to a request to invoke a target service, determining the processing resource corresponding to the target service;
and performing the following steps using the processing resource corresponding to the target service:
determining, from the images acquired by a plurality of cameras, a plurality of images whose visual features satisfy a similarity requirement with respect to a target object;
determining a spatio-temporal point sequence according to the spatio-temporal points corresponding to the plurality of images, wherein the spatio-temporal point corresponding to any image consists of the acquisition time of that image and the user position of the target object at that acquisition time, and the spatio-temporal points in the sequence are arranged in order of acquisition time;
performing Gaussian process modeling on the spatio-temporal point sequence to determine a path smoothness index corresponding to the plurality of images;
and filtering out, according to the path smoothness index, the images among the plurality of images that do not match the target object.
CN202011157305.5A 2020-10-26 2020-10-26 Image detection method, device, equipment and storage medium Pending CN113516690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011157305.5A CN113516690A (en) 2020-10-26 2020-10-26 Image detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011157305.5A CN113516690A (en) 2020-10-26 2020-10-26 Image detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113516690A 2021-10-19

Family

ID=78060178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011157305.5A Pending CN113516690A (en) 2020-10-26 2020-10-26 Image detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113516690A (en)

Similar Documents

Publication Publication Date Title
CN111104867B (en) Recognition model training and vehicle re-recognition method and device based on part segmentation
JP6893564B2 (en) Target identification methods, devices, storage media and electronics
CN109544592B (en) Moving object detection algorithm for camera movement
Tang et al. Cross-camera knowledge transfer for multiview people counting
CN111402294A (en) Target tracking method, target tracking device, computer-readable storage medium and computer equipment
CN106650965B (en) Remote video processing method and device
US20130335571A1 (en) Vision based target tracking for constrained environments
JP2012059224A (en) Moving object tracking system and moving object tracking method
CN106780567B (en) Immune particle filter extension target tracking method fusing color histogram and gradient histogram
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN115546705B (en) Target identification method, terminal device and storage medium
CN111784658B (en) Quality analysis method and system for face image
CN114898416A (en) Face recognition method and device, electronic equipment and readable storage medium
CN115345905A (en) Target object tracking method, device, terminal and storage medium
CN112651321A (en) File processing method and device and server
CN116958584A (en) Key point detection method, regression model training method and device and electronic equipment
CN105491370B (en) Video saliency detection method based on graph collaborative low-high-level features
CN106934339B (en) Target tracking and tracking target identification feature extraction method and device
JP6798609B2 (en) Video analysis device, video analysis method and program
CN110930436B (en) Target tracking method and device
CN112949539A (en) Pedestrian re-identification interactive retrieval method and system based on camera position
CN112184767A (en) Method, device, equipment and storage medium for tracking moving object track
WO2020217368A1 (en) Information processing device, information processing method, and information processing program
CN111860559A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113516690A (en) Image detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination