CN112149640B - Method, device, computer equipment and medium for determining position of target object

Info

Publication number
CN112149640B
Authority
CN
China
Prior art keywords
target
frame
virtual position
target image
image
Legal status
Active
Application number
CN202011150444.5A
Other languages
Chinese (zh)
Other versions
CN112149640A (en)
Inventor
杨宏达
李国镇
卢美奇
李友增
戚龙雨
吴若溪
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202011150444.5A
Publication of CN112149640A
Application granted
Publication of CN112149640B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method, a device, computer equipment and a medium for determining the position of a target object. The method comprises the following steps: for a first frame of target image in each group of target images, determining a first candidate virtual position of a target object in the first frame of target image according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position; for a second frame of target image in each group of target images, determining a second candidate virtual position of the target object in the second frame of target image according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame of target image according to the second candidate virtual position and the second frame of target image; and determining a second actual position of the target object according to the second virtual position.

Description

Method, device, computer equipment and medium for determining position of target object
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a computer device, and a medium for determining a position of a target object.
Background
With the rapid development of the social economy, people's living standards keep improving. Vehicles have become the most convenient means of travel, and automatic driving is gradually entering people's lives. People also pay increasing attention to personal safety, so intelligent security systems that help protect personal safety are likewise being adopted. Both automatic driving and intelligent security require the function of determining the position of a target object.
Methods for determining the position of a target object essentially identify the target object in an acquired image. At the present stage such methods are generally implemented with deep learning models of high computational complexity, which must run on high-performance devices. Edge computing terminals, however, are usually constrained by power consumption and cost: their computing power is low, so computation-intensive deep neural network models cannot be deployed on them directly. To run a deep network model on an edge computing terminal, the model must be compressed, pruned and quantized, but all of these operations degrade the model's performance.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, a computer device and a medium for determining the position of a target object, which can improve the accuracy of position determination on edge computing terminals compared with the prior art.
In a first aspect, an embodiment of the present application provides a method for determining a position of a target object, where the method is applied to a terminal device, and the method includes:
acquiring a plurality of groups of target images in a monitoring video, wherein each group of target images comprises at least two continuous frame target images;
aiming at a first frame of target image in each group of target images, determining a first candidate virtual position of a target object in the first frame of target image according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
for a second frame of target image in each group of target images, determining a second candidate virtual position of the target object in the second frame of target image according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
Optionally, when each group of target images includes at least three consecutive target images, the method further includes:
and taking the third frame target image in each group of target images as a new second frame target image, taking the second virtual position as a new first virtual position, returning to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model until obtaining a new second virtual position corresponding to each frame target image except the first frame image and the second frame image in the group of target images and a new second actual position corresponding to each frame target image in the group of target images.
Optionally, the method further includes:
determining a third candidate virtual position of the target object in a first frame target image in a second group of target images according to a pre-trained first detection model aiming at the first frame target image in the second group of target images; determining a third virtual position of a target object in a first frame target image in a second group of target images according to the existence condition of the target object in a last frame target image in the first group of target images, a third candidate virtual position of the target object in the first frame target image in the second group of target images and a pre-trained second detection model; determining a third actual position of the target object according to the third virtual position;
and aiming at a second frame target image in the second group of target images, taking the third virtual position as a new first virtual position, and returning to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model.
Optionally, determining a third virtual position of the target object in the first frame target image in the second group of target images according to the existence of the target object in the last frame target image in the first group of target images, a third candidate virtual position of the target object in the first frame target image in the second group of target images, and a pre-trained second detection model, includes:
if a target object exists in a last frame of target image in the first group of target images, in the first frame of target image in the second group of target images, determining a fourth target image according to a second candidate virtual position of the target object in the last frame of target image in the first group of target images;
determining a third target image according to a third candidate virtual position of the target object in the first frame target image in the second group of target images;
if the intersection ratio of the areas of the third target image and the fourth target image is smaller than a preset threshold value, determining a third virtual position of a target object in a first frame target image in the second group of target images according to the fourth target image and a pre-trained second detection model;
and if the intersection ratio of the areas of the third target image and the fourth target image is greater than or equal to a preset threshold value, determining a third virtual position of the target object in a first frame of target image in the second group of target images according to the third target image and a pre-trained second detection model.
Optionally, when at least three consecutive groups of target images are included in the plurality of groups of target images, the method further includes:
and taking the third group of target images as a new second group of target images, taking the first frame of target images in the third group of target images as the first frame of target images in the new second group of target images, and returning to the step of determining the third candidate virtual position of the target object in the first frame of target images in the new second group of target images according to the pre-trained first detection model until obtaining a new second actual position corresponding to the last frame of image in the monitoring video.
Optionally, the determining a first candidate virtual position of the target object in the first frame of the target image according to the pre-trained first detection model includes:
determining each candidate virtual position of each object in the first frame target image and the classification probability of each object in each candidate virtual position according to the first frame target image and a first detection model;
and determining a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position.
Optionally, the determining a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position includes:
obtaining a plurality of candidate target objects in the first frame image based on the classification probability of each candidate virtual position;
determining the first candidate virtual position of the target object in the first frame image according to the candidate virtual positions of the candidate target objects in the first frame image.
Optionally, the determining a first virtual position of the target object in the first frame of the target image according to the pre-trained second detection model and the first candidate virtual position includes:
determining a first target image corresponding to the first candidate virtual position from the first frame image;
inputting the first target image into a second detection model trained in advance to obtain a first target virtual position of the target object in the first target image, wherein the first target virtual position is output by the second detection model;
and determining a first virtual position of the target object according to a first target virtual position of the target object in the first target image and a first candidate virtual position of the first target image in the first frame image.
Optionally, the method further includes:
inputting the first target image into the second detection model, if the second detection model does not output the first target virtual position of the target object in the first target image, regrouping the target images in the monitoring video, taking the second frame target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
Optionally, the determining a second candidate virtual position of the target object in the second frame of the target image according to the first virtual position and a pre-trained prediction model includes:
and inputting the first virtual position into a pre-trained prediction model to obtain a second candidate virtual position of the target object in the second frame target image, which is output by the prediction model.
Optionally, the determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image includes:
determining a second target image corresponding to the second candidate virtual position from the second frame image;
inputting the second target image into the second detection model to obtain a second target virtual position of the target object output by the second detection model in the second target image;
and determining a second virtual position of the target object according to a second target virtual position of the target object in the second target image and a second candidate virtual position of the second target image in the second frame image.
Optionally, the method further includes:
inputting the second target image into the second detection model, if the second detection model does not output the second target virtual position of the target object in the second target image, regrouping the target images in the monitoring video, taking the second frame target image corresponding to the second target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
In a second aspect, an embodiment of the present application provides a method for tracking a target object, including:
calculating the actual position of a target object in an obtained monitoring video, wherein the actual position is calculated by the method for determining the position of a target object described above;
and tracking the target object according to the actual position of the target object.
In a third aspect, an embodiment of the present application provides an apparatus for determining a position of a target object, where the apparatus is applied to a terminal device, and the apparatus includes:
an acquisition module, configured to acquire a plurality of groups of target images in a monitoring video, wherein each group of target images comprises at least two continuous frames of target images;
the first actual position determining module is used for determining a first candidate virtual position of a target object in a first frame of target image in each group of target images according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
a second actual position determining module, configured to determine, for a second frame of target images in each group of target images, a second candidate virtual position of the target object in the second frame of target images according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
Optionally, the apparatus further comprises:
and a new second actual position determining module, configured to use a third frame target image in each group of target images as a new second frame target image, use the second virtual position as a new first virtual position, and return to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model until obtaining a new second virtual position corresponding to each frame target image in the group of target images except the first frame image and the second frame image and a new second actual position corresponding to each frame target image in the group of target images.
Optionally, the apparatus further comprises:
a third actual position determining module, configured to determine, according to a pre-trained first detection model, a third candidate virtual position of the target object in a first frame of target images in a second group of target images; determining a third virtual position of a target object in a first frame target image in a second group of target images according to the existence condition of the target object in a last frame target image in the first group of target images, a third candidate virtual position of the target object in the first frame target image in the second group of target images and a pre-trained second detection model; determining a third actual position of the target object according to the third virtual position;
and a first returning module, configured to, for a second frame target image in the second group of target images, take the third virtual position as a new first virtual position, and return to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model.
Optionally, the third actual position determining module includes:
a fourth target image determining unit, configured to determine, in a first frame target image in the second group of target images, a fourth target image according to a second candidate virtual position of the target object in a last frame target image in the first group of target images, if the target object exists in the last frame target image in the first group of target images;
a third target image determining unit, configured to determine, in a first frame target image in the second group of target images, a third target image according to a third candidate virtual position of the target object in the first frame target image in the second group of target images;
a first determining unit, configured to determine, according to the fourth target image and a second detection model trained in advance, a third virtual position of the target object in a first frame target image in the second group of target images if an intersection ratio of areas between the third target image and the fourth target image is smaller than a preset threshold;
and the second determining unit is used for determining a third virtual position of the target object in the first frame target image in the second group of target images according to the third target image and a second detection model trained in advance if the intersection ratio of the areas of the third target image and the fourth target image is greater than or equal to a preset threshold value.
Optionally, the apparatus further comprises:
and the second returning module is used for taking the third group of target images as a new second group of target images, taking the first frame of target images in the third group of target images as the first frame of target images in the new second group of target images, and returning to the step of determining the third candidate virtual position of the target object in the first frame of target images in the new second group of target images according to the pre-trained first detection model until obtaining a new second actual position corresponding to the last frame of image in the monitoring video.
Optionally, the first actual position determining module includes:
the first calculation unit is used for determining each candidate virtual position of each object in the first frame target image and the classification probability of each object at each candidate virtual position according to the first frame target image and a first detection model;
a second determining unit, configured to determine a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position.
Optionally, the second determining unit includes:
a first screening subunit, configured to obtain a plurality of candidate target objects in the first frame image based on the classification probability of each candidate virtual position;
a first determining subunit, configured to determine, according to respective candidate virtual positions of the multiple candidate target objects in the first frame image, the first candidate virtual position of the target object in the first frame image.
Optionally, the first actual position determining module includes:
the second calculating unit is used for determining a first target image corresponding to the first candidate virtual position from the first frame image;
the third calculation unit is used for inputting the first target image into a second detection model trained in advance to obtain a first target virtual position of the target object output by the second detection model in the first target image;
a fourth calculating unit, configured to determine the first virtual position of the target object according to the first target virtual position of the target object in the first target image and the first candidate virtual position of the first target image in the first frame image.
Optionally, the apparatus further comprises:
and a third returning module, configured to input the first target image into the second detection model, regroup the target images in the monitoring video if the second detection model does not output the first target virtual position of the target object in the first target image, use the second frame target image as a new first frame target image in the regrouped first group of target images, and return to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
Optionally, the second actual position determining module includes:
and the fifth calculating unit is used for inputting the first virtual position into a pre-trained prediction model to obtain a second candidate virtual position of the target object output by the prediction model in the second frame target image.
Optionally, the second actual position determining module includes:
a sixth calculating unit, configured to determine, from the second frame image, a second target image corresponding to the second candidate virtual position;
a seventh calculating unit, configured to input the second target image into the second detection model, and obtain a second target virtual position of the target object in the second target image, where the second target virtual position is output by the second detection model;
an eighth calculating unit, configured to determine a second virtual position of the target object according to a second target virtual position of the target object in the second target image and a second candidate virtual position of the second target image in the second frame image.
Optionally, the apparatus further comprises:
and a fourth returning module, configured to input the second target image into the second detection model, and if the second detection model does not output the second target virtual position of the target object in the second target image, regroup the target images in the monitoring video, take the second frame target image corresponding to the second target image as a new first frame target image in the regrouped first group of target images, and return to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method.
In a fifth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the method.
According to the method for determining the position of a target object provided by the present application, first, a plurality of groups of target images in a monitoring video are acquired, wherein each group of target images comprises at least two consecutive frames of target images; then, for a first frame of target image in each group of target images, a first candidate virtual position of the target object in the first frame of target image is determined according to a first detection model trained in advance, a first virtual position of the target object in the first frame of target image is determined according to a pre-trained second detection model and the first candidate virtual position, and a first actual position of the target object is determined according to the first virtual position; finally, for a second frame of target image in each group of target images, a second candidate virtual position of the target object in the second frame of target image is determined according to the first virtual position and a pre-trained prediction model, a second virtual position of the target object in the second frame of target image is determined according to the second candidate virtual position and the second frame of target image, and a second actual position of the target object is determined according to the second virtual position.
In some embodiments, the first detection model and the second detection model, rather than a large-scale deep learning model, are used to calculate the position of the target object in the first frame of target image in each group, which reduces the computational density of the whole method and improves its efficiency. Moreover, the second actual position is determined in the second frame image by prediction, without invoking the first detection model again, which reduces the computational complexity of the whole algorithm and improves the efficiency of the method in application. Because of its low computational complexity, the method can be applied not only to devices with high computing power but also to terminal devices with low computing power, which broadens the range of scenarios in which the position of a target object can be determined.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a flowchart illustrating a method for determining a position of a target object according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an architecture of a service system provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram illustrating an apparatus for determining a position of a target object according to an embodiment of the present application;
fig. 4 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is primarily described in the context of determining a tracked object location, it should be understood that this is merely one exemplary embodiment.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
In the prior art, the position of a target object is generally determined by a deep learning model, and the terminal device executing such a model is essentially one with high computing power; however, the cost, volume and power consumption of such a device are correspondingly high, so it is not suitable for the environments in which edge computing terminals are deployed. With the rapid development of the social economy, functions such as automatic driving and intelligent security are increasingly entering people's lives. If these functions are realized through terminal devices with high computing power, their power consumption, volume and cost become a burden; if, for convenience, they are instead deployed on edge computing terminals, the low computing power of those terminals makes the results inaccurate.
Based on the above drawbacks, in order to apply functions such as automatic driving and intelligent security widely in people's lives, these functions need to be implemented on edge computing terminals. The present application therefore provides a method for determining the position of a target object, which can be applied to a terminal device and, as shown in fig. 1, includes:
S101, acquiring a plurality of groups of target images in a monitoring video, wherein each group of target images comprises at least two continuous frames of target images;
S102, for a first frame of target image in each group of target images, determining a first candidate virtual position of a target object in the first frame of target image according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
S103, for a second frame target image in each group of target images, determining a second candidate virtual position of the target object in the second frame target image according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
In the above step S101, the monitoring video refers to a video acquired by the terminal device, that is, a video that can be used to determine the position of the target object. The monitoring video may include multiple frames of images, i.e., at least two frames. Because the monitoring video may include many frames, it can be divided into a plurality of groups of target images, each group including at least two consecutive target images. The terminal device is a terminal with low cost, small volume, low power consumption and low computing power, namely an edge computing terminal; it may be a mobile phone, a vehicle-mounted device, a monitoring camera, or the like.
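As a concrete illustration of this grouping, the following sketch (a hypothetical helper; the group size is a free parameter the patent does not fix) divides the decoded frames of a monitoring video into groups of consecutive frames:

```python
from typing import List, Sequence

def split_into_groups(frames: Sequence, group_size: int) -> List[list]:
    """Divide the decoded frames of a monitoring video into groups of
    consecutive frames; each group must hold at least two frames."""
    assert group_size >= 2, "each group needs at least two consecutive frames"
    groups = [list(frames[i:i + group_size])
              for i in range(0, len(frames), group_size)]
    # A trailing group with a single frame cannot be processed pairwise,
    # so fold it into the previous group.
    if len(groups) > 1 and len(groups[-1]) < 2:
        groups[-2].extend(groups.pop())
    return groups
```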
In step S102, the frame images in each group of target images are consecutive. The first detection model is a deep network model of low computational cost that can run on the edge computing terminal; preferably, the first detection model may be a ShuffleNetV2-SSD model. The first candidate virtual position refers to a rectangular area that covers the target object. The second detection model is used for determining the first virtual position of the target object from the first candidate virtual position, where the first virtual position refers to the rectangular position corresponding to the edge area of the target object. The second detection model is likewise a lightweight deep network model that can run on the terminal device; preferably, it may be an adapted ShuffleNet, MobileNet or ResNet model. The second detection model can accurately determine the edge position of the target object on an image of relatively small resolution. The first actual position refers to the geographic position of the target object in the actual scene, for example, three-dimensional coordinates in the camera coordinate system, longitude and latitude coordinates, and the like.
In specific implementation, for the first frame of target image in each group, a rough position of the target object, i.e., the first candidate virtual position, is determined by the pre-trained first detection model; then an accurate position of the target object in that frame, i.e., the first virtual position, is calculated from the image area corresponding to the first candidate virtual position and the pre-trained second detection model; finally the first actual position of the target object is calculated from the first virtual position. In the present application, the accurate position of the target object in the actual scene is thus computed with two lightweight detection models, so the algorithm does not require a large amount of computation and can be applied to terminal devices. This way of determining the first actual position does not require the terminal device to have high computing performance, which enlarges the application range of the scheme.
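A minimal sketch of this two-stage computation, assuming `coarse_detector` (standing in for the first detection model, e.g. a ShuffleNetV2-SSD) and `refine_detector` (standing in for the second detection model) are callables returning integer boxes as (x, y, w, h); the mapping to the actual geographic position is left abstract, as the patent does not fix it:

```python
import numpy as np

def detect_first_frame(frame: np.ndarray, coarse_detector, refine_detector):
    # Stage 1: the first detection model proposes a rough rectangle
    # (the first candidate virtual position) covering the target object.
    x, y, w, h = coarse_detector(frame)
    crop = frame[y:y + h, x:x + w]          # image area of the candidate
    # Stage 2: the second detection model localises the object's edges
    # inside the small crop (the first target virtual position).
    rx, ry, rw, rh = refine_detector(crop)
    # Map the refined box back to full-frame coordinates: the result is
    # the first virtual position.
    return (x + rx, y + ry, rw, rh)
```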
In step S103, the pre-trained prediction model is used to predict the position of the target object in the next frame of image from its position in the previous frame. In the present application the prediction model consists of four Kalman filters. The first virtual position is a rectangular area and can be represented by the coordinates of the four vertices of that rectangle. The vertex coordinates of the first virtual position are input to the corresponding Kalman filters, which estimate the position of the rectangle in the next frame, i.e., the second candidate virtual position. The second actual position of the target object can then be determined in the second frame of target image using the second candidate virtual position. In the second frame, the position of the target object is predicted only with the prediction model; the first detection model is no longer needed. Because the computational cost of the prediction model is far smaller than that of the first detection model, this saves power when calculating the second actual position.
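The per-coordinate prediction can be sketched with a hand-rolled constant-velocity Kalman filter (a simplifying assumption; the patent states only that four Kalman filters are used, one per coordinate of the rectangle):

```python
import numpy as np

class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter; state = [position, velocity]."""
    def __init__(self, q: float = 1e-2, r: float = 1.0):
        self.x = np.zeros(2)            # state estimate
        self.P = np.eye(2)              # state covariance
        self.F = np.array([[1.0, 1.0],  # position advances by velocity per frame
                           [0.0, 1.0]])
        self.H = np.array([[1.0, 0.0]]) # only the position is observed
        self.Q = q * np.eye(2)          # process noise
        self.R = np.array([[r]])        # measurement noise

    def predict(self) -> float:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return float(self.x[0])

    def update(self, z: float) -> None:
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

# One filter per box coordinate: predict() yields the coordinate in the next
# frame (the second candidate virtual position); update() feeds back the
# refined position once the second detection model has run.
```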
In the three steps provided by the application, the position of the target object in the first frame of target image in each group is calculated with the first detection model and the second detection model instead of a large-scale deep learning model, which reduces the computational load of the whole method and improves its efficiency. Moreover, the second actual position is determined in the second frame image by prediction, without invoking the first detection model again, which further reduces the computational load of the algorithm and improves its efficiency in application. Because of its low computational requirements, the method can be applied not only to devices with high computing power but also to terminal devices with low computing power, which expands the application range of determining the position of a target object.
Each group of target images includes at least two frames of target images and may include more than two. When a group of target images includes at least three consecutive frames of target images, the method provided by the present application further includes:
S104, taking the third frame target image in each group of target images as a new second frame target image, taking the second virtual position as a new first virtual position, and returning to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model, until a new second virtual position corresponding to each frame target image in the group other than the first frame image and the second frame image, and a new second actual position corresponding to each frame target image in the group, are obtained.
In step S104, the third frame target image in the group may be directly taken as the new second frame target image, the second virtual position in the previous second frame target image is taken as the new first virtual position, and step S103 of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model is performed again, so that the new second actual position corresponding to the target object in the third frame target image can be calculated. Of course, if the group also includes a fourth frame image, a fifth frame image, and so on, the corresponding new second actual positions may be calculated in the same manner as for the third frame; that is, through step S104 the new second actual positions corresponding to all frame images in the group other than the first and second frame target images can be calculated.
In a group of target images, the first detection model is used only for the first frame target image and is not needed for any other frame, so the cost of running the first detection model is reduced and the calculation efficiency is improved.
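Combining the sketches above, a hypothetical per-group loop would look as follows; `to_actual_position` is a placeholder for the virtual-to-actual coordinate mapping, and `filters` holds one `ConstantVelocityKF` per box coordinate:

```python
def to_actual_position(box):
    # Placeholder: the patent maps the virtual (image) position to a
    # geographic position, e.g. via camera calibration; details are open.
    return box

def process_group(frames, coarse_detector, refine_detector, filters):
    # First frame: full two-stage detection (steps S101 and S102).
    virtual_pos = detect_first_frame(frames[0], coarse_detector, refine_detector)
    positions = [to_actual_position(virtual_pos)]
    for kf, coord in zip(filters, virtual_pos):
        kf.x[0] = coord                       # seed each filter with the detection
    # Later frames: Kalman prediction plus refinement only (steps S103/S104);
    # the more expensive first detection model is never invoked again.
    for frame in frames[1:]:
        x, y, w, h = (int(kf.predict()) for kf in filters)
        crop = frame[y:y + h, x:x + w]        # second candidate virtual position
        rx, ry, rw, rh = refine_detector(crop)
        virtual_pos = (x + rx, y + ry, rw, rh)
        for kf, coord in zip(filters, virtual_pos):
            kf.update(coord)                  # feed the refined box back in
        positions.append(to_actual_position(virtual_pos))
    return positions
```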
Because the surveillance video may contain multiple groups of images, and only the first group of target images has no preceding frame (and hence no detection result from a previous target image), while every other group does have the detection result of the previous frame of target image, the method provided by the present application further includes:
step 105, determining a third candidate virtual position of the target object in a first frame target image in a second group of target images according to a pre-trained first detection model aiming at the first frame target image in the second group of target images; determining a third virtual position of a target object in a first frame target image in a second group of target images according to the existence condition of the target object in a last frame target image in the first group of target images, a third candidate virtual position of the target object in the first frame target image in the second group of target images and a pre-trained second detection model; determining a third actual position of the target object according to the third virtual position;
Step 106, for a second frame target image in the second group of target images, taking the third virtual position as a new first virtual position, and returning to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model, until a new second virtual position corresponding to each frame target image in the group other than the first frame image and the second frame image, and a new second actual position corresponding to each frame target image in the group, are obtained.
In step 105, the third candidate virtual position is the position determined in the first frame target image of the second group by the first detection model; it is a rectangular area that covers the target object.
In specific implementation, the third candidate virtual position is determined in the first frame target image in the second group of target images through the first detection model. Because the first detection model is a model of lower accuracy, the third candidate virtual position it determines may be inaccurate. If there is a detection result of the target object in the last frame target image of the first group (i.e., the target object exists in that frame), this existence condition can be used together with the third candidate virtual position determined by the first detection model: the third virtual position is then determined by the second detection model and the third candidate virtual position, and the third actual position of the target object is in turn determined from the third virtual position.
In step 106, the processing procedure for the second frame target image in the second group of target images is the same as the processing procedure for the second frame target image in the first group of target images, so the method may directly return to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model in step S103, and further calculate a new second actual position corresponding to the target object in the second frame target image in the second group of target images.
For the first frame target image of the second group of target images, there may be a plurality of candidate target objects, but not all of them are necessarily the target object, so the candidates need to be screened. Accordingly, in step 105, determining the third candidate virtual position of the target object in the first frame target image of the second group of target images according to the pre-trained first detection model includes:
step 1051, determining a third candidate virtual position of the target object in the first frame target image in the second set of target images according to the existence condition of the target object in the last frame target image in the first set of target images and a pre-trained first detection model.
Specifically, step 1051 includes:
step 10511, determining a classification probability threshold according to the existence of the target object in the target image of the last frame in the first group of target images;
step 10512, determining candidate virtual positions of objects in the first frame of target images of the second group of target images and classification probabilities of the objects in the candidate virtual positions according to the first frame of target images and the first detection model of the second group of target images;
step 10513, determining a first candidate virtual position of the target object in the first frame of the target image in the second set of target images based on the classification probability of each candidate virtual position and the classification probability threshold.
In step 10511, the classification probability threshold is a threshold for screening candidate target objects: candidates with a high probability of being the target object are retained. Different existence conditions of the target object correspond to different classification probability thresholds; specifically:
if the target object exists in the existence condition of the target object, determining a first classification probability threshold value;
and if the target object does not exist in the existence condition of the target object, determining a second classification probability threshold.
The first classification probability threshold is greater than the second classification probability threshold. For example, the first classification probability threshold is 0.6 and the second classification probability threshold is 0.4.
If the existence condition indicates that the target object was present in the last frame target image, then in order to improve the accuracy of identifying the target object, a higher classification probability threshold is needed to screen the candidates, so the classification probability threshold is set to the first classification probability threshold. If the existence condition indicates that the target object was absent from the last frame target image, then in order to find the target object at all, a lower classification probability threshold is needed so that as many candidates as possible are retained in the current frame; the classification probability threshold is therefore set to the second classification probability threshold.
In step 10512, the first frame of target images in the second group of target images is input into the first detection model, and the first detection model may output the classification probabilities corresponding to all candidate target objects in the first frame of target images in the second group of target images.
In step 10513, the candidate target objects are screened according to the classification probability threshold: candidates whose classification probability exceeds the threshold are retained. Screening in this way greatly reduces the number of first candidate virtual positions input into the second detection model, which in turn improves the efficiency of the subsequent steps.
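Steps 10511 to 10513 might be sketched as follows, taking the 0.6/0.4 values from the example above and assuming the first detection model returns (box, probability) pairs:

```python
def screen_candidates(detections, target_seen_in_last_frame: bool):
    # Step 10511: a stricter threshold when the target object was present in
    # the last frame of the previous group, a looser one when it was not.
    threshold = 0.6 if target_seen_in_last_frame else 0.4
    # Steps 10512 and 10513: keep only the candidate virtual positions whose
    # classification probability exceeds the chosen threshold.
    return [box for box, prob in detections if prob > threshold]
```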
Of course, the screening by the classification probability threshold may yield one first candidate virtual position or several. If there are several, then in order to improve the accuracy of identifying the target object, the first candidate virtual position corresponding to the target object may be determined as follows:
calculating the distance between the central position corresponding to each first candidate virtual position and the central position of the first frame target image in the second group of target images;
and determining a target object in the plurality of candidate target objects according to the distance between the central position corresponding to each first candidate virtual position and the central position of the first frame target image in the second group of target images.
In a specific implementation, the one first candidate virtual position closest to the center position of the first frame target image in the second group of target images may be determined as the first candidate virtual position of the target object, or several (for example, 3) of the closest first candidate virtual positions may be selected as the first candidate virtual positions of the target object.
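A sketch of this centre-distance selection (the frame shape follows the usual height-by-width convention; `keep` reflects that either the single closest or a few closest candidates may be retained):

```python
import math

def pick_by_center_distance(boxes, frame_shape, keep: int = 1):
    """Select the candidate position(s) whose centre lies closest to the
    centre of the frame; keep may be 1 or a small number such as 3."""
    frame_cy, frame_cx = frame_shape[0] / 2.0, frame_shape[1] / 2.0

    def distance(box):
        x, y, w, h = box
        return math.hypot(x + w / 2.0 - frame_cx, y + h / 2.0 - frame_cy)

    return sorted(boxes, key=distance)[:keep]
```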
In step 106, each group of target images has a second frame target image, and possibly a third frame, a fourth frame and so on. The second frame target image of the group is first taken as the new second frame target image, the third virtual position is taken as the new first virtual position, and the method returns to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model. Thereafter, each subsequent frame target image (the third frame and onwards) is in turn taken as the new second frame target image and the same step is repeated, so that the second actual position corresponding to every frame target image can be obtained.
When processing the first frame of target image in the second group of target images, the existence condition of the target object in the last frame target image of the first group also needs to be considered; therefore, step 105 includes:
step 1052, if there is a target object in the last frame target image in the first group of target images, determining a fourth target image in the first frame target image in the second group of target images according to the second candidate virtual position of the target object in the last frame target image in the first group of target images;
step 1053, in the first frame target image in the second group of target images, determining a third target image according to a third candidate virtual position of the target object in the first frame target image in the second group of target images;
step 1054, if the intersection ratio of the areas of the third target image and the fourth target image is smaller than a preset threshold, determining a third virtual position of the target object in the first frame target image in the second group of target images according to the fourth target image and a pre-trained second detection model;
step 1055, if the intersection ratio of the areas of the third target image and the fourth target image is greater than or equal to a preset threshold, determining a third virtual position of the target object in the first frame target image in the second group of target images according to the third target image and a second detection model trained in advance.
In the above step 1052, a corresponding image is cut out from the first frame target image in the second group of target images according to the second candidate virtual position of the target object in the last frame target image in the first group of target images, so as to obtain the fourth target image.
In step 1053, a third target image is cut out from the first frame target image in the second group of target images according to the third candidate virtual position.
In step 1054, the area of the intersection of the image region corresponding to the third target image and the image region corresponding to the fourth target image is taken as the first area, and the area of the union of these two image regions is taken as the second area; the intersection ratio of the areas between the third target image and the fourth target image is the ratio of the first area to the second area.
Because the error of the first detection model is larger and the accuracy of the third candidate virtual position it determines is therefore lower, the prediction result corresponding to the previous frame target image can be used instead in order to improve the accuracy of determining the position of the target object; that prediction is derived from the accurate result of the previous frame target image, so its accuracy is relatively higher. If the intersection ratio is smaller than the preset threshold, the difference between the third target image and the fourth target image is large, so the prediction result corresponding to the previous frame target image is used as the input of the second detection model, and the third virtual position of the target object in the first frame target image in the second group of target images is calculated from the second candidate virtual position of the previous frame target image and the second detection model.
In the above step 1055, if the intersection ratio is greater than or equal to the preset threshold, the difference between the third target image and the fourth target image is small, and the third virtual position of the target object in the first frame target image in the second group of target images may be calculated directly from the third candidate virtual position in the current frame target image and the second detection model.
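Exemplarily, the intersection ratio of step 1054 and the branch of steps 1054/1055 may be computed as follows; the [x1, y1, x2, y2] box layout and the value 0.5 are assumptions, since the scheme only requires "a preset threshold":

def iou(box_a, box_b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)        # first area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                          # second area
    return inter / union if union > 0 else 0.0

IOU_THRESHOLD = 0.5  # assumed value for the preset threshold

def choose_input(third_box, fourth_box):
    # Steps 1054/1055: pick which candidate region is cropped and handed
    # to the second detection model.
    if iou(third_box, fourth_box) < IOU_THRESHOLD:
        return fourth_box   # large difference: trust the prediction
    return third_box        # small difference: trust the current detection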
The monitoring video comprises a plurality of groups of target images; besides the second group there may be a third group, a fourth group, and so on. The third group of target images and each group after it can therefore be processed by the following step:
Step 107, taking the third group of target images as a new second group of target images, taking the first frame of target images in the third group of target images as the first frame of target images in the new second group of target images, and returning to the step of determining the third candidate virtual position of the target object in the first frame of target images in the new second group of target images according to the pre-trained first detection model, until a new second actual position corresponding to the last frame image in the monitoring video is obtained.
Step 107 is equivalent to taking the third group of target images as a new second group of target images and re-executing the actual-position calculation that was performed for the second group: the first frame target image in the third group is taken as the first frame target image in the new second group, and the process returns to the step of determining the third candidate virtual position of the target object in the first frame target image in the new second group according to the pre-trained first detection model. The scheme finishes executing only after the actual position of the target object has been obtained for the last frame target image in the monitoring video.
The first candidate virtual position is also determined by the classification probability, that is, in step S102, the determining the first candidate virtual position of the target object in the first frame of the target image according to the first detection model includes:
step 1021, determining each candidate virtual position of each object in the first frame image and the classification probability of each object at each candidate virtual position according to the first frame target image and a first detection model;
step 1022, determining a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position.
In step 1021, after the first frame target image is input into the first detection model, the first detection model outputs a candidate virtual position corresponding to each candidate target object in the first frame target image, and also outputs a classification probability corresponding to each candidate target object, so that each candidate virtual position also has a corresponding classification probability.
In the above step 1022, the first candidate virtual position is determined according to the classification probability: the candidate virtual position corresponding to the maximum classification probability may be selected, or the candidates may be screened using a classification probability threshold. Because the first frame target image of the first group of target images has no preceding target image, the classification probability threshold in step 1022 may be the second classification probability threshold mentioned above.
If screening by the classification probability threshold yields a plurality of first candidate virtual positions, the distance between the center point of each first candidate virtual position and the center point of the first frame target image may be used to decide among them; that is, the one first candidate virtual position closest to the center point of the first frame target image is determined as the first candidate virtual position of the target object.
The first virtual position is the position of the target object relative to the first frame target image, and is therefore calculated by the following steps:
step 1023, determining a first target image corresponding to the first candidate virtual position from the first frame image;
step 1024, inputting the first target image into a second detection model trained in advance, so as to obtain a first target virtual position of the target object output by the second detection model in the first target image;
step 1025, determining a first virtual position of the target object according to a first target virtual position of the target object in the first target image and a first candidate virtual position of the first target image in the first frame image.
In step 1023, the first candidate virtual position is used to cut out the corresponding image area in the first frame image, so as to obtain the first target image.
In step 1024, the first target image is input into the second detection model trained in advance, and the position of the target object relative to the first target image, that is, the first target virtual position, is obtained.
In step 1025 above, the first target virtual position is the position of the target object relative to the first target image, and the first candidate virtual position is the position of the first target image relative to the first frame target image, so that the position of the target object relative to the first frame target image, that is, the first virtual position, can be calculated by position mapping.
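Exemplarily, the cropping of step 1023 and the position mapping of step 1025 amount to a coordinate shift by the crop's offset in the frame; the following minimal sketch assumes the frame is a NumPy image array and positions are [x1, y1, x2, y2] boxes:

def crop(frame, box):
    # Step 1023: cut out the image region given by a candidate virtual position.
    x1, y1, x2, y2 = [int(v) for v in box]
    return frame[y1:y2, x1:x2]

def map_to_frame(target_pos, candidate_box):
    # Step 1025: the second detection model returns the object's position
    # relative to the crop; shifting it by the crop's offset in the full
    # frame yields the position relative to the frame (the first virtual
    # position).
    ox, oy = candidate_box[0], candidate_box[1]
    x1, y1, x2, y2 = target_pos
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)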
Because the second detection model also screens its output results according to the classification probability in order to improve the accuracy of determining the position of the target object, inputting the first target image into the second detection model may yield no output result. Therefore, the application further includes:
step 108, inputting the first target image into the second detection model, regrouping the target images in the monitoring video if the second detection model does not output the first target virtual position of the target object in the first target image, taking the second frame target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
In step 108, since the second detection model did not detect the position of the target object in the first frame target image, it may be determined that the target object is not present in the first frame target image, so the first frame target image may be discarded and the next target image taken as the first frame target image of the group. In that case, however, the number of frame target images in the group would decrease. To ensure that the number of frame target images in each group remains constant, the target images in the monitoring video are regrouped (one possible regrouping is sketched below), the first frame target image in the regrouped first group of target images is taken as the new first frame target image, and the process returns to the step of determining the first candidate virtual position of the target object in the first frame target image according to the pre-trained first detection model.
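Exemplarily, regrouping so that every group keeps a constant number of frame target images may be as simple as re-partitioning the remaining frames; the group size is a parameter of the scheme, and this sketch is only one possible implementation:

def regroup(frames, group_size):
    # Re-partition the remaining frame target images into consecutive
    # groups of a constant size; a trailing group shorter than group_size
    # would be handled according to the scheme's grouping rules.
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]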
The second candidate virtual position is calculated from the first virtual position in the target image of the first frame and the prediction model, and represents a rough predicted position of the target object in the target image of the second frame. Therefore, the determining a second candidate virtual position of the target object in the second frame of the target image according to the first virtual position and the pre-trained prediction model in step S103 includes:
step 1031, inputting the first virtual position into a pre-trained prediction model, and obtaining a second candidate virtual position of the target object in the second frame target image output by the prediction model.
In step 1031, the first virtual position is a rectangular region defined by four vertex positions. Inputting the first virtual position into the pre-trained prediction model therefore means inputting these four vertex positions; the prediction model comprises four Kalman filters, and each vertex position is input into its corresponding Kalman filter to obtain a prediction result, namely the second candidate virtual position.
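Exemplarily, each of the four Kalman filters may be a constant-velocity filter over one vertex; the motion model and the noise magnitudes below are assumptions, since the scheme only specifies that one Kalman filter per vertex yields the predicted vertex position:

import numpy as np

class VertexKalmanFilter:
    # Constant-velocity Kalman filter for one box vertex.
    # State: [x, y, vx, vy]; we observe the vertex position (x, y).
    def __init__(self, x, y, dt=1.0):
        self.x = np.array([x, y, 0.0, 0.0])
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # measurement model
        self.P = np.eye(4) * 10.0   # state covariance (assumed)
        self.Q = np.eye(4) * 0.01   # process noise (assumed)
        self.R = np.eye(2) * 1.0    # measurement noise (assumed)

    def predict(self):
        # Project the state forward; the first two components are the
        # predicted vertex position for the next frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Correct the state with the measured vertex position z = (x, y).
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Four filters, one per vertex of the first virtual position:
#   filters = [VertexKalmanFilter(x, y) for (x, y) in vertices]
#   second_candidate = [f.predict() for f in filters]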
The second virtual position is the position of the target object relative to the second frame target image, and is therefore calculated by:
step 1032, determining a second target image corresponding to the second candidate virtual position from the second frame image;
step 1033, inputting the second target image into the second detection model, and obtaining a second target virtual position of the target object in the second target image, which is output by the second detection model;
step 1034, determining a second virtual position of the target object according to a second target virtual position of the target object in the second target image and a second candidate virtual position of the second target image in the second frame image.
In step 1032, the corresponding image area is cut out from the second frame image by using the second candidate virtual position, so as to obtain the second target image.
In step 1033, inputting the second target image into the second detection model yields the position of the target object relative to the second target image, i.e. the second target virtual position.
In step 1034, the second target virtual position is the position of the target object relative to the second target image, and the second candidate virtual position is the position of the second target image relative to the second frame target image, so the position of the target object relative to the second frame target image, that is, the second virtual position, can be calculated through the same position mapping as in step 1025.
Because the second detection model also screens its output results according to the classification probability in order to improve the accuracy of determining the position of the target object, inputting the second target image into the second detection model may yield no output result. Therefore, the application further includes:
step 109, inputting the second target image into the second detection model, regrouping the target images in the monitoring video if the second detection model does not output the second target virtual position of the target object in the second target image, taking the second frame target image corresponding to the second target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
In step 109, since the second detection model did not detect the position of the target object in the second frame target image, it may be determined that the target object is not present in the second frame target image, so the second frame target image may be discarded and the next target image taken as the first frame target image of the group. In that case, however, the number of frame target images in the group would decrease. To ensure that the number of frame target images in each group remains constant, the target images in the monitoring video are regrouped, the first frame target image in the regrouped first group of target images is taken as the new first frame target image, and the process returns to the step of determining the first candidate virtual position of the target object in the first frame target image according to the pre-trained first detection model.
Because the second detection model can only process images of a fixed size, a picture must be scaled before being input into the second detection model; the scaling is determined by the input size that the second detection model expects.
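Exemplarily, one common scaling strategy is an aspect-preserving resize followed by padding to the model's fixed input size; the 128x128 size and the padding strategy are assumptions, since the scheme states only that the scaling is determined by the model's input size:

import cv2

def letterbox(image, input_size=(128, 128)):
    # Aspect-preserving resize, then zero-padding to (height, width) =
    # input_size; returns the padded image and the scale factor needed to
    # map the model's output back to the original crop.
    h, w = image.shape[:2]
    th, tw = input_size
    scale = min(th / h, tw / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_LINEAR)
    padded = cv2.copyMakeBorder(resized, 0, th - nh, 0, tw - nw,
                                cv2.BORDER_CONSTANT, value=0)
    return padded, scale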
The target object determining method provided by the present application can be applied to various scenes, and exemplarily, the present application provides a target object tracking method, which includes:
calculating the actual position of a target object in the obtained monitoring video according to the obtained monitoring video;
and tracking the target object according to the actual position of the target object.
The actual position is calculated by the target object determination method provided by the application.
The positioning technology used in the present application may be based on the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the COMPASS Navigation System (COMPASS), the Galileo Positioning System, the Quasi-Zenith Satellite System (QZSS), Wireless Fidelity (WiFi) positioning technology, or the like, or any combination thereof. One or more of the above positioning systems may be used interchangeably in this application.
Fig. 2 is a schematic architecture diagram of a service system 100 according to an embodiment of the present application.
For example, the service system 100 may be an online transportation service platform for transportation services such as taxi, designated driving, express car, carpooling, bus service, driver hire, or shuttle service, or any combination thereof. The service system 100 may include one or more of a server 110 (one of the execution subjects of the methods provided herein), a network 120, a service requester terminal 130, a service provider terminal 140 (for example, a ride-hailing terminal), and a database 150, and the server 110 may include a processor for executing instructions.
In some embodiments, the server 110 may be a single server or a group of servers. The set of servers can be centralized or distributed (e.g., the servers 110 can be a distributed system). In some embodiments, the server 110 may be local or remote to the terminal. For example, the server 110 may access information and/or data stored in the service requester 130, the service provider 140, or the database 150, or any combination thereof, via the network 120. As another example, the server 110 may be directly connected to at least one of the service requester 130, the service provider 140, and the database 150 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform; by way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud (community cloud), a distributed cloud, an inter-cloud, a multi-cloud, and the like, or any combination thereof. In some embodiments, the server 110 may be implemented on an electronic device 1000 having one or more of the components shown in FIG. 4 in the present application.
Network 120 may be used for the exchange of information and/or data. In some embodiments, one or more components (e.g., server 110, service requester 130, service provider 140, and database 150) in service system 100 may send information and/or data to other components. For example, the server 110 may obtain a service request from the service requester 130 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or combination thereof. Merely by way of example, network 120 may include a wired network, a wireless network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, network 120 may include one or more network access points. For example, network 120 may include wired or wireless network access points, such as base stations and/or network switching nodes, through which one or more components of the service system 100 may connect to the network 120 to exchange data and/or information.
In some embodiments, the user of the service requester 130 may be someone other than the actual demander of the service. For example, user A of the service requester 130 may use it to initiate a service request for the actual service demander B (for example, user A may call a car for his or her friend B), or receive service information or instructions from the server 110. Likewise, the user of the service provider 140 may be the actual service provider or someone other than the actual service provider. For example, user C of the service provider 140 may use it to receive a service request to be serviced by service provider D (e.g., user C may accept an order on behalf of service provider D, whom user C employs), and/or information or instructions from the server 110. In some embodiments, "service requester" and "service requester terminal" may be used interchangeably, and "service provider" and "service provider terminal" may be used interchangeably.
In some embodiments, the service requester 130 may include a mobile device, a tablet computer, a laptop computer, or a built-in device in a motor vehicle, etc., or any combination thereof. In some embodiments, the mobile device may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home devices may include smart lighting devices, control devices for smart electrical devices, smart monitoring devices, smart televisions, smart cameras, or walkie-talkies, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, and the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, or a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality eye mask, an augmented reality helmet, augmented reality glasses, an augmented reality eye mask, or the like, or any combination thereof. For example, the virtual reality device and/or augmented reality device may include various virtual reality products and the like. In some embodiments, the built-in devices in the motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the service requester 130 may be a device having positioning technology for locating the position of the service requester and/or the service requester terminal.
In some embodiments, the service provider 140 may be a similar or the same device as the service requester 130. In some embodiments, the service provider 140 may be a device with location technology for locating the location of the service provider and/or the service provider. In some embodiments, the service requester 130 and/or the service provider 140 may communicate with other locating devices to determine the location of the service requester, the service requester 130, the service provider, or the service provider 140, or any combination thereof. In some embodiments, the service requester 130 and/or the service provider 140 may send the location information to the server 110.
Database 150 may store data and/or instructions. In some embodiments, the database 150 may store data obtained from the service requester 130 and/or the service provider 140. In some embodiments, database 150 may store data and/or instructions for the exemplary methods described herein. In some embodiments, database 150 may include mass storage, removable storage, volatile read-write memory, or Read-Only Memory (ROM), among others, or any combination thereof. By way of example, mass storage may include magnetic disks, optical disks, solid state drives, and the like; removable memory may include flash drives, floppy disks, optical disks, memory cards, zip disks, tapes, and the like; volatile read-write memory may include Random Access Memory (RAM); the RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor-Based Random Access Memory (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. By way of example, ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disc ROM (CD-ROM), Digital Versatile Disc ROM (DVD-ROM), and the like. In some embodiments, database 150 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
Based on the same inventive concept, an apparatus corresponding to the method for determining the position of a target object is further provided in the embodiments of the present application. Since the principle by which the apparatus solves the problem is similar to that of the method for determining the position of a target object described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
An embodiment of the present application provides an apparatus for determining a position of a target object, as shown in fig. 3, and is applied to a terminal device, where the apparatus includes:
an obtaining module 301, configured to obtain multiple sets of target images in a monitoring video, where each set of target image includes at least two consecutive frames of target images;
a first actual position determining module 302, configured to determine, according to a first detection model trained in advance, a first candidate virtual position of a target object in a first frame of target image in each group of target images; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
a second actual position determining module 303, configured to determine, for a second frame of target images in each group of target images, a second candidate virtual position of the target object in the second frame of target images according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
Optionally, the apparatus further comprises:
and a new second actual position determining module, configured to use a third frame target image in each group of target images as a new second frame target image, use the second virtual position as a new first virtual position, and return to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model until obtaining a new second virtual position corresponding to each frame target image in the group of target images except the first frame image and the second frame image and a new second actual position corresponding to each frame target image in the group of target images.
Optionally, the apparatus further comprises:
a third actual position determining module, configured to determine, according to a pre-trained first detection model, a third candidate virtual position of the target object in a first frame of target images in a second group of target images; determining a third virtual position of the target object in the first frame target image in the second group of target images according to the existence condition of the target object in the last frame target image in the first group of target images, a third candidate virtual position of the target object in the first frame target image in the second group of target images and a pre-trained second detection model; determining a third actual position of the target object according to the third virtual position;
and a first returning module, configured to, for a second frame target image in the second group of target images, take the third virtual position as a new first virtual position, and return to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model.
Optionally, the third actual position determining module includes:
a fourth target image determining unit, configured to determine, in a first frame target image in the second group of target images, a fourth target image according to a second candidate virtual position of the target object in a last frame target image in the first group of target images, if the target object exists in the last frame target image in the first group of target images;
a third target image determining unit, configured to determine, in a first frame target image in the second group of target images, a third target image according to a third candidate virtual position of the target object in the first frame target image in the second group of target images;
a first determining unit, configured to determine, according to the fourth target image and a second detection model trained in advance, a third virtual position of the target object in a first frame target image in the second group of target images if an intersection ratio of areas between the third target image and the fourth target image is smaller than a preset threshold;
and the second determining unit is used for determining a third virtual position of the target object in the first frame target image in the second group of target images according to the third target image and a second detection model trained in advance if the intersection ratio of the areas of the third target image and the fourth target image is greater than or equal to a preset threshold value.
Optionally, the apparatus further comprises:
and the second returning module is used for taking the third group of target images as a new second group of target images, taking the first frame of target images in the third group of target images as the first frame of target images in the new second group of target images, and returning to the step of determining the third candidate virtual position of the target object in the first frame of target images in the new second group of target images according to the pre-trained first detection model until obtaining a new second actual position corresponding to the last frame of image in the monitoring video.
Optionally, the first actual position determining module includes:
the first calculation unit is used for determining each candidate virtual position of each object in the first frame target image and the classification probability of each object at each candidate virtual position according to the first frame target image and a first detection model;
a second determining unit, configured to determine a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position.
Optionally, the second determining unit includes:
a first screening subunit, configured to obtain a plurality of candidate target objects in the first frame image based on the classification probability of each candidate virtual position;
a first determining subunit, configured to determine, according to respective candidate virtual positions of the multiple candidate target objects in the first frame image, the first candidate virtual position of the target object in the first frame image.
Optionally, the first actual position determining module includes:
the second calculating unit is used for determining a first target image corresponding to the first candidate virtual position from the first frame image;
the third calculation unit is used for inputting the first target image into a second detection model trained in advance to obtain a first target virtual position of the target object output by the second detection model in the first target image;
a fourth calculating unit, configured to determine the first virtual position of the target object according to the first target virtual position of the target object in the first target image and the first candidate virtual position of the first target image in the first frame image.
Optionally, the apparatus further comprises:
and a third returning module, configured to input the first target image into the second detection model, regroup the target images in the monitoring video if the second detection model does not output the first target virtual position of the target object in the first target image, use the second frame target image as a new first frame target image in the regrouped first group of target images, and return to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
Optionally, the second actual position determining module includes:
and the fifth calculating unit is used for inputting the first virtual position into a pre-trained prediction model to obtain a second candidate virtual position of the target object output by the prediction model in the second frame target image.
Optionally, the second actual position determining module includes:
a sixth calculating unit, configured to determine, from the second frame image, a second target image corresponding to the second candidate virtual position;
a seventh calculating unit, configured to input the second target image into the second detection model, and obtain a second target virtual position of the target object in the second target image, where the second target virtual position is output by the second detection model;
an eighth calculating unit, configured to determine a second virtual position of the target object according to a second target virtual position of the target object in the second target image and a second candidate virtual position of the second target image in the second frame image.
Optionally, the apparatus further comprises:
a fourth returning module, configured to input the second target image into the second detection model, and if the second detection model does not output the second target virtual position of the target object in the second target image, regroup the target images in the monitoring video, take the second frame target image corresponding to the second target image as the first frame target image in the regrouped new first group of target images, and return to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
As shown in fig. 4, a schematic view of an electronic device provided in an embodiment of the present application, the electronic device 1000 includes: a processor 1001, a memory 1002 and a bus 1003. The memory 1002 stores execution instructions; when the electronic device runs, the processor 1001 and the memory 1002 communicate through the bus 1003, and the processor 1001 executes the steps of the method for determining the position of the target object stored in the memory 1002.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method for determining the position of a target object is applied to a terminal device, and the method comprises the following steps:
acquiring a plurality of groups of target images in a monitoring video, wherein each group of target images comprises at least two continuous frame target images;
aiming at a first frame of target image in each group of target images, determining a first candidate virtual position of a target object in the first frame of target image according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
for a second frame of target image in each group of target images, determining a second candidate virtual position of the target object in the second frame of target image according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
2. The method of claim 1, wherein when each set of target images includes at least three consecutive target images, the method further comprises:
and taking the third frame target image in each group of target images as a new second frame target image, taking the second virtual position as a new first virtual position, returning to the step of determining a new second candidate virtual position of the target object in the new second frame target image according to the new first virtual position and the prediction model until obtaining a new second virtual position corresponding to each frame target image except the first frame image and the second frame image in the group of target images and a new second actual position corresponding to each frame target image in the group of target images.
3. The method of claim 1, wherein the determining a first candidate virtual position of a target object in a first frame of target image in each group of target images according to a first detection model trained in advance comprises:
determining a third candidate virtual position of a target object in a first frame target image in a second group of target images through a pre-trained first detection model according to the existence condition of the target object in a last frame target image in the last group of target images aiming at the first frame target image in the second group of target images; wherein the third candidate virtual position is the first candidate virtual position determined in consideration of the presence of the target object in the last frame of the target image in the set of target images on the second set of target images.
4. The method of claim 3, wherein determining the first virtual position of the target object in the first frame of the target image according to the pre-trained second detection model and the first candidate virtual position comprises:
for a first frame target image in a second group of target images, if a target object exists in a last frame target image in a last group of target images, in the first frame target image in the second group of target images, intercepting a corresponding image according to a second candidate virtual position of the target object in the last frame target image in the last group of target images to obtain a fourth target image;
in a first frame target image in the second group of target images, intercepting a corresponding image according to a third candidate virtual position of the target object in the first frame target image in the second group of target images to obtain a third target image;
if the intersection ratio of the areas of the third target image and the fourth target image is smaller than a preset threshold value, determining a third virtual position of a target object in a first frame target image in the second group of target images according to the fourth target image and a pre-trained second detection model;
if the intersection ratio of the areas of the third target image and the fourth target image is greater than or equal to a preset threshold value, determining a third virtual position of a target object in a first frame of target image in the second group of target images according to the third target image and a pre-trained second detection model; wherein the third virtual position is a first virtual position determined in consideration of the presence of the target object in the last frame of the target image in the set of target images on the second set of target images.
5. The method according to claim 3, wherein when at least three consecutive groups of target images are included in the plurality of groups of target images, the method further comprises:
and taking the third group of target images as a new second group of target images, taking the first frame of target images in the third group of target images as the first frame of target images in the new second group of target images, returning to the step of determining the third candidate virtual position of the target object in the first frame of target images in the second group of target images through a pre-trained first detection model according to the existence condition of the target object in the last frame of target images in the previous group of target images until obtaining a new second actual position corresponding to the last frame of image in the monitored video.
6. The method of claim 1, wherein determining a first candidate virtual position of a target object in the first frame of target image according to a first pre-trained detection model comprises:
determining each candidate virtual position of each object in the first frame target image and the classification probability of each object in each candidate virtual position according to the first frame target image and a first detection model;
and determining a first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position.
7. The method of claim 6, wherein determining the first candidate virtual position of the target object in the first frame image based on the classification probability of each candidate virtual position comprises:
obtaining a plurality of candidate target objects in the first frame image based on the classification probability of each candidate virtual position;
and determining the first candidate virtual position of the target object in the first frame image according to the distance between the central position corresponding to each candidate virtual position of the candidate target objects in the first frame image and the central position of the first frame image.
8. The method of claim 1, wherein determining the first virtual position of the target object in the first frame of the target image according to the pre-trained second detection model and the first candidate virtual position comprises:
determining a first target image corresponding to the first candidate virtual position from the first frame image;
inputting the first target image into a second detection model trained in advance to obtain a first target virtual position of the target object in the first target image, wherein the first target virtual position is output by the second detection model;
and determining a first virtual position of the target object according to a first target virtual position of the target object in the first target image and a first candidate virtual position of the first target image in the first frame image.
9. The method of claim 8, further comprising:
inputting the first target image into the second detection model, if the second detection model does not output the first target virtual position of the target object in the first target image, regrouping the target images in the monitoring video, taking the second frame target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
10. The method of claim 1, wherein determining a second candidate virtual position of the target object in the second frame of the target image according to the first virtual position and a pre-trained prediction model comprises:
and inputting the first virtual position into a pre-trained prediction model to obtain a second candidate virtual position of the target object in the second frame target image, which is output by the prediction model.
11. The method of claim 1, wherein determining the second virtual position of the target object in the second frame target image based on the second candidate virtual position and the second frame target image comprises:
determining a second target image corresponding to the second candidate virtual position from the second frame image;
inputting the second target image into the second detection model to obtain a second target virtual position of the target object output by the second detection model in the second target image;
and determining a second virtual position of the target object according to a second target virtual position of the target object in the second target image and a second candidate virtual position of the second target image in the second frame image.
12. The method of claim 11, further comprising:
inputting the second target image into the second detection model, if the second detection model does not output the second target virtual position of the target object in the second target image, regrouping the target images in the monitoring video, taking the second frame target image corresponding to the second target image as a new first frame target image in the regrouped first group of target images, and returning to the step of determining the first candidate virtual position of the target object in the new first frame target image according to the pre-trained first detection model.
13. A method for tracking a target object, comprising:
calculating the actual position of a target object in the obtained monitoring video according to the obtained monitoring video; the actual position is calculated by the method according to any of the claims 1-12;
and tracking the target object according to the actual position of the target object.
14. An apparatus for determining a position of a target object, the apparatus being applied to a terminal device, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a plurality of groups of target images in a monitoring video, and each group of target images comprises at least two continuous frames of target images;
the first actual position determining module is used for determining a first candidate virtual position of a target object in a first frame of target image in each group of target images according to a first detection model trained in advance; determining a first virtual position of the target object in the first frame of target image according to a pre-trained second detection model and the first candidate virtual position; determining a first actual position of the target object according to the first virtual position;
a second actual position determining module, configured to determine, for a second frame of target images in each group of target images, a second candidate virtual position of the target object in the second frame of target images according to the first virtual position and a pre-trained prediction model; determining a second virtual position of the target object in the second frame target image according to the second candidate virtual position and the second frame target image; and determining a second actual position of the target object according to the second virtual position.
15. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 12 or 13.
16. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 12 or 13.
CN202011150444.5A 2020-10-23 2020-10-23 Method, device, computer equipment and medium for determining position of target object Active CN112149640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011150444.5A CN112149640B (en) 2020-10-23 2020-10-23 Method, device, computer equipment and medium for determining position of target object

Publications (2)

Publication Number Publication Date
CN112149640A CN112149640A (en) 2020-12-29
CN112149640B true CN112149640B (en) 2022-03-04

Family

ID=73954835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011150444.5A Active CN112149640B (en) 2020-10-23 2020-10-23 Method, device, computer equipment and medium for determining position of target object

Country Status (1)

Country Link
CN (1) CN112149640B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6186834B2 (en) * 2013-04-22 2017-08-30 富士通株式会社 Target tracking device and target tracking program
CN109190635A (en) * 2018-07-25 2019-01-11 北京飞搜科技有限公司 Target tracking method, device and electronic equipment based on classification CNN
CN110738687A (en) * 2019-10-18 2020-01-31 上海眼控科技股份有限公司 Object tracking method, device, equipment and storage medium
CN110796686B (en) * 2019-10-29 2022-08-09 浙江大华技术股份有限公司 Target tracking method and device and storage device
CN111223128A (en) * 2020-01-17 2020-06-02 深圳大学 Target tracking method, device, equipment and storage medium
CN111666919B (en) * 2020-06-24 2023-04-07 腾讯科技(深圳)有限公司 Object identification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112149640A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN108717710B (en) Positioning method, device and system in indoor environment
US9906906B1 (en) Integrated geospatial activity reporting
JP6785768B2 (en) Methods and devices for positioning mobile terminals based on the geomagnetism
US11538239B2 (en) Joint modeling of object population estimation using sensor data and distributed device data
US11676303B2 (en) Method and apparatus for improved location decisions based on surroundings
JP2019179021A (en) Method and apparatus for creating map and positioning moving entity
US9125022B2 (en) Inferring positions with content item matching
WO2020243937A1 (en) Systems and methods for map-matching
WO2013016258A1 (en) Variable density depthmap
CN111479321B (en) Grid construction method and device, electronic equipment and storage medium
CN109655786B (en) Mobile ad hoc network cooperation relative positioning method and device
JPWO2020039937A1 (en) Position coordinate estimation device, position coordinate estimation method and program
CN111415024A (en) Arrival time estimation method and estimation device
CN110751531A (en) Track identification method and device and electronic equipment
CN112422653A (en) Scene information pushing method, system, storage medium and equipment based on location service
US11030456B1 (en) Systems and methods for geo-localization in sensor-deprived or sensor-limited environments
CN116630598B (en) Visual positioning method and device under large scene, electronic equipment and storage medium
CN112149640B (en) Method, device, computer equipment and medium for determining position of target object
Lee et al. Distant object localization with a single image obtained from a smartphone in an urban environment
CN116453371A (en) Method and device for identifying returning of shared vehicle, computer equipment and storage medium
CN113450459A (en) Method and device for constructing three-dimensional model of target object
CN108235764B (en) Information processing method and device, cloud processing equipment and computer program product
CN110796706A (en) Visual positioning method and system
CN110019608A (en) A kind of information collecting method, device and system and storage equipment
CN113009533A (en) Vehicle positioning method and device based on visual SLAM and cloud server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant