CN112819889B - Method and device for determining position information, storage medium and electronic device


Info

Publication number
CN112819889B
CN112819889B (application CN202011627158.3A)
Authority
CN
China
Prior art keywords
frame image
feature
position information
feature points
points
Prior art date
Legal status
Active
Application number
CN202011627158.3A
Other languages
Chinese (zh)
Other versions
CN112819889A (en)
Inventor
马梦园
伍敏
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202011627158.3A
Publication of CN112819889A
Application granted
Publication of CN112819889B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/97 Determining parameters from multiple pictures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The embodiment of the invention provides a method and a device for determining position information, a storage medium and an electronic device. The method comprises the following steps: extracting M feature points from the Nth frame image and detecting first position information of an abnormal object; matching the M feature points with P feature points obtained from the Kth frame image to obtain matched feature point pairs; and determining a conversion matrix between the Nth frame image and the Kth frame image by using the matched feature point pairs, so as to acquire, in the Kth frame image, second position information of the abnormal object detected in the Nth frame image. The invention solves the problem of inaccurate detection of abnormal objects and achieves the effect of accurately detecting them.

Description

Method and device for determining position information, storage medium and electronic device
Technical Field
The embodiment of the invention relates to the field of images, in particular to a method and a device for determining position information, a storage medium and an electronic device.
Background
In public-security settings such as subway and high-speed-rail stations, intelligent contraband detection systems paired with X-ray security inspection machines have gradually become common. Such a system is divided into two main modules: intelligent detection and intelligent display. The intelligent detection module is responsible for detecting prohibited articles in the picture and reporting their categories and current positions. The intelligent display module is responsible for calculating the movement trajectory of the contraband and drawing its detection frame into the security inspection picture in real time. The display module must avoid flicker, jitter, drift and similar artifacts of the article frame, which cause visual interference, visual fatigue and other poor experiences for security inspectors. Jitter from picture movement and the zoom operations performed by security inspectors are the main difficulties faced by the display module.
For the problem of continuously locating an abnormal object in subsequent frames, no effective solution has yet been proposed in the related art.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining position information, a storage medium and an electronic device, which are used for at least solving the problem of continuous positioning of an abnormal object in a subsequent frame in the related technology.
According to an embodiment of the present invention, there is provided a method of determining location information, including: extracting M feature points and first position information of an abnormal object from an N-th frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the N-th frame image; matching the M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1; and determining a conversion matrix between the Nth frame image and the Kth frame image by using the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image.
According to another embodiment of the present invention, there is provided a position information determining apparatus including: a first extraction module, configured to extract M feature points from an Nth frame image and detect first position information of an abnormal object, where M and N are natural numbers greater than or equal to 1, and the first position information includes the position of the abnormal object in the Nth frame image; a first matching module, configured to match the M feature points with P feature points obtained from a Kth frame image to obtain matched feature point pairs, where K and P are natural numbers greater than or equal to 1; a first determining module, configured to determine a conversion matrix between the Nth frame image and the Kth frame image by using the matched feature point pairs, so as to obtain second position information of the abnormal object detected in the Nth frame image in the Kth frame image, where the second position information includes the position of the abnormal object in the Kth frame image.
In an exemplary embodiment, the above apparatus further includes: a second determining module, configured to determine, before extracting M feature points from an nth frame image and detecting first location information of an abnormal object, a location of the abnormal object in the nth frame image, where the obtained nth frame image of a target device includes the abnormal object, to obtain the first location information; and the first storage module is used for storing the first position information into the abnormal object set.
In an exemplary embodiment, the first extraction module includes: a first determining unit, configured to determine a pixel difference between each pixel point and an adjacent pixel point in the nth frame image, to obtain Q pixel differences, where Q is a natural number greater than 1; and a second determining unit, configured to determine, as the M feature points, a pixel point corresponding to a pixel difference greater than a first preset threshold value from the Q pixel differences.
In an exemplary embodiment, the first extraction module includes: a third determining unit, configured to determine, by using a full convolution layer in a convolution network, reliability of each pixel point in the nth frame image and repeatability of each pixel point in the nth frame image; a fourth determining unit configured to determine a plurality of salient points from among the pixel points in the nth frame image using the reliability of each pixel point and the repeatability of each pixel point; and a fifth determining unit configured to determine the M feature points from the plurality of salient points by non-maximum suppression.
In an exemplary embodiment, the above apparatus further includes: and a third determining module, configured to determine a feature descriptor corresponding to each of the M feature points after extracting the M feature points from the nth frame image, where the feature descriptor is used to represent a feature of each of the M feature points.
In an exemplary embodiment, the first matching module includes: a first calculation unit configured to calculate a hamming distance between each of the M feature points and each of the P feature points; a sixth determining unit configured to determine a similarity between each of the M feature points and each of the P feature points based on the hamming distance; and a seventh determining unit configured to determine, as the matched feature point pair, a feature point having the similarity greater than a second preset threshold.
In an exemplary embodiment, the first matching module includes: a second calculation unit configured to calculate a euclidean distance between each of the M feature points and each of the P feature points; and the first traversing unit is used for traversing the Euclidean distance between each characteristic point in the M characteristic points and each characteristic point in the P characteristic points to obtain the matched characteristic point pair.
In an exemplary embodiment, the first determining module includes: a first extraction unit for extracting a sample feature point pair from the matched feature point pair; an eighth determining unit configured to determine an offset and a scaling of the sample feature point pair, where the offset includes a horizontal offset and a vertical offset; a third calculation unit configured to calculate a sample conversion matrix of the sample feature point pair using the offset, the scaling, the first position information, and the second position information; a ninth determining unit configured to determine projection errors between other feature point pairs of the matched feature point pairs and the sample conversion matrix; and a first iteration unit configured to iterate the sample conversion matrix through the projection error to determine a conversion matrix between the nth frame image and the kth frame image.
In an exemplary embodiment, the second determining module includes: a first prediction unit, configured to predict position information of the abnormal object in a current frame image by using the transformation matrix, so as to obtain predicted position information; a first drawing unit configured to draw the predicted position information into the current frame image so as to draw the second position information of the abnormal object in the current frame image.
According to a further embodiment of the invention, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, M characteristic points are extracted from an N-th frame image, and first position information of an abnormal object is detected, wherein M and N are natural numbers which are greater than or equal to 1, and the first position information comprises the position of the abnormal object in the N-th frame image; matching M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1; and determining a conversion matrix between the Nth frame image and the Kth frame image by utilizing the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image. Therefore, the problem of continuous positioning of the abnormal object in the subsequent frame can be solved, and the effect of accurately determining the continuous positioning of the abnormal object in the subsequent frame is achieved.
Drawings
Fig. 1 is a block diagram of a hardware configuration of a mobile terminal according to a method for determining location information according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of determining location information according to an embodiment of the present invention;
FIG. 3 is a flow chart of dangerous goods detection tracking in a zoom scenario according to an embodiment of the present invention;
FIG. 4 is a diagram of an R2D2 network architecture according to an embodiment of the present invention;
FIG. 5 is a feature point matching graph according to an embodiment of the present invention;
FIG. 6 is a schematic drawing of tracking of items in a zoom scene according to an embodiment of the invention;
fig. 7 is a block diagram of a configuration of a position information determining apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to a method for determining location information according to an embodiment of the present application. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a method for determining location information in an embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In this embodiment, a method for determining location information is provided, fig. 2 is a flowchart of a method for determining location information according to an embodiment of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
Step S202, extracting M feature points and first position information of an abnormal object from an Nth frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the Nth frame image;
Step S204, matching M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1;
Step S206, determining a conversion matrix between the Nth frame image and the Kth frame image by using the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image.
The main execution body of the above steps may be a server, but is not limited thereto.
In this embodiment, the abnormal object includes, but is not limited to, contraband, such as drugs, knives, lighters, etc.
In the present embodiment, the extracted feature points may lie within the image region of the abnormal object or in other regions of the image.
The embodiment is applied to a scene of detecting abnormal objects, for example, a scene of detecting contraband by an X-ray security inspection machine in public security places such as subways, high-speed rails and the like.
In the present embodiment, the nth frame image and the kth frame image include, but are not limited to, images of adjacent two frames. The feature points of the abnormal object include features such as shape, size, and the like of the abnormal object. The first position information includes coordinates of the abnormal object in the nth frame image. The second position information includes coordinates of the abnormal object in the K-th frame image. The K-th frame image may be the current frame.
Through the steps, M characteristic points are extracted from the Nth frame image, and first position information of an abnormal object is detected, wherein M and N are natural numbers which are larger than or equal to 1, and the first position information comprises the position of the abnormal object in the Nth frame image; matching M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1; and determining a conversion matrix between the Nth frame image and the Kth frame image by utilizing the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image. Therefore, the problem of continuous positioning of the abnormal object in the subsequent frame can be solved, and the effect of accurately determining the continuous positioning of the abnormal object in the subsequent frame is achieved.
In an exemplary embodiment, before extracting M feature points and the first position information of the abnormal object from the nth frame image, the method further includes:
S1, determining the position of an abnormal object in an N frame image of acquired target equipment under the condition that the N frame image of the target equipment comprises the abnormal object, and obtaining first position information;
And S3, storing the first position information into an abnormal object set.
In the present embodiment, the target device includes, but is not limited to, a device that detects an abnormal object, for example, an X-ray security inspection machine.
In this embodiment, after detecting the abnormal object, coordinates of the abnormal object in the image may be stored in the abnormal object set.
In one exemplary embodiment, extracting M feature points from an nth frame image includes:
s1, determining pixel differences between each pixel point and adjacent pixel points in an N frame image to obtain Q pixel differences, wherein Q is a natural number greater than 1;
S2, determining the pixel points corresponding to the pixel differences larger than the first preset threshold value in the Q pixel differences as M characteristic points.
In this embodiment, the pixel difference between each pixel point and its surrounding pixel points is calculated first, and the number of surrounding pixels whose difference exceeds a certain threshold is counted. A pixel for which the count of large-difference surrounding points exceeds a certain threshold becomes one of the M feature points, and the response value of each feature point is calculated as its saliency value. To eliminate interference from neighborhood pixels, the optimal feature points are obtained by suppressing non-maximum saliency values.
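The patent does not name an implementation for this segment test; as a non-authoritative sketch, OpenCV's FAST detector performs the same check — counting surrounding circle pixels whose difference exceeds a threshold — and applies non-maximum suppression to the response values. The threshold value and file name below are illustrative assumptions.

```python
import cv2

# Minimal sketch (assumes OpenCV; not part of the patent text).
# FAST flags a pixel as a feature point when enough pixels on the surrounding
# circle differ from it by more than `threshold`, then non-maximum suppression
# keeps only the locally strongest saliency (response) values.
img = cv2.imread("frame_n.png", cv2.IMREAD_GRAYSCALE)  # the Nth frame image

fast = cv2.FastFeatureDetector_create(
    threshold=20,            # "first preset threshold" -- assumed value
    nonmaxSuppression=True,  # suppress non-maximum saliency values
)
keypoints = fast.detect(img, None)  # the M feature points
print(f"extracted {len(keypoints)} feature points")
```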
In one exemplary embodiment, extracting M feature points from an nth frame image includes:
S1, determining the reliability of each pixel point in an N frame image and the repeatability of each pixel point in the N frame image through a full convolution layer in a convolution network;
s2, determining a plurality of salient points from the pixel points in the N frame image by utilizing the reliability of each pixel point and the repeatability of each pixel point;
S3, determining M characteristic points from the salient points through non-maximum suppression.
In this embodiment, feature point locations and their descriptors may be extracted using a trained R2D2 convolutional network. The network predicts salient feature points at the pixel level through fully convolutional layers; its output consists of three parts: a reliability map, a repeatability map, and a 128-dimensional descriptor. Salient points are preselected according to the reliability and repeatability, and the optimal feature points are then obtained through non-maximum suppression.
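The patent does not disclose the selection code; the sketch below, under the assumption that the R2D2-style network has already produced per-pixel reliability and repeatability maps plus a 128-dimensional descriptor volume, shows how the preselection and non-maximum suppression described above could be realized. All names and thresholds are illustrative.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def select_keypoints(reliability, repeatability, descriptors,
                     rel_thr=0.7, rep_thr=0.7, nms_size=5):
    """reliability, repeatability: (H, W) maps in [0, 1]; descriptors: (H, W, 128)."""
    score = reliability * repeatability
    # non-maximum suppression: keep pixels that are maximal in their local window
    is_peak = score == maximum_filter(score, size=nms_size)
    mask = is_peak & (reliability > rel_thr) & (repeatability > rep_thr)
    ys, xs = np.nonzero(mask)
    points = np.stack([xs, ys], axis=1)  # (M, 2) feature point coordinates
    return points, descriptors[ys, xs]   # matching (M, 128) descriptors
```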
In an exemplary embodiment, after extracting M feature points from the nth frame image, the method further includes:
s1, determining a feature descriptor corresponding to each of M feature points, wherein the feature descriptor is used for representing the feature of each of the M feature points.
In this embodiment, the feature descriptors may be used to determine the feature similarity between pixel points.
In an exemplary embodiment, matching M feature points with P feature points acquired in a kth frame image to obtain a matched feature point pair includes:
S1, calculating the Hamming distance between each of M feature points and each of P feature points;
s2, determining the similarity between each feature point in the M feature points and each feature point in the P feature points based on the Hamming distance;
and S3, determining the feature points with similarity larger than a second preset threshold value as matched feature point pairs.
In this embodiment, the Hamming distance may be used to measure the similarity of two feature points.
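As an illustrative sketch (the patent does not prescribe a library), OpenCV's brute-force matcher computes exactly this Hamming distance for binary descriptors; the descriptor array names and the distance value standing in for the "second preset threshold" are assumptions.

```python
import cv2

# descriptors_n, descriptors_k: uint8 binary descriptors of the M and P
# feature points (hypothetical variable names). A smaller Hamming distance
# means a higher similarity.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # keep mutual best matches
matches = bf.match(descriptors_n, descriptors_k)
# keep pairs whose similarity clears the threshold, expressed here as a
# maximum allowed bit distance (the value 40 is an assumption)
matched_pairs = [m for m in matches if m.distance < 40]
```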
In an exemplary embodiment, matching M feature points with P feature points acquired in a kth frame image to obtain a matched feature point pair includes:
S1, calculating Euclidean distance between each feature point in M feature points and each feature point in P feature points;
and S2, traversing Euclidean distance between each feature point in the M feature points and each feature point in the P feature points to obtain matched feature point pairs.
In this embodiment, the feature points may be matched using Euclidean distances.
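A minimal NumPy sketch of the exhaustive Euclidean-distance traversal, assuming `desc_n` (M×128) and `desc_k` (P×128) hold the float descriptors; the variable names are illustrative.

```python
import numpy as np

# Pairwise Euclidean distances between all M and P descriptors, shape (M, P).
dists = np.linalg.norm(desc_n[:, None, :] - desc_k[None, :, :], axis=2)
best = dists.argmin(axis=1)            # nearest neighbour in the Kth frame
matched_pairs = list(enumerate(best))  # (index in N, index in K) pairs
```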
In one exemplary embodiment, determining a transition matrix between an nth frame image and a kth frame image using matched pairs of feature points includes:
s1, extracting sample feature point pairs from matched feature point pairs;
S2, determining offset and scaling of sample feature point pairs, wherein the offset comprises horizontal offset and vertical offset;
S3, calculating a sample conversion matrix of the sample characteristic point pairs by using the offset, the scaling, the first position information and the second position information;
S4, determining projection errors between other feature point pairs in the matched feature point pairs and the sample conversion matrix;
S5, iterating the sample conversion matrix through projection errors to determine a conversion matrix between the Nth frame image and the Kth frame image.
In this embodiment, for operations on the security inspection picture, the relevant picture transformations are mainly translation and scaling, and the scaling is generally equal-proportion. Assuming the coordinates of an object (feature point) in frames T and T+1 are (x, y) and (x', y') respectively, the conversion relationship can be expressed by the following formula:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} h_1 & 0 & h_2 \\ 0 & h_1 & h_3 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

where $h_2$ and $h_3$ are the horizontal and vertical offsets respectively, and $h_1$ is the scaling factor. The RANSAC algorithm is used to calculate the conversion matrix: since the equations contain three unknown variables and each pair of matching points yields 2 equations, 2 non-collinear samples are randomly extracted in each iteration and a candidate conversion matrix H is computed from them. The projection errors of the other matching points against H are then calculated, iteration continues to produce the optimal set of matching inliers and the conversion matrix, erroneous matches are removed, and the conversion matrix between the two frames is finally obtained.
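A minimal RANSAC sketch for the three-parameter transform above (x' = h1·x + h2, y' = h1·y + h3), written against the description in this embodiment; the iteration count and error threshold are assumptions, and a degenerate-sample check is omitted for brevity.

```python
import numpy as np

def ransac_scale_translation(src, dst, iters=200, err_thr=3.0):
    """src, dst: (n, 2) arrays of matched points in frames T and T+1."""
    best_h, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = np.random.choice(len(src), 2, replace=False)  # 2 sample pairs
        # each pair gives 2 equations in (h1, h2, h3); solve the 4x3 system
        A, b = [], []
        for (x, y), (xp, yp) in zip(src[idx], dst[idx]):
            A += [[x, 1, 0], [y, 0, 1]]
            b += [xp, yp]
        h1, h2, h3 = np.linalg.lstsq(np.asarray(A, float),
                                     np.asarray(b, float), rcond=None)[0]
        # projection error of every match against this candidate transform
        proj = np.stack([h1 * src[:, 0] + h2, h1 * src[:, 1] + h3], axis=1)
        inliers = np.linalg.norm(proj - dst, axis=1) < err_thr
        if inliers.sum() > best_inliers.sum():
            best_h, best_inliers = (h1, h2, h3), inliers
    return best_h, best_inliers  # transform and the optimal matching inlier set
```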
In one exemplary embodiment, after determining the transition matrix between the nth frame image and the kth frame image using the matched pairs of feature points, the method further includes:
s1, predicting position information of an abnormal object in a current frame image by using a conversion matrix to obtain predicted position information;
and S2, drawing the predicted position information into the current frame image so as to draw second position information of the abnormal object in the current frame image.
In this embodiment, after obtaining the predicted position information, the coordinates of the abnormal object may be updated, and the coordinates may be drawn into the current frame image.
The invention is illustrated below with reference to specific examples:
This embodiment provides an X-ray dangerous-goods target tracking scheme based on image registration, which realizes article tracking in a zoom scene and alleviates problems such as tracking-frame loss, drift, and inconsistent movement in the security inspection display module.
To keep prohibited articles correctly framed in a picture-zooming scene, this embodiment adopts image registration to estimate the picture-motion conversion matrix H, and thereby draws the article frames into the security inspection picture. As shown in fig. 3, the method comprises the following steps:
Step 1: and the intelligent detection module is used for detecting the input image to obtain new contraband. The method comprises the steps of detecting by using a trained X-ray dangerous goods detection model based on a cellular neural network (CNN Cellular Neural Network, CNN for short), wherein an adopted CNN detection algorithm comprises SSD, yolov3 and the like, obtaining n dangerous goods in a T frame picture I T, obtaining a position coordinate S i(xmin,ymin,xmax,ymax (i=1, …, n) of the dangerous goods, and updating the position coordinate S i(xmin,ymin,xmax,ymax to a contraband library of a current frame picture S T.
In view of detection time costs, there are two implementation modes for invoking the detection model on input data in step 1:
Mode 1: the pictures are detected at intervals of a certain frame number, and the detected articles in the T frame and the articles in the S T-1 frame are combined into S T through a Network MANAGEMENT SYSTEM (NMS for short). The number of interval frames is determined by the algorithm time consumption and the speed of the left and right movement of the picture.
Mode 2: and adding a package detection module, monitoring whether a new package appears, intercepting a new package image for contraband detection, and directly updating the detected article to S T.
Step 2: effective feature points in the picture and feature descriptors thereof are detected. The method mainly detects some remarkable points in the picture, such as corner points, local extreme points and the like. In view of the detection speed, there are two specific implementation modes for feature point extraction in step 2: firstly, adopting an object request agent (Object Request Broker, ORB) method to extract characteristic points rapidly; secondly, a deep full convolution neural network is adopted to extract richer feature points and descriptors.
Quick feature point extraction: because ORB essentially extracts FAST corners and does not have scale invariance, this scheme first constructs a scale space by downsampling, detects on the different scale layers, and maps the coordinates back to the original scale. The pixel differences between each pixel point and its surrounding pixels are calculated first, and the number of differences exceeding a certain threshold is counted. A pixel for which the number of large-difference surrounding points exceeds a certain threshold becomes a preselected feature point, and its corresponding response value, i.e., its saliency value, is calculated. To eliminate interference from neighborhood pixels, the optimal feature points are obtained through non-maximum suppression of the saliency values.
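As an illustrative sketch (not the patent's own code), OpenCV's ORB implementation bundles exactly these steps: it builds the down-sampled scale pyramid, runs the FAST test on each level, suppresses non-maximum responses, and maps coordinates back to the original scale. Parameter values are assumptions.

```python
import cv2

img = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(
    nfeatures=500,     # cap on kept feature points (assumed)
    scaleFactor=1.2,   # down-sampling ratio between pyramid levels
    nlevels=8,         # number of scale layers
    fastThreshold=20,  # pixel-difference threshold of the FAST test
)
# keypoints carry original-scale coordinates; descriptors are 256-bit BRIEF
keypoints, descriptors = orb.detectAndCompute(img, None)
```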
After the feature points are obtained, BRIEF binary feature descriptors are extracted on the corresponding layers. The specific implementation is as follows: two points A, B are randomly selected from the neighborhood of the feature point, and the following binary test function is defined, where p(A) represents the sum of pixels in the 5×5 window around point A:

$$\tau(p; A, B) = \begin{cases} 1, & p(A) < p(B) \\ 0, & \text{otherwise} \end{cases}$$

The BRIEF descriptor is:

$$g_n(p, \theta) := \sum_{1 \le i \le n} 2^{i-1}\, \tau(p; A_i, B_i)$$

To improve rotation invariance, the feature descriptor is rotated to the grey-centroid direction; the centroid direction angle $\theta$ is calculated from the image moments $m_{pq} = \sum_{x,y} x^p y^q I(x, y)$ of the feature point's neighbourhood as:

$$\theta = \arctan\!\left(\frac{m_{01}}{m_{10}}\right)$$
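A sketch translating the formulas above directly into code; `patch` is a grayscale neighbourhood around the feature point, sampled points must stay at least 2 pixels from the patch border, and the moment-based angle follows the standard grey-centroid formula (an assumption, since the patent's own expression for θ is not reproduced here).

```python
import numpy as np

def tau(patch, a, b):
    """Binary test: 1 if the 5x5 window sum at A is smaller than at B."""
    def p(pt):
        y, x = pt
        return patch[y - 2:y + 3, x - 2:x + 3].sum()
    return 1 if p(a) < p(b) else 0

def brief_descriptor(patch, point_pairs):
    # g_n(p, theta) := sum over i of 2^(i-1) * tau(p; A_i, B_i)
    return sum(tau(patch, a, b) << (i - 1)
               for i, (a, b) in enumerate(point_pairs, start=1))

def centroid_angle(patch):
    # grey-centroid direction from image moments m_pq = sum x^p y^q I(x, y),
    # with coordinates taken relative to the patch centre
    h, w = patch.shape
    ys, xs = np.mgrid[:h, :w]
    xs, ys = xs - w // 2, ys - h // 2
    m10, m01 = (xs * patch).sum(), (ys * patch).sum()
    return np.arctan2(m01, m10)
```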
Depth feature point extraction: in this embodiment, feature point positions and their descriptors are extracted directly with a trained R2D2 convolutional network. The network predicts salient feature points at the pixel level through fully convolutional connections; its output consists of three parts: a reliability map, a repeatability map, and a 128-dimensional descriptor. Salient points are preselected according to reliability and repeatability, and the optimal feature points are obtained through non-maximum suppression. The network structure is shown schematically in fig. 4.
Step 3: and (3) matching the characteristic points of the front and rear frames obtained in the step (2) to find matched characteristic point pairs. Assuming that feature points a (a i, i e 1, 128) extracted at T frames are taken at t+1 frames to B (B i, i e 1, 128).
When the fast feature point extraction method is used, the Hamming distance is adopted to measure the similarity of the two feature points, with the formula:

$$d(A, B) = \sum_{i} a_i \oplus b_i$$

where $\oplus$ denotes the bitwise exclusive-or of the binary descriptors; a smaller distance means a higher similarity.
When matching the feature points extracted by the depth method, the Euclidean distance is used, with the formula:

$$d(A, B) = \sqrt{\sum_{i=1}^{128} (a_i - b_i)^2}$$
Each feature point of the previous and subsequent frames is traversed to obtain the best match, as shown in fig. 5.
Step 4: and (3) calculating the conversion matrix of the front frame and the rear frame according to the matched point pair set in the step (3). In the operation of the security inspection picture, the related picture conversion mainly includes picture translation and scaling, and the scaling is generally equal-proportion scaling, and assuming that coordinates of an object (feature point) in T and t+1 frames are (x, y and (x ', y'), respectively, the conversion relationship can be expressed by the following formula:
Where h 2、h3 is the horizontal and vertical offset, respectively, and h 1 is the scaling. The RANSAC algorithm is used to calculate the transformation matrix, and since the equation contains three unknown variables, a pair of matching points can yield 2 sets of equations. In the execution process, 2 non-collinear samples are randomly extracted, and a conversion matrix H is obtained through calculation. And calculating projection errors of other matching points and H, continuously iterating to generate an optimal matching inner point set and a conversion matrix, removing the wrong matching, and finally obtaining the conversion matrix between the front frame and the rear frame.
Step 5: updating dangerous goods coordinates and drawing the dangerous goods coordinates into a security check picture. And updating the coordinates of the articles in S T to S T+1 according to the transformation matrix H of the T and T+1 frames obtained in the step 4, and eliminating out-of-range articles. And (3) updating the newly added dangerous goods in the T+1 frame acquired in the step (1) to S T+1, finally acquiring a dangerous goods set of the current frame image, and drawing the dangerous goods set into a picture. The bounding box tracking trajectory for the zoomed scene is plotted as shown in fig. 6 below.
In summary, this embodiment requires no preset region of interest, and tracking is independent of detection performance. The overall conversion matrix is predicted by detecting and matching key feature points of the front and rear frames, so the method adapts to changes of scale. Because the spatial transformation of the whole picture is predicted, the changes of all article frames in the picture are globally consistent and single objects are not tracked separately, which improves the sensory experience.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiment also provides a device for determining position information, which is used for implementing the above embodiment and the preferred implementation, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a block diagram of a location information determining apparatus according to an embodiment of the present invention, as shown in fig. 7, including:
a first extraction module 72, configured to extract M feature points from an Nth frame image and detect first position information of an abnormal object, where M and N are natural numbers greater than or equal to 1, and the first position information includes the position of the abnormal object in the Nth frame image;
A first matching module 74, configured to match the M feature points with P feature points obtained from the kth frame image, to obtain a matched feature point pair, where the K and the P are natural numbers greater than or equal to 1;
A first determining module 76, configured to determine a transformation matrix between the nth frame image and the kth frame image by using the matched pair of feature points, so as to obtain second position information of the abnormal object detected in the nth frame image in the kth frame image, where the second position information includes a position of the abnormal object in the kth frame image.
In an exemplary embodiment, the above apparatus further includes: a second determining module, configured to determine, before extracting M feature points from an nth frame image and detecting first location information of an abnormal object, a location of the abnormal object in the nth frame image, where the obtained nth frame image of a target device includes the abnormal object, to obtain the first location information; and the first storage module is used for storing the first position information into the abnormal object set.
In an exemplary embodiment, the first extraction module includes: a first determining unit, configured to determine a pixel difference between each pixel point and an adjacent pixel point in the nth frame image, to obtain Q pixel differences, where Q is a natural number greater than 1; and a second determining unit, configured to determine, as the M feature points, a pixel point corresponding to a pixel difference greater than a first preset threshold value from the Q pixel differences.
In an exemplary embodiment, the first extraction module includes: a third determining unit, configured to determine, by using a full convolution layer in a convolution network, reliability of each pixel point in the nth frame image and repeatability of each pixel point in the nth frame image; a fourth determining unit configured to determine a plurality of salient points from among the pixel points in the nth frame image using the reliability of each pixel point and the repeatability of each pixel point; and a fifth determining unit configured to determine the M feature points from the plurality of salient points by non-maximum suppression.
In an exemplary embodiment, the above apparatus further includes: and a third determining module, configured to determine a feature descriptor corresponding to each of the M feature points after extracting the M feature points from the nth frame image, where the feature descriptor is used to represent a feature of each of the M feature points.
In an exemplary embodiment, the first matching module includes: a first calculation unit configured to calculate a hamming distance between each of the M feature points and each of the P feature points; a sixth determining unit configured to determine a similarity between each of the M feature points and each of the P feature points based on the hamming distance; and a seventh determining unit configured to determine, as the matched feature point pair, a feature point having the similarity greater than a second preset threshold.
In an exemplary embodiment, the first matching module includes: a second calculation unit configured to calculate a euclidean distance between each of the M feature points and each of the P feature points; and the first traversing unit is used for traversing the Euclidean distance between each characteristic point in the M characteristic points and each characteristic point in the P characteristic points to obtain the matched characteristic point pair.
In an exemplary embodiment, the first determining module includes: a first extraction unit for extracting a sample feature point pair from the matched feature point pair; an eighth determining unit configured to determine an offset and a scaling of the sample feature point pair, where the offset includes a horizontal offset and a vertical offset; a third calculation unit configured to calculate a sample conversion matrix of the sample feature point pair using the offset, the scaling, the first position information, and the second position information; a ninth determining unit configured to determine projection errors between other feature point pairs of the matched feature point pairs and the sample conversion matrix; and a first iteration unit configured to iterate the sample conversion matrix through the projection error to determine a conversion matrix between the nth frame image and the kth frame image.
In an exemplary embodiment, the second determining module includes: a first prediction unit, configured to predict position information of the abnormal object in a current frame image by using the transformation matrix, so as to obtain predicted position information; a first drawing unit configured to draw the predicted position information into the current frame image so as to draw the second position information of the abnormal object in the current frame image.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; or the above modules may be located in different processors in any combination.
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for performing the steps of:
S1, extracting M characteristic points and first position information of an abnormal object from an Nth frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the Nth frame image;
S2, matching M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1;
s3, determining a conversion matrix between the Nth frame image and the Kth frame image by utilizing the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic apparatus may further include a transmission device connected to the processor, and an input/output device connected to the processor.
In an exemplary embodiment, the above-mentioned processor may be arranged to perform the following steps by means of a computer program:
S1, extracting M characteristic points and first position information of an abnormal object from an Nth frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the Nth frame image;
S2, matching M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein K and P are natural numbers greater than or equal to 1;
s3, determining a conversion matrix between the Nth frame image and the Kth frame image by utilizing the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image.
Specific examples in this embodiment may refer to the examples described in the foregoing embodiments and the exemplary implementation, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for determining location information, comprising:
Extracting M feature points and first position information of an abnormal object from an N-th frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the N-th frame image;
Matching the M characteristic points with P characteristic points obtained from a K frame image to obtain matched characteristic point pairs, wherein the K and the P are natural numbers which are greater than or equal to 1;
Determining a conversion matrix between the Nth frame image and the Kth frame image by utilizing the matched characteristic point pairs so as to acquire second position information of the abnormal object detected in the Nth frame image in the Kth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image;
The extracting M feature points from the nth frame image includes: determining pixel differences between each pixel point and adjacent pixel points in the N-th frame image to obtain Q pixel differences, wherein Q is a natural number greater than 1; determining pixel points corresponding to pixel differences larger than a first preset threshold value in the Q pixel differences as the M characteristic points; determining the reliability of each pixel point in the N frame image and the repeatability of each pixel point in the N frame image through a full convolution layer in a convolution network; determining a plurality of salient points from the pixel points in the N frame image by utilizing the reliability of each pixel point and the repeatability of each pixel point; determining the M feature points from the plurality of salient points by non-maximum suppression; after the M feature points are extracted from the nth frame image, the method further includes: and determining a feature descriptor corresponding to each of the M feature points, wherein the feature descriptor is used for representing the feature of each of the M feature points.
2. The method according to claim 1, wherein before extracting M feature points from the nth frame image and detecting the first position information of the abnormal object, the method further comprises:
Determining the position of the abnormal object in the N frame image under the condition that the obtained N frame image of the target equipment comprises the abnormal object, and obtaining the first position information;
And storing the first position information into an abnormal object set.
3. The method according to claim 1, wherein matching the M feature points with P feature points acquired in the kth frame image to obtain a matched feature point pair includes:
Calculating the Hamming distance between each of the M feature points and each of the P feature points;
Determining the similarity between each of the M feature points and each of the P feature points based on the Hamming distance;
And determining the feature points with the similarity larger than a second preset threshold value as the matched feature point pairs.
4. The method according to claim 1, wherein matching the M feature points with P feature points acquired in the kth frame image to obtain a matched feature point pair includes:
calculating Euclidean distance between each of the M feature points and each of the P feature points;
traversing the Euclidean distance between each feature point in the M feature points and each feature point in the P feature points to obtain the matched feature point pair.
5. The method of claim 1, wherein determining a transition matrix between the nth frame image and the kth frame image using the matched pairs of feature points comprises:
Extracting sample feature point pairs from the matched feature point pairs;
Determining an offset and a scaling of the sample feature point pairs, wherein the offset comprises a horizontal offset and a vertical offset;
Calculating a sample transformation matrix of the sample feature point pairs using the offset, the scaling, the first position information, and the second position information;
determining projection errors between other characteristic point pairs in the matched characteristic point pairs and the sample conversion matrix;
Iterating the sample transformation matrix through the projection errors to determine a transformation matrix between the nth frame image and the kth frame image.
6. The method of claim 1, wherein after determining a transition matrix between the nth frame image and the kth frame image using the matched pairs of feature points, the method further comprises:
predicting the position information of the abnormal object in the current frame image by using the conversion matrix to obtain predicted position information;
And drawing the predicted position information into the current frame image to draw the second position information of the abnormal object in the current frame image.
7. A position information determining apparatus, comprising:
a first extraction module, configured to extract M feature points and first position information of a detected abnormal object from an Nth frame image, wherein M and N are natural numbers greater than or equal to 1, and the first position information comprises the position of the abnormal object in the Nth frame image;
a first matching module, configured to match the M feature points with P feature points acquired from a Kth frame image to obtain matched feature point pairs, wherein K and P are natural numbers greater than or equal to 1;
a first determining module, configured to determine a transformation matrix between the Nth frame image and the Kth frame image using the matched feature point pairs, so as to obtain second position information, in the Kth frame image, of the abnormal object detected in the Nth frame image, wherein the second position information comprises the position of the abnormal object in the Kth frame image;
wherein the first extraction module comprises:
a first determining unit, configured to determine the pixel difference between each pixel point in the Nth frame image and its adjacent pixel points, obtaining Q pixel differences, wherein Q is a natural number greater than 1;
a second determining unit, configured to determine, as the M feature points, the pixel points whose pixel differences among the Q pixel differences are greater than a first preset threshold;
a third determining unit, configured to determine, using a fully convolutional layer in a convolutional network, the reliability and the repeatability of each pixel point in the Nth frame image;
a fourth determining unit, configured to determine a plurality of salient points from the pixel points in the Nth frame image using the reliability and the repeatability of each pixel point;
a fifth determining unit, configured to determine the M feature points from the plurality of salient points by non-maximum suppression;
and the apparatus further comprises:
a third determining module, configured to determine, after the M feature points are extracted from the Nth frame image, a feature descriptor corresponding to each of the M feature points, wherein the feature descriptor represents the feature of the corresponding feature point.
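To make the extraction units concrete, here is a simplified sketch of the pixel-difference test followed by non-maximum suppression; the thresholds and the right/bottom neighbourhood are assumptions, and the learned reliability/repeatability branch of the claim is deliberately omitted:

import numpy as np

def detect_feature_points(gray, diff_thresh=25, nms_radius=4):
    # gray: (H, W) uint8 image
    g = gray.astype(np.int16)
    diff = np.zeros_like(g)
    # pixel difference against the right and bottom neighbours (the Q pixel differences)
    diff[:, :-1] = np.maximum(diff[:, :-1], np.abs(g[:, :-1] - g[:, 1:]))
    diff[:-1, :] = np.maximum(diff[:-1, :], np.abs(g[:-1, :] - g[1:, :]))
    score = np.where(diff > diff_thresh, diff, 0)  # the "first preset threshold"
    # non-maximum suppression: keep a point only if it dominates its neighbourhood
    points = []
    for y, x in zip(*np.nonzero(score)):
        patch = score[max(0, y - nms_radius):y + nms_radius + 1,
                      max(0, x - nms_radius):x + nms_radius + 1]
        if score[y, x] == patch.max():
            points.append((int(x), int(y)))
    return points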
8. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, wherein the computer program is arranged to perform the method of any one of claims 1 to 6 when run.
9. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is arranged to run the computer program to perform the method of any one of claims 1 to 6.
CN202011627158.3A 2020-12-30 2020-12-30 Method and device for determining position information, storage medium and electronic device Active CN112819889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011627158.3A CN112819889B (en) 2020-12-30 2020-12-30 Method and device for determining position information, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011627158.3A CN112819889B (en) 2020-12-30 2020-12-30 Method and device for determining position information, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112819889A CN112819889A (en) 2021-05-18
CN112819889B (en) 2024-05-10

Family

ID=75854869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011627158.3A Active CN112819889B (en) 2020-12-30 2020-12-30 Method and device for determining position information, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112819889B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115239815B (en) * 2021-06-23 2023-10-27 上海仙途智能科技有限公司 Camera calibration method and device
CN114995738B (en) * 2022-05-31 2023-06-16 重庆长安汽车股份有限公司 Transformation method, transformation device, electronic equipment, storage medium and program product


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106296719A (en) * 2016-11-01 2017-01-04 Information Research Institute of Shandong Academy of Sciences Intelligent security-check instrument and security inspection method based on a local-invariant-feature fusion algorithm
CN107909093A (en) * 2017-10-27 2018-04-13 Zhejiang Dahua Technology Co., Ltd. Method and apparatus for article detection
CN108109175A (en) * 2017-12-20 2018-06-01 Beijing Sohu New Media Information Technology Co., Ltd. Method and device for tracking image feature points
CN110264509A (en) * 2018-04-27 2019-09-20 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for determining the pose of an image-capturing device, and storage medium therefor
WO2019205852A1 (en) * 2018-04-27 2019-10-31 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for determining pose of image capture device, and storage medium therefor
CN110728710A (en) * 2018-07-16 2020-01-24 Ricoh Co., Ltd. Visual odometry method, device, and computer-readable storage medium
CN109902725A (en) * 2019-01-31 2019-06-18 Beijing Dajia Internet Information Technology Co., Ltd. Moving object detection method and device, electronic equipment, and storage medium
CN110047142A (en) * 2019-03-19 2019-07-23 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Unmanned aerial vehicle three-dimensional map construction method and device, computer equipment, and storage medium
CN110163894A (en) * 2019-05-14 2019-08-23 Institute of Semiconductors, Chinese Academy of Sciences Sub-pixel target tracking method based on feature matching
WO2020248248A1 (en) * 2019-06-14 2020-12-17 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for object tracking
CN110717932A (en) * 2019-09-21 2020-01-21 Nanjing Xinhe Huitong Electronic Technology Co., Ltd. Method for detecting the state of a scissor-type knife switch via real-time tracking
CN111563896A (en) * 2020-07-20 2020-08-21 Chengdu Zhonggui Track Equipment Co., Ltd. Image processing method for catenary anomaly detection

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ORB-SLAM: a versatile and accurate monocular SLAM system; Mur-Artal R., Montiel J. M. M., Tardós J. D.; IEEE Transactions on Robotics; 2015-12-31; full text *
Electronic image stabilization algorithm based on Harris corners and improved Hu moments; Wu Guonan, Zhou Chaochao, Yin Wenbo; Computer Engineering (No. 03); full text *
Real-time interactive medical image processing based on OpenGL; He Xujia, Yang Rongqian, Huang Yizhou, Wu Xiaoming; Computer Applications and Software (No. 04); full text *
Image feature extraction and matching based on combined visual and inertial information; Sun Xincheng, Liu Shenglan, Zhao Xuedong; Machine Design and Manufacturing Engineering; 2020-09-30 (No. 09); full text *

Also Published As

Publication number Publication date
CN112819889A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN109886997B (en) Identification frame determining method and device based on target detection and terminal equipment
CN108470332B (en) Multi-target tracking method and device
CN108960211B (en) Multi-target human body posture detection method and system
CN109035304B (en) Target tracking method, medium, computing device and apparatus
JP2022534337A (en) Video target tracking method and apparatus, computer apparatus, program
CN110796051B (en) Real-time access behavior detection method and system based on container scene
CN110781756A (en) Urban road extraction method and device based on remote sensing image
Lu A multiscale spatio-temporal background model for motion detection
CN112819889B (en) Method and device for determining position information, storage medium and electronic device
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN111047626A (en) Target tracking method and device, electronic equipment and storage medium
US20150104067A1 (en) Method and apparatus for tracking object, and method for selecting tracking feature
CN113192646A (en) Target detection model construction method and different target distance monitoring method and device
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN111899279A (en) Method and device for detecting motion speed of target object
CN114155278A (en) Target tracking and related model training method, related device, equipment and medium
CN113505643A (en) Violation target detection method and related device
CN111507999B (en) Target tracking method and device based on FDSST algorithm
CN110633630B (en) Behavior identification method and device and terminal equipment
CN111985266B (en) Scale map determining method, device, equipment and storage medium
CN111104965A (en) Vehicle target identification method and device
CN111310595A (en) Method and apparatus for generating information
CN113723380B (en) Face recognition method, device, equipment and storage medium based on radar technology
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant