CN113902905A - Image processing method and device and electronic equipment


Info

Publication number: CN113902905A
Application number: CN202111183783.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 安容巧
Assignee (current and original): Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Pending
Prior art keywords: image, determining, vector, target, movement distance
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Abstract

The disclosure provides an image processing method, an image processing apparatus, and an electronic device, and relates to artificial intelligence technologies such as computer vision and deep learning. The specific implementation scheme is as follows: when the front end guides the user to shoot an image, a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image shot before the first image was captured are acquired; the two feature vectors are matched to obtain a projection matrix of the second image relative to the first image; and the target mark frame is then determined on the basis of the projection matrix in combination with the first image and a third image captured before the first image. In this way, the user can be accurately guided through the target mark frame to shoot the captured image, and severe jitter of the mark frame during front-end guidance is reduced.

Description

Image processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, in particular to artificial intelligence technologies such as computer vision and deep learning, and more specifically to an image processing method and apparatus, and an electronic device.
Background
In offline fast-moving consumer goods (FMCG) retail, it is generally necessary to photograph long shelves and analyze shelf data based on the captured images.
Since the shooting terminal can only capture part of a long shelf, multiple frames of images must be shot and stitched in order to restore the whole shelf; guiding the user in shooting these images is therefore essential for accurately stitching two adjacent frames.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus and electronic device, which can accurately guide a user to shoot the captured image through a target mark frame and reduce severe jitter of the mark frame during front-end guidance.
According to a first aspect of the present disclosure, there is provided an image processing method, which may include:
acquiring a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image shot before the first image is captured;
matching the first feature vector with the second feature vector to obtain a projection matrix of the second image relative to the first image;
determining a target mark frame corresponding to an overlapping area of the first image relative to the second image according to the first image, a third image captured before the first image and the projection matrix; wherein the first image and the third image are two adjacent frames of images.
According to a second aspect of the present disclosure, there is provided an image processing apparatus, which may include:
the image processing apparatus includes an acquisition unit configured to acquire a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image captured before the first image is captured.
And the matching unit is used for matching the first feature vector with the second feature vector to obtain a projection matrix of the second image relative to the first image.
A processing unit, configured to determine, according to the first image, a third image captured before the first image, and the projection matrix, a target mark frame corresponding to an overlapping area of the first image with respect to the second image; wherein the first image and the third image are two adjacent frames of images.
According to a third aspect of the present disclosure, there is provided an electronic device, which may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to execute the image processing method of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the image processing method of the first aspect.
According to the technical scheme of the disclosure, the user can be accurately guided to shoot the captured image through the target mark frame, and in the front end guiding process, the occurrence of serious jitter of the mark frame is reduced.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of an image processing method provided according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a target mark box provided by an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a method for determining a target mark box corresponding to an overlapping area according to a third embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a method for stitching the second image with a fourth image obtained by performing the shooting operation, according to a fourth embodiment of the present disclosure;
fig. 5 is a schematic diagram of dividing the region corresponding to the target mark frame into an upper half region and a lower half region according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of dividing the region corresponding to the target mark frame into a left half region and a right half region according to an embodiment of the present disclosure;
fig. 7 is a schematic configuration diagram of an image processing apparatus provided according to a fifth embodiment of the present disclosure;
fig. 8 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In embodiments of the present disclosure, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. In the text of the present disclosure, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, in the embodiments of the present disclosure, "first", "second", "third", "fourth", "fifth", and "sixth" are only used to distinguish different objects and have no other special meaning.
The technical solution provided by the embodiments of the disclosure can be applied to scenarios such as image processing and image recognition. In offline fast-moving consumer goods (FMCG) retail, it is generally necessary to photograph long shelves and analyze shelf data based on the captured images. However, the shooting terminal can only capture part of a long shelf; to restore the whole long shelf, two adjacent shot images need to be stitched, which requires stitching technology.
It can be understood that when two adjacent images are stitched by the stitching technique, an overlapping region needs to exist in the two adjacent images, and the size of the overlapping region may affect the stitching of the two adjacent images. In general, if the overlapping area is small, the splicing effect of two adjacent frames of images is poor; if the overlap area is large, the number of acquired images is large, resulting in a large amount of image data processing. Based on this, in order to guide the user to shoot the next frame of image, front end guidance is usually required, that is, the front end marks a mark frame in the captured image to guide the user to shoot the captured image through the mark frame, so that the shot next image can be accurately spliced with the shot previous image subsequently.
The shape of the mark frame may be a regular shape or an irregular shape, and may be specifically set according to actual needs, where, as for the shape of the mark frame, the embodiment of the present disclosure is not limited.
However, in an actual shooting scene, because the long shelf is close to the aisle or the lighting of the environment where the shelf is located is poor, the mark frame may jitter severely during front-end guidance, making it impossible to guide the user to shoot the next frame of image.
In order to accurately guide the user to shoot the captured image and to reduce severe jitter of the mark frame during front-end guidance, the following approach may be considered: the feature vector corresponding to the currently captured image is matched with the feature vector corresponding to an image shot before the current image was captured, yielding a projection matrix of that previously shot image relative to the currently captured image; the mark frame is then determined on the basis of this projection matrix in combination with the currently captured image and an image captured before it. In this way, the target mark frame can accurately guide the user to shoot the captured image, and severe jitter of the mark frame during front-end guidance is reduced.
Based on the above technical concept, embodiments of the present disclosure provide an image processing method, which will be described in detail below by specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Example one
Fig. 1 is a flowchart illustrating an image processing method according to a first embodiment of the present disclosure, which may be performed by software and/or a hardware device, for example, a terminal or a server. For example, referring to fig. 1, the image processing method may include:
s101, acquiring a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image shot before the first image is captured.
Here, a captured image is a frame acquired by the camera during shooting for which the photographing instruction has not yet been executed, such as the first image or the third image referred to in this application; a shot image is an image for which the photographing instruction has been executed on the basis of a captured frame, such as the second image or the fourth image referred to in this application.
After the first feature vector corresponding to the first image and the second feature vector corresponding to the second image are obtained, respectively, the first feature vector and the second feature vector may be matched, that is, the following S102 is executed:
and S102, matching the first characteristic vector with the second characteristic vector to obtain a projection matrix of the second image relative to the first image.
For example, the projection matrix may be a homography matrix, or may be other projection matrices, and may be specifically set according to actual needs, and here, the embodiment of the disclosure is only described by taking the homography matrix as an example, but does not represent that the embodiment of the disclosure is limited thereto.
Taking a homography matrix as an example, an affine best-of-2-nearest matching method may be adopted, for example, to match the first feature vector with the second feature vector and find the best matching points between the second image and the first image.
For example, when determining the best matching points between the second image and the first image, the determination may be based on the ratio between distances of the feature descriptors; if the ratio is greater than a threshold, the pair of matching points with the smallest distance may be determined as the best matching point.
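As an illustration only, this matching step might be sketched with OpenCV roughly as follows; the SURF detector, the Lowe-style ratio test (a match is kept when its nearest distance is sufficiently smaller than the second-nearest), the brute-force matcher, and the RANSAC threshold are assumptions for the sketch rather than requirements of this disclosure, and SURF needs an opencv-contrib build:

```python
import cv2
import numpy as np

def estimate_projection(first_gray, second_gray, ratio_thresh=0.75):
    # SURF keypoints and descriptors for both images (opencv-contrib required)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = surf.detectAndCompute(first_gray, None)
    kp2, des2 = surf.detectAndCompute(second_gray, None)

    # Best-of-2-nearest matching: keep a match when it is clearly better than its runner-up
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des2, des1, k=2)
    good = [m for m, n in knn if m.distance < ratio_thresh * n.distance]

    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)  # second image
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)  # first image

    # Homography mapping the second image into the first image (the "projection matrix")
    H, _inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```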
As described above, in an actual shooting scene the mark frame may jitter severely during front-end guidance, which prevents the user from being guided to shoot the captured image. Therefore, in order to reduce severe jitter of the mark frame so that front-end guidance can accurately guide the user to shoot the captured image, in the embodiment of the disclosure the mark frame is not determined directly from the projection matrix after the projection matrix of the second image relative to the first image is obtained, unlike the prior art; instead, the mark frame is determined on the basis of the projection matrix in combination with the first image and a third image captured before the first image, i.e., the following S103 is performed:
s103, determining a target mark frame corresponding to the overlapping area of the first image relative to the second image according to the first image, a third image captured before the first image and the projection matrix.
For example, as shown in fig. 2, fig. 2 is a schematic diagram of a target mark frame provided by an embodiment of the present disclosure, where a left side diagram in fig. 2 is a second image taken before a first image is captured, a right side diagram in fig. 2 is a captured first image, and a dotted line area in the right side diagram is a target mark frame corresponding to an overlapping area of the first image with respect to the second image.
It can be seen that, in the embodiment of the present disclosure, when the front end guides the user to shoot an image, a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image captured before the first image is captured may be obtained first, and the two feature vectors are matched to obtain a projection matrix of the second image relative to the first image, and then on the basis of the projection matrix, the target mark frame is determined in combination with the first image and a third image captured before the first image, so that the user may be accurately guided to shoot the captured image through the target mark frame, and in the front end guiding process, occurrence of severe jitter of the mark frame is reduced.
Based on the embodiment shown in fig. 1, in order to explain how S101 acquires the first feature vector corresponding to the captured first image and the second feature vector corresponding to the second image shot before the first image is captured, the following takes the acquisition of the first feature vector corresponding to the first image as an example and describes the process in detail; see the second embodiment below.
Example two
For example, when acquiring the first feature vector corresponding to the first image, at least three possible implementations may be included:
in a possible implementation manner, in view that a conventional feature extraction algorithm can better extract gradient information, position information, and detail information in an image, a feature vector corresponding to a first image may be extracted by using the conventional feature extraction algorithm, and the feature vector is used as a first feature vector corresponding to the first image.
In this possible implementation manner, taking the conventional feature extraction algorithm as the speeded-up robust features (SURF) algorithm as an example, the SURF algorithm may be used to extract a feature vector corresponding to the first image, and this feature vector is determined as the first feature vector corresponding to the first image. For example, the resolution of the first image may be 240 × 320, and the extracted feature vector corresponding to the first image may be a 256-dimensional feature vector, so that the extracted first feature vector has strong gradient information and rich position and detail information and can better describe the first image. It should be noted that the embodiment of the present disclosure is only described by taking the SURF algorithm as an example of the conventional feature extraction algorithm, but this does not represent that the embodiment of the present disclosure is limited thereto.
In another possible implementation manner, since the depth feature extraction model can better extract semantic features of the image, a feature vector corresponding to the first image may be extracted by using the depth feature extraction model, and the feature vector is used as the first feature vector corresponding to the first image.
For example, the depth feature extraction model may be based on a lightweight feature extraction network, such as the MobileNetV3_small_x0_35 model, trained with the PaddlePaddle framework; the model is distilled on ImageNet, and the distilled model is used as the pre-trained depth feature extraction model. To improve the learning capability and generalization of the pre-trained depth feature extraction model, it may be further trained with a metric learning method in combination with the current application scene; during training, whether two images are the same is used as the constraint, a contrastive loss is adopted as the training objective, and the features of the images are learned.
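As a rough, framework-agnostic sketch of the contrastive-loss constraint mentioned above (the margin value and the pairwise formulation are illustrative assumptions, not taken from this disclosure):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_image, margin=0.5):
    """emb_a, emb_b: (N, 256) embeddings of image pairs; same_image: (N,) 1 if a pair shows the same image."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)                   # Euclidean distance per pair
    pos = same_image * d ** 2                                   # pull matching pairs together
    neg = (1 - same_image) * np.maximum(margin - d, 0.0) ** 2   # push non-matching pairs apart
    return float(np.mean(pos + neg))
```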
For example, the input of the depth feature extraction model may be an image with a resolution of 224 × 224, and the feature vector corresponding to the extracted first image may be a feature vector with 256 dimensions, so that the extracted first feature vector has a strong semantic feature, for example, when the image processing method provided by the embodiment of the present disclosure is applied to a shooting scene with a long shelf, gradient change information of a repeated texture in the shooting scene may be better extracted through the depth feature extraction model, and the first image may be better described. It should be noted that, the embodiments of the present disclosure are only described by taking a lightweight feature extraction network as an example, but do not represent that the embodiments of the present disclosure are limited thereto.
In yet another possible implementation manner, in order to better extract the gradient, position, and detail information of the image as well as its semantic features, the depth feature extraction model and the conventional feature extraction algorithm may be used together to extract the feature vector corresponding to the first image. When extracting the feature vector, the depth feature extraction model may be adopted to extract a first initial feature vector corresponding to the first image; the conventional feature extraction algorithm is adopted to extract a second initial feature vector corresponding to the first image; and the first initial feature vector and the second initial feature vector are then fused to obtain the first feature vector.
It can be understood that the operations of extracting the first initial feature vector and the second initial feature vector are not in a sequential order, and the extraction of the first initial feature vector may be performed first, and then the extraction of the second initial feature vector may be performed; or the extraction of the second initial feature vector can be executed firstly, and then the extraction of the first initial feature vector can be executed; the extraction of the first initial feature vector and the second initial feature vector may also be performed simultaneously, and may be specifically set according to actual needs.
When the first initial feature vector and the second initial feature vector are fused, for example, a common concatenation (splicing) method may be adopted, and the fused feature vector is determined as the first feature vector corresponding to the first image. The fused first feature vector thus has strong gradient information, rich position and detail information, and good semantic features, and can better describe the first image.
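A minimal sketch of this concatenation-based fusion might look as follows; the L2 normalization of each vector before concatenation is an added assumption for illustration:

```python
import numpy as np

def fuse_features(deep_vec, surf_vec):
    # L2-normalise each 256-dim vector, then concatenate ("splice") them into one descriptor
    deep_vec = deep_vec / (np.linalg.norm(deep_vec) + 1e-12)
    surf_vec = surf_vec / (np.linalg.norm(surf_vec) + 1e-12)
    return np.concatenate([deep_vec, surf_vec])   # used as the first feature vector
```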
It should be noted that, when acquiring the first feature vector corresponding to the first image, the embodiment of the disclosure is only described by taking the above three possible implementation manners as examples, but the embodiment of the disclosure is not limited thereto; other possible implementation manners may also be adopted to extract the first feature vector corresponding to the first image.
It can be understood that, when the second feature vector corresponding to the second image is acquired, the acquisition manner is similar to that of the first feature vector, and reference may be made to the descriptions of the three possible implementation manners above; details are not repeated here. It should be noted that, in the embodiment of the present disclosure, if the second feature vector corresponding to the second image has already been extracted and stored, it may be looked up directly from the pre-stored data without being extracted again through the above three possible implementation manners, which avoids repeated feature extraction, effectively reduces latency, and ensures that front-end guidance does not lag or freeze.
Similarly, after the first feature vector corresponding to the first image is acquired, it may be stored, so that when it is needed later it can be looked up directly from the pre-stored data without being extracted again through the above three possible implementation manners, which likewise avoids repeated feature extraction, effectively reduces latency, and ensures that front-end guidance does not lag or freeze.
Based on the first embodiment or the second embodiment above, in practice the camera may move quickly to a view that has no overlapping area with the second image and then move back to a view that does overlap with the second image. While the camera has moved to a view with no overlapping area with the second image, the photographing instruction for the first image cannot be executed, so there is no need to acquire the first feature vector corresponding to the first image. Therefore, before acquiring the first feature vector corresponding to the first image, a kernelized correlation filter (KCF) method may be applied, for example, to the edge area corresponding to the moving direction of the camera to determine whether a target object with the largest Gaussian response exists in the first image, and the target object is tracked continuously until it disappears from and reappears in the first image. Once the target object is confirmed in the first image, the first image can to a certain extent be regarded as a frame eligible for photographing, and the first feature vector corresponding to the first image is then acquired. This effectively avoids acquiring the first feature vector while the camera has rapidly moved to a view with no overlapping area with the second image, and reduces the data processing caused by invalid operations.
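A rough sketch of this gating idea, assuming OpenCV's KCF tracker (available in opencv-contrib; the factory function moved under cv2.legacy in newer builds) and a hypothetical edge-region choice:

```python
import cv2

def make_kcf_tracker(frame, edge_roi):
    # edge_roi = (x, y, w, h) over the edge area along the camera's moving direction (hypothetical choice)
    factory = getattr(cv2, "legacy", cv2)
    tracker = factory.TrackerKCF_create()
    tracker.init(frame, edge_roi)
    return tracker

def target_present(tracker, frame):
    ok, _box = tracker.update(frame)   # KCF follows the region with the peak Gaussian response
    return ok                          # acquire the first feature vector only when this is True
```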
Based on any of the above embodiments, in order to facilitate understanding of how to determine the target mark frame corresponding to the overlapping area of the first image with respect to the second image according to the first image, the third image captured before the first image, and the projection matrix in the embodiments of the present disclosure, a detailed description will be made by an embodiment three shown in fig. 3 below.
EXAMPLE III
Fig. 3 is a flowchart illustrating a method for determining a target mark frame corresponding to an overlap area according to a third embodiment of the present disclosure, where the image processing method may be executed by software and/or a hardware device, for example, the hardware device may be a terminal or a server. For example, referring to fig. 3, the method for determining the target mark box corresponding to the overlap area may include:
s301, according to the projection matrix, determining an initial mark frame corresponding to the overlapping area of the first image relative to the second image.
The shape of the mark frame may be a regular shape or an irregular shape, and may be specifically set according to actual needs, where, as for the shape of the mark frame, the embodiment of the present disclosure is not limited.
It should be noted that, when determining the initial mark frame corresponding to the overlapping area of the first image with respect to the second image according to the projection matrix, reference may be made to the existing related technology for determining the initial mark frame based on the projection matrix, and here, details of the embodiment of the present disclosure are not repeated.
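One common way to obtain such an initial mark frame, shown here purely as an illustrative sketch (the disclosure itself defers to existing related technology), is to project the corners of the second image into the first image with the projection matrix:

```python
import cv2
import numpy as np

def initial_mark_frame(H, second_shape, first_shape):
    h2, w2 = second_shape[:2]
    corners = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(corners, H)            # second-image corners in first-image coordinates
    h1, w1 = first_shape[:2]
    projected[..., 0] = np.clip(projected[..., 0], 0, w1 - 1)   # clip the frame to the first image
    projected[..., 1] = np.clip(projected[..., 1], 0, h1 - 1)
    return projected.reshape(-1, 2)                             # boundary points of the initial mark frame
```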
After determining the initial mark frame corresponding to the overlapping region of the first image with respect to the second image according to the projection matrix, in the embodiment of the present disclosure, unlike the prior art, instead of directly determining the initial mark frame as the final target mark frame, it is determined whether the projection matrix satisfies the preset condition based on the first image and the third image captured before the first image, and the final target mark frame is determined in combination with the determination result, that is, the following S302 and S303 are performed.
S302, determining whether the projection matrix meets a preset condition based on the third image and the initial mark frame.
For example, when determining whether the projection matrix meets the preset condition based on the third image and the initial mark frame, corner detection may be performed on the third image, and the first vector movement distance may be determined according to the plurality of detected corner points; a second vector movement distance may be determined according to the boundary points in the initial mark frame and the third image; and whether the projection matrix meets the preset condition may then be determined according to the first vector movement distance and the second vector movement distance. In this way, by using the corner detection technique, the jitter of the mark frame can be smoothed to a certain extent when the final target mark frame is subsequently determined in combination with the judgment result, severe jitter of the mark frame is effectively reduced, and the user can be accurately guided to shoot the image through the mark frame.
In the following, detailed description is first given to how to determine the first vector movement distance according to the detected multiple corner points, and how to determine the second vector movement distance according to the boundary points in the initial mark frame and the third image.
For example, when determining the first vector movement distance according to the plurality of detected corner points, optical flow tracking may be performed on the corner points to obtain a plurality of target optical flow points; then, for each target optical flow point, a first key point corresponding to the target optical flow point in the third image is determined, and the vector movement distance of the target optical flow point relative to the first key point is determined. In this way, the vector movement distance of each target optical flow point relative to its corresponding first key point in the third image can be determined, yielding a plurality of vector movement distances, and the first vector movement distance is then determined according to the plurality of vector movement distances.
For example, when optical flow tracking is performed on a plurality of corner points, an existing Lucas-Kanade pyramid optical flow method may be adopted to calculate a motion trajectory of a gray scale image of two adjacent frames, and optical flow tracking is performed on the plurality of corner points based on the motion trajectory; other optical flow methods can be adopted to perform optical flow tracking on a plurality of corner points, and the optical flow tracking can be specifically set according to actual needs.
When optical flow tracking is performed on a plurality of angular points to obtain a plurality of target optical flow points, for example, optical flow tracking may be performed on a plurality of angular points to obtain a plurality of optical flow points; and respectively determining the confidence degrees corresponding to the plurality of optical flow points, and determining the optical flow points corresponding to the confidence degrees larger than the confidence degree threshold value as a plurality of target optical flow points, so that the plurality of optical flow points obtained by tracking the optical flow can be screened through the confidence degrees, the optical flow points with lower confidence degrees are removed, and the optimization of the optical flow points is realized. The value of the confidence threshold may be set according to actual needs, and the embodiment of the present disclosure is not particularly limited to the value of the confidence threshold.
For example, the confidence corresponding to each of the plurality of optical flow points may be determined by using a fundamental matrix method to screen the optical flow points, or by using a random sample consensus (RANSAC) algorithm to screen the optical flow points, or other similar methods may be used for the screening, which may be set according to actual needs.
Thus, after the plurality of target optical flow points are obtained through screening by the confidence threshold, the first vector movement distance can be determined according to the vector movement distance corresponding to each target optical flow point, where the vector movement distance corresponding to a target optical flow point is the above-described vector movement distance of the target optical flow point relative to its corresponding first key point in the third image. For example, the average of the vector movement distances corresponding to the target optical flow points may be determined, and the first vector movement distance may be determined according to this average vector movement distance. Determining the first vector movement distance with the corner detection and optical flow tracking techniques in this way allows the jitter of the mark frame to be smoothed to a certain extent when the final target mark frame is subsequently determined in combination with the judgment result, effectively reduces severe jitter of the mark frame, and enables the user to be accurately guided to shoot the image through the mark frame.
For example, when the first vector movement distance is determined according to the average vector movement distance, the average vector movement distance may be directly determined as the first vector movement distance; alternatively, the average vector movement distance may first be subjected to a certain process, such as rounding or truncation, which may be set according to actual needs.
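A sketch of how the first vector movement distance might be computed along these lines, using Shi-Tomasi corners, pyramidal Lucas-Kanade tracking, and a RANSAC fundamental-matrix screen; all parameter values are illustrative assumptions:

```python
import cv2
import numpy as np

def first_vector_movement(third_gray, first_gray):
    # Corners detected in the third image (the frame captured before the first image)
    corners = cv2.goodFeaturesToTrack(third_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
    # Pyramidal Lucas-Kanade tracking of those corners into the first image
    pts, status, _err = cv2.calcOpticalFlowPyrLK(third_gray, first_gray, corners, None)
    good = status.ravel() == 1
    p0 = corners[good].reshape(-1, 2)   # first key points in the third image
    p1 = pts[good].reshape(-1, 2)       # target optical flow points in the first image

    # Confidence screening of the optical flow points, here with a RANSAC fundamental-matrix fit
    _F, inliers = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 3.0, 0.99)
    keep = inliers.ravel() == 1
    motion = p1[keep] - p0[keep]        # per-point vector movement distances
    return motion.mean(axis=0)          # average vector movement distance (dx, dy)
```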
For example, when the second vector movement distance is determined according to the boundary point in the initial mark frame and the third image, for each boundary point in the initial mark frame, a corresponding second key point of the boundary point in the third image may be determined first; determining the vector movement distance of the boundary point relative to the second key point, thus respectively determining the vector movement distance of each boundary point in the initial mark frame relative to the second key point, namely obtaining a plurality of vector movement distances; and determining a second vector movement distance according to the plurality of vector movement distances.
For example, when determining the second vector movement distance according to the plurality of vector movement distances, the average of the plurality of vector movement distances may be determined first, and the second vector movement distance may be determined according to this average vector movement distance. For example, the average vector movement distance may be directly determined as the second vector movement distance; alternatively, the average vector movement distance may first be subjected to a certain process, such as rounding or truncation, which may be set according to actual needs.
Thus, after the first vector movement distance and the second vector movement distance are determined, whether the projection matrix meets the preset condition can be determined according to them. For example, if the second vector movement distance is less than or equal to a preset multiple of the first vector movement distance, the projection matrix is determined to meet the preset condition; if the second vector movement distance is greater than the preset multiple of the first vector movement distance, the projection matrix is determined not to meet the preset condition. For example, the preset multiple may be 5, or another value such as 6, and may be set according to actual needs; the embodiment of the present disclosure is not specifically limited in this respect.
It should be noted that, in the embodiment of the present disclosure, there is no sequence between S301 and S302, and S301 may be executed first, and then S302 may be executed; or executing S302 first and then executing S301; s301 and S302 may also be executed at the same time, and may be specifically set according to actual needs, and here, the embodiment of the disclosure is only described by taking the example of executing S301 first and then executing S302, but the embodiment of the disclosure is not limited to this.
With reference to the above description, after the initial mark frame and the judgment result are determined respectively, the target mark frame may be determined jointly with the initial mark frame and the judgment result, that is, the following S303 is performed:
and S303, determining a target marking frame according to the initial marking frame and the judgment result.
For example, when the target mark frame is determined according to the initial mark frame and the judgment result, if the judgment result indicates that the projection matrix meets the preset condition, the initial mark frame may be directly determined as the target mark frame; conversely, if the judgment result indicates that the projection matrix does not meet the preset condition, the target mark frame may be determined according to the positions of the boundary points in the initial mark frame and the first vector movement distance. Determining the target mark frame jointly from the initial mark frame and the judgment result in this way smooths the jitter of the mark frame to a certain extent, effectively reduces severe jitter of the mark frame, and allows the user to be accurately guided to shoot the captured image through the target mark frame.
For example, when the target mark frame is determined together according to the position of the boundary point in the initial mark frame and the first vector movement distance, for each boundary point in the initial mark frame, the sum of the position of the boundary point and the first vector movement distance may be determined first to obtain a new boundary point, and the mark frame formed by each new boundary point may be determined as the final target mark frame.
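The decision and the fallback shift described above might be sketched as follows, assuming the preset multiple of 5 quoted earlier and mean motion vectors d1 and d2 computed as described:

```python
import numpy as np

def decide_target_frame(initial_frame_pts, d1, d2, preset_multiple=5.0):
    """initial_frame_pts: (N, 2) boundary points; d1, d2: first/second mean motion vectors (dx, dy)."""
    if np.linalg.norm(d2) <= preset_multiple * np.linalg.norm(d1):
        return initial_frame_pts          # projection matrix meets the preset condition
    return initial_frame_pts + d1         # otherwise shift every boundary point by the first vector movement distance
```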
It can be seen that, in the embodiment of the present disclosure, when determining the target mark frame corresponding to the overlapping area of the first image relative to the second image, the initial mark frame corresponding to the overlapping area may be determined according to the projection matrix; whether the projection matrix meets the preset condition is then determined based on the third image and the initial mark frame; and the target mark frame is determined by combining the initial mark frame and the judgment result. This smooths the jitter of the mark frame to a certain extent, effectively reduces severe jitter of the mark frame, and allows the user to be accurately guided to shoot the captured image through the target mark frame.
Based on any of the above embodiments, after the target mark frame corresponding to the overlapping area of the first image relative to the second image is determined, it may further be determined whether the shooting operation can be performed on the captured first image based on whether the occupation ratio of the target mark frame in the second image is within a preset range; if so, the fourth image obtained by performing the shooting operation may then be stitched with the second image, as described in the fourth embodiment shown in fig. 4 below.
Example four
Fig. 4 is a flowchart illustrating a method, provided by a fourth embodiment of the present disclosure, for stitching the second image with a fourth image obtained by performing the shooting operation; the method may be performed by software and/or a hardware device, for example, a terminal or a server. For example, referring to fig. 4, the method may include:
s401, if the proportion of the target mark frame in the second image is within a preset range, shooting a fourth image.
Wherein the fourth image is an image obtained by performing a photographing operation on the captured first image.
For example, the preset range may be 30% to 80%, and may be specifically set according to actual needs, and here, the embodiment of the present disclosure is only described by taking the preset range of 30% to 80% as an example, but does not represent that the embodiment of the present disclosure is only limited thereto.
Taking the preset range of 30% -80% as an example, when determining whether the shooting operation can be performed on the captured first image according to the occupation ratio of the target mark frame in the second image, if the occupation ratio of the target mark frame in the second image is less than 30%, it indicates that the overlapping area indicated by the target mark frame is small, which may result in poor stitching effect between two adjacent frames of images, in which case, the shooting operation does not need to be performed on the captured first image; if the proportion of the target mark frame in the second image is more than 80%, the overlap area is larger, which results in a larger number of acquired images and a larger processing amount of image data, in which case, the shooting operation does not need to be performed on the captured first image; if the occupation ratio of the target mark frame in the second image is in the range of 30% -80%, the fourth image obtained by performing the shooting operation on the captured first image can be well spliced with the second image, and in this case, the shooting operation can be performed on the captured first image to splice the fourth image obtained by performing the shooting operation with the second image.
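A minimal sketch of this ratio gate, assuming the ratio is taken as the mark-frame area divided by the area of the second image and using the 30% to 80% range quoted above:

```python
import cv2
import numpy as np

def should_shoot(frame_pts, second_shape, low=0.30, high=0.80):
    h, w = second_shape[:2]
    area = cv2.contourArea(np.float32(frame_pts).reshape(-1, 1, 2))  # area enclosed by the target mark frame
    ratio = area / float(h * w)
    return low <= ratio <= high          # trigger the shooting operation only inside the preset range
```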
S402, according to the area corresponding to the target mark frame, the fourth image and the second image are spliced.
For example, when the fourth image and the second image are spliced according to the region corresponding to the target mark frame, the region corresponding to the target mark frame may be first divided into a plurality of sub-regions, and the areas of the overlapping regions of the sub-regions and the second image may be respectively determined; determining the moving direction of the camera according to the area of the overlapping area of each subarea and the second image; so that the fourth image and the second image are stitched according to the moving direction of the camera.
Taking the target mark frame as a regular rectangular frame as an example, the multiple sub-regions may include the upper half region, the lower half region, the left half region, and the right half region of the region corresponding to the target mark frame. For example, as shown in fig. 5, which is a schematic diagram provided in the embodiment of the present disclosure of dividing the region corresponding to the target mark frame into an upper half region and a lower half region, the region corresponding to the target mark frame is divided into an upper half region and a lower half region in the vertical direction according to the positions [0, mask.rows/2] and [mask.rows/2, mask.rows] of the target mark frame. The left half region and the right half region may be obtained by performing the division once more; for example, as shown in fig. 6, which is a schematic diagram provided in the embodiment of the present disclosure of dividing the region corresponding to the target mark frame into a left half region and a right half region, the region corresponding to the target mark frame may be divided into a left half region and a right half region in the horizontal direction according to the positions [0, mask.cols/2] and [mask.cols/2, mask.cols] of the target mark frame.
It should be understood that the embodiment of the present disclosure is described only by taking the target mark frame as a regular rectangular frame, with the plurality of sub-regions being the upper half region, the lower half region, the left half region, and the right half region of the region corresponding to the target mark frame, as an example; this does not represent that the embodiment of the present disclosure is limited thereto.
As shown in fig. 5 and fig. 6, after the upper half region, the lower half region, the left half region, and the right half region are determined, the area of the overlapping region of each sub-region and the second image may be determined respectively; for example, the area of the overlapping region of the upper half region and the second image may be denoted top_iou, that of the lower half region bottom_iou, that of the left half region left_iou, and that of the right half region right_iou. The magnitudes of top_iou, bottom_iou, left_iou and right_iou are then compared, and the direction corresponding to the largest area is determined as the moving direction of the camera.
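Assuming the overlap is represented as a binary mask the size of the image (as the mask.rows / mask.cols notation suggests; this representation is an assumption for illustration), the four areas might be computed as follows:

```python
import numpy as np

def overlap_areas(overlap_mask):
    """overlap_mask: binary mask (mask.rows x mask.cols) of the mark frame's overlap with the second image."""
    rows, cols = overlap_mask.shape[:2]
    top_iou    = int(np.count_nonzero(overlap_mask[:rows // 2, :]))   # upper half region
    bottom_iou = int(np.count_nonzero(overlap_mask[rows // 2:, :]))   # lower half region
    left_iou   = int(np.count_nonzero(overlap_mask[:, :cols // 2]))   # left half region
    right_iou  = int(np.count_nonzero(overlap_mask[:, cols // 2:]))   # right half region
    return top_iou, bottom_iou, left_iou, right_iou
```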
For example, if top_iou is the largest, the direction from bottom to top may be determined as the moving direction of the camera; if bottom_iou is the largest, the direction from top to bottom may be determined as the moving direction of the camera; if left_iou is the largest, the direction from right to left may be determined as the moving direction of the camera; and if right_iou is the largest, the direction from left to right may be determined as the moving direction of the camera. In this way, the moving direction of the camera is determined according to the area of the overlapping region of each sub-region and the second image.
For example, when comparing the magnitude relationship among top_iou, bottom_iou, left_iou, and right_iou and determining the direction corresponding to the largest area as the moving direction of the camera, the corresponding pseudo code may be described as follows:
First, whether top_iou is larger than bottom_iou may be compared; if top_iou is larger than bottom_iou, the moving direction of the camera is determined to be from bottom to top, otherwise the moving direction of the camera is determined to be from top to bottom, and max_value = max(top_iou, bottom_iou) is recorded.
Secondly, whether left_iou is larger than max_value is compared; if left_iou is larger than max_value, the moving direction of the camera is determined to be from left to right, and max_value is updated to left_iou;
and comparing whether right _ iou is larger than max _ value, if right _ iou is larger than max _ value, determining that the moving direction of the camera is from right to left, and updating max _ value to right _ iou, so that the moving direction of the camera is determined according to the size relationship among top _ iou, bottom _ iou, left _ iou and right _ iou.
With the above description, after the moving direction of the camera is determined, the moving direction of the camera can be used as a reference for splicing the fourth image and the second image, and the fourth image and the second image are spliced, so that the fourth image and the second image can be well spliced, and the splicing effect is improved.
EXAMPLE five
Fig. 7 is a schematic structural diagram of an image processing apparatus 70 provided according to a fifth embodiment of the present disclosure, and for example, referring to fig. 7, the image processing apparatus 70 may include:
an acquisition unit 701 configured to acquire a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image captured before the first image is captured;
a matching unit 702, configured to match the first eigenvector and the second eigenvector to obtain a projection matrix of the second image with respect to the first image;
a processing unit 703 for determining a target mark frame corresponding to an overlapping area of the first image with respect to the second image, based on the first image, a third image captured before the first image, and the projection matrix; the first image and the third image are two adjacent frames of images.
Optionally, the processing unit 703 includes a first processing module, a second processing module, and a third processing module.
And the first processing module is used for determining an initial mark frame corresponding to the overlapping area of the first image relative to the second image according to the projection matrix.
And the second processing module is used for determining whether the projection matrix meets the preset condition or not based on the third image and the initial mark frame.
And the third processing module is used for determining the target marking frame according to the initial marking frame and the judgment result.
Optionally, the second processing module includes a first processing sub-module, a second processing sub-module, and a third processing sub-module.
And the first processing submodule is used for carrying out corner detection on the third image and determining a first vector moving distance according to a plurality of detected corner points.
And the second processing submodule is used for determining a second vector moving distance according to the boundary point in the initial mark frame and the third image.
And the third processing submodule is used for determining whether the projection matrix meets the preset condition or not according to the first vector moving distance and the second vector moving distance.
Optionally, the third processing module includes a fourth processing sub-module and a fifth processing sub-module.
And the fourth processing submodule is used for determining the initial mark frame as the target mark frame if the judgment result indicates that the projection matrix meets the preset condition.
And the fifth processing submodule is used for determining the target marking frame according to the position of the boundary point in the initial marking frame and the first vector moving distance if the judgment result indicates that the projection matrix does not meet the preset condition.
Optionally, the third processing sub-module is specifically configured to determine that the projection matrix meets a preset condition if the second vector movement distance is less than or equal to a preset multiple of the first vector movement distance; and if the second vector movement distance is larger than the preset times of the first vector movement distance, determining that the projection matrix does not meet the preset condition.
Optionally, the first processing sub-module is specifically configured to perform optical flow tracking on the plurality of corner points to obtain a plurality of target optical flow points; for each target optical flow point, determine a first key point corresponding to the target optical flow point in the third image; determine the vector movement distance of the target optical flow point relative to the first key point; and determine the first vector movement distance according to the plurality of vector movement distances.
Optionally, the first processing sub-module is specifically configured to perform optical flow tracking on the multiple corner points to obtain multiple optical flow points; and determining the confidence degrees corresponding to the plurality of optical flow points respectively, and determining the optical flow points corresponding to the confidence degrees larger than the confidence degree threshold value as a plurality of target optical flow points.
Optionally, the second processing sub-module is specifically configured to determine, for each boundary point in the initial mark frame, a second key point corresponding to the boundary point in the third image; determining the vector movement distance of the boundary point relative to the second key point; and determining a second vector movement distance according to the plurality of vector movement distances.
Optionally, the obtaining unit 701 includes a first obtaining module and a second obtaining module.
And the first acquisition module is used for determining whether a target object with the maximum Gaussian response exists in the first image.
And the second acquisition module is used for acquiring a first feature vector corresponding to the first image if the target object exists.
Optionally, the obtaining unit 701 further includes a third obtaining module, a fourth obtaining module, and a fifth obtaining module.
And the third acquisition module is used for extracting a first initial feature vector corresponding to the first image by adopting the depth feature extraction model.
And the fourth acquisition module is used for extracting a second initial feature vector corresponding to the first image by adopting a feature extraction algorithm.
And the fifth obtaining module is used for performing feature fusion on the first initial feature vector and the second initial feature vector to obtain a first feature vector.
Optionally, the image processing apparatus further includes a shooting unit and a stitching unit.
The shooting unit is used for shooting a fourth image if the proportion of the target mark frame in the second image is within a preset range; wherein the fourth image is an image obtained by performing a photographing operation on the captured first image.
And the splicing unit is used for splicing the fourth image and the second image according to the area corresponding to the target mark frame.
Optionally, the splicing unit includes a first splicing module, a second splicing module, and a third splicing module.
The first splicing module is used for dividing the region corresponding to the target mark frame into a plurality of sub-regions and respectively determining the areas of the overlapping regions of the sub-regions and the second image.
And the second splicing module is used for determining the moving direction of the camera according to the area of the overlapping area of each subarea and the second image.
And the third splicing module is used for splicing the fourth image and the second image according to the moving direction of the camera.
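The split-and-overlap logic of the three splicing modules can be pictured with the following sketch, assuming axis-aligned boxes given as (x, y, w, h) in the coordinate frame of the second image; the four-way split and the direction rule are one possible reading of the description, not the disclosed implementation.

import numpy as np

def camera_move_direction(target_box, second_image_shape):
    # Split the region of the target mark frame into four half-boxes (the sub-regions).
    x, y, w, h = target_box
    img_h, img_w = second_image_shape[:2]
    halves = {
        "left":  (x,          y,          w // 2, h),
        "right": (x + w // 2, y,          w // 2, h),
        "up":    (x,          y,          w,      h // 2),
        "down":  (x,          y + h // 2, w,      h // 2),
    }

    def overlap_area(box):
        bx, by, bw, bh = box
        ow = max(0, min(bx + bw, img_w) - max(bx, 0))
        oh = max(0, min(by + bh, img_h) - max(by, 0))
        return ow * oh

    # The half overlapping the second image most holds the already-captured content,
    # so the camera is taken to have moved toward the opposite side.
    overlapped = max(halves, key=lambda name: overlap_area(halves[name]))
    return {"left": "right", "right": "left", "up": "down", "down": "up"}[overlapped]

def stitch(fourth_image, second_image, direction):
    # Plain concatenation along the inferred direction; a full pipeline would blend the
    # overlap corresponding to the target mark frame instead of butting the images together.
    if direction == "right":
        return np.hstack([second_image, fourth_image])
    if direction == "left":
        return np.hstack([fourth_image, second_image])
    if direction == "down":
        return np.vstack([second_image, fourth_image])
    return np.vstack([fourth_image, second_image])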
The image processing apparatus 70 provided in the embodiments of the present disclosure may execute the technical solution of the image processing method shown in any one of the above embodiments; its implementation principle and beneficial effects are similar to those of the image processing method and are not described herein again.
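For orientation, the units described above might be chained as in the following hypothetical sketch, where ORB descriptor matching plus RANSAC homography stands in for matching the feature vectors to obtain the projection matrix; initial_box_fn and validity_fn are placeholder callables for the steps sketched earlier, and nothing here is asserted to be the disclosed implementation.

import cv2
import numpy as np

def process_frame(first_image, second_image, third_image, initial_box_fn, validity_fn):
    # Feature vectors for the captured first image and the previously shot second image
    # (assumes both frames yield descriptors).
    gray1 = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)

    # Match the two sets of feature vectors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:200]

    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Projection matrix of the second image relative to the first image.
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Initial mark frame from the projection matrix, then the check against the adjacent
    # third image decides whether the box is kept as the target mark frame.
    box = initial_box_fn(H, second_image.shape)
    return box if validity_fn(first_image, third_image, box) else None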
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program stored in a readable storage medium. At least one processor of the electronic device can read the computer program from the readable storage medium and execute it, causing the electronic device to perform the image processing method provided by any of the above embodiments.
Fig. 8 is a schematic block diagram of an electronic device 80 provided by an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 80 includes a computing unit 801 that can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 80 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 80 are connected to I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 80 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 80 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system intended to overcome the drawbacks of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (27)

1. An image processing method comprising:
acquiring a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image shot before the first image is captured;
matching the first characteristic vector with the second characteristic vector to obtain a projection matrix of the second image relative to the first image;
determining a target mark frame corresponding to an overlapping area of the first image relative to the second image according to the first image, a third image captured before the first image and the projection matrix; wherein the first image and the third image are two adjacent frames of images.
2. The method of claim 1, wherein the determining a target marker box corresponding to an overlapping region of the first image relative to the second image from the first image, a third image captured before the first image, and the projection matrix comprises:
determining an initial mark frame corresponding to an overlapping area of the first image relative to the second image according to the projection matrix;
determining whether the projection matrix meets a preset condition based on the third image and the initial mark frame;
and determining the target marking frame according to the initial marking frame and the judgment result.
3. The method of claim 2, wherein the determining whether the projection matrix satisfies a preset condition based on the third image and the initial mark box comprises:
carrying out corner detection on the third image, and determining a first vector movement distance according to a plurality of detected corner points;
determining a second vector movement distance according to the boundary point in the initial mark frame and the third image;
and determining whether the projection matrix meets a preset condition according to the first vector movement distance and the second vector movement distance.
4. The method of claim 3, wherein said determining the target mark box according to the initial mark box and the determination result comprises:
if the judgment result indicates that the projection matrix meets the preset condition, determining the initial mark frame as the target mark frame;
and if the judgment result indicates that the projection matrix does not meet the preset condition, determining the target mark frame according to the position of the boundary point in the initial mark frame and the first vector movement distance.
5. The method according to claim 3 or 4, wherein the determining whether the projection matrix satisfies a preset condition according to the first vector movement distance and the second vector movement distance comprises:
if the second vector movement distance is less than or equal to a preset multiple of the first vector movement distance, determining that the projection matrix meets a preset condition;
and if the second vector movement distance is greater than the preset multiple of the first vector movement distance, determining that the projection matrix does not meet a preset condition.
6. The method according to any of claims 3-5, wherein said determining a first vector movement distance from the detected plurality of corner points comprises:
carrying out optical flow tracking on the plurality of corner points to obtain a plurality of target optical flow points;
for each target optical flow point, determining a first key point corresponding to the target optical flow point in the third image; determining the vector movement distance of the target optical flow point relative to the first key point;
and determining the first vector movement distance according to a plurality of vector movement distances.
7. The method of claim 6, wherein said performing optical flow tracking on said plurality of corner points to obtain a plurality of target optical flow points comprises:
carrying out optical flow tracking on the plurality of corner points to obtain a plurality of optical flow points;
and respectively determining the confidence degrees corresponding to the plurality of optical flow points, and determining the optical flow points corresponding to the confidence degrees larger than a confidence degree threshold value as the plurality of target optical flow points.
8. The method of any of claims 3-7, wherein said determining a second vector movement distance from the boundary point in the initial marker box and the third image comprises:
for each boundary point in the initial mark frame, determining a corresponding second key point of the boundary point in the third image; determining the vector movement distance of the boundary point relative to the second key point;
and determining the second vector movement distance according to the plurality of vector movement distances.
9. The method of any of claims 1-8, wherein said obtaining a first feature vector corresponding to the captured first image comprises:
determining whether a target object with the maximum Gaussian response exists in the first image;
and if the target object exists, acquiring the first feature vector corresponding to the first image.
10. The method of any of claims 1-9, wherein said obtaining a first feature vector corresponding to the captured first image comprises:
extracting a first initial feature vector corresponding to the first image by adopting a depth feature extraction model;
extracting a second initial feature vector corresponding to the first image by adopting a feature extraction algorithm;
and performing feature fusion on the first initial feature vector and the second initial feature vector to obtain the first feature vector.
11. The method according to any one of claims 1-10, wherein the method further comprises:
if the proportion of the target mark frame in the second image is within a preset range, shooting a fourth image; wherein the fourth image is an image obtained by performing a shooting operation on the captured first image;
and splicing the fourth image and the second image according to the area corresponding to the target mark frame.
12. The method according to claim 11, wherein the stitching the fourth image and the second image according to the region corresponding to the target mark frame comprises:
dividing the region corresponding to the target mark frame into a plurality of sub-regions, and respectively determining the area of the overlapping region of each sub-region and the second image;
determining the moving direction of the camera according to the area of the overlapped area of each subarea and the second image;
and splicing the fourth image and the second image according to the moving direction of the camera.
13. An image processing apparatus comprising:
an acquisition unit configured to acquire a first feature vector corresponding to a captured first image and a second feature vector corresponding to a second image captured before the first image is captured;
the matching unit is used for matching the first characteristic vector with the second characteristic vector to obtain a projection matrix of the second image relative to the first image;
a processing unit, configured to determine, according to the first image, a third image captured before the first image, and the projection matrix, a target mark frame corresponding to an overlapping area of the first image with respect to the second image; wherein the first image and the third image are two adjacent frames of images.
14. The apparatus of claim 13, wherein the processing unit comprises a first processing module, a second processing module, and a third processing module;
the first processing module is configured to determine, according to the projection matrix, an initial mark frame corresponding to an overlapping area of the first image with respect to the second image;
the second processing module is used for determining whether the projection matrix meets a preset condition or not based on the third image and the initial mark frame;
and the third processing module is used for determining the target marking frame according to the initial marking frame and the judgment result.
15. The apparatus of claim 14, wherein the second processing module comprises a first processing sub-module, a second processing sub-module, and a third processing sub-module;
the first processing submodule is used for carrying out corner detection on the third image and determining a first vector movement distance according to a plurality of detected corner points;
the second processing submodule is used for determining a second vector movement distance according to the boundary point in the initial marking frame and the third image;
and the third processing submodule is used for determining whether the projection matrix meets a preset condition according to the first vector movement distance and the second vector movement distance.
16. The apparatus of claim 15, wherein the third processing module comprises a fourth processing sub-module and a fifth processing sub-module;
the fourth processing submodule is configured to determine the initial mark frame as the target mark frame if the judgment result indicates that the projection matrix meets a preset condition;
and the fifth processing submodule is used for determining the target marking frame according to the position of the boundary point in the initial marking frame and the first vector movement distance if the judgment result indicates that the projection matrix does not meet the preset condition.
17. The apparatus of claim 15 or 16,
the third processing submodule is specifically configured to determine that the projection matrix meets a preset condition if the second vector movement distance is less than or equal to a preset multiple of the first vector movement distance; and if the second vector movement distance is greater than the preset multiple of the first vector movement distance, determine that the projection matrix does not meet a preset condition.
18. The apparatus of any one of claims 15-17,
the first processing sub-module is specifically configured to perform optical flow tracking on the multiple corner points to obtain multiple target optical flow points; for each target optical flow point, determine a first key point corresponding to the target optical flow point in the third image; determine the vector movement distance of the target optical flow point relative to the first key point; and determine the first vector movement distance according to a plurality of vector movement distances.
19. The apparatus of claim 17, wherein,
the first processing sub-module is specifically configured to perform optical flow tracking on the multiple corner points to obtain multiple optical flow points; determine the confidence corresponding to each optical flow point; and determine the optical flow points whose confidence is greater than a confidence threshold as the multiple target optical flow points.
20. The apparatus of any one of claims 15-19,
the second processing sub-module is specifically configured to determine, for each boundary point in the initial mark frame, a second key point corresponding to the boundary point in the third image; determine the vector movement distance of the boundary point relative to the second key point; and determine the second vector movement distance according to the plurality of vector movement distances.
21. The apparatus according to any one of claims 13-20, wherein the obtaining unit comprises a first obtaining module and a second obtaining module;
the first acquisition module is used for determining whether a target object with the maximum Gaussian response exists in the first image;
the second obtaining module is configured to obtain the first feature vector corresponding to the first image if the target object exists.
22. The apparatus according to any one of claims 13-21, wherein the acquiring unit further comprises a third acquiring module, a fourth acquiring module, and a fifth acquiring module;
the third obtaining module is configured to extract a first initial feature vector corresponding to the first image by using a depth feature extraction model;
the fourth obtaining module is configured to extract a second initial feature vector corresponding to the first image by using a feature extraction algorithm;
the fifth obtaining module is configured to perform feature fusion on the first initial feature vector and the second initial feature vector to obtain the first feature vector.
23. The apparatus according to any one of claims 13-22, wherein the apparatus further comprises a photographing unit and a stitching unit;
the shooting unit is used for shooting a fourth image if the proportion of the target mark frame in the second image is within a preset range; wherein the fourth image is an image obtained by performing a shooting operation on the captured first image;
and the splicing unit is used for splicing the fourth image and the second image according to the area corresponding to the target mark frame.
24. The apparatus of claim 23, wherein the stitching unit comprises a first stitching module, a second stitching module, and a third stitching module;
the first splicing module is configured to divide a region corresponding to the target mark frame into a plurality of sub-regions, and determine areas of overlapping regions of the sub-regions and the second image respectively;
the second splicing module is used for determining the moving direction of the camera according to the area of the overlapping area of each sub-area and the second image;
and the third splicing module is used for splicing the fourth image and the second image according to the moving direction of the camera.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-12.
26. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image processing method of any one of claims 1 to 12.
27. A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the image processing method of any one of claims 1 to 12.
CN202111183783.8A 2021-10-11 2021-10-11 Image processing method and device and electronic equipment Pending CN113902905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111183783.8A CN113902905A (en) 2021-10-11 2021-10-11 Image processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113902905A true CN113902905A (en) 2022-01-07

Family

ID=79191510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111183783.8A Pending CN113902905A (en) 2021-10-11 2021-10-11 Image processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113902905A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination