CN117058012A - Image processing method, device, electronic equipment and storage medium - Google Patents

Image processing method, device, electronic equipment and storage medium

Info

Publication number
CN117058012A
Authority
CN
China
Prior art keywords
image
spherical
pixel point
plane
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310842745.1A
Other languages
Chinese (zh)
Inventor
许长桥
徐祖云
彭帅
肖寒
杨树杰
曾其妙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310842745.1A
Publication of CN117058012A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image processing device, an electronic device and a storage medium. The method includes: obtaining a plane image, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting the content to be shot in 360 degrees; inputting the plane image into a trained saliency model, generating a second spherical image corresponding to the plane image through the saliency model based on the plane image, and determining spherical pixel point coordinates corresponding to the second spherical image; determining the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates; and, based on the plane pixel point coordinates, outputting a target image with a salient region corresponding to the plane image through the saliency model. This solves the technical problem in the prior art that field-of-view prediction for a plane image obtained by projecting a spherical image is inaccurate, and achieves the purpose of accurately determining the salient region of the plane image.

Description

Image processing method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an image processing method, an image processing device, an electronic device, and a storage medium.
Background
With the commercialization of 5G and the rapid development of new multimedia technologies, virtual reality video (e.g., panoramic video, 360-degree video) has become increasingly popular in recent years. Unlike traditional video, virtual reality video allows users to view 360 degrees of video content, so transmitting virtual reality video requires a significant amount of bandwidth. However, when a user views a virtual reality video, the user usually views a region of interest, and in order to reduce the consumption of bandwidth, the video region of the region of interest of the user may be transmitted at a high resolution, and the remaining video region may be transmitted at a lower resolution, so it is important to accurately determine the video region of interest of the user.
In the prior art, a virtual reality video is generally projected from a spherical image into a plane image before transmission. However, the projected plane image generally suffers from image distortion and pixel distortion, and the distortion becomes increasingly severe from the equatorial plane of the spherical image towards the north and south poles. A traditional model therefore cannot accurately learn the characteristics of the plane image, so the salient region determined by the traditional model is inaccurate.
Disclosure of Invention
In view of the above, the present application is directed to an image processing method, an image processing apparatus, an electronic device and a storage medium, so as to overcome all or part of the disadvantages in the prior art.
Based on the above object, the present application provides an image processing method comprising: acquiring a plane image, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of content to be shot; inputting the planar image into a trained saliency model, generating a second spherical image corresponding to the planar image through the saliency model based on the planar image, and determining spherical pixel point coordinates corresponding to the second spherical image; determining a plane pixel point coordinate corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinate; and outputting a target image with a salient region corresponding to the planar image through the salient model based on the planar pixel point coordinates.
Optionally, determining the plane pixel point coordinates corresponding to the plane image based on the second spherical image and the spherical pixel point coordinates includes: determining any one of a plurality of tangent points corresponding to the second spherical image, and determining a tangent plane with a preset size taking the tangent point as a center based on the tangent point; projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane; and determining the plane pixel point coordinates based on the projection coordinates.
Optionally, the projecting the spherical pixel point coordinates onto the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane includes: establishing a coordinate system corresponding to the tangent plane by taking the tangent point as the center, dividing the tangent plane into areas based on the coordinate system corresponding to the tangent plane, and calculating unit coordinates of each area; determining a corresponding region of the spherical pixel point coordinates in the tangential plane; and calculating the projection coordinates based on the spherical pixel point coordinates and the unit coordinates of the region corresponding to the spherical pixel point coordinates.
Optionally, the determining the plane pixel point coordinates based on the projection coordinates includes: determining plane pixel point coordinates in the tangent plane by the following formula:
wherein Γ_x(φ, θ) is the abscissa of the plane pixel point in the tangent plane, Γ_y(φ, θ) is the ordinate of the plane pixel point in the tangent plane, θ is the abscissa of the spherical pixel point, φ is the ordinate of the spherical pixel point, θ_γ is the abscissa of the projection coordinates, and φ_γ is the ordinate of the projection coordinates.
Optionally, the loss function for training the saliency model is determined by the following formula: ι = L_S-MSE(S, Q) + L_CC(S, Q) + L_KL(S, Q), wherein ι is the loss function, L_S-MSE(S, Q) is a weighted mean squared error term, L_CC(S, Q) represents the linear correlation relationship, L_KL(S, Q) represents the difference relationship, S is the target image, and Q is the marked sample image.
Optionally, the linear correlation relationship is determined by the following formula: L_CC(S, Q) = 1 − CC(S, Q), CC(S, Q) = cov(S, Q) / (σ(S)·σ(Q)), wherein L_CC(S, Q) is the linear correlation relationship, CC(S, Q) is the linear correlation coefficient, cov(S, Q) is the covariance, σ(S) is the standard deviation of the target image, σ(Q) is the standard deviation of the marked sample image, S is the target image, and Q is the marked sample image; the difference relationship is determined by the following formula: L_KL(S, Q) = KL(S, Q), KL(S, Q) = Σ_{i=1}^{n} Q_i·log(ε + Q_i / (ε + S_i)), wherein L_KL(S, Q) is the difference relationship, KL(S, Q) is the difference between the target image and the marked sample image under the condition of information loss, S is the target image, Q is the marked sample image, ε is a regularization constant, n is the total number of initial plane pixel points, and i is the current pixel point.
Optionally, the saliency model is a convolutional neural network model, and the outputting, based on the planar pixel point coordinates, the target image with the saliency region corresponding to the planar image through the saliency model includes: in response to determining that the planar image has a preset calibration image corresponding to the planar image, inputting the preset calibration image into the significance model; based on the plane pixel point coordinates and the preset calibration image, respectively extracting first features corresponding to the plane pixel point coordinates and second features corresponding to the preset calibration image by using a convolution layer of the saliency model; outputting the target image through the saliency model based on the first feature and the second feature.
Based on the same inventive concept, the present application also provides an image processing apparatus, comprising: the acquisition module is configured to acquire a plane image, wherein the plane image is determined by projection of a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of content to be shot; a first determination module configured to input the planar image into a trained saliency model, generate a second spherical image corresponding to the planar image from the saliency model based on the planar image, and determine spherical pixel point coordinates corresponding to the second spherical image; the second determining module is configured to determine the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates; and the output determining module is configured to output a target image with a salient region corresponding to the planar image through the salient model based on the planar pixel point coordinates.
Based on the same inventive concept, the application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the method as described above when executing the computer program.
Based on the same inventive concept, the present application also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
As can be seen from the above, the image processing method, apparatus, electronic device and storage medium provided by the present application, the method includes: acquiring a plane image, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of content to be shot; inputting the planar image into a trained significance model, generating a second spherical image corresponding to the planar image through the significance model based on the planar image, determining spherical pixel point coordinates corresponding to the second spherical image, and representing the second spherical image with specific values through the form of coordinates so as to establish accurate association with the planar image later. And determining the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates, so that the association relationship between the plane image and the second spherical image is more accurate. Based on the plane pixel point coordinates, outputting a target image with a salient region corresponding to the plane image through the salient model, wherein the salient model can accurately extract the characteristics of the plane image, and further, the purpose of accurately determining the salient region of the plane image through the salient model is achieved.
Drawings
In order to more clearly illustrate the technical solutions of the present application or related art, the drawings that are required to be used in the description of the embodiments or related art will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a flow chart of an image processing method according to an embodiment of the application;
fig. 2 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the application.
Detailed Description
The present application will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like, as used in embodiments of the present application, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
As described in the background section, with the commercialization of 5G and the rapid development of new multimedia technologies, virtual reality video (e.g., panoramic video, 360-degree video) has become increasingly popular in recent years. Virtual reality video allows the user to view 360 degrees of video content, so its bandwidth consumption is far higher than that of conventional video: the data rate required to transmit a 4K panoramic video to the client and allow the user to look all around is 400 Mb/s, whereas conventional 4K video streaming typically requires only 25 Mb/s. The user typically watches only a region of interest while viewing the virtual reality video, and the field of view of the head-mounted device is also limited, so the user sees only 20%-30% of the full-view video content at any time; the video content transmitted for other regions is not viewed by the user and is entirely wasted. Therefore, it is important to accurately determine the video area of interest to the user.
A virtual reality video is generally projected from a spherical image into a plane image before transmission. However, the projected plane image generally suffers from image distortion and pixel distortion, and the distortion becomes increasingly severe from the equatorial plane of the spherical image towards the north and south poles, so a traditional model cannot accurately learn the characteristics of the plane image, and the salient region determined by the traditional model is inaccurate.
In view of this, an embodiment of the present application provides an image processing method, referring to fig. 1, including the following steps:
step 101, obtaining a plane image, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of content to be shot.
In this step, in order to enhance the viewing experience of the user, the first spherical image is typically projected as a planar image, and this projection step is typically completed during the image capturing phase. However, the planar image projected from the first spherical image often exhibits image distortion and pixel distortion. The planar image determined by projecting the first spherical image is acquired, and it may come from a variety of sources: by way of example, the planar image can be extracted from a panoramic video, or a panoramic image shot by the user can be used as the planar image.
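As one illustration of the projection mentioned above, the sketch below shows the common equirectangular (ERP) mapping from spherical angles to planar pixel coordinates. The patent does not fix a particular projection, so the choice of ERP and the function name are assumptions for illustration only.

```python
import numpy as np

def sphere_to_erp(theta, phi, width, height):
    """Map spherical angles to equirectangular (planar) pixel coordinates.

    theta: longitude in [-pi, pi), phi: latitude in [-pi/2, pi/2].
    This ERP mapping is an illustrative assumption; the patent does not
    prescribe a specific projection.
    """
    x = (theta + np.pi) / (2 * np.pi) * width   # columns follow longitude
    y = (np.pi / 2 - phi) / np.pi * height      # rows follow latitude (north pole at row 0)
    return x, y

# Example: the point on the equator facing forward lands at the image centre.
print(sphere_to_erp(0.0, 0.0, 1024, 512))  # -> (512.0, 256.0)
```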
Step 102, inputting the plane image into a trained saliency model, generating a second spherical image corresponding to the plane image through the saliency model based on the plane image, and determining spherical pixel point coordinates corresponding to the second spherical image.
In this step, a traditional saliency model determines the salient region corresponding to an image by extracting the features of the input image, where the salient region is the region on which the user's gaze is focused. However, since the planar image has image distortion and pixel distortion, the features of the planar image cannot be accurately extracted using a traditional saliency model. To solve this problem, the saliency model adopted in this embodiment is a spherical convolutional neural network model, which divides an input image into a plurality of regions and extracts the features of each region with different weights obtained by model training for each region. When a trained spherical convolutional neural network model performs feature extraction on a spherical image, the output result has higher accuracy, so the planar image needs to be associated with a spherical image that the spherical convolutional neural network model can process. A second spherical image corresponding to the planar image is generated by the saliency model, where the relevant parameters of the second spherical image may be set according to user requirements; illustratively, the radius of the second spherical image may be set to half the length of the planar image. The coordinates of the second spherical image also need to be determined; for example, the spherical pixel point coordinates corresponding to the second spherical image can be determined by establishing a coordinate system. The second spherical image is thus represented by specific numerical values in the form of coordinates, so that an accurate association with the planar image can be established later.
The saliency model used in this embodiment may be a spherical convolutional neural network model with the following structure. The spherical convolutional neural network model consists of an Encoder and a Decoder. The Encoder, serving as the backbone feature extraction network, obtains a feature layer through a contracting path; it comprises four spherical convolution layers, each followed by a ReLU activation layer, with three spherical pooling operations arranged between these convolution-activation stages. The Decoder, serving as the enhanced feature extraction network, performs feature fusion on the preliminary effective feature layers acquired by the backbone feature extraction network along an expanding path, so as to obtain the final enhanced features. The Decoder comprises three spherical convolution layers, each followed by a corresponding ReLU activation layer, with three upsampling layers between these convolution-activation stages.
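A minimal PyTorch-style sketch of the encoder–decoder layout described above (four convolution + ReLU stages with three pooling steps, then three convolution + ReLU stages with three upsampling steps). Ordinary Conv2d layers stand in for the spherical convolutions, whose tangent-plane sampling is not shown; the channel widths and the omission of the skip-connection fusion are assumptions.

```python
import torch
import torch.nn as nn

class SaliencyEncoderDecoder(nn.Module):
    """Skeleton of the described encoder/decoder; plain Conv2d stands in for
    spherical convolution, so this only mirrors the layer layout, and the
    Decoder's feature fusion (skip connections) is omitted."""
    def __init__(self):
        super().__init__()
        chs = [3, 32, 64, 128, 256]                    # channel widths are assumed
        enc = []
        for i in range(4):                             # four conv + ReLU stages
            enc += [nn.Conv2d(chs[i], chs[i + 1], 3, padding=1), nn.ReLU(inplace=True)]
            if i < 3:                                  # three pooling operations in between
                enc.append(nn.MaxPool2d(2))
        self.encoder = nn.Sequential(*enc)

        dchs = [256, 128, 64, 1]
        dec = []
        for i in range(3):                             # three conv + ReLU stages with upsampling
            dec += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                    nn.Conv2d(dchs[i], dchs[i + 1], 3, padding=1), nn.ReLU(inplace=True)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Sanity check: a 256x512 ERP frame goes in, a same-size saliency map comes out.
model = SaliencyEncoderDecoder()
print(model(torch.randn(1, 3, 256, 512)).shape)  # -> torch.Size([1, 1, 256, 512])
```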
And step 103, determining the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates.
In this step, the plane pixel point coordinates corresponding to the plane image are determined from the second spherical image, which the spherical convolutional neural network model can process, and from the spherical pixel point coordinates corresponding to the second spherical image. A numerical association between the second spherical image and the plane image is thereby established; digitizing this association makes the relationship between the plane image and the second spherical image more accurate.
And 104, outputting a target image with a salient region corresponding to the planar image through the salient model based on the planar pixel point coordinates.
In the step, the features of the planar image are extracted through a saliency model, wherein the saliency model is a spherical convolutional neural network model. The spherical convolutional neural network model can identify the equatorial plane and the north-south poles corresponding to the spherical image, so that after the association relation between the spherical image and the plane image is established, the spherical convolutional neural network model can also identify the equatorial plane to the north-south poles corresponding to the first spherical image in the plane image. Since image distortion of a planar image becomes more and more severe from the equatorial plane corresponding to the first spherical image to the north-south poles, the spherical convolutional neural network model has a relatively small weight corresponding to coordinates in the vicinity of the two pole regions of the first spherical image present in the planar image, and a relatively large weight corresponding to coordinates in the vicinity of the equatorial plane region of the first spherical image present in the planar image. The coordinates of the undistorted region in the planar image can be highlighted by the saliency model. The saliency model can accurately extract the characteristics of the plane image, so that the purpose of accurately determining the saliency area of the plane image through the saliency model is achieved. The salient region of the planar image can be transmitted in high resolution later, and the other regions are transmitted in lower resolution, so that the user experience is improved, and the consumption of communication bandwidth is reduced.
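The pole-versus-equator weighting described above can be illustrated with a simple latitude-dependent weight map. The cosine shape used here is only an assumed example of down-weighting the distorted polar rows; it is not the weighting actually learned by the model.

```python
import numpy as np

def latitude_weight_map(height, width):
    """Per-pixel weights that are largest on the equator row and smallest at the
    poles, mimicking how distorted polar regions contribute less.  The cosine
    form is an assumption for illustration only."""
    phi = (0.5 - (np.arange(height) + 0.5) / height) * np.pi   # latitude of each row
    row_weights = np.cos(phi)                                  # ~1 at equator, ~0 at poles
    return np.tile(row_weights[:, None], (1, width))

w = latitude_weight_map(256, 512)
print(w[128, 0], w[0, 0])   # equator row ~1.0, top (near-pole) row ~0.006
```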
According to the scheme, the plane image is obtained, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of content to be shot; inputting the planar image into a trained significance model, generating a second spherical image corresponding to the planar image through the significance model based on the planar image, determining spherical pixel point coordinates corresponding to the second spherical image, and representing the second spherical image with specific values through the form of coordinates so as to establish accurate association with the planar image later. And determining the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates, so that the association relationship between the plane image and the second spherical image is more accurate. Based on the plane pixel point coordinates, outputting a target image with a salient region corresponding to the plane image through the salient model, wherein the salient model can accurately extract the characteristics of the plane image, and further, the purpose of accurately determining the salient region of the plane image through the salient model is achieved.
In some embodiments, determining the planar pixel point coordinates corresponding to the planar image based on the second spherical image and the spherical pixel point coordinates includes: determining any one of a plurality of tangent points corresponding to the second spherical image, and determining a tangent plane with a preset size taking the tangent point as a center based on the tangent point; projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane; and determining the plane pixel point coordinates based on the projection coordinates.
In this embodiment, in order to link the spherical pixel point coordinates and the plane pixel point coordinates, the two sets of coordinates may be related by means of an auxiliary plane. Since the second spherical image has tangent points, a tangent plane can be established at a tangent point of the second spherical image, where the preset size can be determined according to actual requirements; for example, to quickly determine the size of the tangent plane, the size of the planar image can be used as the size of the tangent plane. The spherical pixel points are projected onto the tangent plane to obtain projection coordinates, which first establishes the association between the tangent plane and the spherical pixel points. The plane pixel point coordinates are then determined from the projection coordinates, which establishes the connection between the plane pixel point coordinates and the projection coordinates. Because the output of the trained spherical convolutional neural network model has higher accuracy when it extracts features from a spherical image, once the numerical association between the planar image and the second spherical image is established, extracting the features of the plane pixel points with the trained spherical convolutional neural network model also yields output with higher accuracy.
In some embodiments, the projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane includes: establishing a coordinate system corresponding to the tangent plane by taking the tangent point as the center, dividing the tangent plane into areas based on the coordinate system corresponding to the tangent plane, and calculating unit coordinates of each area; determining a corresponding region of the spherical pixel point coordinates in the tangential plane; and calculating the projection coordinates based on the spherical pixel point coordinates and the unit coordinates of the region corresponding to the spherical pixel point coordinates.
In this embodiment, since the second spherical image needs to be associated with the planar image by means of another plane, the projection coordinates of the spherical pixel point coordinates of the second spherical image in the tangent plane are determined first. With the tangent point as the center, a coordinate system corresponding to the tangent plane is established, and in order to improve the efficiency of determining the projection coordinates, the tangent plane can be divided into regions. For convenience of description, the horizontal axis is divided with the center as the dividing point: the horizontal axis to the left of the center is the first horizontal axis, and the horizontal axis to the right of the center is the second horizontal axis; the vertical axis is likewise divided with the center as the dividing point: the vertical axis above the center is the first vertical axis, and the vertical axis below the center is the second vertical axis; the center of the tangent plane corresponds to the center of the spherical pixel point coordinates. The tangent plane is divided into eight regions. The first region is the first horizontal axis, and its unit coordinate is p_γ(-1,0) = (-tanΔθ, 0); the second region is the second horizontal axis, and its unit coordinate is p_γ(+1,0) = (+tanΔθ, 0); the third region is the first vertical axis, and its unit coordinate is p_γ(0,+1) = (0, +tanΔφ); the fourth region is the second vertical axis, and its unit coordinate is p_γ(0,-1) = (0, -tanΔφ); the fifth region is the region bounded by the first horizontal axis and the first vertical axis, and its unit coordinate is p_γ(-1,+1) = (-tanΔθ, +secΔθ·tanΔφ); the sixth region is the region bounded by the second horizontal axis and the first vertical axis, and its unit coordinate is p_γ(+1,+1) = (+tanΔθ, +secΔθ·tanΔφ); the seventh region is the region bounded by the second horizontal axis and the second vertical axis, and its unit coordinate is p_γ(+1,-1) = (+tanΔθ, -secΔθ·tanΔφ); the eighth region is the region bounded by the first horizontal axis and the second vertical axis, and its unit coordinate is p_γ(-1,-1) = (-tanΔθ, -secΔθ·tanΔφ); where Δθ and Δφ are preset step sizes.
The positive and negative signs of the spherical pixel point coordinates are matched against the positive and negative signs of the unit coordinates of each region, and in response to determining that the signs of the spherical pixel point coordinates are the same as the signs of the unit coordinates of a region, that region is determined as the region corresponding to the spherical pixel point coordinates in the tangent plane. For example, in response to determining that the abscissa of a spherical pixel point is positive and its ordinate is negative, the unit coordinate whose abscissa is positive and whose ordinate is negative is searched for among the unit coordinates of all regions, and it is then determined that the spherical pixel point coordinates fall in the seventh region of the tangent plane. The projection coordinates are calculated from the spherical pixel point coordinates and the unit coordinates of the corresponding region; illustratively, when the spherical pixel point coordinates are (2, -2), the projection coordinates are determined by multiplying the unit coordinates of the seventh region by the unsigned magnitude, giving projection coordinates of (+2tanΔθ, -2secΔθ·tanΔφ). By expressing the relationship between the second spherical image and the tangent plane numerically in the form of coordinates, that relationship can be determined accurately.
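A small sketch of the region lookup and projection-coordinate computation just described, following the listed unit coordinates. The step sizes Δθ and Δφ are assumed values, the sign matching is collapsed into a direct signed lookup, and the component-wise scaling by the unsigned magnitude is an assumed reading of the text.

```python
import math

def unit_coordinate(sx, sy, d_theta, d_phi):
    """Unit coordinate p_gamma of the tangent-plane region whose signs match the
    spherical pixel coordinate (sx, sy), following the eight cases listed above."""
    u = math.copysign(math.tan(d_theta), sx) if sx != 0 else 0.0
    if sy == 0:
        v = 0.0
    elif sx == 0:
        v = math.copysign(math.tan(d_phi), sy)
    else:   # corner regions pick up the sec(d_theta) factor
        v = math.copysign(math.tan(d_phi) / math.cos(d_theta), sy)
    return u, v

def project(sx, sy, d_theta, d_phi):
    """Projection coordinate = unsigned magnitude of the spherical coordinate times
    the region's unit coordinate (component-wise scaling is an assumption)."""
    u, v = unit_coordinate(sx, sy, d_theta, d_phi)
    return abs(sx) * u, abs(sy) * v

# Example from the text: spherical pixel coordinate (2, -2) falls in the seventh
# (lower-right) region, giving (+2*tan(d_theta), -2*sec(d_theta)*tan(d_phi)).
print(project(2, -2, d_theta=0.1, d_phi=0.1))
```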
In some embodiments, the determining the planar pixel point coordinates based on the projection coordinates includes: determining plane pixel point coordinates in the tangent plane by the following formula:
wherein Γ_x(φ, θ) is the abscissa of the plane pixel point in the tangent plane, Γ_y(φ, θ) is the ordinate of the plane pixel point in the tangent plane, θ is the abscissa of the spherical pixel point, φ is the ordinate of the spherical pixel point, θ_γ is the abscissa of the projection coordinates, and φ_γ is the ordinate of the projection coordinates.
In this embodiment, the planar image is associated with the second spherical image by means of a tangential plane, and therefore, the planar pixel coordinates corresponding to the planar image are determined in the tangential plane based on the spherical pixel coordinates and the projection coordinates corresponding to the second spherical image. The coordinates of the plane pixel points can be determined through a formula, the relationship between the second spherical image and the plane image is digitized through the form of the coordinates by means of the tangential plane, and then the relationship between the second spherical image and the plane image can be accurately determined, so that the characteristics of the plane pixel points can be accurately extracted by a subsequent saliency model.
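The formula referenced above appears as an image in the original filing and is not reproduced in this text. For orientation only, the following is the standard inverse gnomonic (tangent-plane) projection used in the spherical convolution literature; it is an assumption and may differ from the patent's exact expression for Γ_x(φ, θ) and Γ_y(φ, θ).

```latex
% Standard inverse gnomonic projection (an assumed reference, not the patent's formula).
% (x, y) = (\theta_\gamma, \phi_\gamma) is a sampling location on the tangent plane and
% (\theta, \phi) is the tangent point; (\theta', \phi') is the spherical location hit by
% the sample, from which the planar pixel coordinates follow by the usual
% equirectangular mapping.
\begin{aligned}
\rho &= \sqrt{x^{2} + y^{2}}, \qquad \nu = \arctan\rho,\\
\phi' &= \arcsin\!\left(\cos\nu\,\sin\phi + \frac{y\,\sin\nu\,\cos\phi}{\rho}\right),\\
\theta' &= \theta + \arctan\!\left(\frac{x\,\sin\nu}{\rho\,\cos\phi\,\cos\nu - y\,\sin\phi\,\sin\nu}\right).
\end{aligned}
```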
In some embodiments, the loss function used to train the saliency model is determined by the following formula: ι = L_S-MSE(S, Q) + L_CC(S, Q) + L_KL(S, Q), wherein ι is the loss function, L_S-MSE(S, Q) is a weighted mean squared error term, L_CC(S, Q) represents the linear correlation relationship, L_KL(S, Q) represents the difference relationship, S is the target image, and Q is the marked sample image.
In this embodiment, the saliency model is a spherical convolutional neural network model, which can divide an input image into a plurality of regions, and learn the features of each region by setting different weights, respectively. Therefore, the weights are added to the calculation of the loss function, and the training direction of the significance model is further guided based on the weights corresponding to each region. In order to train the significance model more fully, a linear correlation relationship and a difference relationship are introduced to guide the training direction of the significance model, wherein the linear correlation relationship is used for measuring the linear correlation coefficient between the target image and the marked sample image, and the larger the linear correlation coefficient is, the more similar the two images are; the difference relation measures the difference between the target image and the marked sample image under the condition of information loss, and the smaller the difference value is, the smaller the difference between the two images is. Through the loss function, the training direction of the significance model is more accurate.
In some embodiments, the linear correlation relationship is determined by the following formula: L_CC(S, Q) = 1 − CC(S, Q), CC(S, Q) = cov(S, Q) / (σ(S)·σ(Q)), wherein L_CC(S, Q) is the linear correlation relationship, CC(S, Q) is the linear correlation coefficient, cov(S, Q) is the covariance, σ(S) is the standard deviation of the target image, σ(Q) is the standard deviation of the marked sample image, S is the target image, and Q is the marked sample image; the difference relationship is determined by the following formula: L_KL(S, Q) = KL(S, Q), KL(S, Q) = Σ_{i=1}^{n} Q_i·log(ε + Q_i / (ε + S_i)), wherein L_KL(S, Q) is the difference relationship, KL(S, Q) is the difference between the target image and the marked sample image under the condition of information loss, S is the target image, Q is the marked sample image, ε is a regularization constant, n is the total number of initial plane pixel points, and i is the current pixel point.
In this embodiment, the linear correlation relationship and the difference relationship are determined based on formulas, and the two abstract correlation relationships are quantized, so that the determination of the two relationships is more accurate, and the training direction of the significance model is more accurate.
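A compact sketch of the training loss assembled from the three terms above. The CC and KL terms follow the formulas given; the exact form of the weighted term L_S-MSE is not spelled out in the text, so the per-pixel-weighted MSE below (with uniform default weights) is an assumption, and S and Q are assumed to be non-negative saliency maps.

```python
import torch

def saliency_loss(S, Q, w=None, eps=1e-7):
    """iota = L_S-MSE + L_CC + L_KL for predicted map S and marked sample map Q.

    w is an optional per-pixel weight map for the MSE term; its exact form is not
    specified in the text, so uniform weights are used by default (an assumption).
    """
    if w is None:
        w = torch.ones_like(S)
    l_smse = (w * (S - Q) ** 2).mean()                 # weighted MSE term (assumed form)

    s, q = S.flatten(), Q.flatten()
    cov = ((s - s.mean()) * (q - q.mean())).mean()     # covariance
    cc = cov / (s.std() * q.std() + eps)               # CC(S, Q) = cov / (sigma(S) * sigma(Q))
    l_cc = 1.0 - cc                                    # L_CC = 1 - CC(S, Q)

    l_kl = (q * torch.log(eps + q / (eps + s))).sum()  # KL-style difference term

    return l_smse + l_cc + l_kl

# Usage: loss = saliency_loss(pred_map, gt_map); loss.backward()
```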
In some embodiments, the saliency model is a convolutional neural network model, and the outputting, by the saliency model, a target image with a saliency region corresponding to the planar image based on the planar pixel point coordinates includes: in response to determining that the planar image has a preset calibration image corresponding to the planar image, inputting the preset calibration image into the significance model; based on the plane pixel point coordinates and the preset calibration image, respectively extracting first features corresponding to the plane pixel point coordinates and second features corresponding to the preset calibration image by using a convolution layer of the saliency model; outputting the target image through the saliency model based on the first feature and the second feature.
In this embodiment, considering that a user may continuously view a set of associated planar images, planar images that are associated with one another are marked. For example, the association may mean that the planar images are obtained from the same source, e.g., derived from the same panoramic video. Before a planar image is input into the saliency model, it is detected whether target images have already been output by the saliency model for other planar images carrying the same mark as the planar image to be input. In response to determining that such target images exist, one of them is input into the saliency model together with the planar image to be input; that target image serves as the preset calibration image, and the preset calibration image can correct the output result of the currently input planar image, so that the saliency model can output the target image efficiently and accurately. To make the output result more accurate, the preset calibration image may be the target image output by the saliency model at the moment immediately before the planar image to be input is input. It should be noted that the saliency model in this embodiment may also be embedded in other models according to actual requirements; for example, to further improve the viewing effect for the user, the saliency model may be embedded in a FoV (field-of-view) model, which can accurately represent and analyze the image seen by a human eye or a machine vision system.
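A minimal sketch of the optional calibration-image path described above: when a previously output target image from the same source is available, both inputs pass through convolution layers, the two features are fused, and the saliency map is predicted. Fusion by channel concatenation, the layer sizes, and the zero placeholder used when no calibration image exists are all assumptions.

```python
import torch
import torch.nn as nn

class CalibratedSaliencyHead(nn.Module):
    """Extracts a first feature from the current planar input and a second feature
    from an optional preset calibration image, fuses them, and predicts the map.
    Concatenation-based fusion and channel widths are assumptions."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.frame_conv = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.calib_conv = nn.Sequential(nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat_ch * 2, 1, 1)

    def forward(self, frame, calibration=None):
        f1 = self.frame_conv(frame)                       # first feature (current input)
        if calibration is None:                           # no associated image yet
            calibration = torch.zeros_like(frame[:, :1])
        f2 = self.calib_conv(calibration)                 # second feature (calibration image)
        return torch.sigmoid(self.head(torch.cat([f1, f2], dim=1)))

head = CalibratedSaliencyHead()
out = head(torch.randn(1, 3, 64, 128), torch.rand(1, 1, 64, 128))
print(out.shape)  # -> torch.Size([1, 1, 64, 128])
```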
It should be noted that, the method of the embodiment of the present application may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present application, the devices interacting with each other to accomplish the method.
It should be noted that the foregoing describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the application also provides an image processing device corresponding to the method of any embodiment.
Referring to fig. 2, the image processing apparatus includes:
the acquiring module 10 is configured to acquire a planar image, wherein the planar image is determined by projecting a first spherical image, and the first spherical image is an image obtained by 360-degree shooting of content to be shot.
A first determination module 20 configured to input the planar image into a trained saliency model, generate a second spherical image corresponding to the planar image from the saliency model based on the planar image, and determine spherical pixel point coordinates corresponding to the second spherical image.
A second determining module 30 is configured to determine, based on the second spherical image and the spherical pixel coordinates, the planar pixel coordinates corresponding to the planar image by the saliency model.
And an output module 40 configured to output, based on the planar pixel point coordinates, a target image with a salient region corresponding to the planar image through the salient model.
Through the device, a plane image is obtained, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by shooting 360 degrees of contents to be shot; inputting the planar image into a trained significance model, generating a second spherical image corresponding to the planar image through the significance model based on the planar image, determining spherical pixel point coordinates corresponding to the second spherical image, and representing the second spherical image with specific values through the form of coordinates so as to establish accurate association with the planar image later. And determining the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates, so that the association relationship between the plane image and the second spherical image is more accurate. Based on the plane pixel point coordinates, outputting a target image with a salient region corresponding to the plane image through the salient model, wherein the salient model can accurately extract the characteristics of the plane image, and further, the purpose of accurately determining the salient region of the plane image through the salient model is achieved.
In some embodiments, the second determining module 30 is further configured to determine any one of a plurality of tangent points corresponding to the second spherical image, and determine a tangent plane of a preset size centered on the tangent point based on the tangent point; projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane; and determining the plane pixel point coordinates based on the projection coordinates.
In some embodiments, the second determining module 30 is further configured to set up a coordinate system corresponding to the tangent plane with the tangent point as a center, divide the tangent plane into regions based on the coordinate system corresponding to the tangent plane, and calculate a unit coordinate of each region; determining a corresponding region of the spherical pixel point coordinates in the tangential plane; and calculating the projection coordinates based on the spherical pixel point coordinates and the unit coordinates of the region corresponding to the spherical pixel point coordinates.
In some embodiments, the second determining module 30 is further configured to determine the plane pixel point coordinates based on the projection coordinates, including: determining the plane pixel point coordinates in the tangent plane by the following formula, wherein Γ_x(φ, θ) is the abscissa of the plane pixel point in the tangent plane, Γ_y(φ, θ) is the ordinate of the plane pixel point in the tangent plane, θ is the abscissa of the spherical pixel point, φ is the ordinate of the spherical pixel point, θ_γ is the abscissa of the projection coordinates, and φ_γ is the ordinate of the projection coordinates.
In some embodiments, a third determination module is further included, the third determination module being further configured to determine the loss function for training the saliency model by the following formula: ι = L_S-MSE(S, Q) + L_CC(S, Q) + L_KL(S, Q), wherein ι is the loss function, L_S-MSE(S, Q) is a weighted mean squared error term, L_CC(S, Q) represents the linear correlation relationship, L_KL(S, Q) represents the difference relationship, S is the target image, and Q is the marked sample image.
In some embodiments, the third determination module is further configured to determine the linear correlation relationship by the following formula: L_CC(S, Q) = 1 − CC(S, Q), CC(S, Q) = cov(S, Q) / (σ(S)·σ(Q)), wherein L_CC(S, Q) is the linear correlation relationship, CC(S, Q) is the linear correlation coefficient, cov(S, Q) is the covariance, σ(S) is the standard deviation of the target image, σ(Q) is the standard deviation of the marked sample image, S is the target image, and Q is the marked sample image; and to determine the difference relationship by the following formula: L_KL(S, Q) = KL(S, Q), KL(S, Q) = Σ_{i=1}^{n} Q_i·log(ε + Q_i / (ε + S_i)), wherein L_KL(S, Q) is the difference relationship, KL(S, Q) is the difference between the target image and the marked sample image under the condition of information loss, S is the target image, Q is the marked sample image, ε is a regularization constant, n is the total number of initial plane pixel points, and i is the current pixel point.
In some embodiments, the output module 40 is further configured to input a preset calibration image to the saliency model in response to determining that the planar image has a preset calibration image corresponding thereto, the saliency model being a convolutional neural network model; based on the plane pixel point coordinates and the preset calibration image, respectively extracting first features corresponding to the plane pixel point coordinates and second features corresponding to the preset calibration image by using a convolution layer of the saliency model; outputting the target image through the saliency model based on the first feature and the second feature.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The device of the foregoing embodiment is configured to implement the corresponding image processing method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, the application also provides an electronic device corresponding to the method of any embodiment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the image processing method of any embodiment when executing the program.
Fig. 3 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown in the figure) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding image processing method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the image processing method according to any of the above embodiments, corresponding to the method according to any of the above embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the above embodiment stores computer instructions for causing the computer to perform the image processing method according to any one of the above embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the application, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like, which are within the spirit and principles of the embodiments of the application, are intended to be included within the scope of the application.

Claims (10)

1. An image processing method, comprising:
acquiring a plane image, wherein the plane image is determined by projecting a first spherical image, and the first spherical image is an image obtained by capturing the content to be captured over 360 degrees;
inputting the planar image into a trained saliency model, generating a second spherical image corresponding to the planar image through the saliency model, and determining spherical pixel point coordinates corresponding to the second spherical image;
determining a plane pixel point coordinate corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinate;
and outputting a target image with a salient region corresponding to the planar image through the saliency model based on the planar pixel point coordinates.
2. The method of claim 1, wherein determining planar pixel point coordinates corresponding to the planar image based on the second spherical image and the spherical pixel point coordinates comprises:
determining any one of a plurality of tangent points corresponding to the second spherical image, and determining, based on the tangent point, a tangent plane of a preset size centered on the tangent point;
projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane;
and determining the plane pixel point coordinates based on the projection coordinates.
3. The method of claim 2, wherein the projecting the spherical pixel point coordinates to the tangent plane based on the spherical pixel point coordinates to determine projection coordinates of the spherical pixel point coordinates in the tangent plane comprises:
establishing a coordinate system corresponding to the tangent plane by taking the tangent point as the center, dividing the tangent plane into areas based on the coordinate system corresponding to the tangent plane, and calculating unit coordinates of each area;
determining a corresponding region of the spherical pixel point coordinates in the tangent plane;
and calculating the projection coordinates based on the spherical pixel point coordinates and the unit coordinates of the region corresponding to the spherical pixel point coordinates.
4. The method of claim 2, wherein the determining the planar pixel point coordinates based on the projection coordinates comprises:
determining the plane pixel point coordinates in the tangent plane by the following formula:
wherein Γ_x(φ, θ) is the abscissa of the plane pixel point in the tangent plane, Γ_y(φ, θ) is the ordinate of the plane pixel point in the tangent plane, θ is the abscissa of the spherical pixel point, φ is the ordinate of the spherical pixel point, θ_γ is the abscissa of the projection coordinates, and φ_γ is the ordinate of the projection coordinates.
5. The method of claim 1, wherein the loss function for training the saliency model is determined by:
ι = L_S-MSE(S, Q) + L_CC(S, Q) + L_KL(S, Q),
wherein ι is the loss function, L_S-MSE(S, Q) is the weighted mean-square error term, L_CC(S, Q) represents the linear correlation, L_KL(S, Q) represents the difference relation, S is the target image, and Q is the annotated sample image.
6. The method of claim 5, wherein
the linear correlation is determined by the following formula:
L_CC(S, Q) = 1 - CC(S, Q),
wherein L_CC(S, Q) is the linear correlation, CC(S, Q) is the linear correlation coefficient, cov(S, Q) is the covariance, σ(S) is the standard deviation of the target image, σ(Q) is the standard deviation of the annotated sample image, S is the target image, and Q is the annotated sample image;
The difference relationship is determined by the following formula:
L_KL(S, Q) = KL(S, Q),
wherein L_KL(S, Q) is the difference relation, KL(S, Q) is the difference between the target image and the annotated sample image in terms of information loss, S is the target image, Q is the annotated sample image, ε is a regularization constant, n is the total number of initial plane pixel points, and i is the index of the current pixel point.
7. The method of claim 1, wherein the saliency model is a convolutional neural network model,
outputting, based on the planar pixel point coordinates, a target image with a salient region corresponding to the planar image through the saliency model, including:
in response to determining that the planar image has a preset calibration image corresponding to the planar image, inputting the preset calibration image into the significance model;
based on the plane pixel point coordinates and the preset calibration image, respectively extracting first features corresponding to the plane pixel point coordinates and second features corresponding to the preset calibration image by using a convolution layer of the saliency model;
outputting the target image through the saliency model based on the first feature and the second feature.
8. An image processing apparatus, comprising:
the acquisition module is configured to acquire a plane image, wherein the plane image is determined by projection of a first spherical image, and the first spherical image is an image obtained by capturing the content to be captured over 360 degrees;
a first determination module configured to input the planar image into a trained saliency model, generate a second spherical image corresponding to the planar image from the saliency model based on the planar image, and determine spherical pixel point coordinates corresponding to the second spherical image;
the second determining module is configured to determine the plane pixel point coordinates corresponding to the plane image through the saliency model based on the second spherical image and the spherical pixel point coordinates;
and the output module is configured to output a target image with a salient region corresponding to the planar image through the saliency model based on the planar pixel point coordinates.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
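As an illustration of the tangent-plane projection described in claims 2 to 4, the following Python sketch projects spherical pixel coordinates onto a plane tangent to the unit sphere. The publication text does not reproduce the projection formula of claim 4, so the sketch uses the standard gnomonic (rectilinear) projection as an assumed form; the function name and the use of radians are likewise illustrative.

```python
import numpy as np

def project_to_tangent_plane(theta, phi, theta_t, phi_t):
    """Project a spherical pixel (theta = longitude/abscissa, phi = latitude/ordinate,
    in radians) onto the plane tangent to the unit sphere at (theta_t, phi_t).

    Assumption: the standard gnomonic projection; the patent's exact formula is not
    reproduced in the publication text.
    """
    cos_c = (np.sin(phi_t) * np.sin(phi)
             + np.cos(phi_t) * np.cos(phi) * np.cos(theta - theta_t))
    x = np.cos(phi) * np.sin(theta - theta_t) / cos_c
    y = (np.cos(phi_t) * np.sin(phi)
         - np.sin(phi_t) * np.cos(phi) * np.cos(theta - theta_t)) / cos_c
    return x, y

# Example: a pixel at 30 degrees longitude and 10 degrees latitude, projected onto
# the plane tangent to the sphere at (0, 0).
x, y = project_to_tangent_plane(np.radians(30.0), np.radians(10.0), 0.0, 0.0)
print(x, y)
```

In the terms of claim 3, the resulting (x, y) values can then be assigned to the regions of the tangent-plane coordinate system centered on the tangent point and combined with the unit coordinates of each region.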
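Claims 5 and 6 combine a weighted mean-square error, a linear-correlation term of the form 1 - CC(S, Q), and a KL-based difference term into a single loss. A minimal NumPy sketch of such a composite loss is given below; the cos-latitude weighting, the KL direction, and the value of the regularization constant ε are assumptions, since the corresponding formulas appear only as images in the original publication.

```python
import numpy as np

def composite_saliency_loss(S, Q, eps=1e-8):
    """Composite loss: weighted MSE + (1 - linear correlation) + KL difference.

    S: predicted target saliency map, Q: annotated sample map (both H x W arrays).
    The cos-latitude weighting assumes an equirectangular layout; it, the KL
    direction, and eps are assumptions rather than values from the patent text.
    """
    H, _ = S.shape
    # Assumed spherical weighting: rows of an equirectangular map near the poles
    # cover less solid angle, so each row is weighted by cos(latitude).
    lat = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2.0
    w = np.cos(lat)[:, None]
    l_smse = np.sum(w * (S - Q) ** 2) / np.sum(w)

    # Linear correlation term: 1 - Pearson correlation coefficient,
    # with CC = cov(S, Q) / (sigma(S) * sigma(Q)).
    cov = np.mean((S - S.mean()) * (Q - Q.mean()))
    cc = cov / (S.std() * Q.std() + eps)
    l_cc = 1.0 - cc

    # KL difference between the two maps treated as probability distributions.
    s = S / (S.sum() + eps)
    q = Q / (Q.sum() + eps)
    l_kl = np.sum(q * np.log(q / (s + eps) + eps))

    return l_smse + l_cc + l_kl
```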
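Claim 7 describes extracting a first feature from the plane pixel point coordinates and a second feature from a preset calibration image using convolution layers of the saliency model, then outputting the target image from both. The PyTorch sketch below shows one plausible arrangement; the layer widths, kernel sizes, input channel counts, and the sigmoid output are assumptions rather than the patent's architecture.

```python
import torch
import torch.nn as nn

class DualBranchSaliencyHead(nn.Module):
    """Two convolutional branches (coordinate map and calibration image) whose
    features are concatenated and fused into a single-channel saliency map."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # First branch: a 2-channel map holding the planar (x, y) pixel coordinates.
        self.coord_branch = nn.Sequential(nn.Conv2d(2, channels, 3, padding=1), nn.ReLU())
        # Second branch: an RGB preset calibration image.
        self.calib_branch = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        # Fusion of both feature maps into the target saliency map.
        self.fuse = nn.Sequential(nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, coord_map: torch.Tensor, calib_img: torch.Tensor) -> torch.Tensor:
        first_feature = self.coord_branch(coord_map)
        second_feature = self.calib_branch(calib_img)
        return self.fuse(torch.cat([first_feature, second_feature], dim=1))

# Example with a 64x64 map: one coordinate map and one calibration image.
head = DualBranchSaliencyHead()
out = head(torch.rand(1, 2, 64, 64), torch.rand(1, 3, 64, 64))
```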
CN202310842745.1A 2023-07-10 2023-07-10 Image processing method, device, electronic equipment and storage medium Pending CN117058012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310842745.1A CN117058012A (en) 2023-07-10 2023-07-10 Image processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310842745.1A CN117058012A (en) 2023-07-10 2023-07-10 Image processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117058012A true CN117058012A (en) 2023-11-14

Family

ID=88654253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310842745.1A Pending CN117058012A (en) 2023-07-10 2023-07-10 Image processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117058012A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination