WO2018233217A1

WO2018233217A1 - Image processing method, device and augmented reality apparatus

Info

Publication number: WO2018233217A1
Application number: PCT/CN2017/113578
Authority: WO
Inventors: 李祥艳; 徐梁栋
Original assignee: 歌尔科技有限公司
Priority date: 2017-06-23
Filing date: 2017-11-29
Publication date: 2018-12-27
Also published as: CN107230199A

Abstract

Embodiments of the present invention provide an image processing method, device and augmented reality apparatus. The method comprises: receiving a first image and a second image, wherein the first image and the second image are differently-sourced images acquired by capturing a same scene by a first camera and a second camera, respectively; acquiring a coordinate transformation matrix corresponding to the first image, wherein the coordinate transformation matrix uses the second image as a reference image; performing coordinate transformation on the first image by adopting the coordinate transformation matrix; and performing a fusion process on the first image having undergone coordinate transformation and the second image to fuse advantageous features of the two images and to enhance fused image quality.

Description

Image processing method, device and augmented reality device

cross reference

The present application is hereby incorporated by reference in its entirety in its entirety in its entirety in its entirety in the the the the the the the the the the the the

Technical field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, and augmented reality device.

Background technique

Augmented Reality (AR) technology is a real-time computing camera position and angle. It combines image processing technology to superimpose the scene of the virtual world into the real world scene and display it to the user. AR technology has real-time interactive, real-world and virtual world information integration and the ability to add positioning virtual objects in 3D scale space, bringing people a new visual experience.

The realistic scene in the AR scene is taken by the camera. Generally, a certain type of camera is set in the AR device for collecting images of a real scene, such as a charge coupled device (CCD) camera. However, if the actual environment for shooting a real scene is in a dark, low-light environment, the sharpness of the image taken at this time tends to be unsatisfactory, so that the image quality that the user finally sees is not good, affecting the user experience.

Summary of the invention

In view of this, embodiments of the present invention provide an image processing method, apparatus, and augmented reality device, which improve image quality by performing image fusion processing on non-homologous images of the same scene.

In a first aspect, an embodiment of the present invention provides an image processing method, including:

Receiving a first image and a second image, the first image and the second image being non-homologous images obtained by capturing the same scene by the first camera and the second camera, respectively;

Obtaining a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

Performing coordinate transformation on the first image by using the coordinate transformation matrix;

Performing image fusion processing on the coordinate-converted first image and the second image.

In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:

a receiving module, configured to receive a first image and a second image, where the first image and the second image are non-homologous images obtained by capturing the same scene through the first camera and the second camera, respectively;

An acquiring module, configured to acquire a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

a transform module, configured to perform coordinate transformation on the first image by using the coordinate transformation matrix;

And a fusion module, configured to perform image fusion processing on the coordinate-converted first image and the second image.

In a third aspect, an embodiment of the present invention provides an augmented reality device, including:

a first camera, a second camera, a memory, and a processor; wherein

The memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement an image processing method as described above.

In a fourth aspect, an embodiment of the present invention provides another augmented reality device, including:

a first camera, a second camera, an FPGA component; wherein

The FPGA component includes functional logic that implements the image processing method as described above.

An image processing method and apparatus provided by an embodiment of the present invention, two different classes are set in an AR device The first camera and the second camera are the first camera and the second camera, and the first camera and the second camera simultaneously capture the same scene to obtain the first image and the second image that are not homologous; the second image is used as the reference image, and the first image is acquired. a coordinate transformation matrix corresponding to the image, the coordinate transformation is performed on the first image by using the coordinate transformation matrix, so that the transformed first image corresponds to each pixel point in the second image; and further, the first image after the coordinate transformation Image fusion processing is performed with the second image. Since the first image and the second image are non-homologous images, the dominant features of the two are different. By combining the two, it is beneficial to fuse the superior features of the two and enhance the image quality after the fusion.

DRAWINGS

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description of the drawings used in the embodiments or the prior art description will be briefly described below. Obviously, the drawings in the following description It is a certain embodiment of the present invention, and other drawings can be obtained from those skilled in the art without any creative work.

FIG. 1 is a flowchart of Embodiment 1 of an image processing method according to an embodiment of the present invention;

2 is a flowchart of Embodiment 2 of an image processing method according to an embodiment of the present invention;

FIG. 3 is a flowchart of Embodiment 3 of an image processing method according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of Embodiment 1 of an image processing apparatus according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of Embodiment 2 of an image processing apparatus according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of Embodiment 3 of an image processing apparatus according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of Embodiment 1 of an augmented reality device according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of Embodiment 2 of an augmented reality device according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a head mounted display device according to an embodiment of the present invention.

Detailed ways

The technical solutions in the embodiments of the present invention will be clearly and completely described in conjunction with the drawings in the embodiments of the present invention. It is a partial embodiment of the invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the invention. The singular forms "a", "the", "the" and "the" Generally, at least two types are included, but the case of including at least one is not excluded.

It should be understood that the term "and/or" as used herein is merely an association describing the associated object, indicating that there may be three relationships, for example, A and/or B, which may indicate that A exists separately, while A and B, there are three cases of B alone. In addition, the character "/" in this article generally indicates that the contextual object is an "or" relationship.

It should be understood that although the terms first, second, third, etc. may be used to describe XXX in embodiments of the invention, these XXX should not be limited to these terms. These terms are only used to distinguish XXX. For example, the first XXX may also be referred to as a second XXX without departing from the scope of the embodiments of the present invention. Similarly, the second XXX may also be referred to as a first XXX.

Depending on the context, the words "if" and "if" as used herein may be interpreted to mean "when" or "when" or "in response to determining" or "in response to detecting." Similarly, depending on the context, the phrase "if determined" or "if detected (conditions or events stated)" can be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event) "Time" or "in response to a test (condition or event stated)".

It should also be noted that the terms "including", "comprising" or "comprising" or any other variations thereof are intended to encompass a non-exclusive inclusion, such that the item or system comprising a plurality of elements includes not only those elements but also Other elements, or are included for this commodity or department The elements inherent in the system. An element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the item or system including the element, without further limitation.

It is further noted that the order between the steps in the various embodiments of the present invention may be adjusted, and is not necessarily performed in the order illustrated below.

FIG. 1 is a flowchart of Embodiment 1 of an image processing method according to an embodiment of the present invention. The image processing method provided by this embodiment may be implemented by an image processing apparatus, and the image processing apparatus may be implemented as a field programmable gate array ( A combination of some hardware devices in the Field-Programmable Gate Array (FPGA) component, which can be integrated into the AR device. As shown in FIG. 1, the method includes the following steps:

101. Receive a first image and a second image, where the first image and the second image are non-homologous images obtained by capturing the same scene by the first camera and the second camera, respectively.

102. Acquire a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image.

103. Perform coordinate transformation on the first image by using a coordinate transformation matrix.

104. Perform image fusion processing on the first image and the second image after the coordinate transformation.

In the embodiment of the present invention, different types of cameras, such as the first camera and the second camera, may be disposed in the same AR device for capturing the same scene. The first camera and the second camera are disposed in the AR device, and the first camera and the second camera are respectively disposed on the left and right sides on a horizontal surface.

Alternatively, the first camera may be an infrared camera and the second camera may be a CCD camera. Correspondingly, the first image captured by the first camera is an infrared image, and the second image captured by the second camera is a visible light image.

The following is an explanation of the image acquisition and image fusion of the same scene by using different types of cameras in combination with an infrared camera and a CCD camera. The infrared camera and the CCD camera simultaneously take photos of the same scene and acquire the infrared image. The fusion with the dominant feature information of the visible light image can finally obtain a fused image with obvious feature points and rich information.

Specifically, when the temperature in the natural world is higher than the absolute zero, the object will produce infrared radiation, red. External imaging uses an infrared camera to convert invisible infrared radiation into a visible temperature distribution image. Infrared images are not susceptible to environmental influences, and temperature profiles of objects can be obtained in rain, snow, smoke, and dark environments. However, the resolution of the infrared camera tends to be low, so that the obtained infrared image has poor definition, the scene detail information is not obvious, and the formed image does not conform to human visual habits. On the contrary, the image acquired by the CCD camera is realized according to the energy of the reflected light of the object, and the image can better describe the detailed information of the scene, and the resolution is high, which is in line with the requirements of the human visual system. However, CCD cameras also have some shortcomings: in bad weather, the image capture ability is poor, the useful information is lost, and the comprehensive and detailed image information in the description scene cannot be obtained. It can be seen that the infrared image and the visible light image have their own advantages and disadvantages. Therefore, if the fusion algorithm is used to fuse the dominant features of the infrared image and the visible light image, the fused image contains rich feature point information, which is suitable for the human visual system. Will greatly enhance the visual experience of users watching.

In addition, the image processing method provided by the embodiment of the present invention can be implemented based on the hardware component of the FPGA component, that is, the fusion of the multi-source image is implemented based on the FPGA. Compared with pure software processing methods, FPGAs have rich resources such as storage resources and faster computing speed. In the scene of video image acquisition and display, the fused video is smoother, and the fused image can be output in real time to achieve a better visual experience.

In some scenarios, an AR device that integrates the first camera and the second camera described above is often used to capture video images of a real scene. It can be understood that since the first camera and the second camera are used to capture the same scene, it is necessary to ensure the clock synchronization of the two, that is, at the same time, the two cameras are shooting the same object in the scene. However, since the shooting parameters such as the shooting position and shooting angle of the two cameras will be different, even if the same subject is shot, the captured image will often be different.

In the above scenario, the first camera and the second camera simultaneously input the captured video image into the FPGA component through the video interface of the FPGA component, and after decoding the video decoder chip of the FPGA component, decoding into a YCbCr video image such as the BT.656 format. . In this scenario, the image processing method provided by the embodiment of the present invention performs fusion processing on two images corresponding to each time. For convenience of description, the description of the image fusion process is performed by taking only the first image and the second image corresponding to any one time as an example.

Since the first image and the second image have differences in shooting parameters such as resolution and shooting angle, to achieve image fusion of the first image and the second image, image registration of the first image and the second image is first required. Processing to establish a correspondence between each pixel point in the first image and the second image, so that the fusion of the first image and the second image can be performed based on the correspondence relationship of the pixel points.

The description of the image registration processing process is performed by taking the first image as an infrared image and the second image as a visible light image as an example.

Image registration processing includes scaling, rotation, and panning of the image. Since the visible light image has higher resolution and is more in line with the human eye visual habit than the infrared image, in this embodiment, the visible light image is used as the reference image, and the infrared image is used as the image to be registered for the infrared image. Zoom, rotate, and pan. The scaling, rotation and translation operations of the infrared image are performed based on the obtained coordinate transformation matrix, that is, the coordinate transformation matrix contains the scaling parameters required for the scaling operation, and the rotation parameters required for the rotation operation, And the translation parameters required for the panning operation. And these scaling parameters, rotation parameters, and translation parameters may be obtained in advance, so that a coordinate transformation matrix can be generated based on these parameters obtained in advance.

It should be noted that, in this embodiment, the reason why the infrared image is transformed by using the coordinate transformation matrix is because the conversion is more efficient than the method of sequentially performing three transformations on the infrared image, because only the conversion is more efficient. It is necessary to use a matrix to perform three transformations for each pixel in the infrared image.

After obtaining the coordinate transformation matrix described above, the coordinate-converted infrared image can be obtained based on matrix multiplication of the infrared image and the coordinate transformation matrix. Since the transformation parameters in the coordinate transformation matrix are based on the visible light image, the correspondence between the pixel points in the coordinate-converted infrared image and the pixel points in the visible light image can be obtained based on the transformation. Therefore, based on the correspondence, image fusion of the coordinate-converted infrared image and the visible light image can be performed.

In the implementation of image fusion on the FPGA hardware platform, the resources of the hardware platform, storage space and processing speed should be considered. In order to obtain a better fusion effect and make full use of the resources of the FPGA, in this embodiment, a pixel-level fusion method is selected: a method of weighted average of gray values, that is, a weighted average calculation of gray values of corresponding pixel points is implemented. The fusion of two images.

In summary, in the embodiment, two different types of cameras, that is, a first camera and a second camera are disposed in the AR device, and the first camera and the second camera simultaneously capture the same scene to obtain a non-homologous first image and a second image; taking a second image as a reference image, acquiring a coordinate transformation matrix corresponding to the first image, and performing coordinate transformation on the first image by using the coordinate transformation matrix, so that the transformed first image and the second image are Corresponding to each pixel point; further, image fusion processing is performed on the coordinate-converted first image and the second image. Since the first image and the second image are non-homologous images, the dominant features of the two are different. By combining the two, it is beneficial to fuse the superior features of the two and enhance the image quality after the fusion.

FIG. 2 is a flowchart of Embodiment 2 of an image processing method according to an embodiment of the present invention. As shown in FIG. 2, on the basis of the embodiment shown in FIG. 1, after step 103, the following steps may be further included:

201. Receive a first image and a second image, where the first image and the second image are non-homologous images obtained by capturing the same scene by the first camera and the second camera, respectively.

202. Perform pre-processing on the first image and the second image, where the pre-processing comprises: performing inverse processing on the gray value of the first image, and performing image enhancement processing on the second image.

203. Generate a rotation matrix B and a scaling matrix C according to locally stored rotation parameters and scaling parameters.

204. Send the preprocessed first image, the preprocessed second image, and the rotation parameter and the scaling parameter to the image registration processing component, so that the image registration processing component uses the preprocessed second image as a reference image. The pre-processed first image is subjected to registration processing in combination with the rotation parameter and the scaling parameter to obtain a translation parameter.

205. Generate a translation matrix A according to the translation parameter fed back by the image registration processing component.

206. Determine a coordinate transformation matrix T corresponding to the first image as a result of sequentially multiplying the translation matrix A, the rotation matrix B, and the scaling matrix C.

207. Perform coordinate transformation on the first image by using a coordinate transformation matrix.

208. Perform image fusion processing on the first image and the second image after the coordinate transformation.

In this embodiment, in order to ensure the quality of the subsequent fused image, optionally, it may be received The first image and the second image are subjected to certain pre-processing.

Take the first image as an infrared image and the second image as a visible light image as an example.

Since the infrared image is imaged according to the thermal radiation of the object, the brightness is too high and is not suitable for the human visual system. In this embodiment, the brightness of the infrared image is reduced by performing inverse processing of the gray value on the first image, and the feature points are highlighted.

Specifically, it is assumed that the size of the infrared image Simage1 is M*N, and the gray value of each pixel is 8 bits, that is, the gray level is divided into 28 or 256 gray levels, and the unit matrix E of M*N is constructed, and the inverse is performed. After the infrared image is Simage2, the inverse infrared image is determined according to the following formula: Simage2=256*E-Simage1.

The visible light image is an image formed according to the principle of reflection of energy. Since the visible light image is acquired in a harsh environment with low illumination, the screen is dark and the prominent feature points are small. Therefore, it is necessary to perform image enhancement processing on the visible light image. Specifically, the gray value of the pixel of the visible light image may be threshold-divided, and the traditional three-stage image enhancement method is adopted to enhance the image image by stretching the transform coefficient for the pixel points in different threshold ranges. .

After the infrared image and the visible light image are preprocessed, the coordinate transformation matrix corresponding to the preprocessed infrared image may be acquired based on the preprocessed infrared image and the visible light image.

Specifically, the coordinate transformation matrix may be obtained according to the translation matrix A, the rotation matrix B, and the scaling matrix C, wherein the translation matrix A, the rotation matrix B, and the scaling matrix C are respectively used to represent a translation parameter, a rotation parameter, and a scaling parameter. Therefore, the translation matrix A, the rotation matrix B, and the scaling matrix C are respectively generated according to the translation parameter, the rotation parameter, and the scaling parameter. Further, the coordinate transformation matrix T corresponding to the preprocessed infrared image can be determined as: translation matrix A. The rotation matrix B and the scaling matrix C sequentially perform matrix multiplication results: T=ABC.

For the scaling parameters, the scaling operation of the image is mainly performed for images of different resolutions. Since the resolution of the infrared image and the visible image are different, it is necessary to scale the preprocessed infrared image based on the preprocessed visible image to make the resolution and the resolution of the preprocessed visible image. Consistent.

Assume that the pixel point P(x, y) is any pixel in the preprocessed infrared image, the scaling factor in the X-axis direction is t _x , and the scaling factor in the Y-axis direction is t _y , which is obtained after scaling transformation. The pixel point is P'(x', y'), then there are: x'=x*t _x ; y'=y*t _y . If represented by a matrix, it is:

Thus, the scaling matrix

From this, it can be seen that to generate the above-described scaling matrix C, it is necessary to obtain scaling parameters t _x and t _y . The scaling parameters t _x and t _y can be determined according to the resolution of the infrared camera and the CCD camera, that is, the ratio of the X-axis resolutions of the two can determine t _x , and the ratio of the two Y-axis resolutions can determine t _y . Therefore, when the infrared camera and the CCD camera in the AR device are set, the scaling parameters t _x and t _y can be determined, and the scaling parameters t _x and t _y can be pre-stored in the storage space of the FPGA component.

For the rotation parameters, the rotation transformation of the image is mainly caused by the angle between the infrared image and the visible image due to human factors when shooting the infrared image and the visible light image, in order to make the corresponding feature points in the two images accurate. The matching needs to be performed on the pre-processed visible light image as a reference, and the pre-processed infrared image is rotated in the two-dimensional space.

Suppose the pixel point P(x, y) is any pixel in the preprocessed infrared image. After the rotation transformation, the corresponding pixel point is P'(x', y'). If the matrix is used, P'(x) The rotation relationship between ',y') and P(x,y) is:

Wherein, a Cartesian coordinate system is established centering on the origin in the preprocessed infrared image. Suppose that the angle between the connection of P(x, y) and the origin with the X axis is the first angle; the angle between the connection of P'(x', y') and the origin with the X axis is the second angle, then the second angle The difference from the first angle is θ, which means the angle of deflection between P'(x', y') and P(x, y).

Thus, the rotation matrix

From this, it can be seen that to generate the above-described rotation matrix B, it is necessary to obtain the rotation parameter θ. The rotation parameter θ can be determined according to the setting of the infrared camera and the CCD camera in the AR device. Specifically, the angle between the lens center and the horizontal surface of the infrared camera and the lens center and the horizontal surface of the CCD camera can be measured. Angle, the angle difference between the two angles is the rotation parameter θ. Therefore, when the infrared camera and the CCD camera in the AR device are set, the rotation parameter θ can be determined, and the rotation parameter θ can be pre-stored in the storage space of the FPGA component.

For the translation parameters, unlike the above rotation parameters and scaling parameters, when the translation transformation operation of the infrared image is required, the translation parameters need to be calculated based on the current infrared image and the visible light image. That is to say, the rotation parameter and the scaling parameter can be considered to be independent of the currently captured image, and do not depend on the currently captured image determination, but the translation parameter is related to the currently captured image and needs to be dependent on the current The captured image is determined.

The determination of the translation parameter needs to involve a responsible calculation process. The image processing method provided by the embodiment of the present invention can be implemented based on the hardware component of the FPGA component. If the FPGA component is used to calculate the translation parameter, the limitation is limited. Therefore, optionally, the translation parameter can be calculated based on the image registration processing component, and the image registration processing component can be implemented as a software program, and the translation processing is performed by the image registration processing component to obtain a translation parameter and feedback to the FPGA component. So that the FPGA component generates the corresponding translation matrix A.

The image registration processing component mainly uses the preprocessed visible light image as the reference image, and the preprocessed infrared image is used as the image to be registered, and the image registration processing is performed on the preprocessed infrared image to obtain the translation parameter. In the image registration process, the pre-processed infrared image is also scaled and rotated. Therefore, the FPGA can take the preprocessed infrared image, the preprocessed visible image, and the locally stored rotation. The parameter and the scaling parameter are sent to the image registration processing component, so that the image registration processing component takes the preprocessed visible light image as a reference image, and combines the rotation parameter and the scaling parameter to perform image registration processing on the preprocessed infrared image. Get pan parameter.

Simply explain the image registration process of the image registration processing component:

First, based on the scaling parameter and the rotation parameter, the pre-processed infrared image is subjected to scaling transformation and rotation transformation respectively; secondly, the transformed infrared image and the pre-processed visible light image common area are identified, and the feature points of the common area are identified; Then, a correspondence relationship between the transformed infrared image and the pre-processed visible light image common point feature point is established, and the translation parameter is determined based on the correspondence relationship.

The identification of the common area can be identified by, for example, a region of interest extraction (ROI) algorithm. The main idea is to define the contrast of the pixel in the color, brightness, direction, etc. as the significant value of the pixel (Saliency). ), the stronger the contrast, the greater the significant value of the pixel. The significant values of all pixels constitute a significant picture. The notable picture here is a grayscale image indicating the significance of each pixel of the image, and the brighter the greater the degree of saliency of the pixel. The region of interest of the image can be obtained based on the salient map. The respective regions of interest of the two images can be considered as common areas.

For feature points, a differential Gaussian pyramid algorithm can be used to detect feature points.

After the feature points in the two images are obtained, the correspondence between the feature points in the two images is established. For example, suppose the coordinates of any feature point in the transformed infrared image are (x, y), and the coordinates of all detected feature points on the preprocessed visible light image are (X ₁ , Y ₁ ), (X) ₂ , Y ₂ )...(X _N , Y _N ), determining the cosine between (x, y) and (X ₁ , Y ₁ ), (X ₂ , Y ₂ ) (X _N , Y _N ), respectively Minimum value: min(arctan(xX ₁ , yY ₁ ), arctan(xX ₂ , yY ₂ )...arctan(xX _N , yY _N )), (X ₁ , Y ₁ ), (X ₂ , the feature point corresponding to the minimum value in Y ₂ )...(X _N , Y _N ) is the feature point corresponding to (x, y), and it is assumed that the feature point corresponding to the minimum value is (X ₁ , Y ₁ ). Then, the offset Δx of (x, y) with respect to (X ₁ , Y ₁ ) in the X-axis direction can be determined according to the coordinate difference between x and X ₁ , and the offset Δy in the Y-axis direction can be based on y and Y ₁ The coordinate difference is determined. Finally, for all feature point pairs, the mean of the offsets of the feature point pairs can be obtained to obtain the translation parameters (dx, dy).

Thus, based on the translation parameter, the FPGA can generate the following translation matrix A:

Translation matrix

Therefore, in the FPGA, it is assumed that the pixel point P(x, y) is any pixel in the preprocessed infrared image, and after the translation transformation, the corresponding pixel point is P'(x', y'), then :

The FPGA generates the translation matrix A, the rotation matrix B, and the scaling matrix C, and can calculate and obtain the coordinate transformation matrix T.

After obtaining the coordinate transformation matrix T based on the foregoing process, the FPGA component can multiply the preprocessed infrared image by the matrix T to obtain the coordinate transformed infrared image, and further, the coordinate transformed infrared image and the preprocessed image. The visible light image is subjected to image fusion processing.

Specifically, the image fusion processing process may include:

The FFT fusion process is performed on the coordinate-converted infrared image and the pre-processed visible light image according to the following formula to obtain the fused grayscale image:

g(x,y)=w1(x,y)*f1(x,y)+w2(x,y)*f2(x,y), where f1(x,y) is the infrared image after coordinate transformation The gray value of any pixel point (x, y), f2 (x, y) is the gray value of the corresponding pixel in the preprocessed visible light image, and g(x, y) is the gray image Gray value corresponding to the pixel; w1 (x, y) and w2 (x, y) are weighting coefficients, w1 (x, y) + w2 (x, y) = 1;

Further, corresponding pixel points in the grayscale image are rendered with the chromaticity values of the respective pixels in the preprocessed visible light image to obtain a final fused image. Since the fused image is equivalent to a complementary fusion result of the dominant features of the infrared image and the visible image, the image quality is better.

FIG. 3 is a flowchart of Embodiment 3 of an image processing method according to an embodiment of the present invention. As shown in FIG. 3, on the basis of the embodiment shown in FIG. 1, before step 101, the following steps may be further included:

301. Receive a third image captured by the second camera.

302. Determine, according to a comparison result of the average gray value of the third image and the preset gray threshold, whether to trigger the first camera and the second camera to work simultaneously, and if yes, perform steps 101-104.

Since in practical applications, AR devices including different types of first camera and second camera are not only used in harsh environments, such as in low light environments, they are also used in normal environments. In a normal environment, if both the first camera and the second camera in the AR device work at the same time, it may not be necessary.

Therefore, the embodiment also provides a scheme for controlling whether the first camera and the second camera operate based on the current environment.

Take the first camera as the infrared camera and the second camera as the CCD camera as an example. In a normal environment, only the CCD camera can be operated, and in some harsh environments, the infrared camera and the CCD camera can work at the same time.

In this embodiment, the identification of whether the current environment is a normal environment or a harsh environment can be determined by recognizing the pixel gradation value of the image captured by the CCD camera.

Specifically, when the AR device is activated, the CCD camera may be first controlled to randomly capture an image, that is, the third image. An average gradation value is obtained by averaging the gradation values of all or part of the pixels in the third image. Further, the average gray value is compared with a preset gray threshold. If the gray threshold is greater than the gray threshold, the image resolution captured by the CCD camera can meet the viewing requirement, and the current environment is a normal environment. At this point, the CCD camera can be controlled to work alone. Conversely, if it is smaller than the gray threshold, it indicates that the resolution of the image captured by the CCD camera is insufficient to meet the viewing demand. The current environment is a harsh environment. In this case, the infrared camera and the CCD camera need to be controlled to work simultaneously.

In this embodiment, whether the current environment is a normal environment or an abnormally harsh environment is identified, and the operation of the different cameras set in the AR device is controlled to improve the intelligence of the AR device.

An image processing apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these image processing devices can use commercially available hardware components through the solution. The steps taught are configured to be constructed.

FIG. 4 is a schematic structural diagram of Embodiment 1 of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus includes: a receiving module 11, an obtaining module 12, a transforming module 13, and a merging module 14.

The receiving module 11 is configured to receive the first image and the second image, where the first image and the second image are non-homologous images obtained by capturing the same scene by the first camera and the second camera, respectively.

The obtaining module 12 is configured to acquire a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image.

The transform module 13 is configured to perform coordinate transformation on the first image by using the coordinate transformation matrix.

The fusion module 14 is configured to perform image fusion processing on the coordinate-converted first image and the second image.

The apparatus shown in FIG. 4 can perform the method of the embodiment shown in FIG. 1. For the parts not described in detail in this embodiment, reference may be made to the related description of the embodiment shown in FIG. 1. For the implementation process and technical effects of the technical solution, refer to the description in the embodiment shown in FIG. 1, and details are not described herein again.

FIG. 5 is a schematic structural diagram of Embodiment 2 of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, on the basis of the embodiment shown in FIG. 4, the preprocessing module 21 is further included.

The pre-processing module 21 is configured to perform pre-processing on the first image and the second image, where the pre-processing comprises: performing gray level inversion processing on the first image, and performing the second image on the second image Image enhancement processing.

Optionally, the obtaining module 12 includes: a generating unit 121 and a determining unit 122.

The generating unit 121 is configured to generate a translation matrix A, a rotation matrix B, and a scaling matrix C, respectively.

The determining unit 122 is configured to determine a coordinate transformation matrix T corresponding to the first image as a result of sequentially multiplying the translation matrix A, the rotation matrix B, and the scaling matrix C.

Optionally, the generating unit 121 is specifically configured to:

Generating the rotation matrix B and the scaling matrix C according to locally stored rotation parameters and scaling parameters;

Transmitting the first image, the second image, and the rotation parameter and the scaling parameter And an image registration processing component, wherein the image registration processing component uses the second image as a reference image, and the first image is registered in combination with the rotation parameter and the scaling parameter to obtain a translation parameter;

The translation matrix A is generated according to the translation parameter fed back by the image registration processing component.

Optionally, the fusion module 14 includes: a grayscale fusion unit 141 and a chroma rendering unit 142.

The gradation fusion unit 141 is configured to perform gradation fusion processing on the coordinate-converted first image and the second image according to the following formula to obtain a fused grayscale image:

g(x,y)=w1(x,y)*f1(x,y)+w2(x,y)*f2(x,y), where f1(x,y) is the coordinate transformed a gray value of any pixel point (x, y) in the first image, f2 (x, y) is a gray value of a corresponding pixel point in the second image, and g(x, y) is a gray scale Gray value of the corresponding pixel in the image; w1 (x, y) and w2 (x, y) are weighting coefficients, w1 (x, y) + w2 (x, y) = 1;

The chroma rendering unit 142 is configured to render corresponding pixel points in the grayscale image with chroma values of respective pixels in the second image.

The apparatus shown in FIG. 5 can perform the method of the embodiment shown in FIG. 2. For the parts not described in detail in this embodiment, reference may be made to the related description of the embodiment shown in FIG. 2. For the implementation process and technical effects of the technical solution, refer to the description in the embodiment shown in FIG. 2, and details are not described herein again.

FIG. 6 is a schematic structural diagram of Embodiment 3 of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the receiving module 11 is further configured to receive the second camera. The third image.

The apparatus can also include a determination module 31.

The determining module 31 is configured to determine whether to trigger the first camera and the second camera to work simultaneously according to a comparison result between the average gray value of the third image and the preset gray threshold.

The apparatus shown in FIG. 6 can perform the method of the embodiment shown in FIG. 3. For the parts not described in detail in this embodiment, reference may be made to the related description of the embodiment shown in FIG. For the implementation process and technical effects of the technical solution, refer to the description in the embodiment shown in FIG. 3, and details are not described herein again.

The device embodiments described above are merely illustrative, wherein the description as separate components The units may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement without deliberate labor.

FIG. 7 is a schematic structural diagram of Embodiment 1 of an augmented reality device according to an embodiment of the present invention. As shown in FIG. 7, the AR device may include: a first camera 41, a second camera 42, a memory 43, and a processor 44;

The memory 43 is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor 44 to implement an image processing method as provided by the embodiments shown in FIGS.

The setting of the first camera 41 and the second camera 42 in the AR device may be: setting a first camera 41 and a second camera 42 on the left and right sides on a horizontal plane, that is, the display of the first camera and the second camera from the user AR device The vertical distance of the screen is equal.

Optionally, the first camera 41 is an infrared camera, and the second camera 42 is a CCD camera.

FIG. 8 is a schematic structural diagram of Embodiment 2 of an augmented reality device according to an embodiment of the present invention. As shown in FIG. 8 , the AR device includes:

a first camera 51, a second camera 52, and an FPGA component 53; wherein

The FPGA component 53 includes functional logic for implementing the image processing method provided by the embodiment shown in FIGS. 1 to 3. The FPGA component can be placed on the motherboard of the AR device.

Based on the FPGA component platform, the multi-source image fusion is realized. Because the storage resources of the FPGA are rich in resources, the computing speed will be faster. In the scene of video image acquisition and display, the fused video is smoother, and the fused image can be output in real time to achieve a better visual experience.

The electronic device provided by some embodiments of the present invention may be an external head mounted display device or an integrated head mounted display device, wherein the external head mounted display device needs to be used in conjunction with an external processing system (eg, a computer processing system).

FIG. 9 shows a schematic diagram of the internal configuration of the head mounted display device 900 in some embodiments.

The display unit 901 may include a display panel disposed on a side surface of the head mounted display device 900 facing the user's face, and may be a one-piece panel or left and right panels respectively corresponding to the left and right eyes of the user. The display panel may be an electroluminescence (EL) element, a liquid crystal display or a microdisplay having a similar structure, or a laser-scanned display in which the retina may be directly displayed or similar.

The virtual image optical unit 902 photographs the image displayed by the display unit 901 in an enlarged manner, and allows the user to observe the displayed image in the enlarged virtual image. As the display image outputted to the display unit 901, it may be an image of a virtual scene supplied from a content reproduction device (a Blu-ray disc or a DVD player) or a streaming server, or an image of a real scene photographed using an external camera 910. In some embodiments, virtual image optical unit 902 can include a lens unit, such as a spherical lens, an aspheric lens, a Fresnel lens, and the like. It can be understood that, in the embodiment of the present invention, the external camera 910 can specifically implement two cameras, that is, the first camera and the second camera mentioned in the foregoing embodiments.

The input operation unit 903 includes at least one operation member for performing an input operation, such as a button, a button, a switch, or other similarly functioned component, receives a user instruction through the operation member, and outputs an instruction to the control unit 907.

The status information acquisition unit 904 is configured to acquire status information of the user wearing the head mounted display device 900. The status information acquisition unit 904 may include various types of sensors for detecting status information by itself, and may acquire status information from an external device such as a smartphone, a wristwatch, and other multi-function terminals worn by the user through the communication unit 905. The status information acquisition unit 904 can acquire location information and/or posture information of the user's head. The status information acquisition unit 904 may include one or more of a gyro sensor, an acceleration sensor, a global positioning system (GPS) sensor, a geomagnetic sensor, a Doppler effect sensor, an infrared sensor, and a radio frequency field intensity sensor. Further, the state information acquisition unit 904 acquires state information of the user wearing the head-mounted display device 900, for example, acquires, for example, an operation state of the user (whether the user wears the head-mounted display device 900), an action state of the user (such as standing, walking, running) And the state of movement, such as the posture of the hand or fingertip, the open or closed state of the eye, Line direction, pupil size), mental state (whether the user is immersed in the image displayed by the observation and the like), and even the physiological state.

The communication unit 905 performs communication processing with the external device, modulation and demodulation processing, and encoding and decoding processing of the communication signal. In addition, the control unit 907 can transmit transmission data from the communication unit 905 to an external device. The communication method may be wired or wireless, such as mobile high-definition link (MHL) or universal serial bus (USB), high-definition multimedia interface (HDMI), wireless fidelity (Wi-Fi), Bluetooth communication, or low-power Bluetooth communication. And the mesh network of the IEEE802.11s standard. Additionally, communication unit 905 can be a cellular wireless transceiver that operates in accordance with Wideband Code Division Multiple Access (W-CDMA), Long Term Evolution (LTE), and the like.

In some embodiments, the head mounted display device 900 can also include a storage unit, the storage unit 906 being a mass storage device configured to have a solid state drive (SSD) or the like. In some embodiments, storage unit 906 can store applications or various types of data. For example, content viewed by the user using the head mounted display device 900 may be stored in the storage unit 906.

In some embodiments, the head mounted display device 900 can also include a control unit, and the control unit 907 can include a computer processing unit (CPU) or other device having similar functionality. In some embodiments, control unit 907 can be used to execute an application stored by storage unit 906, or control unit 907 can also be used to perform the methods, functions, and operations disclosed in some embodiments of the present application. In some embodiments, the control unit 907 may further include a memory chip such as a ROM 9071, a RAM 9072, etc., for the control unit 907 to execute an application stored therein. The image processing method provided by the foregoing embodiment can be implemented when the above application is executed.

The image processing unit 908 is for performing signal processing such as image quality correction related to the image signal output from the control unit 907, and converting its resolution into a resolution according to the screen of the display unit 901. Then, the display driving unit 909 sequentially selects each row of pixels of the display unit 901, and sequentially scans each row of pixels of the display unit 901 line by line, thereby providing pixel signals based on the signal-processed image signals.

In some embodiments, the head mounted display device 900 can also include an external camera. The external camera 910 may be disposed on the front surface of the body of the head mounted display device 900, and the external camera 910 may be one or more One. The external camera 910 can acquire three-dimensional information and can also be used as a distance sensor. Additionally, a position sensitive detector (PSD) or other type of distance sensor that detects reflected signals from the object can be used with the external camera 910. An external camera 910 and a distance sensor can be used to detect the body position, posture, and shape of the user wearing the head mounted display device 900. In addition, under certain conditions, the user can directly view or preview the real scene through the external camera 910.

In some embodiments, the head mounted display device 900 may further include a sound processing unit 911 that may perform sound quality correction or sound amplification of a sound signal output from the control unit 907, signal processing of an input sound signal, and the like. Then, the sound input/output unit 912 outputs the sound to the outside and the sound from the microphone after the sound processing.

It should be noted that the structure or component shown in bold in FIG. 1 may be independent of the head mounted display device 900, for example, may be disposed in an external processing system (eg, a computer system) for use with the head mounted display device 900; Alternatively, the structure or component shown in bold frame may be disposed inside or on the surface of the head mounted display device 900.

It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and are not limited thereto; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that The technical solutions described in the foregoing embodiments are modified, or the equivalents of the technical features are replaced. The modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

An image processing method, comprising:

Receiving a first image and a second image, the first image and the second image being non-homologous images obtained by capturing the same scene by the first camera and the second camera, respectively;

Obtaining a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

Performing coordinate transformation on the first image by using the coordinate transformation matrix;

Performing image fusion processing on the coordinate-converted first image and the second image.
The method according to claim 1, wherein before the acquiring the coordinate transformation matrix corresponding to the first image, the method further comprises:

Performing pre-processing on the first image and the second image, the pre-processing includes: performing gray level inversion processing on the first image, and performing image enhancement processing on the second image.
The method according to claim 1, wherein the acquiring a coordinate transformation matrix corresponding to the first image comprises:

Generating a translation matrix A, a rotation matrix B, and a scaling matrix C, respectively;

Determining a coordinate transformation matrix T corresponding to the first image is a result of sequentially multiplying the translation matrix A, the rotation matrix B, and the scaling matrix C.
The method according to claim 3, wherein the generating the translation matrix A, the rotation matrix B, and the scaling matrix C respectively comprises:

Generating the rotation matrix B and the scaling matrix C according to locally stored rotation parameters and scaling parameters;

Transmitting the first image, the second image, and the rotation parameter and the scaling parameter to an image registration processing component, so that the image registration processing component uses the second image as a reference image, combined The rotation parameter and the scaling parameter perform registration processing on the first image to obtain a translation parameter;

The translation matrix A is generated according to the translation parameter fed back by the image registration processing component.
The method according to claim 1, wherein the performing image fusion processing on the coordinate-converted first image and the second image comprises:

Performing gradation fusion processing on the coordinate-converted first image and the second image according to the following formula to obtain a fused grayscale image:

g(x,y)=w1(x,y)*f1(x,y)+w2(x,y)*f2(x,y), where f1(x,y) is the coordinate transformed a gray value of any pixel point (x, y) in the first image, f2 (x, y) is a gray value of a corresponding pixel point in the second image, and g(x, y) is a gray scale Gray value of the corresponding pixel in the image; w1 (x, y) and w2 (x, y) are weighting coefficients, w1 (x, y) + w2 (x, y) = 1;

Rendering corresponding pixel points in the grayscale image with chrominance values of respective pixel points in the second image.
The method according to any one of claims 1 to 5, wherein before the receiving the first image and the second image, the method further comprises:

Receiving a third image captured by the second camera;

And determining whether to trigger the first camera and the second camera to work simultaneously according to a comparison result of the average gray value of the third image and the preset gray threshold.
An image processing apparatus, comprising:

a receiving module, configured to receive a first image and a second image, where the first image and the second image are non-homologous images obtained by capturing the same scene through the first camera and the second camera, respectively;

An acquiring module, configured to acquire a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

a transform module, configured to perform coordinate transformation on the first image by using the coordinate transformation matrix;

And a fusion module, configured to perform image fusion processing on the coordinate-converted first image and the second image.
The device according to claim 7, further comprising:

a pre-processing module, configured to perform pre-processing on the first image and the second image, where the pre-processing comprises: performing gray value inversion processing on the first image, and performing image on the second image Enhanced processing.
The device according to claim 7, wherein the obtaining module comprises:

Generating unit for respectively generating a translation matrix A, a rotation matrix B, and a scaling matrix C;

a determining unit, configured to determine a coordinate transformation matrix T corresponding to the first image as a result of sequentially multiplying the translation matrix A, the rotation matrix B, and the scaling matrix C.
The device according to claim 9, wherein the generating unit is specifically configured to:

Generating the rotation matrix B and the scaling matrix C according to locally stored rotation parameters and scaling parameters;

Transmitting the first image, the second image, and the rotation parameter and the scaling parameter to an image registration processing component, so that the image registration processing component uses the second image as a reference image, combined The rotation parameter and the scaling parameter perform registration processing on the first image to obtain a translation parameter;

The translation matrix A is generated according to the translation parameter fed back by the image registration processing component.
The device according to claim 7, wherein the fusion module comprises:

a gradation fusion unit, configured to perform gradation fusion processing on the coordinate-transformed first image and the second image according to the following formula to obtain a fused grayscale image:

g(x,y)=w1(x,y)*f1(x,y)+w2(x,y)*f2(x,y), where f1(x,y) is the coordinate transformed a gray value of any pixel point (x, y) in the first image, f2 (x, y) is a gray value of a corresponding pixel point in the second image, and g(x, y) is a gray scale Gray value of the corresponding pixel in the image; w1 (x, y) and w2 (x, y) are weighting coefficients, w1 (x, y) + w2 (x, y) = 1;

And a chroma rendering unit, configured to render corresponding pixel points in the grayscale image with chroma values of respective pixels in the second image.
The device according to any one of claims 7 to 11, wherein the receiving module is further configured to receive a third image captured by the second camera;

The device also includes:

And a determining module, configured to determine whether to trigger the first camera and the second camera to work simultaneously according to a comparison result between the average gray value of the third image and the preset gray threshold.
An augmented reality device, comprising:

a first camera, a second camera, a memory, and a processor; wherein

The first camera is configured to capture a first image;

The second camera is configured to capture a second image, where the first image and the second image are images obtained by respectively capturing the same scene;

The memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor:

Obtaining a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

Performing coordinate transformation on the first image by using the coordinate transformation matrix;

Performing image fusion processing on the coordinate-converted first image and the second image.
The device according to claim 13, wherein the processor is further configured to: perform pre-processing on the first image and the second image, the pre-processing comprising: performing the first image The gray value is inversely processed, and the second image is subjected to image enhancement processing.
The device according to claim 13, wherein the processor is configured to: when acquiring a coordinate transformation matrix corresponding to the first image,

Generating a translation matrix A, a rotation matrix B, and a scaling matrix C, respectively;

Determining a coordinate transformation matrix T corresponding to the first image is a result of sequentially multiplying the translation matrix A, the rotation matrix B, and the scaling matrix C.
The device according to claim 15, wherein when the processor generates the translation matrix A, the rotation matrix B, and the scaling matrix C, respectively, the processor is specifically configured to:

Generating the rotation matrix B and the scaling matrix C according to locally stored rotation parameters and scaling parameters;

Transmitting the first image, the second image, and the rotation parameter and the scaling parameter to an image registration processing component, so that the image registration processing component uses the second image as a reference image, combined The rotation parameter and the scaling parameter perform registration processing on the first image to obtain a translation parameter;

The translation matrix A is generated according to the translation parameter fed back by the image registration processing component.
An augmented reality device, comprising:

a first camera, a second camera, an FPGA component; wherein

The first camera is configured to capture a first image;

The second camera is configured to capture a second image, where the first image and the second image are images obtained by respectively capturing the same scene;

The FPGA component includes functional logic that implements the following steps:

Obtaining a coordinate transformation matrix corresponding to the first image, where the coordinate transformation matrix uses the second image as a reference image;

Performing coordinate transformation on the first image by using the coordinate transformation matrix;

Performing image fusion processing on the coordinate-converted first image and the second image.