CN115567658A - Method and device for keeping image not deflecting and visual earpick - Google Patents

Method and device for keeping image not deflecting and visual earpick

Info

Publication number
CN115567658A
CN115567658A CN202211545254.2A CN202211545254A CN115567658A CN 115567658 A CN115567658 A CN 115567658A CN 202211545254 A CN202211545254 A CN 202211545254A CN 115567658 A CN115567658 A CN 115567658A
Authority
CN
China
Prior art keywords
image
frame
matrix
key frame
undeflected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211545254.2A
Other languages
Chinese (zh)
Other versions
CN115567658B (en)
Inventor
陈越
涂海燕
陈勇初
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quanzhou Archie Technology Co ltd
Original Assignee
Quanzhou Archie Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quanzhou Archie Technology Co ltd filed Critical Quanzhou Archie Technology Co ltd
Priority to CN202211545254.2A priority Critical patent/CN115567658B/en
Publication of CN115567658A publication Critical patent/CN115567658A/en
Application granted granted Critical
Publication of CN115567658B publication Critical patent/CN115567658B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the field of vision technology, and in particular to a method and a device for keeping an image from deflecting, and to a visual earpick. The method comprises the following steps: receiving an original video stream; extracting key frame images from the original video stream in time order according to a preset mode, and converting them; tracking and comparing image feature values between the converted key frame images according to the preset mode, so as to calculate an image displacement and a deflection angle; and performing an affine transformation on preset key frame images according to the image displacement, the deflection angle and the preset mode, so that the image is kept from deflecting; wherein the preset mode comprises a single-frame mode, a smoothing mode or a mixed mode. Without adding any hardware, the method reduces the manufacturing difficulty of the product and processes images at low cost and high efficiency through a software algorithm, while retaining the original image information and keeping the image from deflecting.

Description

Method and device for keeping image not to deflect and visual earpick
Technical Field
The present disclosure relates to the field of vision technologies, and in particular to a method and an apparatus for keeping an image from deflecting, and to a visual earpick.
Background
A visual earpick comprises an otoscope and an earpick body. The otoscope is mainly used to observe the inside of the ear; in use, the images collected by the otoscope are transmitted (generally wirelessly, though a wired connection is also possible) to an APP terminal (or another device terminal) for display. When using the device, the user often rotates the earpick body. Because the otoscope and the earpick body are usually of an integrated design, rotating the earpick body inevitably rotates the otoscope as well. If no processing is done, the image on the APP terminal rotates along with it, making it easy for the user to lose track of which direction to move in. This causes inconvenience and degrades the user experience.
In the related art, an acceleration sensor (G-Sensor) is mounted on the earpick body; when the user rotates the earpick body in the horizontal direction, rotation-angle data is sent to the APP terminal. The APP terminal receives the rotation-angle data and processes the image so that the image on the APP terminal is kept from deflecting.
In view of the related art described above, the inventors observed that adding a G-Sensor increases the cost and that, owing to the characteristics of the G-Sensor itself, its stability and reliability are not high.
Disclosure of Invention
In order to save production cost and improve the user experience, the present application provides a method and a device for keeping an image from deflecting, and a visual earpick.
In a first aspect, the present application provides a method for keeping an image from deflecting, which adopts the following technical solution:
a method of maintaining an image undeflected, comprising:
receiving an original video stream;
extracting key frame images from the original video stream according to a preset mode and performing conversion processing;
tracking and comparing image feature values between the converted key frame images according to the preset mode, so as to calculate an image displacement and a deflection angle;
performing an affine transformation on preset key frame images according to the image displacement, the deflection angle and the preset mode, so as to keep the image from deflecting;
wherein the preset mode comprises a single-frame mode, a smoothing mode or a mixed mode.
Optionally, the single-frame mode includes: extracting a key frame image every N frames from a preset initial frame, tracking and comparing the currently extracted key frame image with the previously extracted key frame image, and applying the tracking and comparison result to all key frame images between the currently extracted frame and the next frame to be extracted; wherein N is a positive integer.
Optionally, the smoothing mode includes: starting from a preset initial frame, taking M frames as a period, tracking and comparing each key frame image in the current M frames with its respective previous key frame image, averaging the tracking and comparison results, and applying the average to all key frame images in the next M frames; wherein M is a positive integer.
Optionally, the mixed mode includes:
starting from a preset initial frame, taking P + Q frames as a period;
tracking and comparing the key frame image of the P-th frame in the current P + Q frames with the key frame image of the last frame in the previous P + Q frames;
tracking and comparing each key frame image from the (P+1)-th frame to the (P+Q)-th frame in the current P + Q frames with its respective previous key frame image;
averaging the tracking and comparison results, and applying the average to all key frame images in the next P + Q frames;
wherein P and Q are both positive integers.
Optionally, the conversion processing specifically includes:
converting the key frame image data into a matrix;
performing corner detection in the matrix;
and calculating the optical flow between the previous and subsequent frame images according to the corner detection result.
Optionally, converting the key frame image data into a matrix specifically includes:
converting the keyframe image data into a vector;
converting the vector into a matrix;
and converting the matrix from a color matrix to a gray matrix.
Optionally, the affine transformation specifically includes:
specifying an image matrix to be rotated and a rotation angle, wherein the rotation angle is obtained by accumulating all deflection angles calculated since a preset initial frame;
marking the central point of the image matrix to be rotated as the center of rotation;
allocating memory space for the output matrix;
traversing each horizontal and vertical coordinate of the image matrix to be rotated according to the image displacement and the rotation angle to obtain the output matrix;
converting the output matrix into a color output matrix;
and converting the color output matrix into a color output picture through image encoding.
Optionally, after the color output matrix is converted into a color output picture through image encoding, the method further includes:
rendering each obtained color output picture;
and sending and displaying the rendered color output pictures.
Optionally, calculating the optical flow between the previous and subsequent frame images specifically includes: calculating the optical flow using a pyramidal Lucas-Kanade algorithm.
Optionally, the method for calculating the image displacement and the deflection angle includes:
calculating affine transformation parameters between two 2D point sets;
on the basis of the affine transformation parameters,
obtaining the value dx at row 0, column 2 and the value dy at row 1, column 2 of the affine transformation matrix, wherein dx is the component of the image displacement along the X axis and dy is the component of the image displacement along the Y axis;
and calculating the arctangent of the ratio between the value at row 1, column 0 and the value at row 0, column 0 of the affine transformation matrix to obtain the deflection angle.
Optionally, when the obtained deflection angle falls outside a preset angle interval, the deflection angle is taken as 0.
In a second aspect, the present application further provides a device for keeping an image from deflecting, which adopts the following technical solution:
an apparatus for maintaining an image undeflected, comprising:
a memory for storing the original video stream, the key frame images and a program for keeping the image undeflected;
and a processor, which executes the steps of the above method for keeping an image undeflected when running the program.
In a third aspect, the present application further provides a visual earpick, which adopts the following technical solution:
a visual earpick comprising:
an earpick body;
an otoscope, integrally arranged on the earpick body and used for collecting pictures inside the ear;
a display device for displaying the pictures inside the ear;
and the above device for keeping an image undeflected, whose input end is connected to the otoscope and whose output end is connected to the display device.
To sum up, the manufacturing difficulty of the product is reduced without adding any hardware; the image can be processed at low cost and high efficiency through a software algorithm while the original image information is retained and the image is kept from deflecting. This meets the visual otoscope's requirement for stable video output, provides a better video-watching experience, and improves the overall user experience.
Drawings
FIG. 1 is a schematic structural diagram of an apparatus for maintaining an image undeflected according to the present application;
FIG. 2 is a schematic flow chart of a method for keeping an image from deflecting according to the present application;
FIG. 3 is a schematic diagram of the 2 × 3 matrix of affine transformation parameters of the present application.
Detailed Description
In the related art, in order to keep the image from deflecting, an acceleration sensor (G-Sensor) is mounted on the earpick body; when the user rotates the earpick body in the horizontal direction, rotation-angle data is sent to the APP terminal. The APP terminal receives the rotation-angle data and processes the image so that the image on the APP terminal is kept from deflecting.
However, adding a G-Sensor to the earpick body increases the hardware cost; removing the G-Sensor reduces the corresponding cost by 20-30%.
Moreover, because the G-Sensor is noisy, the image at the APP terminal can shake even when the earpick body is still. And because the otoscope video is a real-time system, many algorithms are hard to apply: it is difficult to suppress image shake in the still state while keeping a timely response in the motion state.
In addition, to save cost, some products use a two-axis G-Sensor. When the earpick body is not held horizontally, the measured deflection angle has an error; in the limiting case where the earpick body is held vertically, the G-Sensor fails completely.
The following describes embodiments of the visual earpick of the present application in detail with reference to the drawings of the specification, but the embodiments should not be construed as limiting the present application.
As shown in fig. 1, an embodiment of the present application provides a visual earpick, which includes an earpick body, an otoscope integrally disposed on the earpick body, a display device, and a device for keeping the image undeflected; the device for keeping the image undeflected integrates a memory and a processor, its input end is connected to the otoscope, and its output end is connected to the display device.
The otoscope is used to capture pictures inside the ear and acquire the video stream for the processor to process; the display device is used to display the in-ear pictures processed by the processor; the memory is used to store the original video stream, the key frame images and the program for keeping the image undeflected, where the original video stream, the original key frame images, the processed video stream or the processed key frame images can be selectively stored; and the processor executes the following steps of the method for keeping an image undeflected when running that program.
It can be understood that the device for keeping the image undeflected and the display device may be replaced by a mobile phone terminal; that is, the corresponding functions may be implemented in software by hardware such as the processor, memory and display screen of the mobile phone, or by other devices such as a tablet or a computer. For convenience of description, the following takes the mobile phone terminal as the embodiment.
An embodiment of the method for maintaining an image without deflection is described in further detail below in conjunction with a visual earpick.
As shown in fig. 2, an embodiment of the present application provides a method for keeping an image from deflecting, including:
s10: receiving an original video stream;
the original video stream refers to a video stream obtained by shooting a target (for example, an ear canal) by an image sensor such as an otoscope, and may be in a format such as MJPEG, RAW data, MPEG1-4, WMV, RMVB, FLV, and the like, which is not limited in this application.
It should be noted that, in this embodiment, the otoscope is integrally disposed on the earpick body, and the video stream obtained by the otoscope shooting a target needs to be transmitted (generally wirelessly, or by wire) to the mobile phone terminal for processing and display. The otoscope therefore needs to be connected to the mobile phone terminal by wire or through a wireless module (e.g., Bluetooth, WIFI), which those skilled in the art will clearly understand, so it is not described again here.
S20: extracting key frame images of the original video stream according to a preset mode and a time sequence, and performing conversion processing;
the preset modes comprise a single frame mode, a smooth mode or a mixed mode, and it can be understood that the key frame images are extracted in different preset modes, and which preset mode is selected can be selected in advance by a user and preset well, and can also be changed according to the actual effect in the using process, and after the preset mode is changed, the subsequent video stream is processed according to the newly selected preset mode.
S30: tracking and comparing image characteristic values between the converted key frame images according to a preset mode so as to calculate and obtain image displacement and deflection angles;
it can be understood that, under different preset modes, the key frame images to be compared are different, and when the preset mode is selected in the previous step, the step only needs to be executed according to the comparison rule set corresponding to the preset mode.
Image feature values between the converted key frame images are tracked and compared according to the preset mode to obtain a tracking and comparison result, which comprises an image displacement and a deflection angle da, the image displacement comprising a component dx along the X axis and a component dy along the Y axis.
S40: performing an affine transformation on preset key frame images according to the image displacement, the deflection angle and the preset mode, so as to keep the image from deflecting.
The single-frame mode includes: extracting a key frame image every N frames from a preset initial frame, tracking and comparing the currently extracted key frame image with the previously extracted key frame image, and applying the tracking and comparison result to all key frame images between the currently extracted frame and the next frame to be extracted (including that frame); where N is a positive integer. It can be understood that, in the single-frame mode, the preset key frame images refer to all key frame images between the current extraction frame and the next frame to be extracted (including the frame to be extracted). It should be noted that the preset initial frame also needs to be extracted; it does not need to be processed and can be displayed directly at the mobile phone terminal.
In this embodiment, for convenience of explaining the single-frame mode, assume the preset initial frame is frame 0 and N = 5. A key frame image is extracted every 5 frames from frame 0, that is, the key frame images of frames 0, 5, 10, 15, … are extracted. The key frame images of frame 5 and frame 0 are tracked and compared, and the result is applied to all key frame images between frame 5 and frame 10 (that is, frames 6, 7, 8, 9 and 10); the key frame images of frame 10 and frame 5 are tracked and compared, and the result is applied to all key frame images between frame 10 and frame 15 (that is, frames 11, 12, 13, 14 and 15); and so on.
It can be understood that, in the single-frame mode, the value of N directly affects the display effect, since a single comparison is applied to a single span. For example, when N is 1, every key frame image in the video stream is tracked and compared; the display effect is then best, but the required computing power is also highest. With limited computing power at the mobile phone terminal, the computation may take too long and junk (wasted) frames may occur, causing the video to stall. When N is larger, less computing power is required, but deflection jitter occurs more easily and the display effect degrades. Therefore, in the single-frame mode, a suitable value of N can be chosen according to the computing power of the mobile phone terminal to obtain a good display effect, as in the sketch below.
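For illustration, the single-frame scheduling above can be expressed as a minimal Python sketch; the frames list, the track callable and the result mapping are assumptions of this sketch rather than the patent's implementation.

    from typing import Any, Callable, Dict, List

    def single_frame_mode(frames: List[Any],
                          track: Callable[[Any, Any], Any],
                          n: int = 5) -> Dict[int, Any]:
        # Single-frame mode: extract a key frame every n frames, compare it
        # with the previously extracted key frame, and apply the result to
        # all frames up to and including the next extraction (N = 5 example).
        applied: Dict[int, Any] = {}
        prev = 0                              # preset initial frame (frame 0)
        for i in range(n, len(frames), n):
            result = track(frames[prev], frames[i])
            for j in range(i + 1, min(i + n, len(frames) - 1) + 1):
                applied[j] = result           # e.g. frames 6..10 for i = 5
            prev = i
        return applied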
The smoothing mode includes: starting from a preset initial frame, taking M frames as a period, tracking and comparing each key frame image in the current M frames (including the M-th frame) with its respective previous key frame image, averaging the tracking and comparison results (for example, by weighted averaging), and applying the average to all key frame images in the next M frames; where M is a positive integer. It can be understood that, in the smoothing mode, the preset key frame images refer to all key frame images in the next M frames. It should be noted that the preset initial frame also needs to be extracted; it does not need to be processed and can be displayed directly at the mobile phone terminal.
In this embodiment, for convenience of explaining the smoothing mode, assume the preset initial frame is frame 0 and M = 3. With a period of 3 frames from frame 0, each key frame image in the current 3 frames (i.e., frames 1, 2 and 3) is tracked and compared with its previous key frame image, that is: frame 1 is tracked and compared with frame 0 to obtain E1, frame 2 with frame 1 to obtain E2, and frame 3 with frame 2 to obtain E3; E1, E2 and E3 are averaged (for example, by weighted averaging) and the average is applied to the key frame images of the next 3 frames (i.e., frames 4, 5 and 6). Then frame 4 is tracked and compared with frame 3 to obtain E4, frame 5 with frame 4 to obtain E5, and frame 6 with frame 5 to obtain E6; E4, E5 and E6 are averaged and the average is applied to the key frame images of the next 3 frames (i.e., frames 7, 8 and 9); and so on.
It can be understood that the smoothing mode, with its multi-frame comparison and averaged application, requires noticeably less computing power than the single-frame mode, but its computation granularity is relatively coarse and its display effect is not as good; a corresponding sketch follows.
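A matching sketch of the smoothing mode, under the same assumptions as above; tracking results are assumed to be numeric tuples (dx, dy, da) so they can be averaged, and a plain mean stands in for the optional weighted average.

    from typing import Any, Callable, Dict, List
    import numpy as np

    def smoothing_mode(frames: List[Any],
                       track: Callable[[Any, Any], Any],
                       m: int = 3) -> Dict[int, Any]:
        # Smoothing mode: per m-frame period, compare each frame with its
        # predecessor, average the results (E1..E3 in the M = 3 example),
        # and apply the average to every frame of the next period.
        applied: Dict[int, Any] = {}
        for start in range(0, len(frames) - m, m):
            results = [track(frames[k - 1], frames[k])
                       for k in range(start + 1, start + m + 1)]
            avg = np.mean(np.asarray(results), axis=0)
            for j in range(start + m + 1, min(start + 2 * m, len(frames) - 1) + 1):
                applied[j] = avg
        return applied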
As an embodiment of the present application, the mixed mode includes:
starting from a preset initial frame, taking P + Q frames as a period;
tracking and comparing the key frame image of the P-th frame in the current P + Q frames with the key frame image of the last frame in the previous P + Q frames, where, when the first P + Q frames are extracted, the last frame of the previous P + Q frames corresponds to the preset initial frame;
tracking and comparing each key frame image from the (P+1)-th frame to the (P+Q)-th frame (inclusive) in the current P + Q frames with its respective previous key frame image;
averaging the tracking and comparison results (for example, by weighted averaging) and applying the average to all key frame images in the next P + Q frames;
where P and Q are both positive integers. It can be understood that, in the mixed mode, the preset key frame images refer to all key frame images in the next P + Q frames. It should be noted that the preset initial frame also needs to be extracted; it does not need to be processed and can be displayed directly at the mobile phone terminal.
In this embodiment, for convenience of describing the mixed mode, assume the preset initial frame is frame 0, P = 3 and Q = 6. Starting from frame 0, within the first 9 frames (i.e., frames 1 to 9), the key frame images of frame 3 and frame 0 are tracked and compared to obtain F1, those of frame 4 and frame 3 to obtain F2, those of frame 5 and frame 4 to obtain F3, …, and those of frame 9 and frame 8 to obtain F7; F1, F2, F3, …, F7 are averaged (e.g., by weighted averaging) and the average is applied to all key frame images of the second 9 frames (i.e., frames 10 to 18). Within the second 9 frames, the key frame images of frame 12 and frame 9 are tracked and compared to obtain F11, those of frame 13 and frame 12 to obtain F12, those of frame 14 and frame 13 to obtain F13, …, and those of frame 18 and frame 17 to obtain F17; F11, F12, F13, …, F17 are averaged and the average is applied to all key frame images of the third 9 frames (i.e., frames 19 to 27); and so on.
As another embodiment of the present application, the mixed mode includes:
starting from a preset initial frame, taking P + Q frames as a period;
tracking and comparing the key frame image of the P-th frame in the current P + Q frames with the key frame image of the last frame in the previous P + Q frames, and applying the tracking and comparison result to all key frame images of the first P frames in the next P + Q frames, where, when the first P + Q frames are extracted, the last frame of the previous P + Q frames corresponds to the preset initial frame;
tracking and comparing each key frame image from the (P+1)-th frame to the (P+Q)-th frame (inclusive) in the current P + Q frames with its respective previous key frame image, averaging the tracking and comparison results (for example, by weighted averaging), and applying the average to all key frame images of the last Q frames in the next P + Q frames;
where P and Q are both positive integers. It can be understood that, in the mixed mode, the preset key frame images refer to all key frame images in the next P + Q frames. It should be noted that the preset initial frame also needs to be extracted; it does not need to be processed and can be displayed directly at the mobile phone terminal.
In this embodiment, for convenience of describing this mixed mode, assume the preset initial frame is frame 0, P = 3 and Q = 6. Starting from frame 0, within the first 9 frames (i.e., frames 1 to 9), the key frame images of frame 3 and frame 0 are tracked and compared to obtain G1, and G1 is applied to all key frame images of the first 3 frames of the second 9 frames (i.e., frames 10, 11 and 12); the key frame images of frame 4 and frame 3 are tracked and compared to obtain G2, those of frame 5 and frame 4 to obtain G3, …, and those of frame 9 and frame 8 to obtain G7; G2, G3, …, G7 are averaged (e.g., by weighted averaging) and the average is applied to all key frame images of the last 6 frames of the second 9 frames (i.e., frames 13 to 18). Within the second 9 frames (i.e., frames 10 to 18), the key frame images of frame 12 and frame 9 are tracked and compared to obtain G11, and G11 is applied to all key frame images of the first 3 frames of the third 9 frames (i.e., frames 19, 20 and 21); the key frame images of frame 13 and frame 12 are tracked and compared to obtain G12, those of frame 14 and frame 13 to obtain G13, …, and those of frame 18 and frame 17 to obtain G17; G12, G13, …, G17 are averaged and the average is applied to all key frame images of the last 6 frames of the third 9 frames (i.e., frames 22 to 27); and so on.
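The split application of this second mixed-mode embodiment can be sketched in the same style; the helpers and period bookkeeping follow our reading of the P = 3, Q = 6 walkthrough above and are not code from the patent.

    from typing import Any, Callable, Dict, List
    import numpy as np

    def mixed_mode(frames: List[Any],
                   track: Callable[[Any, Any], Any],
                   p: int = 3, q: int = 6) -> Dict[int, Any]:
        # Mixed mode (second embodiment): the P-th frame of each period is
        # compared with the last frame of the previous period and drives the
        # first P frames of the next period; frames P+1..P+Q are compared
        # frame-to-frame, averaged, and drive the last Q frames of the next
        # period.
        applied: Dict[int, Any] = {}
        period = p + q
        last = len(frames) - 1
        for start in range(0, len(frames) - period, period):
            single = track(frames[start], frames[start + p])      # e.g. G1, G11
            for j in range(start + period + 1, min(start + period + p, last) + 1):
                applied[j] = single
            results = [track(frames[k - 1], frames[k])            # e.g. G2..G7
                       for k in range(start + p + 1, start + period + 1)]
            avg = np.mean(np.asarray(results), axis=0)
            for j in range(start + period + p + 1, min(start + 2 * period, last) + 1):
                applied[j] = avg
        return applied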
It can be understood that the single-frame mode's single-comparison, single-application approach gives a better display effect but requires more computing power; the smoothing mode's multi-frame comparison and averaged application clearly reduces the computing power needed, but its display effect is poorer; and the mixed mode, which combines the advantages of the single-frame mode and the smoothing mode, makes it easier to strike a balance between computing power and display effect. In practical applications, therefore, to adapt to mobile phone terminals with different computing power, the mixed mode can be used to better tune the balance between computing power and display effect.
It is to be understood that, in the above weighted averaging, the weight of each tracking and comparison result may be assigned according to experience or a standard; those skilled in the art will clearly understand how a weighted average is calculated, so it is not described again here.
Specifically, in step S20, the conversion processing includes:
s21: converting the key frame image data into a matrix;
in the embodiment of the application, the key frame image data can be converted into the matrix according to the original format size, and the matrix is used without manually distributing the memory and releasing the memory which is not used any more, so that the memory required by executing the task is used at any time without extra operation, and the use process is convenient and quick.
Specifically, step S21 includes:
s211: converting the keyframe image data into a vector;
in this step, each key frame image data may be assigned a vector that encapsulates a sequential container of dynamic size arrays.
S212: converting the vector into a matrix using a memory-data decoding tool;
In this step, the in-memory data may be decoded with a tool such as imdecode to convert the vector into a matrix; the decoding flag may be one that returns the original image unchanged (IMREAD_UNCHANGED). The resulting matrix contains all the information of the image, so the source image data is stored more completely, which facilitates subsequent calculation.
S213: and converting the matrix from a color matrix to a gray matrix by using a color space conversion tool.
In this step, cvtColor may be used for the color space conversion, finally converting to a grayscale matrix with COLOR_BGR2GRAY. cvtColor keeps the data type unchanged during conversion, i.e., the data type and bit depth of the converted image are consistent with those of the source image. The grayscale matrix represents the image with a single tone, which effectively reduces the amount of image data after conversion.
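Reading S211-S213 as an OpenCV pipeline (the text names cvtColor and COLOR_BGR2GRAY; imdecode with the IMREAD_UNCHANGED flag is our assumption for the memory-data decoding tool), a minimal Python sketch:

    import cv2
    import numpy as np

    def keyframe_to_gray(frame_bytes: bytes) -> np.ndarray:
        # S211: place the key-frame image data in a vector (1-D byte array).
        vec = np.frombuffer(frame_bytes, dtype=np.uint8)
        # S212: decode the in-memory data into a matrix; IMREAD_UNCHANGED
        # keeps all information of the source image (our assumed flag).
        color = cv2.imdecode(vec, cv2.IMREAD_UNCHANGED)
        # S213: convert the color matrix to a gray matrix; data type and
        # bit depth stay consistent with the source image.
        return cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)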
S22: performing corner detection in the matrix;
in the embodiment of the application, the corner detection of goodforturedtrack is carried out on the processed gray matrix image data according to a specific standard, so that an accurate image characteristic value is obtained; the above specific standard may adopt Harris corner detection, and may also adopt Shi Tomasi algorithm corner detection, and the detected corner is still at pixel level.
S23: calculating the optical flow between the previous and subsequent frame images according to the corner detection result.
In the embodiment of the application, the optical flow between the two frame images can be calculated by means of calcOpticalFlowPyrLK, which computes the flow of a sparse feature set using the iterative Lucas-Kanade algorithm with pyramids.
It should be understood that the previous and subsequent frames refer not to frames adjacent in the time sequence of the original video stream, but to frames adjacent in the time sequence of all the key frames extracted according to the preset mode.
Whenever the otoscope moves or rotates, the captured video stream moves or rotates accordingly, so the feature values in the key frame images must be tracked to obtain the tracking and comparison result of each subsequent key frame image relative to the preceding one. The tracking and comparison result comprises the image displacement and the deflection angle da, the image displacement comprising the component dx along the X axis and the component dy along the Y axis.
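Steps S22 and S23 map onto OpenCV's goodFeaturesToTrack and calcOpticalFlowPyrLK as named above; in the following sketch the detector parameters are illustrative choices, not values from the patent:

    import cv2
    import numpy as np

    def track_features(prev_gray: np.ndarray, curr_gray: np.ndarray):
        # S22: corner detection on the gray matrix (Shi-Tomasi criterion by
        # default; useHarrisDetector=True would switch to Harris).
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=10)
        # S23: pyramidal iterative Lucas-Kanade optical flow between the two
        # key frames, tracking the sparse feature set.
        curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                          prev_pts, None)
        ok = status.ravel() == 1          # keep successfully tracked points
        return prev_pts[ok], curr_pts[ok]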
In this embodiment of the application, in the step S30, the method for calculating the image displacement and the deflection angle includes:
s31: calculating affine transformation parameters between the two 2D point sets;
in this step, the affine transformation parameters between two 2D point sets may be calculated using estimatedaffine 2D.
As shown in fig. 3, in the embodiment of the present application, the acquired affine transformation parameters may be represented by a 2 × 3 matrix, where (0,0) denotes the value at row 0, column 0, (0,1) the value at row 0, column 1, (0,2) the value at row 0, column 2, (1,0) the value at row 1, column 0, (1,1) the value at row 1, column 1, and (1,2) the value at row 1, column 2.
S32: on the basis of the affine transformation parameters described,
with continued reference to fig. 3, the value at row 0, column 2 (0,2) of the affine-transformed 2 × 3 matrix is obtained as the component dx of the image displacement in the X-axis direction, and the value at row 1, column 2 (1,2) as the component dy in the Y-axis direction;
the arctangent between the value at row 1, column 0 (1,0) and the value at row 0, column 0 (0,0) of the affine-transformed 2 × 3 matrix is calculated as the deflection angle da; that is, the value at (1,0) is divided by the value at (0,0) and the arctangent of the quotient is taken, which those skilled in the art will clearly understand, so it is not described again here.
It should be noted that, in this step, an acquisition error may cause an error in the obtained arctangent value and hence an erroneous deflection angle. If the image were corrected according to an erroneous deflection angle, the output would also be wrong; therefore, when the obtained deflection angle may be erroneous, it is discarded.
For example, in this embodiment, the preset angle interval is initially fixed at [-π/3, π/3]; when the deflection angle is greater than π/3 or less than -π/3, the deflection angle is taken as 0, that is, the current image is considered not to have deflected, so the erroneous deflection angle is discarded and does not affect the subsequent steps. In addition, the preset angle interval can be adjusted according to the current acquisition frame rate, as follows: if the current acquisition frame rate is greater than the preset acquisition frame rate, the preset angle interval is reduced; if it is less, the interval is enlarged. The amount by which the interval is reduced or enlarged is determined by the difference between the current and preset acquisition frame rates: the larger the difference, the larger the adjustment. For example, with the preset angle interval initially fixed at [-π/3, π/3] and the preset acquisition frame rate fixed at 25: if the current acquisition frame rate is 30, the preset angle interval is reduced to [-π/6, π/6]; if the current acquisition frame rate is 20, it is enlarged to [-π/2, π/2].
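Putting S31, S32 and the angle gate together, a minimal sketch (estimateAffine2D is named in the text; the fixed ±π/3 gate follows the example above, and the frame-rate-dependent adjustment is omitted):

    import math
    import cv2
    import numpy as np

    def displacement_and_angle(prev_pts: np.ndarray, curr_pts: np.ndarray,
                               max_angle: float = math.pi / 3):
        # S31: 2x3 affine transformation parameters between the 2D point sets.
        m, _inliers = cv2.estimateAffine2D(prev_pts, curr_pts)
        if m is None:
            return 0.0, 0.0, 0.0          # estimation failed: assume no motion
        # S32: dx from (0,2), dy from (1,2), da = atan2(m[1,0], m[0,0]).
        dx, dy = m[0, 2], m[1, 2]
        da = math.atan2(m[1, 0], m[0, 0])
        if not -max_angle <= da <= max_angle:
            da = 0.0                      # implausible angle: treat as no deflection
        return dx, dy, da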
In the embodiment of the present application, in step S40, the affine transformation specifically includes:
s41: appointing an image matrix to be rotated and a rotation angle, wherein the rotation angle is obtained by accumulating all deflection angles calculated from a preset initial frame;
specifically, the image matrix to be rotated refers to the image matrix that needs to be processed currently, and since the deflection angle represents the deflection between the extracted key frame images of the two frames before and after, and the angle that the image matrix that needs to be processed currently needs to be rotated should be compared with the key frame image of the preset initial frame, the rotation angle is obtained by adding up all the deflection angles calculated from the preset initial frame.
S42: marking the central point of the image matrix to be rotated as the center of rotation;
specifically, the intersection of one half of the image length and one half of the width of the image matrix currently required to be processed is labeled as the center of the image to be rotated.
S43: allocating memory space to the output matrix;
specifically, memory space may be allocated to the output matrix according to the size and type of the original image.
S44: traversing each horizontal coordinate and each vertical coordinate of the image matrix to be rotated according to the image displacement and the rotation angle to obtain the output matrix;
specifically, traversing each horizontal and vertical coordinate of the image matrix to be rotated is to perform displacement transformation on each coordinate of the original image matrix, and rotate the rotated image in the original video stream back to the angle of the preset initial frame according to the image displacement and the rotation angle, so as to eliminate the influence of the displacement and the deflection on the image, and stabilize the displayed image at the same viewing angle, so that even if the earpick device rotates, the video image can be maintained at the same angle after processing.
It will be appreciated that pixels in the transformed output matrix that are not covered by the rotated image may be filled with black, since the final output image is rotated about the original image center while the image size is kept constant.
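In OpenCV terms, the coordinate traversal of S41-S44 can be collapsed into a single warpAffine about the image center; the following sketch is our reading, with the sign conventions for dx, dy and the angle chosen for illustration:

    import cv2
    import numpy as np

    def rotate_back(frame: np.ndarray, dx: float, dy: float,
                    angle_rad: float) -> np.ndarray:
        h, w = frame.shape[:2]
        center = (w / 2, h / 2)                       # S42: center of rotation
        # S41/S44: rotate by the accumulated angle about the center ...
        m = cv2.getRotationMatrix2D(center, np.degrees(angle_rad), 1.0)
        m[0, 2] -= dx                                 # ... and undo the displacement
        m[1, 2] -= dy
        # S43: warpAffine allocates the output matrix; uncovered pixels are
        # filled with black and the image size is kept constant.
        return cv2.warpAffine(frame, m, (w, h), borderValue=(0, 0, 0))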
S45: converting the output matrix into a color output matrix;
in the present embodiment, the output matrix can be converted into a color output matrix using the function Vec3b (0, 0, 0).
S46: and converting the color output matrix into a color output picture through image coding.
The data output after rotation is still a matrix; the color output matrix data needs to be converted into a color output picture through image encoding with imencode.
In the embodiment of the present application, after the color output matrix is converted into a color output picture through image encoding, the method further includes: rendering each obtained color output picture; and sending and displaying the rendered color output pictures.
It can be understood that the processed, consecutive color output pictures are displayed in the APP page of the mobile phone terminal either in real time as they are produced or after buffering, showing the otoscope's undeflected image as video from the user's viewing angle. Real-time display means that every processing result is immediately applied to the key frame image of the current frame and displayed at once; display after buffering means that multiple frames are buffered, their deflection angles are calculated by the method above, and the multiple key frame images are played back at the mobile phone terminal after the deflection-angle processing has been applied.
A partially black area may remain at the image edge after the rotation processing. To improve the user experience, the image can be cropped before display, so that what is finally shown in the mobile phone APP is a perfectly circular image. Specifically, taking the image center as the origin and the minimum edge as the diameter, a circular image is cropped out; this displays a complete image with no black border, where the minimum edge refers to the minimum distance from the origin to the black filling. Alternatively, the minimum edge can be set manually based on experience. Moreover, the image can be stored while it is displayed; the stored content may be the unprocessed original video or images, or the processed video or images.
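A sketch of the circular crop described above, assuming the minimum edge defaults to half the shorter image side when it is not set manually:

    from typing import Optional
    import cv2
    import numpy as np

    def crop_to_circle(frame: np.ndarray,
                       radius: Optional[int] = None) -> np.ndarray:
        h, w = frame.shape[:2]
        if radius is None:
            radius = min(h, w) // 2       # stand-in for the minimum edge
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.circle(mask, (w // 2, h // 2), radius, 255, thickness=-1)
        return cv2.bitwise_and(frame, frame, mask=mask)   # black outside the circle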
Based on the same inventive concept, an embodiment of the present application provides a computer-readable storage medium whose contents can be executed by a processor to implement the steps of the method for keeping an image undeflected.
The computer-readable storage medium includes, for example, a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk, or any other medium that can store program code.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional module is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working processes of the system, the apparatus, and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
The above embodiments are only used to describe the technical solutions of the present application in detail, but the above embodiments are only used to help understanding the method and the core idea of the present application, and should not be construed as limiting the present application. Those skilled in the art should also appreciate that various modifications and substitutions can be made without departing from the scope of the present disclosure.

Claims (13)

1. A method for maintaining an image undeflected, comprising:
receiving an original video stream;
extracting key frame images from the original video stream according to a preset mode and performing conversion processing;
tracking and comparing image feature values between the converted key frame images according to the preset mode, so as to calculate an image displacement and a deflection angle;
performing an affine transformation on preset key frame images according to the image displacement, the deflection angle and the preset mode, so as to keep the image from deflecting;
wherein the preset mode comprises a single-frame mode, a smoothing mode or a mixed mode.
2. The method for maintaining an image undeflected according to claim 1, wherein the single-frame mode comprises: extracting a key frame image every N frames from a preset initial frame, tracking and comparing the currently extracted key frame image with the previously extracted key frame image, and applying the tracking and comparison result to all key frame images between the currently extracted frame and the next frame to be extracted; wherein N is a positive integer.
3. The method for maintaining an image undeflected according to claim 1, wherein the smoothing mode comprises: starting from a preset initial frame, taking M frames as a period, tracking and comparing each key frame image in the current M frames with its respective previous key frame image, averaging the tracking and comparison results, and applying the average to all key frame images in the next M frames; wherein M is a positive integer.
4. The method for maintaining an image undeflected according to claim 1, wherein the mixed mode comprises:
starting from a preset initial frame, taking P + Q frames as a period;
tracking and comparing the key frame image of the P-th frame in the current P + Q frames with the key frame image of the last frame in the previous P + Q frames;
tracking and comparing each key frame image from the (P+1)-th frame to the (P+Q)-th frame in the current P + Q frames with its respective previous key frame image;
averaging the tracking and comparison results, and applying the average to all key frame images in the next P + Q frames;
wherein P and Q are both positive integers.
5. The method for maintaining an image undeflected according to claim 1, wherein the conversion processing specifically comprises:
converting the key frame image data into a matrix;
performing corner detection in the matrix;
and calculating the optical flow between the previous and subsequent frame images according to the corner detection result.
6. The method for maintaining an image undeflected according to claim 5, wherein converting the key frame image data into a matrix specifically comprises:
converting the keyframe image data into a vector;
converting the vector into a matrix;
and converting the matrix from a color matrix to a gray matrix.
7. The method for maintaining an image undeflected according to claim 6, wherein the affine transformation specifically comprises:
specifying an image matrix to be rotated and a rotation angle, wherein the rotation angle is obtained by accumulating all deflection angles calculated since a preset initial frame;
marking the central point of the image matrix to be rotated as the center of rotation;
allocating memory space for the output matrix;
traversing each horizontal and vertical coordinate of the image matrix to be rotated according to the image displacement and the rotation angle to obtain the output matrix;
converting the output matrix into a color output matrix;
and converting the color output matrix into a color output picture through image encoding.
8. The method for maintaining an image undeflected according to claim 7, further comprising, after converting the color output matrix into a color output picture through image encoding:
rendering each obtained color output picture;
and sending and displaying the rendered color output pictures.
9. The method for maintaining an image undeflected according to claim 5, wherein calculating the optical flow between the previous and subsequent frame images specifically comprises: calculating the optical flow using a pyramidal Lucas-Kanade algorithm.
10. The method for maintaining an image undeflected according to claim 1, wherein the method for calculating the image displacement and the deflection angle comprises:
calculating affine transformation parameters between two 2D point sets;
on the basis of the affine transformation parameters,
obtaining the value dx at row 0, column 2 and the value dy at row 1, column 2 of the affine transformation matrix, wherein dx is the component of the image displacement along the X axis and dy is the component of the image displacement along the Y axis;
and calculating the arctangent of the ratio between the value at row 1, column 0 and the value at row 0, column 0 of the affine transformation matrix to obtain the deflection angle.
11. The method for maintaining an image undeflected according to claim 10, wherein, when the obtained deflection angle falls outside a preset angle interval, the deflection angle is taken as 0.
12. An apparatus for maintaining an image undeflected, comprising:
a memory for storing an original video stream, key frame images and a program for keeping the image undeflected;
a processor for executing the steps of the method for maintaining an image undeflected according to any one of claims 1-11 when running the program.
13. A visual earpick, comprising:
an earpick body;
an otoscope, integrally arranged on the earpick body and used for collecting pictures inside the ear;
a display device for displaying the pictures inside the ear;
and an apparatus for maintaining an image undeflected according to claim 12, whose input end is connected to the otoscope and whose output end is connected to the display device.
CN202211545254.2A 2022-12-05 2022-12-05 Method and device for keeping image not deflecting and visual earpick Active CN115567658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211545254.2A CN115567658B (en) 2022-12-05 2022-12-05 Method and device for keeping image not deflecting and visual earpick

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211545254.2A CN115567658B (en) 2022-12-05 2022-12-05 Method and device for keeping image not deflecting and visual earpick

Publications (2)

Publication Number Publication Date
CN115567658A true CN115567658A (en) 2023-01-03
CN115567658B CN115567658B (en) 2023-02-28

Family

ID=84770430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211545254.2A Active CN115567658B (en) 2022-12-05 2022-12-05 Method and device for keeping image not deflecting and visual earpick

Country Status (1)

Country Link
CN (1) CN115567658B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101616310A (en) * 2009-07-17 2009-12-30 清华大学 The target image stabilizing method of binocular vision system of variable visual angle and resolution
US20120120265A1 (en) * 2009-06-22 2012-05-17 Harald Klomp Real time video stabilization
US20140132786A1 (en) * 2012-11-12 2014-05-15 Behavioral Recognition Systems, Inc. Image stabilization techniques for video surveillance systems
CN104224441A (en) * 2014-02-27 2014-12-24 青岛光电电子器具有限公司 Visual earpick
CN105141807A (en) * 2015-09-23 2015-12-09 北京二郎神科技有限公司 Video signal image processing method and device
CN206518642U (en) * 2016-10-24 2017-09-26 常州觉醒智能科技有限公司 A kind of visual ear picker for having an image stabilization system
CN107295272A (en) * 2017-05-10 2017-10-24 深圳市金立通信设备有限公司 The method and terminal of a kind of image procossing
CN112418288A (en) * 2020-11-17 2021-02-26 武汉大学 GMS and motion detection-based dynamic vision SLAM method
CN113905147A (en) * 2021-09-30 2022-01-07 桂林长海发展有限责任公司 Method and device for removing jitter of marine monitoring video picture and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA ZHONGXUE et al.: "Video Stabilization Algorithm Based on Feature Point Matching", Packaging Engineering (包装工程) *

Also Published As

Publication number Publication date
CN115567658B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
US10425582B2 (en) Video stabilization system for 360-degree video data
US10027893B2 (en) Real-time video stabilization for mobile devices based on on-board motion sensing
US20190289055A1 (en) Method and system for interactive transmission of panoramic video
Grundmann et al. Calibration-free rolling shutter removal
CN112204993B (en) Adaptive panoramic video streaming using overlapping partitioned segments
US9773333B2 (en) Information processing device, information processing method, and program
US9615039B2 (en) Systems and methods for reducing noise in video streams
US9710923B2 (en) Information processing system, information processing device, imaging device, and information processing method
Bell et al. A non-linear filter for gyroscope-based video stabilization
KR100968974B1 (en) Apparatus for correcting motion by hand-shaking
CN102256061B (en) Two-dimensional and three-dimensional hybrid video stabilizing method
CN114788260A (en) Multi-camera video stabilization
US9838604B2 (en) Method and system for stabilizing video frames
CN113574863A (en) Method and system for rendering 3D image using depth information
WO2019238113A1 (en) Imaging method and apparatus, and terminal and storage medium
WO2010073192A1 (en) Image scaling curve generation
CN104820966B (en) Asynchronous many video super-resolution methods of registration deconvolution during a kind of sky
JP7253621B2 (en) Image stabilization method for panorama video and portable terminal
CN110766617B (en) Rendering acceleration method capable of reducing sampling number and with dynamic blurring
CN115567658B (en) Method and device for keeping image not deflecting and visual earpick
CN117768774A (en) Image processor, image processing method, photographing device and electronic device
JP2017207818A (en) Image processing apparatus, image processing method and program
CN114095780A (en) Panoramic video editing method, device, storage medium and equipment
JP2015154334A (en) Imaging apparatus, control method thereof and control program
CN109658326B (en) Image display method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant