CN118141307B - AR technology-based endoscope operation and control method and endoscope system - Google Patents
- Publication number: CN118141307B
- Application number: CN202410557787.5A
- Authority
- CN
- China
- Prior art keywords
- picture
- feature point
- image
- sequence
- binocular lens
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T19/006 — Mixed reality (manipulating 3D models or images for computer graphics)
- A61B1/00006 — Operational features of endoscopes characterised by electronic signal processing of control signals
- A61B1/00009 — Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
- A61B1/00195 — Optical arrangements with eyepieces
- A61B1/045 — Control of endoscopes combined with photographic or television appliances
- G06N3/04 — Neural networks: architecture, e.g. interconnection topology
- G06N3/08 — Neural networks: learning methods
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/82 — Image or video recognition using neural networks
Abstract
The application discloses an AR technology-based endoscope control method and endoscope system, relating to the technical field of medical treatment. The AR technology-based endoscope manipulation method comprises: step 1: collecting a left image sequence from the left camera and a right image sequence from the right camera of the binocular lens in time order; step 2: aligning each picture in the left image sequence with the corresponding picture in the right image sequence by timestamp to obtain image groups arranged in time order. The endoscope system includes a binocular lens, an image processing apparatus, an information processing apparatus, and a control apparatus. In the scheme provided by the application, the image processing apparatus generates distance information between the binocular lens and the object in front of it, and this distance information controls the movement multiple of the binocular lens, so that the movement multiple changes continuously during the operation to adapt to the surroundings.
Description
Technical Field
The application relates to the technical field of medical treatment, and in particular to an AR technology-based endoscope operation method and endoscope system.
Background
The AR-based visual endoscope is an intelligent AR device used in medical treatment. An image-signal acquisition end is typically introduced into the patient; the acquired image information is transmitted as an AR video signal to a signal-receiving mechanism and processed by a data-processing module, and the corresponding AR picture is then displayed in the AR glasses body.
In the existing mode of operation, a doctor slowly steers the endoscope according to his or her own experience to complete the endoscopic procedure. In practice, however, not all cavity channels present equally complex environments: the endoscope can move quickly through some cavities but must slow down in others, so during a complex endoscopic procedure factors such as fatigue easily lead to damage to the cavity wall. For example, an endoscope moving quickly may suddenly enter a complex cavity and, before its motion can be checked, collide with and scratch the cavity surface.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key or essential features of the claimed subject matter, nor to limit its scope.
As a first aspect of the present application, to solve the technical problems mentioned in the background section above, some embodiments of the present application provide an AR technology-based endoscope manipulation method comprising the following steps:
step 1: collecting a left image sequence from the left camera and a right image sequence from the right camera of the binocular lens in time order;
step 2: aligning each picture in the left image sequence with the corresponding picture in the right image sequence by timestamp to obtain image groups arranged in time order;
step 3: for each image group in turn, calculating the proportion of the similar area between its pictures to obtain time-series information of picture similarity;
step 4: inputting the time-series information of picture similarity into a distance neural network model to obtain distance information;
step 5: generating control parameters from the distance information, the control parameters being used to generate the movement multiple of the binocular lens so as to control its movement.
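As an illustration of steps 1 and 2, pairing left- and right-camera frames by nearest timestamp might be sketched as follows; the frame representation and the function name are assumptions for illustration, not part of the patent:

```python
from bisect import bisect_left

def align_sequences(left, right):
    """Pair each left-camera frame with the right-camera frame whose
    timestamp is nearest (steps 1-2).  Each frame is (timestamp, image)."""
    right_ts = [t for t, _ in right]
    groups = []
    for t, img_l in left:
        # binary-search the nearest right-camera timestamp
        i = bisect_left(right_ts, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(right)]
        j = min(candidates, key=lambda j: abs(right_ts[j] - t))
        groups.append((t, img_l, right[j][1]))
    return groups

pairs = align_sequences(
    left=[(0.00, "L0"), (0.016, "L1"), (0.033, "L2")],
    right=[(0.001, "R0"), (0.017, "R1"), (0.034, "R2")],
)
```

The resulting image groups are ordered by the left camera's timestamps, which is all steps 3-5 require.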
In the scheme provided by the application, the image processing apparatus generates the distance between the binocular lens and the object in front of it, and this distance information controls the movement multiple of the binocular lens, so the movement multiple changes continuously during the operation to adapt to the surroundings. When the scene ahead is simple, the large movement multiple lets the doctor pass through quickly or complete the corresponding operation; when the scene ahead is complex, the small movement multiple means that even if the doctor's hand trembles through an unintended action, the tip of the binocular lens does not move far, so injury to the patient is avoided.
The image information acquired by the cameras must form video information. Current cameras typically capture at least 60 frames per second, i.e. at least 60 pictures per second; performing similarity matching on every frame would make the computation too large to generate distance information in real time. To address this problem, the application provides the following technical scheme:
further, step 2 includes the steps of:
step 21: collecting displacement data of the binocular lens, and generating a corresponding acquisition node each time the binocular lens moves a preset unit distance;
step 22: extracting from the left image sequence, for each acquisition node, the picture closest to that node, to obtain the left camera acquisition sequence; extracting from the right image sequence the picture closest to each acquisition node to obtain the right camera acquisition sequence;
step 23: aligning each picture in the left and right camera acquisition sequences by timestamp to obtain image groups arranged in time order.
In this scheme, acquisition points are generated from the displacement data of the binocular lens, i.e. from the distance it has moved, and the image sequences collected by the binocular cameras are then thinned at these acquisition points, reducing the amount of computation. Because the acquisition points are derived from displacement rather than from time, the movement of the binocular lens can be tracked as closely as possible, avoiding the risk introduced by an acquisition interval that is too long. For example, when the binocular lens moves quickly, more acquisition points are produced in the same time, so its change of position is detected promptly; conversely, when the lens moves little or not at all, fewer acquisition points are produced, reducing the amount of computation and increasing the response speed.
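Step 21's displacement-triggered sampling can be sketched as follows. This is a minimal illustration: the 2 mm unit, the sample format, and the function name are assumptions; the patent leaves the unit distance as a preset:

```python
def acquisition_nodes(displacements, unit_mm=2.0):
    """Emit the timestamps at which the cumulative displacement of the
    binocular lens crosses each multiple of `unit_mm` (step 21).
    `displacements` is a list of (timestamp, distance_moved_mm) samples."""
    nodes, travelled, next_mark = [], 0.0, unit_mm
    for t, d in displacements:
        travelled += d
        while travelled >= next_mark:   # may cross several marks at once
            nodes.append(t)
            next_mark += unit_mm
    return nodes

# Fast motion yields dense nodes, slow motion sparse ones:
nodes = acquisition_nodes([(0.1, 1.5), (0.2, 1.0), (0.3, 0.2), (0.4, 2.0)])
```

Note that a stationary lens produces no nodes at all, which is exactly the adaptive behaviour the paragraph above describes.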
The distance information must be calculated as quickly as possible to control the movement multiple of the binocular lens; otherwise the control multiple cannot be generated in real time as the lens moves. The preceding scheme already reduces the number of pictures to be compared, but each picture still contains many pixels, and comparing the entire contents of two pictures would again slow the system's response so that the control multiple could not be generated in real time. For this purpose, the application provides the following technical scheme:
further, step 3 includes the following steps:
step 31: presetting a comparison range;
step 32: for the two pictures P1 and P2 in any image group, taking the part of picture P1 within the comparison range on the side nearest picture P2 as the comparison region;
step 33: inputting the comparison region and picture P2 together into a picture comparison model to obtain the overlapping region of the comparison region and picture P2;
step 34: taking the area of picture P2 from the overlapping region to the edge nearest picture P1 as the similar region, and calculating the similarity ratio.
In this scheme, instead of inputting the two whole pictures into the picture comparison model for similarity comparison, only the comparison region of picture P1 is input together with picture P2, so the remainder of picture P1 outside the comparison region need not be compared, effectively reducing the difficulty of image comparison. Compared with reducing the definition of the image before comparison, this scheme loses none of the image's detail features, so the comparison is more accurate.
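The region-based similarity ratio of steps 31-34 can be illustrated on toy 1-D "pictures" (lists of column values). Real pictures are 2-D and the comparison model itself is described below, so this is only a structural sketch with assumed names:

```python
def similarity_ratio(p1, p2, contrast_frac=0.8):
    """Steps 31-34 on 1-D 'pictures': take the trailing `contrast_frac`
    of P1 as the comparison region, find the widest suffix of it that is
    also a prefix of P2 (the overlapping region), and return the overlap
    width divided by the picture width (the similarity ratio)."""
    w = len(p1)
    region = p1[int(w * (1 - contrast_frac)):]   # comparison region of P1
    for k in range(len(region), 0, -1):          # try widest overlap first
        if region[-k:] == p2[:k]:                # suffix of region == prefix of P2
            return k / w
    return 0.0

# Two 10-column pictures whose fields of view overlap on 6 columns:
p1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
ratio = similarity_ratio(p1, p2)
```

An 80% comparison range means the leading 20% of P1 is never examined, which is the computational saving the paragraph above claims.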
Further, the picture comparison model calculates the similar region between picture P2 and the comparison region as follows:
S1: extracting features of the comparison region and picture P2 to obtain the key points of each;
for an image I(x, y), the scale space L(x, y, σ) is defined as: L(x, y, σ) = G(x, y, σ) * I(x, y),
where G(x, y, σ) is the Gaussian function:
G(x, y, σ) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²));
S2: for each key point, determining one or more principal directions from the gradient orientation histogram within its neighbourhood;
the gradient magnitude m and orientation θ at a key point are, respectively:
m(x, y) = √[(L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²];
θ(x, y) = arctan[(L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))];
where x and y are the abscissa and ordinate of the pixel point, σ is the standard deviation of the Gaussian function, m(x, y) represents the gradient magnitude at position (x, y), and L represents the image I after Gaussian blurring at the corresponding scale;
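The finite-difference gradient formulas above can be checked with a short sketch in pure Python on a small blurred image. `atan2` is used for the orientation so the quadrant is preserved, an implementation detail the patent does not specify:

```python
import math

def gradient(L, x, y):
    """Gradient magnitude m and orientation theta at (x, y) of the
    Gaussian-blurred image L, using the finite differences from the
    formulas above (S2).  L is a row-major 2-D list; (x, y) must be an
    interior pixel."""
    dx = L[y][x + 1] - L[y][x - 1]   # L(x+1, y) - L(x-1, y)
    dy = L[y + 1][x] - L[y - 1][x]   # L(x, y+1) - L(x, y-1)
    m = math.sqrt(dx * dx + dy * dy)
    theta = math.atan2(dy, dx)
    return m, theta

# A small blurred image containing a pure horizontal intensity ramp:
L = [[0, 1, 2],
     [0, 1, 2],
     [0, 1, 2]]
m, theta = gradient(L, 1, 1)
```

For the horizontal ramp the magnitude is the ramp slope times two and the orientation is zero, as the formulas predict.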
S3: constructing feature point set A of the comparison region and feature point set B of picture P2, where A is the set of feature points of the comparison region and B is the set of feature points of picture P2;
S4: matching the feature points in set A against those in set B to obtain the similar region between picture P2 and the comparison region.
Further, the feature points in S4 are matched as follows:
for a feature point F1 in feature point set A and a feature point F2 in feature point set B, the matching degree is calculated as:
cos θ = (Σ_{i=1..n} a_i·b_i) / (√(Σ_{i=1..n} a_i²) · √(Σ_{i=1..n} b_i²)),
where cos θ represents the similarity between feature point F1 and feature point F2, a_i and b_i represent the i-th components of the descriptor vectors of F1 and F2 respectively, and n represents the dimension of the feature point descriptor vector.
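The matching degree above is ordinary cosine similarity between descriptor vectors and can be sketched directly (function and variable names assumed):

```python
import math

def cosine_similarity(a, b):
    """Matching degree cos(theta) between two descriptor vectors,
    term by term as in the formula above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_same = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # identical descriptors
sim_orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # unrelated descriptors
```

In practice a pair would be accepted as a match when the similarity exceeds a preset threshold, which the patent leaves unspecified.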
In this scheme, the SIFT algorithm extracts the feature points that form the feature point sets, and similarity between pictures is computed from the matching degree of those feature points. Because SIFT features are scale-invariant, the accuracy of the similarity calculation is effectively increased: even when the left and right cameras of the binocular lens photograph the same area with a slight offset in angle or distance, the scale invariance of SIFT still allows the feature points to be matched. The foregoing scheme provides a feature-point matching algorithm, but a picture or comparison region still contains many feature points; extracting and matching all of them reduces the efficiency of generating the movement multiple, while simply reducing the number of feature points lowers the accuracy of the resulting similar region.
Further, once at least a feature points in feature point set A have been successfully matched with feature points in feature point set B, matching terminates, and the matched feature point farthest from the edge of picture P2 is taken as the boundary of the similar region.
In this scheme, matching starts from the feature points in the middle, and once a feature points have been matched, the farthest matched point is taken as the boundary of the similar region. This reduces the number of feature-point matches to the greatest extent and shortens the system's response time. The design essentially searches for the edge contour of the similar region: the middle of the similar region is practically identical in both pictures, so its feature points are certain to match, and only the outer limit of the region in which feature points can be matched needs to be found. Comparing outward from the middle of the picture therefore reduces the number of feature-point comparisons.
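The early-termination rule might look like the following sketch. The point representation, the equality-based matcher, and the use of the maximum x-coordinate as "farthest from the edge" are all illustrative assumptions:

```python
def find_similar_boundary(points_a, points_b, match, a_required):
    """Match feature points until `a_required` successes, then return the
    x-coordinate of the matched P2 point farthest from P2's near edge as
    the boundary of the similar region.  Each point is (x, descriptor);
    `match` decides whether two descriptors agree."""
    matched_x = []
    for xa, da in points_a:
        for xb, db in points_b:
            if match(da, db):
                matched_x.append(xb)
                break
        if len(matched_x) >= a_required:
            break                       # terminate matching early
    return max(matched_x) if matched_x else None

boundary = find_similar_boundary(
    points_a=[(1, "u"), (2, "v"), (3, "w")],
    points_b=[(0, "v"), (5, "u"), (7, "w")],
    match=lambda da, db: da == db,
    a_required=2,
)
```

Here matching stops after two successes, so the third point pair is never examined, illustrating the saving the scheme targets.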
Further, the size of a is related to the definition of the image: a = [g/G], where g is the number of pixels in the image, G is a preset scaling factor, and [·] is a rounding symbol.
In this scheme, the trade-off between precision and computation can be controlled by adjusting the scaling factor: to increase precision, reduce G; otherwise, increase G.
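Under the assumption that the rounding bracket means round-to-nearest (floor would also fit the text), a = [g/G] is a one-liner, with g and G as defined above:

```python
def required_matches(g_pixels, G_factor):
    """a = [g / G]: the number of successful feature matches required
    before matching terminates.  Round-to-nearest is an assumption."""
    return round(g_pixels / G_factor)

# A 1-megapixel image with an assumed scaling factor of 40 000:
a = required_matches(1_000_000, 40_000)
```

Halving G doubles a, i.e. demands twice as many matches, which is the precision knob described above.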
As a second aspect, the present application provides an endoscope system in which the foregoing AR technology-based endoscope manipulation method is applied. The endoscope system includes: a binocular lens, an image processing apparatus, an information processing apparatus, and a control apparatus; the binocular lens is in signal connection with the image processing apparatus, and the information processing apparatus is in signal connection with the image processing apparatus and the control apparatus respectively;
the binocular lens is used for acquiring image information and sending it to the image processing apparatus;
the image processing apparatus is used for receiving the image information, determining from it the distance between the binocular lens and the object in front, and sending this distance information to the information processing apparatus;
the information processing apparatus is used for generating control parameters from the distance information and sending them to the control apparatus;
the control apparatus is used for generating the movement multiple of the binocular lens so as to control its movement;
the control parameters change the movement multiple that the control apparatus applies to the binocular lens.
Further, the distance information is positively correlated with the movement multiple.
Because the distance information is positively correlated with the movement multiple, when the binocular lens is close to the obstacle ahead, the lens moves more slowly for a given operator input, and when it is far from the obstacle it moves faster, so collisions can be effectively avoided.
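A positively correlated mapping from distance to movement multiple could be as simple as a clamped linear function. The linear form and every constant here are assumptions: the patent only requires that the mapping be positively correlated:

```python
def movement_multiple(distance_mm, k=0.5, min_mult=0.2, max_mult=4.0):
    """Map distance-to-obstacle to a movement multiple with a positive
    correlation, clamped to a safe operating range.  k, min_mult and
    max_mult are illustrative tuning constants."""
    return max(min_mult, min(max_mult, k * distance_mm))

near = movement_multiple(1.0)    # close obstacle  -> small multiple, slow tip
far = movement_multiple(20.0)    # distant obstacle -> large multiple, fast tip
```

The clamp keeps the tip controllable even when the distance estimate is momentarily very large or very small.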
The beneficial effects of the application are as follows: compared with a scheme in which the control apparatus moves the binocular lens with a fixed movement multiple, the problem of the lens easily contacting the cavity wall is solved. In this scheme, the image processing apparatus generates the distance between the binocular lens and the object in front, and this distance information controls the movement multiple, so the movement multiple changes continuously during the operation to adapt to the surroundings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, are incorporated in and constitute a part of this specification. The drawings and their description are illustrative of the application and are not to be construed as unduly limiting the application.
In addition, the same or similar reference numerals denote the same or similar elements throughout the drawings. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
In the drawings:
fig. 1 is a flowchart of an endoscopic manipulation method based on AR technology.
Fig. 2 is a schematic diagram of the binocular lens when the pictures overlap.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the application have been illustrated in the accompanying drawings, it is to be understood that the application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the application are for illustration purposes only and are not intended to limit the scope of the present application.
It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings. Embodiments of the application and features of the embodiments may be combined with each other without conflict.
The application will be described in detail below with reference to the drawings in connection with embodiments.
An AR technology-based endoscope manipulation system comprises: a binocular lens, an image processing apparatus, an information processing apparatus, and a control apparatus; the binocular lens is in signal connection with the image processing apparatus, and the information processing apparatus is in signal connection with the image processing apparatus and the control apparatus respectively; the binocular lens is used for acquiring image information and sending it to the image processing apparatus. The binocular lens uses the ordinary cameras found in existing endoscopes; whereas a conventional endoscope carries only one camera, the scheme provided by the application carries two. The specific camera arrangement is not described in detail here. Because the two cameras sit close together at the tip of the endoscope, the images they capture share a large overlapping area.
And the image processing device is used for receiving the image information, judging the distance information between the binocular lens and the object in front according to the image information, and then sending the distance information to the information processing device.
The image processing apparatus performs binocular vision calculation on the two image channels generated by the binocular lens to obtain the distance between the area ahead and the binocular lens, i.e. the distance between the obstacle in front of the binocular lens and the lens itself.
The information processing apparatus is used for generating control parameters from the distance information and sending them to the control apparatus. It is in effect a forwarding and generating apparatus for information, used mainly to drive the control apparatus.
The control apparatus is used for generating the movement multiple of the binocular lens so as to control its movement; the control parameters change that movement multiple. The control apparatus is in effect the control platform for endoscope operation: in some electronic endoscope procedures, an operating platform controls the advance and retreat of the endoscope, and in this control mode the control multiple can be scaled up or down. The principle is analogous to the DPI of a mouse: at a high DPI the pointer moves quickly across the screen, and at a low DPI it moves slowly. Likewise, the operating platform that moves the endoscope has a corresponding DPI-like setting, the movement rate described in the present application.
Specifically, the distance information is positively correlated with the movement multiple: the farther the obstacle ahead is from the binocular lens, the greater the movement multiple and the faster the endoscope moves during operation; conversely, the closer the obstacle is to the binocular lens, the smaller the movement multiple and the lower the movement rate of the endoscope.
Referring to fig. 1, an endoscope manipulation method based on AR technology, which is applied to the foregoing endoscope manipulation system based on AR technology, includes the following steps:
Step 1: and collecting a left image sequence of a left camera and a right image sequence of a right camera in the binocular lens according to the time sequence.
The binocular lens includes two cameras, a left camera and a right camera, which start and stop working synchronously. The image information of the two cameras is therefore collected in time order, and the images of the two cameras can be aligned by their timestamps; at any instant, the contents photographed by the left and right cameras are then largely the same.
Step2: and aligning each picture in the left image sequence with each picture in the right image sequence according to the time sequence to obtain an image group arranged according to the time sequence.
Step2 comprises the following steps:
Step 21: and collecting displacement data of the binocular lens, and generating corresponding acquisition nodes when the binocular lens moves to a preset unit distance.
The displacement data of the binocular lens are obtained through a displacement sensor or a gyroscope; how the displacement sensor is arranged on the binocular lens is not repeated here. The displacement sensor may be a laser sensor, which calculates the displacement from changes in the position of the laser light reflected from a surface; this is a mature technique.
With a displacement sensor arranged on the binocular lens, an acquisition point can be generated each time the lens moves a set distance. For example, if a displacement of 2 mm triggers an acquisition point, one acquisition point is generated for every 2 mm the binocular lens moves. When the movement speed is high, many acquisition points are produced per unit time; when it is low, few are produced. The number of acquisition points thus adapts automatically, avoiding both too many points, whose frequent calculations would slow the system's response, and too few, which would make collisions of the binocular lens more likely.
Step 22: extract from the left image sequence the picture closest in time to each acquisition node, obtaining the left camera acquisition sequence; extract from the right image sequence the picture closest in time to each acquisition node, obtaining the right camera acquisition sequence.
The left camera acquisition sequence is the sequence of pictures sampled from the left image sequence at the acquisition nodes. Specifically, the left image sequence is the raw image sequence captured by the left camera; for example, if the left camera records at 60 frames per second, the left image sequence is a video stream of 60 pictures per second. The left camera acquisition sequence is the picture sequence tied to the acquisition nodes: if there is only 1 acquisition node within 1 second, the picture nearest that node is kept and the remaining 59 pictures are discarded. The right camera acquisition sequence relates to the right image sequence in the same way.
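A minimal sketch of this nearest-picture selection, assuming frames and nodes are both represented by timestamps in seconds (the representation is an assumption of this sketch):

```python
def nearest_frames(frame_times, node_times):
    """For each acquisition node, keep only the index of the frame
    whose timestamp is closest to it; all other frames are dropped."""
    return [min(range(len(frame_times)),
                key=lambda i: abs(frame_times[i] - t))
            for t in node_times]
```

Applied to a 60 fps stream with one node per second, this keeps one picture and discards the other 59, as in the example above.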
Step 23: align each picture in the left camera acquisition sequence with the corresponding picture in the right camera acquisition sequence in time order, obtaining image groups arranged in time order.
Step 3: calculate in turn the proportion of the similar area for the pictures of each image group, obtaining time-series information of picture similarity.
Step 3 comprises the following steps:
Step 31: preset a comparison range.
The comparison range is in fact the maximum overlap between the left and right cameras of the binocular lens; because the two cameras are not at the same position, the images they capture can never overlap completely. Comparing two whole pictures directly therefore increases the matching difficulty, which is why a comparison range needs to be set. For example, a comparison range of 80% means that 80% of the picture is used as the comparison area and the remaining 20% is discarded.
Step 32: for the two pictures P1 and P2 of any image group, take as the comparison area the portion of picture P1, of size equal to the comparison range, nearest the edge adjacent to picture P2.
Since the comparison range was preset in step 31, the comparison area is simply cut to that size. In this embodiment the comparison range is applied to picture P1: the right-hand 80% of picture P1 serves as the comparison area and the remaining region is discarded.
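The crop described here can be expressed as a simple box computation. This sketch assumes a left-camera picture whose edge adjacent to P2 is its right edge, and a `(left, top, right, bottom)` box convention; both are illustrative choices:

```python
def comparison_area(p1_width, p1_height, ratio=0.8):
    """Return the crop box (left, top, right, bottom) covering the
    right-hand `ratio` of picture P1 -- the side adjacent to P2 for
    the left camera. The remaining (1 - ratio) strip is discarded."""
    left = int(round(p1_width * (1.0 - ratio)))
    return (left, 0, p1_width, p1_height)
```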
Step 33: the contrast region and the picture P 2 are input together into the picture contrast model, resulting in an overlapping region of the contrast region and the picture P 2.
Step 34: the overlapping area in the picture P 2 to the area near the edge of the picture P 1 side is taken as the similar area, and the similarity ratio is calculated.
The similarity ratio is actually the ratio of the overlapping area to the picture P 2, for example, the size of the picture P 2 is 100, and the overlapping area is 80, and the similarity ratio is 80%.
Further, the picture comparison model calculates the similar region between picture P2 and the comparison area as follows:
S1: extract features of the comparison area and picture P2, obtaining the key points of each;
For an image $I(x, y)$, the scale space $L(x, y, \sigma)$ is defined as: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$;
where $G(x, y, \sigma)$ is the Gaussian function:
$G(x, y, \sigma) = \dfrac{1}{2\pi\sigma^{2}} e^{-(x^{2}+y^{2})/(2\sigma^{2})}$ ;
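The scale-space construction can be illustrated with a pure-Python sketch that samples the Gaussian $G(x, y, \sigma)$ on a grid and convolves it with the image. The kernel size, normalisation, and zero padding at the borders are choices of this sketch, not specified by the scheme:

```python
import math

def gaussian_kernel(size, sigma):
    """Sampled 2-D Gaussian G(x, y, sigma), normalised to sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          / (2 * math.pi * sigma * sigma)
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

def scale_space(image, sigma, size=5):
    """L(x, y, sigma) = G * I: convolve the image (a list of rows)
    with the Gaussian kernel, zero-padding outside the borders."""
    k = gaussian_kernel(size, sigma)
    h, w, half = len(image), len(image[0]), size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += k[dy + half][dx + half] * image[yy][xx]
            out[y][x] = acc
    return out
```

In practice a library implementation (e.g. a separable Gaussian filter) would be used; the loop form above only makes the definition explicit.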
S2: for each keypoint, determining one or more principal directions based on the gradient direction histogram within the neighborhood of the keypoint;
The gradient magnitude m and direction θ of a key point are, respectively:
$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^{2} + \left(L(x, y+1) - L(x, y-1)\right)^{2}}$ ;
$\theta(x, y) = \arctan\dfrac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$ ;
where x and y are the abscissa and ordinate of the pixel, σ is the standard deviation of the Gaussian function, m(x, y) denotes the gradient magnitude at position (x, y), and L denotes the image after Gaussian blurring of image I at the given scale;
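The two formulas above reduce to finite differences on the blurred image L. A small sketch, using `atan2` in place of the plain arctan so the quadrant is preserved (a common implementation choice, not stated in the scheme):

```python
import math

def gradient(L, x, y):
    """Gradient magnitude m(x, y) and direction theta(x, y) at an
    interior pixel of the Gaussian-blurred image L (list of rows)."""
    dx = L[y][x + 1] - L[y][x - 1]   # horizontal finite difference
    dy = L[y + 1][x] - L[y - 1][x]   # vertical finite difference
    m = math.hypot(dx, dy)           # sqrt(dx**2 + dy**2)
    theta = math.atan2(dy, dx)       # quadrant-aware arctan(dy / dx)
    return m, theta
```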
S3: construct feature point sets A and B for the comparison area and picture P2 respectively, where A is the feature point set of the comparison area and B is the feature point set of picture P2;
S4: match the feature points of set A against those of set B, obtaining the similar region of picture P2 and the comparison area.
The feature points in S4 are matched as follows.
For a feature point F1 in feature point set A and a feature point F2 in feature point set B, the matching degree is calculated as:
$\cos\theta = \dfrac{\sum_{i=1}^{n} a_{i} b_{i}}{\sqrt{\sum_{i=1}^{n} a_{i}^{2}} \cdot \sqrt{\sum_{i=1}^{n} b_{i}^{2}}}$ ;
where cos θ represents the similarity between feature points F1 and F2, $a_{i}$ and $b_{i}$ are the i-th components of the descriptor vectors of F1 and F2 respectively, and n is the dimension of the feature point descriptor vector.
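The cosine-similarity matching can be sketched directly from the formula. The greedy pairing strategy and the acceptance threshold below are illustrative assumptions; the scheme itself only defines the matching degree:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = sum(a_i * b_i) / (||a|| * ||b||) between two
    feature descriptor vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match(set_a, set_b, threshold=0.9):
    """Pair each descriptor in A with its most similar descriptor in B,
    accepting the pair when the similarity exceeds the threshold."""
    pairs = []
    for i, fa in enumerate(set_a):
        j, best = max(((j, cosine_similarity(fa, fb))
                       for j, fb in enumerate(set_b)),
                      key=lambda t: t[1])
        if best >= threshold:
            pairs.append((i, j))
    return pairs
```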
The feature extraction and feature point matching of this scheme are built on the SIFT algorithm and are therefore scale-invariant. In practice this avoids the problem that the differing positions of the two cameras introduce some scaling and rotation that would otherwise make the overlapping region unrecognizable.
Further, in S4, once at least a feature points of set A have been successfully matched to feature points of set B, matching is terminated, and the matched feature point farthest from the edge of picture P2 is taken as the boundary of the similar region.
Referring to fig. 2, since the true boundary of the overlapping region between pictures P1 and P2 lies in the middle region, performing feature point extraction and matching from the middle outwards reduces the number of feature points to extract and thus the latency of the system.
Further, the size of a is related to the resolution of the image: a = [G/g], where G is the number of pixel cells in the image, g is a preset proportionality coefficient, and [·] is the rounding (floor) symbol.
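The early-termination budget a = [G/g] is a one-line computation. The value of g below is a placeholder; the scheme leaves it as a preset coefficient:

```python
def match_budget(num_pixels, g=10000):
    """a = floor(G / g): the number of successful matches after which
    feature point matching stops. G is the pixel count of the image;
    g (here 10000) is a hypothetical preset proportionality coefficient."""
    return num_pixels // g
```

Higher-resolution images thus demand proportionally more confirmed matches before the similar-region boundary is fixed.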
Step 4: input the time-series information of picture similarity into the distance neural network model, obtaining the distance information.
The neural network model is built with prior-art techniques and converts the picture similarity of the left and right cameras into the corresponding distance information. Using a neural network for this conversion exploits the empirical mapping the model acquires through training, reducing the computational difficulty while still yielding the corresponding distance information. The essential reason this suffices is that the movement multiple of the binocular lens is not a continuous variable but a discrete one: adjustments are made between gears. An approximate, rather than exact, distance estimate is therefore all that is required.
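As a structural illustration only, a single-hidden-layer forward pass mapping a fixed-length window of similarity ratios to a distance estimate might look as follows. The layer sizes, tanh activation, and weight layout are assumptions of this sketch; the actual weights would come from training, which the scheme treats as prior art:

```python
import math

def mlp_distance(similarities, w1, b1, w2, b2):
    """Forward pass of a minimal fully-connected network:
    hidden_j = tanh(sum_k w1[j][k] * s_k + b1[j]);
    output   = sum_j w2[j] * hidden_j + b2 (the distance estimate)."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, similarities)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```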
Step 5: generate control parameters from the distance information; the control parameters produce the movement multiple of the binocular lens and thereby control its movement.
With the scheme above, the distance between the binocular lens and the object in front of it is known, and the mapping from distance information to control multiple can be preset; in practice any multiples may be chosen. For example, when the distance from the binocular lens to the object ahead is less than 5 cm, the control multiple is set to 1x, and when it is greater than 5 cm, it is set to 2x.
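The example mapping from distance to control multiple is a simple threshold rule (the behaviour at exactly 5 cm is unspecified in the text; this sketch assigns it to the 2x gear):

```python
def control_multiple(distance_cm):
    """Map the estimated lens-to-object distance to a discrete movement
    multiple: 1x when closer than 5 cm, 2x otherwise."""
    return 1 if distance_cm < 5 else 2
```

Slowing the lens (1x) near obstacles and speeding it up (2x) in open space is the collision-avoidance behaviour the scheme aims at.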
The above description covers only a few preferred embodiments of the present application and the principles of the technology employed. Those skilled in the art will appreciate that the scope of the application is not limited to the specific combination of the technical features above, but also encompasses other technical solutions formed by any combination of those features or their equivalents without departing from the spirit of the application, for example solutions in which the above features are replaced by technical features with similar functions disclosed in (but not limited to) the embodiments of the present application.
Claims (2)
1. An endoscope system, comprising: a binocular lens, an image processing apparatus, an information processing apparatus, and a control apparatus;
the endoscope system adopts an AR-technology-based endoscope control method to generate control parameters for controlling the movement of the binocular lens;
the AR-technology-based endoscope control method comprises the following steps:
step 1: collecting a left image sequence of a left camera and a right image sequence of a right camera in the binocular lens according to a time sequence;
step 2: aligning each picture in the left image sequence with each picture in the right image sequence according to a time sequence to obtain an image group arranged according to the time sequence;
step 3: sequentially calculating the proportion of the similar areas of the pictures on the image group to obtain time sequence information of the similarity of the pictures;
step 4: inputting the time sequence information of the picture similarity into a distance neural network model to obtain distance information;
step 5: generating control parameters according to the distance information, wherein the control parameters are used for generating movement multiples of the binocular lens so as to control the movement of the binocular lens;
step 2 comprises the following steps:
Step 21: collecting displacement data of the binocular lens, and generating corresponding acquisition nodes when the binocular lens moves to a preset unit distance;
Step 22: extracting a picture which is closest to each acquisition node from the left image sequence to obtain a left camera acquisition sequence;
extracting a picture closest to each acquisition node from the right image sequence to obtain a right camera acquisition sequence;
step 23: aligning each picture in the left camera acquisition sequence and the right camera acquisition sequence according to a time sequence to obtain an image group arranged according to the time sequence;
step 3 comprises the following steps:
Step 31: presetting a comparison range;
Step 32: for the two pictures P1 and P2 of any image group, taking as a comparison area the portion of picture P1, of size equal to the comparison range, nearest the edge adjacent to picture P2;
Step 33: inputting the comparison region and the picture P 2 into a picture comparison model together to obtain an overlapping region of the comparison region and the picture P 2;
Step 34: taking the overlapping area in the picture P 2 to the area close to the edge of one side of the picture P 1 as a similar area, and calculating the similarity proportion;
The picture contrast model calculates the similar region of the picture P 2 to the contrast region as follows:
S1: extracting features of the comparison region and the picture P 2 to obtain key points of the comparison region and the picture P 2 respectively;
for image I (x, y), the scale space L (x, y, σ) is defined as: l (x, y, σ) =g (x, y, σ) ×i (x, y);
wherein G(x, y, σ) is a Gaussian function: $G(x, y, \sigma) = \dfrac{1}{2\pi\sigma^{2}} e^{-(x^{2}+y^{2})/(2\sigma^{2})}$ ;
S2: for each keypoint, determining one or more principal directions based on the gradient direction histogram within the neighborhood of the keypoint;
The gradient amplitude m and the direction theta of the key points are respectively as follows:
$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^{2} + \left(L(x, y+1) - L(x, y-1)\right)^{2}}$ ;
$\theta(x, y) = \arctan\dfrac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$ ;
wherein x and y are respectively the abscissa and ordinate of the pixel point, sigma is the standard deviation of a Gaussian function, m (x, y) represents the gradient amplitude at the position (x, y), and L represents the image after Gaussian blur is carried out on the image I under different scales;
S3: respectively constructing feature point sets A and B of the comparison area and the picture P2, wherein A is the feature point set of the comparison area and B is the feature point set of the picture P2;
S4: matching the feature points in the feature point set A with those in the feature point set B to obtain the similar region of the picture P2 and the comparison area;
In the step S4, when the characteristic points in the characteristic point set A and the characteristic point set B are matched, the characteristic points positioned in the middle of the picture P2 are compared with the characteristic points in the comparison area;
in S4, after at least a feature points in the feature point set A have been successfully matched with feature points in the feature point set B, the matching of the feature points is terminated, and the matched feature point farthest from the edge of the picture P2 is used as a boundary of the similar region;
the size of a is related to the resolution of the image: a = [G/g], where G is the number of pixel cells in the image, g is a preset proportionality coefficient, and [·] is the rounding symbol.
2. The endoscope system according to claim 1, wherein the feature points in S4 are matched as follows:
for a feature point F1 in the feature point set A and a feature point F2 in the feature point set B, the matching degree is calculated as: $\cos\theta = \dfrac{\sum_{i=1}^{n} a_{i} b_{i}}{\sqrt{\sum_{i=1}^{n} a_{i}^{2}} \cdot \sqrt{\sum_{i=1}^{n} b_{i}^{2}}}$ ;
where cos θ represents the similarity between feature points F1 and F2, $a_{i}$ and $b_{i}$ represent the i-th components of the descriptor vectors of F1 and F2 respectively, and n represents the dimension of the feature point descriptor vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410557787.5A CN118141307B (en) | 2024-05-08 | 2024-05-08 | AR technology-based endoscope operation and control method and endoscope system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410557787.5A CN118141307B (en) | 2024-05-08 | 2024-05-08 | AR technology-based endoscope operation and control method and endoscope system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118141307A CN118141307A (en) | 2024-06-07 |
CN118141307B true CN118141307B (en) | 2024-08-23 |
Family
ID=91300820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410557787.5A Active CN118141307B (en) | 2024-05-08 | 2024-05-08 | AR technology-based endoscope operation and control method and endoscope system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118141307B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120068597A (en) * | 2010-12-17 | 2012-06-27 | 주식회사 이턴 | Surgical robot system and adaptive control method thereof |
CN116188554A (en) * | 2022-12-01 | 2023-05-30 | 四川大学华西医院 | Three-dimensional imaging method and device based on binocular stereo measurement endoscope |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11944395B2 (en) * | 2020-09-08 | 2024-04-02 | Verb Surgical Inc. | 3D visualization enhancement for depth perception and collision avoidance |
CN115311405A (en) * | 2021-05-06 | 2022-11-08 | 珠海市迪谱医疗科技有限公司 | Three-dimensional reconstruction method of binocular endoscope |
CN115049731B (en) * | 2022-06-17 | 2023-04-21 | 感知信息科技(浙江)有限责任公司 | Visual image construction and positioning method based on binocular camera |
- 2024-05-08: CN application CN202410557787.5A granted as patent CN118141307B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120068597A (en) * | 2010-12-17 | 2012-06-27 | 주식회사 이턴 | Surgical robot system and adaptive control method thereof |
CN116188554A (en) * | 2022-12-01 | 2023-05-30 | 四川大学华西医院 | Three-dimensional imaging method and device based on binocular stereo measurement endoscope |
Non-Patent Citations (1)
Title |
---|
Xu Gang, Lin Yuansheng, Jiang Juanjuan, et al. Research on an improved SIFT stereo matching algorithm. Computer Engineering and Applications, Vol. 51, No. 06, pp. 134-138. *
Also Published As
Publication number | Publication date |
---|---|
CN118141307A (en) | 2024-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101272511B (en) | Method and device for acquiring image depth information and image pixel information | |
US9767568B2 (en) | Image processor, image processing method, and computer program | |
CN111899164B (en) | Image splicing method for multi-focal-segment scene | |
CN112613609A (en) | Nerve radiation field enhancement method based on joint pose optimization | |
JPH05108819A (en) | Picture processor | |
Hua et al. | Small obstacle avoidance based on RGB-D semantic segmentation | |
JP2023545190A (en) | Image line-of-sight correction method, device, electronic device, and computer program | |
CN113312973B (en) | Gesture recognition key point feature extraction method and system | |
JPH10240945A (en) | Picture processor | |
CN111915735B (en) | Depth optimization method for three-dimensional structure outline in video | |
KR101321974B1 (en) | A method for combining image of moving object using optical flow method | |
KR20140074201A (en) | Tracking device | |
CN110348351B (en) | Image semantic segmentation method, terminal and readable storage medium | |
CN116614705A (en) | Coal face camera regulation and control system based on multi-mode video feature analysis | |
CN113763449B (en) | Depth recovery method and device, electronic equipment and storage medium | |
CN118141307B (en) | AR technology-based endoscope operation and control method and endoscope system | |
Buquet et al. | Evaluating the impact of wide-angle lens distortion on learning-based depth estimation | |
CN116805353B (en) | Cross-industry universal intelligent machine vision perception method | |
CN117237414A (en) | Grabbing and guiding method and system based on binocular images under mobile robot platform | |
CN112001224A (en) | Video acquisition method and video acquisition system based on convolutional neural network | |
KR100795974B1 (en) | Apparatus for realtime-generating a depth-map by processing streaming stereo images | |
CN113409331B (en) | Image processing method, image processing device, terminal and readable storage medium | |
CN115393182A (en) | Image processing method, device, processor, terminal and storage medium | |
CN114820299A (en) | Non-uniform motion blur super-resolution image restoration method and device | |
JP2001204757A (en) | Artificial eyeball |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||