CN113679327B - Endoscopic image acquisition method and device - Google Patents

Info

Publication number: CN113679327B
Application number: CN202111244507.8A
Authority: CN (China)
Prior art keywords: image, endoscopy, Hamming distance, images, frame
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN113679327A
Inventors: 冯健, 常培佳, 邵学军, 杨延成
Assignee (current and original): Qingdao Medcare Digital Engineering Co., Ltd.
Filing / priority date: 2021-10-26
Publication of CN113679327A: 2021-11-23
Grant / publication of CN113679327B: 2022-02-18

Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 1/00 - Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B 1/00002 - Operational features of endoscopes
    • A61B 1/00004 - Operational features of endoscopes characterised by electronic signal processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G06F 18/24 - Classification techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Abstract

The invention relates to an endoscopic image acquisition method and device. The method comprises the following steps: during endoscopy, determining the sum of Hamming distances between adjacent frames of an endoscopy video according to a preset image-taking frame number threshold; and when the Hamming distance sum reaches a preset first Hamming distance threshold, saving the image with the highest recognition probability value, or an image meeting a preset recognition threshold, from the pre-classified and recognized endoscopy images of each category. The invention not only improves endoscopy efficiency and the user's working efficiency, but also reduces the labor burden of endoscopy.

Description

Endoscopic image acquisition method and device
Technical Field
The invention relates to the field of medical image processing, in particular to an endoscopic image acquisition method and device.
Background
At present, a physician must manually trigger image capture during an endoscopic examination. In practice, both of the physician's hands are occupied operating the instrument, so capture is triggered with a foot pedal or by an assistant. Because the pedal is fixed in a designated place, the physician's range of motion is limited, and assigning an assistant to the physician adds an additional labor burden.
In addition, in actual use there is a certain delay between the moment the physician observes a suitable image and the moment image capture is triggered, so the captured image is often not the image the physician wanted. The physician therefore often freezes the image before capturing it (which requires the examination device to have an image-freeze function), or captures several images of the same site in succession and picks a suitable one when writing the report, which also reduces the physician's working efficiency.
In view of the above problems with current endoscopic image acquisition, no effective solution has been provided in the prior art.
Disclosure of Invention
Embodiments of the invention provide an endoscopic image acquisition method and device, which are used to at least improve endoscopy efficiency.
In a first aspect, an embodiment of the present invention provides an endoscopic image acquisition method, including:
in the endoscopy process, determining the sum of Hamming distances of adjacent frames of images of an endoscopy video according to a preset image-taking frame number threshold;
and when the Hamming distance sum reaches a preset first Hamming distance threshold, storing the image with the highest identification probability value or meeting the preset identification threshold from the various categories of endoscopy images classified and identified in advance.
Optionally, when the sum of the hamming distances reaches a preset first hamming distance threshold, before saving an image with a highest recognition probability value or meeting a preset recognition threshold from the various categories of endoscopy images recognized by pre-classification, the method includes:
when the Hamming distance sum reaches a preset second Hamming distance threshold value, classifying and identifying each frame of image of the endoscopy video meeting the preset definition threshold value through a pre-constructed deep neural network model;
according to the classification identification, determining each category of endoscopy image and the corresponding identification probability of each category of endoscopy image;
and caching the various categories of endoscopy images and the corresponding recognition probabilities of the various categories of endoscopy images.
Optionally, the endoscopic image acquisition method further comprises:
filtering each frame image of an endoscopy video by adopting a band-pass filter and a high-pass filter in a gray variance algorithm; evaluating the definition value of each frame of image after filtering through the determined low-frequency part energy and high-frequency part energy;
determining each frame of image of the endoscopy video meeting the definition threshold from the filtered frame of images.
Optionally, the formula of the gray variance algorithm is as follows:
[Formula image not reproduced in the text: the definition value F(I) is computed from the band-pass response I(x+2, y) - I(x, y) and the high-pass response I(x+1, y) - I(x, y) over all pixels.]
wherein F(I) is the definition value, and I(x, y) is the gray value at the pixel point (x, y).
Optionally, each category of image comprises images of a site category and a lesion category; the site categories include one or more of: esophagus, cardia, inverted cardia, fundus, corpus gastri, angle of stomach, antrum gastri, pylorus, pharynx, duodenal bulb, and descending duodenum.
Optionally, the classifying and identifying each frame of image of the endoscopy video meeting the preset definition threshold through the pre-constructed deep neural network model includes:
and classifying and identifying each frame of image of the endoscopy video meeting the definition threshold value through a pre-constructed MobileNet model.
Optionally, the endoscopy image acquisition method comprises:
performing interval estimation of the percentile position on N sampling values obtained in advance to obtain a confidence interval of a target Hamming distance threshold; after the maximum value and the minimum value of the confidence interval are removed, taking the median of the confidence interval as the target Hamming distance threshold;
when the sampling values are first sampling values, the target Hamming distance threshold is the first Hamming distance threshold; when the sampling values are second sampling values, the target Hamming distance threshold is the second Hamming distance threshold;
during N rounds of endoscopy image acquisition, each time image capture is triggered, a Hamming distance sum determined according to the image-taking frame number threshold is taken as a first sampling value, and a Hamming distance sum determined at a position a preset M frames before the position corresponding to the first Hamming distance threshold is taken as a second sampling value; N and M are positive integers.
Optionally, the determining a sum of hamming distances of adjacent frames of images of the endoscopy video according to a preset threshold of the number of frames of image acquisition includes:
zooming each frame image of the endoscopy video according to a preset first image size;
converting the scaled each frame image into each frame gray level image;
zooming each frame of gray level image according to a preset second image size;
determining the Hamming distance sum of the scaled gray level images of each adjacent frame according to the image-taking frame number threshold, and taking the Hamming distance sum of the scaled gray level images of each frame as the Hamming distance sum of the images of each adjacent frame of the endoscopy video; the second image size is smaller than the first image size.
Optionally, the endoscopic image acquisition method further comprises:
determining the gray level average value of each frame of the scaled gray level images of two adjacent frames;
sequentially comparing the gray value of each pixel of each frame of the scaled gray image with the average gray value;
according to the comparison result, carrying out binarization on the pixel gray value of each frame of the scaled gray image to obtain a hash value of each frame of the scaled gray image;
and determining the Hamming distance of the scaled gray level images of the two adjacent frames according to the Hash values of the scaled gray level images of the two adjacent frames.
In a second aspect, embodiments of the present invention provide an endoscopy apparatus, the endoscopy apparatus comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when being executed by the processor, implements the steps of an endoscopic image acquisition method as defined in any one of the above.
According to the embodiments of the invention, the endoscopy video can be classified and recognized during the endoscopic examination. The sum of the Hamming distances between adjacent frames of the endoscopy video is determined, and whether this sum reaches the first Hamming distance threshold is used to judge the user's image-capture intention; based on this judgment, the image with the highest recognition probability value, or an image meeting the preset recognition threshold, is saved from the classified endoscopy images of each category. This helps the user automatically acquire the endoscopy images they need, improves endoscopy efficiency and the user's working efficiency, and reduces the labor burden of endoscopy.
Drawings
FIG. 1 is a flow chart of an alternative endoscopic image acquisition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another alternative endoscopic image acquisition method according to an embodiment of the present invention;
in the figure: s101, determining the sum of Hamming distances of adjacent frames of images of an endoscopy video according to a preset image-taking frame number threshold in an endoscopy process; and S102, when the sum of the Hamming distance reaches a preset first Hamming distance threshold, storing the image with the highest identification probability value or meeting the preset identification threshold from the various categories of endoscopy images classified and identified in advance.
Detailed Description
The present invention will be described in further detail with reference to the following drawings and specific embodiments, it being understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for facilitating the explanation of the present invention, and have no specific meaning in itself. Thus, "module", "component" or "unit" may be used mixedly.
Example one
An embodiment of the present invention provides an endoscopy image capturing method, as shown in fig. 1, the endoscopy image capturing method including:
s101, determining the sum of Hamming distances of adjacent frames of images of an endoscopy video according to a preset image-taking frame number threshold in an endoscopy process; wherein the endoscopy video is a video which is observed by an endoscope in the endoscopy process and is displayed on a display of the endoscopy equipment;
s102, when the Hamming distance sum reaches a preset first Hamming distance threshold value, storing the image with the highest identification probability value or meeting the preset identification threshold value from the various categories of endoscopy images classified and identified in advance.
The image-taking frame number threshold determines the granularity of image acquisition; optionally, it is set to 10 frames. In practical implementation, the first Hamming distance threshold is mainly used to judge the user's (for example, a physician's) intention to trigger the device to start acquiring an image. The intention may include both the intention to capture an image and the intention to prepare to capture one: for example, during an endoscopic procedure a user who finds a view worth capturing will first slow the endoscope down (prepare to capture), and then manually trigger the device once a suitable image is observed.
Each category of endoscopy image may include images of a site category and a lesion category; the site categories include one or more of: esophagus, cardia, inverted cardia, fundus, corpus gastri, angle of stomach, antrum gastri, pylorus, pharynx, duodenal bulb, and descending duodenum.
In a specific implementation, the endoscopy video can be classified and recognized during the examination. The sum of the Hamming distances between adjacent frames of the endoscopy video is determined, and whether this sum reaches the first Hamming distance threshold is used to judge the user's image-capture intention; based on this judgment, the image with the highest recognition probability value, or an image meeting the preset recognition threshold, is saved from the classified endoscopy images of each category, helping the user automatically acquire the endoscopy images they need.
In some embodiments, when the hamming distance sum reaches a preset first hamming distance threshold, before saving an image with a highest recognition probability value or meeting a preset recognition threshold from the various categories of endoscopy images recognized by pre-classification, the method includes:
when the Hamming distance sum reaches a preset second Hamming distance threshold value, classifying and identifying each frame of image of the endoscopy video meeting the preset definition threshold value through a pre-constructed deep neural network model;
according to the classification identification, determining each category of endoscopy image and the corresponding identification probability of each category of endoscopy image;
and caching the various categories of endoscopy images and the corresponding recognition probabilities of the various categories of endoscopy images.
The method for determining the sum of the Hamming distance of each adjacent frame image of the endoscopy video according to the preset image-taking frame number threshold value comprises the following steps:
zooming each frame image of the endoscopy video according to a preset first image size;
converting the scaled each frame image into each frame gray level image;
zooming each frame of gray level image according to a preset second image size;
determining the Hamming distance sum of the scaled gray level images of each adjacent frame according to the image-taking frame number threshold, and taking the Hamming distance sum of the scaled gray level images of each frame as the Hamming distance sum of the images of each adjacent frame of the endoscopy video; the second image size is smaller than the first image size. The first image size may be 224 x 224 pixels and the second image size may be 8 x 8 pixels.
In this embodiment, the judgment against the second Hamming distance threshold further detects the user's intention to capture an image, and each frame of the endoscopy video is classified and recognized. This avoids missing the endoscopy images the user needs; moreover, because only frames meeting the preset definition threshold are classified and recognized, and each category of endoscopy image together with its recognition probability is determined, a representative image of the kind the user wants, such as a typical lesion or a typical site, can be acquired, reducing the user's later image-selection work.
The following describes an embodiment of the present invention in detail by way of a specific embodiment.
As shown in fig. 2, an endoscopic image acquisition method includes:
step 1, obtaining image frames of an endoscopy video.
Step 2, zooming the image and converting the image into a gray image; optionally, zooming each frame image of the endoscopy video according to a preset first image size; converting the scaled each frame image into each frame gray level image; zooming each frame of gray level image according to a preset second image size;
In this embodiment, to ensure the accuracy of the later deep neural network recognition, the image is first scaled to 224 × 224 and then converted into a grayscale image, so that 50176 pixels need to be processed: too low a resolution loses a large number of image features, while too high a resolution increases the computational load. On current mainstream computers this reaches a rate of about 10-20 frames per second (varying with the machine configuration), which meets the application requirement. In addition, since the Hamming distance only needs 8 × 8 pixels, the 224 × 224 image is scaled down to 8 × 8 before the Hamming distance is calculated, further saving computation.
Image scaling employs a Bilinear Interpolation (Bilinear Interpolation) algorithm.
The RGB-to-gray scale map adopts the following formula:
Gray = 0.30R + 0.59G + 0.11B.
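As a minimal sketch of steps 1-2, the following Python code scales a frame with bilinear interpolation and applies the grayscale formula above. The use of OpenCV (cv2) and the function name preprocess_frame are assumptions chosen for illustration, not part of the patent:

import cv2
import numpy as np

def preprocess_frame(frame_bgr):
    """Scale a frame to 224 x 224, convert it to grayscale with the weights
    given above, then scale the grayscale image to 8 x 8 for hashing."""
    # Bilinear interpolation, as stated in the description.
    resized = cv2.resize(frame_bgr, (224, 224), interpolation=cv2.INTER_LINEAR)
    b = resized[:, :, 0].astype(np.float32)
    g = resized[:, :, 1].astype(np.float32)
    r = resized[:, :, 2].astype(np.float32)
    gray224 = 0.30 * r + 0.59 * g + 0.11 * b      # Gray = 0.30R + 0.59G + 0.11B
    # Only the Hamming-distance computation needs the 8 x 8 version;
    # the 224 x 224 grayscale image is used for sharpness and classification.
    gray8 = cv2.resize(gray224, (8, 8), interpolation=cv2.INTER_LINEAR)
    return gray224, gray8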
Step 3, calculating the Hamming distance between the current frame and the previous frame, and accumulating the Hamming distances over 10 frames (the image-taking frame number threshold) to obtain the Hamming distance sum. Specifically, for two adjacent scaled grayscale images, the average gray value of each frame is determined; the gray value of each pixel is compared with the average gray value in turn; according to the comparison result, the pixel gray values are binarized to obtain a hash value for each scaled grayscale image; and the Hamming distance between the two adjacent scaled grayscale images is determined from their hash values.
For example, the average gray value Avg of the 8 × 8 image is calculated first.
Each pixel's gray value is then compared with the average gray value in turn: if the pixel's gray value is greater than the average, the pixel is set to 1, otherwise to 0. The pseudo code is as follows:
If(Pixel(x,y)>Avg)
Pixel(x,y)=1
Else
Pixel(x,y)=0
where Pixel(x, y) is the gray value of the pixel in row x, column y.
This results in a 64-bit hash of an 8 x 8 gray scale map.
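The binarization described above can be sketched in Python as follows (a hedged illustration; average_hash is a name chosen here, and the input is the 8 x 8 scaled grayscale image from step 2):

import numpy as np

def average_hash(gray8):
    """64-bit average hash of an 8 x 8 grayscale image: a pixel above the
    mean gray value contributes a 1 bit, otherwise a 0 bit."""
    avg = gray8.mean()
    bits = (gray8 > avg).astype(np.uint8).flatten()
    value = 0
    for bit in bits:              # pack the 64 bits into one integer
        value = (value << 1) | int(bit)
    return value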
The Hamming distance between the two images is then calculated as follows:
d(x, y) = sum over i of (x_i XOR y_i)
where x and y are the hash values of the two images, x_i and y_i denote their i-th bits, and XOR denotes the exclusive-or operation.
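A corresponding Python sketch of this formula, operating on the 64-bit integer hashes from the previous sketch:

def hamming_distance(hash_x, hash_y):
    """Hamming distance d(x, y): the number of differing bits, i.e. the
    population count of x XOR y."""
    return bin(hash_x ^ hash_y).count("1")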
Step 4, judging whether the Hamming distance sum reaches the second Hamming distance threshold; if not, executing step 1, and if so, executing step 5.
Step 5, calculating the image definition; optionally, filtering each frame image of the endoscopy video by adopting a band-pass filter and a high-pass filter in a gray variance algorithm; evaluating the definition value of each frame of image after filtering through the determined low-frequency part energy and high-frequency part energy; determining each frame of image of the endoscopy video meeting the definition threshold from the filtered frame of images.
For example, the image definition adopts an improved "gray variance algorithm", and an original gray variance algorithm (Brenner gradient function) only examines the gray difference between a judged point and one of adjacent pixel points, and the formula is as follows:
F(I) = sum over y and x of |I(x+2, y) - I(x, y)|^2
the Brenner gradient operator can be regarded as that template T = [ -101 ] and image pixels [ I (x, y) I (x +1, y) I (x +2, y) ] at corresponding positions are sequentially convolved, the template T = [ -101 ] is a band-pass filter, the Brenner gradient operator filters out low-frequency energy with a large proportion through the band-pass filter, and intermediate-frequency part energy in the image is reserved.
The improved algorithm employs two template filters to overcome the effect of the threshold on the traditional Brenner algorithm evaluation results. The two filter templates are a band-pass filter T = [ -101 ] and a high-pass filter G = [ 1-1 ], respectively. And respectively filtering the image by using two filter templates, and evaluating the definition of the image by calculating the energy of a low-frequency part and the energy of a high-frequency part. The formula is as follows:
[Formula image not reproduced in the text: the definition value F(I) combines the energies of the band-pass response I(x+2, y) - I(x, y) and the high-pass response I(x+1, y) - I(x, y) over all pixels.]
where F(I) is the definition value and I(x, y) is the gray value at the pixel point (x, y).
The improved algorithm filters out the large proportion of low-frequency components while retaining the medium- and high-frequency components that carry rich image detail. Blurred images are then removed according to the calculated image definition value.
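A hedged Python sketch of this sharpness evaluation follows. Since the exact way the two energies are combined appears only in the unreproduced formula image, the sum of squared filter responses used here is an assumption:

import numpy as np

def sharpness(gray):
    """Sharpness score from the two templates described above:
    band-pass T = [-1 0 1] -> response I(x+2, y) - I(x, y)
    high-pass G = [1 -1]   -> response I(x+1, y) - I(x, y)
    Combining the two filter energies as a sum of squares is an assumption."""
    g = gray.astype(np.float64)
    band_pass = g[2:, :] - g[:-2, :]   # I(x+2, y) - I(x, y), x indexing rows
    high_pass = g[1:, :] - g[:-1, :]   # I(x+1, y) - I(x, y)
    return float((band_pass ** 2).sum() + (high_pass ** 2).sum())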
Step 6, judging whether the image is clear or not; if not, go to step 9, and if yes, go to step 7.
Step 7, identifying image characteristics through a deep neural network model, and classifying and identifying the images through the image characteristics; optionally, each frame of image of the endoscopy video meeting the definition threshold is classified and identified through a pre-constructed MobileNet model.
In this embodiment, a classification network is used for the deep neural network image recognition. After comparing several classical networks, the MobileNet model was selected: it has fewer network parameters and runs fast, at the cost of slightly lower accuracy, which is acceptable because the main purpose is to measure how well the image features match rather than to recognize with high precision.
The MobileNet model is based on depthwise separable convolution, which decomposes a standard convolution into a depthwise convolution and a pointwise convolution (a 1 × 1 convolution kernel). The depthwise convolution applies one convolution kernel to each channel, and the 1 × 1 convolution is used to combine the outputs of the per-channel convolutions. This decomposition effectively reduces both the amount of computation and the size of the model.
With a standard convolution:
As a numerical example, if the input picture has dimensions 11 × 11 × 3 and the standard convolution is 3 × 3 × 3 × 16 (assuming stride 2 and padding 1), an output of 6 × 6 × 16 is obtained.
With a depthwise convolution plus a pointwise convolution:
With the same input, a depthwise convolution of dimensions 3 × 3 × 1 × 3 (the input has 3 channels and there are 3 kernels, applied channel by channel, which can be understood as a for loop) first produces an intermediate output of 6 × 6 × 3, and a 1 × 1 convolution of dimensions 1 × 1 × 3 × 16 then likewise produces an output of 6 × 6 × 16.
The MobileNet model is built from the depthwise separable convolutions described above (only the first layer is a standard convolution). Except for the final fully connected layer, every layer is followed by batch normalization and ReLU, and the output is finally fed to a softmax for classification.
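The building block can be illustrated with the following PyTorch sketch. It reproduces the numerical example above; the class name and the use of PyTorch are assumptions for illustration, not the patent's implementation:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One MobileNet-style block: depthwise 3x3 conv (one kernel per channel,
    via groups=in_channels) followed by a pointwise 1x1 conv, each with
    BatchNorm and ReLU, as described above."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# The numerical example from the text: an 11 x 11 input with 3 channels,
# stride 2, producing 16 output channels of size 6 x 6.
block = DepthwiseSeparableConv(3, 16, stride=2)
out = block(torch.randn(1, 3, 11, 11))
print(out.shape)  # torch.Size([1, 16, 6, 6])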
In this embodiment, the images are divided into 13 categories: 11 examination-site categories plus one 'lesion' category and one 'other' category. The network is then trained using the valuable images previously selected by users as the training set. 'Other' covers images in which no site can be recognized, and the 11 sites are esophagus, cardia, inverted cardia, fundus, corpus gastri, angle of stomach, antrum gastri, pylorus, pharynx, duodenal bulb, and descending duodenum.
After each recognition, images identified as the 'other' category are discarded and the rest are cached.
Step 8, caching the image and the recognition result.
Step 9, judging whether the Hamming distance sum reaches the first Hamming distance threshold; if so, executing step 10, otherwise executing step 1. Optionally, interval estimation of the percentile position is performed on N sampling values obtained in advance to obtain a confidence interval of a target Hamming distance threshold; after the maximum and minimum values of the confidence interval are removed, the median of the confidence interval is taken as the target Hamming distance threshold. When the sampling values are first sampling values, the target Hamming distance threshold is the first Hamming distance threshold; when the sampling values are second sampling values, the target Hamming distance threshold is the second Hamming distance threshold. During N rounds of endoscopy image acquisition, each time image capture is triggered, a Hamming distance sum determined according to the image-taking frame number threshold is taken as a first sampling value, and a Hamming distance sum determined at a position a preset M frames before the position corresponding to the first Hamming distance threshold is taken as a second sampling value; N and M are positive integers.
For example, in this embodiment the Hamming distance threshold is divided into two levels. One level only instructs the system to enter a pre-capture stage (indicating that the physician is preparing to capture an image); in this stage the definition calculation and the deep neural network recognition are started. This threshold is called the second Hamming distance threshold.
The other threshold indicates that the system should capture an image (indicating that the physician intends to capture), and is called the first Hamming distance threshold. Both thresholds are calculated automatically. The first threshold is calculated as follows.
Each time the user manually captures an image, the Hamming distance sum of the current 10 frames is taken as one sampling point, denoted X. After 500 sampling points X1, X2, ..., X500 have been recorded, a confidence interval for the first threshold is calculated with a percentile-position interval-estimation algorithm, with the formula as follows:
[Formula image not reproduced in the text: the confidence interval for the percentile position is estimated from the order statistics of the sampling points.]
where (Y_i, Y_j) are the order statistics obtained from the sampling points X_i, X_j.
This gives the confidence interval [Y_i, Y_j]. The maximum and minimum values of the confidence interval are then removed and the median is taken; this value is the first Hamming distance threshold.
Then, stepping back M (for example 30) frames from the position corresponding to the first Hamming distance threshold, the Hamming distance sum of the 10 frames recorded at that position is taken as one sampling point; 500 such sampling points are recorded, and the median of the confidence interval, calculated by the same method, is used as the second Hamming distance threshold.
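A hedged Python sketch of this threshold calculation follows. The percentile-position interval-estimation formula itself is not reproduced in the text, so the concrete percentile bounds and the helper name estimate_threshold are assumptions, used only to illustrate the "interval, drop extremes, take median" procedure:

import numpy as np

def estimate_threshold(samples, lower_pct=25.0, upper_pct=75.0):
    """Illustrative threshold estimation from 500 sampled Hamming-distance
    sums. The patent derives a confidence interval for a percentile from the
    order statistics of the samples; the exact interval formula is not
    reproduced, so the percentile bounds used here are assumptions. The
    threshold is the median of the interval after dropping its extremes."""
    y = np.sort(np.asarray(samples, dtype=np.float64))   # order statistics Y_1..Y_N
    lo, hi = np.percentile(y, [lower_pct, upper_pct])
    interval = y[(y >= lo) & (y <= hi)]
    if interval.size > 2:
        interval = interval[1:-1]    # remove the maximum and minimum of the interval
    return float(np.median(interval))

# first_threshold: sums sampled at the moment of each manual capture.
# second_threshold: sums sampled M (e.g. 30) frames before that position.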
Step 10, extracting and saving the image with the highest recognition probability value in the buffer queue. The image with the highest probability value among the cached images is selected for saving: a higher probability value means the image shares more feature points with high-value images and is therefore more valuable.
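Finally, a hedged end-to-end sketch of how steps 1-10 fit together, reusing the helper functions sketched above. All names are illustrative, and the threshold comparison direction is an interpretation: "reaches" is read here as the Hamming distance sum falling to the threshold, since a slowing scope produces smaller inter-frame distances; the patent does not state this explicitly:

from collections import deque

def acquisition_loop(frames, classify, first_threshold, second_threshold,
                     sharpness_threshold, window=10):
    """End-to-end sketch of steps 1-10, reusing preprocess_frame, average_hash,
    hamming_distance and sharpness from the earlier sketches. The classify
    callback is assumed to return (category, probability) for a 224 x 224
    grayscale image; none of these names come from the patent."""
    distances = deque(maxlen=window)   # distances between adjacent frames
    cache = []                         # (probability, frame) pairs
    prev_hash = None
    saved = []
    for frame in frames:
        gray224, gray8 = preprocess_frame(frame)
        frame_hash = average_hash(gray8)
        if prev_hash is not None:
            distances.append(hamming_distance(prev_hash, frame_hash))
        prev_hash = frame_hash
        if len(distances) < window:
            continue
        dist_sum = sum(distances)
        if dist_sum <= second_threshold:                     # pre-capture stage (step 4)
            if sharpness(gray224) >= sharpness_threshold:    # steps 5-6
                category, prob = classify(gray224)           # step 7
                if category != "other":                      # step 8
                    cache.append((prob, frame))
        if dist_sum <= first_threshold and cache:            # capture intent (step 9)
            best_prob, best_frame = max(cache, key=lambda item: item[0])
            saved.append(best_frame)                         # step 10
            cache.clear()
    return saved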
Through research, the inventors found that when a physician wants an image, they first slow down the endoscope and manually capture the image once a suitable view is observed, and that the images users want to capture are generally representative ones, such as a typical lesion or a typical site. On this basis, the embodiments of the invention design a technical method that uses the Hamming distance, image definition, and a deep neural network to help the user automatically acquire the required images. According to one embodiment of the invention, the user's image-capture intention is first judged with the Hamming distance algorithm; while the user is preparing for and confirming the capture, blurred images are removed by image definition evaluation, image feature points are then extracted from the remaining images with the deep neural network, and finally the image with the highest feature-point match is selected and saved. This not only improves endoscopy efficiency and the user's working efficiency, but also reduces the labor burden of endoscopy.
Example two
An embodiment of the present invention provides an endoscopy device, the endoscopy device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when being executed by the processor, realizes the steps of the endoscopic image acquisition method according to any one of the embodiments.
EXAMPLE III
An embodiment of the present invention provides a computer-readable storage medium, on which an endoscopic image acquisition program is stored, and when the endoscopic image acquisition program is executed by a processor, the steps of the endoscopic image acquisition method according to any one of the embodiments are implemented.
In the concrete implementation process of the second embodiment to the third embodiment, reference may be made to the first embodiment, and corresponding technical effects are achieved.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. An endoscopy image capturing method, comprising:
in the endoscopy process, determining the sum of Hamming distances of adjacent frames of images of an endoscopy video according to a preset image-taking frame number threshold;
when the Hamming distance sum reaches a preset second Hamming distance threshold value, classifying and identifying each frame of image of the endoscopy video meeting the preset definition threshold value through a pre-constructed deep neural network model;
when the Hamming distance sum reaches a preset first Hamming distance threshold, storing an image with the highest identification probability value or meeting a preset identification threshold from various categories of endoscopy images classified and identified in advance;
performing interval estimation of the percentile position on N sampling values obtained in advance to obtain a confidence interval of a target Hamming distance threshold; after the maximum value and the minimum value of the confidence interval are removed, taking the median of the confidence interval as the target Hamming distance threshold;
when the sampling values are first sampling values, the target Hamming distance threshold is the first Hamming distance threshold; when the sampling values are second sampling values, the target Hamming distance threshold is the second Hamming distance threshold;
during N rounds of endoscopy image acquisition, each time image capture is triggered, a Hamming distance sum determined according to the image-taking frame number threshold is taken as a first sampling value each time, and a Hamming distance sum determined at a position a preset M frames before the position corresponding to the first Hamming distance threshold is taken as a second sampling value each time; N and M are positive integers.
2. An endoscopy image capturing method according to claim 1, wherein said storing, when said hamming distance sum reaches a preset first hamming distance threshold, an image having a highest recognition probability value or satisfying a preset recognition threshold from among the classes of endoscopy images recognized by pre-classification, comprises:
according to the classification identification, determining each category of endoscopy image and the corresponding identification probability of each category of endoscopy image;
and caching the various categories of endoscopy images and the corresponding recognition probabilities of the various categories of endoscopy images.
3. An endoscopic image acquisition method according to claim 2, further comprising:
filtering each frame image of an endoscopy video by adopting a band-pass filter and a high-pass filter in a gray variance algorithm; evaluating the definition value of each frame of image after filtering through the determined low-frequency part energy and high-frequency part energy;
determining each frame of image of the endoscopy video meeting the definition threshold from the filtered frame of images.
4. An endoscopic image acquisition method according to claim 3, wherein said gray-scale variance algorithm has the following formula:
[Formula image not reproduced in the text: the definition value F(I) is computed from the band-pass response I(x+2, y) - I(x, y) and the high-pass response I(x+1, y) - I(x, y) over all pixels.]
where F(I) is the definition value and I(x, y) is the gray value at the pixel point (x, y).
5. An endoscopy image capturing method according to claim 2, wherein each category of image comprises images of a site category and a lesion category; the site categories include one or more of: esophagus, cardia, inverted cardia, fundus, corpus gastri, angle of stomach, antrum gastri, pylorus, pharynx, duodenal bulb and descending duodenum;
the method for classifying and identifying each frame of image of the endoscopy video meeting the preset definition threshold through the pre-constructed deep neural network model comprises the following steps:
and classifying and identifying each frame of image of the endoscopy video meeting the definition threshold value through a pre-constructed MobileNet model.
6. An endoscopy image capturing method according to any of claims 1-5, wherein said determining a sum of hamming distances between adjacent frames of images of an endoscopy video according to a preset threshold number of frames of images, comprises:
zooming each frame image of the endoscopy video according to a preset first image size;
converting the scaled each frame image into each frame gray level image;
zooming each frame of gray level image according to a preset second image size;
determining the Hamming distance sum of the scaled gray level images of each adjacent frame according to the image-taking frame number threshold, and taking the Hamming distance sum of the scaled gray level images of each frame as the Hamming distance sum of the images of each adjacent frame of the endoscopy video; the second image size is smaller than the first image size.
7. An endoscopic image acquisition method according to claim 6, further comprising:
determining the gray level average value of each frame of the scaled gray level images of two adjacent frames;
sequentially comparing the gray value of each pixel of each frame of the scaled gray image with the average gray value;
according to the comparison result, carrying out binarization on the pixel gray value of each frame of the scaled gray image to obtain a hash value of each frame of the scaled gray image;
and determining the Hamming distance of the scaled gray level images of the two adjacent frames according to the Hash values of the scaled gray level images of the two adjacent frames.
8. An endoscopy device, the endoscopy device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when being executed by the processor, carries out the steps of the endoscopic image acquisition method according to any one of claims 1 to 7.
CN202111244507.8A 2021-10-26 2021-10-26 Endoscopic image acquisition method and device Active CN113679327B (en)

Priority Applications (1)

Application Number: CN202111244507.8A (granted as CN113679327B)
Priority Date: 2021-10-26
Filing Date: 2021-10-26
Title: Endoscopic image acquisition method and device


Publications (2)

Publication Number Publication Date
CN113679327A CN113679327A (en) 2021-11-23
CN113679327B true CN113679327B (en) 2022-02-18

Family

ID=78588063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111244507.8A Active CN113679327B (en) 2021-10-26 2021-10-26 Endoscopic image acquisition method and device

Country Status (1)

Country Link
CN (1) CN113679327B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188460B (en) * 2023-04-24 2023-08-25 青岛美迪康数字工程有限公司 Image recognition method and device based on motion vector and computer equipment


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6469558B2 (en) * 2015-10-07 2019-02-13 富士フイルム株式会社 Endoscope system and method for operating endoscope system
US10013765B2 (en) * 2016-08-19 2018-07-03 Mitsubishi Electric Research Laboratories, Inc. Method and system for image registrations
CN106682092A (en) * 2016-11-29 2017-05-17 深圳市华尊科技股份有限公司 Target retrieval method and terminal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102491A (en) * 2018-06-28 2018-12-28 武汉大学人民医院(湖北省人民医院) A kind of gastroscope image automated collection systems and method
CN110309329A (en) * 2019-05-09 2019-10-08 河南萱闱堂医疗信息科技有限公司 The method of Weigh sensor and record alimentary canal tissue and foreign matter in endoscopy
CN112348746A (en) * 2019-08-07 2021-02-09 新加坡国立大学 Endoscopic image dataset construction method
CN111127478A (en) * 2019-12-13 2020-05-08 上海众源网络有限公司 View block segmentation method and device
CN111209567A (en) * 2019-12-30 2020-05-29 北京邮电大学 Method and device for judging perceptibility of improving robustness of detection model
CN111784668A (en) * 2020-07-01 2020-10-16 武汉楚精灵医疗科技有限公司 Digestive endoscopy image automatic freezing method based on perceptual hash algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Qicong Dong, "Confidence-based Camera Calibrat", Multimedia Tools and Applications, 2020-12-31, pp. 20393-23109. *

Also Published As

Publication number Publication date
CN113679327A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN110047082B (en) Deep learning-based pancreatic neuroendocrine tumor automatic segmentation method and system
JP5094036B2 (en) Endoscope insertion direction detection device
CN111275041B (en) Endoscope image display method and device, computer equipment and storage medium
CN110738655A (en) Image report generation method, device, terminal and storage medium
CN113379693B (en) Capsule endoscope key focus image detection method based on video abstraction technology
TWI432168B (en) Endoscope navigation method and endoscopy navigation system
CN110265142B (en) Auxiliary diagnosis system for restoration image of lesion area
CN108615045B (en) Method, device and equipment for screening images shot by capsule endoscopy
CN113129287A (en) Automatic lesion mapping method for upper gastrointestinal endoscope image
CN113679327B (en) Endoscopic image acquisition method and device
CN113888518A (en) Laryngopharynx endoscope tumor detection and benign and malignant classification method based on deep learning segmentation and classification multitask
CN110619318A (en) Image processing method, microscope, system and medium based on artificial intelligence
CN114531549B (en) Image acquisition method, electronic device, and computer-readable storage medium
JP2011024628A (en) Image processor, image processing program, and image processing method
CN112927171A (en) Single image deblurring method based on generation countermeasure network
CN116385340A (en) Medical endoscope image rapid defogging method and system
CN113744266B (en) Method and device for displaying focus detection frame, electronic equipment and storage medium
JP4738970B2 (en) Image processing apparatus, image processing method and program thereof
CN114049934B (en) Auxiliary diagnosis method, device, system, equipment and medium
CN112022067B (en) Image processing method, electronic device and readable storage medium
Arnold et al. Indistinct frame detection in colonoscopy videos
JP6935663B1 (en) Oral mucosal disease diagnosis support system, method and program
JP5305687B2 (en) X-ray video imaging system
CN114581402A (en) Capsule endoscope quality inspection method, device and storage medium
CN113139944A (en) Deep learning-based colposcopic image classification computer-aided diagnosis system and method

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant