CN113393449A - Endoscope video image automatic storage method based on artificial intelligence

Endoscope video image automatic storage method based on artificial intelligence

Info

Publication number
CN113393449A
Authority
CN
China
Prior art keywords
image
frame
video
similarity
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110710489.1A
Other languages
Chinese (zh)
Inventor
俞晔
方圆圆
姜婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai First Peoples Hospital
Original Assignee
Shanghai First Peoples Hospital
Application filed by Shanghai First Peoples Hospital filed Critical Shanghai First Peoples Hospital
Priority to CN202110710489.1A priority Critical patent/CN113393449A/en
Publication of CN113393449A publication Critical patent/CN113393449A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/40 Analysis of texture
    • G06T 7/41 Analysis of texture based on statistical description of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10068 Endoscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of medical image storage and discloses an endoscope video image automatic storage method based on artificial intelligence, which comprises the following steps. S1: selecting a video frame from an endoscope video as a reference frame. S2: starting from the reference frame and moving toward the preceding frames, comparing the similarity of each pair of adjacent frames one by one, stopping when a pair of adjacent frames whose similarity is lower than a similarity threshold appears, and selecting the later frame of the last compared pair as the first frame. S3: starting from the reference frame and moving toward the following frames, comparing the similarity of each pair of adjacent frames one by one until a pair of adjacent frames whose similarity is lower than the similarity threshold appears, and selecting the earlier frame of the last compared pair as the tail frame. S4: selecting all video frames from the first frame to the last frame as target images. S5: constructing and training a convolutional neural network, and compressing the target images with the convolutional neural network. S6: storing the compressed target images. The method reduces the space occupied by the required video frames and improves the space utilization rate of the memory.

Description

Endoscope video image automatic storage method based on artificial intelligence
Technical Field
The invention relates to the technical field of medical image storage, in particular to an endoscope video image automatic storage method based on artificial intelligence.
Background
Capsule endoscopy is a commonly used examination method for gastrointestinal diseases: it visually shows the condition of the patient's gastrointestinal tract, is non-invasive and reduces the patient's discomfort. In an existing capsule endoscopy examination, the patient swallows the capsule endoscope, the capsule passes through the part to be inspected by means of the patient's gastrointestinal peristalsis, and the captured video is sent wirelessly to an electronic device for medical personnel to watch. Storing the video frames of the entire capsule endoscopy video in the hospital database occupies a large amount of storage space and makes it inconvenient for medical personnel to search and view the images.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an endoscope video image automatic storage method based on artificial intelligence.
In order to achieve the above purpose, the invention provides the following technical scheme:
An endoscope video image automatic storage method based on artificial intelligence is characterized by comprising the following steps. S1: selecting a video frame from an endoscope video as a reference frame. S2: starting from the reference frame and moving toward the preceding frames, comparing the similarity of each pair of adjacent frames one by one, stopping when a pair of adjacent frames whose similarity is lower than a similarity threshold appears, and selecting the later frame of the last compared pair as the first frame. S3: starting from the reference frame and moving toward the following frames, comparing the similarity of each pair of adjacent frames one by one until a pair of adjacent frames whose similarity is lower than the similarity threshold appears, and selecting the earlier frame of the last compared pair as the tail frame. S4: selecting all video frames from the first frame to the last frame as target images. S5: constructing and training a convolutional neural network, and compressing the target images with the convolutional neural network. S6: storing the compressed target images.
In the present invention, preferably, the comparing the similarity between two adjacent frames in S2 and S3 includes: s21: respectively extracting color features of two video frames; s22: judging whether the color characteristics of the two video frames are similar, if so, executing S23, otherwise, executing S26; s23: respectively extracting texture features of two video frames; s24: judging whether the texture features of the two video frames are similar, if so, executing S25, otherwise, executing S26; s25: identifying two video frames as similar images; s26: two video frames are not considered to be similar images.
In the present invention, preferably, S21 includes: S211: converting the video frame from the RGB color space to the HSI color space; S212: sampling the color saturation component of the video frame in the HSI color space to form a saturation vector; S213: carrying out standard normalization processing on the saturation vector to form the color feature.
In the present invention, preferably, S23 includes: s231: establishing an image pyramid for tone components of a video frame of an HSI color space; s232: and each layer of the image pyramid adopts a local binary mode operator to extract texture features, so that the texture features are formed.
In the present invention, preferably, the determination of whether the color features of the two video frames are similar in S22 is implemented by comparing cosine values of vector angles of the color features of the two video frames.
In the present invention, preferably, the determining whether the texture features of the two video frames are similar in S24 is implemented by a method of statistical matching of global texture features.
In the present invention, preferably, the compressing the target image by using the convolutional neural network in S5 includes: s51: carrying out feature extraction on the target image to form a feature image corresponding to the target image; s52: removing redundant information in the characteristic image to form a concise characteristic image; s53: and reconstructing the concise feature image to form a reconstructed image corresponding to the target image.
In the present invention, preferably, S51 includes: s511: performing convolution on the target image by utilizing two cascaded first convolution layers to form a first characteristic image; s512: learning the first characteristic image by utilizing three cascaded residual modules to form a second characteristic image; s513: and forming a third characteristic image by convolving the second characteristic image by using a second convolution layer.
In the present invention, it is preferable that the removing of redundant information in the feature image in S52 is performed by a Round function.
In the present invention, preferably, S53 includes: S531: convolving the concise feature image by using a third convolution layer to form a fourth feature image, wherein the convolution kernel size of the third convolution layer is 1 × 1, the number of convolution kernels is 512, and the convolution step length is 1; S532: convolving the fourth characteristic image by using a sub-pixel convolution layer to form a fifth characteristic image; S533: learning the fifth characteristic image by using three cascaded residual modules to form a sixth characteristic image; S534: convolving the sixth characteristic image by using a sub-pixel convolution layer to form a seventh characteristic image; S535: convolving the seventh characteristic image by using a sub-pixel convolution layer to form a reconstructed image.
Compared with the prior art, the invention has the beneficial effects that:
the endoscope video image automatic storage method based on artificial intelligence rapidly intercepts the required video frames in the endoscope video by manually selecting the reference frames and compresses the video frames through the convolutional neural network, so that the storage space occupied by the required video frames is obviously reduced, the space utilization rate of a storage is improved, and the searching by medical personnel is facilitated; the similarity of the video frames is compared by adopting the color features and the texture features, so that the accuracy of judging the similarity is ensured; and the convolutional neural network is adopted for image compression, so that the image compression efficiency is improved.
Drawings
FIG. 1 is a flow chart of an artificial intelligence based endoscopic video image automatic storage method.
Fig. 2 is a flowchart of comparing the similarity between two adjacent frames in S2 and S3 in the method for automatically storing endoscopic video images based on artificial intelligence.
Fig. 3 is a flowchart of S21 in the artificial intelligence based endoscopic video image automatic storage method.
Fig. 4 is a flowchart of S23 in the artificial intelligence based endoscopic video image automatic storage method.
Fig. 5 is a flowchart of compressing the target image by using the convolutional neural network in S5 of the method for automatically storing an endoscopic video image based on artificial intelligence.
Fig. 6 is a flowchart of S51 in the artificial intelligence based endoscopic video image automatic storage method.
Fig. 7 is a flowchart of S53 in the artificial intelligence based endoscopic video image automatic storage method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1 to 3, a preferred embodiment of the present invention provides an endoscope video image automatic storage method based on artificial intelligence, including:
s1: one video frame is selected from the endoscopic video as a reference frame.
The endoscopic video shot by the capsule endoscope contains footage of the esophagus, stomach, small intestine, large intestine and other parts of the human body; the video is long, and viewing it in full takes several hours, whereas an examination is usually aimed at one specific part. An image segment therefore needs to be intercepted from the whole endoscopic video for disease diagnosis. The interception can be carried out in the semi-automatic manner of this embodiment: the user selects a video frame containing the examined part as the reference frame, the boundaries of the video segment are then determined from the reference frame, and the required video segment is intercepted accurately.
S2: and comparing the similarity of two adjacent frames from the reference frame to the reference frame one by one, stopping until two adjacent frames with the similarity lower than a similarity threshold appear, and selecting the next frame in the two adjacent frames which are compared at last as the first frame.
In this step, if the reference frame selected by the user is the N-th frame of the endoscope video, its preceding frame is the (N-1)-th frame. The similarity of the two frames is compared; if the similarity is lower than the similarity threshold, the two frames are determined to be dissimilar, the comparison stops, and the N-th frame is selected as the first frame. If the similarity is not lower than the similarity threshold, the two frames are determined to be similar, and the similarity comparison continues with the (N-1)-th and (N-2)-th frames. Similarity comparison of adjacent frames proceeds in this way until two adjacent video frames whose similarity is lower than the similarity threshold are found, namely the (X-1)-th frame and the X-th frame, and the X-th frame is selected as the first frame.
S3: and comparing the similarity of two adjacent frames from the reference frame to the next frame by frame until two adjacent frames with the similarity lower than the similarity threshold appear, and selecting the previous frame in the two adjacent frames with the last comparison as the tail frame.
The procedure of this step is similar to S2, except that the comparison now proceeds toward the following frames. If the reference frame selected by the user is the N-th frame of the endoscope video, its following frame is the (N+1)-th frame. The similarity of the two frames is compared; if the similarity is lower than the similarity threshold, the two frames are determined to be dissimilar, the comparison stops, and the N-th frame is selected as the tail frame. If the similarity is not lower than the similarity threshold, the two frames are determined to be similar, and the similarity comparison continues with the (N+1)-th and (N+2)-th frames. Similarity comparison of adjacent frames proceeds in this way until two adjacent video frames whose similarity is lower than the similarity threshold are found, namely the Y-th frame and the (Y+1)-th frame, and the Y-th frame is selected as the tail frame.
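By way of illustration only, the boundary search of S2 and S3 can be sketched in a few lines of Python; the helper frames_are_similar stands for the color/texture comparison of S21 to S26 described below, and all names here are illustrative rather than part of the patented method.

    def find_segment(frames, ref_idx, frames_are_similar):
        """Return (first, last) frame indices of the clip around the reference frame."""
        # S2: walk toward the preceding frames until an adjacent pair is
        # dissimilar; the later frame of that pair becomes the first frame.
        first = ref_idx
        while first > 0 and frames_are_similar(frames[first - 1], frames[first]):
            first -= 1
        # S3: walk toward the following frames until an adjacent pair is
        # dissimilar; the earlier frame of that pair becomes the tail frame.
        last = ref_idx
        while last < len(frames) - 1 and frames_are_similar(frames[last], frames[last + 1]):
            last += 1
        return first, last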
In this embodiment, specifically, the similarity comparing two adjacent frames in S2 and S3 includes:
s21: and respectively extracting the color features of the two video frames.
Among the feature extraction algorithms of computer vision, color features do not require a large amount of computation: the pixel values of the digital image only need to be converted and expressed as numerical values, which makes color a good low-complexity feature. Common color feature extraction methods include the color histogram and the color moments of an image.
Although the endoscope video image is in the RGB color space, each component of the RGB color space is closely related to brightness: whenever the brightness changes, all three components change with it, so the three components are not independent. The illumination conditions inside the human body, however, vary greatly, so the RGB color space is not suitable for processing endoscope video images.
To this end, the video frames may be converted to the HSI color space. The HSI color space has the obvious advantage that the intensity component I is separated from the color saturation S and the hue component H, so that the problem of uneven brightness distribution of the images of the capsule endoscopy can be effectively solved only by extracting and analyzing the characteristics of the color saturation S. Specifically, S21 includes:
s211: the video frame is converted from the RBG color space to the HSI color space.
The conversion of the original video frame from the RGB color space to the HSI color space consists of three parts, the hue component H, the color saturation S and the intensity I. In the standard form of the conversion, the hue component H is

    H = θ          when B ≤ G
    H = 360° - θ   when B > G

where

    θ = arccos{ [(R - G) + (R - B)] / [ 2 · sqrt( (R - G)² + (R - B)(G - B) ) ] }

The color saturation S is calculated by the formula

    S = 1 - 3 · min(R, G, B) / (R + G + B)

The intensity I is calculated by the formula

    I = (R + G + B) / 3
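For illustration, a minimal NumPy sketch of this standard conversion is given below; it assumes an 8-bit RGB frame and returns H, S and I scaled to [0, 1], and it is not taken from the patent itself.

    import numpy as np

    def rgb_to_hsi(frame_rgb):
        """Standard RGB -> HSI conversion; returns H, S, I scaled to [0, 1]."""
        rgb = frame_rgb.astype(np.float64) / 255.0
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        eps = 1e-8
        num = 0.5 * ((r - g) + (r - b))
        den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
        theta = np.arccos(np.clip(num / den, -1.0, 1.0))
        h = np.where(b <= g, theta, 2 * np.pi - theta) / (2 * np.pi)  # hue component H
        i = (r + g + b) / 3.0                                         # intensity I
        s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)         # saturation S
        return h, s, i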
S212: the color saturation components of the video frames of the HSI color space are sampled to form a saturation vector.
After the color space conversion is completed, the saturation component of the video frame is sampled at intervals: sampling proceeds from left to right and from top to bottom, taking a fixed number of pixels (chosen as required, for example 5) as the row-column interval. After the sampling points of the invalid edge region are removed, an n-dimensional vector C is obtained; this vector C is the saturation vector of the endoscope image.
S213: and carrying out standard normalization processing on the saturation vector to form color characteristics.
In order to improve the robustness of the extracted saturation vector, the vector C needs to be normalized by the following formula:
    E_i = (C_i - μ) / σ

where μ is the mean of all components of the vector C, σ is the standard deviation of the components of C, C_i is the i-th component of C, and E_i is the i-th component after standard normalization; the normalized vector E is the color feature of the video frame.
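Continuing the sketch above, S212 and S213 could look as follows; the sampling interval of 5 pixels and the edge handling (omitted here) are assumptions.

    import numpy as np

    def color_feature(frame_rgb, step=5):
        """Saturation-vector colour feature of one frame (sketch of S212-S213)."""
        _, s, _ = rgb_to_hsi(frame_rgb)            # conversion from the S211 sketch
        c = s[::step, ::step].ravel()              # S212: interval sampling of saturation
        return (c - c.mean()) / (c.std() + 1e-8)   # S213: standard normalisation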
S22: and judging whether the color features of the two video frames are similar, if so, executing S23, and if not, executing S26.
In this step, in order to determine the color similarity of two adjacent frames, a similarity measure is applied to the color feature vectors of the video frames. A similarity measure is a comprehensive assessment of how similar two things are based on their features. Common similarity measures include the Euclidean distance, the cosine of the included angle, the Manhattan distance, the Mahalanobis distance and the correlation coefficient. This embodiment determines whether the color features of two video frames are similar by comparing the cosine of the angle between their color feature vectors. The cosine of the included angle is calculated as

    AS = (E · E') / (|E| · |E'|)

that is, the dot product of the color feature vectors E and E' of two adjacent video frames divided by the product of their moduli; the result AS is the cosine of the angle between the two vectors, with a value range of [-1, 1]. The larger the value of AS, the smaller the angle θ between the two feature vectors E and E', and the more similar the color features of the two video frames are. In the similarity judgment, the similarity threshold for the color features is determined according to actual needs.
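The angle-cosine comparison of S22 can then be written as below; the threshold value used here is only an assumed placeholder.

    import numpy as np

    def color_similar(e1, e2, threshold=0.9):
        """True if the angle cosine AS of two colour-feature vectors exceeds the threshold."""
        cos = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12)
        return cos >= threshold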
S23: and respectively extracting texture features of the two video frames.
The video frames contain abundant texture information, and the texture characteristics can be extracted to be used as a basis for judging the similarity of the two video frames. Since the hue H of the HSI space is not sensitive to illumination and the image contains rich information, texture analysis is performed for the hue H. The extraction method includes an LBP (local binary pattern) feature method, a gray level co-occurrence matrix method and the like. The present embodiment employs an LBP feature method. Specifically, S23 includes:
s231: an image pyramid is established for the tone components of the video frame of the HSI color space.
In order to extract the features of textures of different coarseness in a video frame, textures of different scales are expressed by an image pyramid. Let the total number of layers of the image pyramid be n; the image of the (m + 1)-th layer is obtained by filtering and downsampling the image of the m-th layer, which can be written as

    G_{m+1}(i, j) = Σ_{s,t} ω(s, t) · G_m(a·i + s, a·j + t)

where G_m is the image of the m-th layer of the pyramid, G_{m+1} is the image of the (m + 1)-th layer, 0 ≤ m ≤ 2, and G_0 (m = 0) is the original image of the video frame; ω is the template of the mean filter, whose size is a × a, and a is the row-column interval of the downsampling. A mean filter is used in the filtering step because its computation cost is small and the constructed pyramid still has a good visual effect.
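A small sketch of a three-layer pyramid built with an a × a mean filter and stride-a downsampling follows; a = 2 is assumed here for illustration.

    import numpy as np

    def build_hue_pyramid(hue, levels=3, a=2):
        """Mean-filter-and-downsample pyramid of the hue component (sketch of S231)."""
        pyramid = [hue.astype(np.float64)]
        for _ in range(levels - 1):
            g = pyramid[-1]
            h, w = g.shape
            g = g[: h - h % a, : w - w % a]            # crop to a multiple of a
            blocks = g.reshape(h // a, a, w // a, a)   # split into a x a blocks
            pyramid.append(blocks.mean(axis=(1, 3)))   # mean filter + downsample
        return pyramid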
S232: and each layer of the image pyramid adopts a local binary mode operator to extract texture features, so that the texture features are formed.
The local binary pattern (LBP) operator works on a rectangular window of size 3 × 3. With the central pixel value g_c as the threshold, the p pixel values g_i in a circular neighborhood of radius r are binarized: a point smaller than the central pixel value g_c is binarized to 0, and a point greater than or equal to the central pixel value is binarized to 1, which yields an eight-bit binary number. The binarization result at the position of g_i is then weighted by 2^i and the weighted results are summed to obtain the corresponding LBP value:

    LBP = Σ_{i=0}^{p-1} s(g_i - g_c) · 2^i

where s(x) is the binarization function, defined as

    s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0

Extracting texture features with the local binary pattern operator LBP at each layer of the image pyramid yields the LBP feature spectra O_m, where m = 1, 2, 3 corresponds to the three layers of the pyramid; these spectra are the multi-scale local texture features extracted from the hue component H of the video frame.
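For illustration, the 8-neighbour, radius-1 LBP spectra O_m of the pyramid layers can be computed with scikit-image; the choice of library is an assumption and not part of the patent.

    from skimage.feature import local_binary_pattern

    def lbp_spectra(pyramid):
        """LBP feature spectrum O_m for each pyramid layer (sketch of S232)."""
        return [local_binary_pattern(layer, P=8, R=1, method='default')
                for layer in pyramid]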
S24: and judging whether the texture features of the two video frames are similar, if so, executing S25, and if not, executing S26.
In this step, in order to determine the texture similarity between two adjacent frames of images, similarity measurement needs to be performed on the texture feature vectors of the video frames. Specifically, the texture similarity measurement is implemented by a statistical matching method of global texture features.
The method first performs a global analysis of the extracted feature spectra O_m. Texture feature extraction on the hue component of the image pyramid of a video frame yields the LBP feature spectra O_m, m = 1, 2, 3. Texture statistics on the feature spectrum of the m-th layer give the frequency y_mn of the texture code LBP_m = n, with n = 0, 1, 2, 3, 4, and the counted frequencies form the texture feature vector Y_m of that layer. Carrying out the texture statistics for the feature spectrum of every layer gives the global texture feature vector [Y_1, Y_2, Y_3].

The similarity of the global texture features is measured with a weighted Manhattan distance. After the feature vectors [Y_1, Y_2, Y_3] of two adjacent frames have been obtained, the similarity of their texture features is calculated as a weighted Manhattan distance of the form

    D_m = (λ_1 / β_m) · Σ_n | y_mn - y'_mn |

where y_mn and y'_mn are the frequencies of the LBP code LBP_m = n in the m-th layer feature spectra of the two video frames, β_m is the number of pixel points in the texture-effective area of the m-th layer of the image pyramid, λ_1 is a weighting coefficient, and D_m represents the global texture similarity of the m-th pyramid layer of the two video frames. In the similarity judgment, the similarity threshold for the texture features is determined according to actual needs.
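A sketch of the per-layer statistics and of the weighted Manhattan comparison follows; the binning of the LBP codes, the choice of β_m, and the values of λ_1 and the threshold are assumptions made for illustration.

    import numpy as np

    def texture_feature(spectra, n_codes=256):
        """Frequency vectors Y_m of the LBP codes, one per pyramid layer (sketch)."""
        return [np.bincount(s.astype(np.int64).ravel(), minlength=n_codes)
                for s in spectra]

    def texture_similar(ys_a, ys_b, lam=1.0, threshold=0.1):
        """Weighted Manhattan distance D_m per layer; frames are treated as similar
        when every layer distance stays below the (assumed) threshold."""
        for ya, yb in zip(ys_a, ys_b):
            beta = max(int(ya.sum()), 1)           # pixel count of the layer's texture area
            d = lam * np.abs(ya - yb).sum() / beta
            if d > threshold:
                return False
        return True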
S25: two video frames are considered similar images.
S26: two video frames are not considered to be similar images.
Through the steps of S21, S22, S23, and S24, it can be concluded whether or not two video frames are similar images.
S4: and selecting all video frames from the first frame to the last frame as target images.
After the selection of the first frame and the last frame is completed, all the video frames between the two frames are video images of the capsule endoscope at the part to be inspected, so the video frames should be determined as target images for medical personnel to diagnose.
S5: and constructing and training a convolutional neural network, and compressing the target image by using the convolutional neural network.
The core of compressing and encoding a video frame with a convolutional neural network is to convert the original image information into a feature image through convolutional downsampling. Compared with the original image, the feature image is smaller, has lower information entropy and is better suited to binary encoding. The feature image is then restored to a reconstructed image by a deconvolution operation, so that a large amount of redundant information is removed while the key information is retained; the image is thereby compressed and occupies less storage space. Specifically, S5 includes:
s51: and performing feature extraction on the target image to form a feature image corresponding to the target image.
In the step, an original target image is input into a convolution layer, the convolution layer performs convolution operation to extract a characteristic image, and each convolution is a characteristic extraction process. Specifically, S51 includes:
s511: and performing convolution on the target image by utilizing the two cascaded first convolution layers to form a first characteristic image.
In this step, the shallow features of the target image are extracted by the two first convolution layers, and the resulting image is recorded as the first feature image for ease of distinction. The number of channels of the first convolution layer is 128, the convolution kernel size is 5 × 5, the step size is 2, and each first convolution layer is followed by a ReLU activation layer. After an image of size M × N passes through the two first convolution layers, 128 first feature maps of size M/4 × N/4 are obtained.
S512: and learning the first characteristic image by utilizing three cascaded residual modules to form a second characteristic image.
In this step, the first feature image is learned by three cascaded residual modules to extract deeper features, and the resulting image is recorded as the second feature image for ease of distinction. Each residual module contains several residual convolution layers, with a skip connection between its input and output. After the 128 first feature images of size M/4 × N/4 pass through the three cascaded residual modules, 128 second feature images of size M/4 × N/4 are obtained.
S513: and forming a third characteristic image by convolving the second characteristic image by using a second convolution layer.
This step convolves the second feature image with a second convolution layer to change the number of channels, and the resulting image is recorded as the third feature image for ease of distinction. After the 128 second feature images of size M/4 × N/4 pass through the second convolution layer, whose convolution kernel size is 5 × 5, step size is 2 and number of convolution kernels is F, F third feature images of size M/8 × N/8 are obtained.
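A PyTorch sketch of the S511-S513 encoder is given below; the internal layout of the residual modules and the value of F (64 here) are assumptions, since the text does not fix them.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Residual module with a skip connection (internal layout assumed)."""
        def __init__(self, ch=128):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)

    class Encoder(nn.Module):
        """Sketch of S511-S513: two stride-2 5x5 convolutions, three residual
        modules, then a stride-2 5x5 convolution with F output channels."""
        def __init__(self, f_channels=64):
            super().__init__()
            self.first = nn.Sequential(
                nn.Conv2d(3, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(128, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True))
            self.res = nn.Sequential(*[ResidualBlock(128) for _ in range(3)])
            self.second = nn.Conv2d(128, f_channels, 5, stride=2, padding=2)

        def forward(self, x):          # x: (B, 3, M, N) target image
            x = self.first(x)          # (B, 128, M/4, N/4) first feature image
            x = self.res(x)            # (B, 128, M/4, N/4) second feature image
            return self.second(x)      # (B, F, M/8, N/8) third feature image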
S52: and removing redundant information in the characteristic image to form a concise characteristic image.
The step can remove redundant information in the third characteristic image by inputting the third characteristic image into a specific estimation function for processing, and key information is reserved to obtain an indirect characteristic image. Preferably, the present embodiment implements the above process by using a Round function. The Round function is a quantization function, and quantizes the input third feature image and rounds the floating point number of the third feature image to an integer number. The Round function quantizes the input floating point number into an integer, and directly quantizes each third feature map during network forward propagation. The derivative of the quantization function itself is mostly 0 and is not conducive in other places, such as directly using the Round function itself to calculate the gradient and applying it to the network, which will make the gradient unable to be transmitted to the next layer through the Round layer. Therefore, it is necessary to approximate the Round function as a continuous function r (x), and to replace the Round derivative with the derivative of r (x) in the reverse propagation, i.e. to approximate the Round function as a continuous function r (x), i.e. to use the derivative of r (x) in the reverse propagation
Figure BDA0003133523320000121
Where round (x) is a quantization function, and r (x) is an approximation function of round (x). It can take round (x) r (x) x, where round (x) is used for the forward process when the network is actually trained, and r (x) x is used for the reverse propagation of the gradient to calculate the reverse derivative. When r (x) is equal to x, the derivative of r (x) to x is 1, so that after the gradient of the upper layer of the network passes through the Round function according to the chain rule, the gradient value is multiplied by 1, namely the gradient value is kept unchanged, and then the lower layer of the Round layer is transmitted, so that after the approximation according to the method, the Round layer is only substantially equivalent to one wire connecting the upper layer and the lower layer of the Round layer during back propagation, and the gradient of the network is not influenced.
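This straight-through behaviour can be expressed directly as a custom autograd function in PyTorch; a minimal sketch (not the patent's own code) is:

    import torch

    class RoundSTE(torch.autograd.Function):
        """Round quantizer with a straight-through gradient: the forward pass
        uses round(x); the backward pass uses the derivative of R(x) = x,
        i.e. it passes the incoming gradient through unchanged."""
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output

    # usage: concise_features = RoundSTE.apply(third_feature_images)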
S53: and reconstructing the concise feature image to form a reconstructed image corresponding to the target image.
Inputting the concise feature image into a deconvolution layer, and performing deconvolution operation on the deconvolution layer to recover the concise feature image. Specifically, S53 includes:
s531: and (3) convolving the concise feature image by using a third convolution layer to form a fourth feature image, wherein the convolution kernel size of the third convolution layer is 1 multiplied by 1, the number of the convolution kernels is 512, and the convolution step size is 1.
The step changes the number of channels, converts the concise feature images into 512 new feature images with the size of M/8 multiplied by N/8, and marks the new feature images as a fourth feature image for convenient distinguishing.
S532: and performing convolution on the fourth characteristic image by using a sub-pixel convolution layer to form a fifth characteristic image.
In the step, 256 new feature images with the size of M/4 multiplied by N/4 are obtained through the sub-pixel convolution layer and are marked as a fifth feature image for convenient distinguishing.
S533: and learning the fifth characteristic image by utilizing three cascaded residual modules to form a sixth characteristic image.
In the step, 256 new characteristic images with the size of M/4 multiplied by N/4 are obtained through three cascaded residual modules and are marked as a sixth characteristic image for convenient distinguishing.
S534: and performing convolution on the sixth characteristic image by using a sub-pixel convolution layer to form a seventh characteristic image.
In the step, 128 new characteristic images with the size of M/2 XN/2 are obtained through one sub-pixel convolution layer and are marked as a seventh characteristic image for convenient distinguishing.
S535: and performing convolution on the seventh characteristic image by utilizing a sub-pixel convolution layer to form a reconstructed image.
The seventh characteristic image is restored to a new image with the same size as the original image through a sub-pixel convolution layer, namely the reconstructed image.
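A matching PyTorch sketch of the S531-S535 decoder is given below, reusing the ResidualBlock of the encoder sketch above; each sub-pixel stage is written as a 3 × 3 convolution followed by PixelShuffle so that the feature-map sizes stated above are reproduced, which is an assumption about the layer details.

    import torch.nn as nn

    def subpixel_up(in_ch, out_ch):
        """Sub-pixel convolution stage: a 3x3 conv expands the channels, then
        PixelShuffle rearranges them into a feature map twice as large."""
        return nn.Sequential(nn.Conv2d(in_ch, out_ch * 4, 3, padding=1),
                             nn.PixelShuffle(2))

    class Decoder(nn.Module):
        """Sketch of S531-S535, mirroring the feature-map sizes given in the text."""
        def __init__(self, f_channels=64):
            super().__init__()
            self.expand = nn.Conv2d(f_channels, 512, kernel_size=1, stride=1)   # S531
            self.up1 = subpixel_up(512, 256)                                    # S532
            self.res = nn.Sequential(*[ResidualBlock(256) for _ in range(3)])   # S533
            self.up2 = subpixel_up(256, 128)                                    # S534
            self.up3 = subpixel_up(128, 3)                                      # S535

        def forward(self, z):          # z: (B, F, M/8, N/8) concise feature image
            x = self.expand(z)         # (B, 512, M/8, N/8) fourth feature image
            x = self.up1(x)            # (B, 256, M/4, N/4) fifth feature image
            x = self.res(x)            # (B, 256, M/4, N/4) sixth feature image
            x = self.up2(x)            # (B, 128, M/2, N/2) seventh feature image
            return self.up3(x)         # (B, 3, M, N) reconstructed image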
S6: and storing the compressed target image.
Through the steps, redundant information is removed from the reconstructed image, the information amount of the reconstructed image is obviously less than that of the original target image, and the compressed target image, namely the reconstructed image can be stored in a memory in a bitmap storage mode.
The above description is intended to describe in detail the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the claims of the present invention, and all equivalent changes and modifications made within the technical spirit of the present invention should fall within the scope of the claims of the present invention.

Claims (10)

1. An endoscope video image automatic storage method based on artificial intelligence is characterized by comprising the following steps:
S1: selecting a video frame from an endoscope video as a reference frame;
S2: starting from the reference frame and moving toward the preceding frames, comparing the similarity of each pair of adjacent frames one by one, stopping when a pair of adjacent frames whose similarity is lower than a similarity threshold appears, and selecting the later frame of the last compared pair as the first frame;
S3: starting from the reference frame and moving toward the following frames, comparing the similarity of each pair of adjacent frames one by one until a pair of adjacent frames whose similarity is lower than the similarity threshold appears, and selecting the earlier frame of the last compared pair as the tail frame;
S4: selecting all video frames from the first frame to the last frame as target images;
S5: constructing and training a convolutional neural network, and compressing the target images with the convolutional neural network;
S6: storing the compressed target images.
2. The method of claim 1, wherein the comparing the similarity between two adjacent frames in S2 and S3 comprises:
s21: respectively extracting color features of two video frames;
s22: judging whether the color characteristics of the two video frames are similar, if so, executing S23, otherwise, executing S26;
s23: respectively extracting texture features of two video frames;
s24: judging whether the texture features of the two video frames are similar, if so, executing S25, otherwise, executing S26;
s25: identifying two video frames as similar images;
s26: two video frames are not considered to be similar images.
3. The method for automatically storing endoscopic video images based on artificial intelligence as claimed in claim 2, wherein S21 includes:
s211: converting the video frame from the RGB color space to the HSI color space;
s212: sampling color saturation components of video frames of an HSI color space to form saturation vectors;
s213: and carrying out standard normalization processing on the saturation vector to form color characteristics.
4. The artificial intelligence based endoscopic video image automatic storage method according to claim 3, wherein S23 includes:
s231: establishing an image pyramid for tone components of a video frame of an HSI color space;
s232: and each layer of the image pyramid adopts a local binary mode operator to extract texture features, so that the texture features are formed.
5. The method for automatically storing endoscopic video images based on artificial intelligence as claimed in claim 3, wherein said determining whether the color features of the two video frames are similar in S22 is performed by comparing cosine values of vector angles of the color features of the two video frames.
6. The method for automatically storing endoscopic video images based on artificial intelligence as claimed in claim 3, wherein said determining whether the texture features of two video frames are similar in S24 is implemented by statistical matching of global texture features.
7. The method for automatically storing artificial intelligence based endoscopic video images as claimed in claim 1, wherein said compressing the target image with convolutional neural network in S5 comprises:
s51: carrying out feature extraction on the target image to form a feature image corresponding to the target image;
s52: removing redundant information in the characteristic image to form a concise characteristic image;
s53: and reconstructing the concise feature image to form a reconstructed image corresponding to the target image.
8. The method for automatically storing endoscopic video images based on artificial intelligence as claimed in claim 7, wherein S51 includes:
s511: performing convolution on the target image by utilizing two cascaded first convolution layers to form a first characteristic image;
s512: learning the first characteristic image by utilizing three cascaded residual modules to form a second characteristic image;
s513: and forming a third characteristic image by convolving the second characteristic image by using a second convolution layer.
9. The method for automatically storing endoscope video images based on artificial intelligence as claimed in claim 7, wherein the removing redundant information in the characteristic images in S52 is implemented by Round function.
10. The method for automatically storing endoscopic video images based on artificial intelligence as claimed in claim 7, wherein S53 includes:
s531: convolving the concise feature image by using a third convolution layer to form a fourth feature image, wherein the convolution kernel size of the third convolution layer is 1 × 1, the number of the convolution kernels is 512, and the convolution step length is 1;
s532: convolving the fourth characteristic image by using a sub-pixel convolution layer to form a fifth characteristic image;
s533: learning the fifth characteristic image by utilizing three cascaded residual modules to form a sixth characteristic image;
s534: convolving the sixth characteristic image by using a sub-pixel convolution layer to form a seventh characteristic image;
s535: and performing convolution on the seventh characteristic image by utilizing a sub-pixel convolution layer to form a reconstructed image.
CN202110710489.1A 2021-06-25 2021-06-25 Endoscope video image automatic storage method based on artificial intelligence Pending CN113393449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110710489.1A CN113393449A (en) 2021-06-25 2021-06-25 Endoscope video image automatic storage method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110710489.1A CN113393449A (en) 2021-06-25 2021-06-25 Endoscope video image automatic storage method based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN113393449A true CN113393449A (en) 2021-09-14

Family

ID=77623889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110710489.1A Pending CN113393449A (en) 2021-06-25 2021-06-25 Endoscope video image automatic storage method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN113393449A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090203964A1 (en) * 2008-02-13 2009-08-13 Fujifilm Corporation Capsule endoscope system and endoscopic image filing method
CN102117329A (en) * 2011-03-04 2011-07-06 南方医科大学 Capsule endoscope image retrieval method based on wavelet transformation
CN110913243A (en) * 2018-09-14 2020-03-24 华为技术有限公司 Video auditing method, device and equipment
CN109635871A (en) * 2018-12-12 2019-04-16 浙江工业大学 A kind of capsule endoscope image classification method based on multi-feature fusion
CN111327945A (en) * 2018-12-14 2020-06-23 北京沃东天骏信息技术有限公司 Method and apparatus for segmenting video
CN112070702A (en) * 2020-09-14 2020-12-11 中南民族大学 Image super-resolution reconstruction system and method for multi-scale residual error feature discrimination enhancement
CN112330542A (en) * 2020-11-18 2021-02-05 重庆邮电大学 Image reconstruction system and method based on CRCSAN network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭同胜, 刘小燕, 龚军辉, 蒋笑笑: "Capsule endoscopy video reduction based on color matching and improved LBP" (基于颜色匹配和改进LBP的胶囊内镜视频缩减), Journal of Electronic Measurement and Instrumentation (《电子测量与仪器学报》) *

Similar Documents

Publication Publication Date Title
CN111127412B (en) Pathological image recognition device based on generation countermeasure network
CN113257413A (en) Cancer prognosis survival prediction method and device based on deep learning and storage medium
CN110738655B (en) Image report generation method, device, terminal and storage medium
US20130188845A1 (en) Device, system and method for automatic detection of contractile activity in an image frame
CN111407245A (en) Non-contact heart rate and body temperature measuring method based on camera
CN111667453A (en) Gastrointestinal endoscope image anomaly detection method based on local feature and class mark embedded constraint dictionary learning
CN113012140A (en) Digestive endoscopy video frame effective information region extraction method based on deep learning
CN115985505B (en) Multidimensional fusion myocardial ischemia auxiliary diagnosis model and construction method thereof
CN112200162A (en) Non-contact heart rate measuring method, system and device based on end-to-end network
CN111784668A (en) Digestive endoscopy image automatic freezing method based on perceptual hash algorithm
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN114707530A (en) Bimodal emotion recognition method and system based on multi-source signal and neural network
Xie et al. Digital tongue image analyses for health assessment
US8929629B1 (en) Method and system for image-based ulcer detection
CN117542103A (en) Non-contact heart rate detection method based on multi-scale space-time feature map
CN113421250A (en) Intelligent fundus disease diagnosis method based on lesion-free image training
Yang et al. Lesion classification of wireless capsule endoscopy images
CN116189902B (en) Myocardial ischemia prediction model based on magnetocardiogram video data and construction method thereof
CN113393449A (en) Endoscope video image automatic storage method based on artificial intelligence
CN115581435A (en) Sleep monitoring method and device based on multiple sensors
CN114842104A (en) Capsule endoscope image super-resolution reconstruction method based on multi-scale residual errors
CN110334582B (en) Method for intelligently identifying and recording polyp removing video of endoscopic submucosal dissection
Zhong et al. Denoising auto-encoder network combined classfication module for brain tumors detection
Chen et al. Camera-based peripheral edema measurement using machine learning
CN113255781A (en) Representative picture selecting method and device for CP-EBUS and diagnosis system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914