CN117935124A - Method, device, equipment and medium for identifying position starting point of endoscope image - Google Patents

Method, device, equipment and medium for identifying position starting point of endoscope image Download PDF

Info

Publication number
CN117935124A
Authority
CN
China
Prior art keywords
sliding window
video image
image data
data
target sliding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410180195.6A
Other languages
Chinese (zh)
Inventor
高敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Jinshan Science and Technology Group Co Ltd
Original Assignee
Chongqing Jinshan Science and Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Jinshan Science and Technology Group Co Ltd filed Critical Chongqing Jinshan Science and Technology Group Co Ltd
Priority to CN202410180195.6A
Publication of CN117935124A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Endoscopes (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a medium for identifying the position starting point of an endoscope image, relating to the technical field of image processing. The method includes: acquiring video image data captured by a capsule endoscope; processing the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data; and fusing all the feature data to obtain a fusion feature, then determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature. Because the position starting point is identified from the fusion feature of multiple frames within the target sliding window, the method makes full use of the information between consecutive video frames, avoids the errors caused by identifying the position starting point from single-frame image information alone, and improves the accuracy of position starting point identification.

Description

Method, device, equipment and medium for identifying position starting point of endoscope image
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for identifying the position starting point of an endoscope image.
Background
A capsule endoscope is a medical device for examining the digestive tract: the patient swallows a capsule, which travels through the entire digestive tract and captures a large number of images along the way, and a doctor can observe abnormal lesions or foci by tracing the capsule's path through the digestive tract. Key-part starting point identification for capsule endoscopy refers to accurately identifying the position of the capsule after it enters the digestive tract, where the key parts may specifically be the esophagus, stomach, small intestine, large intestine, and so on. Accurate starting point identification is very important to doctors because it helps them determine where the capsule endoscope is located, thereby improving the success rate and accuracy of the examination. For example, when a doctor needs to view stomach data, the stomach starting point and the small intestine starting point must be accurately identified from the endoscope image data, so that the image data between those two points can be screened out for analysis and the patient's stomach diseases can be better observed and diagnosed. Accurate starting point identification can also reduce the duration and discomfort of a capsule endoscopy: if the starting point is not identified accurately, the doctor may need additional time to determine the capsule's location, increasing the patient's discomfort and inconvenience.
At present, part starting points are generally identified in one of the following ways. First, traditional image processing techniques can help doctors recognize features of the capsule endoscope's current location, such as the anatomical structure of the digestive tract and mucosal texture; however, this approach relies heavily on color and texture information, and starting point identification errors occur when the digestive tract environment is poor or when the digestive tract data captured from certain individuals is atypical. Second, sensor-based approaches equip the capsule endoscope with multiple sensors, such as a camera, optical sensors, and magnetic sensors, which provide information about the capsule's direction, angle, and position; however, this approach depends excessively on hardware, which raises equipment cost, and the more precise the equipment, the greater its energy consumption, while the capsule image data obtained remains limited and the patient's medical costs rise significantly. Third, single-frame image analysis based on deep learning builds a recognition model by analyzing and learning from large amounts of capsule endoscope data and classifies the location of each capsule endoscope image to judge whether the starting point of a key digestive tract part has been reached; however, this approach depends excessively on single-frame information and does not use the relationships between consecutive frames, so the identified starting point can be unstable and starting point identification errors can be large under different shooting environments.
In summary, how to accurately identify the starting points of digestive tract parts, so as to improve the success rate and accuracy of capsule endoscopy, is a problem that remains to be solved.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a method, an apparatus, a device, and a medium for identifying the position starting point of an endoscope image, which can accurately identify the starting points of digestive tract parts and thereby improve the success rate and accuracy of capsule endoscopy. The specific scheme is as follows:
In a first aspect, the present application discloses a method for identifying the position starting point of an endoscope image, comprising:
acquiring video image data shot by a capsule endoscope;
processing the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data;
and fusing all the feature data to obtain a fusion feature, and determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature.
Optionally, before the processing the video image data using the target sliding window, the method further includes:
Scaling the video image data according to a preset pixel proportion;
and carrying out normalization processing on the video image data after the scaling processing.
Optionally, the fusing all the feature data to obtain a fusion feature includes:
fusing all the feature data along the channel dimension to obtain the fusion feature.
Optionally, the fusing all the feature data to obtain a fusion feature includes:
adding all the feature data to obtain an addition result;
and normalizing the addition result to obtain the fusion feature.
Optionally, the fusing all the feature data to obtain a fusion feature includes:
determining weight coefficients preset based on the target sliding window, wherein the weight values of the feature data corresponding to the first frame video image and the last frame video image in the target sliding window are higher than the weight values of the feature data corresponding to the other frame video images, and the sum of all the weight coefficients is 1;
and weighting all the feature data with the weight coefficients to obtain the fusion feature.
Optionally, the determining, based on the fusion feature, a position starting point identification result of the video image data in the target sliding window includes:
performing feature extraction on the fusion feature to obtain a feature extraction result;
and inputting the feature extraction result into a first classifier, so that the first classifier outputs a position starting point identification result representing whether the video image data in the target sliding window is the starting point position of a first preset tissue part.
Optionally, the determining, based on the fusion feature, a position starting point identification result of the video image data in the target sliding window includes:
performing feature extraction on the fusion feature to obtain a feature extraction result;
inputting the feature extraction result into a plurality of second classifiers respectively, so as to obtain, from each second classifier, a confidence score representing that the video image data in the target sliding window is a second preset tissue part;
and taking the target tissue part corresponding to the highest confidence score as the classification result of the video image data in the target sliding window, and outputting a position starting point identification result of whether the target tissue part is the starting point position of a third preset tissue part.
In a second aspect, the present application discloses an apparatus for identifying the position starting point of an endoscope image, comprising:
the data acquisition module is used for acquiring video image data shot by the capsule endoscope;
the feature extraction module is used for processing the video image data by utilizing a target sliding window so as to extract features of a plurality of frames of video images in the target sliding window to obtain corresponding feature data;
the feature fusion module is used for fusing all the feature data to obtain a fusion feature;
and the result determining module is used for determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature.
In a third aspect, the present application discloses an electronic device, comprising:
A memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method for identifying the position starting point of an endoscope image disclosed above.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for identifying the position starting point of an endoscope image disclosed above.
It can thus be seen that the application acquires video image data captured by a capsule endoscope; processes the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data; and fuses all the feature data to obtain a fusion feature, then determines a position starting point identification result of the video image data in the target sliding window based on the fusion feature. In other words, by exploiting the information between consecutive video frames, the application reduces the influence of environmental factors on position starting point identification, avoids the errors caused by identifying the position starting point from single-frame image information alone, and improves the accuracy of position starting point identification, while remaining low-cost because it does not depend on additional hardware.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for identifying the position starting point of an endoscope image according to the present application;
FIG. 2 is a schematic diagram of a sliding window detection process according to the present application;
FIG. 3 is a flowchart of a specific method for identifying the position starting point of an endoscope image according to the present application;
FIG. 4 is a schematic diagram of a binary classification recognition network model according to the present application;
FIG. 5 is a schematic diagram of a multi-class recognition network model according to the present application;
FIG. 6 is a schematic diagram of an apparatus for identifying the position starting point of an endoscope image according to the present application;
FIG. 7 is a block diagram of an electronic device according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
When identifying a part starting point, traditional image processing techniques rely excessively on color and texture information, so starting point identification errors occur when the digestive tract environment is poor or the digestive tract data captured from certain individuals is atypical; sensor-based techniques depend excessively on hardware, which raises equipment cost, and the more precise the equipment, the greater its energy consumption, while the capsule image data obtained remains limited and the patient's medical costs rise significantly; and single-frame image analysis based on deep learning depends excessively on single-frame information and does not use the relationships between consecutive frames, so the identified starting point can be unstable and starting point identification errors can be large under different shooting environments.
Therefore, the embodiments of the application disclose a method, an apparatus, a device, and a medium for identifying the position starting point of an endoscope image, which can accurately identify the starting points of digestive tract parts, so as to improve the success rate and accuracy of capsule endoscopy.
Referring to fig. 1, an embodiment of the present application discloses a method for identifying the position starting point of an endoscope image, the method comprising:
Step S11: video image data captured by the capsule endoscope is acquired.
In this embodiment, the capsule endoscope continuously captures a large number of images after entering the digestive tract; therefore, it is first necessary to acquire the video image data captured after the capsule endoscope enters the digestive tract.
Step S12: and processing the video image data by utilizing a target sliding window to extract the characteristics of a plurality of frames of video images in the target sliding window to obtain corresponding characteristic data.
In this embodiment, as shown in fig. 2, each rectangular frame in fig. 2 represents one frame of video image data, and a plurality of such frames form a video image data sequence. Processing the video image data with the target sliding window means sliding the window from left to right over this sequence, so as to perform feature extraction on the plurality of frames of video images inside the window and obtain corresponding feature data; the extracted features may specifically be color features, texture features, structural features, and so on. In a specific embodiment, the length of the target sliding window is set to n and the sliding step to r, where n ≥ 5 and r < n; that is, feature extraction is performed on n frames of video images to obtain n corresponding pieces of feature data.
In this embodiment, the size of each piece of feature data is C×H×L, where C is the number of channels, H is the length of the feature, and L is the width of the feature. Thus, after feature extraction, n features of size C×H×L are obtained.
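For illustration only (this sketch is not part of the claimed solution), the sliding-window feature extraction described above can be expressed in code as follows. PyTorch is assumed, the backbone is a stand-in two-layer CNN rather than the network of this application, and the values n=5, r=2, C=16, H=L=128 are example choices consistent with the constraints n ≥ 5 and r < n.

```python
import torch
import torch.nn as nn

# Stand-in feature extractor; any CNN producing C x H x L feature maps would do.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

def sliding_window_features(frames, n=5, r=2):
    """Slide a window of length n with step r (n >= 5, r < n) over the frame
    sequence and extract one C x H x L feature tensor per frame per window."""
    all_windows = []
    for start in range(0, len(frames) - n + 1, r):
        window = frames[start:start + n]
        feats = [backbone(f.unsqueeze(0)).squeeze(0) for f in window]  # C x H x L each
        all_windows.append(feats)
    return all_windows

# Example: 20 dummy preprocessed frames of size 3 x 256 x 256
frames = [torch.rand(3, 256, 256) for _ in range(20)]
windows = sliding_window_features(frames)
print(len(windows), windows[0][0].shape)  # 8 windows, each feature 16 x 128 x 128
```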
Step S13: and fusing all the feature data to obtain fused features, and determining a position starting point identification result of the video image data in the target sliding window based on the fused features.
In this embodiment, all the feature data are fused to obtain a fusion feature, and the fusion feature is then analyzed to determine the position starting point identification result of the video image data in the target sliding window.
In a first embodiment, fusing all the feature data to obtain the fusion feature includes: fusing all the feature data along the channel dimension to obtain the fusion feature. That is, this embodiment performs the fusion operation on the n features by concatenating them along the channel dimension, thereby forming a single feature of size (n×C)×H×L.
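A minimal sketch of this channel-dimension fusion, again assuming PyTorch tensors; the sizes match the example above and are not mandated by the application:

```python
import torch

# n per-frame features, each C x H x L (C=16, H=L=128 in the sketch above)
feats = [torch.rand(16, 128, 128) for _ in range(5)]

# Concatenate along the channel dimension: one (n*C) x H x L fusion feature
fused = torch.cat(feats, dim=0)
print(fused.shape)  # torch.Size([80, 128, 128])
```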
In a second embodiment, fusing all the feature data to obtain the fusion feature includes: adding all the feature data to obtain an addition result, and normalizing the addition result to obtain the fusion feature. That is, the fusion operation may also be a superposition of multiple features, as shown in the following formula:

θ1 = ρ1 ⊕ ρ2 ⊕ … ⊕ ρn

wherein θ1 represents the addition result, ρ1 to ρn each represent a piece of feature data, n is the number of pieces of feature data, and ⊕ represents the superposition operation between the feature data.
Furthermore, normalization processing needs to be performed on the addition result to obtain the fusion feature.
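The addition-plus-normalization fusion can be sketched as follows; since the text does not fix a normalization scheme, the zero-mean/unit-variance step here is an assumption:

```python
import torch

feats = [torch.rand(16, 128, 128) for _ in range(5)]  # n features, C x H x L

# theta_1: element-wise superposition (sum) of all n features
added = torch.stack(feats, dim=0).sum(dim=0)

# Normalize the sum; zero-mean/unit-variance is one common choice, used here
# as an assumption since the text does not specify the normalization scheme.
fused = (added - added.mean()) / (added.std() + 1e-8)
print(fused.shape)  # torch.Size([16, 128, 128]), same size as a single feature
```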
In a third embodiment, fusing all the feature data to obtain the fusion feature includes: determining weight coefficients preset based on the target sliding window, wherein the weight values of the feature data corresponding to the first frame video image and the last frame video image in the target sliding window are higher than the weight values of the feature data corresponding to the other frame video images, and the sum of all the weight coefficients is 1; and weighting all the feature data with the weight coefficients to obtain the fusion feature. That is, in this embodiment, the preset weight coefficients may be used to weight all the feature data to obtain the fusion feature, as shown in the following formula:

θ2 = α·ρ1 + β·ρ2 + … + γ·ρn

wherein α, β, …, γ are the weight coefficients, ρ1 to ρn each represent a piece of feature data, n is the number of pieces of feature data, and θ2 is the fusion feature.
It should be noted that, considering that the first frame video image and the last frame video image in the target sliding window have a large influence on the recognition result, the weight values of the feature data corresponding to these two frames are set higher than those of the feature data corresponding to the other frame video images; that is, α and γ take the largest weight values, and the sum of all the weight coefficients is 1.
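A sketch of the weighted fusion under these constraints; the concrete coefficient values are assumptions chosen only to satisfy "first and last weights largest, sum equal to 1":

```python
import torch

n = 5
feats = [torch.rand(16, 128, 128) for _ in range(n)]

# Illustrative coefficients: first and last frames get the largest weights
# (alpha and gamma), and all coefficients sum to 1, as required above.
weights = torch.tensor([0.30, 0.15, 0.10, 0.15, 0.30])
assert torch.isclose(weights.sum(), torch.tensor(1.0))

# theta_2 = alpha * rho_1 + beta * rho_2 + ... + gamma * rho_n
fused = sum(w * f for w, f in zip(weights, feats))
print(fused.shape)  # torch.Size([16, 128, 128])
```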
It can thus be seen that the application acquires video image data captured by a capsule endoscope; processes the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data; and fuses all the feature data to obtain a fusion feature, then determines a position starting point identification result of the video image data in the target sliding window based on the fusion feature. In other words, by exploiting the information between consecutive video frames, the application reduces the influence of environmental factors on position starting point identification, avoids the errors caused by identifying the position starting point from single-frame image information alone, and improves the accuracy of position starting point identification, while remaining low-cost because it does not depend on additional hardware.
Referring to fig. 3, an embodiment of the present application discloses a specific method for identifying the position starting point of an endoscope image; compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. The method specifically comprises the following steps:
Step S21: video image data captured by the capsule endoscope is acquired.
Step S22: and scaling the video image data according to a preset pixel proportion, and normalizing the scaled video image data.
In this embodiment, after the video image data captured by the capsule endoscope is acquired, the video image data is further preprocessed. The preprocessing specifically includes scaling the video image data according to a preset pixel proportion, for example uniformly scaling all the image data to 256×256 pixels, and then normalizing the scaled video image data so that it finally reaches a format that can be fed into the recognition network model.
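As an illustrative sketch of this preprocessing step (the torchvision calls and the mean/std normalization statistics are assumptions; the application only requires scaling to a preset pixel proportion followed by normalization):

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

def preprocess(image: Image.Image) -> torch.Tensor:
    """Scale to a preset pixel size, then normalize. The 256 x 256 size comes
    from the example above; the mean/std values are stand-in assumptions."""
    image = TF.resize(image, [256, 256])   # uniform scaling
    tensor = TF.to_tensor(image)           # float tensor in [0, 1], C x H x W
    return TF.normalize(tensor,
                        mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                        std=[0.229, 0.224, 0.225])

# frame = preprocess(Image.open("frame_0001.png").convert("RGB"))  # illustrative file name
```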
Step S23: and processing the video image data by utilizing a target sliding window to extract the characteristics of a plurality of frames of video images in the target sliding window to obtain corresponding characteristic data.
Step S24: and fusing all the feature data to obtain fused features, and extracting the features of the fused features to obtain feature extraction results.
In this embodiment, after the fusion feature is obtained, the convolution layer is further utilized to perform feature extraction on the fusion feature, so as to obtain a feature extraction result.
Step S25: and inputting the feature extraction result into a first classifier so that the first classifier outputs a part starting point identification result for representing whether the video image data in the target sliding window is the starting point position of the first preset tissue part or not.
In a specific embodiment, as shown in fig. 4, after the feature extraction result of the fusion feature is obtained through the convolution layer, the feature extraction result is input into the first classifier (i.e., the softmax in fig. 4), so that the first classifier outputs a position starting point identification result representing whether the video image data in the target sliding window is the starting point position of the first preset tissue part. That is, this embodiment can implement binary classification via softmax: the first classifier only needs to identify whether the video image data in the target sliding window is the starting point position of the first preset tissue part, and accordingly outputs a "yes" or "no" classification result. The first preset tissue part may be the esophagus, stomach, small intestine, large intestine, and so on; in other words, this embodiment can perform starting point identification for a specific tissue part.
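A minimal sketch of such a convolution-plus-softmax binary head; the layer sizes and the convention that class index 1 means "is the starting point" are assumptions, not details fixed by fig. 4:

```python
import torch
import torch.nn as nn

class StartPointBinaryHead(nn.Module):
    """Convolutional feature extraction on the fusion feature, followed by a
    softmax two-way classifier: start point of the preset tissue part or not."""
    def __init__(self, in_channels=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> B x 32 x 1 x 1
        )
        self.fc = nn.Linear(32, 2)     # two classes: "no" / "yes"

    def forward(self, fused):          # fused: B x C x H x L
        x = self.conv(fused).flatten(1)
        return torch.softmax(self.fc(x), dim=1)

head = StartPointBinaryHead()
probs = head(torch.rand(1, 16, 128, 128))    # fusion feature of one window
# Assumes index 1 is the "yes" (start point) class -- an illustrative convention.
is_start = bool(probs.argmax(dim=1).item())
```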
In another specific embodiment, after performing feature extraction on the fusion feature to obtain the feature extraction result, the method may further include: inputting the feature extraction result into a plurality of second classifiers respectively, so as to obtain, from each second classifier, a confidence score representing that the video image data in the target sliding window is a second preset tissue part; and taking the target tissue part corresponding to the highest confidence score as the classification result of the video image data in the target sliding window, then outputting a position starting point identification result of whether the target tissue part is the starting point position of a third preset tissue part. That is, as shown in fig. 5, this embodiment can also perform multi-class classification: after the feature extraction result of the fusion feature is obtained through the convolution layer, it is input into a plurality of second classifiers (i.e., the softmax heads in fig. 5), each of which outputs a confidence score for one part, and the target tissue part with the highest confidence score is taken as the classification result of the video image data in the target sliding window. In this way, the embodiment can identify which tissue part the video image data in the target sliding window specifically corresponds to, and then output a position starting point identification result according to the classification result, i.e., whether the target tissue part is the starting point position of the third preset tissue part. For example, if the video image data in the target sliding window is identified as corresponding to the cardia, this indicates the starting point position of the stomach; if it is identified as corresponding to the gastric body, this indicates that it is not a starting point position.
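A sketch of the multi-classifier arrangement; the candidate part list, the per-part two-way heads, and all layer sizes are illustrative assumptions inferred from the description of fig. 5:

```python
import torch
import torch.nn as nn

# Candidate tissue parts; the list is an illustrative assumption.
SITES = ["esophagus", "cardia", "gastric body", "small intestine", "large intestine"]

class MultiSiteHead(nn.Module):
    """Shared convolutional feature extraction followed by one softmax
    classifier per candidate part; the part with the highest confidence
    score becomes the classification result for the window."""
    def __init__(self, in_channels=16, sites=SITES):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.heads = nn.ModuleList([nn.Linear(32, 2) for _ in sites])  # per-part "yes/no"
        self.sites = list(sites)

    def forward(self, fused):                     # fused: B x C x H x L
        x = self.conv(fused).flatten(1)
        # Confidence that the window shows each part (probability of "yes")
        scores = torch.stack(
            [torch.softmax(h(x), dim=1)[:, 1] for h in self.heads], dim=1)
        best = scores.argmax(dim=1)
        return [self.sites[int(i)] for i in best], scores

head = MultiSiteHead()
parts, scores = head(torch.rand(1, 16, 128, 128))
# e.g. parts == ["cardia"] would indicate the starting point position of the stomach
```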
For the more specific processing procedures of steps S21 and S23, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not repeated here.
It can thus be seen that, after acquiring the video image data captured by the capsule endoscope, this embodiment of the application also preprocesses the video image data so that the processed image data meets the format requirements of the recognition network model. After the fusion feature is obtained, the convolution layer is further used to perform feature extraction on it, and the feature extraction result is input into a classifier for classification. In one case, the feature extraction result may be input into a first classifier, which only needs to identify whether the video image data in the target sliding window is the starting point position of the first preset tissue part and thus outputs a "yes" or "no" classification result; in another case, the feature extraction result may be input into a plurality of second classifiers, each outputting a confidence score for one part, so that the target tissue part with the highest confidence score is taken as the classification result of the video image data in the target sliding window, and a position starting point identification result of whether that target tissue part is the starting point position of the third preset tissue part is output accordingly. In this way, the application identifies key-part starting points from capsule endoscope video image data, makes full use of the information between consecutive images, reduces the error rate of key-part starting point identification in the digestive system, improves the identification precision of the network, withstands harsh digestive tract environments, and improves the robustness of the model.
Referring to fig. 6, an embodiment of the present application discloses an apparatus for identifying the position starting point of an endoscope image, the apparatus comprising:
a data acquisition module 11, used for acquiring video image data captured by the capsule endoscope;
a feature extraction module 12, used for processing the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data;
a feature fusion module 13, used for fusing all the feature data to obtain a fusion feature;
and a result determining module 14, used for determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature.
It can thus be seen that the application acquires video image data captured by a capsule endoscope; processes the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data; and fuses all the feature data to obtain a fusion feature, then determines a position starting point identification result of the video image data in the target sliding window based on the fusion feature. In other words, by exploiting the information between consecutive video frames, the application reduces the influence of environmental factors on position starting point identification, avoids the errors caused by identifying the position starting point from single-frame image information alone, and improves the accuracy of position starting point identification, while remaining low-cost because it does not depend on additional hardware.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Specifically, the electronic device includes: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the method for identifying the position starting point of an endoscope image performed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 provides the operating voltage for each hardware device on the electronic device 20; the communication interface 24 creates a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; and the input/output interface 25 acquires external input data or outputs data to external devices, and its specific interface type may be selected according to the application requirements, which is not limited here.
The processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 21 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored thereon include an operating system 221, a computer program 222, and data 223, and the storage may be temporary or permanent.
The operating system 221, which may be Windows, Unix, Linux, or the like, manages and controls the hardware devices on the electronic device 20 and the computer program 222, enabling the processor 21 to operate on and process the mass data 223 in the memory 22. In addition to the computer program that performs the method for identifying the position starting point of an endoscope image executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include programs for other specific tasks. The data 223 may include, in addition to data received from external devices, data collected by the electronic device's own input/output interface 25, and so on.
Further, the embodiments of the application also disclose a computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the method for identifying the position starting point of an endoscope image disclosed in any of the foregoing embodiments.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not intended to be limiting.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
Finally, it should further be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a/an ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The method, apparatus, device, and storage medium for identifying the position starting point of an endoscope image provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the invention, and the description of the above embodiments is intended only to help understand the method of the invention and its core ideas. Meanwhile, those of ordinary skill in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the invention. In view of the above, the contents of this description should not be construed as limiting the invention.

Claims (10)

1. A method for identifying the position starting point of an endoscope image, comprising:
acquiring video image data shot by a capsule endoscope;
processing the video image data by utilizing a target sliding window, so as to perform feature extraction on a plurality of frames of video images in the target sliding window to obtain corresponding feature data;
and fusing all the feature data to obtain a fusion feature, and determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature.
2. The method for identifying the position starting point of an endoscope image according to claim 1, wherein before the processing the video image data by utilizing a target sliding window, the method further comprises:
Scaling the video image data according to a preset pixel proportion;
and carrying out normalization processing on the video image data after the scaling processing.
3. The method for identifying the position starting point of an endoscope image according to claim 1, wherein the fusing all the feature data to obtain a fusion feature comprises:
fusing all the feature data along the channel dimension to obtain the fusion feature.
4. The method for identifying the position starting point of an endoscope image according to claim 1, wherein the fusing all the feature data to obtain a fusion feature comprises:
adding all the feature data to obtain an addition result;
and normalizing the addition result to obtain the fusion feature.
5. The method for identifying the position starting point of an endoscope image according to claim 1, wherein the fusing all the feature data to obtain a fusion feature comprises:
determining weight coefficients preset based on the target sliding window, wherein the weight values of the feature data corresponding to the first frame video image and the last frame video image in the target sliding window are higher than the weight values of the feature data corresponding to the other frame video images, and the sum of all the weight coefficients is 1;
and weighting all the feature data with the weight coefficients to obtain the fusion feature.
6. The method for identifying the position starting point of an endoscope image according to any one of claims 1 to 5, wherein the determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature comprises:
performing feature extraction on the fusion feature to obtain a feature extraction result;
and inputting the feature extraction result into a first classifier, so that the first classifier outputs a position starting point identification result representing whether the video image data in the target sliding window is the starting point position of a first preset tissue part.
7. The method for identifying the position starting point of an endoscope image according to any one of claims 1 to 5, wherein the determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature comprises:
performing feature extraction on the fusion feature to obtain a feature extraction result;
inputting the feature extraction result into a plurality of second classifiers respectively, so as to obtain, from each second classifier, a confidence score representing that the video image data in the target sliding window is a second preset tissue part;
and taking the target tissue part corresponding to the highest confidence score as the classification result of the video image data in the target sliding window, and outputting a position starting point identification result of whether the target tissue part is the starting point position of a third preset tissue part.
8. An apparatus for identifying the position starting point of an endoscope image, comprising:
the data acquisition module is used for acquiring video image data shot by the capsule endoscope;
the feature extraction module is used for processing the video image data by utilizing a target sliding window so as to extract features of a plurality of frames of video images in the target sliding window to obtain corresponding feature data;
the feature fusion module is used for fusing all the feature data to obtain a fusion feature;
and the result determining module is used for determining a position starting point identification result of the video image data in the target sliding window based on the fusion feature.
9. An electronic device, comprising:
A memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method for identifying the position starting point of an endoscope image as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for identifying the position starting point of an endoscope image according to any one of claims 1 to 7.
CN202410180195.6A 2024-02-18 2024-02-18 Method, device, equipment and medium for identifying position starting point of endoscope image Pending CN117935124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410180195.6A CN117935124A (en) 2024-02-18 2024-02-18 Method, device, equipment and medium for identifying position starting point of endoscope image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410180195.6A CN117935124A (en) 2024-02-18 2024-02-18 Method, device, equipment and medium for identifying position starting point of endoscope image

Publications (1)

Publication Number Publication Date
CN117935124A 2024-04-26

Family

ID=90750725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410180195.6A Pending CN117935124A (en) 2024-02-18 2024-02-18 Method, device, equipment and medium for identifying position starting point of endoscope image

Country Status (1)

Country Link
CN (1) CN117935124A (en)

Similar Documents

Publication Publication Date Title
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
WO2020207377A1 (en) Method, device, and system for image recognition model training and image recognition
CN110600122B (en) Digestive tract image processing method and device and medical system
CN113496489B (en) Training method of endoscope image classification model, image classification method and device
US20220172828A1 (en) Endoscopic image display method, apparatus, computer device, and storage medium
EP3998579A1 (en) Medical image processing method, apparatus and device, medium and endoscope
CN111091559A (en) Depth learning-based auxiliary diagnosis system for small intestine sub-scope lymphoma
KR102332088B1 (en) Apparatus and method for polyp segmentation in colonoscopy images through polyp boundary aware using detailed upsampling encoder-decoder networks
CN113379693A (en) Capsule endoscopy key focus image detection method based on video abstraction technology
US20220369920A1 (en) Phase identification of endoscopy procedures
US20240005494A1 (en) Methods and systems for image quality assessment
KR20190090150A (en) An apparatus for creating description of capsule endoscopy and method thereof, a method for searching capsule endoscopy image based on decsription, an apparatus for monitoring capsule endoscopy
CN115082448A (en) Method and device for scoring cleanliness of intestinal tract and computer equipment
EP3851025B1 (en) Information processing device, control method, and program
CN114693598A (en) Capsule endoscope gastrointestinal tract organ image automatic identification method
Ratheesh et al. Advanced algorithm for polyp detection using depth segmentation in colon endoscopy
CN116542883B (en) Magnetic control capsule gastroscope image focus mucosa enhancement system
US20220110505A1 (en) Information processing apparatus, control method, and non-transitory storage medium
CN117935124A (en) Method, device, equipment and medium for identifying position starting point of endoscope image
CN113744266B (en) Method and device for displaying focus detection frame, electronic equipment and storage medium
CN115994999A (en) Goblet cell semantic segmentation method and system based on boundary gradient attention network
CN112734707B (en) Auxiliary detection method, system and device for 3D endoscope and storage medium
KR102294739B1 (en) System and method for identifying the position of capsule endoscope based on location information of capsule endoscope
CN118037963B (en) Reconstruction method, device, equipment and medium of digestive cavity inner wall three-dimensional model
US12051199B2 (en) Image processing method and apparatus, server, medical image processing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination