WO2022042352A1 - Image recognition method, electronic device and readable storage medium - Google Patents

Image recognition method, electronic device and readable storage medium Download PDF

Info

Publication number
WO2022042352A1
WO2022042352A1 PCT/CN2021/112777 CN2021112777W WO2022042352A1 WO 2022042352 A1 WO2022042352 A1 WO 2022042352A1 CN 2021112777 W CN2021112777 W CN 2021112777W WO 2022042352 A1 WO2022042352 A1 WO 2022042352A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
marker
markers
detection
detection frames
Prior art date
Application number
PCT/CN2021/112777
Other languages
English (en)
French (fr)
Inventor
王廷旗
高飞
段晓东
侯晓华
谢小平
向雪莲
Original Assignee
安翰科技(武汉)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安翰科技(武汉)股份有限公司
Priority to EP21860190.4A (EP4207058A4)
Priority to US18/023,973 (US20240029387A1)
Publication of WO2022042352A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Definitions

  • the present invention relates to the field of image recognition of medical equipment, in particular to an image recognition method, an electronic device and a readable storage medium.
  • Gastrointestinal motility refers to the normal peristalsis of the gastrointestinal tract that helps complete the digestion and absorption of food; when gastrointestinal motility is weak, indigestion may result.
  • in the prior art, the strength of gastrointestinal motility is usually judged by identifying gastrointestinal markers. Specifically, after the user swallows markers of different shapes in stages, the positions of the markers are determined from images obtained by X-ray photography, and the strength of gastrointestinal motility is then judged from those positions.
  • in the prior art, the positions and types of markers in an X-ray image are usually determined by manually observing the image; however, markers of different shapes swallowed in stages appear small on the X-ray image and come in multiple types, so it is difficult to accurately count the positions and quantities of the various markers by manual observation, and the gastrointestinal motility of the subject therefore cannot be judged.
  • the purpose of the present invention is to provide an image recognition method, an electronic device and a readable storage medium.
  • an embodiment of the present invention provides an image recognition method, the method includes: dividing an original image into a plurality of unit images having the same predetermined size, and several markers are distributed in the original image;
  • a plurality of pre-check unit images are spliced into a pre-output image
  • judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker specifically includes: confirming the type of the marker in each detection frame according to the probability of the marker type; if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
  • the method further includes:
  • the original image is supplemented with edge pixel values before the unit image is formed, or the edge pixel value is supplemented on the unit image whose size is smaller than the predetermined size after the unit image is formed.
  • the method for constructing the neural network model includes: extracting at least one feature layer by using a convolutional neural network corresponding to each unit image;
  • the method further includes:
  • the unit image is processed by the pooling layer multiple times to obtain the corresponding feature layer.
  • the method includes:
  • the unit image is processed by the convolution layer at least once, and the size of the convolution kernel is the same.
  • confirming whether the frame-selected markers are the same marker according to the coordinate values of two adjacent detection frames includes:
  • the upper left corner of the original image is taken as the coordinate origin to establish a rectangular coordinate system, and whether the difference between the feature values of two horizontally adjacent detection frames is within the threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used for calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
  • confirming whether the frame-selected markers are the same marker according to the coordinate values of two adjacent detection frames includes:
  • the upper left corner of the original image is taken as the coordinate origin to establish a rectangular coordinate system, and whether the difference between the feature values of two vertically adjacent detection frames is within the threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used for calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
  • the two detection frames are merged, which includes: comparing the coordinate values of the upper-left corners of the two detection frames currently used for calculation and taking the minimum of their abscissas and ordinates as the upper-left corner coordinate values of the merged detection frame; and comparing the coordinate values of the lower-right corners of the two detection frames and taking the maximum of their abscissas and ordinates as the lower-right corner coordinate values of the merged detection frame.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory stores a computer program that can be executed on the processor, and the processor, when executing the computer program, implements the steps in the image recognition method as described above.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps in the image recognition method as described above.
  • the beneficial effects of the present invention are: in the image recognition method, electronic device and readable storage medium of the present invention, the original image is automatically processed by the neural network model to add detection frames; further, merging the detection frames that repeatedly identify the same marker improves the accuracy of image identification, thereby effectively identifying the type and position of the markers in the image, so as to accurately determine the gastrointestinal motility of the subject.
  • FIG. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a neural network according to a preferred embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the result of performing steps S1 to S3 shown in FIG. 1 on the original image
  • FIG. 4 is a schematic diagram of the result after the processing of step S4 shown in FIG. 1 is performed on the basis of FIG. 3 .
  • a first embodiment of the present invention provides an image recognition method, and the method includes:
  • the obtained original image is usually large.
  • the image needs to be divided before the image is identified, and after the image is identified, the image is restored in the order of segmentation.
  • in the process of dividing the original image into a plurality of unit images with the same predetermined size, the image can be divided in sequence starting from a corner of the original image, or a point in the original image can be selected and the image divided as needed on the basis of that point.
  • the method further includes: if any unit image is smaller than the predetermined size during segmentation, supplementing edge pixel values to the original image before the unit images are formed, or supplementing edge pixel values to the unit images whose size is smaller than the predetermined size after the unit images are formed, so that the size of each unit image is the same as the predetermined size.
  • the original image may be segmented first, and if the size of the finally formed unit image is smaller than the predetermined size, only the edge pixel values are supplemented for the last formed unit image.
  • before image segmentation, it is also possible to first calculate whether the current original image can be completely segmented according to the predetermined size; if it cannot be completely segmented, the edge pixel values of the original image are supplemented at the edge of the original image before segmentation.
  • the pixel value of the supplementary position can be specifically set as required, for example, set to 0, which will not be further described here.
  • the segmented unit image is a square.
  • the segmented unit image has a size of 320*320, and the unit is a pixel.
  • the method for constructing the neural network model includes: extracting at least one feature layer by using a convolutional neural network (Convolutional Neural Networks, CNN) corresponding to each unit image.
  • the unit image is processed using p m*m convolution kernels as the convolutional predictor of the anchor boxes to predict the type and position of the marker, where p=(c1+c2)*k, c1 is the number of marker types, k is the number of anchor boxes, and c2 is the number of offset parameters for adjusting the anchor boxes.
  • m is an odd positive integer.
  • the corresponding feature layers are obtained by performing pooling-layer processing on the unit image multiple times; that is, the types and sizes of the markers determine the number and sizes of the feature layers. Specifically, the marker sizes on the original image are pre-divided to determine the number and sizes of the feature layers.
  • the method specifically includes: before each pooling-layer processing of the unit image, performing convolution-layer processing on the unit image at least once, with convolution kernels of the same size.
  • the convolution layer processing is to perform dimension reduction and feature extraction on the input image through convolution operations; the pooling layer processing is used to reduce the spatial size of the image.
  • the shapes of the markers are: dot type, "O"-ring type, and three-chamber type. Since X-ray imaging may capture incomplete markers or overlapping markers, even the same kind of marker may appear at different sizes on an X-ray image. Therefore, it is necessary to pre-divide the sizes of the markers displayed on the X-ray image to determine the number of feature layers that need to be configured.
  • a neural network model for feature extraction in this example is established, which is configured with 3 feature layers.
  • the original image is divided in sequence to form multiple 320*320 unit images, and feature extraction is performed on each unit image by establishing a convolutional neural network.
  • the size of the input unit image is 320*320, and the input image is sequentially processed by convolution layer processing and pooling layer processing to obtain the required feature layers respectively.
  • the image size of feature layer 1 is reduced to 40*40, which can detect markers with sizes ranging from 8*8 to 24*24.
  • feature layer 2 is extracted after performing one pooling-layer processing and three convolution-layer processings in turn on the image (40*40) of feature layer 1. At this time, the image size of feature layer 2 is halved compared with that of feature layer 1, to 20*20, and markers with a size range of 16*16 to 48*48 can be detected.
  • the feature layer 3 is extracted after performing one pooling layer processing and three convolution layer processing in turn on the feature layer 2 image (20*20).
  • the image size of the feature layer 3 is 10*10, and markers with a size range of 32*32 to 96*96 can be detected.
  • the size of the convolution kernel and the number of convolution kernels used in each convolutional layer processing can be set according to actual needs.
  • the number of convolution kernels is 64, 128, 256, 512, etc.
  • a neural network can be established according to actual requirements, and the number of feature layers can be configured.
  • the anchor frame is a plurality of bounding boxes with different sizes and aspect ratios generated centered on any pixel point;
  • the output of c1 is a one-dimensional array, represented by result_c1[3].
  • result_c1[0], result_c1[1] and result_c1[2] are the three elements of this array, which respectively represent the probability of the marker type in the anchor box.
  • result_c1[0] represents the probability that the marker in the anchor box is a dot shape
  • result_c1[1] represents the probability that the marker in the anchor box is an “O” ring
  • result_c1[2] represents the probability that the marker in the anchor box is a three-chamber type; all three elements range from 0 to 1.
  • the type of marker in the anchor box is determined by the maximum value of these three elements. For example, when the value of result_c1[0] is the largest among these three elements, the marker in the corresponding anchor box is a dot type .
  • the output of c2 is a one-dimensional array, represented by result_c2[4].
  • result_c2[0], result_c2[1], result_c2[2] and result_c2[3] are the four elements of this array, which respectively represent the horizontal offset value of the upper-left corner of the anchor box, the vertical offset value of the upper-left corner of the anchor box, the scaling factor of the anchor-box width, and the scaling factor of the anchor-box height, all of which range from 0 to 1. The anchor box is resized according to this output to form the detection frame.
  • when the neural network model is first built, the unit images used for training are first annotated manually; the annotated content is the category information of the markers, including the detection frame enclosing each marker and the upper-left corner and lower-right corner coordinate values corresponding to the frame.
  • then, unlabeled unit images are input into the newly built neural network model for prediction; the closer the predicted result is to the manually labeled result, the higher the detection accuracy of the model.
  • when the ratio of the predicted result to the manually labeled result is greater than the preset ratio, the neural network model can be applied normally.
  • when the ratio of the predicted result to the manually labeled result is not greater than the preset ratio, the neural network model needs to be adjusted until the requirement is met.
  • in this way, the neural network model is trained through the above content to make its prediction results more accurate.
  • an Intersection over Union (IOU) can be used to evaluate the detection accuracy of the neural network model, which will not be described further here.
  • a neural network model is used to add a detection frame to the markers in the unit images to form a pre-output image as shown in FIG. 3, which is spliced from multiple pre-check unit images according to their segmentation positions in the original image.
  • the markers located at the splicing positions of multiple pre-check unit images are decomposed into multiple parts, and multiple parts of the same marker located in different pre-check images are simultaneously identified by multiple detection frames.
  • the marker is identified multiple times in the final output image.
  • the image is composed of 4 pre-check unit images, and there are three types of markers to be identified, namely the dot type labeled 1, the "O"-ring type labeled 2, and the three-chamber type labeled 3.
  • the same three-chamber marker A located at the junction of the four pre-check unit images is repeatedly marked by three detection frames. If the current image were used for output, marker A would be counted 3 times and, correspondingly, 3 marker positions would be given, which is not conducive to counting the markers and determining their positions, thus affecting the accuracy of judging the subject's gastrointestinal motility.
  • step S4 specifically includes: confirming the type of the marker in each detection frame according to the probability of the marker type; if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
  • confirming whether the frame-selected markers are the same marker according to the coordinate values of two adjacent detection frames includes:
  • the XY rectangular coordinate system is established with the upper left corner of the original image as the coordinate origin, extending rightward from the origin as the positive X-axis and downward from the origin as the positive Y-axis, where the x-axis is the horizontal axis and the y-axis is the vertical axis. Whether the difference between the feature values of two horizontally adjacent detection frames is within the threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame;
  • that is, abs(rectangles[i+1][x_L] - rectangles[i][x_L]) < n1 and abs(rectangles[i+1][y_L] - rectangles[i][y_L]) < n2 are satisfied at the same time;
  • abs() means taking the absolute value
  • rectangles[i][] means the coordinate values of the i-th detection frame in the horizontal direction
  • x_L and y_L respectively represent the abscissa value and the ordinate value of the upper left corner of the detection frame
  • i is an integer, n1 ∈ (1, 2, 3), n2 ∈ (5, 10, 15).
  • confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames further includes: judging whether the vertically spliced pre-check unit images contain two adjacent detection frames whose framed markers are the same marker;
  • the XY rectangular coordinate system is established with the upper left corner of the original image as the coordinate origin, extending rightward from the origin as the positive X-axis and downward from the origin as the positive Y-axis, where the x-axis is the horizontal axis and the y-axis is the vertical axis. Whether the difference between the feature values of two vertically adjacent detection frames is within the threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame;
  • that is, abs(rectangles[j+1][x_L] - rectangles[j][x_L]) < n3 and abs(rectangles[j+1][y_L] - rectangles[j][y_L]) < n4 are satisfied at the same time;
  • abs() represents the absolute value
  • rectangles[j][] represents the coordinate values of the j-th detection frame in the vertical direction
  • x_L and y_L represent the abscissa value and the ordinate value of the upper left corner of the detection frame respectively
  • j is an integer, n3 ∈ (40, 50, 60), n4 ∈ (1, 2, 3).
  • the merging method specifically includes:
  • the upper-left corner coordinates (x_aL, y_aL) and the lower-right corner coordinates (x_aR, y_aR) of the horizontally merged detection frame are respectively:
  • x_aL = min(rectangles[i+1][x_L], rectangles[i][x_L]),
  • y_aL = min(rectangles[i+1][y_L], rectangles[i][y_L]),
  • x_aR = max(rectangles[i+1][x_R], rectangles[i][x_R]),
  • y_aR = max(rectangles[i+1][y_R], rectangles[i][y_R]),
  • min() represents the minimum value
  • max() represents the maximum value
  • x_R and y_R represent the abscissa value and the ordinate value of the lower right corner of the detection frame, respectively.
  • the upper-left corner coordinates (x_bL, y_bL) and the lower-right corner coordinates (x_bR, y_bR) of the vertically merged detection frame are respectively:
  • x_bL = min(rectangles[j+1][x_L], rectangles[j][x_L]),
  • y_bL = min(rectangles[j+1][y_L], rectangles[j][y_L]),
  • x_bR = max(rectangles[j+1][x_R], rectangles[j][x_R]),
  • y_bR = max(rectangles[j+1][y_R], rectangles[j][y_R]).
  • the order of merging horizontally and vertically is not limited. It can be combined horizontally and then vertically, or can be combined vertically and then horizontally. The order of horizontal and vertical does not affect the final output result.
  • the above examples are all described with the feature values being the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame. In practical applications, the feature values may be selected as coordinate values at the same position on each detection frame.
  • for example, the feature values may be the lower-left corner coordinate value and the upper-right corner coordinate value; changing the feature values does not affect the final output result. Solutions that select feature values at different positions of the detection frame are all included within the protection scope of this application and are not further described here.
  • after merging, the three-chamber marker A at the intersection of the four unit images is displayed in the final output image with the three detection frames of FIG. 3 merged into one detection frame for output.
  • the output image has a unique detection frame corresponding to each marker.
  • the type and position of the marker can be confirmed by the label and position of the detection frame in the image, and the positions of the different types of markers in the digestive tract can be confirmed, thereby confirming the gastrointestinal motility of the subject.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory stores a computer program that can be executed on the processor, and the processor implements the above when executing the computer program. Describe the steps in the image recognition method.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps in the image recognition method as described above.
  • the original image is automatically processed by the neural network model and detection frames are added; further, merging the detection frames that repeatedly identify the same marker improves the accuracy of image identification, so that the type and position of the markers in the image are effectively identified, the distribution of the markers in the digestive tract is confirmed, and the gastrointestinal motility of the subject is accurately judged.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an image recognition method, an electronic device and a readable storage medium. The method includes: dividing an original image into a plurality of unit images of the same predetermined size; inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images; splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image; judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker; and, after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames. The present invention can effectively identify the type and position of markers in an image.

Description

Image recognition method, electronic device and readable storage medium
This application claims priority to the Chinese patent application filed on August 28, 2020 with application number 202010881849.X and entitled "Image recognition method, electronic device and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of image recognition for medical devices, and in particular to an image recognition method, an electronic device and a readable storage medium.
Background
Gastrointestinal motility refers to the normal peristalsis of the gastrointestinal tract that helps complete the digestion and absorption of food; when gastrointestinal motility is weak, indigestion may result.
In the prior art, the strength of gastrointestinal motility is usually judged by identifying gastrointestinal markers. Specifically, after a user swallows markers of different shapes in stages, the positions of the markers are determined from images obtained by X-ray photography, and the strength of gastrointestinal motility is then judged from those positions.
In the prior art, the positions and types of the markers in an X-ray image are usually determined by manually observing the image; however, markers of different shapes swallowed in stages appear small on the X-ray image and come in multiple types, so it is difficult to accurately count the positions and quantities of the various markers by manual observation, and the gastrointestinal motility of the subject therefore cannot be judged.
Summary of the Invention
To solve the above technical problem, the purpose of the present invention is to provide an image recognition method, an electronic device and a readable storage medium.
To achieve one of the above objects of the invention, an embodiment of the present invention provides an image recognition method, the method including: dividing an original image into a plurality of unit images of the same predetermined size, several markers being distributed in the original image;
inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images, the detection frame being the smallest rectangular frame enclosing the marker;
splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image;
judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker:
if so, merging the two detection frames;
if not, retaining the different detection frames corresponding to different markers;
after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames;
wherein judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker specifically includes: confirming the type of the marker in each detection frame according to the probability of the marker type; if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
As a further improvement of an embodiment of the present invention, when dividing the original image into a plurality of unit images of the same predetermined size, the method further includes:
if any unit image is smaller than the predetermined size during segmentation, supplementing edge pixel values to the original image before the unit images are formed, or supplementing edge pixel values to the unit images whose size is smaller than the predetermined size after the unit images are formed.
As a further improvement of an embodiment of the present invention, the method for constructing the neural network model includes: extracting at least one feature layer for each unit image using a convolutional neural network;
in the process of extracting the feature layers, processing the unit image using p convolution kernels of size m*m as the convolutional predictor of the anchor boxes, p=(c1+c2)*k, where the anchor boxes are preset rectangular boxes with different aspect ratios, m is an odd positive integer, c1 represents the number of marker types, k represents the number of anchor boxes, and c2 is the number of offset parameters for adjusting the anchor boxes; the detection frame is obtained by changing the size of an anchor box.
As a further improvement of an embodiment of the present invention, the method further includes:
performing pooling-layer processing on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers.
As a further improvement of an embodiment of the present invention, in the process of performing pooling-layer processing on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers, the method includes:
before each pooling-layer processing of the unit image, performing convolution-layer processing on the unit image at least once, the convolution kernels being of the same size.
As a further improvement of an embodiment of the present invention, the method further includes: configuring c2=4, the offset parameters for adjusting the anchor box specifically including: the horizontal offset value and the vertical offset value of the upper-left corner, the scaling factor of the width, and the scaling factor of the height.
As a further improvement of an embodiment of the present invention, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames includes:
establishing a rectangular coordinate system with the upper-left corner of the original image as the coordinate origin, and comparing whether the difference between the feature values of two horizontally adjacent detection frames is within a threshold range; if so, confirming that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
As a further improvement of an embodiment of the present invention, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames includes:
establishing a rectangular coordinate system with the upper-left corner of the original image as the coordinate origin, and comparing whether the difference between the feature values of two vertically adjacent detection frames is within a threshold range; if so, confirming that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
As a further improvement of an embodiment of the present invention, merging the two detection frames includes:
comparing the coordinate values of the upper-left corners of the two detection frames currently used for calculation, and taking the minimum of their abscissas and the minimum of their ordinates as the upper-left corner coordinate values of the merged detection frame;
comparing the coordinate values of the lower-right corners of the two detection frames, and taking the maximum of their abscissas and the maximum of their ordinates as the lower-right corner coordinate values of the merged detection frame.
To achieve one of the above objects of the invention, an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the image recognition method described above when executing the computer program.
To achieve one of the above objects of the invention, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the image recognition method described above when executed by a processor.
Compared with the prior art, the beneficial effects of the present invention are: in the image recognition method, electronic device and readable storage medium of the present invention, the original image is automatically processed by the neural network model to add detection frames; further, merging the detection frames that repeatedly identify the same marker improves the accuracy of image identification, so that the type and position of the markers in the image are effectively identified and the gastrointestinal motility of the subject is accurately judged.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a neural network according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of the result obtained after steps S1 to S3 shown in FIG. 1 are performed on an original image;
FIG. 4 is a schematic diagram of the result obtained after step S4 shown in FIG. 1 is performed on the basis of FIG. 3.
Detailed Description of the Embodiments
The present invention will be described in detail below with reference to the specific embodiments shown in the accompanying drawings. These embodiments do not limit the present invention, however, and structural, methodological or functional changes made by those of ordinary skill in the art based on these embodiments are all included within the protection scope of the present invention.
As shown in FIG. 1, a first embodiment of the present invention provides an image recognition method, the method including:
S1: dividing an original image into a plurality of unit images of the same predetermined size, several markers being distributed in the original image;
S2: inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images, the detection frame being the smallest rectangular frame enclosing the marker;
S3: splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image;
S4: judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker:
if so, executing S5: merging the two detection frames;
if not, executing S6: retaining the different detection frames corresponding to different markers;
S7: after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames.
In the specific embodiments of the present invention, the acquired original image is usually large; to improve the accuracy of identification, the image needs to be divided before identification and, after identification, restored according to the segmentation order.
For step S1, in the process of dividing the original image into a plurality of unit images of the same predetermined size, the image may be divided sequentially starting from a corner of the original image, or a point in the original image may be selected and the image divided as needed on the basis of that point. Correspondingly, the method further includes: if any unit image is smaller than the predetermined size during segmentation, supplementing edge pixel values to the original image before the unit images are formed, or supplementing edge pixel values to the unit images whose size is smaller than the predetermined size after the unit images are formed, so that the size of each unit image is the same as the predetermined size.
It should be noted that, in specific applications, the original image may be segmented first and, if the size of the last unit image formed is smaller than the predetermined size, edge pixel values are supplemented only to that last unit image. Alternatively, before segmentation, it may first be calculated whether the current original image can be completely divided according to the predetermined size; if it cannot be completely divided, edge pixel values are supplemented at the edges of the original image before segmentation. Typically, the pixel value at the supplemented positions can be set as needed, for example to 0, which is not further described here.
Preferably, the segmented unit images are square; in a specific example of the present invention, the size of the segmented unit images is 320*320, in pixels.
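As an illustration of this splitting and padding (and of the splicing back in step S3), a minimal Python sketch is given below for reference; the zero padding value, the single-channel image and the helper names split_into_units and splice_units are assumptions made for illustration and are not part of the claimed method.

    import numpy as np

    def split_into_units(image, unit=320, pad_value=0):
        # Pad the single-channel original image at its right and bottom edges so that
        # height and width become multiples of `unit`, then cut it into unit images.
        h, w = image.shape[:2]
        pad_h = (unit - h % unit) % unit
        pad_w = (unit - w % unit) % unit
        padded = np.pad(image, ((0, pad_h), (0, pad_w)), constant_values=pad_value)
        units = []
        for y in range(0, padded.shape[0], unit):
            for x in range(0, padded.shape[1], unit):
                # keep the segmentation position so that step S3 can splice the
                # pre-check unit images back together in the same order
                units.append(((y, x), padded[y:y + unit, x:x + unit]))
        return units, padded.shape

    def splice_units(units, padded_shape):
        # Reassemble processed unit images into one pre-output image.
        out = np.zeros(padded_shape, dtype=units[0][1].dtype)
        for (y, x), tile in units:
            out[y:y + tile.shape[0], x:x + tile.shape[1]] = tile
        return out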
For step S2, the method for constructing the neural network model includes: extracting at least one feature layer for each unit image using a convolutional neural network (CNN).
In an implementation of the present invention, in the process of extracting the feature layers, p convolution kernels of size m*m are used as the convolutional predictor of the anchor boxes to process the unit image, so as to predict the type and position of the markers, where m is an odd positive integer. The anchor boxes are preset rectangular boxes with different aspect ratios. p=(c1+c2)*k, where c1 represents the number of marker types, k represents the number of anchor boxes, and c2 is the number of offset parameters for adjusting the anchor boxes; the detection frame is obtained by changing the size of an anchor box.
Further, pooling-layer processing is performed on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers; that is, the types and sizes of the markers determine the number and sizes of the feature layers. Specifically, the marker sizes on the original image are pre-divided to determine the number and sizes of the feature layers.
Preferably, in the process of performing pooling-layer processing on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers, the method specifically includes: before each pooling-layer processing of the unit image, performing convolution-layer processing on the unit image at least once, the convolution kernels being of the same size.
The convolution-layer processing performs dimensionality reduction and feature extraction on the input image through convolution operations; the pooling-layer processing is used to reduce the spatial size of the image.
For ease of understanding, a specific example is described below for reference.
As shown in FIGS. 2 to 4, in a specific example of the present invention there are three types of markers to be identified; referring to FIG. 3, the shapes of the markers are: dot type, "O"-ring type and three-chamber type. Since X-ray photography may capture incomplete markers or overlapping markers, even the same kind of marker may appear in different sizes on an X-ray image. Therefore, the sizes of the markers displayed on the X-ray image need to be pre-divided to determine the number of feature layers to be configured.
Referring to FIG. 2, the neural network model for feature extraction in this example is established and configured with three feature layers. The original image is divided sequentially into multiple 320*320 unit images, and feature extraction is performed on each unit image by building a convolutional neural network. Specifically, the size of the input unit image is 320*320, and the input image is processed sequentially by convolution layers and pooling layers to obtain the required feature layers. The unit image (320*320) is processed in sequence by two convolution-layer processings and one pooling-layer processing, two convolution-layer processings and one pooling-layer processing, three convolution-layer processings and one pooling-layer processing, and three convolution-layer processings, after which feature layer 1 is extracted. Since three pooling-layer processings are performed in total, and each pooling-layer processing halves the image size relative to the previous one, the image size of feature layer 1 is reduced to 40*40, and markers with sizes ranging from 8*8 to 24*24 can be detected. Feature layer 2 is extracted after one pooling-layer processing and three convolution-layer processings are performed in sequence on the feature layer 1 image (40*40); at this point the image size of feature layer 2 is halved again relative to feature layer 1, to 20*20, and markers with sizes ranging from 16*16 to 48*48 can be detected. Feature layer 3 is extracted after one pooling-layer processing and three convolution-layer processings are performed in sequence on the feature layer 2 image (20*20); at this point the image size of feature layer 3 is 10*10, and markers with sizes ranging from 32*32 to 96*96 can be detected. The size and number of the convolution kernels used in each convolution-layer processing can be set according to actual needs, for example 64, 128, 256 or 512 kernels as in FIG. 2. Of course, in other embodiments, the neural network can be built and the number of feature layers configured according to actual requirements.
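For reference, a minimal PyTorch sketch of a backbone with the 320→40→20→10 layout described above is shown below; the channel widths, the ReLU activations and the single-channel input are assumptions for illustration and do not reproduce the exact network of FIG. 2.

    import torch.nn as nn

    def conv_block(in_ch, out_ch, n_convs):
        # n_convs successive 3*3 convolution layers (same kernel size each time) with ReLU
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    class Backbone(nn.Module):
        # 320*320 single-channel input -> feature layer 1 (40*40), 2 (20*20), 3 (10*10)
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(
                conv_block(1, 64, 2), nn.MaxPool2d(2),     # 320 -> 160
                conv_block(64, 128, 2), nn.MaxPool2d(2),   # 160 -> 80
                conv_block(128, 256, 3), nn.MaxPool2d(2),  # 80 -> 40
                conv_block(256, 512, 3))                   # feature layer 1
            self.stage2 = nn.Sequential(nn.MaxPool2d(2), conv_block(512, 512, 3))  # 40 -> 20
            self.stage3 = nn.Sequential(nn.MaxPool2d(2), conv_block(512, 512, 3))  # 20 -> 10

        def forward(self, x):
            f1 = self.stage1(x)   # detects markers of roughly 8*8 to 24*24
            f2 = self.stage2(f1)  # roughly 16*16 to 48*48
            f3 = self.stage3(f2)  # roughly 32*32 to 96*96
            return f1, f2, f3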
Preferably, in the specific example of the present invention, the image is processed with 3*3 convolutions, i.e. m=3 is configured; there are three types of markers to be identified, i.e. c1=3, namely the dot type, the "O"-ring type and the three-chamber type; the anchor boxes are a plurality of bounding boxes of different sizes and aspect ratios generated centered on any pixel; the offset parameters for adjusting the anchor box specifically include: the horizontal offset value and the vertical offset value of the upper-left corner, the scaling factor of the width and the scaling factor of the height, i.e. c2=4 is configured.
Specifically, the output for c1 is a one-dimensional array denoted result_c1[3], where result_c1[0], result_c1[1] and result_c1[2] are the three elements of this array and respectively represent the probabilities of the marker types in the anchor box. In a specific example, result_c1[0] represents the probability that the marker in the anchor box is a dot type, result_c1[1] represents the probability that it is an "O"-ring type, and result_c1[2] represents the probability that it is a three-chamber type; all three elements range from 0 to 1. The type of the marker in the anchor box is determined by the maximum of these three elements; for example, when result_c1[0] has the largest value among the three elements, the marker in the corresponding anchor box is a dot type.
The output for c2 is a one-dimensional array denoted result_c2[4], where result_c2[0], result_c2[1], result_c2[2] and result_c2[3] are the four elements of this array and respectively represent the horizontal offset value of the upper-left corner of the anchor box, the vertical offset value of the upper-left corner of the anchor box, the scaling factor of the anchor-box width and the scaling factor of the anchor-box height; all of them range from 0 to 1. The size of the anchor box is adjusted according to the output for c2 to form the detection frame.
Through the above steps, the type and position of the marker are preliminarily confirmed.
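With c1=3 and c2=4, each anchor position therefore needs p=(3+4)*k prediction kernels; for example, k=9 anchor shapes would give p=63. For illustration only, the Python sketch below shows one plausible way a single anchor box could be decoded from result_c1 and result_c2; the anchor representation and the exact way the offsets and scaling factors are applied are assumptions, since the specification does not spell out the decoding formula.

    import numpy as np

    MARKER_TYPES = ("dot", "O-ring", "three-chamber")

    def decode_anchor(result_c1, result_c2, anchor):
        # result_c1: three class probabilities; result_c2: [dx, dy, sw, sh], all in [0, 1];
        # anchor: (x_l, y_l, w, h) of a preset anchor box in original-image coordinates.
        marker_type = MARKER_TYPES[int(np.argmax(result_c1))]  # type = largest probability
        x_l, y_l, w, h = anchor
        dx, dy, sw, sh = result_c2
        # assumed decoding: shift the upper-left corner, then rescale width and height
        new_x, new_y = x_l + dx * w, y_l + dy * h
        new_w, new_h = w * sw, h * sh
        # detection frame returned as (x_L, y_L, x_R, y_R)
        return marker_type, (new_x, new_y, new_x + new_w, new_y + new_h)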
In addition, it should be noted that when the neural network model is first built, the unit images used for training are first annotated manually; the annotated content is the category information of the markers, including the detection frame enclosing each marker and the upper-left corner and lower-right corner coordinate values corresponding to the detection frame. Afterwards, unannotated unit images are input into the newly built neural network model for prediction; the closer the predicted result is to the manually annotated result, the higher the detection accuracy of the neural network model. When the ratio of the predicted result to the manually annotated result is greater than a preset ratio, the neural network model can be applied normally; when that ratio is not greater than the preset ratio, the neural network model needs to be adjusted until the requirement is met. In this way, the neural network model is trained through the above content so that its prediction results become more accurate. During the building of the neural network, the Intersection over Union (IOU) can be used to evaluate the detection accuracy of the neural network model, which is not further described here.
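As a reference for the Intersection over Union mentioned above, a standard sketch is given below; this is the usual definition and is not specific to the present invention.

    def iou(box_a, box_b):
        # Intersection over Union of two boxes given as (x_L, y_L, x_R, y_R).
        ix_l, iy_l = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix_r, iy_r = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix_r - ix_l) * max(0.0, iy_r - iy_l)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0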
For step S3, after the neural network model adds detection frames to the markers in the unit images, a pre-output image as shown in FIG. 3 is formed by splicing the multiple pre-check unit images according to their segmentation positions in the original image. In this specific example, a marker located at the splicing position of multiple pre-check unit images is decomposed into multiple parts, and the multiple parts of the same marker located in different pre-check images are simultaneously identified by multiple detection frames; as a result, this marker is identified multiple times in the final output image. In this specific example, the image is composed of four pre-check unit images, and there are three types of markers to be identified, namely the dot type labeled 1, the "O"-ring type labeled 2 and the three-chamber type labeled 3. The same three-chamber marker A located at the junction of the four pre-check unit images is repeatedly annotated by three detection frames. If the current image were used for output, marker A would be counted three times and, correspondingly, three marked positions would be given, which is not conducive to counting the markers and determining their positions, thereby affecting the accuracy of judging the subject's gastrointestinal motility.
Correspondingly, in order to solve this problem, in a preferred embodiment of the present invention the multiple detection frames of the same marker need to be merged, so that in the finally output image only one detection frame uniquely corresponds to each marker, which facilitates counting the markers and determining their positions, and thus accurately judging the subject's gastrointestinal motility.
Specifically, step S4 specifically includes: confirming the type of the marker in each detection frame according to the probability of the marker type; if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
Further, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames includes:
judging whether the horizontally spliced pre-check unit images contain two adjacent detection frames whose framed markers are the same marker;
an XY rectangular coordinate system is established with the upper-left corner of the original image as the coordinate origin, extending rightward from the origin as the positive X-axis and downward from the origin as the positive Y-axis, where the x-axis is the horizontal axis and the y-axis is the vertical axis. Whether the difference between the feature values of two horizontally adjacent detection frames is within a threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame;
that is, abs(rectangles[i+1][x_L] - rectangles[i][x_L]) < n1 and abs(rectangles[i+1][y_L] - rectangles[i][y_L]) < n2 are satisfied at the same time;
where abs() denotes taking the absolute value, rectangles[i][] denotes the coordinate values of the i-th detection frame in the horizontal direction, x_L and y_L respectively denote the abscissa value and the ordinate value of the upper-left corner of the detection frame; i is an integer, n1 ∈ (1, 2, 3), n2 ∈ (5, 10, 15).
In addition, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames further includes: judging whether the vertically spliced pre-check unit images contain two adjacent detection frames whose framed markers are the same marker;
an XY rectangular coordinate system is established with the upper-left corner of the original image as the coordinate origin, extending rightward from the origin as the positive X-axis and downward from the origin as the positive Y-axis, where the x-axis is the horizontal axis and the y-axis is the vertical axis. Whether the difference between the feature values of two vertically adjacent detection frames is within a threshold range is compared; if so, it is confirmed that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values are the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame;
that is, abs(rectangles[j+1][x_L] - rectangles[j][x_L]) < n3 and abs(rectangles[j+1][y_L] - rectangles[j][y_L]) < n4 are satisfied at the same time;
where abs() denotes taking the absolute value, rectangles[j][] denotes the coordinate values of the j-th detection frame in the vertical direction, x_L and y_L respectively denote the abscissa value and the ordinate value of the upper-left corner of the detection frame; j is an integer, n3 ∈ (40, 50, 60), n4 ∈ (1, 2, 3).
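For reference only, the two adjacency tests above can be written as a short Python sketch; the dictionary-based detection-frame representation and the default thresholds (chosen from the sets listed above) are assumptions made for illustration.

    def same_marker_horizontal(box_i, box_next, n1=2, n2=10):
        # box_*: dicts with keys 'x_L', 'y_L', 'x_R', 'y_R' in original-image pixel
        # coordinates (origin at the upper-left corner); n1, n2 taken from the sets above.
        return (abs(box_next["x_L"] - box_i["x_L"]) < n1 and
                abs(box_next["y_L"] - box_i["y_L"]) < n2)

    def same_marker_vertical(box_j, box_next, n3=50, n4=2):
        # n3, n4 taken from the sets above.
        return (abs(box_next["x_L"] - box_j["x_L"]) < n3 and
                abs(box_next["y_L"] - box_j["y_L"]) < n4)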
Further, after the markers framed in the two detection frames currently used in the calculation are confirmed to be the same marker, the two detection frames are merged. The merging method specifically includes:
comparing the coordinate values of the upper-left corners of the two detection frames currently used for calculation, and taking the minimum of their abscissas and the minimum of their ordinates as the upper-left corner coordinate values of the merged detection frame;
comparing the coordinate values of the lower-right corners of the two detection frames, and taking the maximum of their abscissas and the maximum of their ordinates as the lower-right corner coordinate values of the merged detection frame.
Correspondingly, the upper-left corner coordinates (x_aL, y_aL) and the lower-right corner coordinates (x_aR, y_aR) of the horizontally merged detection frame are respectively:
x_aL = min(rectangles[i+1][x_L], rectangles[i][x_L]),
y_aL = min(rectangles[i+1][y_L], rectangles[i][y_L]),
x_aR = max(rectangles[i+1][x_R], rectangles[i][x_R]),
y_aR = max(rectangles[i+1][y_R], rectangles[i][y_R]),
where min() denotes taking the minimum value and max() denotes taking the maximum value; x_R and y_R respectively denote the abscissa value and the ordinate value of the lower-right corner of the detection frame.
Correspondingly, the upper-left corner coordinates (x_bL, y_bL) and the lower-right corner coordinates (x_bR, y_bR) of the vertically merged detection frame are respectively:
x_bL = min(rectangles[j+1][x_L], rectangles[j][x_L]),
y_bL = min(rectangles[j+1][y_L], rectangles[j][y_L]),
x_bR = max(rectangles[j+1][x_R], rectangles[j][x_R]),
y_bR = max(rectangles[j+1][y_R], rectangles[j][y_R]).
It should be noted that, for each pre-output image, it is necessary to judge successively in the horizontal and the vertical direction whether there are markers in two adjacent detection frames that are the same marker. The order of horizontal and vertical merging is not limited: merging may be performed horizontally first and then vertically, or vertically first and then horizontally, and the order does not affect the final output result. In addition, the above examples are all described with the feature values being the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame. In practical applications, the feature values may be chosen as coordinate values at the same position on each detection frame; for example, the feature values may be the lower-left corner coordinate value and the upper-right corner coordinate value. Changing the feature values does not affect the final output result; solutions that select feature values at different positions of the detection frame are all included within the protection scope of this application and are not further described here.
As shown in FIG. 4, after merging, the three-chamber marker A located at the intersection of the four unit images is displayed in the final output image with the three detection frames of FIG. 3 merged into one detection frame for output.
Further, after the detection frames are identified and merged and the image is output, each marker in the output image uniquely has one detection frame. At this point, the type and position of each marker can be confirmed from the label and position of its detection frame in the image, the locations of the different types of markers in the digestive tract can then be confirmed, and the gastrointestinal motility of the subject can thereby be confirmed.
Further, an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the image recognition method described above when executing the computer program.
Further, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the image recognition method described above when executed by a processor.
In summary, in the image recognition method, electronic device and readable storage medium of the present invention, the original image is automatically processed by the neural network model to add detection frames; further, merging the detection frames that repeatedly identify the same marker improves the accuracy of image identification, so that the types and positions of the markers in the image are effectively identified, the distribution of the markers in the digestive tract is confirmed, and the gastrointestinal motility of the subject is accurately judged.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted only for the sake of clarity, and those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The series of detailed descriptions listed above are only specific descriptions of feasible embodiments of the present invention and are not intended to limit the protection scope of the present invention; equivalent embodiments or changes made without departing from the technical spirit of the present invention shall all be included within the protection scope of the present invention.

Claims (12)

  1. An image recognition method, characterized in that the method comprises:
    dividing an original image into a plurality of unit images of the same predetermined size, several markers being distributed in the original image;
    inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images, the detection frame being the smallest rectangular frame enclosing the marker;
    splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image;
    judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker:
    if so, merging the two detection frames;
    if not, retaining the different detection frames corresponding to different markers;
    after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames;
    wherein judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker specifically comprises: confirming the type of the marker in each detection frame according to the probability of the marker type and, if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
  2. The image recognition method according to claim 1, characterized in that, in the process of dividing the original image into a plurality of unit images of the same predetermined size, the method further comprises:
    if any unit image is smaller than the predetermined size during segmentation, supplementing edge pixel values to the original image before the unit images are formed, or supplementing edge pixel values to the unit images whose size is smaller than the predetermined size after the unit images are formed.
  3. The image recognition method according to claim 1, characterized in that the method for constructing the neural network model comprises: extracting at least one feature layer for each unit image using a convolutional neural network;
    in the process of extracting the feature layers, processing the unit image using p convolution kernels of size m*m as the convolutional predictor of the anchor boxes, p=(c1+c2)*k, wherein the anchor boxes are preset rectangular boxes with different aspect ratios, m is an odd positive integer, c1 represents the number of marker types, k represents the number of anchor boxes, and c2 is the number of offset parameters for adjusting the anchor boxes; the detection frame is obtained by changing the size of an anchor box.
  4. The image recognition method according to claim 3, characterized in that the method further comprises:
    performing pooling-layer processing on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers.
  5. The image recognition method according to claim 4, characterized in that, in the process of performing pooling-layer processing on the unit image multiple times according to the types and sizes of the markers to obtain the corresponding feature layers, the method comprises:
    before each pooling-layer processing of the unit image, performing convolution-layer processing on the unit image at least once, the convolution kernels being of the same size.
  6. The image recognition method according to claim 3, characterized in that the method further comprises: configuring c2=4, the offset parameters for adjusting the anchor box specifically comprising: the horizontal offset value and the vertical offset value of the upper-left corner, the scaling factor of the width, and the scaling factor of the height.
  7. The image recognition method according to claim 1, characterized in that confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames comprises:
    establishing a rectangular coordinate system with the upper-left corner of the original image as the coordinate origin, and comparing whether the difference between the feature values of two horizontally adjacent detection frames is within a threshold range; if so, confirming that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values being the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
  8. The image recognition method according to claim 1, characterized in that confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames comprises:
    establishing a rectangular coordinate system with the upper-left corner of the original image as the coordinate origin, and comparing whether the difference between the feature values of two vertically adjacent detection frames is within a threshold range; if so, confirming that the markers framed in the two detection frames currently used in the calculation are the same marker; the feature values being the upper-left corner coordinate value and the lower-right corner coordinate value of each detection frame.
  9. The image recognition method according to claim 7, characterized in that merging the two detection frames comprises:
    comparing the coordinate values of the upper-left corners of the two detection frames currently used for calculation, and taking the minimum of their abscissas and the minimum of their ordinates as the upper-left corner coordinate values of the merged detection frame;
    comparing the coordinate values of the lower-right corners of the two detection frames, and taking the maximum of their abscissas and the maximum of their ordinates as the lower-right corner coordinate values of the merged detection frame.
  10. The image recognition method according to claim 8, characterized in that merging the two detection frames comprises:
    comparing the coordinate values of the upper-left corners of the two detection frames currently used for calculation, and taking the minimum of their abscissas and the minimum of their ordinates as the upper-left corner coordinate values of the merged detection frame;
    comparing the coordinate values of the lower-right corners of the two detection frames, and taking the maximum of their abscissas and the maximum of their ordinates as the lower-right corner coordinate values of the merged detection frame.
  11. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of an image recognition method, wherein the image recognition method comprises:
    dividing an original image into a plurality of unit images of the same predetermined size, several markers being distributed in the original image;
    inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images, the detection frame being the smallest rectangular frame enclosing the marker;
    splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image;
    judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker:
    if so, merging the two detection frames;
    if not, retaining the different detection frames corresponding to different markers;
    after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames;
    wherein judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker specifically comprises: confirming the type of the marker in each detection frame according to the probability of the marker type and, if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of an image recognition method are implemented, wherein the image recognition method comprises:
    dividing an original image into a plurality of unit images of the same predetermined size, several markers being distributed in the original image;
    inputting the unit images into a pre-established neural network model for processing, so as to add a detection frame to each marker in each unit image and form pre-check unit images, the detection frame being the smallest rectangular frame enclosing the marker;
    splicing the plurality of pre-check unit images into one pre-output image according to the segmentation position of each unit image in the original image;
    judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker:
    if so, merging the two detection frames;
    if not, retaining the different detection frames corresponding to different markers;
    after all detection frames confirmed to belong to the same marker have been merged, outputting the image with the detection frames;
    wherein judging whether the pre-output image contains two adjacent detection frames whose framed markers are the same marker specifically comprises: confirming the type of the marker in each detection frame according to the probability of the marker type and, if the markers framed in two adjacent detection frames are of the same type, confirming whether the framed markers are the same marker according to the coordinate values of the two adjacent detection frames.
PCT/CN2021/112777 2020-08-28 2021-08-16 Image recognition method, electronic device and readable storage medium WO2022042352A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21860190.4A EP4207058A4 (en) 2020-08-28 2021-08-16 IMAGE RECOGNITION METHOD, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM
US18/023,973 US20240029387A1 (en) 2020-08-28 2021-08-16 Image recognition method, electronic device and readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010881849.X 2020-08-28
CN202010881849.XA CN111739024B (zh) 2020-08-28 2020-08-28 Image recognition method, electronic device and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022042352A1 (zh)

Family

ID=72658900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112777 2020-08-28 2021-08-16 Image recognition method, electronic device and readable storage medium WO2022042352A1 (zh)

Country Status (4)

Country Link
US (1) US20240029387A1 (zh)
EP (1) EP4207058A4 (zh)
CN (1) CN111739024B (zh)
WO (1) WO2022042352A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739024B (zh) 2020-08-28 2020-11-24 安翰科技(武汉)股份有限公司 Image recognition method, electronic device and readable storage medium
CN112308036A (zh) 2020-11-25 2021-02-02 杭州睿胜软件有限公司 Bill recognition method and apparatus, and readable storage medium
CN113392857B (zh) 2021-08-17 2022-03-11 深圳市爱深盈通信息技术有限公司 Target detection method, apparatus and device terminal based on a YOLO network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9122931B2 (en) * 2013-10-25 2015-09-01 TCL Research America Inc. Object identification system and method
KR102255417B1 (ko) 2014-03-13 2021-05-24 삼성메디슨 주식회사 Ultrasound diagnosis apparatus and method for displaying an ultrasound image thereof
CN106097335B (zh) 2016-06-08 2019-01-25 安翰光电技术(武汉)有限公司 Digestive tract lesion image recognition system and recognition method
CN107993228B (zh) 2017-12-15 2021-02-02 中国人民解放军总医院 Method and device for automatic detection of vulnerable plaque based on cardiovascular OCT images
US11593656B2 (en) * 2017-12-31 2023-02-28 Astrazeneca Computational Pathology Gmbh Using a first stain to train a model to predict the region stained by a second stain
CN110176295A (zh) 2019-06-13 2019-08-27 上海孚慈医疗科技有限公司 Real-time detection method and detection device for sites and lesions under gastrointestinal endoscopy

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542289A (zh) * 2011-12-16 2012-07-04 重庆邮电大学 A people-flow statistics method based on a multi-Gaussian counting model
CN102999918A (zh) * 2012-04-19 2013-03-27 浙江工业大学 Multi-target object tracking system for panoramic video sequence images
CN106408594A (zh) * 2016-09-28 2017-02-15 江南大学 Video multi-target tracking method based on multi-Bernoulli feature covariance
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN110427800A (zh) * 2019-06-17 2019-11-08 平安科技(深圳)有限公司 Video object accelerated detection method and apparatus, server and storage medium
CN110276305A (zh) * 2019-06-25 2019-09-24 广州众聚智能科技有限公司 A dynamic commodity recognition method
CN110443142A (zh) * 2019-07-08 2019-11-12 长安大学 A deep-learning vehicle counting method based on road surface extraction and segmentation
CN111275082A (zh) * 2020-01-14 2020-06-12 中国地质大学(武汉) An indoor object target detection method based on an improved end-to-end neural network
CN111739024A (zh) 2020-08-28 2020-10-02 安翰科技(武汉)股份有限公司 Image recognition method, electronic device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4207058A4 *

Also Published As

Publication number Publication date
EP4207058A4 (en) 2024-02-21
CN111739024A (zh) 2020-10-02
CN111739024B (zh) 2020-11-24
EP4207058A1 (en) 2023-07-05
US20240029387A1 (en) 2024-01-25

Similar Documents

Publication Publication Date Title
WO2022042352A1 (zh) 图像识别方法、电子设备及可读存储介质 (Image recognition method, electronic device and readable storage medium)
DE102020100684B4 (de) Labeling of graphical reference markers
CN108009543B (zh) License plate recognition method and device
JP6871314B2 (ja) Object detection method, device and storage medium
CN107301402B (zh) Method, apparatus, medium and device for determining key frames of a real scene
JP6333871B2 (ja) Image processing device for displaying an object detected from an input image
CN113298169B (zh) Rotating target detection method and device based on a convolutional neural network
WO2023137914A1 (zh) Image processing method and apparatus, electronic device and storage medium
CN110852162B (zh) Human body integrity data labeling method and device, and terminal device
CN108446694B (zh) Target detection method and device
CN105809651B (zh) Image saliency detection method based on edge dissimilarity contrast
CN107688783B (zh) 3D image detection method and device, electronic device, and computer-readable medium
CN107886074A (zh) Face detection method and face detection system
CN111582021A (zh) Text detection method and device in scene images, and computer equipment
US11790672B2 (en) Image processing method, microscope, image processing system, and medium based on artificial intelligence
CN110443212B (zh) Positive sample acquisition method, device, equipment and storage medium for target detection
WO2021258579A1 (zh) Image stitching method and apparatus, computer device, and storage medium
CN113657409A (zh) Vehicle damage detection method and device, electronic device, and storage medium
CN111242848B (zh) Binocular camera image seam stitching method and system based on regional feature registration
CN113221895B (zh) Small target detection method, apparatus, device and medium
CN112580558A (zh) Infrared image target detection model construction method, detection method, device and system
CN112365498A (zh) Automatic detection method for multi-scale and multi-morphological targets in two-dimensional image sequences
CN111881732B (zh) SVM-based face quality evaluation method
CN109635755A (zh) Face extraction method, device and storage medium
CN115761401A (zh) Expressway small target detection method and device based on a convolutional neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860190

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021860190

Country of ref document: EP

Effective date: 20230328