CN106777090A - Skyline medical big data retrieval method based on a visual vocabulary and multi-feature matching - Google Patents
- Publication number
- CN106777090A (application CN201611150453.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- skyline
- vector
- medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
Description
Technical field
This invention lies at the intersection of smart healthcare and big data processing. It is a Skyline medical big data retrieval system based on a visual vocabulary and multi-feature matching. The system applies metric-space Skyline queries to content-based medical image retrieval and involves large-scale medical data analysis, massive data processing in cloud computing environments, and intelligent data processing and application development.
Background
With the development of the Internet and the spread of digital medical equipment, medical image data is growing exponentially, and retrieval techniques for such data are attracting increasing attention. Massive data is not only large in volume but also carries great commercial value. For example, analyzing tumor growth in cancer patients can guide doctors in recommending personalized treatment plans; analyzing records of brain activity and heart rate can give hospitals, vendors, and patients diagnostic guidance or early warnings for home monitoring. However, the explosive growth of medical image data means that traditional single-machine data analysis is increasingly unsuited to current data-intensive analysis and processing needs. To improve medical image retrieval efficiency while preserving retrieval accuracy, the metric-space Skyline query (Metric Skyline Query) algorithm has been applied successfully in image processing. The algorithm improves retrieval efficiency by pruning data in the metric space.
Most existing metric-space Skyline algorithms for image data build the metric space from general textual semantics. In semantics-based medical image retrieval, although an image's semantic information is rich, it is also complex, subjectively interpreted, and hard to extract and express; these drawbacks degrade both metric-space modeling and retrieval quality. In addition, because semantic information is ambiguous, most algorithms select several images to take part in a query in order to improve accuracy, which greatly increases the computation per query. This computational cost is a major bottleneck for metric-space Skyline queries, and it is especially pronounced when processing massive medical image data.
In recent years, content-based image retrieval has developed rapidly and has become the mainstream technique in image retrieval. To address the drawback that existing metric-space algorithms for medical image data rely on semantic information, this invention starts from image content and takes low-level image features as the objects in the metric space. To improve retrieval accuracy, reduce computational overhead, and speed up similarity-distance computation, the metric-space Skyline algorithm is designed from the perspective of multi-feature fusion. This invention was designed and implemented on that basis.
Summary of the invention
In view of the defects and deficiencies described in the background above, this invention applies metric-space Skyline queries to content-based large-scale medical image retrieval and proposes a large-scale medical image retrieval method based on a visual vocabulary and Skyline multi-feature fusion (Big Feature Fusion by Skyline, BSKFF). It uses the Skyline operation to fuse multiple features and defines a new visual-vocabulary-based medical big data retrieval system that better addresses large-scale medical image retrieval.
To achieve this, the technical solution adopted in this patent is:
A Skyline medical big data retrieval method based on a visual vocabulary and multi-feature matching, characterized in that it comprises the following steps:
S1. Extract the low-level features of the medical images, cluster each low-level feature set separately, and build the visual vocabularies; each image in the image library is then quantized into a vector of visual-word frequencies, giving the per-partition feature vectors;
S2. Compute the similarity distance between the query image and every image in the image library on each feature, and construct the per-feature image similarity vectors;
S3. Invoke the Skyline-based multi-feature fusion method to make the retrieval decision through distributed computation.
Further, step S1 extracts the feature data of the medical image: given a query image, the low-level features of that image are extracted through the following steps:
S1.1. Extract the Color feature;
S1.2. Extract the SIFT feature;
S1.3. Build the visual vocabulary;
S1.4. Represent the image in quantized form.
Further, the image similarity vectors of step S2 are constructed as follows. Given an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n medical images and a query image q, each medical image is represented by feature vectors. The similarity distance between q and any image $o_i$ in I on the t-th feature is the L1 distance between the two vectors:
$$dist(o_i.x_t, q.x_t) = \sum_{j=1}^{k} \left| o_i.x_t[j] - q.x_t[j] \right| \tag{1.3}$$
where $o_i.x_t$ denotes the t-th feature descriptor vector of image $o_i$, a k-dimensional vector for the t-th low-level feature of $o_i$, and $o_i.x_t[j]$ is its j-th component;
Using formula 1.3, the similarity distance between the query medical image q and any image $o_i$ in the medical image library I is obtained on each feature; the similarity vector of q and $o_i$ is given by Definition 1.2:
Definition 1.2: Let $I = \{o_1, o_2, \ldots, o_n\}$ be an image library containing n images and q a query image. The similarity vector between q and any image $o_i$ in I is the m-dimensional vector:
$$Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1),\ dist(o_i.x_2, q.x_2),\ \ldots,\ dist(o_i.x_m, q.x_m) \rangle$$
where $i \in [1, n]$, m is the number of low-level features, $Vect_i(o_i, q)$ is the similarity vector between image q and image $o_i$, and $dist(o_i.x_k, q.x_k)$ is the similarity distance between the two images on the k-th feature ($k \le m$). The similarity distance between every image in I and the query image q is computed on each feature, producing n similarity vectors.
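The construction in Definition 1.2 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the feature names `sift` and `color`, the dictionary layout, and the toy three-bin histograms are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two feature vectors of equal length (formula 1.3)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_vector(image: Dict[str, Sequence[float]],
                      query: Dict[str, Sequence[float]],
                      feature_names: List[str]) -> List[float]:
    """One L1 distance per low-level feature, in a fixed feature order."""
    return [l1_distance(image[name], query[name]) for name in feature_names]


# Hypothetical toy data: two features, each a 3-bin visual-word histogram.
FEATURES = ["sift", "color"]
query = {"sift": [2, 0, 1], "color": [1, 1, 0]}
library = {
    "o1": {"sift": [2, 1, 1], "color": [0, 1, 1]},
    "o2": {"sift": [0, 0, 3], "color": [1, 1, 0]},
}

# n images in the library give n similarity vectors of dimension m = 2.
vectors = {name: similarity_vector(img, query, FEATURES)
           for name, img in library.items()}
print(vectors)
```

Keeping the distances separate per feature, rather than summing them into one score, is what lets the later Skyline step compare images feature by feature.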
Further, step S3 proceeds as follows:
Given a medical image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images and a query image q, let the set R be the query result of the multi-feature fusion method, and let $X = \langle x_1, x_2, \ldots, x_m \rangle$ be the m low-level feature vectors of each image.
An image $o_i \in R$ if and only if the following condition holds:
$$\nexists\, o_j \in I,\ j \ne i:\ \big(\forall t \in [1, m]:\ dist(o_j.x_t, q.x_t) \le dist(o_i.x_t, q.x_t)\big) \wedge \big(\exists t \in [1, m]:\ dist(o_j.x_t, q.x_t) < dist(o_i.x_t, q.x_t)\big)$$
R is therefore the set of all images whose similarity vector to the query image q over the feature space X, $Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$, is not dominated by the similarity vector of any other image in the medical image library I;
Further, the result set of the Skyline-based multi-feature fusion method is a subset of the medical image library: the set of images that are not dominated by any other image in the multi-feature metric space. The SIFT and Color similarity distances between the query image q and any image $o_i$ form a point whose abscissa is the SIFT similarity distance between $o_i$ and q and whose ordinate is the Color similarity distance between $o_i$ and q. Both distances are computed in the multi-feature metric space from the bag-of-words model; the smaller the similarity distance, the more similar the two images.
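The membership condition for R is the usual Skyline (non-dominated set) test. The following Python sketch uses a simple quadratic scan; the function names and sample points are hypothetical, and the patent's own SKFF pseudocode (Fig. 3) is not reproduced here.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no farther on every feature and closer on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Names of images whose similarity vector is not dominated by any other."""
    return [
        name for name, v in vectors.items()
        if not any(dominates(w, v)
                   for other, w in vectors.items() if other != name)
    ]


# Hypothetical (SIFT distance, Color distance) points for four library images.
points = {"o1": (1, 2), "o2": (4, 0), "o3": (3, 3), "o4": (1, 2.5)}
result = skyline(points)
print(result)
```

Here `o3` and `o4` are dropped because `o1` is at least as close on both features, while `o2` survives because it is the closest image on the Color feature even though it is far on SIFT.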
Further, Spark is used for stream processing: the streaming computation is decomposed into a series of short batch jobs, and results are progressively fused and recommended as decision results.
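The idea of refining the result batch by batch can be illustrated without Spark. The sketch below is a plain-Python stand-in for the micro-batch loop, not Spark code: `update_skyline` and the sample batches are hypothetical, and a real deployment would run the per-batch work as distributed jobs.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no farther on every feature and closer on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_skyline(current: Dict[str, Vector],
                   batch: Dict[str, Vector]) -> Dict[str, Vector]:
    """Merge one micro-batch into the running skyline and drop dominated entries."""
    merged = dict(current)
    merged.update(batch)
    return {
        name: v for name, v in merged.items()
        if not any(dominates(w, v)
                   for other, w in merged.items() if other != name)
    }


def progressive_skyline(batches: Iterable[Dict[str, Vector]]) -> List[List[str]]:
    """Return the skyline after each batch, mimicking progressively refined output."""
    running: Dict[str, Vector] = {}
    snapshots: List[List[str]] = []
    for batch in batches:
        running = update_skyline(running, batch)
        snapshots.append(sorted(running))
    return snapshots


# Hypothetical stream of similarity vectors arriving in three short batches.
stream = [
    {"a": (3, 3), "b": (1, 5)},
    {"c": (2, 2), "d": (4, 4)},
    {"e": (1, 1)},
]
snapshots = progressive_skyline(stream)
print(snapshots)
```

Because dominance is transitive, discarding a dominated image early never loses a final result, so only the running skyline has to be carried between batches.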
Further, the Color feature of step S1.1 is extracted as follows:
The Color feature is represented by the Color Names (CN) descriptor, which comprises eleven colors: red, black, blue, green, brown, gray, pink, orange, white, purple, and yellow. The color attribute CN is defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label. This label serves as one main factor in the Skyline multi-factor analysis; Spark is used for stream processing, and the results are progressively refined and output;
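As a rough illustration of per-pixel color labeling, the sketch below assigns each RGB pixel to the nearest of eleven prototype colors and counts the labels. This nearest-prototype rule and the RGB prototype values are assumptions made for the example; the text does not specify how the CN mapping is obtained.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]

# The eleven color names, in the order the text lists them.
COLOR_NAMES = ["red", "black", "blue", "green", "brown", "gray",
               "pink", "orange", "white", "purple", "yellow"]

# Hypothetical RGB prototypes, chosen only for illustration.
PROTOTYPES: List[RGB] = [
    (255, 0, 0), (0, 0, 0), (0, 0, 255), (0, 128, 0),
    (139, 69, 19), (128, 128, 128), (255, 192, 203), (255, 165, 0),
    (255, 255, 255), (128, 0, 128), (255, 255, 0),
]


def label_pixel(pixel: RGB) -> int:
    """Index of the prototype closest to the pixel in squared RGB distance."""
    return min(
        range(len(PROTOTYPES)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, PROTOTYPES[i])),
    )


def cn_histogram(pixels: Sequence[RGB]) -> List[int]:
    """11-dimensional count of color labels over the given pixels."""
    hist = [0] * len(COLOR_NAMES)
    for pixel in pixels:
        hist[label_pixel(pixel)] += 1
    return hist


# Four sample pixels: near red, near black, near blue, near white.
sample = [(250, 5, 5), (10, 10, 10), (0, 0, 250), (255, 250, 250)]
labels = [COLOR_NAMES[label_pixel(p)] for p in sample]
hist = cn_histogram(sample)
print(labels, hist)
```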
Further, the SIFT feature of step S1.2 is extracted as follows:
SIFT extraction consists of two parts: detecting feature points and describing them. The original image is transformed across scales to obtain its scale-space representation, and the image is then processed to obtain feature points. Each feature point is represented by a 128-dimensional descriptor vector, giving 128-dimensional SIFT feature vectors. Using the feature points produced during SIFT extraction, each feature point together with its surrounding area is taken as a local region, and the CN vector of every pixel in that region is extracted, yielding local SIFT and CN feature vectors. This vector serves as one main factor in the Skyline multi-factor analysis; Spark is used for stream processing, and the results are progressively refined and output;
Further, the visual vocabulary of step S1.3 is built as follows:
Using the Spark-based multi-layer k-means clustering algorithm and its variants together with oversampling correction, the Spark system trains on the images in the image library in a streaming fashion and progressively generates a visual vocabulary for the SIFT feature vectors and for the Color feature vectors. When generating a vocabulary, the data is first partitioned, then processed by Spark in a distributed, streaming manner, and the result set is exported incrementally;
Here, the multi-layer k-means clustering algorithm searches a set of feature points $X = \{x_1, x_2, \ldots, x_n\}$ of some dimension for k cluster centers $C = \{c_1, c_2, \ldots, c_k\}$ such that the sum of squared errors from each feature point to the center of its cluster is minimal. The cluster centers divide X into k disjoint clusters $Y = \{Y_1, Y_2, \ldots, Y_k\}$ such that $Y_i \cap Y_j = \emptyset$ for any $1 \le i \ne j \le k$. The center of a cluster $Y_i$ is:
$$c_i = \frac{1}{|Y_i|} \sum_{x \in Y_i} x$$
The oversampling correction algorithm uses one Spark job for center selection and global error computation. (Unlike traditional MapReduce, Spark with a distributed cache is used to speed up iteration, and results are produced incrementally as a stream.) Its objective function is:
$$\phi_X(C) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2$$
The goal of the OnR clustering algorithm at each decomposition stage is to find an optimal partition C that minimizes Spark's final global clustering error $\phi_X(C)$, where $\phi_X(C)$ is the global clustering error produced by partitioning the feature set X with the center set C, and $\lVert \cdot \rVert$ is the Euclidean distance. The SIFT and CN feature sets are clustered separately, and the k cluster centers obtained for each are its visual vocabulary.
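The cluster-center formula and the global clustering error can be checked on a small example. The sketch below runs one standard Lloyd (k-means) update in plain Python on toy 2-D points; it is an illustration under these assumptions and does not model the multi-layer or Spark-based variants the text mentions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_error(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Global error: each point's squared distance to its nearest center, summed."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[List[Point]]:
    """Partition the points into disjoint clusters by nearest center."""
    clusters: List[List[Point]] = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def centroid(cluster: Sequence[Point]) -> Point:
    """Mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def lloyd_step(points: Sequence[Point], centers: Sequence[Point]) -> List[Point]:
    """One k-means update; an empty cluster keeps its previous center."""
    clusters = assign(points, centers)
    return [centroid(c) if c else old for c, old in zip(clusters, centers)]


# Toy 2-D feature points and two initial centers (hypothetical values).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers = [(0, 0), (10, 0)]
error_before = clustering_error(points, centers)
new_centers = lloyd_step(points, centers)
error_after = clustering_error(points, new_centers)
print(error_before, new_centers, error_after)
```

Moving each center to the mean of its cluster halves the error in this example, which is the quantity the objective function asks the clustering to minimize.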
Further, the quantized image representation of step S1.4 is obtained as follows:
Based on the visual vocabulary generated by clustering, the SIFT descriptors of each image are quantized into a bag of visual words. In the bag-of-visual-words model, a visual vocabulary is given for each feature j, where j = 1, ..., m and k is the number of words in the vocabulary. Each image in the image library is quantized into a k-dimensional vector of visual-word frequencies. The Color feature is quantized in the same way, producing the corresponding feature vector for each image. For multiple features the process is repeated until every feature has been quantized, giving the feature vector of Definition 1.1;
Definition 1.1: In each data partition, consider an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images. Suppose each image $o_i$ has a set of low-level features $X = \langle x_1, x_2, \ldots, x_m \rangle$, where m is the number of low-level features; the feature vector of each image $o_i$ is written $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$.
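Quantizing an image into visual-word frequencies amounts to assigning each local descriptor to its nearest vocabulary word and counting. The sketch below shows this for one feature; the 2-D descriptors and three-word vocabulary are toy values assumed for readability (SIFT descriptors in the text are 128-dimensional).

```python
from typing import List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptors: Sequence[Descriptor],
             vocabulary: Sequence[Descriptor]) -> List[int]:
    """k-dimensional histogram: how often each visual word is the nearest one."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        idx = min(range(len(vocabulary)), key=lambda i: sq_dist(d, vocabulary[i]))
        hist[idx] += 1
    return hist


# Hypothetical vocabulary of k = 3 visual words (cluster centers).
vocabulary = [(0, 0), (10, 10), (0, 10)]
# Hypothetical local descriptors extracted from one image.
descriptors = [(1, 1), (9, 9), (0, 1), (1, 9)]

histogram = quantize(descriptors, vocabulary)
print(histogram)
```

Repeating the same call with the Color vocabulary and the image's color descriptors would give the second component of the feature vector in Definition 1.1.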
Beneficial effects: the medical big data retrieval system obtains the relevant image information on the client, uploads it, and stores it on a cloud server. The cloud server then performs distributed processing, obtains the best medical image clustering scheme, and feeds it back to the user progressively.
Description of drawings
Fig. 1 is the system model of the feature fusion method of the invention;
Fig. 2 is the Skyline-based feature fusion process of the invention;
Fig. 3 is the pseudocode of the SKFF algorithm of the invention.
Detailed description
Embodiment 1: Referring to Fig. 1, a Skyline medical big data retrieval system based on a visual vocabulary and multi-feature matching consists of a cloud center service system and a smartphone client software system. The cloud service system extracts SIFT, Color, and other feature data from medical images in a distributed, step-by-step manner and uses the Skyline operation to fuse multiple low-level features of each image. The similarity on each feature is one evaluation objective of the Skyline. Results are computed with Spark and returned progressively; the final results are candidate images that are either fairly similar to the query image on all features or extremely similar on one feature. The mobile medical client software sends the medical images that require large-scale hierarchical clustering to the cloud center service system as needed and receives requests from the cloud.
As one embodiment, the system executes as follows. When a mobile user acquires a medical image with a medical imaging scanner and issues a retrieval request, the cloud system extracts SIFT, Color, and other feature data from the image and uses the Skyline operation to fuse its low-level features, obtaining the best clustering scheme and returning it to the user progressively. Given enough time, the final result is delivered to the user; in the meantime, intermediate results and the final complete result can be confirmed step by step through the mobile communication platform.
The SIFT and Color feature processing steps are as follows. The Color feature is represented by the Color Names (CN) descriptor; the color attribute CN is defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label, which serves as one main factor in the Skyline multi-factor analysis. SIFT feature extraction transforms the original image across scales to obtain its scale-space representation and then represents each feature point with a 128-dimensional descriptor vector, giving 128-dimensional SIFT feature vectors. Using the feature points produced during SIFT extraction, each feature point and its surrounding area are taken as a local region, and the CN vector of every pixel in the region is extracted, yielding local SIFT and CN feature vectors, which serve as one main factor in the Skyline multi-factor analysis. The collected CN labels and feature vectors are then stream-processed with Spark, and the results are progressively refined and output.
Building on this extraction of SIFT and CN feature vectors, the Spark-based multi-layer k-means clustering algorithm and its variants, together with oversampling correction, are used on the Spark system to train on the images of the large-scale medical image library in a streaming fashion and to progressively generate visual vocabularies for the SIFT and Color feature vectors. The data is first partitioned, then processed by Spark in a distributed, streaming manner, and the result set is exported incrementally. The multi-layer k-means clustering algorithm searches a set of feature points of some dimension (for example, in a grid or a higher-dimensional space) for k cluster centers such that the sum of squared errors from each feature point to the center of its cluster (lesion area) is minimal. These cluster centers divide the feature point set into k disjoint clusters (lesion areas), and for each cluster (lesion area) the lesion point can be computed.
Based on the visual vocabulary generated by clustering, the SIFT descriptors of each image are quantized into a bag of visual words. In the bag-of-visual-words model, a visual vocabulary is given for each feature j, where j = 1, ..., m and k is the number of words in the vocabulary (that is, the number of cluster centers). Each medical image in the library is thus quantized into a vector of visual-word frequencies (a k-dimensional vector). The Color feature is quantized in the same way, producing the corresponding feature vector for each image. For multiple features (m ≥ 2) the process is repeated until every feature has been quantized.
As another embodiment, the oversampling correction algorithm is defined as follows. In each iteration, Oversampling and Refining (OnR) uses one Spark job for center selection and global error computation. (Unlike traditional MapReduce, Spark with a distributed cache is used to speed up iteration, and results are produced incrementally as a stream.) OnR is inspired by the scalable k-means++ method: in addition to that method's oversampling factor, it uses a second oversampling factor to further increase the number of centers selected in the Map stage.
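For orientation, the oversampling step of scalable k-means++ (the method OnR is said to build on) can be sketched as below: every point is drawn independently with probability proportional to its squared distance from the current centers, scaled by an oversampling factor. This covers only that base step. OnR's second oversampling factor is not specified in the text and is not modeled, and the function name, seed, and sample data are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample(points: Sequence[Point], centers: Sequence[Point],
               factor: float, rng: random.Random) -> List[Point]:
    """Draw candidate centers: each point independently with probability
    min(1, factor * d^2(x, C) / phi), where phi is the global clustering error."""
    d2 = [min(sq_dist(p, c) for c in centers) for p in points]
    phi = sum(d2)
    if phi == 0:
        return []  # every point already coincides with a center
    return [p for p, d in zip(points, d2)
            if rng.random() < min(1.0, factor * d / phi)]


# Hypothetical 2-D feature points with one initial center.
points = [(0, 0), (0, 1), (5, 5), (9, 9)]
centers = [(0, 0)]
rng = random.Random(0)

# A very large factor selects every point that is not already a center.
all_candidates = oversample(points, centers, factor=1000.0, rng=rng)
# A zero factor selects nothing.
no_candidates = oversample(points, centers, factor=0.0, rng=rng)
print(all_candidates, no_candidates)
```

In practice the factor is kept moderate so that each round adds a handful of far-away points, and the candidates are later reduced to k centers.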
In each data partition, consider an image library containing n medical images and a query medical image q; per S1, each medical image is represented by feature vectors. The similarity distance between q and any image o_i in the image library I on the t-th feature can then be expressed as the L1 distance between the two vectors. Applying this formula gives the similarity distance between q and o_i on every feature, so the similarity vector of q and o_i consists of the similarity distances of the two images on the k-th feature, for k ≤ m. The similarity distance between every image in I and the query image q is computed on each feature, producing n similarity vectors.
Referring to Fig. 3, the similarity between each image in the library and the query image is computed on the SIFT and Color features, giving a set of two-dimensional image similarity vectors. The SIFT and Color similarity distances between the query image q and any image o_i form a point, and the Skyline-based multi-feature fusion method makes the decision through distributed computation; the smaller the similarity distance, the more similar the two images. Spark is used for stream processing, results are progressively fused and recommended as decision results, and the results the user receives become more accurate over time.
Embodiment 2: A Skyline medical big data retrieval system based on a visual vocabulary and multi-feature matching extracts SIFT, Color, and other feature data from medical images and uses a distributed Skyline operation to fuse multiple low-level features of each image. The similarity on each feature is one evaluation objective of the Skyline, and the returned results are candidate images that are either fairly similar to the query image on all features or extremely similar on one feature. Finally, the Spark system in the cloud performs stream processing and delivers query or processing results in real time. The process is divided into the following three stages:
Stage 1: extract the image features. Given a query image, extract its low-level features. The steps are:
S1. Extract the Color feature;
S2. Extract the SIFT feature;
S3. Build the visual vocabulary;
S4. Represent the image in quantized form.
Further, in step S1 the Color feature is represented by the Color Names (CN) descriptor, which consists of 11 basic colors: red, black, blue, green, brown, gray, pink, orange, white, purple and yellow. The color attribute CN is thus defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label. This label serves as one main factor in the Skyline multi-factor analysis. Spark is used for stream processing, so results are refined and output incrementally.
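The per-pixel labeling in step S1 can be illustrated with a minimal sketch. The source does not say how a pixel is mapped to one of the 11 color names, so the nearest-anchor rule and the RGB anchor values below are assumptions made only for illustration; `color_label` and `cn_histogram` are hypothetical helper names.

```python
# Order follows the 11 basic colors listed in step S1.
COLOR_NAMES = ["red", "black", "blue", "green", "brown", "gray",
               "pink", "orange", "white", "purple", "yellow"]

# ASSUMPTION: illustrative RGB anchors, not values taken from the patent.
ANCHORS = [
    (255, 0, 0), (0, 0, 0), (0, 0, 255), (0, 128, 0), (139, 69, 19),
    (128, 128, 128), (255, 192, 203), (255, 165, 0), (255, 255, 255),
    (128, 0, 128), (255, 255, 0),
]

def color_label(pixel):
    """Return the index of the color name whose anchor is closest to the pixel."""
    return min(
        range(len(ANCHORS)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, ANCHORS[i])),
    )

def cn_histogram(pixels):
    """Count how many pixels receive each of the 11 color labels."""
    hist = [0] * len(COLOR_NAMES)
    for p in pixels:
        hist[color_label(p)] += 1
    return hist

if __name__ == "__main__":
    pixels = [(250, 5, 5), (255, 0, 0), (0, 0, 250)]
    print(dict(zip(COLOR_NAMES, cn_histogram(pixels))))
```

The result is one 11-dimensional count vector per pixel set, which matches the dimensionality the description gives for the CN attribute.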
Further, in step S2 the SIFT feature extraction process consists of two parts: detecting feature points and describing feature points. The original image is scale-transformed to obtain its scale-space representation sequence, and the image is then processed to obtain the feature points. Each feature point is represented by a 128-dimensional descriptor vector, giving a 128-dimensional SIFT feature vector. Using the feature points generated during SIFT extraction, each feature point and its surrounding area are taken as a local region, and the CN vector of every pixel in that region is extracted, giving the SIFT and CN local feature vectors. These vectors serve as one main factor in the Skyline multi-factor analysis. Spark is used for stream processing, so results are refined and output incrementally;
Further, in step S3, based on the SIFT and CN feature vectors extracted above, the images in the image library are trained in a streaming fashion on the Spark system, using the Spark-based multi-layer k-means clustering algorithm and its variants together with oversampling correction, and a visual vocabulary is generated incrementally for the SIFT and for the Color feature vectors. Unlike earlier visual vocabulary methods, the data is first partitioned and then processed on Spark in a distributed, streaming manner, and the result set is exported incrementally;
Here, the multi-layer k-means clustering algorithm searches a set of feature points X = {x_1, x_2, ..., x_n} in some space (for example a grid or a higher-dimensional space) for k cluster centers C = {c_1, c_2, ..., c_k} that minimize the sum of squared errors (SSE) from each feature point to the center of its cluster (in tumor images, these cluster centers represent tumor lesion regions or possible lesion regions). The cluster centers partition X into k disjoint clusters Y = {Y_1, Y_2, ..., Y_k} such that, for any 1 ≤ i ≠ j ≤ k, Y_i ∩ Y_j = ∅. For a cluster Y_i, its center point (the centroid) is:
$$c_i = \frac{1}{|Y_i|} \sum_{x \in Y_i} x$$
Here, the oversampling correction algorithm uses a single Spark job to select the center points and compute the global error (unlike traditional MapReduce, Spark processes the data with a distributed cache to speed up iteration, and results are produced in a streaming, incremental manner). Its objective function is:
$$\phi_X(C) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2$$
The goal of the OnR clustering algorithm produced at each decomposition stage is to find an optimal partition C that minimizes the final global clustering error φ_X(C) on Spark. Here φ_X(C) is the global clustering error produced by partitioning the feature set X with the center point set C, and || || denotes the Euclidean distance. The SIFT and CN feature sets are clustered separately, and the k cluster centers obtained for each form its visual vocabulary.
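As a rough illustration of the clustering step, the sketch below runs plain single-machine Lloyd-style k-means and evaluates the global clustering error φ_X(C). It is not the Spark-based multi-layer variant with oversampling correction described above; the function names, the random initialization and the toy data are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def clustering_cost(points, centers):
    """phi_X(C): sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def centroid(cluster):
    """Mean of the points in one cluster."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations: assign points to nearest center, then recompute centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # ASSUMPTION: simple random initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

if __name__ == "__main__":
    X = [(0, 0), (0, 1), (10, 10), (10, 11)]
    vocabulary = kmeans(X, k=2)
    print(sorted(vocabulary), clustering_cost(X, vocabulary))
```

In the terms of the description, the returned centers play the role of the k visual words for one feature type; the same routine would be run once on the SIFT descriptors and once on the CN descriptors.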
Further, in step S4, based on the visual vocabulary generated by the clustering algorithm, the SIFT descriptors of each image are quantized into a bag of visual words. In the bag-of-visual-words model, a visual vocabulary is given for the j-th feature, where j = 1, ..., m and k is the number of words in the visual vocabulary (that is, the number of cluster centers). Each image in the image library is thus quantized into a vector of visual-word occurrence frequencies (a k-dimensional vector). The Color feature is quantized in the same way, and each image is quantized into the corresponding feature vector. For multiple features (m ≥ 2), the process is repeated until all features have been quantized, giving the feature vector described in Definition 1.1.
Definition 1.1 (partition feature vector): In each data partition, consider an image library I = {o_1, o_2, ..., o_n} containing n images. Assume each image o_i has a set of low-level features X = {x_1, x_2, ..., x_m}, where m is the number of low-level features. The feature vector of each image o_i is expressed as <o_i.x_1, o_i.x_2, ..., o_i.x_m>.
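The quantization in step S4 can be sketched as follows: each local descriptor is assigned to its nearest visual word, and the image becomes a k-dimensional vector of word counts. The helper names and the toy two-dimensional descriptors are assumptions for illustration; real SIFT descriptors would be 128-dimensional.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[j])),
    )

def quantize(descriptors, vocabulary):
    """Bag-of-visual-words vector: occurrence count of each visual word in one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

if __name__ == "__main__":
    vocabulary = [(0, 0), (10, 10)]            # k = 2 visual words
    descriptors = [(1, 0), (0, 2), (9, 9)]     # local descriptors of one image
    print(quantize(descriptors, vocabulary))
```

Running `quantize` once per feature type (SIFT, Color) yields the components o_i.x_1, ..., o_i.x_m of the feature vector in Definition 1.1. Whether the counts are additionally normalized is not stated in the source, so the sketch leaves them as raw counts.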
Stage 2: feature matching. The SIFT and Color similarities between the query image and the images in each data partition of the image library are computed in a distributed manner. The steps are as follows:
S1. Given a medical image, use Spark to extract its SIFT and Color features incrementally, then quantize each of its feature descriptors into a feature vector according to the visual vocabulary already generated. Spark is used for stream processing, so results are extracted and quantized incrementally;
S2. Compute the similarity of each feature between medical images;
Further, in step S2, given an image library I containing n medical images and a query image q, each medical image is expressed as a feature vector according to S1. The similarity distance between the query image q and any image o_i in the image library I on the t-th feature can therefore be expressed as the L1 distance between the two vectors:
$$\mathrm{dist}(o_i.x_t,\, q.x_t) = \sum_{j=1}^{k} \left| o_i.x_t^{(j)} - q.x_t^{(j)} \right| \qquad (1.3)$$
where o_i.x_t denotes the t-th feature descriptor vector of image o_i, that is, the k-dimensional vector representing the t-th low-level feature of image o_i, and the superscript (j) denotes its j-th component.
Based on Formula 1.3, we obtain the similarity distance between the query medical image q and any image o_i in the medical image library I on each feature. The similarity vector of images q and o_i is then given by Definition 1.2:
Definition 1.2 (image similarity vector): Let I be an image library containing n images and q a query image. The similarity vector between the query image q and any image o_i in the image library I can be expressed as an m-dimensional vector:
Vect_i(o_i, q) = <dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), ..., dist(o_i.x_m, q.x_m)>
where i ∈ [1, n], m is the number of low-level features, Vect_i(o_i, q) is the similarity vector between image q and image o_i, and dist(o_i.x_k, q.x_k) is the similarity distance on the k-th (k ≤ m) feature of the two images.
The similarity distance between every image in the image library I and the query image q is computed on each feature, producing n similarity vectors.
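The similarity computation of Definition 1.2 can be sketched in a few lines. Each image is assumed to be a tuple of m quantized feature vectors (here m = 2, standing for SIFT and Color); the function names and the toy histograms are illustrative assumptions.

```python
def l1_distance(u, v):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def similarity_vector(image_features, query_features):
    """m-dimensional vector of per-feature L1 distances between an image and the query."""
    return tuple(l1_distance(x, qx) for x, qx in zip(image_features, query_features))

def similarity_vectors(library, query_features):
    """One similarity vector per image in the library (n vectors in total)."""
    return [similarity_vector(img, query_features) for img in library]

if __name__ == "__main__":
    query = ([1, 1, 1], [0, 3])                  # (SIFT histogram, Color histogram)
    library = [([2, 0, 1], [1, 1]),
               ([1, 1, 1], [0, 3])]
    print(similarity_vectors(library, query))    # [(2, 3), (0, 0)]
```

An image identical to the query yields the all-zero vector, consistent with the statement that smaller distances mean greater similarity.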
Stage 3: feature fusion. The similarity vectors of the different features are combined into a new vector, and the Skyline-based multi-feature fusion method (SKFF) is invoked to make the decision through distributed computation. Finally, Spark is used for stream processing, so results are fused and recommended incrementally, and the results the user receives become progressively more accurate over time.
S1. Compute, in a distributed manner, the similarity between each image in the image library and the query image on the SIFT and Color features, giving a set of two-dimensional image similarity vectors;
S2. Perform feature fusion with the Skyline-based multi-feature fusion; the results of the preceding feature matching serve as the input of the Skyline operation;
S3. Use the Spark cloud computing system for stream processing and obtain query or processing results in real time.
Further, the Skyline-based multi-feature fusion method is defined as follows (Definition 1.4).
Definition 1.4 (Skyline-based multi-feature fusion method): Given a medical image library I containing n images and a query image q, let the set R be the query result of the multi-feature fusion method. For the m low-level feature vectors of each image, R is the set of all images whose similarity vector to the query image q in the X vector space, Vect_i(o_i, q) = <dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), ..., dist(o_i.x_m, q.x_m)>, is not dominated by the similarity vector of any other image in the medical image library I. That is, an image o_i ∈ R if and only if the following condition holds:
$$\nexists\, o_j \in I,\ j \neq i:\ \big(\forall k \in [1,m]:\ \mathrm{dist}(o_j.x_k, q.x_k) \le \mathrm{dist}(o_i.x_k, q.x_k)\big) \wedge \big(\exists l \in [1,m]:\ \mathrm{dist}(o_j.x_l, q.x_l) < \mathrm{dist}(o_i.x_l, q.x_l)\big)$$
Further, the result set of the Skyline-based multi-feature fusion method (SKFF) is a subset of the medical image library: the set of images that are not dominated by any image in the library in the multi-feature metric space. The SIFT and Color similarity distance values between the query image q and any image o_i form a point, as shown in Figure 2. For example, the abscissa of point p_1 is the SIFT similarity distance between image o_1 and the query image q, and its ordinate is the Color similarity distance between them. All of these distances in the multi-feature metric space are computed from the bag-of-words model.
Further, the smaller the similarity distance, the more similar the two images are. Therefore {p_1, p_2, p_3, p_4} is the final Skyline result: no other image is more similar to the query image than {o_1, o_2, o_3, o_4} on both the SIFT and Color features, that is, no image in the image library has a similarity vector to the query image that dominates theirs on the SIFT and Color features.
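The dominance test and the resulting Skyline set can be sketched with a simple quadratic scan. The points below are made-up (SIFT distance, Color distance) pairs, not values from Figure 2, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b on every feature and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Points whose similarity vector is not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Hypothetical similarity vectors; smaller distance means more similar.
    vectors = [(1, 9), (3, 6), (5, 4), (8, 1), (6, 7), (9, 9)]
    print(skyline(vectors))  # [(1, 9), (3, 6), (5, 4), (8, 1)]
```

In this toy set, (6, 7) is dropped because (3, 6) is closer to the query on both features, while (1, 9) survives because no other point beats it on the first feature. This is how an image that is extremely similar on one feature alone can still be returned.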
S3. Spark performs stream processing, progressively fusing results and recommending decision results.
Further, in step S2, the final Skyline result obtained is {p_1, p_2, p_3, p_4}.
Further, Spark is used for stream processing, which decomposes the streaming computation into a series of short batch jobs. Depending on business requirements, the streaming computation can accumulate intermediate results or store them on external devices, and the best medical clustering scheme is fed back to the user step by step.
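The idea of accumulating intermediate results over short batches can be imitated without Spark. The sketch below is a plain-Python stand-in, not Spark code: each batch (or data partition) contributes its local Skyline, which is merged into the running result, relying on the fact that the Skyline of a union equals the Skyline of the union of the local Skylines. All names and data are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Non-dominated subset of the given similarity vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def merge_skylines(current, batch):
    """Merge the running result with the local Skyline of a new batch."""
    return skyline(current + skyline(batch))

def stream_skyline(batches):
    """Yield the refined Skyline after each micro-batch has been processed."""
    current = []
    for batch in batches:
        current = merge_skylines(current, batch)
        yield list(current)

if __name__ == "__main__":
    batches = [[(6, 7), (9, 9)], [(3, 6), (8, 1)], [(1, 9), (5, 4)]]
    for step, result in enumerate(stream_skyline(batches), start=1):
        print(step, result)
```

After the first batch the user would see only (6, 7); once (3, 6) arrives it replaces (6, 7), which illustrates how the returned set is refined as more batches are processed.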
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611150453.8A CN106777090A (en) | 2016-12-14 | 2016-12-14 | The medical science big data search method of the Skyline that view-based access control model vocabulary is matched with multiple features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611150453.8A CN106777090A (en) | 2016-12-14 | 2016-12-14 | The medical science big data search method of the Skyline that view-based access control model vocabulary is matched with multiple features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777090A true CN106777090A (en) | 2017-05-31 |
Family
ID=58876961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611150453.8A Pending CN106777090A (en) | 2016-12-14 | 2016-12-14 | The medical science big data search method of the Skyline that view-based access control model vocabulary is matched with multiple features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777090A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315663A (en) * | 2008-06-25 | 2008-12-03 | 中国人民解放军国防科学技术大学 | A Natural Scene Image Classification Method Based on Regional Latent Semantic Features |
CN101923653A (en) * | 2010-08-17 | 2010-12-22 | 北京大学 | An Image Classification Method Based on Multi-level Content Description |
CN102073748A (en) * | 2011-03-08 | 2011-05-25 | 武汉大学 | Visual keyword based remote sensing image semantic searching method |
CN105469096A (en) * | 2015-11-18 | 2016-04-06 | 南京大学 | Feature bag image retrieval method based on Hash binary code |
CN106203507A (en) * | 2016-07-11 | 2016-12-07 | 上海凌科智能科技有限公司 | A kind of k means clustering method improved based on Distributed Computing Platform |
-
2016
- 2016-12-14 CN CN201611150453.8A patent/CN106777090A/en active Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766472A (en) * | 2017-10-09 | 2018-03-06 | 中国人民解放军国防科技大学 | Contour hierarchical query parallel processing method based on multi-core processor |
CN107766472B (en) * | 2017-10-09 | 2020-09-04 | 中国人民解放军国防科技大学 | A Parallel Processing Method for Contour Hierarchy Query Based on Multi-core Processor |
CN108446740A (en) * | 2018-03-28 | 2018-08-24 | 南通大学 | A kind of consistent Synergistic method of multilayer for brain image case history feature extraction |
CN110362663A (en) * | 2018-04-09 | 2019-10-22 | 国际商业机器公司 | Adaptive multi-sensing similarity detection and resolution |
CN110362663B (en) * | 2018-04-09 | 2023-06-13 | 国际商业机器公司 | Adaptive multi-perceptual similarity detection and analysis |
CN110516040A (en) * | 2019-08-14 | 2019-11-29 | 出门问问(武汉)信息科技有限公司 | Semantic Similarity comparative approach, equipment and computer storage medium between text |
CN111859004A (en) * | 2020-07-29 | 2020-10-30 | 书行科技(北京)有限公司 | Retrieval image acquisition method, device, equipment and readable storage medium |
CN112115446A (en) * | 2020-07-29 | 2020-12-22 | 航天信息股份有限公司 | Identity authentication method and system based on Skyline inquiry biological characteristics |
CN112287315A (en) * | 2020-07-29 | 2021-01-29 | 航天信息股份有限公司 | Skyline-based identity authentication method and system by inquiring biological characteristics |
CN112115446B (en) * | 2020-07-29 | 2024-02-09 | 航天信息股份有限公司 | Skyline query biological feature-based identity authentication method and system |
CN115258963A (en) * | 2022-07-27 | 2022-11-01 | 山东中衡光电科技有限公司 | Safety protection system for underground hydraulic hoisting device and setting method for dangerous area |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |