CN106777090A - Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching - Google Patents
- Publication number
- CN106777090A (application CN201611150453.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- skyline
- vector
- feature
- characteristic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
A Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching, belonging to the intersection of intelligent healthcare and big data processing. The system applies metric space Skyline queries to content-based medical image retrieval. The key technical points are: features such as SIFT and Color are extracted from medical images; the multiple low-level features of each image are fused with a distributed Skyline operation, with each feature similarity serving as a Skyline evaluation objective; the returned results are candidate images that are either fairly similar to the query image on all feature dimensions or extremely similar on some single feature dimension; finally, stream processing is carried out on a cloud-hosted Spark system, and queries are answered or results returned in real time. The effect is: the image information acquired at the user terminal is uploaded and saved to a cloud server, which processes it, obtains the best medical image clustering scheme, and feeds it back to the user.
Description
Technical field
The present invention belongs to the intersection of intelligent healthcare and big data processing. It is a Skyline medical big data retrieval system based on visual vocabulary and multi-feature matching. The system applies metric space Skyline queries to content-based medical image retrieval. It involves large-scale medical data analysis and massive data processing in cloud computing environments, and relates to intelligent data processing and application development.
Background technology
With the development of the internet and the spread of digital medical equipment, medical image data is growing exponentially, and retrieval techniques for such image data are attracting increasing attention. Massive data is not only large in volume but also holds great commercial value. For example, analyzing tumor growth in cancer patients can guide doctors in recommending personalized treatment plans; analyzing records of brain activity and heart rate can give hospitals, manufacturers, and patients guidance on diagnosis and treatment, or early warning in home disease monitoring. However, the explosive growth of massive medical image data means that traditional single-machine data analysis and processing techniques are increasingly unable to meet current needs for analyzing and processing data-intensive workloads. To improve medical image retrieval efficiency while preserving retrieval precision, the metric space Skyline query (Metric Skyline Query) algorithm has been applied successfully in the image processing field. The algorithm improves image retrieval efficiency by pruning data in the metric space.
Most existing metric space Skyline algorithms for image data model the metric space on general textual semantics. In medical image retrieval methods built on semantics, the semantic information of an image is rich, but it is also complex, its interpretation is subjective, and it is hard to extract and express. These shortcomings degrade both metric space modeling and medical image retrieval quality. In addition, because semantic information is ambiguous, most algorithms must select multiple images to participate in a query, as the semantics require, in order to improve query precision, and this in turn greatly increases the computational cost of the query. Heavy computation is a major bottleneck of metric space Skyline queries, and it is especially prominent in massive medical image data processing.
In recent years, content-based image retrieval (CBIR) has developed rapidly and is becoming the mainstream technology in the image search field. To address the shortcomings of existing metric space algorithms for medical images, which retrieve using image semantic information, we start from the content of the medical image and choose its low-level features as the objects of study in the metric space. To improve retrieval precision, reduce computational overhead, and speed up similarity distance computation, we design a metric space Skyline algorithm from the perspective of multi-feature fusion. On this basis, we designed and implemented the present invention.
Summary of the invention
In view of the defects and deficiencies in the background art above, the present invention applies metric space Skyline queries to content-based large-scale medical image retrieval and proposes a large-scale medical image retrieval method based on visual vocabulary and Skyline multi-feature fusion (Big Feature Fusion by Skyline, BSKFF). It uses the Skyline operation to fuse multiple features and provides a new medical big data retrieval system based on visual vocabulary, which better solves the problem of retrieving large-scale medical image data.
To achieve the above objective, the technical solution adopted by this patent is:
A Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching, characterized by comprising the following steps:
S1. Extract the low-level features of the medical images, cluster each low-level feature set separately, and build the visual vocabularies; use them to quantize each image in the image library into a vector of visual word frequencies, obtaining the partition feature vectors;
S2. Compute the similarity distance between the query image and every image in the image library on each feature, to construct the image similarity vectors over the different features;
S3. Invoke the Skyline-based multi-feature fusion method to perform distributed retrieval and computational decision-making.
Further, step S1 extracts the features of the medical image: given a query image, the low-level features of the image are extracted, comprising the following steps:
S1.1. Extract Color features;
S1.2. Extract SIFT features;
S1.3. Build the visual vocabularies;
S1.4. Represent the images in quantized form.
Further, the image similarity vectors over the different features are constructed in step S2 as follows. Given an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n medical images and a query image q, each medical image is represented as feature vectors. The similarity distance between the query image q and any image $o_i$ in the image library I on the t-th feature is expressed as the $L_1$ distance of the two vectors:

$$dist(o_i.x_t, q.x_t) = \sum_{l=1}^{k} \left| o_i.x_t[l] - q.x_t[l] \right| \qquad (1.3)$$

where $o_i.x_t$ denotes the t-th feature descriptor vector of image $o_i$, that is, the k-dimensional vector of its t-th low-level feature, and $o_i.x_t[l]$ is its l-th component;

Based on formula 1.3, the similarity distance between the query medical image q and any image $o_i$ in the medical image library I on each feature is obtained. The similarity vector of images q and $o_i$ is given by Definition 1.2:

Definition 1.2: Let $I = \{o_1, o_2, \ldots, o_n\}$ be an image library containing n images and q the query image. The similarity vector of the query image q and any image $o_i$ in I is expressed as an m-dimensional vector:

$$Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$$

where $i \in [1, n]$, m is the number of low-level features, $Vect_i(o_i, q)$ is the similarity vector of image q and image $o_i$, and $dist(o_i.x_k, q.x_k)$ is the similarity distance of the two images on the k-th ($k \le m$) feature. Every image in the image library I computes its similarity distance to the query image q on each feature, producing n similarity vectors.
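As a minimal sketch of Definition 1.2, the snippet below builds one similarity vector per library image from per-feature L1 distances. The feature names, the toy three-word histograms, and the helper names `l1_distance` and `similarity_vector` are assumptions made for illustration; they do not come from the patent.

```python
from typing import Dict, List

# Hypothetical layout: feature name -> k-dimensional word-frequency vector.
FeatureVectors = Dict[str, List[float]]


def l1_distance(a: List[float], b: List[float]) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_vector(image: FeatureVectors, query: FeatureVectors,
                      features: List[str]) -> List[float]:
    """Vect(o_i, q): one L1 distance per low-level feature, in a fixed order."""
    return [l1_distance(image[f], query[f]) for f in features]


# Toy data (assumed): two features, each quantized over a 3-word vocabulary.
features = ["sift", "color"]
query = {"sift": [2, 0, 1], "color": [1, 1, 0]}
library = {
    "o1": {"sift": [2, 1, 1], "color": [0, 1, 1]},
    "o2": {"sift": [0, 0, 3], "color": [1, 1, 0]},
}

# One m-dimensional similarity vector per library image (here m = 2).
vectors = {name: similarity_vector(img, query, features)
           for name, img in library.items()}
print(vectors)
```

Keeping the feature order fixed matters: the Skyline comparison in step S3 compares vectors component by component, so every image must list its distances in the same order.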
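Further, the specific method of step S3 is as follows:
Given a medical image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images and a query image q, let the set R be the query result of the multi-feature fusion method, and let $X = \{x_1, x_2, \ldots, x_m\}$ be the m low-level feature vectors of each image. An image $o_i \in R$ if and only if the following condition holds:

$$\nexists\, o_j \in I,\ j \ne i:\ \big(\forall k \in [1, m]:\ dist(o_j.x_k, q.x_k) \le dist(o_i.x_k, q.x_k)\big) \wedge \big(\exists k \in [1, m]:\ dist(o_j.x_k, q.x_k) < dist(o_i.x_k, q.x_k)\big)$$

That is, R is the set of all images in the medical image library I whose similarity vector with respect to the query image q in the X vector space, $Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$, is not dominated by the similarity vector of any other image in I;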
Further, the result set of the Skyline-based multi-feature fusion method is a subset of the medical image library: the set of images that are not dominated by any other image of the image set in the multi-feature metric space. The SIFT and Color similarity distance values between the query image q and any image $o_i$ form a point. For example, the abscissa of a point represents the similarity distance of the SIFT feature between image $o_1$ and the query image q, and the ordinate represents the similarity distance of the Color feature between image $o_1$ and the query image q. These similarity distances are all computed with the bag-of-words model in the multi-feature metric space; the smaller the similarity distance, the more similar the two images.
Further, stream processing is carried out with Spark: the streaming computation is decomposed into a series of short batch jobs, and the results are progressively fused and recommended together with the decision results.
Further, the Color features of step S1.1 are extracted as follows:
The Color feature is represented by the color attribute descriptor Color Names (CN), which consists of the colors red, black, blue, green, brown, grey, pink, orange, white, purple, and yellow. The color attribute CN is defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label. This label serves as one of the main factors of the multi-objective Skyline evaluation; it is stream-processed with Spark, and the results are progressively refined and output;
Further, the SIFT features of step S1.2 are extracted as follows:
SIFT extraction consists of two parts, keypoint detection and keypoint description. The original image is transformed across scales to obtain its scale-space representation sequence, and the image is then processed to obtain the keypoints. Each keypoint is represented by a 128-dimensional descriptor vector, giving a 128-dimensional SIFT feature vector. For each keypoint generated during SIFT extraction, the keypoint and its surrounding area are taken as a local region, and the CN vector of every pixel in the local region is extracted, giving local SIFT and CN feature vectors. This vector serves as one of the main factors of the multi-objective Skyline evaluation; it is stream-processed with Spark, and the results are progressively refined and output;
Further, the method for step S1.3. structures visual vocabulary table is as follows:
By the multi-level clustering algorithm k-means based on Spark and its mutation and over-sampling amendment, using Spark systems
System, streaming training is carried out to the image in image library, and respectively SIFT and Color characteristic vectors progressively generate visual vocabulary
Table, during generation visual vocabulary table, using first cutting data, and uses Spark systems, and distributed treatment is carried out in a streaming manner, and
Being incremented by derives result set;
Wherein, multilayer k-means clustering algorithms are the set of characteristic points X={ x in some dimensions1,x2,...,xnIn seek
Look for k cluster centre C={ c1,c2,...,ck, make each characteristic point to the square error and minimum at place cluster center;These gather
X is divided into k disjoint cluster Y={ Y by class center1,Y2,...,YkSo that for arbitrary 1≤i ≠ j≤k,For a cluster Yi, its central point is:
Wherein, over-sampling correction algorithm is to carry out center point selection and global error using a SparkSpark operation
Calculating (be that we employ Spark with traditional MapReduce differences, processed using distributed caching, with plus
The speed of fast repeatedly band, is as a result carried out in the way of streaming is incremented by), its object function is:
The target of the OnR clustering algorithms that each catabolic phase is produced is to find an optimal division C so that Spark
Final global clustering error φX(C) it is minimum, wherein φX(C) it is, using center point set C, the complete of generation to be divided to characteristic set X
Office's cluster error, | | | | it is Euclidean distance.SIFT and CN characteristic sets are clustered respectively, in the k cluster for obtaining
The heart is their visual vocabulary tables.
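To make the centroid and the clustering error concrete, here is a small single-machine sketch of plain Lloyd-style k-means. It is not the multi-level, Spark-based variant with over-sampling refinement described above; the function names, the random seeding, and the six toy points are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_error(points: List[Point], centers: List[Point]) -> float:
    """phi_X(C): sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def centroid(cluster: List[Point]) -> Point:
    """Mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))


def kmeans(points: List[Point], k: int, iterations: int = 20,
           seed: int = 0) -> List[Point]:
    """Plain Lloyd iterations: assign to nearest center, then recompute centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy feature points (assumed): two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(points, k=2)
print(sorted(centers), clustering_error(points, centers))
```

In the patent's setting the resulting centers would play the role of the visual words for one feature type, with SIFT and CN clustered separately.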
Further, the quantized image representation of step S1.4 is obtained as follows:
Based on the visual vocabulary generated by the clustering algorithm, the SIFT descriptors of each image are quantized into a bag of visual words. In the visual bag-of-words model, each feature j (j = 1, ..., m) is given its own visual vocabulary, where k is the number of words in the visual vocabulary. Each image in the image library is quantized into a k-dimensional vector of visual word frequencies. The Color features are quantized in the same way, so that each image is quantized into the corresponding feature vector. For multiple features, the process continues in the same way until all features are quantized, giving the feature vectors of Definition 1.1;
Definition 1.1: In each data partition, an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images is searched. Assume that each image $o_i$ has a set of low-level features $\{o_i.x_1, o_i.x_2, \ldots, o_i.x_m\}$, where m is the number of low-level features; the feature vector of each image $o_i$ is expressed as $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$.
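The quantization step can be sketched as nearest-word assignment followed by counting. The 2-dimensional "descriptors" and the three-word vocabulary below are toy stand-ins (real SIFT descriptors are 128-dimensional), and the helper names are assumptions.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float],
                 vocabulary: List[Sequence[float]]) -> int:
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[i])),
    )


def quantize(descriptors: List[Sequence[float]],
             vocabulary: List[Sequence[float]]) -> List[int]:
    """k-dimensional vector counting how often each visual word occurs."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


# Toy vocabulary of k = 3 words and four local descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]
histogram = quantize(descriptors, vocabulary)
print(histogram)
```

Running the same routine once per feature type, each with its own vocabulary, gives the per-image tuple of vectors that Definition 1.1 describes.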
Beneficial effects: With the associated techniques, the medical big data retrieval system uploads the image information acquired at the user terminal and saves it to a cloud server. The cloud server then performs distributed processing, obtains the best medical image clustering scheme, and feeds it back to the user progressively.
Brief description of the drawings
Fig. 1: system model of the feature fusion method of the present invention;
Fig. 2: Skyline-based feature fusion process of the present invention;
Fig. 3: pseudocode of the SKFF algorithm of the present invention.
Detailed description of the embodiments
Embodiment 1: With reference to Fig. 1, a Skyline medical big data retrieval system based on visual vocabulary and multi-feature matching consists of a cloud center service system and a smartphone mobile client software system. The cloud service system performs distributed, progressive extraction of features such as SIFT and Color from medical images and fuses the multiple low-level features of each image with the Skyline operation. Each feature similarity serves as a Skyline evaluation objective and is computed with Spark, with results returned progressively; the final results are candidate images that are either fairly similar to the query image on all feature dimensions or extremely similar on some single feature dimension. The mobile medical client software sends the medical images that require large-scale hierarchical clustering to the cloud center service system as needed, and receives requests from the cloud.
As one embodiment, the execution flow of the Skyline medical big data retrieval system based on visual vocabulary and multi-feature matching is as follows. When a mobile user acquires an image with a medical image scanner and sends the associated medical image retrieval request, the cloud system extracts features such as SIFT and Color from the medical image, fuses the multiple low-level features of the image with the Skyline operation, obtains the best clustering scheme, and returns it to the user progressively. Given enough time, the final result is delivered to the user. The progressive intermediate confirmations and the confirmation of the final complete result can be carried out through the mobile communication platform.
The processing steps of the SIFT and Color feature algorithms are as follows. The Color feature is represented by the color attribute descriptor Color Names (CN). The color attribute CN is defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label; this label serves as one of the main factors of the multi-objective Skyline evaluation. SIFT feature extraction transforms the original image across scales to obtain its scale-space representation sequence, then represents each keypoint with a 128-dimensional descriptor vector, giving a 128-dimensional SIFT feature vector. For each keypoint generated during SIFT extraction, the keypoint and its surrounding area are taken as a local region, and the CN vector of every pixel in the local region is extracted, giving local SIFT and CN feature vectors; this vector also serves as one of the main factors of the multi-objective Skyline evaluation. We then stream-process the collected CN labels and feature vectors with Spark, and the results are progressively refined and output.

Building on this extraction of SIFT and CN feature vectors, the Spark-based multi-level k-means clustering algorithm, its variants, and over-sampling refinement are applied: the Spark system trains on the images in the large-scale medical image library in a streaming manner and progressively generates a visual vocabulary for the SIFT and the Color feature vectors respectively. We partition the data first and then process it with Spark in a distributed, streaming manner, deriving the result set incrementally. Here, the multi-level k-means clustering algorithm looks, in a feature point set of some dimensionality (for example in a grid or a higher-dimensional space), for k cluster centers that minimize the sum of squared errors from each feature point to the center of its cluster (lesion area). These cluster centers divide the feature point set into k disjoint clusters (lesion areas), and for each cluster (lesion area) the center point (lesion point) can then be computed.
Based on the visual vocabulary generated by the clustering algorithm, the SIFT descriptors of each image are quantized into a bag of visual words. In the visual bag-of-words model, each feature j (j = 1, ..., m) is given its own visual vocabulary, where k is the number of words in the visual vocabulary (that is, the number of cluster centers). Each medical image in the medical image library is then quantized into a vector of visual word frequencies (a k-dimensional vector). The Color features are quantized in the same way, so that each image is quantized into the corresponding feature vector. For multiple features (m ≥ 2), the process continues in the same way until all features are quantized.
As another embodiment, the over-sampling refinement algorithm is defined as follows. In each iteration, Oversampling and Refining (OnR) performs center point selection and global error computation in one Spark job (unlike traditional MapReduce, we use Spark and process with distributed caching to speed up the iterations, and the results are produced in a streaming, incremental manner). The OnR method is inspired by the scalable k-means++ method: in addition to the oversampling factor, it uses a second oversampling factor that further increases the number of center points selected in the Map stage.
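For orientation, the oversampling idea that OnR is said to build on can be sketched as a single sampling round in the style of scalable k-means++ (k-means||): each point is picked independently with probability proportional to its squared distance from the current centers, scaled by an oversampling factor. This is only a sketch of that published seeding idea as commonly described. The patent does not specify OnR's second oversampling factor, so it is not modeled here, and the function names, the factor value, and the toy points are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points: List[Point], centers: List[Point]) -> float:
    """Global clustering error: squared distance to the nearest center, summed."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def oversample_round(points: List[Point], centers: List[Point],
                     factor: float, rng: random.Random) -> List[Point]:
    """Pick each point with probability min(1, factor * d^2(x, C) / cost)."""
    total = cost(points, centers)
    if total == 0:
        return list(centers)
    picked = [
        p for p in points
        if rng.random() < min(1.0, factor * min(sq_dist(p, c) for c in centers) / total)
    ]
    return list(centers) + picked


# Toy points (assumed); start from one center and run a few sampling rounds.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
rng = random.Random(1)
centers = [points[0]]
for _ in range(3):
    centers = oversample_round(points, centers, factor=2.0, rng=rng)
print(len(centers), cost(points, centers))
```

A point that is already a center has distance zero and is never picked again, so the candidate set contains no duplicates. In the published method the oversampled candidates are then weighted and reclustered down to k centers; that final step is omitted from this sketch.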
In each data partition, an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n medical images is searched with a query medical image q. Following S1, each medical image is represented as feature vectors. The similarity distance between the query image q and any image $o_i$ in the image library I on the t-th feature can then be expressed as the $L_1$ distance of the two vectors. With this formula we obtain the similarity distance between q and any image $o_i$ in I on each feature, so the similarity vector of images q and $o_i$ can be expressed through the similarity distances of the two images on the k-th ($k \le m$) feature. Every image in the image library I computes its similarity distance to the query image q on each feature, producing n similarity vectors.
With reference to Fig. 3, the similarity between each image in the image library and the query image on the SIFT and Color features is computed, giving a set of two-dimensional image similarity vectors. Further, the SIFT and Color similarity distance values between the query image q and any image $o_i$ form a point, and the Skyline-based multi-feature fusion method performs the distributed computational decision-making; the smaller the similarity distance, the more similar the two images. We use Spark for stream processing, so the results are progressively fused and recommended together with the decision results, and the results the user obtains can be queried and refined at any time.
Embodiment 2: A Skyline medical big data retrieval system based on visual vocabulary and multi-feature matching. It mainly extracts features such as SIFT and Color from medical images and fuses the multiple low-level features of each image with a distributed Skyline operation, with each feature similarity serving as a Skyline evaluation objective. The returned results are candidate images that are either fairly similar to the query image on all feature dimensions or extremely similar on some single feature dimension. Finally, stream processing is carried out on a cloud-hosted Spark system, and queries are answered or results returned in real time. The process can be divided into the following three stages:
First stage: image feature extraction. Given a query image, the low-level features of the image are extracted. The steps are as follows:
S1. Extract Color features;
S2. Extract SIFT features;
S3. Build the visual vocabularies;
S4. Represent the images in quantized form.
Further, in step S1 the Color feature is represented by the color attribute descriptor Color Names (CN), which consists of 11 basic colors: red, black, blue, green, brown, grey, pink, orange, white, purple, and yellow. The color attribute CN is therefore defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label. This label serves as one of the main factors of the multi-objective Skyline evaluation. We use Spark for stream processing, so the results are progressively refined and output.
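A rough sketch of per-pixel color labelling follows. The patent does not say how a pixel is mapped to one of the 11 color names, so the nearest-RGB-prototype rule and the prototype values below are purely illustrative assumptions, standing in for whatever mapping the authors actually use.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# The 11 color names listed in the text, in the same order.
COLOR_NAMES: List[str] = ["red", "black", "blue", "green", "brown", "grey",
                          "pink", "orange", "white", "purple", "yellow"]

# Hypothetical RGB anchors, one per color name (assumed, not from the patent).
PROTOTYPES: Dict[str, RGB] = {
    "red": (255, 0, 0), "black": (0, 0, 0), "blue": (0, 0, 255),
    "green": (0, 128, 0), "brown": (139, 69, 19), "grey": (128, 128, 128),
    "pink": (255, 192, 203), "orange": (255, 165, 0),
    "white": (255, 255, 255), "purple": (128, 0, 128), "yellow": (255, 255, 0),
}


def color_label(pixel: RGB) -> int:
    """Index (0..10) of the color name whose anchor is nearest in RGB space."""
    return min(
        range(len(COLOR_NAMES)),
        key=lambda i: sum((p - a) ** 2
                          for p, a in zip(pixel, PROTOTYPES[COLOR_NAMES[i]])),
    )


def cn_histogram(pixels: List[RGB]) -> List[float]:
    """11-dimensional vector: fraction of pixels carrying each color label."""
    counts = [0] * len(COLOR_NAMES)
    for pixel in pixels:
        counts[color_label(pixel)] += 1
    return [c / len(pixels) for c in counts]


# Toy region of four pixels (assumed values).
region = [(250, 5, 5), (0, 0, 10), (0, 0, 10), (255, 255, 250)]
labels = [COLOR_NAMES[color_label(p)] for p in region]
histogram = cn_histogram(region)
print(labels, histogram)
```

Applied to the local region around a keypoint, a vector of this shape is one way to picture the 11-dimensional CN component that accompanies the SIFT descriptor.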
Further, the SIFT feature extraction of step S2 consists of two parts, keypoint detection and keypoint description. The original image is transformed across scales to obtain its scale-space representation sequence, and the image is then processed to obtain the keypoints. Each keypoint is represented by a 128-dimensional descriptor vector, giving a 128-dimensional SIFT feature vector. For each keypoint generated during SIFT extraction, the keypoint and its surrounding area are taken as a local region, and the CN vector of every pixel in the local region is extracted, giving local SIFT and CN feature vectors. This vector serves as one of the main factors of the multi-objective Skyline evaluation. We use Spark for stream processing, so the results are progressively refined and output;
Further, step S3 builds on the extraction of SIFT and CN feature vectors. Using the Spark-based multi-level k-means clustering algorithm, its variants, and over-sampling refinement, the Spark system trains on the images in the image library in a streaming manner and progressively generates a visual vocabulary for the SIFT and the Color feature vectors respectively. Unlike earlier visual vocabulary approaches, we partition the data first and then process it with Spark in a distributed, streaming manner, deriving the result set incrementally;
Here, the multi-level k-means clustering algorithm looks, in a feature point set $X = \{x_1, x_2, \ldots, x_n\}$ of some dimensionality (for example in a grid or a higher-dimensional space), for k cluster centers $C = \{c_1, c_2, \ldots, c_k\}$ that minimize the sum of squared errors (SSE) from each feature point to the center of its cluster (in tumor images, these cluster centers represent tumor lesion areas or possible lesion areas). These cluster centers divide X into k disjoint clusters $Y = \{Y_1, Y_2, \ldots, Y_k\}$ such that for any $1 \le i \ne j \le k$, $Y_i \cap Y_j = \emptyset$. For a cluster $Y_i$, its center point (that is, its centroid) is:

$$c_i = \frac{1}{|Y_i|} \sum_{x \in Y_i} x$$
The over-sampling refinement algorithm performs center point selection and global error computation in one Spark job (unlike traditional MapReduce, we use Spark and process with distributed caching to speed up the iterations, and the results are produced in a streaming, incremental manner). Its objective function is:

$$\phi_X(C) = \sum_{x \in X} \min_{c \in C} \| x - c \|^2$$

The goal of the OnR clustering algorithm at each decomposition stage is to find an optimal partition C such that the final global clustering error $\phi_X(C)$ computed by Spark is minimized, where $\phi_X(C)$ is the global clustering error produced by partitioning the feature set X with the center point set C, and $\| \cdot \|$ is the Euclidean distance. The SIFT and CN feature sets are clustered separately, and the k cluster centers obtained for each are its visual vocabulary.
Further, step S4 builds on the visual vocabulary generated by the clustering algorithm: the SIFT descriptors of each image are quantized into a bag of visual words. In the visual bag-of-words model, each feature j (j = 1, ..., m) is given its own visual vocabulary, where k is the number of words in the visual vocabulary (that is, the number of cluster centers). Each image in the image library is then quantized into a vector of visual word frequencies (a k-dimensional vector). The Color features are quantized in the same way, so that each image is quantized into the corresponding feature vector. For multiple features (m ≥ 2), the process continues in the same way until all features are quantized, giving the feature vectors of Definition 1.1.
Definition 1.1 (partition feature vector): In each data partition, an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images is searched. Assume that each image $o_i$ has a set of low-level features $\{o_i.x_1, o_i.x_2, \ldots, o_i.x_m\}$, where m is the number of low-level features; the feature vector of each image $o_i$ is expressed as $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$.
Second stage: feature matching. Within each data partition, the similarity of the SIFT and Color features between the query image and the images in the image library is computed in a distributed manner. The steps are as follows:
S1. Given a medical image, progressively extract its SIFT and Color features with Spark, then quantize each feature descriptor into a feature vector according to the visual vocabularies already generated. We use Spark for stream processing, so the results are extracted and quantized progressively;
S2. Compute the similarity of each feature between medical images;
Further, in step S2, given an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n medical images and a query image q, each medical image is represented as feature vectors following S1. The similarity distance between the query image q and any image $o_i$ in the image library I on the t-th feature can then be expressed as the $L_1$ distance of the two vectors:

$$dist(o_i.x_t, q.x_t) = \sum_{l=1}^{k} \left| o_i.x_t[l] - q.x_t[l] \right| \qquad (1.3)$$

where $o_i.x_t$ denotes the t-th feature descriptor vector of image $o_i$, that is, the k-dimensional vector of its t-th low-level feature, and $o_i.x_t[l]$ is its l-th component.
Based on formula 1.3, we obtain the similarity distance between the query medical image q and any image $o_i$ in the medical image library I on each feature. The similarity vector of images q and $o_i$ is then given by Definition 1.2:

Definition 1.2 (image similarity vector): Let $I = \{o_1, o_2, \ldots, o_n\}$ be an image library containing n images and q the query image. The similarity vector of the query image q and any image $o_i$ in I can be expressed as an m-dimensional vector:

$$Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$$

where $i \in [1, n]$, m is the number of low-level features, $Vect_i(o_i, q)$ is the similarity vector of image q and image $o_i$, and $dist(o_i.x_k, q.x_k)$ is the similarity distance of the two images on the k-th ($k \le m$) feature.

Every image in the image library I computes its similarity distance to the query image q on each feature, producing n similarity vectors.
Third stage: feature fusion. The similarity distances on the different features are assembled into a new vector, and the Skyline-based multi-feature fusion method (SKFF) is invoked to perform the distributed computational decision-making. Finally, we use Spark for stream processing, so the results are progressively fused and recommended together with the decision results, and the results the user obtains can be queried and refined at any time.
S1. Compute, in a distributed manner, the similarity between each image in the image library and the query image on the SIFT and Color features, giving a set of two-dimensional image similarity vectors;
S2. Fuse the features with Skyline multi-feature fusion; the results of the preceding feature matching serve as the input of the Skyline operation;
S3. Carry out stream processing on the cloud-hosted Spark system, with queries answered or results returned in real time.
Further, the definition of the Skyline-based multi-feature fusion method is given in Definition 1.4.

Definition 1.4 (Skyline-based multi-feature fusion method): Given a medical image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images and a query image q, let the set R be the query result of the multi-feature fusion method. With the m low-level feature vectors $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$ of each image, R is the set of all images in the medical image library I whose similarity vector with respect to the query image q in the feature space X, $Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$, is not dominated by the similarity vector of any other image in I. That is, an image $o_i \in R$ if and only if the following condition holds:

$$\nexists\, o_j \in I,\ j \ne i:\ \bigl(\forall k \in [1, m]:\ dist(o_j.x_k, q.x_k) \le dist(o_i.x_k, q.x_k)\bigr) \wedge \bigl(\exists l \in [1, m]:\ dist(o_j.x_l, q.x_l) < dist(o_i.x_l, q.x_l)\bigr)$$
Further, the result set of the Skyline-based multi-feature fusion method (SKFF) is a subset of the medical image library: the set of images that are not dominated by any other image of the library in the multi-feature metric space. The SIFT and Color similarity distances between the query image q and an image $o_i$ form a point, as shown in Fig. 2. For example, the abscissa of point $p_1$ is the similarity distance of the SIFT feature between image $o_1$ and the query image q, and its ordinate is the similarity distance of the Color feature between them. These distances are all computed on the multi-feature metric space based on the bag-of-words model.
Further, the smaller the similarity distance, the more similar the two images are. Therefore $\{p_1, p_2, p_3, p_4\}$ is the final Skyline result: no other image is more similar to the query image than $\{o_1, o_2, o_3, o_4\}$ on both the SIFT and Color features, i.e. no image in the image library has a similarity vector with the query image that dominates theirs on the SIFT and Color features.
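The dominance test behind the Skyline result can be sketched in Python as follows. This is a simple quadratic-time illustration, not the distributed procedure of the method; the coordinates of the points are hypothetical (the values of Fig. 2 are not reproduced here), and each point is a pair (SIFT distance, Color distance).

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no farther on every feature and strictly closer on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the points that are not dominated by any other point."""
    return [name for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)]


# Hypothetical similarity vectors: (SIFT distance, Color distance).
points = {
    "p1": (1.0, 9.0),
    "p2": (2.0, 6.0),
    "p3": (4.0, 3.0),
    "p4": (7.0, 1.0),
    "p5": (5.0, 7.0),  # dominated by p2
    "p6": (8.0, 4.0),  # dominated by p3
}

result = skyline(points)
print(result)
```

With these assumed values the points `p5` and `p6` are discarded because another image is at least as close to the query on both features, while `p1` to `p4` each remain the best trade-off for some weighting of the two features.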
S3. Spark carries out stream processing, and the results are merged step by step into the recommended decision result.
Further, step S2 yields the final Skyline result $\{p_1, p_2, p_3, p_4\}$.
Further, stream processing is carried out with Spark: the streaming computation is decomposed into a series of short batch jobs. Depending on business requirements, the intermediate results of the whole streaming computation can be accumulated or stored to external devices, and the optimal medical clustering scheme is fed back to the user progressively.
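The step-by-step merging of results over short batches can be illustrated without any Spark dependency. The sketch below is a plain-Python stand-in for the micro-batch loop, under the assumption that each batch delivers the similarity vectors of a few more images; the batch contents and the function names `merge_skyline` and `stream_skyline` are hypothetical. It relies on the fact that dominance is transitive, so an image discarded in an earlier batch can never re-enter the Skyline.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no farther on every feature and strictly closer on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def merge_skyline(current: Dict[str, Point],
                  batch: Dict[str, Point]) -> Dict[str, Point]:
    """Skyline of the running result combined with one new batch."""
    candidates = {**current, **batch}
    return {name: p for name, p in candidates.items()
            if not any(dominates(q, p)
                       for other, q in candidates.items() if other != name)}


def stream_skyline(batches: Iterable[Dict[str, Point]]) -> Iterator[List[str]]:
    """Yield the intermediate Skyline after each batch, as a user would see it."""
    current: Dict[str, Point] = {}
    for batch in batches:
        current = merge_skyline(current, batch)
        yield sorted(current)


# Hypothetical micro-batches of (SIFT distance, Color distance) vectors.
batches = [
    {"o1": (5.0, 7.0), "o2": (2.0, 6.0)},
    {"o3": (4.0, 3.0), "o4": (8.0, 4.0)},
    {"o5": (1.0, 2.0)},
]

snapshots = list(stream_skyline(batches))
print(snapshots)
```

Each snapshot is a usable intermediate answer, which is what allows the user to inspect or refine the query before all images have been processed.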
The above is only a preferred specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical scheme and inventive concept, shall fall within the scope of protection of the invention.
Claims (10)
1. A Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching, characterized in that it comprises the following steps:
S1. Extract the low-level features of the medical images, cluster each low-level feature set separately and build a visual vocabulary; on this basis, quantize each image in the image library into a vector of visual-word occurrence frequencies, obtaining the feature vectors of each partition;
S2. Compute the similarity distance on each feature between the query image and any image in the image library, so as to construct the image similarity vectors over the different features;
S3. Call the Skyline-based multi-feature fusion method to carry out distributed retrieval and decision computation.
2. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 1, characterized in that step S1 extracts the feature data of the medical image: given a query image, the low-level features of the image are extracted through the following steps:
S1.1. Extraction of the Color feature;
S1.2. Extraction of the SIFT feature;
S1.3. Construction of the visual vocabulary;
S1.4. Quantized representation of the image.
3. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 1, characterized in that the image similarity vectors over the different features are constructed in step S2 as follows: given an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n medical images and a query image q, each medical image is expressed as feature vectors, and the similarity distance on the t-th feature between the query image q and any image $o_i$ in the image library I is expressed as the $L_1$ distance between the two vectors:

$$dist(o_i.x_t, q.x_t) = \sum_{j=1}^{k} \left| o_i.x_t^{(j)} - q.x_t^{(j)} \right| \qquad (1.3)$$

where $o_i.x_t$ denotes the t-th feature descriptor vector of image $o_i$, that is, the k-dimensional vector of the t-th low-level feature of image $o_i$;
based on formula 1.3, the similarity distance on each feature between the query medical image q and any image $o_i$ in the medical image library I is obtained, and the similarity vector of images q and $o_i$ is given by Definition 1.2:
Definition 1.2: Let $I = \{o_1, o_2, \ldots, o_n\}$ be an image library containing n images and let q be the query image; the similarity vector between the query image q and any image $o_i$ in the image library I is expressed as an m-dimensional vector:

$$Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1),\ dist(o_i.x_2, q.x_2),\ \ldots,\ dist(o_i.x_m, q.x_m) \rangle$$

where $i \in [1, n]$, m is the number of low-level features, $Vect_i(o_i, q)$ is the similarity vector between image q and image $o_i$, and $dist(o_i.x_k, q.x_k)$ is the similarity distance between the two images on the k-th feature ($k \le m$); for every image in the image library I, the similarity distance to the query image q is computed on each feature, which yields n similarity vectors.
4. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 1, characterized in that the specific method of step S3 is:
given a medical image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images and a query image q, let the set R be the query result of the multi-feature fusion method; with the m low-level feature vectors $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$ of each image, an image $o_i \in R$ if and only if the following condition holds:

$$\nexists\, o_j \in I,\ j \ne i:\ \bigl(\forall k \in [1, m]:\ dist(o_j.x_k, q.x_k) \le dist(o_i.x_k, q.x_k)\bigr) \wedge \bigl(\exists l \in [1, m]:\ dist(o_j.x_l, q.x_l) < dist(o_i.x_l, q.x_l)\bigr)$$

the set R then contains all images in the medical image library I whose similarity vector with respect to the query image q in the feature space X, $Vect_i(o_i, q) = \langle dist(o_i.x_1, q.x_1), dist(o_i.x_2, q.x_2), \ldots, dist(o_i.x_m, q.x_m) \rangle$, is not dominated by the similarity vector of any other image in I.
5. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 4, characterized in that the result set of the Skyline-based multi-feature fusion method is a subset of the medical image library, namely the set of images that are not dominated by any other image of the library in the multi-feature metric space; the SIFT and Color similarity distances between the query image q and an image $o_i$ form a point, whose abscissa is the similarity distance of the SIFT feature between image $o_i$ and the query image q and whose ordinate is the similarity distance of the Color feature between image $o_i$ and the query image q; these similarity distances are all computed on the multi-feature metric space based on the bag-of-words model, and the smaller the similarity distance, the more similar the two images are.
6. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 5, characterized in that stream processing is carried out with Spark: the streaming computation is decomposed into a series of short batch jobs, and the results are merged step by step into the recommended decision result.
7. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 2, characterized in that the Color feature is extracted in step S1.1 as follows:
the Color feature is represented by the color attribute (CN) descriptor, which is composed of the colors red, black, blue, green, brown, grey, pink, orange, white, purple and yellow; the color attribute CN is defined as an 11-dimensional variable, and every pixel in the image is assigned a color attribute label; this label serves as a main factor of the multi-feature Skyline, stream processing is carried out with Spark, and the results are refined and output step by step.
8. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 2, characterized in that the SIFT feature is extracted in step S1.2 as follows:
the extraction consists of two parts, detecting feature points and describing feature points: the original image is transformed across scales to obtain the scale-space representation sequence of the image; the image is then processed to obtain the feature points, and each feature point is represented by a 128-dimensional descriptor vector, giving SIFT feature vectors of 128 dimensions in total; using the feature points generated during SIFT feature extraction, each feature point together with its surrounding region is taken as a local region, and the CN vector of each pixel in the local region is extracted, giving the SIFT and CN local feature vectors; this vector serves as a main factor of the multi-feature Skyline, stream processing is carried out with Spark, and the results are refined and output step by step.
9. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 2, characterized in that the visual vocabulary is built in step S1.3 as follows:
using the Spark-based multi-level clustering algorithm k-means, its variants and the oversampling correction, streaming training is performed on the images in the image library with the Spark system, and visual vocabularies are generated progressively for the SIFT and Color feature vectors respectively; when a visual vocabulary is generated, the data are first partitioned and then processed in a distributed, streaming manner with the Spark system, and the result set is derived incrementally;
wherein the multi-layer k-means clustering algorithm seeks, in a feature point set $X = \{x_1, x_2, \ldots, x_n\}$ of some dimension, k cluster centers $C = \{c_1, c_2, \ldots, c_k\}$ that minimize the sum of squared errors from each feature point to the center of its cluster; these cluster centers divide X into k disjoint clusters $Y = \{Y_1, Y_2, \ldots, Y_k\}$ such that $Y_i \cap Y_j = \emptyset$ for any $1 \le i \ne j \le k$; for a cluster $Y_i$, its center point is:

$$c_i = \frac{1}{|Y_i|} \sum_{x \in Y_i} x$$

wherein the oversampling correction algorithm uses a single Spark job to select the center points and compute the global error (the difference from traditional MapReduce is that Spark is used, with processing through a distributed cache to speed up the iterations, and the results are produced in a streaming, incremental manner); its objective function is:

$$\phi_X(C) = \sum_{x \in X} \min_{c \in C} \| x - c \|^2$$

the goal of the OnR clustering algorithm produced at each decomposition stage is to find an optimal partition C that minimizes the final global clustering error $\phi_X(C)$ of Spark, where $\phi_X(C)$ is the global clustering error produced by partitioning the feature set X with the center point set C, and $\| \cdot \|$ is the Euclidean distance. The SIFT and CN feature sets are clustered separately, and the k cluster centers obtained in each case form the corresponding visual vocabulary.
10. The Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching according to claim 2, characterized in that the quantized representation of the image in step S1.4 is obtained as follows:
based on the visual vocabulary generated by the clustering algorithm, the SIFT descriptors of each image are quantized into a bag of words; in the visual bag-of-words model, given the visual vocabulary of the j-th feature, where j = 1, ..., m and k is the number of words in the vocabulary, each image in the image library is quantized into a k-dimensional vector of visual-word occurrence frequencies; the Color feature is quantized in the same way, so that each image is quantized into the corresponding feature vector; the quantization of multiple features proceeds by analogy until all features are quantized, giving the feature vector shown in Definition 1.1;
Definition 1.1: In each data partition, consider an image library $I = \{o_1, o_2, \ldots, o_n\}$ containing n images, and assume that each image $o_i$ has a set of low-level features $X = \{x_1, x_2, \ldots, x_m\}$, where m is the number of low-level features; the feature vector of each image $o_i$ is expressed as $\langle o_i.x_1, o_i.x_2, \ldots, o_i.x_m \rangle$.
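The vocabulary construction and quantization described in claims 9 and 10 can be illustrated with a small single-machine Python sketch. It uses plain Lloyd-style k-means, which is an assumption made for brevity: it is not the Spark-based multi-layer variant with oversampling correction described in the claims. The two-dimensional toy descriptors stand in for 128-dimensional SIFT descriptors, and all function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centers: Sequence[Vector]) -> int:
    """Index of the cluster center (visual word) closest to v."""
    return min(range(len(centers)), key=lambda i: sq_dist(v, centers[i]))


def kmeans(descriptors: List[Vector], k: int,
           iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain k-means; the k returned centers play the role of the visual vocabulary."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def quantize(descriptors: Sequence[Vector],
             vocabulary: Sequence[Vector]) -> List[int]:
    """k-dimensional vector of visual-word occurrence counts for one image."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest(d, vocabulary)] += 1
    return histogram


# Toy training descriptors forming two well-separated groups (assumed data).
training = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
vocabulary = kmeans(training, k=2)

# Descriptors of one hypothetical image, quantized against the vocabulary.
histogram = quantize([(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)], vocabulary)
print(vocabulary, histogram)
```

Running the same two functions once on the SIFT descriptors and once on the CN descriptors would give one word-frequency vector per feature, which together form the feature vector of Definition 1.1.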
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611150453.8A CN106777090A (en) | 2016-12-14 | 2016-12-14 | Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777090A (en) | 2017-05-31 |
Family
ID=58876961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611150453.8A Pending CN106777090A (en) | Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching | 2016-12-14 | 2016-12-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777090A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766472A (en) * | 2017-10-09 | 2018-03-06 | 中国人民解放军国防科技大学 | Contour hierarchical query parallel processing method based on multi-core processor |
CN108446740A (en) * | 2018-03-28 | 2018-08-24 | 南通大学 | A kind of consistent Synergistic method of multilayer for brain image case history feature extraction |
CN110362663A (en) * | 2018-04-09 | 2019-10-22 | 国际商业机器公司 | Adaptive multi-sensing similarity detection and resolution |
CN110516040A (en) * | 2019-08-14 | 2019-11-29 | 出门问问(武汉)信息科技有限公司 | Semantic Similarity comparative approach, equipment and computer storage medium between text |
CN111859004A (en) * | 2020-07-29 | 2020-10-30 | 书行科技(北京)有限公司 | Retrieval image acquisition method, device, equipment and readable storage medium |
CN112115446A (en) * | 2020-07-29 | 2020-12-22 | 航天信息股份有限公司 | Identity authentication method and system based on Skyline inquiry biological characteristics |
CN112287315A (en) * | 2020-07-29 | 2021-01-29 | 航天信息股份有限公司 | Skyline-based identity authentication method and system by inquiring biological characteristics |
CN115258963A (en) * | 2022-07-27 | 2022-11-01 | 山东中衡光电科技有限公司 | Safety protection system for underground hydraulic hoisting device and setting method for dangerous area |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315663A (en) * | 2008-06-25 | 2008-12-03 | 中国人民解放军国防科学技术大学 | Nature scene image classification method based on area dormant semantic characteristic |
CN101923653A (en) * | 2010-08-17 | 2010-12-22 | 北京大学 | Multilevel content description-based image classification method |
CN102073748A (en) * | 2011-03-08 | 2011-05-25 | 武汉大学 | Visual keyword based remote sensing image semantic searching method |
CN105469096A (en) * | 2015-11-18 | 2016-04-06 | 南京大学 | Feature bag image retrieval method based on Hash binary code |
CN106203507A (en) * | 2016-07-11 | 2016-12-07 | 上海凌科智能科技有限公司 | A kind of k means clustering method improved based on Distributed Computing Platform |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766472A (en) * | 2017-10-09 | 2018-03-06 | 中国人民解放军国防科技大学 | Contour hierarchical query parallel processing method based on multi-core processor |
CN107766472B (en) * | 2017-10-09 | 2020-09-04 | 中国人民解放军国防科技大学 | Contour hierarchical query parallel processing method based on multi-core processor |
CN108446740A (en) * | 2018-03-28 | 2018-08-24 | 南通大学 | A kind of consistent Synergistic method of multilayer for brain image case history feature extraction |
CN110362663A (en) * | 2018-04-09 | 2019-10-22 | 国际商业机器公司 | Adaptive multi-sensing similarity detection and resolution |
CN110362663B (en) * | 2018-04-09 | 2023-06-13 | 国际商业机器公司 | Adaptive multi-perceptual similarity detection and analysis |
CN110516040A (en) * | 2019-08-14 | 2019-11-29 | 出门问问(武汉)信息科技有限公司 | Semantic Similarity comparative approach, equipment and computer storage medium between text |
CN111859004A (en) * | 2020-07-29 | 2020-10-30 | 书行科技(北京)有限公司 | Retrieval image acquisition method, device, equipment and readable storage medium |
CN112115446A (en) * | 2020-07-29 | 2020-12-22 | 航天信息股份有限公司 | Identity authentication method and system based on Skyline inquiry biological characteristics |
CN112287315A (en) * | 2020-07-29 | 2021-01-29 | 航天信息股份有限公司 | Skyline-based identity authentication method and system by inquiring biological characteristics |
CN112115446B (en) * | 2020-07-29 | 2024-02-09 | 航天信息股份有限公司 | Skyline query biological feature-based identity authentication method and system |
CN115258963A (en) * | 2022-07-27 | 2022-11-01 | 山东中衡光电科技有限公司 | Safety protection system for underground hydraulic hoisting device and setting method for dangerous area |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106777090A (en) | Skyline medical big data retrieval method based on visual vocabulary and multi-feature matching | |
Gao et al. | The deep features and attention mechanism-based method to dish healthcare under social IoT systems: An empirical study with a hand-deep local–global net | |
Panda et al. | Diversity-aware multi-video summarization | |
EP2805262B1 (en) | Image index generation based on similarities of image features | |
Guo et al. | Multiple kernel learning based multi-view spectral clustering | |
Xiang et al. | Fabric image retrieval system using hierarchical search based on deep convolutional neural network | |
CN105205135B (en) | A kind of 3D model retrieval methods and its retrieval device based on topic model | |
Pan et al. | Pointatrousnet: Point atrous convolution for point cloud analysis | |
Deng et al. | Selective clustering for representative paintings selection | |
CN113222181A (en) | Federated learning method facing k-means clustering algorithm | |
Bhattacharjee et al. | Query adaptive multiview object instance search and localization using sketches | |
Chang et al. | Unsupervised video shot detection using clustering ensemble with a color global scale-invariant feature transform descriptor | |
Yoon et al. | Content-based video retrieval with prototypes of deep features | |
CN115116139A (en) | Multi-granularity human body action classification method based on graph convolution network | |
CN106777094A (en) | Skyline medical big data retrieval system based on visual vocabulary and multi-feature matching | |
Pavithra et al. | An improved seed point selection-based unsupervised color clustering for content-based image retrieval application | |
Liu et al. | Classification of fashion article images based on improved random forest and VGG-IE algorithm | |
Pavithra et al. | An efficient seed points selection approach in dominant color descriptors (DCD) | |
Meng et al. | Merged region based image retrieval | |
Sun et al. | A pca–cca network for rgb-d object recognition | |
KR20220125422A (en) | Method and device of celebrity identification based on image classification | |
CN114708449B (en) | Similar video determination method, and training method and device of example characterization model | |
Arun et al. | On integrating re-ranking and rank list fusion techniques for image retrieval | |
Shabbir et al. | Tetragonal Local Octa-Pattern (T-LOP) based image retrieval using genetically optimized support vector machines | |
Manoharan et al. | A comparison and analysis of soft computing techniques for content based image retrieval system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |