CN114357220A - Similar medical image calculation method based on locality sensitive hashing algorithm - Google Patents
- Publication number
- CN114357220A
- Authority
- CN
- China
- Prior art keywords
- vector
- bucket
- vectors
- hash
- medical image
- Legal status
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Apparatus For Radiation Diagnosis (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention provides a similar medical image calculation method based on a locality sensitive hashing algorithm. The method comprises medical image vectorization, hash bucket calculation, hash bucket list construction, and medical image vector similarity calculation. It is intended mainly for the massive high-dimensional vector space formed by medical images, where it can quickly classify medical images and screen out items similar to a target image, and it offers better time complexity than existing classification methods.
Description
Technical Field
The invention relates to the field of big data and lies at the intersection of medical imaging and computer science; in particular, it relates to a method for finding similar vectors in a high-dimensional vector space.
Background
Big data is now widely applied in the medical field, which holds very rich data resources. Medical images, including X-ray, magnetic resonance imaging, and ultrasound, are key links in the medical process. Radiologists often need to view each examination individually, which creates an unrealistically large workload and may delay a patient's optimal treatment window; big data can fundamentally change how this analysis is done. In the past, large amounts of medical image data were reviewed and classified manually, one by one, consuming a great deal of time from scarce medical staff, so it is important to help medical workers classify medical image data through big data technology and similarity retrieval. The invention provides a similar medical image calculation method based on a locality sensitive hashing algorithm, an improvement on the existing locality sensitive hashing algorithm that helps medical workers quickly classify and retrieve medical image data. The method can rapidly classify massive medical image data and screen out items similar to a target image, greatly saving labor costs and relieving medical workers of data classification pressure.
Disclosure of Invention
The invention provides a similar medical image calculation method based on a locality sensitive hashing algorithm. The method is mainly used for classifying and retrieving medical images: in a space containing massive numbers of medical images, it can quickly classify the images and screen out items similar to a target image.
The technical solution for realizing the purpose of the invention is as follows:
Step 1, vectorizing the medical images.
For each (N × N) medical image picture in a massive medical image space, obtain its pixel matrix P of dimension (N × N), in which each element is 0 or 1. Expanding the matrix row by row yields an N²-dimensional vector. Applying this vectorization to every picture in the medical image library yields a high-dimensional vector space Q.
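Step 1 is a row-major flattening. The following is a minimal Python sketch, assuming each picture has already been binarized into a square 0/1 pixel matrix (the source does not describe how the binarization is done); `vectorize` and `build_vector_space` are hypothetical names.

```python
from typing import List

Matrix = List[List[int]]


def vectorize(pixels: Matrix) -> List[int]:
    """Flatten an N x N binary pixel matrix P row by row into an N^2-dimensional vector."""
    n = len(pixels)
    if any(len(row) != n for row in pixels):
        raise ValueError("expected a square N x N pixel matrix")
    if any(p not in (0, 1) for row in pixels for p in row):
        raise ValueError("expected binary pixel values (0 or 1)")
    return [p for row in pixels for p in row]  # row-major expansion


def build_vector_space(images: List[Matrix]) -> List[List[int]]:
    """Vectorize every picture in the image library to form the vector space Q."""
    return [vectorize(img) for img in images]


Q = build_vector_space([[[0, 1], [1, 1]], [[1, 0], [0, 0]]])
print(Q)  # [[0, 1, 1, 1], [1, 0, 0, 0]]
```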
Step 2, constructing a redundant hash table set of the high-dimensional vector space Q.
The projection values of all vectors in the vector space Q are calculated using two hash functions, and each vector is placed into one or two hash buckets. The specific hash functions are shown in equations 1.1 and 1.2.
Equation 1.1 calculates the central bucket. Here v is a vector in the vector space Q, and x is a random reference vector of the same dimension whose elements each follow a Gaussian distribution; d(v, x) is the projection distance of v in the direction of x. w is the width of a hash bucket; its choice largely determines how many vectors fall into the same hash bucket and how sparse the hash buckets are. Computing $\lfloor d(v,x)/w \rfloor$ gives the rounded-down value n, and the hash bucket numbered n is taken as the central bucket; the vector v is always placed into its central bucket first.
Equation 1.2 calculates the redundant bucket, where $\operatorname{mod}(d(v,x), w)$ denotes the remainder r of $d(v,x)/w$. As shown in FIG. 1, when the hash value of v is close to the left (or right) boundary of its bucket, points similar to v may fall into the neighbouring hash bucket on that side, so to capture the full similar set, the boundary cases are handled separately.
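The redundant bucket $b_r$ is calculated from the central bucket n and the remainder r as follows:

$$b_r = \begin{cases} n - 1, & r < c \times w \\ n, & c \times w \le r < (1 - c) \times w \\ n + 1, & r \ge (1 - c) \times w \end{cases}$$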
Here c is a hyper-parameter. If the redundant bucket equals the central bucket, no additional put operation is performed; otherwise the vector is also placed into the redundant bucket. Repeating this for every vector completes the projection of the vector space Q, with each vector stored in its central bucket and, where applicable, its redundant bucket. This locally redundant form of locality sensitive hashing is what distinguishes the method from mainstream locality sensitive hashing algorithms: it makes fuller use of the information contained in the feature vectors, and the redundant similarity calculation avoids errors caused by bucket boundaries, improving the accuracy of the similarity computation.
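The hashing of a single vector in step 2 can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: d(v, x) is assumed to be the inner product of v and x (the source's formula for d is not reproduced), the middle case uses c × w ≤ r < (1 − c) × w so the three cases do not overlap, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_reference(k: int, rng: random.Random) -> List[float]:
    """Reference vector x: k components, each drawn from a standard Gaussian N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(k)]


def projection(v: Sequence[float], x: Sequence[float]) -> float:
    """d(v, x), ASSUMED here to be the inner product sum_i v_i * x_i."""
    return sum(vi * xi for vi, xi in zip(v, x))


def hash_buckets(v: Sequence[float], x: Sequence[float], w: float, c: float) -> Tuple[int, int]:
    """Return (central bucket n, redundant bucket) for vector v along reference x."""
    d = projection(v, x)
    n = math.floor(d / w)  # central bucket: floor(d(v, x) / w)
    r = d - n * w          # remainder mod(d(v, x), w), lies in [0, w)
    if r < c * w:
        return n, n - 1    # near the left boundary: left neighbour is redundant
    if r >= (1 - c) * w:
        return n, n + 1    # near the right boundary: right neighbour is redundant
    return n, n            # interior: redundant bucket equals central bucket


x = gaussian_reference(4, random.Random(0))
print(len(x))  # 4
print(hash_buckets([1, 0], [2.5, 7.0], w=1.0, c=0.25))  # (2, 2)
print(hash_buckets([1, 0], [2.1, 7.0], w=1.0, c=0.25))  # (2, 1)
print(hash_buckets([1, 0], [2.9, 7.0], w=1.0, c=0.25))  # (2, 3)
```

The sketch implicitly assumes 0 < c < 0.5; with a larger c the left-boundary test would shadow the right-boundary one. When both returned buckets coincide, the vector is stored only once, matching the rule that no additional put operation is performed.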
A new reference vector x is then selected, and all projection operations of step 2 are repeated along the direction of x to obtain a new hash bucket list. Repeating this until n reference vectors have been selected in total completes the projections in n directions and yields n hash bucket lists.
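Building the n hash bucket lists then repeats the per-vector hashing once per reference vector. A hedged sketch under the same inner-product assumption for d(v, x); storing vector indices rather than copies, the `Table` layout, the `seed` parameter, and the function names are illustrative choices, not taken from the source.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Table = Dict[int, Set[int]]  # bucket id -> indices of the vectors stored in it


def bucket_ids(d: float, w: float, c: float) -> Set[int]:
    """Central bucket plus (if any) redundant bucket for a projection value d."""
    n = math.floor(d / w)
    r = d - n * w
    if r < c * w:
        return {n, n - 1}
    if r >= (1 - c) * w:
        return {n, n + 1}
    return {n}


def build_tables(Q: Sequence[Sequence[float]], n_refs: int, w: float, c: float,
                 seed: int = 0) -> Tuple[List[List[float]], List[Table]]:
    """Build n_refs hash bucket lists, one per random Gaussian reference vector."""
    rng = random.Random(seed)
    k = len(Q[0])
    refs: List[List[float]] = []
    tables: List[Table] = []
    for _ in range(n_refs):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        table: Table = defaultdict(set)
        for idx, v in enumerate(Q):
            d = sum(vi * xi for vi, xi in zip(v, x))  # d(v, x), assumed inner product
            for b in bucket_ids(d, w, c):
                table[b].add(idx)
        refs.append(x)
        tables.append(dict(table))
    return refs, tables


Q = [[0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
refs, tables = build_tables(Q, n_refs=5, w=1.0, c=0.25)
print(len(tables))  # 5
```

Keeping indices in the buckets means the vector space Q is stored once, however many redundant placements each vector receives.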
Step 3, calculating the similar vectors of a single target medical image vector
After the redundant hash bucket lists are established, the similar vectors of a single target medical image vector are calculated as follows. Given a target vector g, m reference vectors are randomly selected from the n reference vectors. For each selected reference vector x, following the algorithm of step 2, $\lfloor d(g,x)/w \rfloor$ gives the central bucket i and $\operatorname{mod}(d(g,x), w)$ gives the remainder r. If c × w ≤ r < (1 − c) × w, there is no redundant bucket: the method assumes that all items similar to the target lie in the central bucket and extracts only the vectors in it. Only when the number of candidates in the central bucket falls below a certain threshold are vectors randomly drawn from the nearby hash buckets on both sides; as shown in FIG. 2, to preserve similarity, this search extends at most two hash buckets to the left and right. If r < c × w, the algorithm searches leftward for the nearest hash bucket, going at most two buckets to the left. If a valid redundant bucket is found, the vectors in the central bucket and in that redundant bucket are both extracted as candidates; if none is found within two buckets, the algorithm concludes that the central bucket has no similar bucket on the left and extracts only the candidates in the central bucket. If r ≥ (1 − c) × w, the algorithm searches rightward for a similar hash bucket in the same way and extracts the candidate vectors.
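The single-table candidate search for a target vector g might look as follows. Assumptions: a bucket counts as a valid redundant bucket if it is non-empty (the source does not define validity), d(g, x) is the inner product, and the names `candidates_in_table`, `Table`, and `MAX_HOPS` are hypothetical. The sparse-central-bucket fallback is not included in this sketch.

```python
import math
from typing import Dict, Sequence, Set

Table = Dict[int, Set[int]]  # bucket id -> vector indices (hypothetical layout)
MAX_HOPS = 2                 # the method searches at most two buckets to one side


def candidates_in_table(g: Sequence[float], x: Sequence[float], table: Table,
                        w: float, c: float) -> Set[int]:
    """Candidate vector indices for target g from one hash bucket list."""
    d = sum(gi * xi for gi, xi in zip(g, x))  # d(g, x), assumed inner product
    i = math.floor(d / w)                      # central bucket
    r = d - i * w                              # remainder mod(d, w)
    result = set(table.get(i, set()))
    if r < c * w:
        step = -1                              # near the left boundary: search left
    elif r >= (1 - c) * w:
        step = 1                               # near the right boundary: search right
    else:
        return result                          # interior: central bucket only
    for hop in range(1, MAX_HOPS + 1):
        neighbour = table.get(i + step * hop)
        if neighbour:                          # first non-empty bucket is the redundant one
            return result | neighbour          # set union also removes duplicates
    return result                              # nothing within two buckets: central only


table = {0: {0, 1}, -1: set(), -2: {5}, 1: {7}}
print(sorted(candidates_in_table([0.05], [1.0], table, w=1.0, c=0.25)))  # [0, 1, 5]
```

Returning a Python set makes the duplicate removal required by the redundant buckets automatic within one table.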
Because of the redundant hash buckets, the resulting candidate vector set may contain duplicates, so duplicate vectors must be removed from it.
After this search the algorithm obtains a candidate vector set $X' = \{x'_1, x'_2, x'_3, \dots\}$. Every vector y in X' is said to have a projection overlap with the target vector g, and the overlap count of the vector pair (g, y) is incremented by one. Applying this method to each of the m selected redundant hash bucket lists yields m candidate vector sets, each of which contributes to the overlap counts. If the overlap count of a vector y is not less than a specified threshold t, the algorithm regards y as a similar vector of the target vector; the threshold t largely determines how many similar vectors the target vector has.
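Overlap counting across the m randomly chosen hash bucket lists reduces to a counter and a threshold. A minimal sketch; `pick_tables` and `similar_by_overlap` are hypothetical names, and candidates are identified by vector index.

```python
import random
from collections import Counter
from typing import Iterable, List


def pick_tables(n: int, m: int, seed: int = 0) -> List[int]:
    """Randomly choose m of the n hash bucket lists to query (without replacement)."""
    return random.Random(seed).sample(range(n), m)


def similar_by_overlap(per_table: Iterable[Iterable[int]], t: int) -> List[int]:
    """Overlap counting: y is similar to g if (g, y) co-occurs in at least t tables."""
    counts: Counter = Counter()
    for candidates in per_table:
        counts.update(set(candidates))  # de-duplicate within one table before counting
    return sorted(idx for idx, cnt in counts.items() if cnt >= t)


print(similar_by_overlap([[1, 2, 2], [2, 3], [1, 2]], t=2))  # [1, 2]
```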
Step 4, sorting the similar vector set
Sort the similar vector set obtained in step 3 by distance to obtain the N closest vectors, which form the most-similar vector set; the first-ranked vector is selected as the most similar vector of the target vector, completing the similar vector retrieval.
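The final ranking can be sketched as a sort by distance. The source does not name the metric, so Euclidean distance is an assumption here, and `rank_by_distance` is a hypothetical name.

```python
import math
from typing import List, Sequence


def rank_by_distance(g: Sequence[float], Q: Sequence[Sequence[float]],
                     similar_ids: Sequence[int], top: int) -> List[int]:
    """Sort similar vectors by Euclidean distance to g (an assumed metric); keep `top`."""
    ranked = sorted(similar_ids, key=lambda idx: (math.dist(g, Q[idx]), idx))
    return ranked[:top]


Q = [[0, 0], [3, 4], [1, 0]]
print(rank_by_distance([0, 0], Q, [1, 2], top=2))  # [2, 1]
```

For 0/1 vectors such as those produced in step 1, the squared Euclidean distance equals the Hamming distance, so ranking by either metric gives the same order.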
Drawings
FIG. 1 illustrates the redundant similarity calculation of the present invention.
FIG. 2 illustrates the similar hash bucket search of the present invention.
Detailed Description
For a better understanding of the present disclosure, reference is made to the following description taken in conjunction with the accompanying drawings.
The invention discloses a similar medical image calculation method based on locality sensitive hashing, comprising the following steps:
(1a) Construct a random k-dimensional vector x whose elements each follow a Gaussian distribution; this random vector serves as the reference vector.
(1b) Project every vector in the vector space Q into one or two hash buckets using equations (1.4) and (1.5).
Wherein v is a vector in the vector space Q, x is the random k-dimensional vector, d (v, x) calculates the projection distance of the vector v in the direction of the vector x, w is the width of the hash bucket, and the selection of the width largely determines the number of vectors falling into the same hash bucket and the sparsity of the hash bucket. The formula for d (v, x) is as follows:
In the above formula, k is the vector dimension, and $v_i$ and $x_i$ denote the i-th components of the vectors v and x, respectively.
(1c) In equation (1.4), $\lfloor d(v,x)/w \rfloor$ computes the rounded-down value n of $d(v,x)/w$; the hash bucket numbered n is taken as the central bucket, and the vector v is first placed into it.
(1d) Equation (1.5) calculates the redundant bucket, where $\operatorname{mod}(d(v,x), w)$ denotes the remainder r of $d(v,x)/w$. As shown in FIG. 1, when the hash value of a vector is close to the left (or right) boundary of its bucket, vectors similar to it may fall into the neighbouring hash bucket on that side, so to capture the full similar set, the boundary cases are handled separately as follows.
Here c is a hyper-parameter and w is the width of the hash bucket; the different cases are explained below.
If r < c × w, the redundant bucket is n − 1, located to the left of the central bucket, and the vector v is also placed into this left redundant bucket. If c × w ≤ r < (1 − c) × w, the redundant bucket equals the central bucket n, and no additional put operation is performed. If r ≥ (1 − c) × w, the redundant bucket is n + 1, located to the right of the central bucket, and the vector v is also placed into this right redundant bucket.
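For illustration, with made-up values w = 4 and c = 0.25 (so c × w = 1 and (1 − c) × w = 3), the three cases work out as follows; the strict upper bound in the middle case is an assumption that keeps the cases disjoint.

```latex
\begin{aligned}
d(v,x) = 9.5:&\quad n = \lfloor 9.5/4 \rfloor = 2,\; r = 9.5 - 8 = 1.5,\; 1 \le 1.5 < 3 \;\Rightarrow\; \text{bucket } 2 \text{ only}\\
d(v,x) = 8.6:&\quad n = \lfloor 8.6/4 \rfloor = 2,\; r = 0.6 < 1 \;\Rightarrow\; \text{buckets } 2 \text{ and } 1\\
d(v,x) = 11.4:&\quad n = \lfloor 11.4/4 \rfloor = 2,\; r = 3.4 \ge 3 \;\Rightarrow\; \text{buckets } 2 \text{ and } 3
\end{aligned}
```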
Step 3, select a new reference vector x and repeat the projection operation above along the direction of x to obtain a new hash bucket list. Repeat until n reference vectors have been selected in total, completing the projections in n directions and yielding n hash bucket lists.
Step 4, calculate the hash buckets of the target vector. Given a target vector g, randomly select m reference vectors from the n reference vectors; for each, following the algorithm in step 1, compute $\lfloor d(g,x)/w \rfloor$ to obtain the central bucket i and $\operatorname{mod}(d(g,x), w)$ to obtain the remainder r, then determine the hash bucket numbers of the target vector case by case.
If c × w ≤ r < (1 − c) × w, there is no redundant bucket: the algorithm assumes that all items similar to the target vector lie in the central bucket and extracts only the candidate vectors in it. Only when the number of candidate vectors in the central bucket falls below a certain threshold are vectors randomly drawn from the nearby hash buckets on both sides, extending at most two hash buckets to the left and right to preserve similarity. If r < c × w, the algorithm searches leftward for the nearest hash bucket, going at most two buckets to the left. If a valid redundant bucket is found, the candidate vectors in the central bucket and in that redundant bucket are both extracted; if none is found within two buckets, the algorithm concludes that the central bucket has no similar bucket on the left and extracts only the candidate vectors in the central bucket. If r ≥ (1 − c) × w, the algorithm searches rightward for a similar hash bucket in the same way and extracts the candidate vectors. Because of the redundant hash buckets, the candidate vector set may contain duplicates, which must be removed.
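The sparse-central-bucket fallback in step 4 can be sketched as follows. The source does not specify the candidate threshold or how many vectors are drawn, so `min_candidates` and `sample_size` are hypothetical parameters, as is the function name `sparse_fallback`.

```python
import random
from typing import Dict, Set

Table = Dict[int, Set[int]]  # bucket id -> vector indices (hypothetical layout)


def sparse_fallback(table: Table, i: int, min_candidates: int, sample_size: int,
                    rng: random.Random, max_hops: int = 2) -> Set[int]:
    """If central bucket i holds fewer than min_candidates vectors, randomly add
    up to sample_size vectors from buckets at most max_hops away on each side."""
    result = set(table.get(i, set()))
    if len(result) >= min_candidates:
        return result                      # central bucket is dense enough
    nearby: Set[int] = set()
    for hop in range(1, max_hops + 1):     # widen at most two buckets left and right
        nearby |= table.get(i - hop, set())
        nearby |= table.get(i + hop, set())
    nearby -= result
    k = min(sample_size, len(nearby))
    return result | set(rng.sample(sorted(nearby), k))


table = {0: {1}, -1: {2}, 1: {3}, 2: {4}, 3: {9}}
print(sorted(sparse_fallback(table, 0, min_candidates=3, sample_size=10,
                             rng=random.Random(0))))  # [1, 2, 3, 4]
```

Sorting `nearby` before sampling makes the draw reproducible for a seeded generator; bucket 3 is never reached because it lies three buckets away.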
Step 5, the above steps yield a candidate vector set $X' = \{x'_1, x'_2, x'_3, \dots\}$. For each vector $x'_i$ in X', the vector $x'_i$ and the target vector g are said to have a projection overlap, and the overlap count of the vector pair $(g, x'_i)$ is incremented by one. Applying this method to one redundant hash bucket list yields one candidate vector set.
Step 6, as noted in step 3, the method constructs n hash bucket lists; repeating step 5 on each hash bucket list yields a projection overlap count table. For each vector pair $(g, x'_i)$ in the table, if its overlap count is not less than a specified threshold t, the algorithm regards $x'_i$ as a similar vector of the target vector; the threshold t largely determines the number of similar vectors of the target vector.
Step 7, sort all similar vectors obtained in step 6 by distance to obtain the closest vector, and use this vector to regenerate an (N × N) medical image picture, completing the similar image retrieval.
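Regenerating the picture in step 7 is the inverse of the row-major flattening. A minimal sketch with a hypothetical `to_image` helper; since the vectors hold binarized pixels, it recovers the 0/1 pixel matrix P, not any image data that existed before binarization.

```python
import math
from typing import List, Sequence


def to_image(vec: Sequence[int]) -> List[List[int]]:
    """Inverse of the row-major flattening: rebuild the N x N pixel matrix."""
    n = math.isqrt(len(vec))
    if n * n != len(vec):
        raise ValueError("vector length must be a perfect square N^2")
    return [list(vec[row * n:(row + 1) * n]) for row in range(n)]


print(to_image([0, 1, 1, 1]))  # [[0, 1], [1, 1]]
```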
The present invention will be described in further detail with reference to examples.
Example 1
Step 2, calculate the hash values of all vectors in the vector space Q and determine the hash bucket number of each vector from its hash value.
Step 3, select a new reference vector x and repeat the projection operation of step 2 along the direction of x to obtain a new hash bucket list. A total of 100 reference vectors are selected, completing the projections in 100 directions and yielding 100 hash bucket lists.
Step 4, calculate the hash buckets of the target vector. Given a target vector g, randomly select 50 of the 100 reference vectors; for each, following the algorithm in step 1, compute $\lfloor d(g,x)/w \rfloor$ to obtain the central bucket i and $\operatorname{mod}(d(g,x), w)$ to obtain the remainder r, then determine the hash bucket numbers of the target vector case by case.
Step 5, the above steps yield a candidate vector set $X' = \{x'_1, x'_2, x'_3, \dots\}$. For each vector $x'_i$ in X', the vector $x'_i$ and the target vector g are said to have a projection overlap, and the overlap count of the vector pair $(g, x'_i)$ is incremented by one. Applying this method to one redundant hash bucket list yields one candidate vector set.
Step 6, as noted in step 3, 100 hash bucket lists are constructed; repeating step 5 on each hash bucket list yields a projection overlap count table. For each vector pair $(g, x'_i)$ in the table, if its overlap count is not less than the specified threshold of 20, the algorithm regards $x'_i$ as a similar vector of the target vector; the threshold largely determines the number of similar vectors of the target vector.
Step 7, sort all similar vectors obtained in step 6 by distance to obtain the closest vector, and use it to regenerate a (256 × 256) medical image picture, completing the similar image retrieval.
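For scale, the parameters of Example 1 work out as follows; the ratios are derived here from the stated values rather than given in the source.

```latex
\begin{aligned}
k &= N^2 = 256^2 = 65\,536 && \text{(vector dimension)}\\
m/n &= 50/100 = 0.5 && \text{(fraction of hash bucket lists queried)}\\
t/m &= 20/50 = 0.4 && \text{(minimum fraction of queried lists in which a vector must co-occur with } g\text{)}
\end{aligned}
```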
Claims (4)
1. A similar medical image calculation method based on a locality sensitive hashing algorithm is characterized by comprising the following steps:
Step 1, vectorizing the medical image pictures: each (N × N) image picture is flattened into an N²-dimensional vector, and a high-dimensional vector space is constructed.
Step 2, constructing a random reference vector and projecting all medical image vectors in the vector space into central buckets and redundant buckets.
Step 3, repeating the operation of step 2 to construct n hash bucket lists, completing the classification of the medical images.
Step 4, calculating the similar vectors of a single medical image vector in the n hash bucket lists and selecting similar vectors by projection overlap counting.
Step 5, sorting the similar vectors obtained in step 4 by distance to obtain the most similar medical image vector.
Step 6, regenerating an (N × N) medical image picture from the most similar medical image vector obtained in step 5.
2. The similar medical image calculation method based on a locality sensitive hashing algorithm according to claim 1, wherein the vectorization of step 1 comprises: for a medical image picture of size (N × N), obtaining its (N × N) pixel matrix P, in which each element is 0 or 1; expanding the matrix row by row into an N²-dimensional vector; and vectorizing all pictures in the medical image library in this way to obtain a high-dimensional vector space Q.
3. The similar medical image calculation method based on a locality sensitive hashing algorithm according to claim 1, wherein step 2 calculates the central and redundant buckets of all medical image vectors in the vector space as follows:
(1) From step 1, obtain the high-dimensional vector space Q, whose vectors have dimension N²; construct an N²-dimensional random vector x whose elements each follow a Gaussian distribution, and use it as the reference vector.
(2) Project every vector in the above vector space into one or two hash buckets using the following equations.
Where v is a vector in the vector space Q, x is the above N²-dimensional random vector, d(v, x) is the projection distance of v in the direction of x, and w is the width of the hash bucket; the choice of width largely determines how many vectors fall into the same hash bucket and how sparse the hash buckets are. The formula for d(v, x) is as follows:
In the above formula, k is the vector dimension, and $v_i$ and $x_i$ denote the i-th components of the vectors v and x, respectively.
(3) In the above formula, $\lfloor d(v,x)/w \rfloor$ computes the rounded-down value n of $d(v,x)/w$; the hash bucket numbered n is taken as the central bucket, and the vector v is first placed into it.
(4) The redundant bucket is calculated using $\operatorname{mod}(d(v,x), w)$, the remainder r of $d(v,x)/w$. As shown in FIG. 1, when the hash value of a vector is close to the left (or right) boundary of its bucket, vectors similar to it may fall into the neighbouring hash bucket on that side; to capture the full similar set, the boundary cases are handled separately as follows.
Here c is a hyper-parameter and w is the width of the hash bucket; the different cases are explained below.
If r < c × w, the redundant bucket is n − 1, located to the left of the central bucket, and the vector v is also placed into this left redundant bucket. If c × w ≤ r < (1 − c) × w, the redundant bucket equals the central bucket n, and no additional put operation is performed. If r ≥ (1 − c) × w, the redundant bucket is n + 1, located to the right of the central bucket, and the vector v is also placed into this right redundant bucket.
4. The similar medical image calculation method based on a locality sensitive hashing algorithm according to claim 1, wherein the similar vectors of a single medical image vector are calculated in step 4 as follows:
(1) Calculate the hash buckets of a single target medical image vector. Given a target vector g, randomly select m reference vectors from the n reference vectors; for each, following the algorithm in step 2, compute $\lfloor d(g,x)/w \rfloor$ to obtain the central bucket i and $\operatorname{mod}(d(g,x), w)$ to obtain the remainder r, then determine the hash bucket numbers of the target vector case by case.
If c × w ≤ r < (1 − c) × w, there is no redundant bucket: the algorithm assumes that all items similar to the target vector lie in the central bucket and extracts only the candidate vectors in it. Only when the number of candidate vectors in the central bucket falls below a certain threshold are vectors randomly drawn from the nearby hash buckets on both sides, extending at most two hash buckets to the left and right to preserve similarity. If r < c × w, the algorithm searches leftward for the nearest hash bucket, going at most two buckets to the left. If a valid redundant bucket is found, the candidate vectors in the central bucket and in that redundant bucket are both extracted; if none is found within two buckets, the algorithm concludes that the central bucket has no similar bucket on the left and extracts only the candidate vectors in the central bucket. If r ≥ (1 − c) × w, the algorithm searches rightward for a similar hash bucket in the same way and extracts the candidate vectors. Because of the redundant hash buckets, the candidate vector set may contain duplicates, which must be removed.
(2) The above steps yield a candidate vector set $X' = \{x'_1, x'_2, x'_3, \dots\}$. For each vector $x'_i$ in X', the vector $x'_i$ and the target vector g are said to have a projection overlap, and the overlap count of the vector pair $(g, x'_i)$ is incremented by one. Applying this method to one redundant hash bucket list yields one candidate vector set.
(3) The method constructs n hash bucket lists; repeating operations (1) and (2) on each hash bucket list yields a projection overlap count table. For each vector pair $(g, x'_i)$ in the table, if its overlap count is not less than a specified threshold t, the algorithm regards $x'_i$ as a similar vector of the target vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210019188.9A CN114357220A (en) | 2022-01-07 | 2022-01-07 | Similar medical image calculation method based on locality sensitive hashing algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210019188.9A CN114357220A (en) | 2022-01-07 | 2022-01-07 | Similar medical image calculation method based on locality sensitive hashing algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114357220A (en) | 2022-04-15 |
Family
ID=81107552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210019188.9A Pending CN114357220A (en) | 2022-01-07 | 2022-01-07 | Similar medical image calculation method based on locality sensitive hashing algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114357220A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117251641A (en) * | 2023-11-20 | 2023-12-19 | 上海爱可生信息技术股份有限公司 | Vector database retrieval method, system, electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jourabloo et al. | Pose-invariant 3D face alignment | |
US11288835B2 (en) | Lighttrack: system and method for online top-down human pose tracking | |
JP4545641B2 (en) | Similar image retrieval method, similar image retrieval system, similar image retrieval program, and recording medium | |
CN100566655C (en) | Be used to handle image to determine the method for picture characteristics or analysis candidate | |
EP3204888A1 (en) | Spatial pyramid pooling networks for image processing | |
US11830187B2 (en) | Automatic condition diagnosis using a segmentation-guided framework | |
JP2000215317A (en) | Image processing method and image processor | |
Mai et al. | Comparing salient object detection results without ground truth | |
CN111881804B (en) | Posture estimation model training method, system, medium and terminal based on joint training | |
Oliva et al. | Multilevel thresholding by fuzzy type II sets using evolutionary algorithms | |
US11875898B2 (en) | Automatic condition diagnosis using an attention-guided framework | |
Jiang et al. | Active object detection in sonar images | |
CN114357220A (en) | Similar medical image calculation method based on locality sensitive hashing algorithm | |
CN116721301A (en) | Training method, classifying method, device and storage medium for target scene classifying model | |
CN114119669A (en) | Image matching target tracking method and system based on Shuffle attention | |
Xie et al. | BSSNet: Building subclass segmentation from satellite images using boundary guidance and contrastive learning | |
CN112991394A (en) | KCF target tracking method based on cubic spline interpolation and Markov chain | |
Shi et al. | Combined channel and spatial attention for YOLOv5 during target detection | |
Morsi et al. | Efficient hardware implementation of PSO-based object tracking system | |
Nazarİ et al. | A Deep learning model for image retargetting level detection | |
Khotilin | The technology of constructing an informative feature of a natural hyperspectral image area for the classification problem | |
Gregor et al. | Empirical evaluation of dissimilarity measures for 3d object retrieval with application to multi-feature retrieval | |
JP6942330B2 (en) | Image features and 3D shape search system using them | |
Vijayarani et al. | An efficient algorithm for facial image classification | |
Diecente Cid et al. | Lung graph-model classiffication with SVM and CNN for tuberculosis severity assessment and automatic CT report generation: participation in the ImageCLEF 2019 Tuberculosis Task |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |