CN109766904A - Improved algorithm for the semantic similarity matrix of medical domain images - Google Patents

Publication number
CN109766904A
Authority
CN
China
Prior art keywords
semantic
image
attribute
node
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811060272.5A
Other languages
Chinese (zh)
Inventor
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BENGBU MEDICAL COLLEGE
Original Assignee
BENGBU MEDICAL COLLEGE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BENGBU MEDICAL COLLEGE
Priority to CN201811060272.5A
Publication of CN109766904A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Abstract

The present invention provides an improved algorithm for the semantic similarity matrix of medical domain images. Taking the semantic distance between medical domain images as the object of study, and mapping similarity relations through multi-strategy matching, it proposes a method for extracting and modeling a medical image similarity matrix based on a rough semantic probabilistic model. The method comprises four steps: S1, semantic annotation based on a Bayesian probability model; S2, discretization of image features; S3, reduction of semantic features; and S4, similarity computation with a domain similarity model based on polymorphic theory. The invention can effectively improve the accuracy of merging semantic information between medical domain images, improve the quality of fused clinical medical domain knowledge bases, and reduce the computation required for large-scale mining of image semantic information.

Description

Improved algorithm for the semantic similarity matrix of medical domain images
Technical field
The invention belongs to the technical field of medical semantic network and knowledge grid computation and retrieval, and specifically relates to an improved algorithm for the semantic similarity matrix of medical domain images.
Background
Because medical domain knowledge is so widely applied, it is receiving growing attention from researchers. Medical information resources are voluminous, scattered, and heterogeneous, so they appear relatively isolated and fail to meet users' information needs. As a result, image databases within the same domain are diverse and mutually conflicting, and knowledge bases within the domain cannot interoperate.
With the rapid development of technologies such as network communication and cloud storage, information sources containing all kinds of medical images keep growing in scale. How to obtain implicit, valuable information from massive data has become a new direction in data mining. Image classification can determine topics and groups images with the same or similar topics into sets, so that users need not spend large amounts of time and energy searching for target images and can instead focus on the image groups of interest. However, image classification presupposes a measure of the semantic similarity between images. The basic visual information that a machine can recognize is limited and cannot fully match the human understanding of an image's inherent meaning. Semantic classification of images therefore still has many problems: the effect of image grouping is very limited, and overall efficiency is generally low.
As research on and applications of domain knowledge increase, most research groups working on domain knowledge bases built on image retrieval technology have developed different knowledge base systems for different applications, and these systems differ considerably. Although these systems all describe the same domain knowledge, they inevitably contain many images with repeated semantics. This wastes limited storage space, seriously reduces the efficiency and accuracy of medical image semantic retrieval, and ultimately prevents the knowledge entities in the domain from interoperating, which greatly restricts how efficiently the knowledge can be used.
Summary of the invention
The present invention uses Bayesian probability theory to extract discretized features from the acquired domain image attributes and converts the domain image knowledge source into a keyword set based on semantic annotation. It proposes a concept feature attribute set reduction method based on discernibility matrix theory, which reduces the computational scale of attribute reduction, builds a domain image knowledge base based on multi-angle semantic distance, and obtains a similarity computation based on image semantic relations.
To achieve the above object, the technical solution of the invention proposes a method for generating a semantic similarity matrix between domain images, as follows:
In the improved algorithm for the semantic similarity matrix of medical domain images, a computer retrieves the data in a medical domain image knowledge base and processes it as follows:
Step 1. Annotate the domain images in the medical domain image knowledge base with semantic information using a Bayesian probability model, and assign weights to the annotation words, obtaining weighted annotation words. The set of weighted annotation words is denoted the "semantic vector space".
Step 2. Extract discretized features from the weighted annotation words obtained in step 1, obtaining weighted annotation words that carry discretized feature data. Each such annotation word corresponds one-to-one with its domain image. The set of weighted annotation words carrying discretized feature data is called the "semantic space of image attributes".
Step 3. Apply reduction to the "semantic space of image attributes" obtained in step 2 to obtain the feature attributes, and build the minimal dimension reduct from them. The reduction comprises four steps: constructing the discernibility matrix, solving the core of the discernibility matrix, deleting the difference attribute items of the discernibility matrix, and obtaining the minimal dimension reduct.
Step 4. Build the computation model of domain image semantic similarity from the minimal dimension reduct obtained in step 3, and obtain the medical domain image semantic similarity matrix from this model.
Further, a domain image is a picture in an imaging report, and the medical domain image knowledge base is the set of domain images.
In step 1, the domain images are segmented into regions, forming the image set {P1, P2, ...}. The domain images are semantically annotated through human-computer interaction, forming the annotation word set {C1, C2, ...}. The posterior probability of each annotation word in {C1, C2, ...} is computed, giving a semantic vector space with weight information; the set of these semantic vector spaces is the semantic vector space set.
In step 2, the weighted semantic vector space set obtained in step 1 is used as input to construct a conditional decision table. The decision threshold parameter j of the image attributes is computed. Through dynamic threshold iteration, the interval division endpoints of the continuous image attributes are traversed to obtain discrete image attributes. The discrete intervals are arranged in descending order, giving the semantic space of image attributes, denoted the discrete code set {A1, A2, ...}.
In step 3, a binary tree with bidirectional pointers stores the difference attributes of the discrete code set {A1, A2, ...} obtained in step 2. By adjusting the first sampling coefficient p and the sampling threshold of the extraction function f(θ), the discernibility matrix is built, the multi-order square matrix of image attribute set cluster elements is obtained, and the reduct red() of the image attribute dimensions is solved.
In step 4, from the reduct red() of the image attribute dimensions, the following are computed: the annotation word node attribute similarity, the annotation word node depth attribute, the annotation word node asymmetry attribute, the lateral semantic distance between nodes, the longitudinal semantic distance between nodes, and the asymmetric semantic distance between nodes. The semantic similarity between images is then solved with the annotation word linear weighted model.
Beneficial technical effect
The technical solution of the invention aims to solve three problems: resolving the semantic gap between images, computing accuracy in information integration design, and measuring the semantic distance between images for clinical medical decision making. Through system-level optimization it achieves automatic or semi-automatic semantic-based merging of medical images. The invention uses semantic annotation words as the domain knowledge representation of image semantic information and uses hierarchical weight attributes as the feature vectors that distinguish important annotation concepts. This improves the accuracy of domain knowledge representation and reduces the incidence of unrelated semantic pairs, making large-scale fusion of domain knowledge possible.
Description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a detailed flow chart of step 1 in Fig. 1.
Fig. 3 is a detailed flow chart of step 2 in Fig. 1.
Fig. 4 is a detailed flow chart of step 3 in Fig. 1.
Fig. 5 is a detailed flow chart of step 4 in Fig. 1.
Detailed description of the embodiments
The design idea of the invention is as follows. A Bayesian probability model makes the semantic information hidden in an image explicit in the form of an annotation word set. Attributes are used to adjust the semantic weights of image concepts, and category attribute values are obtained by constructing a binary conditional attribute decision table. The discernibility matrix method reduces the computational scale of the annotation words. Multi-angle semantic distance matrix computation is introduced to generate the semantic similarity matrix.
The system of this embodiment includes a domain image semantic information annotation module, a conditional decision entropy generation module, an annotation word reduction module, and a matrix computation module. The invention is further described below with reference to the drawings.
Referring to Fig. 1, in a method for generating a medical domain image semantic similarity matrix, a computer retrieves the data in the medical domain image knowledge base and processes it as follows:
Step 1. Annotate the domain images in the medical domain image knowledge base with semantic information using a Bayesian probability model, and assign weights to the annotation words, obtaining weighted annotation words.
The set of weighted annotation words is denoted the "semantic vector space".
Step 2. Extract discretized features from the weighted annotation words obtained in step 1, obtaining weighted annotation words that carry discretized feature data. Each such annotation word corresponds one-to-one with its domain image.
The set of weighted annotation words carrying discretized feature data is called the "semantic space of image attributes".
Step 3. Apply reduction to the "semantic space of image attributes" obtained in step 2 to obtain the feature attributes, and build the minimal dimension reduct from them.
The reduction comprises four steps: constructing the discernibility matrix, solving the core of the discernibility matrix, deleting the difference attribute items of the discernibility matrix, and obtaining the minimal dimension reduct.
Step 4. Build the computation model of domain image semantic similarity from the minimal dimension reduct obtained in step 3, and obtain the medical domain image semantic similarity matrix from this model.
Referring to Fig. 1, further, a domain image is a picture in an imaging report, and the medical domain image knowledge base is the set of domain images.
In step 1, the domain images are segmented into regions, forming the image set {P1, P2, ...}.
The domain images are semantically annotated through human-computer interaction, forming the annotation word set {C1, C2, ...}. The posterior probability of each annotation word in {C1, C2, ...} is computed, giving a semantic vector space with weight information; the set of these semantic vector spaces is the semantic vector space set.
In step 2, the weighted semantic vector space set obtained in step 1 is used as input to construct a conditional decision table. The decision threshold parameter j of the image attributes is computed. Through dynamic threshold iteration, the interval division endpoints of the continuous image attributes are traversed to obtain discrete image attributes. The discrete intervals are arranged in descending order, giving the semantic space of image attributes, denoted the discrete code set {A1, A2, ...}.
In step 3, a binary tree with bidirectional pointers stores the difference attributes of the discrete code set {A1, A2, ...} obtained in step 2. By adjusting the first sampling coefficient p and the sampling threshold of the extraction function f(θ), the discernibility matrix is built, the multi-order square matrix of image attribute set cluster elements is obtained, and the reduct red() of the image attribute dimensions is solved.
In step 4, from the reduct red() of the image attribute dimensions, the following are computed: the annotation word node attribute similarity, the annotation word node depth attribute, the annotation word node asymmetry attribute, the lateral semantic distance between nodes, the longitudinal semantic distance between nodes, and the asymmetric semantic distance between nodes. The semantic similarity between images is then solved with the annotation word linear weighted model.
Referring to Fig. 1, step 1 is carried out as follows:
S11: Initialize the semantics. Through human-computer interaction, extract the doctor's annotation information for the images in the imaging reports. Clear the initial weights of the extracted annotation information and initialize the keyword annotations, i.e. assign the annotation information the value 0. By default, all semantic weights are taken to have the same effect on the image. This forms the semantic annotation word set.
S12: The user enters the semantics to be compared into the computer. Based on these semantics, the computer searches the classification concepts in the medical domain image knowledge base and obtains the retrieved images.
The user enters primary keywords and non-key words. A primary keyword is the result of intersecting the semantic annotation word set with the image classification descriptions that doctors provide for different departments, and it is the basis for similarity retrieval. Non-key words and primary keywords are mutually exclusive. The computer labels the retrieved images according to this mutual exclusion:
Images related to a primary keyword are marked as positively correlated. They are recorded in the positive-correlation array, and the count coefficient of each image mark in the array is incremented once.
Images related to a non-key word, i.e. unrelated to the primary keywords, are marked as negatively correlated. They are recorded in the negative-correlation array, and the count coefficient of each image mark in the array is decremented once.
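The labeling in S12 can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the patent text: the function name `mark_retrieved_images` is hypothetical, an image is treated as "related" to a primary keyword when it shares at least one annotation word with the keyword set, and the count coefficient is modeled as one integer per image.

```python
def mark_retrieved_images(image_annotations, primary_keywords):
    """Split retrieved images into positive and negative arrays.

    image_annotations: dict mapping image id -> list of annotation words.
    primary_keywords: iterable of primary keywords entered by the user.
    Assumption: sharing at least one word with the primary keywords
    counts as "related"; everything else is treated as negative.
    """
    keywords = set(primary_keywords)
    positive, negative, counters = [], [], {}
    for image_id, words in image_annotations.items():
        if keywords & set(words):
            positive.append(image_id)
            # Count coefficient is incremented once for a positive mark.
            counters[image_id] = counters.get(image_id, 0) + 1
        else:
            negative.append(image_id)
            # Count coefficient is decremented once for a negative mark.
            counters[image_id] = counters.get(image_id, 0) - 1
    return positive, negative, counters
```

Calling it with three images annotated `["lung", "nodule"]`, `["liver"]`, and `["nodule"]` and the primary keyword `"nodule"` places the first and third image in the positive array and the second in the negative array.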
S13: Add the positively and negatively correlated images to the positive-correlation and negative-correlation arrays respectively, and record the semantics associated with each array, obtaining the positively correlated semantic phrases and the negatively correlated semantic phrases.
Apply the watershed algorithm to segment regions for the positively and the negatively correlated semantic phrases. Take the conditional density function of each segmented region as the basic inherent attribute of the image and derive the posterior probability values from it. Compare the regions, select those with the largest probability values, and sort them in descending order of probability. Only the first 20 weights are assigned in this step: the 20 largest probability values are used as the weights of the semantic annotations describing the corresponding regions, and semantic annotation is performed with these weights. The result is the weighted positive-correlation array.
S14: Sum the array weights of the positive-feedback semantic annotation words of each segmented region in the image; this sum is the array weight of the group. At the same time, apply naive Bayes classification to the semantic annotation word with the largest weight in the array and compute its posterior probability. The final weight of the segmented image region is then the sum of the array weight of its corresponding semantic annotation word and the posterior probability.
Suppose the target image g is divided into n regions, denoted $R = \{m_1, m_2, m_3, \dots, m_n\}$. By Bayes' formula, for the subject concept set $C = \{c_1, c_2, c_3, \dots, c_n\}$ corresponding to these regions, the posterior probability of any subject concept $c_i$ is:
$$P(c_i \mid m_1, m_2, \dots, m_n) = \frac{f_R(m_1, m_2, \dots, m_n \mid c_i)\, P(c_i)}{f_R(m_1, m_2, \dots, m_n)}$$
Here $P(\cdot)$ is a probability and $f_R(\cdot)$ is the marginal probability density function over the region set. In this formula, $f_R(m_1, m_2, \dots, m_n)$ is the marginal density of everything related to this image and is a constant. If the subject annotations are assumed to be equally probable events, the priors $P(c_i)$ are all equal, so maximizing the posterior only requires maximizing $f_R(m_1, m_2, \dots, m_n \mid c_i)$. To obtain the specific form of this density, assume that the segmented regions are mutually independent. The conditional density function is then equivalent to:
$$f_R(m_1, m_2, \dots, m_n \mid c_i) = f_R(m_1 \mid c_i) \times f_R(m_2 \mid c_i) \times \dots \times f_R(m_n \mid c_i)$$
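Under the independence assumption above, the posterior for each subject concept can be sketched in Python as follows. This is an illustrative sketch only: the function name `posterior_over_concepts` and the input layout (a dictionary from concept to per-region density values) are assumptions, the densities are assumed strictly positive, and equal priors are used when none are given, matching the equal-probability assumption in the text.

```python
import math

def posterior_over_concepts(region_likelihoods, priors=None):
    """Posterior P(c_i | m_1..m_n) with mutually independent regions.

    region_likelihoods: dict concept -> list of f_R(m_k | c_i), one value
    per segmented region (assumed strictly positive).
    priors: optional dict concept -> P(c_i); uniform if omitted.
    """
    concepts = list(region_likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(concepts) for c in concepts}
    # Work in the log domain: the product of per-region densities
    # becomes a sum, which avoids underflow for many regions.
    log_joint = {
        c: math.log(priors[c]) + sum(math.log(v) for v in region_likelihoods[c])
        for c in concepts
    }
    peak = max(log_joint.values())
    unnormalized = {c: math.exp(v - peak) for c, v in log_joint.items()}
    # The denominator f_R(m_1..m_n) is the same constant for every concept.
    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}
```

With equal priors, the concept that maximizes the product of per-region densities also maximizes the posterior, which is the simplification the text describes.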
S15: Map the inherent image attribute set obtained in the preceding steps to the corresponding weighted semantic annotation words through a mapping based on probability space, forming the semantic vector space.
Referring to Fig. 2, step 2 is carried out as follows:
S21: Sort the inherent image attributes in the semantic vector space by frequency of occurrence in ascending order, and group adjacent attribute values into one equivalence class. Based on the conditional semantic attributes, traverse the critical points between these equivalence classes, use them as the starting intervals of the initial semantic attributes, and compute the conditional decision entropy.
S22: Compare the region critical points by decision entropy. Take the difference between the decision entropies of adjacent interval endpoints. If the decision entropy of the left endpoint is smaller than that of the right endpoint, swap the left and right semantic attribute values of the interval. Traverse and compute the conditional information content of each conditional semantic attribute and sort the values in descending order. On each pass, keep the conditional semantic attribute with the largest value and use it as the directional criterion for interval merging, merging adjacent intervals into a single interval.
S23: Sort the conditional decision entropies, setting the threshold with a threshold adjustment method based on dynamic feedback. If the decision threshold is greater than the decision entropy of the interval's left endpoint, add the semantic attribute corresponding to that decision entropy to the coarse partition group; otherwise, assign it to the fine partition group. By traversing the intervals with the smallest difference between adjacent decision entropy endpoints, adjust the decision threshold dynamically. Once the decision threshold is no longer greater than the left-endpoint decision entropy, sort the semantic attribute corresponding to the decision entropy together with the discrete intervals of the semantic attributes in the fine partition group. In other words, the semantic attributes corresponding to the decision entropies are sorted for discretization: the discrete intervals in the fine partition group that are adjacent to the semantic attribute are merged one after another until the decision threshold equals the left-endpoint decision entropy, which completes the threshold assignment.
S24: If the modification of the decision threshold in step S23 causes the conditional decision semantic attributes to share an identical domain interval, undo the exchange of condition values made in step S23 and restore the interval to its original state. Then order the discrete intervals by left endpoint in descending order and encode them successively with positive integers (greater than 0), obtaining the coded interval set {A1, A2, ...}.
If the modification of the decision threshold in step S23 does not cause the conditional decision semantic attributes to share an identical domain interval, keep the threshold of step S23. Then order the discrete intervals by left endpoint in descending order and encode them successively with positive integers (greater than 0), obtaining the coded interval set {A1, A2, ...}.
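Two building blocks of step 2 can be sketched in Python. Both are illustrative assumptions rather than the patent's exact procedure: the patent does not give a formula for the conditional decision entropy, so the standard Shannon conditional entropy H(D | A) is used here as a stand-in, and `encode_intervals` only illustrates the final coding rule of S24 (descending left endpoint, codes starting at 1). The function names and the half-open interval layout are hypothetical.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(attribute_values, decisions):
    """Shannon conditional entropy H(D | A) in bits (assumed definition)."""
    total = len(attribute_values)
    by_value = defaultdict(list)
    for a, d in zip(attribute_values, decisions):
        by_value[a].append(d)
    entropy = 0.0
    for group in by_value.values():
        weight = len(group) / total          # p(A = a)
        for count in Counter(group).values():
            p = count / len(group)           # p(D = d | A = a)
            entropy -= weight * p * math.log2(p)
    return entropy

def encode_intervals(cut_points, lower, upper):
    """Number the intervals 1, 2, ... by descending left endpoint."""
    edges = [lower] + sorted(cut_points) + [upper]
    intervals = list(zip(edges[:-1], edges[1:]))
    ordered = sorted(intervals, key=lambda iv: iv[0], reverse=True)
    return {iv: code for code, iv in enumerate(ordered, start=1)}
```

For cut points 0.3 and 0.6 on the range 0.0 to 1.0, the interval starting at 0.6 receives code 1 and the interval starting at 0.0 receives code 3.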
Referring to Fig. 4, step 3 is carried out as follows:
First, the notions of dispensable attribute set, family of indiscernibility relations, and discernibility matrix are briefly introduced. Let P be a family of equivalence relations and R ∈ P. If IND(P − {R}) = IND(P), then R is called dispensable in P; otherwise R is indispensable. If every relation R in P is indispensable, P is called independent.
Let Q and R be families of indiscernibility relations with Q ⊆ R. If ind(Q) = ind(R), then Q is called a reduct of R, denoted red(R). The set of all indispensable relations in R is called the core of R, denoted core(R).
The discernibility matrix is defined by dividing attributes, from the viewpoint of dependence, into conditional attributes and a decision attribute. For arbitrary attributes x, y ∈ G, let I = {U, A, V, f} be a knowledge representation system, where A = M ∪ {d} is the attribute set, the subsets M and {d} are the conditional attribute set and the decision attribute set respectively, and f(a, s) uniquely determines the value of object s on attribute a. The discernibility matrix $M_d$ is expressed as $M_d(i, j) = \{a_m \in M \wedge f(a_m, s_n)\}$ where $d(s_n) \neq d(s_m)$; in all other cases $M_d(i, j)$ is 0. If S = (U, A, R) is an information system and V is a non-empty subset of the attribute set A, the indiscernibility relation is $IND(V) = \{(x, y) \in U \times U : f(a, x) = f(a, y) \text{ for all } a \in V\}$, and x and y are said to be indiscernible on the attributes V.
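The definitions above can be made concrete with a small Python sketch. It uses the standard rough-set reading of the discernibility matrix, which is an assumption about what the formula intends: the entry for a pair of objects with different decisions is the set of conditional attributes on which they differ, and the entry is empty otherwise (the text writes 0 for this case). The function names and the dictionary layout of the decision table are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(objects, attributes):
    """Group object indices that agree on every attribute in `attributes`."""
    classes = defaultdict(list)
    for index, obj in enumerate(objects):
        key = tuple(obj[a] for a in attributes)
        classes[key].append(index)
    return list(classes.values())

def discernibility_matrix(objects, decisions):
    """Entry (i, j): attributes that distinguish objects i and j.

    objects: list of dicts, conditional attribute name -> value.
    decisions: list of decision values, one per object.
    Pairs with the same decision get an empty set.
    """
    n = len(objects)
    matrix = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if decisions[i] != decisions[j]:
                matrix[i][j] = frozenset(
                    a for a in objects[i] if objects[i][a] != objects[j][a]
                )
    return matrix
```

The matrix is symmetric, so an implementation concerned with scale could store only the upper triangle.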
The specific steps are as follows:
S31: Initialize the minimal reduct of the attribute dimensions of the discrete code set {A1, A2, ...}: set red() = ∅ and core() = ∅. With the first sampling step β and the extraction function f(θ), partition the data in the discrete code set {A1, A2, ...} to obtain the sub-code sets {U1, U2, U3, ...}, such that |ind(A)| = 0 on each part of {U1, U2, U3, ...}. Here ind(A) denotes the decision entropy conditioned on the set A.
S32: Construct the conditional attribute equivalence relation set of the sub-code sets {U1, U2, U3, ...}, and from it compute the classification set cluster elements m_ij, 1 ≤ i, j ≤ θ, where θ is the two-dimensional coordinate endpoint of the set cluster elements.
S33: Construct the discernibility matrix. From the classification set cluster elements m_ij, build the discernibility matrix M_d(i, j) = {m_ij}. The discernibility matrix M_d(i, j) is a square matrix of order θ, with the following structure:
Solve the minimal irreducible core core() of the discernibility matrix M_d(i, j).
The solution is expressed briefly as follows:
S34: Determine whether the discernibility matrix M_d(i, j) is empty:
If it is empty, delete the difference attribute items of the discernibility matrix M_d(i, j), then solve the minimal dimension reduct to obtain the reduced discernible attribute set.
If it is not empty, solve the minimal dimension reduct directly to obtain the reduced discernible attribute set.
The solution method is described as follows:
Solve the minimal dimension reduct:
Do red(A) = red(A) ∪ {m_ij}
While (red(A) ∈ ∪ M_d(i, j)).
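One common way to turn a discernibility matrix into a core and a reduct is sketched below in Python. This is a swapped-in illustration, not the patent's exact loop: the core is taken as the attributes that appear as singleton entries (a standard rough-set result), and the reduct is grown with a greedy "most frequent attribute first" heuristic, which yields a small reduct but not necessarily a minimal one. The function names are hypothetical, and the input is the list of non-empty matrix entries.

```python
def core_attributes(entries):
    """Attributes that appear alone in some entry must be in every reduct."""
    return {next(iter(e)) for e in entries if len(e) == 1}

def greedy_reduct(entries):
    """Grow a reduct until every non-empty entry contains a chosen attribute."""
    remaining = [set(e) for e in entries if e]
    reduct = set(core_attributes(remaining))
    # Entries already hit by a core attribute need no further work.
    remaining = [e for e in remaining if not (e & reduct)]
    while remaining:
        counts = {}
        for entry in remaining:
            for attribute in entry:
                counts[attribute] = counts.get(attribute, 0) + 1
        # Pick the attribute covering the most entries; ties are broken
        # alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        reduct.add(best)
        remaining = [e for e in remaining if best not in e]
    return reduct
```

The loop mirrors the "add an element, then re-check the matrix" shape of the pseudocode above, with the stopping condition made explicit: it ends when no entry is left uncovered.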
S35: Check the difference attributes of the reduced discernible attribute set for inconsistency:
Use the remaining objects that were not sampled to check classification inconsistency, i.e. use the elements of the complement of the discernible attribute set in the discernibility matrix of step S34 to make the classification inconsistency judgment:
If the number of inconsistently classified objects is not less than the dynamic-feedback threshold parameter j, return to step S33 and adjust the threshold dynamically.
If the number of inconsistently classified objects is less than the dynamic-feedback threshold parameter j, output the minimal dimension reduct as the result.
The solution method is described as follows:
If Count() ≤ μ, Print(red(A)); otherwise β++. // If the number of inconsistently classified objects is below the dynamic-feedback threshold parameter j, the requirement is met and the reduct is output; otherwise the threshold must be adjusted dynamically.
Return to step S33.
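The acceptance test of S35 can be sketched in Python as follows. The interpretation is an assumption: an object is counted as inconsistently classified when another object agrees with it on every attribute of the candidate reduct but carries a different decision. The function names and the parameter `mu` (standing in for the threshold in the pseudocode) are hypothetical.

```python
from collections import defaultdict

def count_inconsistent(objects, decisions, reduct):
    """Number of objects whose reduct values do not determine the decision."""
    ordered = sorted(reduct)
    groups = defaultdict(list)
    for obj, decision in zip(objects, decisions):
        groups[tuple(obj[a] for a in ordered)].append(decision)
    # A group is inconsistent when it contains more than one decision value.
    return sum(len(g) for g in groups.values() if len(set(g)) > 1)

def accept_reduct(objects, decisions, reduct, mu):
    """Mirror of the `Count() <= mu` check: True means output the reduct."""
    return count_inconsistent(objects, decisions, reduct) <= mu
```

When `accept_reduct` returns False, the caller would enlarge the sample (the `β++` in the pseudocode) and rebuild the discernibility matrix.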
Referring to Fig. 5, step 4 is specific to be carried out as follows:
Acquisition mark word transverse direction nodal community, mark word longitudinal direction node depth attribute are calculated separately by the result of step 3:
S41: the reduction collection most simple to dimension carries out the processing of semantic tagger set of words, and the semantic tagger set of words is to figure As feature normalization describes.
Based on mark word transverse direction nodal community, calculates and semantic distance and sum it up between image.
The shared attribute amount and difference attribute amount for being included by calculating a pair of of mark word node, measure the language between mark word Adopted distance, semantic distance and the linear positive correlation of shared attribute and the linear negative correlation of difference attribute.
Only consider that the similarity calculation between adjacent node (a, b), semantic distance are equal to the upper layer parent of a node in this step Union is sought between nodal community set and the characteristic set of a node.The direct child node quantity of mark word node is the more, thin to its The mark word semantic description of change is more specific, i.e., the semantic similarity between its contained child class node is just bigger.The mark word section Point attribute information amount impact factor are as follows:
In formula, o (c1,c2) indicate to mark the shared attribute set of word node (c1, c2) union.WithIndicate the difference attribute amount of mark word node (c1, c2).λ, α representation formula adjustment parameter, avoid fraction from being not intended to Justice.
S42: Based on the longitudinal node depth attribute of the mark words, calculate the semantic distances between images and sum them.
The greater the sum of the node depths of any group of mark words in the semantic tree, the more specific the image attributes expressed by those mark words, the smaller the semantic similarity distance, and the higher the similarity. The present invention describes the semantic similarity of node levels with an exponential function. The node level impact factor (also referred to as the hierarchy factor) of the mark word is:
S43: Calculate the asymmetry factor from the node level impact factor. This is based on the asymmetry attribute of mark word nodes: in the mark word semantic tree, the semantic similarity between mark word nodes is asymmetric to some extent, i.e., semantic similarity distance matching is directional. The similarity of a mark word node to its ancestor node is greater than the similarity of the ancestor to the child node: if concept A is an ancestor of concept B, then sim(A, B) is less than sim(B, A). The asymmetry attribute impact factor of the semantic distance for the node pair (c1, c2) is:
S44: Integrate the node attribute information amount impact factor and the asymmetry attribute impact factor by linear weighting to form the multi-angle semantic distance similarity matrix:
It was found that the similarity calculation method based on the lateral node attributes of mark words does not account for the calculation error that arises for mark word nodes with the same shared attribute amount but different depths, while the similarity calculation method based on the longitudinal node depth attribute of mark words cannot distinguish nodes of the same depth with different shared attribute amounts. To optimize the above models, the present invention proposes a new mark word linear weighted model:
By the mark word linear weighted model
Sim(c1, c2) = ε·SimC(c1, c2) + (1 - ε)·SimDγ(c1, c2), the node attribute information amount impact factor and the asymmetry attribute impact factor are integrated to form the multi-angle semantic distance similarity matrix, which realizes the following basic properties of a similarity distance measure between semantic tagger words:
In the formula, ε is the weight factor, which adjusts the influence of the shared attribute amount and the depth of the mark word nodes on the semantic distance measure.
For any pair (c1, c2), the semantic similarity distance measure given by this formula lies in the closed interval from 0 to 1; the larger the value, the closer the semantic distance.
When c1 and c2 are the same node, the semantic similarity given by this formula is 1.
The larger the shared part of (c1, c2) in the mark word semantic tree, i.e., the more concentrated the shared attributes, the closer the semantic distance.
The deeper the position of (c1, c2) in the mark word semantic tree, the more concentrated the attributes and the closer the semantic distance.
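The linear weighted model and its stated properties can be illustrated with a minimal Python sketch. It assumes that the two component similarities SimC and SimDγ are already available as numbers in [0, 1], since their formulas are not reproduced in this text; the function names and the toy component functions are hypothetical and serve illustration only.

```python
def linear_weighted_similarity(sim_c, sim_d, epsilon):
    """Return epsilon * sim_c + (1 - epsilon) * sim_d.

    sim_c and sim_d stand in for SimC and SimD-gamma; both are assumed
    to lie in [0, 1], so the result also lies in [0, 1].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return epsilon * sim_c + (1.0 - epsilon) * sim_d


def similarity_matrix(nodes, sim_c_fn, sim_d_fn, epsilon):
    """Build the pairwise matrix; entry [i][j] need not equal entry [j][i]."""
    return [
        [linear_weighted_similarity(sim_c_fn(a, b), sim_d_fn(a, b), epsilon)
         for b in nodes]
        for a in nodes
    ]


# Hypothetical component similarities for two mark word nodes.
toy_c = lambda a, b: 1.0 if a == b else 0.6
toy_d = lambda a, b: 1.0 if a == b else 0.2
matrix = similarity_matrix(["lung", "nodule"], toy_c, toy_d, epsilon=0.5)
```

Because the result is a convex combination, identical nodes keep similarity 1 whenever both components return 1, and ε shifts the emphasis between shared attributes and depth.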
Referring to Fig. 2, further, in step 1, by way of human-computer interaction, the doctor's description of the corresponding image in the medical image library, as given in the image report, is taken as the semantic content; a mark word set is extracted according to medical classification knowledge, and this set is defined as the feature vector describing the image semantic information. Each "image + semantic tagger word" pair in the field images is treated as one group of elements.
In step 1, initializing the keyword marks means representing each semantic tag by a digitizable weight value, initially assigned 0, which forms an object list containing the semantic tagger words.
In step 1, the semantics to be compared are used to search the classification information concepts in the image library, and the retrieved images are marked:
The "classification information" is the classification description information given by doctors for images of different departments. A similarity search is performed against the semantic tagger words to be compared; each time an image appears in the retrieval, the semantic tagger words of that image are marked positively and the positive mark count variable is incremented by one.
Images unrelated to the subject image are marked as retrieval-negatively correlated and the negative mark count variable is incremented by one; related images are marked as positively correlated and the positive mark count variable is incremented by one. This yields the horizontal and vertical dimension information of the domain object, where the abscissa corresponds to the change of the positive mark count variable and the ordinate corresponds to the change of the negative mark count variable.
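The counting of positive and negative marks can be sketched as follows. This is a minimal illustration, assuming the retrieval result is available as a list of (semantic tagger word, relevant or not) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter


def tally_feedback(retrieval_log):
    """Count positive and negative marks per semantic tagger word.

    retrieval_log is an iterable of (word, is_relevant) pairs. The result
    maps each word to (positive_count, negative_count), i.e. the
    abscissa and ordinate described in the text.
    """
    positive, negative = Counter(), Counter()
    for word, is_relevant in retrieval_log:
        if is_relevant:
            positive[word] += 1   # positive mark count variable
        else:
            negative[word] += 1   # negative mark count variable
    words = set(positive) | set(negative)
    return {w: (positive[w], negative[w]) for w in words}


# Hypothetical retrieval log for illustration only.
log = [("nodule", True), ("nodule", True), ("fracture", False), ("nodule", False)]
coordinates = tally_feedback(log)
```

Each word then occupies one point in the plane spanned by the two counters.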
In this step, increasing the weight of the array containing the positive-feedback semantics means applying static weight optimization with a genetic algorithm to the array containing the positive-feedback semantic tagger words, with the initial dynamic adjustment coefficient of this group of weight values set to a constant greater than 1.
In this step, reducing the array weight of the negative feedback means adjusting the initial dynamic adjustment coefficient of the negative-feedback weights in the reverse direction, with its value set to a constant less than 1.
In this step, after judging whether the attribute subset of the image attributes is null, processing is as follows: if it is non-empty, the attributes of the new semantic tags are added to the object list; otherwise, adding stops.
The number of semantic tagger words of an image is used as the basis for measuring the semantic range of the image: if the semantic tagger word set of image a is a proper subset of the semantic tagger word set of another image b, image a is defined as a subset of image b.
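The subset relation between images reduces to a proper-subset test on their word sets, as in the following minimal sketch. The function name and the example word sets are hypothetical.

```python
def is_image_subset(tags_a, tags_b):
    """True if the tag set of image a is a proper subset of that of image b."""
    return set(tags_a) < set(tags_b)   # '<' on sets tests proper subset


# Hypothetical semantic tagger word sets for two images.
image_a = {"chest", "nodule"}
image_b = {"chest", "nodule", "calcification"}
a_within_b = is_image_subset(image_a, image_b)
```

Two images with identical word sets are not subsets of each other under this definition, because the relation is strict.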
In this step, the target image region is segmented with the watershed algorithm; the conditional density function of each segmented region image is calculated to obtain posterior probability values; the regions with the largest probability values are screened out by comparison and sorted in descending order of value, and weight values are assigned to the corresponding regions.
Referring to Fig. 3, further, the conditional decision table in step 2 is a binary relation formed by the object set and the conditional decision entropy, in which the initial condition semantic attribute X and the conditional decision Y form an object-based equivalence relation, and the conditional decision entropy H(Y | X) indicates the degree of roughness of the conditional decision Y given the condition semantic attribute X. The conditional decision table is calculated as follows:
Further, the conditional decision table is a symbolic representation method that, on the basis of a traditional decision table, introduces conditional probability as the decision judgment condition. The conditional decision table used in the present invention is a triple formed by the mark word set, the probability condition rule set, and the operation behavior set. The core of the triple is the binary relation formed by the semantic tagger word set and the conditional decision entropy, where the conditional decision entropy combines the percentage-probability representation of rule support from rough set theory with the conditional-probability definition of association rules based on confidence. An equivalence relation over images is constructed: images with adjacent semantic tagger words are placed in one equivalence class, continuous semantic attributes are cut into several categorical attribute values, and the semantic attribute segmentation boundaries are constructed.
In this step, the semantic attributes are sorted by frequency of occurrence from small to large, and adjacent objects are placed in one equivalence class. Based on the condition semantic attributes, the region critical points between the equivalence classes are found by traversal and used as the interval endpoints of the initial semantic attributes. The conditional decision entropy of each discretized image semantic attribute is calculated, and the conditional information amounts of adjacent intervals are compared.
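For orientation, a conditional entropy H(Y | X) can be computed as in the sketch below. It uses the standard Shannon definition, which is an assumption: the formula of the conditional decision table is not reproduced in this text and may differ. The function name and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(pairs):
    """Shannon conditional entropy H(Y | X) in bits.

    pairs is a list of (x, y) observations, where x plays the role of the
    condition semantic attribute and y the role of the decision.
    """
    total = len(pairs)
    if total == 0:
        return 0.0
    decisions_by_x = defaultdict(Counter)
    for x, y in pairs:
        decisions_by_x[x][y] += 1
    entropy = 0.0
    for counts in decisions_by_x.values():
        n_x = sum(counts.values())
        for count in counts.values():
            p_y_given_x = count / n_x
            entropy -= (n_x / total) * p_y_given_x * math.log2(p_y_given_x)
    return entropy


# Hypothetical observations: the decision is fully determined by the attribute.
determined = [("low", "benign"), ("low", "benign"), ("high", "malignant")]
# Hypothetical observations: the attribute carries no information.
uninformative = [("any", "benign"), ("any", "malignant")]
```

A value of 0 means the condition attribute fully determines the decision; larger values indicate a rougher decision.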
The size of the information amount is defined as follows: the semantic tagger words of an image are taken as its condition semantic attributes, and the ratio of the condition semantic attribute to the decision semantic attribute is expressed as the conditional information amount of the adjacent-interval merge operation. For a given conditional decision table, the larger this ratio, the more important the condition semantic attribute is to the decision semantic attribute. The conditional information amount of each condition semantic attribute is traversed and calculated, and the results are sorted in descending order; if values are equal, they are ordered by the number of equivalence-class breakpoints in descending order, and the condition semantic attributes ranked last are eliminated. In this step, each traversal retains the condition semantic attribute with the largest value as the direction decision condition for interval merging.
The conditional decision table is divided into the minimum number of parts while the conditional decision entropies are sorted; by setting a decision threshold, the conditional decision entropies are divided into a coarse-grained group and a fine-grained group. Each time, the semantic attribute with the smallest difference between the two endpoints of the endpoint conditional decision entropy interval is selected, and the smaller of the adjacent condition semantic attribute values in the conditional decision table is replaced with the larger value. Adjacent condition semantic attribute values in the conditional decision table are swapped so that the left endpoint value of the field is always greater than the right endpoint value, to prevent overfitting. If a modification causes a value conflict, i.e., causes identical field intervals to appear in the conditional decision semantic attributes, the modification is deleted, the field intervals are swapped back, and the interval is restored to its original state. Finally, the divided discrete intervals are sorted by left endpoint in descending order and encoded successively with positive integers greater than 0.
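The final encoding step can be sketched as below. The sketch assumes each discrete interval is a (left, right) pair and that codes start at 1, the smallest positive integer; both are illustrative assumptions, as are the function name and sample intervals.

```python
def encode_intervals(intervals):
    """Map each interval to an integer code, ordered by left endpoint, largest first."""
    ordered = sorted(intervals, key=lambda interval: interval[0], reverse=True)
    return {interval: code for code, interval in enumerate(ordered, start=1)}


# Hypothetical discrete intervals after division.
codes = encode_intervals([(0.0, 0.2), (0.5, 0.9), (0.2, 0.5)])
```

The interval with the largest left endpoint receives code 1, the next one code 2, and so on.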
Referring to Fig. 4, further, in step 2, the set formed by the remaining mark words that are not cut into the semantic tagger word set A is defined as the semantic tagger subset B.
Feature discretization of the information table data means storing the difference semantic attribute items of the set A - B in a binary tree with bidirectional pointers, which reduces the data storage space, and dividing the continuous semantic attributes, by means of the conditional decision table of step 2, into discrete intervals sorted by weight in descending order.
The information table data are extracted and segmented with the manually entered first sampling step length and first sampling coefficient; the condition semantic attribute equivalence relation set of each sub-information table is constructed; from it the classification set cluster elements of the discernibility matrix are established and arranged into a multi-order square matrix, i.e., the discernibility matrix is constructed. The core of the discernibility matrix is obtained by solving the discernibility matrix. Searching for the difference attribute items of the condition semantic attributes of the discernibility matrix means finding all attribute items unrelated to the core condition semantic attribute equivalence classes; these attribute items are stored in a separate binary tree, and an attribute pruning association binary tree is established for each decision semantic attribute.
The difference attribute items of the condition semantic attributes of the discernibility matrix serve as the judgment basis for solving the minimal dimension reduct: the attribute pruning association binary tree is traversed to count the mark word nodes whose weight is less than 2, the node counts are summed, and at the same time the remaining objects not used in sampling are used for the classification inconsistency judgment.
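The construction of a discernibility matrix and its core can be sketched as follows. The sketch follows the common rough-set definition (an entry holds the condition attributes on which two objects with different decisions differ, and the core is the union of the single-attribute entries); this is an assumption and may not match the sampled, variable matrix of the present method. All names and data are hypothetical.

```python
def discernibility_matrix(rows, decisions):
    """Square matrix of attribute sets distinguishing objects i and j.

    rows is a list of dicts (attribute name -> value); decisions holds the
    decision value of each object. Pairs with equal decisions get an empty set.
    """
    n = len(rows)
    matrix = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if decisions[i] != decisions[j]:
                matrix[i][j] = frozenset(
                    a for a in rows[i] if rows[i][a] != rows[j][a]
                )
    return matrix


def core_attributes(matrix):
    """Attributes that appear alone in some entry; they cannot be reduced away."""
    return {next(iter(entry)) for row in matrix for entry in row if len(entry) == 1}


# Hypothetical discretized table: three objects, two condition attributes.
rows = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}]
decisions = [0, 1, 1]
m = discernibility_matrix(rows, decisions)
core = core_attributes(m)
```

In this toy table only attribute "b" separates the first two objects, so it belongs to the core.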
Referring to Fig. 5, further, in step 4, the division relation between images and mark words is binarized, i.e., the mark word semantic tree is constructed using a binary relation. The attribute feature set is determined according to the hierarchical structure of the semantic tagger words in the semantic tree, where each layer of semantic tagger word nodes in the hierarchy corresponds one-to-one to the condition semantic attributes of the binary tree. If, by traversal, the attribute feature set judged at the right child of the upper-layer node of a given node is empty, then a non-empty attribute pruning association binary tree must exist for this condition semantic attribute.
Then, the semantic distance measure is extended in three respects: node attribute information amount, node level, and node asymmetry, where:
The node attribute information amount refers to the number of direct child nodes contained in the extended mark word node c of the mark word semantic tree, denoted o(c). The node attribute information amount impact factor is:
In the formula, degree(anc12) denotes the number of child nodes of concept nodes 1 and 2, and degree(fc) denotes the maximum degree among the sibling nodes in the layer of the lattice structure where the node is located.
The node level is defined as follows: in the mark word semantic tree, if there exists a mark word hierarchical lattice structure expressed by a binary relation with a partially ordered set, the node level is the number of edges contained in the shortest path between the extended mark word node and the root node of the tree. Each lateral level of mark word nodes is a refined expression of the upper-layer nodes; the greater the level of a node, the more specific the content expressed by the mark word and the richer its inherent semantic attributes. If the semantic distances of arbitrary mark word nodes are the same, then the greater the sum of the node depths of the mark words, the smaller the semantic similarity distance between the images expressed by the mark words. The node level semantic distance impact factor is:
In the formula, Depth(C) denotes the summation function of mark word node depths.
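The node depth and the depth sum used by the node level factor can be sketched as below. The sketch assumes the semantic tree is stored as a child-to-parent map, which is a hypothetical representation, and it stops at the depth sum because the exponential formula of the impact factor is not reproduced in this text.

```python
def node_depth(parent_of, node):
    """Number of edges on the path from node up to the root."""
    depth = 0
    while parent_of.get(node) is not None:
        node = parent_of[node]
        depth += 1
    return depth


def depth_sum(parent_of, c1, c2):
    """Sum of the depths of two mark word nodes."""
    return node_depth(parent_of, c1) + node_depth(parent_of, c2)


# Hypothetical mark word semantic tree: child -> parent, root has no parent.
tree = {
    "image": None,
    "chest": "image",
    "abdomen": "image",
    "lung": "chest",
    "nodule": "lung",
}
total_depth = depth_sum(tree, "nodule", "abdomen")
```

A deeper pair yields a larger depth sum, which the text associates with a smaller semantic similarity distance.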
Node asymmetry: a node pair (A, B) that satisfies Sim(A, B) ≠ Sim(B, A) is called an asymmetric node pair. The asymmetric semantic distance impact factor is proposed as:
By introducing the lateral node transparency operator, the longitudinal node depth operator, and the asymmetry operator, the final semantic similarity measurement becomes more accurate. The lateral node transparency operator takes the mark-word-based lateral node transparency attribute as input, the longitudinal node depth operator takes the mark-word-based longitudinal node depth attribute as input, and the asymmetry operator takes the mark word node asymmetry attribute as input. The semantic distances between images are calculated and summed, the attribute values are output by linear weighting, and the multi-angle semantic distance similarity matrix is formed.

Claims (9)

1. An improved algorithm for a medical domain image semantic similarity matrix, characterized in that data in a medical domain image knowledge base are retrieved by a computer and processed as follows:
Step 1. Annotate the field images in the medical domain image knowledge base with semantic information using a Bayesian probability model, and assign weights to the mark words to obtain weighted mark words; the set of weighted mark words is denoted the "semantic vector space";
Step 2. Extract discretized features from the weighted mark words obtained in step 1 to obtain weighted mark words containing discretized features; the weighted mark words containing discretized features correspond one-to-one to the respective field images; the set of weighted mark words containing discretized features is called the "semantic space of image attributes";
Step 3. Perform reduction on the "semantic space of image attributes" obtained in step 2 to obtain feature attributes, and construct the minimal dimension reduct from the feature attributes; the reduction comprises four steps: constructing the discernibility matrix, solving the core of the discernibility matrix, deleting the difference attribute items of the discernibility matrix, and obtaining the minimal dimension reduct;
Step 4. Build the computation model of field image semantic similarity from the minimal dimension reduct obtained in step 3, and obtain the medical domain image semantic similarity matrix through this computation model;
Step 2 is specifically carried out as follows:
S21: Sort the inherent image attributes in the semantic vector space by frequency of occurrence from small to large, and place adjacent attribute values in one equivalence class; based on the condition semantic attributes, find by traversal the region critical points between the equivalence classes, use them as the starting interval of the initial semantic attributes, and calculate the conditional decision entropy;
S22: Compare the region critical points with the decision entropy: take the difference between the decision entropy values at the endpoints of adjacent intervals; if the decision entropy of the left endpoint is less than that of the right endpoint, swap the left and right semantic attribute values of the interval; traverse and calculate the conditional information amount of each condition semantic attribute and sort the results in descending order; in each traversal, retain the condition semantic attribute with the largest value as the direction decision condition for interval merging, and merge adjacent intervals into a single interval;
S23: Sort the conditional decision entropies: set the threshold with a threshold adjustment method based on dynamic feedback: if the decision threshold is greater than the decision entropy of the interval's left endpoint, add the semantic attribute corresponding to that decision entropy to the coarse division group; otherwise, place the semantic attribute corresponding to that decision entropy in the fine division group; by traversing the intervals with the smallest difference between adjacent decision entropy endpoints, dynamically adjust the decision threshold; when the decision threshold is no longer greater than the decision entropy of the interval's left endpoint, sort the discrete intervals containing the semantic attribute corresponding to the decision entropy and the semantic attributes in the fine division group;
S24: If the modification of the decision threshold in step S23 causes identical field intervals to appear in the conditional decision semantic attributes, cancel the condition value swap of step S23 and restore the interval to its original state; sort the divided discrete intervals by left endpoint in descending order and encode them successively with positive integers greater than 0 to obtain the coded interval set {A1, A2, ...};
If the modification of the decision threshold in step S23 does not cause identical field intervals to appear in the conditional decision semantic attributes, retain the threshold of step S23; sort the divided discrete intervals by left endpoint in descending order and encode them successively with positive integers greater than 0 to obtain the coded interval set {A1, A2, ...}.
2. The improved algorithm for a medical domain image semantic similarity matrix according to claim 1, characterized in that:
In step 1, the field images are segmented into regions to form the image set {P1, P2, ...};
Semantic annotation of the field images is performed by human-computer interaction to form the mark word set {C1, C2, ...}; the posterior probability of each mark word in the mark word set {C1, C2, ...} is calculated to obtain a semantic vector space with weight information; the collection of semantic vector spaces with weight information is the semantic vector space set; a field image refers to a picture in an image report, and the medical domain image knowledge base is the set formed by the field images;
In step 2, the weighted semantic vector space set obtained in step 1 is taken as input and the conditional decision table is constructed; the decision threshold parameter j of the image attributes is calculated; the interval division endpoints of the continuous image attributes are traversed by dynamic threshold iteration to obtain discrete image attributes; the discrete intervals are arranged in descending order to obtain the semantic space of image attributes, denoted the discrete code set {A1, A2, ...};
In step 3, the difference attributes of the discrete code set {A1, A2, ...} obtained in step 2 are stored in a binary tree with bidirectional pointers; by adjusting the first sampling coefficient p and the sampling threshold of the extraction function f(θ), the variable discernibility matrix is constructed, the multi-order square matrix of image attribute set cluster elements is obtained, and the reduct red() of the image attribute dimensions is solved;
In step 4, from the reduct red() of the image attribute dimensions, the mark word node attribute similarity, the mark word node depth attribute, and the mark word node asymmetry attribute are calculated; the lateral semantic distance between nodes, the longitudinal semantic distance between nodes, and the asymmetric semantic distance between nodes are measured; and the semantic similarity between images is solved with the mark word linear weighted model.
3. The improved algorithm for a medical domain image semantic similarity matrix according to claim 1 or 2, characterized in that step 1 is specifically carried out as follows:
S11: Initialize the semantics: by human-computer interaction, extract the doctor's annotation information for the images in the image report; clear the initial weights of the extracted annotation information and initialize the keyword marks, i.e., assign the annotation information the value 0; by default, all semantic weights have the same influence on the image; form the semantic tagger word set;
S12: The user inputs the semantics to be compared into the computer; the computer searches the classification information concepts in the medical domain image knowledge base according to the semantics to be compared and obtains the retrieved images;
The user enters primary keywords and non-keywords into the computer; the primary keywords are the result of intersecting the semantic tagger word set with the doctors' image classification description information for different departments, and serve as the basis for similarity retrieval; the non-keywords are mutually exclusive with the primary keywords; the computer marks the retrieved images according to the mutual exclusion of primary keywords and non-keywords:
Images related to the primary keywords are marked as positively correlated; images marked as positively correlated are recorded in the positive correlation array, and the count coefficient variable of each image mark in the group is incremented by one;
Images related to the non-keywords, i.e., images unrelated to the primary keywords, are marked as retrieval-negatively correlated; images marked as negatively correlated are recorded in the negative correlation array, and the count coefficient variable of each image mark in the group is decremented by one;
S13: Add the positively correlated images and the negatively correlated images to the positive correlation array and the negative correlation array respectively, and record the semantics associated with each array to obtain the positively correlated semantic phrases and the negatively correlated semantic phrases;
Using the watershed algorithm, perform region segmentation for the positively correlated semantic phrases and the negatively correlated semantic phrases respectively; take the conditional density function of the segmented region images as the primary attribute of the inherent image attributes and from it obtain posterior probability values; screen out by comparison the regions with the largest probability values and sort them in descending order of probability value; only the first 20 weights are assigned in this step, i.e., the first 20 probability values in descending order are chosen as the weights of the semantic tags describing the corresponding regions; perform semantic annotation and assign the corresponding weights; obtain the weighted positive correlation array;
S14: Compute the sum of the weights of the arrays containing the positive-feedback semantic tagger words of the segmented regions in the image as the array weight of this group; at the same time, apply naive Bayes classification to the semantic tagger word with the largest weight in the array and calculate its posterior probability; the final weight of an image segmentation region is then the sum of the array weight of its corresponding semantic tagger word and the posterior probability;
S15: Connect the inherent image attribute set obtained in the preceding steps with the corresponding weighted semantic tagger words by a mapping based on probability space to form the semantic vector space.
4. The improved algorithm for a medical domain image semantic similarity matrix according to claim 3, characterized in that step 3 is specifically carried out as follows:
S31: Initialize the minimal reduct of the attribute dimensions of the discrete code set {A1, A2, ...}: set red() = ∅ and core() = ∅; with the first sampling step length β, the extraction function f(θ) segments the data in the discrete code set {A1, A2, ...} to obtain the sub-code set {U1, U2, U3, ...}, such that |ind(A)| = 0 for each section of the sub-code set {U1, U2, U3, ...}; ind(A) denotes the decision entropy conditioned on the set A;
S32: Construct the condition attribute equivalence relation set of the sub-code set {U1, U2, U3, ...}, and from it calculate the classification set cluster elements m_ij, 1 ≤ i, j ≤ θ; θ is the two-dimensional coordinate endpoint of the set cluster elements;
S33: Construct the discernibility matrix: build the discernibility matrix M_d(i, j) = {m_ij} from the classification set cluster elements m_ij; the discernibility matrix M_d(i, j) is a square matrix of order θ with the following structure:
Solve the minimal irreducible core core() of the discernibility matrix M_d(i, j);
S34: Judge whether the order of the discernibility matrix M_d(i, j) is empty:
If it is empty, delete the difference attribute items of the discernibility matrix M_d(i, j), then solve the minimal dimension reduct to obtain the reduced discernible attribute set;
If it is not empty, solve the minimal dimension reduct directly to obtain the reduced discernible attribute set;
S35: Perform the inconsistency judgment on the difference attributes of the reduced discernible attribute set:
Use the remaining objects not used in sampling for the classification inconsistency judgment, i.e., use the elements of the complement of the discernible attribute set in the discernibility matrix of step S34 for the classification inconsistency judgment;
If the number of inconsistently classified objects is not less than the threshold parameter j based on dynamic feedback, return to step S33 and dynamically adjust the threshold size;
If the number of inconsistently classified objects is less than the threshold parameter j based on dynamic feedback, output the minimal dimension reduct as the result.
5. The improved algorithm for a medical domain image semantic similarity matrix according to claim 4, characterized in that step 4 is specifically carried out as follows:
The mark word lateral node attribute and the mark word longitudinal node depth attribute are calculated separately from the result of step 3:
S41: Process the minimal dimension reduct into a semantic tagger word set; the semantic tagger word set is a standardized description of the image features;
Based on the lateral node attributes of the mark words, calculate the semantic distances between images and sum them;
By calculating the shared attribute amount and the difference attribute amount contained in a pair of mark word nodes, the semantic distance between mark words is measured; the semantic distance is linearly positively correlated with the shared attributes and linearly negatively correlated with the difference attributes;
In this step, only the similarity between adjacent nodes (a, b) is considered; the semantic distance equals the union of the attribute set of the upper-layer parent node of node a and the feature set of node a; the more direct child nodes a mark word node has, the more specific its refined mark word semantic description is, i.e., the greater the semantic similarity among the child nodes it contains; the node attribute information amount impact factor of the mark word is:
In the formula, o(c1, c2) denotes the union of the shared attribute sets of the mark word nodes (c1, c2); the two remaining terms of the formula denote the difference attribute amounts of the mark word nodes (c1, c2); λ and α are adjustment parameters of the formula that keep the fraction from becoming undefined;
S42: Based on the longitudinal node depth attribute of the mark words, calculate the semantic distances between images and sum them;
The greater the sum of the node depths of any group of mark words in the semantic tree, the more specific the image attributes expressed by those mark words, the smaller the semantic similarity distance, and the higher the similarity; the present invention describes the semantic similarity of node levels with an exponential function; the node level impact factor of the mark word is:
S43: Calculate the asymmetry factor from the node level impact factor: based on the asymmetry attribute of mark word nodes, in the mark word semantic tree the semantic similarity between mark word nodes is asymmetric to some extent, i.e., semantic similarity distance matching is directional; the similarity of a mark word node to its ancestor node is greater than the similarity of the ancestor to the child node: if concept A is an ancestor of concept B, then sim(A, B) is less than sim(B, A). The asymmetry attribute impact factor of the semantic distance for the node pair (c1, c2) is:
S44: Integrate the node attribute information amount impact factor and the asymmetry attribute impact factor by linear weighting to form the multi-angle semantic distance similarity matrix:
By the mark word linear weighted model
Sim(c1, c2) = ε·SimC(c1, c2) + (1 - ε)·SimDγ(c1, c2), the node attribute information amount impact factor and the asymmetry attribute impact factor are integrated to form the multi-angle semantic distance similarity matrix, which realizes the following basic properties of a similarity distance measure between semantic tagger words:
In this formula, ε is the weight factor, which adjusts the influence of the shared attribute amount and the depth of the mark word nodes on the semantic distance measure;
For any pair (c1, c2), the semantic similarity distance measure given by this formula lies in the closed interval from 0 to 1; the larger the value, the closer the semantic distance;
When c1 and c2 are the same node, the semantic similarity given by this formula is 1;
The larger the shared part of (c1, c2) in the mark word semantic tree, i.e., the more concentrated the shared attributes, the closer the semantic distance;
The deeper the position of (c1, c2) in the mark word semantic tree, the more concentrated the attributes and the closer the semantic distance.
6. the innovatory algorithm of medical domain image, semantic similarity matrix according to claim 3, it is characterised in that:
In step 1, by way of human-computer interaction, doctor in image report retouches respective image in medical image library It states as semantic content, according to medicine classification knowledge, extracts mark set of words, define this collection and be combined into description image, semantic information Feature vector;It regard " image+semantic tagger word " in the image of field as one group of element;
In step 1 beginning keyword mark, be by semantic tagger with can digitized weighted value indicate, initial assignment 0, structure At the object listing comprising semantic tagger word;
In step 1, the semantics to be compared are used to search the classification information concepts in the image library, and the retrieved images are marked:
"Classification information" is the physicians' classification description of images from different departments; a similarity retrieval is performed against it with the semantic annotation words to be compared, and each time an image appears in the retrieval results, the semantic annotation words of that image are given a positive mark and the positive-mark count variable is incremented by one;
Images unrelated to the subject image are marked as negatively correlated with the retrieval, and the negative-mark count variable is incremented by one; related images are marked as positively correlated, and the positive-mark count variable is incremented by one. This yields the horizontal and vertical dimension information of the domain object, where the abscissa corresponds to the change in the positive-mark count variable and the ordinate corresponds to the change in the negative-mark count variable;
In this step, increasing the weight of the array containing the positive-feedback semantics means performing static weight optimization with a genetic algorithm on the array containing the positive-feedback semantic annotation words, with the initial dynamic adjustment coefficient of this group of weight values set to a constant greater than 1;
In this step, reducing the weight of the negative-feedback array means adjusting the initial dynamic adjustment coefficient of the negative-feedback weights in the opposite direction, with its value set to a constant less than 1;
In this step, after judging whether the attribute subset of the image attributes is empty, the processing is as follows: if it is non-empty, the newly added semantic annotation attributes are added to the object list; otherwise, addition stops;
The number of semantic annotation words of an image is used as the basis for measuring the semantic scope of that image: if the semantic annotation word set of image a is a proper subset of the semantic annotation word set of another image b, image a is defined as a subset of image b;
In this step, the target image region is segmented using the watershed algorithm; the conditional density function of each segmented region image is computed to obtain posterior probability values; the regions with the largest probability values are selected by comparison, and weight values are assigned to the corresponding regions in descending order of these values.
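The feedback marking and the image subset relation described in this claim can be sketched roughly as below. This is one possible reading, offered as an assumption: relevant retrieval results increment a positive counter per annotation word, unrelated results increment a negative counter, and the pair of counters gives the (abscissa, ordinate) position. The function names and the sample annotation words are hypothetical, and the genetic-algorithm weight optimization and the watershed segmentation are not modelled.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def tally_feedback(results: Iterable[Tuple[Set[str], bool]]) -> Tuple[Counter, Counter]:
    """Count positive and negative marks per annotation word.

    Each result is (annotation words of a retrieved image, is_relevant).
    """
    positive: Counter = Counter()
    negative: Counter = Counter()
    for words, is_relevant in results:
        target = positive if is_relevant else negative
        for word in words:
            target[word] += 1  # the count variable is incremented once per appearance
    return positive, negative


def coordinates(word: str, positive: Counter, negative: Counter) -> Tuple[int, int]:
    """Abscissa = positive-mark count, ordinate = negative-mark count."""
    return positive[word], negative[word]


def is_sub_image(words_a: Set[str], words_b: Set[str]) -> bool:
    """Image a is a subset of image b when its word set is a proper subset of b's."""
    return words_a < words_b


# Made-up sample data, for illustration only.
sample = [({"w1", "w2"}, True), ({"w1"}, True), ({"w3"}, False)]
pos, neg = tally_feedback(sample)
```

The proper-subset operator `<` on Python sets excludes equal sets, so two images with identical annotation words are not treated as subsets of each other.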
7. The innovatory algorithm for the medical domain image semantic similarity matrix according to claim 1, characterized in that:
The conditional decision table described in step 2 is a binary relation composed of an object set and the conditional decision entropy, in which the initial conditional semantic attribute X and the conditional decision Y form an object-based equivalence relation, and the conditional decision entropy H(Y | X) indicates the roughness of the conditional decision Y given the conditional semantic attribute X; the conditional decision table is computed as follows:
In this step, the semantic attributes are sorted by frequency of occurrence in ascending order, and adjacent objects are grouped into one equivalence class; based on the conditional semantic attributes, the critical points of the regions between these equivalence classes are found by traversal and used as the interval endpoints of the initial semantic attributes; for each discretized image semantic attribute, its conditional decision entropy is computed, and the conditional information content of adjacent intervals is compared;
The size of the information content is defined as follows: the semantic annotation words of an image are taken as its conditional semantic attributes, and the ratio of the conditional semantic attribute to the decision semantic attribute is taken as the conditional information content of the merge operation on adjacent intervals; for a given conditional decision table, the larger this ratio, the more important the conditional semantic attribute is to the decision semantic attribute; the conditional information content of each conditional semantic attribute is computed by traversal and sorted in descending order; where values are equal, the attributes are arranged in descending order of the number of breakpoints of their equivalence classes, and the conditional semantic attributes ranked last are eliminated; in this step, each traversal retains the conditional semantic attribute with the largest value, which serves as the direction decision condition for interval merging;
The minimum number of intervals is divided according to the conditional decision table and the conditional decision entropies are sorted; by setting a decision threshold, the conditional decision entropies are divided into a coarse-grained group and a fine-grained group; each time, the semantic attribute whose endpoint conditional decision entropy interval has the smallest difference between its two endpoints is selected, and the smaller of the adjacent conditional semantic attribute values in the conditional decision table is replaced with the larger value; adjacent conditional semantic attribute values in the conditional decision table are exchanged so that the left endpoint value of the field is always greater than the right endpoint value, to prevent overfitting; if a modification causes a numerical conflict, i.e. it makes the conditional decision semantic attributes contain identical field intervals, the modification is deleted, the field intervals are exchanged back, and the interval is restored to its original state; finally, the discrete intervals after division are sorted in descending order of their left endpoints and coded consecutively with positive integers greater than 0.
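As a rough illustration of the conditional decision entropy H(Y | X) used in this claim, the sketch below computes the standard Shannon conditional entropy of a decision column given a condition column. The claim does not state the exact formula or logarithm base, so the use of base-2 logarithms and the function name `conditional_entropy` are assumptions; the interval merging and coding steps are not modelled.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def conditional_entropy(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Shannon conditional entropy H(Y | X) in bits.

    xs holds the value of the condition attribute X for each object,
    ys holds the corresponding decision value Y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Group the decision values by equivalence class of X.
    groups = defaultdict(Counter)
    for x, y in zip(xs, ys):
        groups[x][y] += 1
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * math.log2(p)
    return h


# Toy columns: class "a" is split evenly between two decisions, class "b" is pure.
h_example = conditional_entropy(["a", "a", "b", "b"], [0, 1, 0, 0])
```

A value of 0 means the condition attribute determines the decision completely; larger values indicate a rougher decision, which matches the role that H(Y | X) plays in the text.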
8. The innovatory algorithm for the medical domain image semantic similarity matrix according to claim 4, characterized in that:
In step 2, the set formed by the remaining annotation words that were not cut into the semantic annotation word set A is defined as the semantic annotation subset B;
Feature discretization of the information table data means storing the differing semantic attribute items of the set A - B in a binary tree with bidirectional pointers, which reduces the storage space of the data, and dividing the continuous semantic attributes, by means of the conditional decision table of step 2, into discrete intervals sorted in descending order of weight;
A manually entered first sampling step and first sampling coefficient are used to sample and partition the information table data; the conditional semantic attribute equivalence relation set of each sub-information table is constructed, and the elements of the classification set family of the discernibility matrix are established to form a multi-order square matrix, i.e. the discernibility matrix is constructed; the core is obtained by solving the discernibility matrix; searching for the difference attribute items of the conditional semantic attributes of the discernibility matrix means finding all attribute items unrelated to the core of the conditional semantic attribute equivalence classes, storing these attribute items in a separate binary tree, and establishing an attribute-pruning association binary tree for each decision semantic attribute;
The criterion for judging that the difference attribute items of the conditional semantic attributes of the discernibility matrix yield the simplest reduction set of the dimensions is: traverse the annotation word nodes of the attribute-pruning association binary tree, count the nodes whose weight is less than 2 and sum the node count, and at the same time use the remaining objects not used in sampling to check for classification inconsistency.
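The discernibility matrix and its core can be illustrated with the textbook rough-set construction below. This is a sketch under assumptions rather than the claimed procedure: sampling, the bidirectional-pointer binary tree, and the attribute-pruning association binary tree are not modelled, and the attribute names and table values are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def discernibility_matrix(rows: Sequence[Row], decisions: Sequence[Hashable]) -> Dict[Tuple[int, int], FrozenSet[str]]:
    """For each pair of objects with different decisions, record the
    condition attributes on which the two objects differ."""
    entries: Dict[Tuple[int, int], FrozenSet[str]] = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            entries[(i, j)] = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
    return entries


def core(entries: Dict[Tuple[int, int], FrozenSet[str]]) -> Set[str]:
    """The core is the set of attributes that appear as singleton entries."""
    return {next(iter(e)) for e in entries.values() if len(e) == 1}


# Made-up decision table: two condition attributes, one decision column.
table: List[Row] = [
    {"attr1": "x", "attr2": "high"},
    {"attr1": "x", "attr2": "low"},
    {"attr1": "y", "attr2": "low"},
]
labels = ["d0", "d1", "d1"]
dm = discernibility_matrix(table, labels)
core_attrs = core(dm)
```

In this toy table only `attr2` separates objects 0 and 1, so it is indispensable and forms the core, while `attr1` is a candidate for removal in a reduction.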
9. The innovatory algorithm for the medical domain image semantic similarity matrix according to claim 5, characterized in that:
In step 4, the partition relation between images and annotation words is binarized, i.e. the annotation word semantic tree is constructed using the binary relation; the attribute feature set is determined from the hierarchical structure of the semantic annotation words in the semantic tree; here, the hierarchical structure means that the semantic annotation word nodes in each layer of the binary tree correspond one-to-one to the conditional semantic attributes; the attribute feature set is determined by traversal: if the right child of the parent node of a node is empty, then this conditional semantic attribute must have a non-empty attribute-pruning association binary tree;
Then, the semantic distance measure is extended in three respects: node attribute information content, node level, and node asymmetry, wherein
Node attribute information content refers to the number of direct child nodes contained in the extended annotation word node c of the annotation word semantic tree, denoted o(c); the node attribute information content impact factor is:
In the formula, degree(anc12) denotes the number of child nodes of concept nodes 1 and 2; degree(fc) denotes the maximum degree among the sibling nodes in the lattice structure of the layer where the node is located;
Node level refers to the following: in the annotation word semantic tree, if there exists an annotation word hierarchical lattice structure whose binary relation is a partially ordered set, the node level is the sum of the numbers of edges on the shortest paths between the extended annotation word nodes and the root node of the tree; the annotation word nodes of each horizontal layer are refined expressions of the nodes in the layer above; the deeper the level of a node, the more specific the content expressed by the annotation word and the richer its inherent semantic attributes. If the semantic distances between annotation word nodes are equal, then the larger the sum of the node depths of the annotation words, the smaller the semantic similarity distance between the images expressed by the annotation words; the node level semantic distance impact factor is:
In the formula, Depth(C) denotes the summation function of the annotation word node depths;
Node asymmetry means that a node pair (A, B) satisfying Sim(A, B) ≠ Sim(B, A) is called an asymmetric node pair. The asymmetric semantic distance impact factor is proposed:
By introducing a lateral node transparency operator, a longitudinal node depth operator, and an asymmetry operator, the final semantic similarity measurement becomes more accurate; the lateral node transparency operator takes the annotation-word-based lateral node transparency attribute as input, the longitudinal node depth operator takes the annotation-word-based longitudinal node depth attribute as input, and the asymmetry operator takes the annotation-word-based node asymmetry attribute as input; the semantic distances between images are computed and summed, the attribute values are output by linear weighting, and the multi-angle semantic distance similarity matrix is formed.
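Two of the notions in this claim, node depth as the number of edges to the root and the asymmetry test Sim(A, B) ≠ Sim(B, A), can be sketched as follows. The impact factor formulas themselves are not reproduced in this passage, so none is implemented here; the child-to-parent dictionary `PARENT`, the node labels, and the measure `toy_sim` are hypothetical and serve only to exercise the code.

```python
from typing import Callable, Dict

# Hypothetical semantic tree stored as child -> parent; "a" is the root.
PARENT: Dict[str, str] = {"b": "a", "c": "a", "d": "b"}


def node_depth(node: str, parent: Dict[str, str]) -> int:
    """Number of edges on the path from the node up to the root."""
    depth = 0
    seen = {node}
    while node in parent:  # the root has no parent entry
        node = parent[node]
        if node in seen:
            raise ValueError("parent mapping contains a cycle")
        seen.add(node)
        depth += 1
    return depth


def depth_sum(c1: str, c2: str, parent: Dict[str, str]) -> int:
    """Sum of the depths of two annotation word nodes."""
    return node_depth(c1, parent) + node_depth(c2, parent)


def is_asymmetric_pair(a: str, b: str, sim: Callable[[str, str], float], tol: float = 1e-12) -> bool:
    """A pair is asymmetric when sim(a, b) differs from sim(b, a)."""
    return abs(sim(a, b) - sim(b, a)) > tol


def toy_sim(a: str, b: str) -> float:
    """Deliberately order-dependent toy measure (an assumption, not the patent's)."""
    da, db = node_depth(a, PARENT), node_depth(b, PARENT)
    return da / (da + db + 1)
```

With this toy measure, a pair of nodes at different depths such as ("d", "c") is asymmetric, while siblings at the same depth such as ("b", "c") are not.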
CN201811060272.5A 2015-07-27 2015-07-27 The innovatory algorithm of medical domain image, semantic similarity matrix Withdrawn CN109766904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811060272.5A CN109766904A (en) 2015-07-27 2015-07-27 The innovatory algorithm of medical domain image, semantic similarity matrix

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811060272.5A CN109766904A (en) 2015-07-27 2015-07-27 The innovatory algorithm of medical domain image, semantic similarity matrix
CN201510455087.6A CN105184307B (en) 2015-07-27 2015-07-27 A kind of generation method of medical domain image, semantic similarity matrix

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201510455087.6A Division CN105184307B (en) 2015-07-27 2015-07-27 A kind of generation method of medical domain image, semantic similarity matrix

Publications (1)

Publication Number Publication Date
CN109766904A true CN109766904A (en) 2019-05-17

Family

ID=54906371

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811060272.5A Withdrawn CN109766904A (en) 2015-07-27 2015-07-27 The innovatory algorithm of medical domain image, semantic similarity matrix
CN201510455087.6A Active CN105184307B (en) 2015-07-27 2015-07-27 A kind of generation method of medical domain image, semantic similarity matrix

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201510455087.6A Active CN105184307B (en) 2015-07-27 2015-07-27 A kind of generation method of medical domain image, semantic similarity matrix

Country Status (1)

Country Link
CN (2) CN109766904A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910991A (en) * 2019-11-21 2020-03-24 张军 Medical automatic image processing system
CN112580509A (en) * 2020-12-18 2021-03-30 中国民用航空总局第二研究所 Logical reasoning type road surface detection method and system
CN113094445A (en) * 2021-03-15 2021-07-09 北京工业大学 Task state brain image resource multidimensional labeling and organizing method based on semantic vector

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229640A (en) * 2016-03-24 2017-10-03 阿里巴巴集团控股有限公司 Similarity processing method, object screening technique and device
CN106354715B (en) * 2016-09-28 2019-04-16 医渡云(北京)技术有限公司 Medical vocabulary processing method and processing device
CN107145910A (en) * 2017-05-08 2017-09-08 京东方科技集团股份有限公司 Performance generation system, its training method and the performance generation method of medical image
CN107315772B (en) * 2017-05-24 2019-08-16 北京邮电大学 The problem of based on deep learning matching process and device
CN108171568B (en) * 2017-12-11 2021-12-17 武汉纺织大学 Garment recommendation method and system based on knowledge base
CN107967472A (en) * 2017-12-11 2018-04-27 深圳市唯特视科技有限公司 A kind of search terms method encoded using dynamic shape
CN108710649A (en) * 2018-04-29 2018-10-26 蚌埠医学院 A kind of medicine AFR control makes up method
CN110827989B (en) * 2018-08-14 2022-07-12 上海明品医学数据科技有限公司 Control method for processing medical data based on key factors
CN110827945B (en) * 2018-08-14 2022-05-27 上海明品医学数据科技有限公司 Control method for generating key factors based on medical data
CN109145906B (en) * 2018-08-31 2020-04-24 北京字节跳动网络技术有限公司 Target object image determination method, device, equipment and storage medium
CN109830285B (en) * 2019-01-07 2022-12-27 东软医疗系统股份有限公司 Medical image file processing method and device
CN109977923B (en) * 2019-04-12 2020-12-29 江西科技学院 Driver gender detection method and system based on electroencephalogram signals
CN111984765B (en) * 2019-05-21 2023-10-24 南京大学 Knowledge base question-answering process relation detection method and device
CN110491519B (en) 2019-07-17 2024-01-02 上海明品医学数据科技有限公司 Medical data checking method
CN110649597A (en) * 2019-09-06 2020-01-03 国网山东省电力公司寿光市供电公司 RBF neural network-based power distribution network feeder automation control method
CN112508966B (en) * 2020-10-27 2021-08-24 北京科技大学 Interactive image segmentation method and system
CN113096796B (en) * 2021-04-01 2022-09-02 四川大学华西医院 Intelligent prediction system and method for cerebral hemorrhage hematoma expansion risk

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7644052B1 (en) * 2006-03-03 2010-01-05 Adobe Systems Incorporated System and method of building and using hierarchical knowledge structures
CN101963995B (en) * 2010-10-25 2012-02-01 哈尔滨工程大学 Image marking method based on characteristic scene
US8903198B2 (en) * 2011-06-03 2014-12-02 International Business Machines Corporation Image ranking based on attribute correlation
CN102663010A (en) * 2012-03-20 2012-09-12 复旦大学 Personalized image browsing and recommending method based on labelling semantics and system thereof

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910991A (en) * 2019-11-21 2020-03-24 张军 Medical automatic image processing system
CN112580509A (en) * 2020-12-18 2021-03-30 中国民用航空总局第二研究所 Logical reasoning type road surface detection method and system
CN112580509B (en) * 2020-12-18 2022-04-15 中国民用航空总局第二研究所 Logical reasoning type road surface detection method and system
CN113094445A (en) * 2021-03-15 2021-07-09 北京工业大学 Task state brain image resource multidimensional labeling and organizing method based on semantic vector

Also Published As

Publication number Publication date
CN105184307B (en) 2018-10-30
CN105184307A (en) 2015-12-23

Similar Documents

Publication Publication Date Title
CN105184307B (en) A kind of generation method of medical domain image, semantic similarity matrix
Monath et al. Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space
Peters et al. Soft clustering–fuzzy and rough approaches and their extensions and derivatives
CN102364498B (en) Multi-label-based image recognition method
Lerman Foundations and methods in combinatorial and statistical data analysis and clustering
TW201426578A (en) Generation method and device and risk assessment method and device for anonymous dataset
CN109409128A (en) A kind of Mining Frequent Itemsets towards difference secret protection
CN106991446A (en) A kind of embedded dynamic feature selection method of the group policy of mutual information
Long et al. Hierarchical community structure preserving network embedding: A subspace approach
CN106709037A (en) Movie recommendation method based on heterogeneous information network
CN103778206A (en) Method for providing network service resources
CN106326923A (en) Sign-in position data clustering method in consideration of position repetition and density peak point
CN108960335A (en) One kind carrying out efficient clustering method based on large scale network
Zhang et al. Three-way clustering method for incomplete information system based on set-pair analysis
Wang et al. An efficient algorithm for distributed outlier detection in large multi-dimensional datasets
Leung Big data mining applications and services
CN109101567A (en) A kind of distributed text approximate KNN semantic search calculation method
Xu et al. A novel algorithm for associative classification of image blocks
Rodriguez et al. New approach to identify analogue reservoirs
Ding et al. Improved density peaks clustering based on natural neighbor expanded group
CN107577681B (en) A kind of terrain analysis based on social media picture, recommended method and system
Ihler et al. Using sample-based representations under communications constraints
Yang et al. Semantic categorization of digital home photo using photographic region templates
Xu Deep mining method for high-dimensional big data based on association rule
Li NNGDPC: a kNNG-based density peaks clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190517
