CN103942274B - LDA-based annotation system and method for biomedical images

Info

Publication number: CN103942274B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201410120529.7A
Other languages: Chinese (zh)
Other versions: CN103942274A (en)
Inventors: 徐颂华, 林谋广, 姜涛, 薛凯军, 肖剑
Current assignee: Sun Yat Sen University; Institute of Dongguan of Sun Yat Sen University (as listed by Google Patents; may be inaccurate)
Original assignee: Sun Yat Sen University; Institute of Dongguan of Sun Yat Sen University
Priority date: 2014-03-27
Filing date: 2014-03-27
Publication date: 2014-07-23 (CN103942274A); 2017-11-14 (CN103942274B)
Prior art keywords: word, LDA, theme, descriptor, medical image

Classifications

    • G06F19/321
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Abstract

The invention discloses an LDA-based annotation system for biomedical images, comprising an LDA training module, a topic word extraction module, a topic word refinement module, a context sentence indexing module, a context generation module, and an annotation generation module. The LDA training module trains the LDA model; the topic word extraction module applies LDA modeling to the caption of an image and extracts topic words; the topic word refinement module optimizes the topic word set; the context sentence indexing module retrieves the sentence sets associated with the topic words; the context generation module selects the most closely related sentences to form the context of the image; and the annotation generation module models the context of the image, computes a weight for each word, and selects the top-ranked words as the annotation words of the biomedical image. The invention also discloses an LDA-based annotation method for biomedical images. The invention can generate multiple annotation words at once with high accuracy, and once images are annotated, related images can be found by keyword search, which is convenient, fast, and better matches how people are used to retrieving text.

Description

LDA-based annotation system and method for biomedical images
Technical field
The present invention relates to the technical field of image processing, and in particular to an LDA-based annotation system and method for biomedical images.
Background
With the development of digital imaging technology and the growing popularity of devices that can take pictures, such as digital cameras, the number of images of all kinds is growing exponentially. At the same time, the rapid development of the Internet has made distributing and sharing images faster and more convenient. To organize, query, and browse image resources on this scale effectively, image retrieval technology emerged and has become a research focus in the field of computer vision.
Existing image retrieval methods fall into two main categories: content-based image retrieval (CBIR) and text-based image retrieval (TBIR). Content-based image retrieval requires the user to supply an image as the query. The system extracts low-level visual features of the image, such as color, texture, and shape, builds a visual index for the images, and then finds matches according to the visual similarity between the images in the database and the query. Because low-level visual features and high-level semantic concepts are inconsistent, the so-called "semantic gap", the performance of content-based image retrieval is unsatisfactory. Text-based image retrieval requires a text index to be built for the images in advance. The user only needs to submit text as the query, and the system finds and returns similar images according to textual relevance, so retrieving images becomes retrieving text keywords.
Compared with content-based image retrieval, text-based image retrieval only requires the user to submit text keywords. It is convenient and fast, is preferred by users, and has therefore become the main approach used by mainstream commercial image search engines. However, this approach requires a text index for the images, that is, semantic annotation of the images, which is one of the most challenging and most important tasks in text-based image retrieval. The traditional approach is manual annotation, but it is time-consuming and laborious and is clearly inadequate for large-scale collections of web images. Eliminating manual intervention and achieving fast, effective automatic semantic annotation of images has therefore become very important.
To annotate images automatically, one existing approach in the prior art is to classify the images and use the classification results as annotations. Specifically, each semantic keyword is treated as a class label, a number of classifiers are trained on that basis, and these classifiers are then used to classify unannotated images; the classes assigned to an image are its annotations. Many mature classification algorithms exist, such as support vector machines and hidden Markov models.
However, annotation by classification depends on the accuracy of the classification algorithm, and although current algorithms are fairly accurate, they still make errors. In addition, most existing classifiers, such as support vector machines, are binary classifiers. For an image with multiple annotations, multiple classifiers must be designed and the image must be classified multiple times, which is inefficient.
Therefore, an LDA-based annotation system and method for biomedical images is needed to address these shortcomings.
Summary of the invention
An object of the present invention is to provide an accurate and convenient LDA-based annotation system and method for biomedical images.
To this end, the invention provides an LDA-based annotation system for biomedical images, comprising an LDA training module, a topic word extraction module, a topic word refinement module, a context sentence indexing module, a context generation module, and an annotation generation module. The LDA training module trains the LDA model. The topic word extraction module applies LDA modeling to the caption of each biomedical image and then extracts all topic words from the resulting model. The topic word refinement module optimizes the topic word set produced by the topic word extraction module. The context sentence indexing module retrieves, from the text file of the biomedical image, the set of sentences associated with each topic word. The context generation module selects the most closely related sentence from the sentence set of each topic word and then gathers all of these sentences to form the context of the biomedical image. The annotation generation module uses the LDA model obtained by the LDA training module to model the context of the biomedical image, obtaining its topic distribution and word distribution. It then multiplies the probability of each word in the topic-word distribution by the probability of the corresponding topic, takes the result as the weight of that word, sorts all words by weight in descending order, and selects the top several words as the annotation words of the biomedical image.
Preferably, the data set of the LDA model consists of the captions of all biomedical images: the caption in the caption node is extracted from the text file corresponding to each biomedical image, and the captions of all images together constitute the training data set of the LDA model.
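A minimal sketch of the caption extraction step follows. The patent states only that each image has an XML text file with a caption node; the element names, the nesting, and the sample record here are assumptions made for illustration, and `extract_captions` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def extract_captions(xml_text: str, tag: str = "caption") -> list[str]:
    """Return the text of every caption element in one image's XML file."""
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter(tag):
        # itertext() also collects text nested inside child elements;
        # split/join collapses runs of whitespace.
        text = " ".join("".join(node.itertext()).split())
        if text:
            captions.append(text)
    return captions


# Hypothetical record; real files in the corpus may be structured differently.
sample = """<figure id="f1">
  <caption>Axial CT scan showing a <b>lesion</b> in the left lung.</caption>
  <text>The patient presented with a persistent cough.</text>
</figure>"""

training_captions = extract_captions(sample)
```

Collecting the result of `extract_captions` over every file in the training set would give the list of caption documents on which the LDA model is trained.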
Preferably, the training module trains the LDA model using Gibbs sampling: it first samples the topic assignment of each word and then derives the document-topic distribution and the topic-word distribution from these assignments.
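The patent does not give the sampling formulas. Assuming the standard collapsed Gibbs sampler with symmetric Dirichlet priors (an assumption, not a statement of the inventors' exact procedure), the update and the two derived distributions can be written as follows. Here $n_{d,k}$ is the number of words in document $d$ assigned to topic $k$, $n_{k,w}$ is the number of times word $w$ is assigned to topic $k$, $n_k$ and $n_d$ are the corresponding totals, $K$ is the number of topics, $V$ is the vocabulary size, and the superscript $-i$ means the current word is excluded from the counts.

```latex
% Sampling the topic of the i-th word (word w_i in document d)
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \bigl(n_{d,k}^{-i} + \alpha\bigr)\,
  \frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}

% Document-topic distribution derived from the final assignments
\theta_{d,k} = \frac{n_{d,k} + \alpha}{n_{d} + K\alpha}

% Topic-word distribution derived from the final assignments
\varphi_{k,w} = \frac{n_{k,w} + \beta}{n_{k} + V\beta}
```

Both $\theta_d$ and $\varphi_k$ sum to one by construction, and $\alpha$ and $\beta$ play the role of the prior parameters that the description sets to empirical values.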
Preferably, the topic word refinement module optimizes the topic word set as follows. In the result of LDA modeling of the caption of a biomedical image, if the probability of a topic word in the topic-word distribution is zero, the word is removed from the topic word set; if a topic word does not appear in the caption of the biomedical image, the word is removed from the topic word set; and if the topic word set contains repeated words, the repeats are removed and only one copy is kept.
Preferably, the context sentence indexing module uses the Lucene retrieval tool, taking each word in the topic word set as a query condition, to retrieve all sentences that contain that topic word.
Preferably, the most closely related sentence is selected as follows. Each sentence that contains a given topic word is traversed; every other topic word that the sentence contains contributes one vote to it; and the sentence with the most votes is chosen as the most closely related sentence for that topic word. The most closely related sentences of all topic words are gathered to form the context.
The invention also provides an LDA-based annotation method for biomedical images, comprising the following steps. Step 1: select a portion of the biomedical images as a training set, extract the caption in the caption node from the text file of each biomedical image, and form the training data set of the LDA model. Step 2: train the LDA model by first sampling the topic assignment of each word and then computing the document-topic distribution and the topic-word distribution. Step 3: for an unannotated image, model its caption with the trained LDA model and select all topic words to form a topic word set. Step 4: optimize the topic word set by removing repeated words, words whose probability is zero, and words that do not appear in the caption, yielding a refined topic word set. Step 5: for each topic word, retrieve from the text file of the image all sentences that contain the word, forming the sentence set of that topic word. Step 6: select the most closely related sentence from the sentence set of each topic word to form the context of the image. Step 7: model the context with the trained LDA model, multiply the probability of each word in the topic-word distribution by the probability of the corresponding topic, and take the result as the weight of the word; sort all words in descending order of weight and select the top several as the final annotations of the image.
Compared with the prior art, the invention makes full use of the captions and text files associated with the images in the data set to mine annotation words, achieving high accuracy and generating multiple annotation words at once. Once biomedical images have been accurately annotated, related images can be found by keyword search, which is convenient, fast, and better matches how people are used to retrieving text.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the LDA-based annotation system for biomedical images of the invention;
Fig. 2 is a flow chart of the LDA-based annotation method for biomedical images of the invention;
Fig. 3 is a flow chart of the LDA-based annotation method for biomedical images according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
As described above, the invention annotates biomedical images. In a biomedical image corpus, every image has a corresponding text file. Exploiting this property, the invention proposes a biomedical image annotation method based on LDA (Latent Dirichlet Allocation): topic words are extracted from the caption of an image using LDA, a context is then extracted from the text file of the image according to these topic words, and finally LDA is applied again to model the context; the resulting topic words serve as the final annotations of the biomedical image.
Specifically, referring to Fig. 1, the invention provides an LDA-based annotation system for biomedical images, comprising an LDA training module, a topic word extraction module, a topic word refinement module, a context sentence indexing module, a context generation module, and an annotation generation module.
The LDA training module trains the LDA model on a training data set to generate the document-topic distribution and the topic-word distribution. In the invention, the data set of the LDA model consists of the captions of all biomedical images. The content of the caption node, that is, the caption of the image, is extracted from the text file (in XML format) corresponding to each biomedical image, and the captions of all images together constitute the training data set of the LDA model. The number of topics and the Dirichlet prior parameters of the document-topic distribution and the topic-word distribution are set to empirical values. The LDA training module trains the LDA model using Gibbs sampling: it first samples the topic assignment of each word and then derives the document-topic distribution and the topic-word distribution from these assignments.
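The training procedure can be sketched as a small collapsed Gibbs sampler in plain Python. This is an illustrative sketch under assumptions: the patent names Gibbs sampling but gives no formulas, so the standard collapsed update with symmetric priors is used, and `train_lda`, the default `alpha`, `beta`, and `iters` values, and the toy documents are all hypothetical.

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized captions.

    Returns (theta, phi, vocab): the document-topic distribution, the
    topic-word distribution, and the vocabulary that indexes phi's columns.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total words per topic
    z = []                              # topic assignment of every token

    # Random initial assignment of a topic to each token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Derive both distributions from the final topic assignments.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


docs = [["lung", "lesion", "lung"], ["brain", "tumor", "brain"], ["lung", "tumor"]]
theta, phi, vocab = train_lda(docs, num_topics=2, iters=20)
```

Note that with a positive `beta` the smoothed topic-word probabilities never reach exactly zero, so an implementation that relies on a "probability is zero" test would need unsmoothed counts or a small threshold; which of these the inventors intended is not stated.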
The topic word extraction module applies LDA modeling to the caption of each biomedical image and then extracts all topic words from the resulting model (the topic distribution and the word distribution). For an unannotated image, the caption of the image is modeled with the LDA model produced by the training module, and all words in the modeling result (the topic distribution and the word distribution) are extracted as topic words of the image and added to the topic word set.
The topic word refinement module optimizes the topic word set produced by the topic word extraction module to obtain the most concise and most effective topic word set. In the result of LDA modeling of the image caption, if the probability of a topic word in the topic-word distribution is zero, the word is removed from the topic word set; if a topic word does not appear in the caption of the image, the word is removed from the topic word set; and if the topic word set contains repeated words, the repeats are removed and only one copy is kept. In short, refinement removes repeated topic words, topic words whose probability is zero in the LDA modeling result, and topic words that do not appear in the image caption, yielding a more refined topic word set.
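The three refinement rules translate directly into a filter. The sketch below assumes the candidate topic words arrive as (word, probability) pairs and that matching against the caption is case-insensitive; both the input shape and the function name `refine_topic_words` are assumptions for illustration.

```python
def refine_topic_words(candidates, caption_tokens):
    """Apply the three refinement rules to candidate topic words.

    candidates: (word, probability) pairs taken from the topic-word
    distribution; caption_tokens: the tokens of the image caption.
    """
    caption_vocab = {token.lower() for token in caption_tokens}
    refined, seen = [], set()
    for word, prob in candidates:
        key = word.lower()
        if prob <= 0.0:               # rule 1: zero probability
            continue
        if key not in caption_vocab:  # rule 2: not present in the caption
            continue
        if key in seen:               # rule 3: repeated word, keep the first
            continue
        seen.add(key)
        refined.append(word)
    return refined


candidates = [("lesion", 0.3), ("lung", 0.2), ("lesion", 0.1),
              ("tumor", 0.0), ("kidney", 0.05)]
caption_tokens = ["CT", "scan", "lesion", "left", "lung", "tumor"]
refined = refine_topic_words(candidates, caption_tokens)
```

In this toy input "tumor" is dropped for zero probability, "kidney" for being absent from the caption, and the second "lesion" as a repeat.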
The context sentence indexing module retrieves, from the text file of the biomedical image, the sentence sets associated with the topic words. The module uses Lucene as its retrieval tool: each word in the refined topic word set is taken as a query condition, and all sentences that contain the topic word are retrieved. When indexing is complete, every topic word has an associated sentence set. It should be understood that, although this embodiment uses Lucene for text retrieval, other text retrieval tools that provide the same function can be used in place of Lucene.
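Since the description allows other retrieval tools in place of Lucene, the retrieval step can be illustrated with a small in-memory inverted index. This is a stand-in for Lucene rather than Lucene itself; the sentence splitter and tokenizer are deliberately naive, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_index(sentences):
    """Map each token to the positions of the sentences that contain it."""
    index = defaultdict(set)
    for position, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index[token].add(position)
    return index


def sentences_for(topic_words, sentences, index):
    """Return the sentence set associated with each topic word."""
    return {
        word: [sentences[i] for i in sorted(index.get(word.lower(), ()))]
        for word in topic_words
    }


text = ("The lesion is in the left lung. The lung was otherwise clear. "
        "No fever was reported.")
sentences = split_sentences(text)
index = build_index(sentences)
sentence_sets = sentences_for(["lung", "lesion", "kidney"], sentences, index)
```

A topic word that occurs nowhere in the text, such as "kidney" here, simply maps to an empty sentence set.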
The context generation module selects the most closely related sentence from the sentence set of each topic word and then gathers all of these sentences to form the context of the biomedical image; that is, the set of all most closely related sentences is the context. Preferably, the most closely related sentence is selected as follows. Each sentence that contains a given topic word is traversed; every other topic word that the sentence contains contributes one vote to it; and the sentence with the most votes is chosen as the most closely related sentence for that topic word. The most closely related sentences of all topic words are gathered to form the context.
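The voting rule can be sketched as follows. Two details are not specified in the patent and are assumptions here: ties go to the first sentence encountered, and a sentence chosen for several topic words is added to the context only once. The function names are hypothetical.

```python
import re


def tokens_of(sentence):
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def closest_sentence(word, candidates, topic_words):
    """Pick the candidate containing the most *other* topic words."""
    others = {w.lower() for w in topic_words} - {word.lower()}
    best, best_votes = None, -1
    for sentence in candidates:
        votes = len(others & tokens_of(sentence))  # one vote per other topic word
        if votes > best_votes:                     # ties keep the earlier sentence
            best, best_votes = sentence, votes
    return best


def build_context(sentence_sets, topic_words):
    """Gather the closest sentence of every topic word into the context."""
    context = []
    for word in topic_words:
        sentence = closest_sentence(word, sentence_sets.get(word, []), topic_words)
        if sentence is not None and sentence not in context:
            context.append(sentence)
    return context


topic_words = ["lesion", "lung", "pleura"]
sentence_sets = {
    "lesion": ["A lesion was noted.",
               "The lesion lies in the left lung near the pleura."],
    "lung": ["The lung was clear.",
             "The lesion lies in the left lung near the pleura."],
    "pleura": ["The lesion lies in the left lung near the pleura."],
}
context = build_context(sentence_sets, topic_words)
```

In this toy input the long sentence receives two votes for each topic word, so it wins every vote and the context consists of that single sentence.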
The annotation generation module uses the LDA model obtained by the LDA training module to model the context of the biomedical image, obtaining its topic distribution and word distribution. It then multiplies the probability of each word in the topic-word distribution by the probability of the corresponding topic, takes the result as the weight of that word, sorts all words by weight in descending order, and selects the top several words as the annotation words of the biomedical image.
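The weighting and ranking step can be sketched as below. The patent does not say how to combine the products when a word has nonzero probability under several topics; this sketch sums them, which is an assumption (taking the maximum would be another reading). `rank_annotations` and the toy distributions are hypothetical.

```python
def rank_annotations(theta, phi, vocab, top_n=5):
    """Rank words by topic probability times word-in-topic probability.

    theta[k] is the probability of topic k for the image context,
    phi[k][v] is the probability of word vocab[v] under topic k.
    """
    weights = {}
    for k, topic_prob in enumerate(theta):
        for v, word_prob in enumerate(phi[k]):
            word = vocab[v]
            # Assumption: contributions from different topics are summed.
            weights[word] = weights.get(word, 0.0) + topic_prob * word_prob
    # Descending weight; alphabetical order breaks ties deterministically.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


theta = [0.75, 0.25]
vocab = ["lesion", "lung", "scan"]
phi = [[0.5, 0.25, 0.25],
       [0.0, 0.5, 0.5]]
top_words = rank_annotations(theta, phi, vocab, top_n=2)
```

With summation the weight of a word equals its marginal probability in the context under the model, so the ranking favors words that are likely under the dominant topics.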
Referring to Fig. 2, the invention correspondingly also provides an LDA-based annotation method for biomedical images, comprising the following steps:
Step S01: select a portion of the biomedical images as a training set, extract the caption in the caption node from the text file of each biomedical image, and form the training data set of the LDA model;
Step S02: train the LDA model by first sampling the topic assignment of each word and then computing the document-topic distribution and the topic-word distribution;
Step S03: for an unannotated image, model its caption with the trained LDA model and select all topic words to form a topic word set;
Step S04: optimize the topic word set by removing repeated words, words whose probability is zero, and words that do not appear in the caption, yielding a refined topic word set;
Step S05: for each topic word, retrieve from the text file of the image all sentences that contain the word, forming the sentence set of that topic word;
Step S06: select the most closely related sentence from the sentence set of each topic word to form the context of the image;
Step S07: model the context with the trained LDA model, multiply the probability of each word in the topic-word distribution by the probability of the corresponding topic, and take the result as the weight of the word; sort all words in descending order of weight and select the top several as the final annotations of the image.
Referring to Fig. 3, the specific operating steps of the LDA-based biomedical image annotation method according to one embodiment of the invention are as follows:
Step 1: start.
Step 2: select a portion of the biomedical images as a training set, extract the caption in the CAPTION node from the text file of each image, and form the training data set of the LDA model; at the same time, set the number of topics, the prior parameter of the document-topic distribution, and the prior parameter of the topic-word distribution.
Step 3: train the LDA model using the Gibbs sampling algorithm, first sampling the topic assignment of each word and then computing the document-topic distribution and the topic-word distribution.
Step 4: for an unannotated image, model its caption with the trained LDA model and select all topic words to form a topic word set.
Step 5: optimize the topic word set by removing repeated words, words whose probability is zero, and words that do not appear in the caption, yielding a refined topic word set.
Step 6: for a topic word, use Lucene to retrieve from the text file of the image all sentences that contain the word, forming the sentence set of that topic word.
Step 7: if every topic word has a corresponding sentence set, go to step 8; otherwise return to step 6.
Step 8: using the context generation algorithm, select the most closely related sentence from the sentence set of each topic word to form the context of the image.
Step 9: model the context with the LDA model trained in step 3, multiply the probability of each word in the topic-word distribution by the probability of the corresponding topic, and take the result as the weight of the word; sort all words in descending order of weight and select the top several as the final annotations of the image.
Step 10: if all unannotated images have been annotated, go to step 11; otherwise return to step 4.
Step 11: end.
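The control flow of steps 4 to 10 can be sketched as one loop over the unannotated images. The five callables stand in for the modules of the system; their signatures, the record layout, and the toy stand-ins at the bottom are assumptions chosen only to show how data passes from one step to the next.

```python
def annotate_corpus(images, extract, refine, retrieve, build_context, rank):
    """Run steps 4-10 for every unannotated image.

    images maps an image id to {"caption": str, "text": str}.
    """
    annotations = {}
    for image_id, record in images.items():            # step 10: repeat per image
        caption, text = record["caption"], record["text"]
        words = refine(extract(caption), caption)       # steps 4-5
        sentence_sets = {}
        for word in words:                              # steps 6-7: one set per topic word
            sentence_sets[word] = retrieve(word, text)
        context = build_context(sentence_sets, words)   # step 8
        annotations[image_id] = rank(context)           # step 9
    return annotations


# Toy stand-ins, only to show the data flow; they are not the real modules.
images = {"img1": {"caption": "lung lesion",
                   "text": "A lesion in the lung. No fever."}}
result = annotate_corpus(
    images,
    extract=lambda caption: caption.split(),
    refine=lambda words, caption: sorted(set(words)),
    retrieve=lambda word, text: [s for s in text.split(". ") if word in s.lower()],
    build_context=lambda sets, words: [s[0] for s in sets.values() if s],
    rank=lambda context: sorted(
        {w.strip(".").lower() for s in context for w in s.split()}
    )[:3],
)
```

Passing the modules in as callables keeps the loop independent of the retrieval tool and of the LDA implementation, which matches the remark that Lucene can be swapped for another text retrieval tool.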
Compared with the prior art, the invention makes full use of the captions of biomedical images and the corresponding text: topic words of an image are mined from its caption and traced back to the text in which the image appears to generate a context, from which the annotation words of the image are extracted. This substantially improves annotation accuracy and generates all annotations associated with an image in a single pass. Once biomedical images have been accurately annotated, related images can be found by keyword search, which is convenient, fast, and better matches how people are used to retrieving text.
The LDA-based annotation system and method for biomedical images provided by the embodiments of the invention have been described in detail above. Specific examples are used herein to explain the principles and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (6)

1. An LDA-based annotation system for biomedical images, characterized by comprising an LDA training module, a topic word extraction module, a topic word refinement module, a context sentence indexing module, a context generation module, and an annotation generation module, wherein: the LDA training module trains the LDA model; the topic word extraction module applies LDA modeling to the caption of each biomedical image and then extracts all topic words from the resulting model; the topic word refinement module optimizes the topic word set produced by the topic word extraction module; the context sentence indexing module retrieves, from the text file of the biomedical image, the sentence sets associated with the topic words; the context generation module selects the most closely related sentence from the sentence set of each topic word and then gathers all of these sentences to form the context of the biomedical image; the annotation generation module uses the LDA model obtained by the LDA training module to model the context of the biomedical image, obtaining its topic distribution and word distribution, then multiplies the probability of each word in the topic-word distribution by the probability of the corresponding topic, takes the result as the weight of that word, sorts all words by weight in descending order, and selects the top several words as the annotation words of the biomedical image; and wherein the data set of the LDA model consists of the captions of all biomedical images, the caption in the caption node being extracted from the text file corresponding to each biomedical image, and the captions of all images together constituting the training data set of the LDA model.
2. The LDA-based annotation system for biomedical images according to claim 1, characterized in that the training module trains the LDA model using Gibbs sampling, first sampling the topic assignment of each word and then deriving the document-topic distribution and the topic-word distribution from these assignments.
3. The LDA-based annotation system for biomedical images according to claim 1, characterized in that the topic word refinement module optimizes the topic word set as follows: in the result of LDA modeling of the caption of a biomedical image, if the probability of a topic word in the topic-word distribution is zero, the word is removed from the topic word set; if a topic word does not appear in the caption of the biomedical image, the word is removed from the topic word set; and if the topic word set contains repeated words, the repeats are removed and only one copy is kept.
4. The LDA-based annotation system for biomedical images according to claim 1, characterized in that the context sentence indexing module uses the Lucene retrieval tool, taking each word in the topic word set as a query condition, to retrieve all sentences that contain that topic word.
5. The LDA-based annotation system for biomedical images according to claim 1, characterized in that the most closely related sentence is selected as follows: each sentence that contains a given topic word is traversed; every other topic word that the sentence contains contributes one vote to it; the sentence with the most votes is chosen as the most closely related sentence for that topic word; and the most closely related sentences of all topic words are gathered to form the context.
6. An LDA-based annotation method for biomedical images, characterized by comprising the following steps:
Step 1: select a portion of the biomedical images as a training set, extract the caption in the caption node from the text file of each biomedical image, and form the training data set of the LDA model;
Step 2: train the LDA model by first sampling the topic assignment of each word and then computing the document-topic distribution and the topic-word distribution;
Step 3: for an unannotated image, model its caption with the trained LDA model and select all topic words to form a topic word set;
Step 4: optimize the topic word set by removing repeated words, words whose probability is zero, and words that do not appear in the caption, yielding a refined topic word set;
Step 5: for each topic word, retrieve from the text file of the image all sentences that contain the word, forming the sentence set of that topic word;
Step 6: select the most closely related sentence from the sentence set of each topic word to form the context of the image;
Step 7: model the context with the trained LDA model, multiply the probability of each word in the topic-word distribution by the probability of the corresponding topic, and take the result as the weight of the word; sort all words in descending order of weight and select the top several as the final annotations of the image.
CN201410120529.7A 2014-03-27 2014-03-27 A kind of labeling system and method for the biologic medical image based on LDA Active CN103942274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410120529.7A CN103942274B (en) 2014-03-27 2014-03-27 A kind of labeling system and method for the biologic medical image based on LDA

Publications (2)

Publication Number Publication Date
CN103942274A CN103942274A (en) 2014-07-23
CN103942274B true CN103942274B (en) 2017-11-14

Family

ID=51189942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410120529.7A Active CN103942274B (en) 2014-03-27 2014-03-27 A kind of labeling system and method for the biologic medical image based on LDA

Country Status (1)

Country Link
CN (1) CN103942274B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104021222A (en) * 2014-06-26 2014-09-03 深圳信息职业技术学院 Labeling algorithm for biomedical image based on invisible dirichlet model
CN104505090B (en) * 2014-12-15 2017-11-14 北京国双科技有限公司 The audio recognition method and device of sensitive word
CN107025369B (en) * 2016-08-03 2020-03-10 北京推想科技有限公司 Method and device for performing conversion learning on medical images
CN108984726B (en) * 2018-07-11 2022-10-04 黑龙江大学 Method for performing title annotation on image based on expanded sLDA model
CN109460756B (en) * 2018-11-09 2021-08-13 天津新开心生活科技有限公司 Medical image processing method and device, electronic equipment and computer readable medium
CN109918476A (en) * 2019-01-26 2019-06-21 北京工业大学 A kind of subject retrieval method based on topic model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102902700A (en) * 2012-04-05 2013-01-30 中国人民解放军国防科学技术大学 Online-increment evolution topic model based automatic software classifying method
CN103324700A (en) * 2013-06-08 2013-09-25 同济大学 Ontology concept attribute learning method based on Web information

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7672987B2 (en) * 2005-05-25 2010-03-02 Siemens Corporate Research, Inc. System and method for integration of medical information

Non-Patent Citations (1)

Title
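Multi-document automatic summarization with the LDA topic model (主题模型LDA的多文档自动文摘); Yang Xiao (杨潇); CAAI Transactions on Intelligent Systems (《智能系统学报》); 2010-04-30; Vol. 5, No. 2; full journal text *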

Also Published As

Publication number Publication date
CN103942274A (en) 2014-07-23

Similar Documents

Publication Publication Date Title
CN103942274B (en) A kind of labeling system and method for the biologic medical image based on LDA
CN109492077B (en) Knowledge graph-based petrochemical field question-answering method and system
CN105320642B (en) A kind of digest automatic generation method based on Concept Semantic primitive
CN105608218B (en) The method for building up of intelligent answer knowledge base establishes device and establishes system
CN106294593B (en) In conjunction with the Relation extraction method of subordinate clause grade remote supervisory and semi-supervised integrated study
CN107463658B (en) Text classification method and device
CN104281653B (en) A kind of opining mining method for millions scale microblogging text
CN107609052A (en) A kind of generation method and device of the domain knowledge collection of illustrative plates based on semantic triangle
CN102799684B (en) The index of a kind of video and audio file cataloguing, metadata store index and searching method
US20150074112A1 (en) Multimedia Question Answering System and Method
CN103970848B (en) A kind of universal internet information data digging method
Argyrou et al. Topic modelling on Instagram hashtags: An alternative way to Automatic Image Annotation?
CN104281702B (en) Data retrieval method and device based on electric power critical word participle
CN106570708A (en) Management method and management system of intelligent customer service knowledge base
CN108121829A (en) The domain knowledge collection of illustrative plates automated construction method of software-oriented defect
CN107506389B (en) Method and device for extracting job skill requirements
CN112395395B (en) Text keyword extraction method, device, equipment and storage medium
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN112948575B (en) Text data processing method, apparatus and computer readable storage medium
CN108363725A (en) A kind of method of the extraction of user comment viewpoint and the generation of viewpoint label
CN104298732B (en) The personalized text sequence of network-oriented user a kind of and recommendation method
CN110321549B (en) New concept mining method based on sequential learning, relation mining and time sequence analysis
CN108509521A (en) A kind of image search method automatically generating text index
CN106874397B (en) Automatic semantic annotation method for Internet of things equipment
CN107665188A (en) A kind of semantic understanding method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Xu Songhua

Inventor after: Lin Mouguang

Inventor after: Jiang Tao

Inventor after: Xue Kaijun

Inventor after: Xiao Jian

Inventor before: Lin Mouguang

Inventor before: Jiang Tao

Inventor before: Xue Kaijun

Inventor before: Xiao Jian

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: LIN MOUGUANG JIANG TAO XUE KAIJUN XIAO JIAN TO: XU SONGHUA LIN MOUGUANG JIANG TAO XUE KAIJUN XIAO JIAN

GR01 Patent grant