CN104750819A - Biomedicine literature search method and system based on word grading sorting algorithm - Google Patents
- Publication number
- CN104750819A (application number CN201510147696.5A)
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- candidate
- query
- inquiry
- expands
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a biomedical literature retrieval method and system based on a word grading sorting algorithm. The method comprises the following steps: 1, queries and their results are extracted from a search engine; 2, candidate expansion terms are extracted; 3, features of the candidate expansion terms are extracted and the terms are labeled; 4, a ranking model for candidate expansion terms is trained; 5, a query submitted online is run through the search engine and its results are extracted; 6, candidate expansion terms and their features are extracted online and the terms are scored online; 7, the retrieval results are returned. The system comprises a search engine query extraction module, a candidate expansion term extraction module, a feature extraction and labeling module for candidate expansion terms, a ranking model training module for candidate expansion terms, a query reconstruction module and a query result return module. Approaching retrieval from the angle of query expansion, the method and system use the word grading sorting algorithm and the dictionary resources inherent to the biomedical field to select, during query expansion, the technical terms that best express the user's information need, thereby completing the retrieval task and improving retrieval performance.
Description
Technical field
The present invention relates to the technical fields of data mining and search engines, and in particular to a biomedical literature retrieval method and system based on a word grading sorting algorithm.
Background technology
In recent years, with the rapid development of biomedicine, biomedical research has produced many valuable results. These results have not only advanced the treatment of diseases that once seemed incurable but, from a longer-term perspective, have also deepened humanity's understanding of itself.
However, as the volume of biomedical literature grows rapidly, the amount of related information is increasing exponentially. This mass of documents and information makes information acquisition difficult for biomedical researchers and practitioners, and traditional manual information gathering is becoming impractical. Information retrieval techniques are therefore needed to help these users obtain the information they require.
Traditional information retrieval techniques rank documents or web pages by relevance to the query submitted by the user and return the ranked results. Applying traditional retrieval methods directly to biomedical literature, however, rarely yields good retrieval performance, because these methods do not fully account for the inherent characteristics of the biomedical field: the field has many technical terms, and these terms often have numerous synonyms and abbreviations. If these characteristics were fully taken into account in traditional retrieval methods, the performance of biomedical information retrieval could be improved further.
Query expansion is one of the key techniques in traditional information retrieval. Starting from the original query submitted by the user, it supplements and refines the query according to the user's retrieval intent, producing a query that better matches that intent and improving retrieval performance. Existing query expansion methods fall into two broad classes. The first is based on the document collection: it takes all or part of the collection as its object of study, extracts content related to the query, and uses that content to improve the original query. The second is based on external expansion resources, which mainly include dictionary resources, retrieval system query logs, anchor text and Wikipedia. Many studies show that using external expansion resources to improve the original query accomplishes the query expansion task better and in turn improves retrieval performance.
Because the biomedical field has many domain resources such as dictionaries, retrieval performance is very likely to improve if these resources are fully used during retrieval to supplement and refine the query submitted by the user.
To build a literature retrieval system for the biomedical field, one should first understand the characteristics and resources of the field. Biomedical documents contain a large number of technical terms, and these terms involve complications such as many synonyms and abbreviations, which poses a great challenge for building a retrieval system. For example, the drug whose common English name is paracetamol is listed as acetaminophen in the international standard drug classification, and in medicinal chemistry it is denoted C8H9NO2 or N02BE01. With so many names in use, a query that contains only one of them is unlikely to retrieve all relevant documents. Fortunately, the biomedical field also has many established knowledge bases and resources, such as Medical Subject Headings (MeSH) and the Gene Ontology (GO). If these resources are fully used during retrieval, the performance of biomedical literature retrieval can be improved greatly.
Learning to rank is the general name for a family of supervised learning algorithms that rank documents in information retrieval. Its main characteristic is that it applies machine learning to the ranking problem in information retrieval and achieves good ranking performance. The ranking problem can also be viewed as the problem of selecting the best items; accordingly, learning-to-rank algorithms have in recent years been applied to many other tasks, for example recommending items to users in recommender systems based on historical information about users and items.
Summary of the invention
The object of the present invention is to provide a biomedical literature retrieval method and system, based on a word grading sorting algorithm, that effectively supplements and refines the user's query, returns more accurate biomedical literature, and more effectively meets the user's information need.
The technical scheme adopted by the present invention to solve the problems of the prior art is a biomedical literature retrieval method based on a word grading sorting algorithm, comprising an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: from the historical query records of a search engine, extract multiple queries and the top N result documents of each query, and collect the queries and their result documents into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: using a biomedical resource, extract the technical terms in the top N result documents of each query in the query pool, and compute for each technical term either its number of occurrences in those result documents or the weighted sum of its occurrence counts; sort the technical terms in descending order of occurrence count or of weighted sum, and select the top M technical terms as candidate expansion terms, where M is a natural number;
S3, feature extraction and labeling step for candidate expansion terms:
Feature extraction and labeling of candidate expansion terms are carried out together. A candidate expansion term is labeled for relevance by comparing the retrieval performance of the original query with the retrieval performance obtained when the term is added to the original query. The evaluation measures of retrieval performance include precision, average precision, NDCG and MRR. The relevance label is assigned as follows:
$$label(term) = \begin{cases} 1, & eval(query + term) > eval(query) \\ 0, & \text{otherwise} \end{cases}$$
where eval() is the evaluation function that measures retrieval performance, eval(query + term) is its score when the candidate expansion term term is added to the query query, and eval(query) is its score for the query query alone. A label of 1 means that the candidate expansion term is relevant to the query; a label of 0 means that it is not relevant;
Feature extraction for candidate expansion terms draws on the biomedical resource and the top N result documents returned for each query in the query pool. It extracts the distribution of each candidate expansion term in the result documents, its distribution in the biomedical resource, and its correlation with the original query, in preparation for training the ranking model. After the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]. The normalization is:
$$normalizedValue = \frac{value - minValue}{maxValue - minValue}$$
where value is the raw value of a feature, and minValue and maxValue are respectively the minimum and maximum values of that feature;
S4, ranking model training step for candidate expansion terms: based on the relevance labels and the features of the candidate expansion terms, the weight of each feature is obtained by training with the word grading sorting algorithm. Specifically: select one candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled irrelevant to form a word group, and select a number of such word groups as training samples; randomly assign initial weights to the features of the candidate expansion terms; rank the candidate expansion terms in each word group by feature-weighted score; from the ranking result of each word group, compute the overall ranking loss, and adjust the weight of each feature dimension according to the gradient of the loss function, where the ranking loss is:
$$Loss = \sum_{i=1}^{NumSample} loss_i$$
where NumSample is the number of word groups and $loss_i$ is the loss of the i-th word group. This loss is computed from the ranked position of the relevant expansion term: the nearer the top the term is ranked, the smaller the loss. The preceding process is iterated until the overall loss falls below a threshold or the specified number of iterations is reached, and the feature weights finally obtained constitute the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, retrieve the top N1 results, and extract the technical terms in the top N1 retrieval results and their features using the biomedical resource, where N1 is a natural number;
S6, online candidate expansion term extraction, feature extraction and scoring step: for the new query, apply the candidate expansion term extraction method and the feature extraction method of offline steps S2 and S3 to extract, using the biomedical resource, the technical terms in the top N1 retrieval results and their features, yielding the online candidate expansion terms; the extracted features measure the importance of each candidate expansion term in the expanded query. Score the online candidate expansion terms with the feature weights trained in step S4, select the K1 highest-scoring candidate expansion terms, and add them to the new query submitted online to form the expanded query, where K1 is a natural number;
The score of an online candidate expansion term extracted using the biomedical resource is
$$score(term) = \sum_{i=1}^{FeatureNum} a_i \cdot feature_i(term)$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature in the ranking model, and $feature_i(term)$ is the value of the i-th feature of the online candidate expansion term term;
The online candidate expansion terms are sorted by score, and the top K1 are added to the new query submitted online as expansion terms. The weight of each added term in the expanded query is expressed in terms of sign and $weight_{original}$, where sign is an indicator with sign = 1 when the candidate expansion term already appears in the new query submitted online and sign = 0 otherwise, and $weight_{original}$ is the weight of the new query in the expanded query;
S7, query result return step: retrieve with the expanded query and return the retrieval results to the user.
In step S2, the weighted sum of the occurrence counts of a technical term in the result documents is
$$score(term) = \sum_{i=1}^{N} count_i \cdot d(i)$$
where $count_i$ is the number of times the term occurs in the i-th document and d(i) is the decay factor of the i-th document.
In step S3, the evaluation function eval() is the average precision, that is:
$$eval(query) = \frac{1}{RelDoc_{query}} \sum_{i=1}^{RelDoc_{query}} \frac{i}{rank(i)}$$
where $RelDoc_{query}$ is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list.
In step S1, when no historical query records are available, queries and their results are obtained manually by building a biomedical retrieval index; the retrieval method uses a vector space model, the BM25 retrieval model, or a language model with one of various smoothing methods.
In step S4, the loss value of each word group is computed from $rank_i$, the position at which the relevant candidate expansion term is ranked in the word group list.
A biomedical resource is a dictionary or knowledge base containing biomedical technical terms.
The features of a candidate expansion term comprise: the term frequency (TF) of the term in the result documents; the TF-IDF value of the term; the number of documents in which the term and the original query co-occur; the number of times the term and the original query co-occur within the same text window; the number of times the term appears in the biomedical resource; and the number of term concepts in the biomedical resource that contain the term, which reflects the inclusion relations between biomedical term concepts.
A biomedical literature retrieval system based on a word grading sorting algorithm comprises an offline training part and an online retrieval part. The offline training part comprises:
Search engine query extraction module: extracts, from the historical query records of a search engine, multiple queries and the top N result documents of each query, and collects the queries and their result documents into a query pool, where N is a natural number;
Candidate expansion term extraction module: for a given user query, uses the resources inherent to the biomedical field to extract technical terms from the top N result documents obtained by the search engine query extraction module, and records for each technical term its number of occurrences in the result documents or the weighted sum of its occurrence counts; sorts the technical terms in descending order of occurrence count or of weighted sum, and selects the top M as candidate expansion terms, where M is a natural number;
Feature extraction and labeling module for candidate expansion terms: extracts the features of the candidate expansion terms obtained by the candidate expansion term extraction module, and labels the relevance of each candidate expansion term according to its effect on retrieval performance;
Ranking model training module for candidate expansion terms: after the features have been extracted and the relevance labeled, trains the term ranking model with the word grading sorting algorithm to obtain the weight of each feature of the candidate expansion terms;
The online retrieval part comprises:
Query reconstruction module: extracts technical terms for a new query and scores the candidate expansion terms. It comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module. The online search engine query extraction module retrieves the top N1 results for a new query submitted online by the user and extracts the technical terms in those results and their features using the biomedical resource, where N1 is a natural number. The online candidate expansion term extraction, feature extraction and scoring module uses the scores that the term ranking model outputs for the candidate expansion terms to compute their weights, and adds the terms to the original query to obtain the expanded query;
Query result return module: returns to the user the result documents retrieved with the expanded query.
The beneficial effects of the present invention are as follows. Approaching retrieval from the angle of query expansion, the present invention uses the word grading sorting algorithm and resources such as the dictionaries inherent to the biomedical field to select the technical terms that can express the user's information need during query expansion. It thereby completes the retrieval task more effectively and returns retrieval results that better fit the user's need; the resources of the biomedical field are used to supplement and refine the original query and in turn improve retrieval performance. When the literature data set of the TREC Genomics task is used as the data collection and the traditional BM25 retrieval model serves as the baseline retrieval model, a literature retrieval accuracy of 25.62% is obtained; when the method and system of the present invention are applied on top of this baseline, a literature retrieval accuracy of 26.30% is obtained. Retrieval performance is thus improved over the baseline, and the method and system of the present invention can effectively retrieve the biomedical literature most relevant to the user's query, improving user satisfaction.
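For scale, the two accuracy figures reported above can be compared directly. The arithmetic below uses only those two numbers; the symbols $\Delta_{abs}$ and $\Delta_{rel}$ are introduced here purely for illustration.

$$\Delta_{abs} = 26.30\% - 25.62\% = 0.68 \text{ percentage points}, \qquad \Delta_{rel} = \frac{26.30 - 25.62}{25.62} \approx 2.65\%$$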
Brief description of the drawings
Fig. 1 is a schematic flow chart of the retrieval method of the present invention;
Fig. 2 is a schematic diagram of the logical structure of the retrieval system of the present invention.
Embodiment
The present invention is described below with reference to the drawings and specific embodiments:
Fig. 1 is a schematic flow chart of the biomedical literature retrieval method based on a word grading sorting algorithm according to the present invention. The method comprises an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: from the historical query records of a search engine, extract multiple queries and the top N result documents of each query, and collect the queries and their result documents into a query pool, where N is a natural number. In the present embodiment, N = 10;
Here, the historical query records of the search engine mainly refer to the query history recorded by a retrieval system for biomedical literature, together with the corresponding query results; these queries and results are used to train the ranking model offline.
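The query pool of step S1 can be pictured as a mapping from each historical query to its top N result documents. The sketch below is only illustrative: the `(query, rank, doc_id)` record layout and the function name `build_query_pool` are assumptions, since the text does not specify how the history log is stored.

```python
from collections import defaultdict

def build_query_pool(log_records, top_n=10):
    """Collect each query's top-N result documents into a query pool.

    log_records: iterable of (query, rank, doc_id) tuples. This layout is a
    hypothetical stand-in for whatever the search engine's history log holds.
    """
    by_query = defaultdict(list)
    for query, rank, doc_id in log_records:
        by_query[query].append((rank, doc_id))
    pool = {}
    for query, hits in by_query.items():
        hits.sort()  # ascending rank, so the best result comes first
        pool[query] = [doc_id for _, doc_id in hits[:top_n]]
    return pool
```

With `top_n=10` the pool matches the N = 10 setting of this embodiment.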
When no relevant historical query records are available, queries and their retrieval results can be obtained manually by building a biomedical retrieval index. The retrieval method can be any of the ranking models of traditional information retrieval, including but not limited to the vector space model, the BM25 retrieval model, and language models with various smoothing methods.
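As an illustration of one of the retrieval models named above, the following is a minimal sketch of BM25 scoring over tokenized documents. It uses one common variant of the formula (a smoothed IDF with the usual `k1` and `b` parameters); the parameter values, the function name and the pre-tokenized input are assumptions, not details taken from the patent.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with a common BM25 variant.

    docs_tokens: non-empty list of token lists (tokenization is assumed done).
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()  # document frequency of each token
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for token in query_tokens:
            if tf[token] == 0:
                continue
            idf = math.log((n_docs - df[token] + 0.5) / (df[token] + 0.5) + 1.0)
            norm = tf[token] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[token] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order and keeping the first N gives the result list that step S1 records for each manually issued query.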
S2, candidate expansion term extraction step: using a biomedical resource, extract the technical terms in the top N result documents of each query in the query pool, and compute for each technical term either its number of occurrences in those result documents or the weighted sum of its occurrence counts; sort the technical terms in descending order of occurrence count or of weighted sum, and select the top M technical terms as candidate expansion terms, where M is a natural number;
Here, a biomedical resource is a resource such as a dictionary or knowledge base containing biomedical technical terms, including but not limited to: Medical Subject Headings (MeSH), the Gene Ontology (GO), and the Metathesaurus, the Semantic Network and the SPECIALIST Lexicon and Lexical Tools released by the Unified Medical Language System (UMLS).
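One simple way to find dictionary terms in a document is greedy longest-match lookup, sketched below. The patent does not say how terms are matched against the resource, so the matching strategy, the lower-cased set `dictionary` and the `max_len` cap on phrase length are all assumptions made for illustration.

```python
def extract_terms(tokens, dictionary, max_len=5):
    """Greedy longest-match lookup of dictionary terms in a token list.

    dictionary: set of lower-cased terms (for example, headings exported from
    a resource such as MeSH); multi-word terms are space-separated.
    """
    found = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in dictionary:
                match = (phrase, n)
                break
        if match:
            found.append(match[0])
            i += match[1]  # skip past the matched phrase
        else:
            i += 1
    return found
```

Because longer phrases are tried first, "breast cancer" is reported as one term rather than as the shorter entry "cancer".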
Taking MeSH as the biomedical resource used in the present invention as an example, the technical terms in the top N result documents of a query are extracted, and for each extracted term either its number of occurrences in the documents or the weighted sum of its occurrence counts is computed. For example, the weighted sum of the occurrence counts of a technical term term in the top N documents is calculated by
$$score(term) = \sum_{i=1}^{N} count_i \cdot d(i)$$
where $count_i$ is the number of times the term occurs in the i-th document and d(i) is the decay factor of the i-th document. The weighted sum weights the term frequencies observed in different documents so that frequencies in higher-ranked documents carry larger weight, and terms found in lower-ranked documents contribute less to the score. The selected technical terms are sorted from high to low either by the total occurrence count count(term) or by score(term), and the top M terms are selected as candidate expansion terms; in the present embodiment M = 150.
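The weighted count and top-M selection can be sketched as follows. The decay factor d(i) is not specified in the text, so the default `1/i` used here is purely an assumption, as are the function name and the per-document count dictionaries.

```python
from collections import defaultdict

def select_candidates(doc_term_counts, top_m=150, decay=lambda i: 1.0 / i):
    """Rank technical terms by decay-weighted occurrence count.

    doc_term_counts: list of {term: count} dicts, ordered by document rank
    (index 0 is the top-ranked result). decay(i) plays the role of d(i); the
    default 1/i is an assumed choice, not taken from the patent.
    """
    scores = defaultdict(float)
    for i, counts in enumerate(doc_term_counts, start=1):
        for term, count_i in counts.items():
            scores[term] += count_i * decay(i)
    # Descending score; ties broken alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_m]]
```

Passing `decay=lambda i: 1.0` reduces the score to the plain occurrence count, the other ordering the step allows.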
S3, feature extraction and relevance labeling step for candidate expansion terms:
Feature extraction and labeling of candidate expansion terms are carried out together. A candidate expansion term is labeled for relevance by comparing the retrieval performance of the original query with the retrieval performance obtained when the term is added to the original query. The idea is to add a single candidate expansion term to the original query and retrieve again; if retrieval performance improves, the term is labeled as relevant to the original query. Evaluation measures of retrieval performance include, but are not limited to, precision, mean average precision (MAP), NDCG and MRR. The label is assigned as follows:
$$label(term) = \begin{cases} 1, & eval(query + term) > eval(query) \\ 0, & \text{otherwise} \end{cases}$$
where eval() is the evaluation function that measures retrieval performance, eval(query + term) is its score when the candidate expansion term term is added to the given query query, and eval(query) is its score for the given query query alone. When the evaluation score of the original query with a candidate term added is greater than the evaluation score of the original query alone, the candidate expansion term is labeled 1, meaning that the term is relevant to the original query; when it is not greater, the candidate expansion term is labeled 0, meaning that the term is not relevant to the original query.
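The labeling rule can be sketched in a few lines. The `evaluate` callback stands for eval(); how it runs the retrieval and scores the results is left open, and joining the query and term with a space is an assumption about how a term is "added" to a query.

```python
def label_candidates(query, candidates, evaluate):
    """Label each candidate 1 if adding it to the query improves the score.

    evaluate: hypothetical callback mapping a query string to a retrieval
    performance score (for example, average precision).
    """
    baseline = evaluate(query)
    return {
        term: int(evaluate(query + " " + term) > baseline)
        for term in candidates
    }
```

A term whose addition leaves the score unchanged is labeled 0, matching the "not greater than" condition in the text.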
In the present embodiment, the evaluation function eval() is the average precision, that is:
$$eval(query) = \frac{1}{RelDoc_{query}} \sum_{i=1}^{RelDoc_{query}} \frac{i}{rank(i)}$$
where $RelDoc_{query}$ is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list; for example, rank(3) = 5 means that the third relevant document appears at the fifth position of the ranked list.
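Average precision for a single query can be sketched as below, assuming the ranked list and the set of relevant documents are given as identifiers. Relevant documents that never appear in the ranked list contribute nothing to the sum, which is the usual convention and an assumption here.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of one ranked result list.

    Each relevant document found at position p, being the h-th relevant
    document seen so far, contributes h / p; the sum is divided by the total
    number of relevant documents.
    """
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / position
    return total / len(relevant_ids)
```

For a list whose relevant documents sit at positions 1, 3 and 5, the third relevant document has rank(3) = 5 and contributes 3/5, as in the example above.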
Feature extraction for candidate expansion terms draws on the biomedical resource and the top N result documents returned for each query in the query pool. It extracts the distribution of each candidate expansion term in the result documents, its distribution in the biomedical resource, and its correlation with the original query, in preparation for training the ranking model. After the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]. The normalization is:
$$normalizedValue = \frac{value - minValue}{maxValue - minValue}$$
where value is the raw value of a feature, and minValue and maxValue are respectively the minimum and maximum values of that feature.
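Scaling one feature's values to [0, 1] can be sketched with min-max normalization. Returning 0.0 when every value is identical is an assumption added to avoid division by zero; the text does not cover that case.

```python
def min_max_normalize(values):
    """Scale the values of one feature to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant feature: map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

The function is applied per feature, across all candidate expansion terms, so that features with large raw ranges (such as counts) do not dominate the weighted score.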
The features of a candidate expansion term specifically comprise:
1, the term frequency (TF) of the candidate expansion term in the result documents. This feature is obtained from the number of times the technical term term occurs in the result documents.
2, the TF-IDF value of the candidate expansion term. TF-IDF is one of the classical models of information retrieval and measures the relative importance of a term. It is computed as follows:
$$TFIDF(term) = count(term) \cdot \log \frac{TotalDoc}{df(term)}$$
where count(term) is the number of times the candidate expansion term occurs in the i-th result document, TotalDoc is the total number of documents in the training data, and df(term) is the number of documents in which the candidate expansion term occurs.
3, the number of documents in which the candidate expansion term and the original query co-occur. This feature measures the degree of correlation between the original query and the candidate expansion term.
4, the number of times the candidate expansion term and the original query co-occur within the same text window. This feature measures the degree of correlation, within a limited range, between the query words of the original query and the candidate expansion term. The text window refers to the number of words that separate the expansion term from an original query word within a document in which both occur.
5, the number of times the candidate expansion term appears in the biomedical resource, for example in MeSH. This feature measures the distribution of the candidate expansion term in the biomedical resource.
6, the number of term concepts in the biomedical resource, for example in MeSH, that contain the candidate expansion term. Biomedical term concepts often stand in inclusion relations, so this feature likewise measures the importance of a candidate term in the biomedical resource.
Among the candidate expansion term features extracted above, features 1 and 2 measure the distribution of the candidate expansion term in the literature collection; features 3 and 4 measure the degree of correlation between the candidate expansion term and the original query; and features 5 and 6 measure the distribution of the candidate expansion term in the biomedical resource. The expansion term features of the present invention include but are not limited to the above. Extracting these features as the input of the word grading sorting algorithm allows the importance of each candidate expansion term to be measured better.
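Three of the listed features can be sketched as small functions over tokenized documents. These are illustrative only: the function names, the restriction to single-token terms, the natural-log TF-IDF form and the default window of 5 words are all assumptions rather than details given in the text.

```python
import math

def tf_idf(term, doc_tokens, all_docs_tokens):
    """Feature 2: term count in one document times log(TotalDoc / df)."""
    count = doc_tokens.count(term)
    df = sum(1 for d in all_docs_tokens if term in d)
    if df == 0:
        return 0.0
    return count * math.log(len(all_docs_tokens) / df)

def co_document_count(term, query_tokens, docs_tokens):
    """Feature 3: documents containing the term and at least one query word."""
    return sum(
        1 for d in docs_tokens
        if term in d and any(q in d for q in query_tokens)
    )

def window_cooccurrence(term, query_tokens, doc_tokens, window=5):
    """Feature 4: (term, query word) pairs at most `window` positions apart."""
    query_set = set(query_tokens)
    positions_q = [i for i, t in enumerate(doc_tokens) if t in query_set]
    positions_t = [i for i, t in enumerate(doc_tokens) if t == term]
    return sum(
        1 for p in positions_t for q in positions_q
        if 0 < abs(p - q) <= window
    )
```

Features 5 and 6 depend on the structure of the chosen biomedical resource and are therefore not sketched here.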
S4, ranking model training step for candidate expansion terms: taking as input the relevance labels and the features of the candidate expansion terms obtained in step S3, the ranking model of the word grading sorting algorithm is trained to obtain the weight of each feature. Specifically: select one candidate expansion term labeled relevant in step S3 (a term whose label is 1) and several candidate expansion terms labeled irrelevant (terms whose label is 0) to form a word group, and select a number of such word groups as training samples; randomly assign initial weights to the features of the candidate expansion terms; rank the candidate expansion terms in each word group by feature-weighted score; from the ranking result of each word group, compute the overall ranking loss, and adjust the weight of each feature dimension according to the gradient of the loss function, where the ranking loss is:
$$Loss = \sum_{i=1}^{NumSample} loss_i$$
where NumSample is the number of word groups and $loss_i$ is the loss of the i-th word group. This loss is computed from the ranked position of the relevant expansion term: the nearer the top the term is ranked, the smaller the loss. The preceding process is iterated until the overall loss falls below a threshold or the specified number of iterations is reached, and the feature weights finally obtained constitute the trained ranking model. In the present embodiment, training stops after 100 iterations.
In the present embodiment, penalty values is:
wherein rank
ifor the position that relevant candidate's expansion word sorts in word group list, when it makes number one, loss is 0, loses be maximized when it rolls into last place.In addition, the computing formula of penalty values is including but not limited to this computing formula.
In the ranking model, the final score of an expansion term is computed as follows:
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the candidate term term. The ranking model obtained after training can be used to select the expansion terms relevant to a test query. All of the above steps are completed offline.
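The offline training loop of step S4 can be illustrated with a small sketch. Several parts are assumptions rather than the patent's method: the per-group loss is taken to be `(rank - 1) / (group size - 1)`, which is 0 when the relevant term is first and largest when it is last; the overall loss is taken to be the sum of the per-group losses; and the weight update is a pairwise hinge-style surrogate swapped in for the gradient step, because a loss defined on rank positions is not differentiable as stated. The toy feature vectors are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# A word group: (feature vector, label) pairs, exactly one of them with label 1.
Group = List[Tuple[Sequence[float], int]]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Feature-weighted score of one candidate term."""
    return sum(a * f for a, f in zip(weights, features))


def rank_of_relevant(weights: Sequence[float], group: Group) -> int:
    """1-based position of the relevant term after sorting the group by score."""
    ordered = sorted(group, key=lambda item: score(weights, item[0]), reverse=True)
    return 1 + next(i for i, (_, label) in enumerate(ordered) if label == 1)


def total_loss(weights: Sequence[float], groups: List[Group]) -> float:
    # ASSUMED loss: (rank - 1) / (group size - 1) per group, summed over groups.
    # Every group must contain at least two terms.
    return sum((rank_of_relevant(weights, g) - 1) / (len(g) - 1) for g in groups)


def train(groups: List[Group], n_features: int, max_iter: int = 100,
          lr: float = 0.1, threshold: float = 0.0, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_features)]  # random initial weights
    for _ in range(max_iter):  # 100 iterations, as in the embodiment
        if total_loss(weights, groups) <= threshold:
            break
        for group in groups:
            relevant = next(f for f, label in group if label == 1)
            for features, label in group:
                # SURROGATE update (assumption): move the relevant term above any
                # non-relevant term that scores within a margin of 1 of it.
                if label == 0 and score(weights, relevant) - score(weights, features) < 1.0:
                    weights = [w + lr * (r - x)
                               for w, r, x in zip(weights, relevant, features)]
    return weights


# Hypothetical training samples: two word groups with two features per term.
groups = [
    [((0.9, 0.2), 1), ((0.3, 0.8), 0), ((0.1, 0.6), 0)],
    [((0.8, 0.1), 1), ((0.2, 0.9), 0), ((0.4, 0.7), 0)],
]
trained = train(groups, n_features=2)
```

After training on this toy data the relevant term is ranked first in both groups, so the assumed overall loss reaches 0 and the loop stops early.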
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by a user, retrieve the top N1 query results; extract the specialized terms in the top N1 retrieval results, together with their features, according to the biomedical resource, where N1 is a natural number;
It should be noted that this step takes place online: after the user submits a query to the biomedical literature search engine, the method automatically obtains the N1 top-ranked results of the preliminary retrieval for subsequent processing such as expanding the user's query. This process is transparent to the user.
S6, online candidate expansion term extraction, feature extraction and scoring step: for the new query, the candidate expansion term extraction method and the feature extraction method of offline steps S2-S3 are applied, according to the biomedical resource, to the specialized terms in the top N1 retrieval results, yielding the online-stage candidate expansion terms and their features; the extracted features measure the importance of each candidate expansion term in the expanded query. The online-stage candidate expansion terms are scored with the feature weights trained in step S4, and the K1 highest-scoring terms are added to the new query submitted online to form the online-stage expanded query, where K1 is a natural number;
Each online-stage candidate expansion term extracted using the biomedical resource is scored as
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature in the ranking model, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the online-stage candidate expansion term term;
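The online scoring in step S6 is a weighted sum of feature values followed by a top-K1 selection. The sketch below shows one way this could look; the weights, the feature vectors, and the alphabetical tie-break are hypothetical values and choices made for illustration.

```python
from typing import Dict, List, Tuple


def score_term(weights: List[float], features: List[float]) -> float:
    """Weighted sum of feature values: sum over i of a_i * feature_i(term)."""
    return sum(a * f for a, f in zip(weights, features))


def top_k_terms(candidates: Dict[str, List[float]], weights: List[float],
                k: int) -> List[Tuple[str, float]]:
    """Score every candidate term and keep the k highest-scoring ones."""
    scored = [(term, score_term(weights, feats)) for term, feats in candidates.items()]
    # Highest score first; ties are broken alphabetically (an assumption).
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical trained feature weights and normalized feature vectors.
weights = [0.5, 0.25, 0.25]
candidates = {
    "prions": [1.0, 1.0, 0.5],
    "cause": [0.5, 0.0, 0.0],
    "spongiform": [0.5, 1.0, 1.0],
}
selected = top_k_terms(candidates, weights, k=2)
```

With these values, `prions` scores 0.875, `spongiform` 0.75 and `cause` 0.25, so the two selected terms are `prions` and `spongiform`.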
The online-stage candidate expansion terms are sorted by score, and the K1 top-ranked terms are added to the new query as online-stage expansion terms. The weight of an added term in the expanded query is expressed using a sign function sign and the weight $\mathrm{weight}_{original}$, where sign = 1 when the online-stage candidate expansion term appears in the new query submitted online and sign = 0 otherwise, and $\mathrm{weight}_{original}$ is the weight of the new query submitted online within the expanded query;
The final expanded query has the following form:
$$(\mathrm{weight}_1\ \mathrm{query}_{original}\quad \mathrm{weight}_2\ (w_1\ \mathrm{term}_1\quad w_2\ \mathrm{term}_2\ \dots\ w_K\ \mathrm{term}_K))$$
where $\mathrm{weight}_1$ is the weight of the new query submitted online in the expanded query, $\mathrm{weight}_2$ is the overall weight of the newly added expansion terms in the expanded query, $w_1, w_2, \dots, w_K$ are the score weights of the expansion terms $\mathrm{term}_1, \mathrm{term}_2, \dots, \mathrm{term}_K$, and K is the number of expansion terms finally selected. In the present embodiment, $\mathrm{weight}_1$ is 0.5, $\mathrm{weight}_2$ is 0.5, and K is 50.
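The parenthesised form of the expanded query can be assembled as a string. The sketch below is one possible rendering under stated assumptions: scores are normalized so that the expansion-term weights sum to 1, the weights are printed with two decimals, and the function name `build_expanded_query` and the sample scores are hypothetical. The defaults of 0.5 for both weights follow the embodiment.

```python
from typing import List, Tuple


def build_expanded_query(original: str, scored_terms: List[Tuple[str, float]],
                         weight_original: float = 0.5,
                         weight_expansion: float = 0.5) -> str:
    """Render (weight_1 query_original weight_2 (w_1 term_1 ... w_K term_K))."""
    total = sum(s for _, s in scored_terms)
    if total <= 0:
        raise ValueError("expansion term scores must sum to a positive value")
    # ASSUMPTION: normalize scores so the expansion-term weights sum to 1.
    inner = " ".join(f"{s / total:.2f} {t}" for t, s in scored_terms)
    return f"({weight_original} {original} {weight_expansion} ({inner}))"


# Hypothetical raw ranking scores for three selected expansion terms.
expanded = build_expanded_query(
    "mad cow disease",
    [("prions", 2.0), ("spongiform", 1.0), ("encephalopathies", 1.0)],
)
```

Keeping the original query and the expansion terms in separate weighted groups lets the balance between them be tuned without re-scoring individual terms.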
S7, query result return step: retrieval is performed with the expanded query, and the retrieval results are returned to the user, completing the retrieval process.
Corresponding to the above method, the present invention also provides a biomedical literature retrieval system based on the word grading sorting algorithm. Figure 2 shows the logical structure of this system.
A biomedical literature retrieval system based on the word grading sorting algorithm comprises an offline training part and an online retrieval part; the offline training part comprises the following modules:
Search engine query extraction module: used to extract, from the historical query log of the search engine, multiple queries and the top N result documents of each query, and to collect the queries and result documents into a query pool, where N is a natural number. The search engine query extraction module can retrieve the biomedical literature associated with a user's query and return the retrieval results to the user; computations and operations inside the system, such as query expansion, are invisible to the user.
Candidate expansion term extraction module: used, for a given user query, to extract specialized terms from the top N result documents obtained by the search engine query extraction module using the resources inherent to the biomedical field, and to record for each specialized term the number of times (frequency) it occurs in the result documents or the weighted sum of its occurrence counts; the specialized terms are sorted in descending order of occurrence count or weighted sum, and the M highest-ranked specialized terms are selected as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: used to extract the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and to label the relevance of each candidate expansion term according to its effect on retrieval performance. During offline training, the relevance labels and features of the candidate expansion terms serve as input to the word grading sorting algorithm; during online querying, this module extracts the feature information associated with the candidate expansion terms.
Candidate expansion term ranking model training module: used to train the term ranking model with the word grading sorting algorithm, once the features of the candidate expansion terms have been extracted and their relevance labeled, and to output the weight of each feature of the candidate expansion terms; these weights can be used to measure the importance of expansion terms for unseen queries.
The online retrieval part comprises:
Query reconstruction module: used to extract the specialized terms for a new query and to score the candidate expansion terms. It comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module. The online search engine query extraction module retrieves the top N1 query results for a new query submitted online by a user and extracts the specialized terms in the top N1 retrieval results, together with their features, according to the biomedical resource, where N1 is a natural number. The online candidate expansion term extraction, feature extraction and scoring module uses the weights output by the term ranking model to score the candidate expansion terms and compute their corresponding weights, and adds the terms to the original query to obtain the expanded query.
Query result return module: used to return to the user the result documents retrieved with the expanded query. The results the user receives are in fact those retrieved for the submitted query after query expansion; the query expansion process is invisible to the user.
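The candidate expansion term extraction module ranks specialized terms by occurrence count or by a decay-weighted sum of counts over the top N result documents. A minimal sketch of the weighted variant follows. The decay factor `1 / i`, the whitespace tokenization, the restriction to single-word terms, and the toy documents and vocabulary are all assumptions; the source does not give the form of the decay factor.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def select_candidates(docs: List[str], vocabulary: Set[str], m: int,
                      decay: Callable[[int], float] = lambda i: 1.0 / i) -> List[str]:
    """Return the m vocabulary terms with the largest decay-weighted counts.

    docs must be ordered by retrieval rank, best first.
    """
    weighted: Dict[str, float] = defaultdict(float)
    for i, doc in enumerate(docs, start=1):
        for token in doc.lower().split():  # ASSUMPTION: whitespace tokenization
            if token in vocabulary:
                # Each occurrence in document i adds d(i), giving count_i * d(i).
                weighted[token] += decay(i)
    # Descending weighted sum; ties broken alphabetically for determinism.
    ranked = sorted(weighted, key=lambda t: (-weighted[t], t))
    return ranked[:m]


# Hypothetical top-3 result documents and a tiny stand-in for a biomedical dictionary.
docs = [
    "prions cause spongiform disease",
    "prions disease in cow",
    "fatal disease",
]
vocabulary = {"prions", "spongiform", "disease", "cow"}
candidate_terms = select_candidates(docs, vocabulary, m=3)
```

A decaying weight lets occurrences in higher-ranked documents count for more than occurrences further down the result list.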
The method and system embodiments described above are now illustrated with a specific example. Suppose that, in the present embodiment, the ranking model has already been trained on historical data. When a user submits a new query "mad cow disease", the system first selects candidate expansion terms according to the frequency of terms in the top-ranked documents of the preliminary retrieval. The 10 top-ranked candidate expansion terms and their relevance labels are shown in the following table:
Rank | Term | Relevance |
---|---|---|
1 | Disease | Relevant |
2 | Prions | Relevant |
3 | Cause | Not relevant |
4 | Infectious | Relevant |
5 | Conversion | Not relevant |
6 | Cow | Relevant |
7 | Spongiform | Relevant |
8 | Fatal | Not relevant |
9 | Encephalopathies | Relevant |
10 | Mad | Relevant |
As the table shows, 3 of the 10 top-ranked candidate expansion terms are not relevant; adding them directly to the original query would harm retrieval performance. Next, features related to the candidate expansion terms are extracted from the documents and from the biomedical dictionary MeSH, the ranking model supplies the weight of each feature, and all candidate expansion terms are re-scored and re-ranked.
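The relevance labels in the table follow the labeling rule of step S3: a term is relevant when adding it to the query raises the evaluation score. The sketch below illustrates that rule with average precision as the evaluation function eval(), which is the choice named in claim 3. Treating an unchanged score as not relevant is an assumption, and the document identifiers and result lists are hypothetical.

```python
from typing import List, Set


def average_precision(ranked_docs: List[str], relevant: Set[str]) -> float:
    """AP = (1 / |relevant|) * sum of i / rank(i) over the relevant documents found."""
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position  # i-th relevant document found at this position
    return total / len(relevant) if relevant else 0.0


def label_term(baseline_run: List[str], expanded_run: List[str],
               relevant: Set[str]) -> int:
    """1 if the query plus the term scores higher than the query alone, else 0."""
    # ASSUMPTION: an unchanged score counts as not relevant (strict comparison).
    improved = average_precision(expanded_run, relevant) > average_precision(baseline_run, relevant)
    return 1 if improved else 0


# Hypothetical relevance judgments and result lists.
relevant_docs = {"d1", "d3"}
baseline = ["d2", "d1", "d4", "d3"]   # results for the original query
helpful = ["d1", "d3", "d2", "d4"]    # results after adding a helpful term
harmful = ["d2", "d4", "d1", "d3"]    # results after adding a harmful term

label_helpful = label_term(baseline, helpful, relevant_docs)
label_harmful = label_term(baseline, harmful, relevant_docs)
```

Here the baseline run has an average precision of 0.5, the helpful term raises it to 1.0 and is labeled 1, and the harmful term lowers it and is labeled 0.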
After re-ranking, the 10 top-ranked expansion terms finally selected are shown in the following table. As the table shows, the 10 top-ranked terms in the improved, re-ranked expansion are all relevant terms. Adding these terms to the original query, weighted by their normalized ranking scores, further improves retrieval performance.
The above embodiments explain and describe the biomedical literature retrieval method and system based on the word grading sorting algorithm provided by the present invention. The method and system use resources such as biomedical knowledge bases to expand the original query submitted by a user, and apply the word grading sorting algorithm to measure the importance of expansion terms during expansion. Query expansion supplements and refines the query submitted by the user, which helps ensure the accuracy of the query results and better meets the user's information needs.
The above is a further detailed description of the present invention in connection with specific preferred technical solutions, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, all of which shall be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A biomedical literature retrieval method based on a word grading sorting algorithm, characterized in that it comprises an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: according to the historical query log of a search engine, extract multiple queries and the top N result documents of each query, and collect the queries and result documents into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: according to a biomedical resource, extract the specialized terms in the top N result documents of each query in the query pool, and count the number of times each specialized term occurs in said result documents or the weighted sum of its occurrence counts; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest occurrence count or weighted sum as candidate expansion terms, where M is a natural number;
S3, candidate expansion term feature extraction and labeling step:
Feature extraction and labeling of the candidate expansion terms are carried out simultaneously. The relevance of a candidate expansion term is labeled by comparing the retrieval performance of the original query with the retrieval performance obtained when the candidate expansion term is added to the original query. The evaluation metrics of retrieval performance include precision, average precision, NDCG and MRR. The relevance label is assigned as follows:
$$\mathrm{label} = \begin{cases} 1, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) > \mathrm{eval}(\mathrm{query}) \\ 0, & \text{otherwise} \end{cases}$$
where eval() is the evaluation function used to assess retrieval performance, eval(query+term) is the score of the evaluation function when the candidate expansion term term is added to the query query, and eval(query) is the score of the evaluation function for the query query alone; label = 1 indicates that the candidate expansion term is relevant to the query query; label = 0 indicates that the candidate expansion term is not relevant to the query query;
Feature extraction of the candidate expansion terms prepares for training the ranking model by extracting, from the biomedical resource and from the top N result documents returned for the queries in the query pool, information such as the distribution of a candidate expansion term in the result documents, its distribution in the biomedical resource, and its relevance to the original query. After the features of a candidate expansion term have been extracted, all feature values are normalized to the interval [0, 1] as follows:
$$\mathrm{feature}' = \frac{\mathrm{feature} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}$$
where minValue and maxValue are respectively the minimum and maximum values of a given feature;
S4, candidate expansion term ranking model training step: taking the relevance labels and the features of the candidate expansion terms as input, the word grading sorting algorithm is trained to obtain a weight for each feature. The specific steps are: select one candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled not relevant to form a word group, and select a number of such word groups as training samples; randomly assign initial weights to the features of each candidate expansion term, and sort the candidate expansion terms in each word group by their feature-weighted scores; compute the overall ranking loss from the ranking results of the word groups, and dynamically adjust the weight of each feature dimension according to the gradient of the loss function; the overall ranking loss is computed from the per-group losses, where NumSample is the number of word groups in the training samples and $\mathrm{loss}_i$ is the loss of the i-th word group, obtained from the ranking position of the relevant expansion term, a higher position giving a smaller loss; the preceding procedure is iterated until the overall loss falls below a threshold or a specified number of iterations is reached, and the feature weights finally obtained constitute the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by a user, retrieve the top N1 query results; extract the specialized terms in the top N1 retrieval results, together with their features, according to the biomedical resource, where N1 is a natural number;
S6, online candidate expansion term extraction, feature extraction and scoring step: for the new query, the candidate expansion term extraction method and the feature extraction method of offline steps S2-S3 are applied, according to the biomedical resource, to the specialized terms in the top N1 retrieval results, yielding the online-stage candidate expansion terms and their features; the extracted features measure the importance of each candidate expansion term in the expanded query; the online-stage candidate expansion terms are scored with the feature weights trained in step S4, and the K1 highest-scoring candidate expansion terms are added to the new query submitted online to form the expanded query, where K1 is a natural number;
Each online-stage candidate expansion term extracted using the biomedical resource is scored as
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature in the ranking model, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the online-stage candidate expansion term term;
The online-stage candidate expansion terms are sorted by score, and the K1 top-ranked online-stage candidate expansion terms are added as expansion terms to the new query submitted online. The weight of an added term in the expanded query is expressed using a sign function sign and the weight $\mathrm{weight}_{original}$, where sign = 1 when the online-stage candidate expansion term appears in the new query submitted online and sign = 0 otherwise, and $\mathrm{weight}_{original}$ is the weight of the new query submitted online within the expanded query;
S7, query result return step: retrieval is performed with the expanded query, and the retrieval results are returned to the user.
2. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S2, the weighted sum of the occurrence counts of a specialized term in said result documents is
$$\sum_{i=1}^{N} \mathrm{count}_i \cdot d(i)$$
where $\mathrm{count}_i$ is the number of times the term occurs in the i-th document and d(i) is the decay factor of the i-th document.
3. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S3, the evaluation function eval() is the average precision function, that is:
$$\mathrm{eval}(\mathrm{query}) = \frac{1}{\mathrm{RelDoc}_{query}} \sum_{i=1}^{\mathrm{RelDoc}_{query}} \frac{i}{\mathrm{rank}(i)}$$
where $\mathrm{RelDoc}_{query}$ is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked list of result documents.
4. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S1, when no historical query log is available, records of queries and their results are obtained manually by constructing a biomedical retrieval and indexing method; said retrieval method uses a vector space model, a BM25 retrieval model, or a language model based on different smoothing methods.
5. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S4, the loss value is computed from $\mathrm{rank}_i$, where $\mathrm{rank}_i$ is the position at which the relevant candidate expansion term is ranked in the word group list.
6. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that the biomedical resource is a dictionary or knowledge base containing specialized biomedical terms.
7. The biomedical literature retrieval method based on a word grading sorting algorithm according to claim 1, characterized in that the features of said candidate expansion terms comprise: the term frequency (TF) of the candidate expansion term in the result documents; the TF-IDF value of the candidate expansion term; the number of documents in which the candidate expansion term and the original query co-occur; the number of times the candidate expansion term and the original query co-occur within the same text window; the number of times the candidate expansion term occurs in the biomedical resource; and the number of term concepts in the biomedical resource that contain the candidate expansion term, together with the inclusion relations between biomedical term concepts.
8. A biomedical literature retrieval system based on a word grading sorting algorithm, characterized in that it comprises an offline training part and an online retrieval part; said offline training part comprises the following modules:
Search engine query extraction module: used to extract, from the historical query log of the search engine, multiple queries and the top N result documents of each query, and to collect the queries and result documents into a query pool, where N is a natural number;
Candidate expansion term extraction module: used, for a given user query, to extract specialized terms from the top N result documents obtained by the search engine query extraction module using the resources inherent to the biomedical field, and to record for each specialized term the frequency with which it occurs in the result documents or the weighted sum of its occurrence counts; the specialized terms are sorted in descending order of occurrence count or weighted sum, and the M highest-ranked specialized terms are selected as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: used to extract the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and to label the relevance of each candidate expansion term according to its effect on retrieval performance;
Candidate expansion term ranking model training module: used to train the term ranking model with the word grading sorting algorithm, once the features of the candidate expansion terms have been extracted and their relevance labeled, to obtain the weight of each feature of the candidate expansion terms;
Said online retrieval part comprises:
Query reconstruction module: used to extract the specialized terms for a new query and to score the candidate expansion terms; it comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module, wherein the online search engine query extraction module retrieves the top N1 query results for a new query submitted online by a user and extracts the specialized terms in the top N1 retrieval results, together with their features, according to the biomedical resource, where N1 is a natural number; the online candidate expansion term extraction, feature extraction and scoring module uses the weights output by the term ranking model to score the candidate expansion terms and compute their corresponding weights, and adds the terms to the original query to obtain the expanded query;
Query result return module: used to return to the user the result documents retrieved with the expanded query.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510147696.5A CN104750819B (en) | 2015-03-31 | 2015-03-31 | The Biomedical literature search method and system of a kind of word-based grading sorting algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510147696.5A CN104750819B (en) | 2015-03-31 | 2015-03-31 | The Biomedical literature search method and system of a kind of word-based grading sorting algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104750819A true CN104750819A (en) | 2015-07-01 |
CN104750819B CN104750819B (en) | 2018-01-23 |
Family
ID=53590503
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510147696.5A Active CN104750819B (en) | 2015-03-31 | 2015-03-31 | The Biomedical literature search method and system of a kind of word-based grading sorting algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104750819B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040158560A1 (en) * | 2003-02-12 | 2004-08-12 | Ji-Rong Wen | Systems and methods for query expansion |
CN103942302A (en) * | 2014-04-16 | 2014-07-23 | 苏州大学 | Method for establishment and application of inter-relevance-feedback relational network |
Non-Patent Citations (3)
Title |
---|
徐博等: "基于模板抽取和丰富特征的药名词典生成", 《第五届全国信息检索学术会议论文集》 * |
朱玉皎: "个性化智能搜索引擎中查询扩展技术研究", 《万方数据》 * |
林原等: "一种基于位置优化的排序学习方法", 《山东大学学报(工学版)》 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095838A (en) * | 2016-06-01 | 2016-11-09 | 比美特医护在线(北京)科技有限公司 | A kind of data processing method and device |
CN107644011A (en) * | 2016-07-20 | 2018-01-30 | 百度(美国)有限责任公司 | System and method for the extraction of fine granularity medical bodies |
CN107644011B (en) * | 2016-07-20 | 2023-11-07 | 百度(美国)有限责任公司 | System and method for fine-grained medical entity extraction |
CN106294654A (en) * | 2016-08-04 | 2017-01-04 | 首都师范大学 | A kind of body sort method and system |
CN106919649B (en) * | 2017-01-19 | 2020-06-26 | 北京奇艺世纪科技有限公司 | Entry weight calculation method and device |
CN106919649A (en) * | 2017-01-19 | 2017-07-04 | 北京奇艺世纪科技有限公司 | A kind of method and device of entry weight calculation |
WO2018157625A1 (en) * | 2017-02-28 | 2018-09-07 | 华为技术有限公司 | Reinforcement learning-based method for learning to rank and server |
US11500954B2 (en) | 2017-02-28 | 2022-11-15 | Huawei Technologies Co., Ltd. | Learning-to-rank method based on reinforcement learning and server |
CN108509461A (en) * | 2017-02-28 | 2018-09-07 | 华为技术有限公司 | A kind of sequence learning method and server based on intensified learning |
CN110019888A (en) * | 2017-12-01 | 2019-07-16 | 北京搜狗科技发展有限公司 | A kind of searching method and device |
CN108520038A (en) * | 2018-03-31 | 2018-09-11 | 大连理工大学 | A kind of Biomedical literature search method based on Ranking Algorithm |
CN108520038B (en) * | 2018-03-31 | 2020-11-10 | 大连理工大学 | Biomedical literature retrieval method based on sequencing learning algorithm |
CN109508392A (en) * | 2018-09-28 | 2019-03-22 | 中国标准化研究院 | A kind of technical literature index announcement search method |
CN109857731A (en) * | 2019-01-11 | 2019-06-07 | 吉林大学 | A kind of peek-a-boo and search method of biomedicine entity relationship |
CN113434767A (en) * | 2021-07-07 | 2021-09-24 | 携程旅游信息技术(上海)有限公司 | UGC text content mining method, system, device and storage medium |
CN113486156A (en) * | 2021-07-30 | 2021-10-08 | 北京鼎普科技股份有限公司 | ES-based associated document retrieval method |
CN113742459A (en) * | 2021-11-05 | 2021-12-03 | 北京世纪好未来教育科技有限公司 | Vocabulary display method and device, electronic equipment and storage medium |
CN115016873A (en) * | 2022-05-05 | 2022-09-06 | 上海乾臻信息科技有限公司 | Front-end data interaction method and system, electronic equipment and readable storage medium |
CN115659047A (en) * | 2022-11-11 | 2023-01-31 | 南京汇宁桀信息科技有限公司 | Medical literature retrieval method based on hybrid algorithm |
CN115659047B (en) * | 2022-11-11 | 2023-07-28 | 南京汇宁桀信息科技有限公司 | Medical document retrieval method based on hybrid algorithm |
CN117076658A (en) * | 2023-08-22 | 2023-11-17 | 南京朗拓科技投资有限公司 | Quotation recommendation method, device and terminal based on information entropy |
CN117076658B (en) * | 2023-08-22 | 2024-05-03 | 南京朗拓科技投资有限公司 | Quotation recommendation method, device and terminal based on information entropy |
Also Published As
Publication number | Publication date |
---|---|
CN104750819B (en) | 2018-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104750819A (en) | Biomedicine literature search method and system based on word grading sorting algorithm | |
CN104391942B (en) | Short essay eigen extended method based on semantic collection of illustrative plates | |
CN104199857B (en) | A kind of tax document hierarchy classification method based on multi-tag classification | |
CN104834735B (en) | A kind of documentation summary extraction method based on term vector | |
CN109344236A (en) | One kind being based on the problem of various features similarity calculating method | |
CN106663125A (en) | Question sentence generation device and computer program | |
CN111401040B (en) | Keyword extraction method suitable for word text | |
CN108520038B (en) | Biomedical literature retrieval method based on sequencing learning algorithm | |
CN101751455B (en) | Method for automatically generating title by adopting artificial intelligence technology | |
CN110879831A (en) | Chinese medicine sentence word segmentation method based on entity recognition technology | |
CN101963971A (en) | Use relevance feedback to carry out the method and the corresponding storage medium of database search | |
CN103927358A (en) | Text search method and system | |
CN103150381B (en) | A kind of High-precision Chinese predicate identification method | |
CN101539907A (en) | Part-of-speech tagging model training device and part-of-speech tagging system and method thereof | |
CN103729432A (en) | Method for analyzing and sequencing academic influence of theme literature in citation database | |
CN104765779A (en) | Patent document inquiry extension method based on YAGO2s | |
CN107291895A (en) | A kind of quick stratification document searching method | |
CN114090861A (en) | Education field search engine construction method based on knowledge graph | |
CN107436955A (en) | A kind of English word relatedness computation method and apparatus based on Wikipedia Concept Vectors | |
CN110851593A (en) | Complex value word vector construction method based on position and semantics | |
CN107679121B (en) | Mapping method and device of classification system, storage medium and computing equipment | |
Mohsen et al. | On the automatic construction of an Arabic thesaurus | |
CN113269477B (en) | Scientific research project query scoring model training method, query method and device | |
CN103793474B (en) | Knowledge management oriented user-defined knowledge classification method | |
CN110990376B (en) | Subject classification automatic indexing method based on multi-factor mixed ordering mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |