CN104750819A - Biomedicine literature search method and system based on word grading sorting algorithm

Publication number
CN104750819A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510147696.5A
Other languages
Chinese (zh)
Other versions
CN104750819B (en)
Inventor
徐博
林鸿飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Application filed by Dalian University of Technology
Priority to CN201510147696.5A
Publication of CN104750819A
Application granted
Publication of CN104750819B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a biomedical literature retrieval method and system based on a word grading sorting algorithm. The method comprises the following steps: 1, queries and their results are extracted from a search engine; 2, candidate expansion terms are extracted; 3, the features of the candidate expansion terms are extracted and the terms are labeled; 4, a ranking model for the candidate expansion terms is trained; 5, a new query is submitted online and its results are extracted from the search engine; 6, candidate expansion terms are extracted online, their features are extracted online, and they are scored online; 7, the retrieval result is returned. The system comprises a search engine query extraction module, a candidate expansion term extraction module, a candidate expansion term feature extraction and labeling module, a candidate expansion term ranking model training module, a query reconstruction module and a query result return module. Approaching the problem from the angle of query expansion, the method and system use the word grading sorting algorithm together with the dictionary resources inherent to the biomedical field to select, during query expansion, the specialized terms that best express the user's information need, thereby completing the retrieval task and improving retrieval performance.

Description

A biomedical literature retrieval method and system based on a word grading sorting algorithm
Technical field
The present invention relates to the technical field of data mining and search engines, and in particular to a biomedical literature retrieval method and system based on a word grading sorting algorithm.
Background art
In recent years, with the rapid development of the biomedical field, biomedical research has produced many valuable results. These results have not only advanced the treatment of diseases that once seemed incurable but, from a broader perspective, have also deepened humanity's understanding of itself.
However, as the volume of biomedical literature grows rapidly, the amount of related information is increasing exponentially. This mass of documents and information makes it difficult for biomedical researchers and practitioners to obtain information, and traditional manual methods of information acquisition are becoming impractical. Information retrieval techniques are therefore needed to help these users obtain the information they require.
Traditional information retrieval techniques rank documents or web pages by relevance to the query submitted by the user and return the ranked results. Applying traditional information retrieval methods directly to biomedical literature retrieval, however, rarely yields good retrieval performance. The reason is that they do not fully consider the inherent characteristics of the biomedical field: for example, the field has a large specialized vocabulary, and these specialized terms often have many synonyms and abbreviations. If these characteristics are fully taken into account in traditional information retrieval methods, the performance of biomedical information retrieval can be improved further.
Query expansion is one of the key techniques in traditional information retrieval. Starting from the original query submitted by the user, it supplements and refines the query according to the user's retrieval intent, yielding a query that better matches that intent and improving retrieval performance. Existing query expansion methods fall into two broad classes. The first class is based on the document collection: these methods take all or part of the document collection as the object of study and extract content related to the query from it to improve the original query. The second class is based on external expansion resources, which mainly include dictionary resources, retrieval system query logs, anchor text and Wikipedia. Many studies show that using external expansion resources to improve the original query accomplishes the query expansion task better and in turn improves retrieval performance.
Because the biomedical field has many domain resources such as dictionaries, retrieval performance is very likely to improve if these resources are fully used during retrieval to supplement and refine the query submitted by the user.
To build a literature retrieval system for the biomedical field, one must first understand the characteristics and resources of the field. Biomedical documents contain a large number of specialized terms, and these terms involve complications such as numerous synonyms and abbreviations, which poses a major challenge for building a retrieval system. For example, the drug acetaminophen is named paracetamol in the international standard drug classification, and it is also identified by the chemical formula C8H9NO2 or the code N02BE01. With so many names, a search that uses only one of them is unlikely to retrieve all relevant documents. Fortunately, the biomedical field also has many established knowledge bases and resources, such as Medical Subject Headings (MeSH) and the Gene Ontology (GO). If these resources are fully used during retrieval, the performance of biomedical literature retrieval can be improved greatly.
Learning to rank is the general name for a family of supervised learning algorithms that rank documents in information retrieval. Its main characteristic is that it applies machine learning techniques to the ranking problem in information retrieval, and it achieves good ranking performance. The ranking problem can also be viewed as the problem of selecting an optimal item, so learning-to-rank algorithms have in recent years been applied to a number of other tasks, such as recommending items to users in recommender systems based on the history of users and items.
Summary of the invention
The object of the present invention is to provide a biomedical literature retrieval method and system based on a word grading sorting algorithm that supplies users with more accurate biomedical literature, meets their information needs more effectively, and effectively supplements and refines user queries.
The technical scheme adopted by the present invention to solve the problems of the prior art is a biomedical literature retrieval method based on a word grading sorting algorithm, comprising an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: according to the historical query records of the search engine, extract multiple queries and the top N query result documents obtained for each query, and collect the queries and query result documents into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: according to a biomedical resource, extract the specialized terms in the top N query result documents of each query in the query pool, and compute for each specialized term either its number of occurrences in those query result documents or the weighted sum of its occurrence counts; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest occurrence count or the highest weighted sum as candidate expansion terms, where M is a natural number;
S3, feature extraction and labeling step for candidate expansion terms:
Feature extraction and labeling of the candidate expansion terms are carried out at the same time. The relevance label of a candidate expansion term is assigned by comparing the retrieval performance of the original query with the retrieval performance obtained when the candidate expansion term is added to the original query. The evaluation metrics for retrieval performance include precision, average precision (MAP), NDCG and MRR. Relevance is labeled as follows:
$$
\mathrm{label} =
\begin{cases}
1, & \mathrm{eval}(\mathit{query} + \mathit{term}) > \mathrm{eval}(\mathit{query}) \\
0, & \mathrm{eval}(\mathit{query} + \mathit{term}) \le \mathrm{eval}(\mathit{query})
\end{cases}
$$
where eval() is the evaluation metric function used to measure retrieval performance, eval(query + term) is its score when the candidate expansion term term is added to the query query, and eval(query) is its score for the query query alone. A label of 1 indicates that the candidate expansion term is relevant to the query; a label of 0 indicates that it is not relevant;
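The labeling rule can be sketched as follows. This is a minimal illustration, not the patented implementation: `evaluate` is a hypothetical callable standing in for eval(), and the toy scorer and its per-term gains are invented for demonstration only.

```python
from typing import Callable, Dict, List


def label_candidates(
    query: List[str],
    candidates: List[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[str, int]:
    """Label a candidate 1 if adding it to the query raises the evaluation
    score, otherwise 0 (a tie counts as not relevant)."""
    baseline = evaluate(query)
    labels = {}
    for term in candidates:
        expanded_score = evaluate(query + [term])
        labels[term] = 1 if expanded_score > baseline else 0
    return labels


# Toy stand-in for eval(): hypothetical per-term gains, not real retrieval scores.
_GAIN = {"paracetamol": 0.05, "tablet": 0.0, "weather": -0.02}


def toy_evaluate(terms: List[str]) -> float:
    return 0.25 + sum(_GAIN.get(t, 0.0) for t in terms)


labels = label_candidates(
    ["acetaminophen"], ["paracetamol", "tablet", "weather"], toy_evaluate
)
```

In a real pipeline `evaluate` would run the retrieval system and compute the chosen metric against relevance judgments, which makes labeling cost one retrieval run per candidate term.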
For feature extraction, the following are extracted from the biomedical resource and from the top N query result documents returned for each query in the query pool, in preparation for training the ranking model: the distribution information of the candidate expansion term in the result documents, the distribution information of the candidate expansion term in the biomedical resource, and the correlation information between the candidate expansion term and the original query. After the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]. The normalization is:
$$
\mathrm{newFeatureValue} = \frac{\mathrm{oldFeatureValue} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}
$$
where minValue and maxValue are the minimum and maximum values of the given feature, respectively;
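A minimal sketch of this min-max normalization for a single feature is shown below. The handling of a constant feature (where maxValue equals minValue) is an assumption added here to avoid division by zero; the source does not address that case.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one feature's values into [0, 1] using its minimum and maximum."""
    min_value, max_value = min(values), max(values)
    span = max_value - min_value
    if span == 0:
        # Assumption: a constant feature carries no ranking signal, so map it to 0.
        return [0.0 for _ in values]
    return [(v - min_value) / span for v in values]


tf_values = [3.0, 7.0, 11.0]
normalized = min_max_normalize(tf_values)
```

Normalizing each feature separately keeps features with large raw ranges, such as term counts, from dominating the weighted score.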
S4, candidate expansion term ranking model training step: according to the relevance labels and features of the candidate expansion terms, use the word grading sorting algorithm to train the weight of each feature. The specific steps are: select a candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled not relevant to form a word group, and select a number of such word groups as training samples; randomly assign initial weights to the features of the candidate expansion terms, and rank the candidate expansion terms in each word group by their feature-weighted scores; compute the overall ranking loss from the ranking results of the word groups, and dynamically adjust the weight of each feature dimension according to the gradient of the loss function. The overall ranking loss aggregates the per-group losses, where NumSample is the number of word groups and loss_i is the loss of the i-th word group; loss_i is obtained from the ranking position of the relevant expansion term, and the higher that position, the smaller the loss. The preceding process is iterated until the overall loss falls below a threshold or a specified number of iterations is reached, and the feature weights finally obtained are used as the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, retrieve the top N1 query results; extract the specialized terms in the top N1 retrieval results and their features according to the biomedical resource, where N1 is a natural number;
S6, online candidate expansion term extraction, feature extraction and scoring step: for the new query, use the candidate expansion term extraction method and feature extraction method of offline steps S2 and S3 to extract, according to the biomedical resource, the specialized terms in the top N1 retrieval results and their features, obtaining the online candidate expansion terms; the extracted features measure the importance of each candidate expansion term in the expanded query. Using the feature weights trained in step S4, score the online candidate expansion terms, and add the K1 highest-scoring candidate expansion terms to the new query submitted online to form the expanded query, where K1 is a natural number;
Each online candidate expansion term extracted using the biomedical resource is scored as
$$
\sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathit{term})
$$
where FeatureNum is the total number of features, a_i is the weight of the i-th feature in the ranking model, and feature_i(term) is the value of the i-th feature of the online candidate expansion term term;
The online candidate expansion terms are sorted by score, and the K1 highest-ranked terms are added to the new query submitted online as expansion terms. The weight of an added candidate expansion term in the expanded query can be expressed as
$$
\mathrm{weight} = \sum_{i=1}^{\mathrm{count}} \mathrm{weight}_i \cdot \mathrm{feature}_i + \mathrm{sign} \cdot \mathrm{weight}_{\mathrm{original}}
$$
where sign is an indicator function: sign = 1 when the candidate expansion term also appears in the new query submitted online, and sign = 0 otherwise; weight_original is the weight of the new query submitted online in the expanded query;
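The scoring, top-K1 selection and weighting of step S6 can be sketched as below. This is an illustration under assumptions: the function names, the example feature vectors and the default `weight_original = 1.0` are hypothetical, and the learned feature weights are taken as given.

```python
from typing import Dict, List, Sequence, Tuple


def score_term(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score: the sum of a_i * feature_i(term) over all features."""
    return sum(a * f for a, f in zip(weights, features))


def expand_query(
    query_terms: List[str],
    candidate_features: Dict[str, Sequence[float]],
    weights: Sequence[float],
    k1: int,
    weight_original: float = 1.0,  # assumed value, not given in the source
) -> List[Tuple[str, float]]:
    """Return the K1 best-scoring candidates with their expanded-query weights."""
    scored = sorted(
        ((score_term(weights, feats), term)
         for term, feats in candidate_features.items()),
        reverse=True,
    )
    expanded = []
    for score, term in scored[:k1]:
        # sign = 1 when the candidate already appears in the submitted query
        sign = 1 if term in query_terms else 0
        expanded.append((term, score + sign * weight_original))
    return expanded


weights = [0.5, 0.5]
candidate_features = {
    "paracetamol": [1.0, 0.5],
    "analgesic": [0.5, 0.5],
    "tablet": [0.0, 0.25],
    "acetaminophen": [1.0, 1.0],
}
expanded = expand_query(["acetaminophen"], candidate_features, weights, k1=2)
```

Here "acetaminophen" is already in the query, so its weight is its score plus `weight_original`, while "paracetamol" receives only its score.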
S7, query result return step: retrieve with the expanded query and return the retrieval results to the user.
In step S2, the weighted sum of the occurrence counts of a specialized term in the query result documents is
$$
\mathrm{score}(\mathit{term}) = \sum_{i=1}^{N} \mathrm{count}_i \cdot d(i)
$$
where count_i is the number of times the term occurs in the i-th document and d(i) is the decay factor of the i-th document.
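A minimal sketch of the decay-weighted count and the top-M selection follows. The source does not specify the decay factor d(i); the default `1 / i` used here is an assumption chosen only so that higher-ranked documents weigh more, and the term counts are invented.

```python
from typing import Callable, Dict, List


def weighted_term_counts(
    ranked_doc_counts: List[Dict[str, int]],
    decay: Callable[[int], float] = lambda i: 1.0 / i,  # assumed form of d(i)
) -> Dict[str, float]:
    """Sum count_i * d(i) per term over documents in rank order (i starts at 1)."""
    scores: Dict[str, float] = {}
    for i, counts in enumerate(ranked_doc_counts, start=1):
        for term, count_i in counts.items():
            scores[term] = scores.get(term, 0.0) + count_i * decay(i)
    return scores


def top_m(scores: Dict[str, float], m: int) -> List[str]:
    """Select the M terms with the highest weighted sum."""
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:m]


# Per-document term counts for the top 2 results of one query (toy data).
docs = [{"insulin": 2, "glucose": 1}, {"glucose": 4, "kinase": 1}]
scores = weighted_term_counts(docs)
candidates = top_m(scores, 2)
```

Passing `decay=lambda i: 1.0` reduces the weighted sum to the plain occurrence count, the other ordering criterion the step allows.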
In step S3, the evaluation metric function eval() is the average precision function, that is:
$$
\mathrm{eval}_{\mathrm{MAP}} = \frac{1}{\mathrm{RelDoc}_{\mathit{query}}} \cdot \sum_{i=1}^{\mathrm{RelDoc}_{\mathit{query}}} \frac{i}{\mathrm{rank}(i)}
$$
where RelDoc_query is the number of relevant documents for the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list.
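The average precision for one query can be computed as in the sketch below. Document identifiers are hypothetical, and treating relevant documents that never appear in the ranked list as contributing zero is an assumption consistent with common practice rather than something the source states.

```python
from typing import List, Set


def average_precision(ranked_doc_ids: List[str], relevant: Set[str]) -> float:
    """(1 / RelDoc_query) * sum over relevant documents of i / rank(i)."""
    if not relevant:
        return 0.0
    found = 0      # i: number of relevant documents seen so far
    total = 0.0
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            found += 1
            total += found / position  # i / rank(i)
    # Relevant documents missing from the list add nothing to the sum.
    return total / len(relevant)


ranking = ["d1", "d2", "d3", "d4", "d5"]
ap = average_precision(ranking, {"d1", "d4"})
```

With relevant documents at positions 1 and 4, the sum is 1/1 + 2/4, which divided by 2 relevant documents gives 0.75. Averaging this value over several queries gives the mean average precision.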
In step S1, when no historical query records are available, query records and their results are obtained artificially by building a biomedical retrieval index and running a retrieval method against it; the retrieval method uses a vector space model, the BM25 retrieval model, or a language model with one of various smoothing methods.
In step S4, the loss value loss_i is computed from rank_i, where rank_i is the position of the relevant candidate expansion term in the ranked word group list.
The biomedical resource is a dictionary or knowledge base containing biomedical specialized terms.
The features of a candidate expansion term comprise: the frequency (TF) with which the candidate expansion term occurs in the result documents; the TF-IDF value of the candidate expansion term; the number of documents in which the candidate expansion term and the original query co-occur; the number of times the candidate expansion term and the original query co-occur within the same text window; the number of times the candidate expansion term occurs in the biomedical resource; and the number of term concepts in the biomedical resource that contain the candidate expansion term, which reflects the inclusion relations between biomedical term concepts.
A biomedical literature retrieval system based on a word grading sorting algorithm comprises an offline training part and an online retrieval part. The offline training part comprises the following modules:
Search engine query extraction module: extracts, according to the historical query records of the search engine, multiple queries and the top N query result documents obtained for each query, and collects the queries and query result documents into a query pool, where N is a natural number;
Candidate expansion term extraction module: for a given user query, uses the resources inherent to the biomedical field to extract specialized terms from the top N query result documents obtained by the search engine query extraction module, and records for each specialized term its number of occurrences in the query result documents or the weighted sum of its occurrence counts; sorts the specialized terms in descending order of occurrence count or weighted sum, and selects the M highest-ranked specialized terms as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: extracts the relevant features of the candidate expansion terms obtained by the candidate expansion term extraction module, and labels the relevance of each candidate expansion term according to its effect on retrieval performance;
Candidate expansion term ranking model training module: after the features of the candidate expansion terms have been extracted and their relevance labeled, trains the term ranking model with the word grading sorting algorithm to obtain the weight of each feature of the candidate expansion terms;
The online retrieval part comprises:
Query reconstruction module: extracts the specialized terms for a new query and scores the candidate expansion terms. It comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module. The online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user, and extracts the specialized terms in the top N1 retrieval results and their features according to the biomedical resource, where N1 is a natural number. The online candidate expansion term extraction, feature extraction and scoring module uses the feature weights output by the term ranking model to score the candidate expansion terms and compute their weights, and adds the terms to the original query to obtain the expanded query;
Query result return module: returns to the user the result documents retrieved with the expanded query.
The beneficial effects of the present invention are as follows. Approaching the problem from the angle of query expansion, the invention uses the word grading sorting algorithm together with resources such as the dictionaries inherent to the biomedical field to select specialized terms that express the user's information need during query expansion. This completes the retrieval task more efficiently and provides the user with retrieval results that better fit that need. The invention uses the resources of the biomedical field to supplement and refine the original query, and thereby improves retrieval performance. With the TREC Genomics track literature collection as the data set, literature retrieval with the traditional BM25 retrieval model as the baseline achieves a retrieval accuracy of 25.62%; retrieval with the method and system of the present invention on the same basis achieves a retrieval accuracy of 26.30%. Retrieval performance is thus improved significantly, and the retrieval method of the present invention can effectively retrieve the biomedical literature most relevant to the user's query, improving user satisfaction.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the retrieval method of the present invention;
Fig. 2 is a schematic diagram of the logical structure of the retrieval system of the present invention.
Embodiment
The present invention is described below with reference to the drawings and a specific embodiment:
Fig. 1 is a schematic flow chart of a biomedical literature retrieval method based on a word grading sorting algorithm according to the present invention. The method comprises an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: according to the historical query records of the search engine, extract multiple queries and the top N query result documents obtained for each query, and collect the queries and query result documents into a query pool, where N is a natural number. In the present embodiment, N = 10;
Here, the historical query records of the search engine mainly refer to the query history recorded by a retrieval system for biomedical literature together with the corresponding query results; these queries and results are used offline to train the ranking model.
When no relevant historical query records are available, query records and their retrieval results can be obtained artificially by building a biomedical retrieval index. The retrieval method can use any of several ranking models from traditional information retrieval, including but not limited to the vector space model, the BM25 retrieval model, and language models with various smoothing methods.
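As an illustration of one of the retrieval models named above, the sketch below scores tokenized documents with a common Okapi BM25 variant and returns the top N. The parameter values `k1 = 1.2` and `b = 0.75` are conventional defaults, not values from the source, and the tiny corpus is invented.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq: Counter = Counter()
    for d in docs:
        doc_freq.update(set(d))  # count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5) + 1.0)
            norm = tf[q] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_n(query: List[str], docs: List[List[str]], n: int) -> List[int]:
    """Indices of the N highest-scoring documents, best first."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:n]


corpus = [
    ["aspirin", "pain"],
    ["insulin", "glucose", "insulin"],
    ["weather", "report"],
]
bm25 = bm25_scores(["insulin"], corpus)
best = top_n(["insulin"], corpus, 1)
```

Running such a model over a set of constructed queries yields the query and result records that step S1 would otherwise take from the search engine's history.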
S2, candidate expansion term extraction step: according to a biomedical resource, extract the specialized terms in the top N query result documents of each query in the query pool, and compute for each specialized term either its number of occurrences in those query result documents or the weighted sum of its occurrence counts; sort the specialized terms in descending order of occurrence count or weighted sum, and select the M specialized terms with the highest occurrence count or the highest weighted sum as candidate expansion terms, where M is a natural number;
Here, the biomedical resource refers to resources such as dictionaries or knowledge bases containing biomedical specialized terms, including but not limited to Medical Subject Headings (MeSH), the Gene Ontology (GO), and the Metathesaurus, Semantic Network, and SPECIALIST Lexicon and Lexical Tools released by the Unified Medical Language System (UMLS).
Taking MeSH as the biomedical resource used in the present invention, the specialized terms in the top N query result documents of a query are extracted, and each extracted term is associated with its number of occurrences in the documents or the weighted sum of its occurrence counts. For example, the weighted sum of the occurrence counts of a specialized term term in the top N documents is computed as
$$
\mathrm{score}(\mathit{term}) = \sum_{i=1}^{N} \mathrm{count}_i \cdot d(i)
$$
where count_i is the number of times the term occurs in the i-th document and d(i) is the decay factor of the i-th document. The weighted sum weights the term frequencies from different documents so that frequencies in higher-ranked documents carry more weight and terms contained in lower-ranked documents receive lower scores. The selected specialized terms are sorted in descending order of the plain occurrence count count(term), or in descending order of score(term), and the top M terms are selected as candidate expansion terms. In the present embodiment, M = 150.
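Matching specialized terms against a dictionary resource might look like the sketch below. It assumes the resource has been reduced to a flat list of term strings; the four entries are hypothetical stand-ins, not real MeSH data, and the whole-word, case-insensitive matching rule is a simplification chosen for illustration.

```python
import re
from typing import Dict, Iterable


def count_dictionary_terms(text: str, dictionary: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each dictionary term."""
    counts = {}
    for term in dictionary:
        # Lookarounds keep "insulin" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[term] = n
    return counts


dictionary = ["insulin", "glucose", "insulin resistance", "kinase"]
text = "Insulin resistance alters glucose uptake; insulin signalling was reduced."
term_counts = count_dictionary_terms(text, dictionary)
```

Note that a multi-word entry and its head word can both match the same span, as "insulin resistance" and "insulin" do here; a production extractor would need a policy for such overlaps.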
S3, feature extraction and relevance labeling step for candidate expansion terms:
Feature extraction and labeling of the candidate expansion terms are carried out at the same time. The relevance label of a candidate expansion term is assigned by comparing the retrieval performance of the original query with the retrieval performance obtained when the expansion term is added to the original query. The idea behind the labeling is to add a single candidate expansion term to the original query and retrieve with it; if retrieval performance improves, the expansion term is labeled as relevant to the original query. The evaluation metrics for retrieval performance include but are not limited to precision, average precision (MAP), NDCG and MRR. The labeling rule is as follows:
$$
\mathrm{label} =
\begin{cases}
1, & \mathrm{eval}(\mathit{query} + \mathit{term}) > \mathrm{eval}(\mathit{query}) \\
0, & \mathrm{eval}(\mathit{query} + \mathit{term}) \le \mathrm{eval}(\mathit{query})
\end{cases}
$$
where eval() is the evaluation metric function used to measure retrieval performance, eval(query + term) is its score when the candidate expansion term term is added to the given query query, and eval(query) is its score for the given query query alone. When the evaluation score of retrieval with the original query plus a candidate term is greater than the evaluation score of retrieval with the original query alone, the candidate expansion term is labeled 1, meaning that the term is relevant to the original query. When the evaluation score with the candidate term added is not greater than that of the original query alone, the candidate expansion term is labeled 0, meaning that the term is not relevant to the original query.
In the present embodiment, the evaluation function eval() is the average precision, that is:
$$
\mathrm{eval}_{\mathrm{MAP}} = \frac{1}{\mathrm{RelDoc}_{\mathit{query}}} \cdot \sum_{i=1}^{\mathrm{RelDoc}_{\mathit{query}}} \frac{i}{\mathrm{rank}(i)}
$$
where RelDoc_query is the number of relevant documents for the given query query, and rank(i) is the position of the i-th relevant document in the ranked result list. For example, rank(3) = 5 means that the 3rd relevant document appears at position 5 of the ranked result list.
For feature extraction, the following are extracted from the biomedical resource and from the top N query result documents returned for each query in the query pool, in preparation for training the ranking model: the distribution information of the candidate expansion term in the result documents, the distribution information of the candidate expansion term in the biomedical resource, and the correlation information between the candidate expansion term and the original query. After the features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0, 1]. The normalization is:
$$
\mathrm{newFeatureValue} = \frac{\mathrm{oldFeatureValue} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}
$$
where minValue and maxValue are the minimum and maximum values of the given feature, respectively.
The features of an expansion term specifically comprise:
1, the frequency (TF) with which the candidate expansion term occurs in the result documents. This feature is obtained from the number of times the specialized term term occurs in the result documents.
2, the TF-IDF value of the candidate expansion term. TF-IDF is one of the classical models of information retrieval and can be used to measure the relative importance of a term. It is computed as:
$$
\mathrm{score}_{\mathrm{TF\text{-}IDF}} = \mathrm{count}(\mathit{term}) \cdot \log \frac{\mathrm{TotalDoc}}{\mathrm{df}(\mathit{term})}
$$
where count(term) is the number of times the candidate expansion term occurs in the result document under consideration, TotalDoc is the total number of documents in the training data, and df(term) is the number of documents in which the candidate expansion term occurs.
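The TF-IDF feature can be computed as in this minimal sketch. The use of the natural logarithm is an assumption, since the source does not state the base, and the four-document corpus is invented.

```python
import math
from typing import List


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """count(term) * log(TotalDoc / df(term)) for one tokenized document."""
    count = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    if count == 0 or df == 0:
        return 0.0
    return count * math.log(len(corpus) / df)


corpus = [
    ["insulin", "lowers", "glucose", "insulin"],
    ["aspirin", "relieves", "pain"],
    ["glucose", "assay"],
    ["weather", "report"],
]
insulin_score = tf_idf("insulin", corpus[0], corpus)
glucose_score = tf_idf("glucose", corpus[0], corpus)
```

"insulin" occurs twice in the first document and in only one of four documents, so it scores higher than "glucose", which occurs once and appears in two documents.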
3, the number of documents in which the candidate expansion term and the original query co-occur. This feature can be used to compute the degree of correlation between the original query and the candidate expansion term.
4, the number of times the candidate expansion term and the original query co-occur within the same text window. This feature measures the degree of correlation between the query words of the original query and the candidate expansion term within a limited range, where the text window refers to the number of words separating the expansion term from an original query word within a document in which both occur.
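Features 3 and 4 can be sketched as below under stated assumptions: terms and query words are single tokens, a document "contains the query" if it contains any query word, and two positions fall in the same window when they are at most `window` tokens apart. None of these details is fixed by the source.

```python
from typing import List


def cooccurring_doc_count(term: str, query_terms: List[str],
                          docs: List[List[str]]) -> int:
    """Feature 3: documents containing the term and at least one query word."""
    return sum(
        1 for d in docs
        if term in d and any(q in d for q in query_terms)
    )


def window_cooccurrences(term: str, query_terms: List[str],
                         doc: List[str], window: int) -> int:
    """Feature 4: (term, query word) position pairs at most `window` tokens apart."""
    term_pos = [i for i, tok in enumerate(doc) if tok == term]
    query_pos = [i for i, tok in enumerate(doc) if tok in query_terms]
    return sum(
        1 for t in term_pos for q in query_pos
        if t != q and abs(t - q) <= window
    )


doc = ["insulin", "lowers", "blood", "glucose", "and", "insulin", "binds", "receptor"]
docs = [doc, ["glucose", "assay"], ["insulin", "pump"]]
doc_feature = cooccurring_doc_count("glucose", ["insulin"], docs)
narrow = window_cooccurrences("glucose", ["insulin"], doc, window=2)
wide = window_cooccurrences("glucose", ["insulin"], doc, window=3)
```

In the example, "glucose" sits at position 3 and "insulin" at positions 0 and 5, so a window of 2 captures one pair and a window of 3 captures both.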
5, the number of times the candidate expansion term occurs in the biomedical resource, for example MeSH. This feature measures the distribution information of the candidate expansion term in the biomedical resource.
6, the number of term concepts in the biomedical resource, for example MeSH, that contain the candidate expansion term. Biomedical term concepts often include one another, so this feature likewise measures the importance of a candidate term in the biomedical resource.
Among the candidate expansion term features extracted above, features 1 and 2 measure the distribution information of the candidate expansion term in the literature collection; features 3 and 4 measure the degree of correlation between the candidate expansion term and the original query; and features 5 and 6 measure the distribution information of the candidate expansion term in the biomedical resource. The expansion term features of the present invention include but are not limited to the features above. Extracting these features and using them as the input of the word grading sorting algorithm gives a better measure of the importance of each candidate expansion term.
S4, candidate expansion term ranking model training step: taking the relevance labels and the various features of the candidate expansion terms obtained in step S3 as input, the ranking model of the word grading sorting algorithm is trained to obtain a weight for each feature. The specific steps are as follows. A candidate expansion term labeled relevant in step S3 (that is, a candidate expansion term whose label is 1) and several candidate expansion terms labeled irrelevant (candidate expansion terms whose label is 0) are selected to form a word group, and a number of such word groups are selected as training samples. Initial weights are assigned at random to the features of each candidate expansion term, and the expansion terms in each word group are ranked by their feature-weighted scores. From the ranking result of each word group, the overall ranking loss is computed, and the weight of each feature dimension is adjusted dynamically according to the gradient of the loss function. The ranking loss is defined in terms of NumSample, the number of word groups, and $loss_i$, the loss of the i-th word group; this loss is obtained from the ranking position of the relevant expansion term, and the nearer the front that position is, the smaller the loss. The preceding process is iterated until the overall loss falls below a given threshold or the specified number of iterations is reached, and the feature weights finally obtained are taken as the trained ranking model. In this embodiment, training stops after 100 iterations.
In this embodiment, the loss is defined in terms of $rank_i$, the position at which the relevant candidate expansion term is ranked in the word group list: when it is ranked first the loss is 0, and when it is ranked last the loss takes its maximum value. The loss may also be computed by formulas other than this one.
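The position-based loss described above can be sketched as follows. The exact loss formula is not legible in this text, so the normalized form (rank - 1) / (group size - 1) and the averaging over word groups are assumptions chosen only to satisfy the stated properties (0 when the relevant term is ranked first, maximal when it is ranked last). The function names are hypothetical, and the gradient-based weight update is not shown.

```python
from typing import List, Sequence, Tuple

# A word group: (feature vector, label) pairs, exactly one of which has label 1.
Group = List[Tuple[Sequence[float], int]]


def weighted_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Feature-weighted score of one candidate expansion term."""
    return sum(a * f for a, f in zip(weights, features))


def group_loss(weights: Sequence[float], group: Group) -> float:
    """Assumed loss: 0 if the relevant term ranks first, 1 if it ranks last."""
    ranked = sorted(group, key=lambda item: weighted_score(weights, item[0]), reverse=True)
    rank = next(pos for pos, (_, label) in enumerate(ranked, start=1) if label == 1)
    if len(group) == 1:
        return 0.0
    return (rank - 1) / (len(group) - 1)


def overall_loss(weights: Sequence[float], groups: List[Group]) -> float:
    """Assumed aggregation: mean of the per-group losses over all word groups."""
    return sum(group_loss(weights, g) for g in groups) / len(groups)


# Two toy word groups with two features each (hypothetical values).
groups = [
    [([0.9, 0.1], 1), ([0.2, 0.8], 0), ([0.1, 0.3], 0)],
    [([0.3, 0.2], 1), ([0.8, 0.1], 0)],
]
```

Because a rank position is not differentiable in the weights, a practical trainer would need a smooth surrogate or a numerical gradient; this sketch only evaluates the loss for a given weight vector.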
In the ranking model, the final score of an expansion term is computed as follows:
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the candidate term term. The ranking model obtained after training can be used to select expansion terms relevant to a test query. All of the above steps are performed offline.
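The scoring formula above is a plain weighted sum, which can be sketched as follows. The weight and feature values are hypothetical, and the feature values are assumed to be already normalized to [0, 1].

```python
from typing import Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """score(term) = sum over i of a_i * feature_i(term)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length (FeatureNum)")
    return sum(a * f for a, f in zip(weights, features))


# Hypothetical trained weights a_1..a_3 and one candidate term's feature values.
trained_weights = [0.5, 0.25, 0.25]
term_features = [1.0, 0.0, 0.5]
term_score = score(trained_weights, term_features)
```

Since the model is linear, scoring a candidate term online costs one pass over its feature vector, which suits the online query stage.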
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, the top N1 query results are retrieved; the specialized terms in the top N1 retrieval results and their various features are extracted according to the biomedical resource, where N1 is a natural number;
It should be noted that this step takes place online: after the user submits a query to the biomedical literature search engine, the method automatically obtains the N1 top-ranked results of the initial retrieval for use in expanding the user's query and in related processing. This process is transparent to the user.
S6, online candidate expansion term extraction, feature extraction and scoring step: using the candidate expansion term extraction method and the candidate expansion term feature extraction method of offline steps S2-S3, the specialized terms in the top N1 retrieval results of the new query and their various features are extracted according to the biomedical resource, yielding the online-stage candidate expansion terms; the extracted features are used to measure the importance of a candidate expansion term in the expanded query. The online-stage candidate expansion terms are scored with the feature weights trained in step S4, and the K1 highest-scoring online-stage candidate expansion terms are added to the new query submitted online to form the online-stage expanded query, where K1 is a natural number;
Each online-stage candidate expansion term extracted using the biomedical resource is scored, and its score is
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature in the ranking model, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the online-stage candidate expansion term term;
The online-stage candidate expansion terms are ranked by score, and the K1 top-ranked terms are added to the new query as online-stage candidate expansion terms. The weight of an added term in the expanded query can be expressed as
$$\mathrm{weight} = \sum_{i=1}^{\mathrm{count}} \mathrm{weight}_i \cdot \mathrm{feature}_i + \mathrm{sign} \cdot \mathrm{weight}_{original}$$
where sign is a sign function, with sign = 1 when the online-stage candidate expansion term appears in the new query submitted online and sign = 0 otherwise, and $\mathrm{weight}_{original}$ is the weight of the new query submitted online in the expanded query;
The final expanded query takes the following form:
$$(\mathrm{weight}_1\ \mathrm{query}_{original}\ \ \mathrm{weight}_2\ (w_1\ \mathrm{term}_1\ \ w_2\ \mathrm{term}_2\ \dots\ w_K\ \mathrm{term}_K))$$
where $\mathrm{weight}_1$ is the weight of the new query submitted online in the expanded query, $\mathrm{weight}_2$ is the weight of the newly added expansion terms as a whole in the expanded query, $w_1, w_2, \dots, w_K$ are the score weights of the expansion terms $\mathrm{term}_1, \mathrm{term}_2, \dots, \mathrm{term}_K$, and K is the number of expansion terms finally selected. In this embodiment, $\mathrm{weight}_1$ is 0.5, $\mathrm{weight}_2$ is 0.5, and K is 50.
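The construction of the expanded query can be sketched as below. This is only an illustration: the helper name `build_expanded_query`, the plain-string output, and the choice to normalize the selected scores so that they sum to 1 are assumptions, since the text states only that normalized scores serve as the term weights.

```python
from typing import Dict


def build_expanded_query(original_query: str,
                         term_scores: Dict[str, float],
                         k: int = 50,
                         weight_original: float = 0.5,
                         weight_expansion: float = 0.5) -> str:
    """Keep the k highest-scoring terms and format the weighted expanded query."""
    top = sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(s for _, s in top)
    if total > 0:
        weights = [s / total for _, s in top]      # assumed normalization: sum to 1
    else:
        weights = [1.0 / len(top)] * len(top)      # fall back to uniform weights
    parts = " ".join(f"{w:.2f} {term}" for w, (term, _) in zip(weights, top))
    return f"({weight_original} {original_query} {weight_expansion} ({parts}))"


# Hypothetical scores for three candidate expansion terms.
expanded = build_expanded_query(
    "mad cow disease",
    {"prions": 3.0, "spongiform": 1.0, "cause": 0.5},
    k=2,
)
```

The defaults mirror the embodiment (0.5, 0.5 and K = 50); the resulting string would still have to be translated into the query syntax of whichever search engine is used.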
S7, query result return step: retrieval is performed with the expanded query, and the retrieval results are returned to the user, completing the retrieval process.
Corresponding to the above method, the present invention also provides a biomedical literature search system based on the word grading sorting algorithm. Figure 2 shows the logical structure of the system.
A biomedical literature search system based on the word grading sorting algorithm comprises an offline training part and an online search part; the offline training part comprises the following:
Search engine query extraction module: used to extract, according to the historical query records of the search engine, multiple queries and the top N query result documents obtained for each query, and to collect the queries and query result documents into a query pool, where N is a natural number. The search engine query extraction module can retrieve the biomedical literature associated with a user's query and return the retrieval results to the user; operations inside the system, such as query expansion, are invisible to the user.
Candidate expansion term extraction module: used, given a user query, to extract specialized terms from the top N query result documents obtained by the search engine query extraction module by means of resources intrinsic to the biomedical field, and to record the number of occurrences (frequency), or the weighted sum of occurrence counts, of each specialized term in the query result documents; the specialized terms are sorted in descending order of their number of occurrences or weighted sum of occurrence counts in the query result documents, and the M highest-ranked specialized terms are selected as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: used to extract the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and to label the relevance of each candidate expansion term according to its effect on retrieval performance. During offline training, the relevance labels and the various features of the candidate expansion terms serve as the input of the word grading sorting algorithm; during online query, this module extracts the feature information associated with the candidate expansion terms.
Candidate expansion term ranking model training module: used to train the term ranking model with the word grading sorting algorithm, after the features of the candidate expansion terms have been extracted and their relevance labeled, and to output the weight of each feature of the candidate expansion terms; these weights can be used to measure the importance of the expansion terms of an unseen query.
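The candidate selection performed by the candidate expansion term extraction module can be sketched as follows. The sketch assumes the weighted sum is the sum of count_i * d(i) over the ranked result documents, as suggested by the definitions of count_i and d(i) in claim 2, and uses d(i) = 1/i purely as a hypothetical decay factor; a small set of terms stands in for a biomedical resource such as MeSH.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Set


def select_candidates(ranked_docs: Sequence[Sequence[str]],
                      vocabulary: Set[str],
                      m: int,
                      decay: Callable[[int], float] = lambda i: 1.0 / i) -> List[str]:
    """Return the m specialized terms with the highest decay-weighted occurrence sum."""
    weighted = defaultdict(float)
    for i, doc in enumerate(ranked_docs, start=1):   # i is the rank of the document
        for tok in doc:
            if tok in vocabulary:                    # keep only specialized terms
                weighted[tok] += decay(i)
    # Descending by weighted sum; ties are broken alphabetically for determinism.
    ranked = sorted(weighted.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:m]]


# Toy ranked result list and term dictionary (hypothetical data).
ranked_docs = [
    ["prions", "cause", "disease", "prions"],
    ["disease", "cow"],
    ["cow", "cow", "fatal"],
]
vocabulary = {"prions", "disease", "cow"}
```

Passing `decay=lambda i: 1.0` reduces the weighted sum to a plain occurrence count, which corresponds to the unweighted alternative mentioned in the module description.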
The online search part comprises:
Query reconstruction module: used for specialized term extraction and candidate expansion term scoring for a new query; it comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module. The online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user and extracts the specialized terms in the top N1 retrieval results and their various features according to the biomedical resource, where N1 is a natural number. The online candidate expansion term extraction, feature extraction and scoring module uses the feature weights output by the term ranking model to score the candidate expansion terms and compute their corresponding weights, and adds the terms to the original query to obtain the expanded query.
Query result return module: used to return to the user the result documents retrieved with the expanded query. The results the user receives are in fact the results for the submitted query after query expansion; the query expansion process is invisible to the user.
A specific embodiment is described below in accordance with the above description of the method and system embodiments of the present invention. Suppose in this embodiment that the ranking model has already been trained on historical data. When the user submits a new query "mad cow disease", the system first selects candidate expansion terms according to the frequency of terms in the top-ranked documents of the initial retrieval. The 10 top-ranked candidate expansion terms and their relevance labels are shown in the table below:
Rank    Term               Relevance
1       Disease            Relevant
2       Prions             Relevant
3       Cause              Irrelevant
4       Infectious         Relevant
5       Conversion         Irrelevant
6       Cow                Relevant
7       Spongiform         Relevant
8       Fatal              Irrelevant
9       Encephalopathies   Relevant
10      Mad                Relevant
As the table shows, 3 of the 10 top-ranked candidate expansion terms are irrelevant, and adding them directly to the original query would harm retrieval performance. Next, features related to the candidate expansion terms are extracted from the documents and from the biomedical dictionary MeSH, the weight of each feature is obtained from the ranking model, and all candidate expansion terms are rescored and re-ranked.
After re-ranking, the 10 top-ranked expansion terms finally selected are shown in the table below. As the table shows, the 10 top-ranked terms in the expanded query after re-ranking are all relevant terms. Adding these terms to the original query, weighted by their normalized ranking scores, and then performing retrieval can further improve retrieval performance.
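The relevance labels used in this example follow the rule of step S3: a candidate term is labeled 1 only if adding it to the query improves the evaluation score. A minimal sketch of that rule, using the average precision formula given in claim 3 as the evaluation function, is shown below; the document identifiers and rankings are hypothetical.

```python
from typing import Sequence, Set


def average_precision(ranked_doc_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """eval = (1 / RelDoc) * sum over retrieved relevant documents of i / rank(i)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1                  # this is the hits-th relevant document found
            total += hits / position   # i / rank(i)
    return total / len(relevant_ids)


def label_term(eval_with_term: float, eval_original: float) -> int:
    """label = 1 if eval(query + term) > eval(query), otherwise 0."""
    return 1 if eval_with_term > eval_original else 0


relevant = {"d1", "d2"}
original_ranking = ["d3", "d1", "d7", "d2"]   # results for the original query
expanded_ranking = ["d1", "d2", "d3", "d7"]   # results after adding a candidate term
label = label_term(average_precision(expanded_ranking, relevant),
                   average_precision(original_ranking, relevant))
```

Relevant documents that are never retrieved contribute nothing to the sum, so a term that pushes relevant documents out of the result list lowers the score and receives label 0.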
The above embodiment explains and illustrates the biomedical literature search method and system based on the word grading sorting algorithm provided by the present invention. The method and system can use resources such as the knowledge bases of the biomedical field to expand the original query submitted by the user. The word grading sorting algorithm is used during expansion to measure the importance of expansion terms, and the query expansion process supplements and refines the query submitted by the user, which improves the accuracy of the query results and better meets the user's information needs.
The above is a further detailed description of the present invention in connection with specific preferred technical solutions, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the technical field of the present invention may make a number of simple deductions or substitutions without departing from the concept of the present invention, and all of these shall be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A biomedical literature search method based on a word grading sorting algorithm, characterized by comprising an offline training stage and an online query stage, wherein the offline training stage comprises the following steps:
S1, search engine query extraction step: according to the historical query records of the search engine, multiple queries and the top N query result documents obtained for each query are extracted, and the queries and query result documents are collected into a query pool, where N is a natural number;
S2, candidate expansion term extraction step: the specialized terms in the top N query result documents of each query in the query pool are extracted according to the biomedical resource, and the number of occurrences, or the weighted sum of occurrence counts, of each specialized term in the query result documents is computed; the specialized terms are sorted in descending order of their number of occurrences or weighted sum of occurrence counts in the query result documents, and the M specialized terms with the highest number of occurrences or the highest weighted sum are selected as candidate expansion terms, where M is a natural number;
S3, feature extraction and labeling step for candidate expansion terms:
Feature extraction and labeling of the candidate expansion terms are carried out simultaneously; the relevance label of a candidate expansion term is assigned by comparing the retrieval performance of the original query with the retrieval performance obtained when the candidate expansion term is added to the original query; the evaluation metrics for retrieval performance include precision, mean average precision, NDCG and MRR; the relevance label is assigned as follows:
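$$\mathrm{label} = \begin{cases} 1, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) > \mathrm{eval}(\mathrm{query}) \\ 0, & \mathrm{eval}(\mathrm{query} + \mathrm{term}) \le \mathrm{eval}(\mathrm{query}) \end{cases}$$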
where eval() is the evaluation metric function for retrieval performance, eval(query+term) is the score of eval() when the candidate expansion term term is added to the query query, and eval(query) is the score of the evaluation metric function for the query query alone; a label of 1 indicates that the candidate expansion term is relevant to the query query, and a label of 0 indicates that it is irrelevant to the query query;
Feature extraction for the candidate expansion terms extracts, from the biomedical resource and from the top N query result documents returned for the queries in the query pool, information such as the distribution of the candidate expansion term in the documents, the distribution of the candidate term in the biomedical resource, and the relevance between the candidate expansion term and the original query, in preparation for training the ranking model; after the various features of a candidate expansion term have been extracted, all feature values are normalized so that they lie in the interval [0,1]; the normalization is as follows:
$$\mathrm{newFeatureValue} = \frac{\mathrm{oldFeatureValue} - \mathrm{minValue}}{\mathrm{maxValue} - \mathrm{minValue}}$$
where minValue and maxValue are the minimum and maximum values of a given feature, respectively;
S4, candidate expansion term ranking model training step: according to the relevance labels and the various features of the candidate expansion terms, the word grading sorting algorithm is trained to obtain a weight for each feature, the specific steps being: a candidate expansion term labeled relevant in step S3 and several candidate expansion terms labeled irrelevant are selected to form a word group, and a number of such word groups are selected as training samples; initial weights are assigned at random to the features of each candidate expansion term, and the candidate expansion terms in each word group are ranked by their feature-weighted scores; from the ranking result of each word group, the overall ranking loss is computed, and the weight of each feature dimension is adjusted dynamically according to the gradient of the loss function, the ranking loss being defined in terms of NumSample, the number of word groups, and $loss_i$, the loss of the i-th word group, where this loss is obtained from the ranking position of the relevant expansion term, and the nearer the front that position is, the smaller the loss; the preceding process is iterated until the overall loss falls below a given threshold or the specified number of iterations is reached, and the feature weights finally obtained are taken as the trained ranking model;
The online query stage comprises the following steps:
S5, online search engine query and extraction step: for a new query submitted online by the user, the top N1 query results are retrieved; the specialized terms in the top N1 retrieval results and their various features are extracted according to the biomedical resource, where N1 is a natural number;
S6, online candidate expansion term extraction, feature extraction and scoring step: using the candidate expansion term extraction method and the candidate expansion term feature extraction method of offline steps S2-S3, the specialized terms in the top N1 retrieval results of the new query and their various features are extracted according to the biomedical resource, yielding the online-stage candidate expansion terms; the extracted features are used to measure the importance of a candidate expansion term in the expanded query; the online-stage candidate expansion terms are scored with the feature weights trained in step S4, and the K1 highest-scoring candidate expansion terms are added to the new query submitted online to form the expanded query, where K1 is a natural number;
Each online-stage candidate expansion term extracted using the biomedical resource is scored, and its score is
$$\mathrm{score}(\mathrm{term}) = \sum_{i=1}^{\mathrm{FeatureNum}} a_i \cdot \mathrm{feature}_i(\mathrm{term})$$
where FeatureNum is the total number of features, $a_i$ is the weight of the i-th feature in the ranking model, and $\mathrm{feature}_i(\mathrm{term})$ is the value of the i-th feature of the online-stage candidate expansion term term;
The online-stage candidate expansion terms are ranked by score, and the K1 top-ranked online-stage candidate expansion terms are added as expansion terms to the new query submitted online; the weight of an added online-stage candidate expansion term in the expanded query can be expressed as
$$\mathrm{weight} = \sum_{i=1}^{\mathrm{count}} \mathrm{weight}_i \cdot \mathrm{feature}_i + \mathrm{sign} \cdot \mathrm{weight}_{original}$$
where sign is a sign function, with sign = 1 when the online-stage candidate expansion term appears in the new query submitted online and sign = 0 otherwise, and $\mathrm{weight}_{original}$ is the weight of the new query submitted online in the expanded query;
S7, query result return step: retrieval is performed with the expanded query, and the retrieval results are returned to the user.
2. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S2, the weighted sum of the occurrence counts of a specialized term in the query result documents is computed from $count_i$, the number of times the term occurs in the i-th document, and d(i), the decay factor of the i-th document.
3. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S3, the evaluation metric function eval() is the mean average precision function, that is:
$$\mathrm{eval}_{MAP} = \frac{1}{\mathrm{RelDoc}_{query}} \cdot \sum_{i=1}^{\mathrm{RelDoc}_{query}} \frac{i}{\mathrm{rank}(i)}$$
where $\mathrm{RelDoc}_{query}$ is the number of documents relevant to the given query query, and rank(i) is the position of the i-th relevant document in the ranked list of result documents.
4. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S1, when no historical query records are available, records of queries and their results are obtained manually by constructing a biomedical retrieval and indexing method; the retrieval method uses a vector space model, the BM25 retrieval model, or a language model based on different smoothing methods.
5. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that, in step S4, the loss is defined in terms of $rank_i$, the position at which the relevant candidate expansion term is ranked in the word group list.
6. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that the biomedical resource refers to a dictionary or knowledge base containing biomedical specialized terms.
7. The biomedical literature search method based on a word grading sorting algorithm according to claim 1, characterized in that the features of the candidate expansion terms comprise: the frequency TF with which the candidate expansion term occurs in the result documents; the TF-IDF value of the candidate expansion term; the number of documents in which the candidate expansion term and the original query co-occur; the number of times the candidate expansion term and the original query co-occur within the same text window; the number of times the candidate expansion term occurs in the biomedical resource; the number of term concepts in the biomedical resource that contain the candidate expansion term; and the containment relations between biomedical term concepts.
8. A biomedical literature search system based on a word grading sorting algorithm, characterized by comprising an offline training part and an online search part; the offline training part comprises the following:
Search engine query extraction module: used to extract, according to the historical query records of the search engine, multiple queries and the top N query result documents obtained for each query, and to collect the queries and query result documents into a query pool, where N is a natural number;
Candidate expansion term extraction module: used, given a user query, to extract specialized terms from the top N query result documents obtained by the search engine query extraction module by means of resources intrinsic to the biomedical field, and to record the frequency, or the weighted sum of occurrence counts, of each specialized term in the query result documents; the specialized terms are sorted in descending order of their number of occurrences or weighted sum of occurrence counts in the query result documents, and the M highest-ranked specialized terms are selected as candidate expansion terms, where M is a natural number;
Candidate expansion term feature extraction and labeling module: used to extract the associated features of the candidate expansion terms obtained by the candidate expansion term extraction module, and to label the relevance of each candidate expansion term according to its effect on retrieval performance;
Candidate expansion term ranking model training module: used to train the term ranking model with the word grading sorting algorithm, after the features of the candidate expansion terms have been extracted and their relevance labeled, to obtain the weight of each feature of the candidate expansion terms;
The online search part comprises:
Query reconstruction module: used for specialized term extraction and candidate expansion term scoring for a new query; it comprises an online search engine query extraction module and an online candidate expansion term extraction, feature extraction and scoring module, wherein the online search engine query extraction module retrieves the top N1 query results for a new query submitted online by the user and extracts the specialized terms in the top N1 retrieval results and their various features according to the biomedical resource, where N1 is a natural number; the online candidate expansion term extraction, feature extraction and scoring module uses the feature weights output by the term ranking model to score the candidate expansion terms and compute their corresponding weights, and adds the terms to the original query to obtain the expanded query;
Query result return module: used to return to the user the result documents retrieved with the expanded query.
CN201510147696.5A 2015-03-31 2015-03-31 Biomedical literature search method and system based on word grading sorting algorithm Active CN104750819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510147696.5A CN104750819B (en) 2015-03-31 2015-03-31 Biomedical literature search method and system based on word grading sorting algorithm


Publications (2)

Publication Number Publication Date
CN104750819A true CN104750819A (en) 2015-07-01
CN104750819B CN104750819B (en) 2018-01-23

Family

ID=53590503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510147696.5A Active CN104750819B (en) 2015-03-31 2015-03-31 Biomedical literature search method and system based on word grading sorting algorithm

Country Status (1)

Country Link
CN (1) CN104750819B (en)




Also Published As

Publication number Publication date
CN104750819B (en) 2018-01-23

Similar Documents

Publication Publication Date Title
CN104750819A (en) Biomedicine literature search method and system based on word grading sorting algorithm
CN104391942B (en) Short text feature expansion method based on semantic graph
CN104199857B (en) A kind of tax document hierarchy classification method based on multi-tag classification
CN104834735B (en) A kind of documentation summary extraction method based on term vector
CN109344236A (en) Question similarity calculation method based on multiple features
CN106663125A (en) Question sentence generation device and computer program
CN111401040B (en) Keyword extraction method suitable for word text
CN108520038B (en) Biomedical literature retrieval method based on sequencing learning algorithm
CN101751455B (en) Method for automatically generating title by adopting artificial intelligence technology
CN110879831A (en) Chinese medicine sentence word segmentation method based on entity recognition technology
CN101963971A (en) Method for database search using relevance feedback and corresponding storage medium
CN103927358A (en) Text search method and system
CN103150381B (en) A kind of High-precision Chinese predicate identification method
CN101539907A (en) Part-of-speech tagging model training device and part-of-speech tagging system and method thereof
CN103729432A (en) Method for analyzing and sequencing academic influence of theme literature in citation database
CN104765779A (en) Patent document inquiry extension method based on YAGO2s
CN107291895A (en) A kind of quick stratification document searching method
CN114090861A (en) Education field search engine construction method based on knowledge graph
CN107436955A (en) A kind of English word relatedness computation method and apparatus based on Wikipedia Concept Vectors
CN110851593A (en) Complex value word vector construction method based on position and semantics
CN107679121B (en) Mapping method and device of classification system, storage medium and computing equipment
Mohsen et al. On the automatic construction of an Arabic thesaurus
CN113269477B (en) Scientific research project query scoring model training method, query method and device
CN103793474B (en) Knowledge management oriented user-defined knowledge classification method
CN110990376B (en) Subject classification automatic indexing method based on multi-factor mixed ordering mechanism

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant