CN104239314B - A kind of method and system of query expansion word - Google Patents

A kind of method and system of query expansion word

Publication number
CN104239314B
CN104239314B CN201310231653.6A CN201310231653A CN104239314B CN 104239314 B CN104239314 B CN 104239314B CN 201310231653 A CN201310231653 A CN 201310231653A CN 104239314 B CN104239314 B CN 104239314B
Authority
CN
China
Prior art keywords
word
popular
label
popular word
query expansion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310231653.6A
Other languages
Chinese (zh)
Other versions
CN104239314A (en)
Inventor
郝玺龙
丁海星
牛合庆
陈金玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Mass Information Technology Ltd By Share Ltd
Original Assignee
Tianjin Mass Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Mass Information Technology Ltd By Share Ltd
Priority to CN201310231653.6A
Publication of CN104239314A
Application granted
Publication of CN104239314B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2448 Query languages for particular applications; for extensibility, e.g. user defined types

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for query expansion words. The method comprises: setting at least one label word for each popular word, forming a label word dictionary; weighting each label word belonging to each second popular word according to the relation between the first popular word and the second popular words around it, and sorting the label words; extracting a predetermined number of high-weight label words from all label words belonging to the second popular words, as the expansion range of the first popular word; when the first popular word is entered as a search term, displaying the popular words corresponding to the label words in the expansion range of the first popular word; and selecting the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term. The technical solution helps the user obtain search terms that match the retrieval goal, thereby improving retrieval efficiency.

Description

A kind of method and system of query expansion word
Technical field
The present invention relates to the technical field of information retrieval, and more particularly to a method and system for query expansion words.
Background
With the arrival of the information age, people find themselves in an ocean of information. Faced with this mass of information, they are often at a loss and find it hard to locate what they need in a short time. The development of computer and network technology has helped the field of information retrieval to some extent: people can build the search strategy they need and use these technologies to obtain the correct information.
A search strategy means, on the basis of analyzing the retrieval question, determining the data sources and the search terms, clarifying the logical relations between the search terms, and arranging the search steps systematically. The search expression (an expression that combines search terms with operators) is the search strategy in the narrow sense.
The first step in retrieval is to clarify the retrieval need; if this first step is muddled, the correctness of the final results cannot be guaranteed. Users are not always clear about their own needs, especially latent or fuzzy ones, so these needs must be analyzed to arrive at a complete and clear expression.
To construct a complete and clear search expression, the user first needs to find suitable search terms. Before searching, however, the user often has only a superficial grasp of the field in question and knows only a few concepts. Building an accurate search expression from these preliminary concepts is very difficult for the user.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method and system for query expansion words that help the user obtain search terms matching the retrieval goal, thereby improving retrieval efficiency.
An embodiment of the present invention provides a method for query expansion words, comprising the following steps:
Setting at least one label word for each popular word, forming a label word dictionary;
Weighting each label word belonging to each second popular word according to the relation between the first popular word and the second popular words around it;
Sorting all label words belonging to the second popular words by the weight of each label word;
Extracting a predetermined number of high-weight label words from all label words belonging to the second popular words, as the expansion range of the first popular word;
When the first popular word is entered as a search term, displaying the popular words corresponding to the label words in the expansion range of the first popular word;
Selecting the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term.
Preferably, the method further comprises the following steps:
Weighting each second popular word according to the relation between the first popular word and the second popular words around it;
Arranging the popular words corresponding to each label word in the expansion range of the first popular word in order of popular word weight.
Preferably, each second popular word and each label word belonging to it are weighted according to the frequency with which the second popular word occurs around the first popular word and/or its distance from the first popular word.
Preferably, the label word dictionary is generated through human-computer interaction.
Preferably, the method further comprises the following step:
Displaying the source file information of the popular words corresponding to the label words in the expansion range, and selecting the needed popular words according to the source file information, as query expansion words for the search term.
Preferably, the source files searched include at least one data source.
Preferably, the data source is news, forums and/or microblogs.
Preferably, the data source is data from different technical fields or different business fields.
Another embodiment of the present invention further provides a system for query expansion words, comprising a tag unit, a label word dictionary unit, a weighting unit, a sorting unit, an input unit and a selection unit, wherein:
The tag unit is used to set at least one label word for each popular word;
The label word dictionary unit is used to store the popular words and their corresponding label words;
The weighting unit is used to weight each second popular word and each label word belonging to it, according to the relation between the first popular word and the second popular words around it;
The sorting unit is used to sort all label words belonging to the second popular words by the weight of each label word, to extract a predetermined number of high-weight label words from them as the expansion range of the first popular word, and to arrange the popular words corresponding to each label word in that expansion range in order of popular word weight;
The input unit is used to enter the first popular word as a search term;
The selection unit is used to select the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term.
Preferably, the system further comprises a source file storage unit, which is used to store the source file information of the popular words corresponding to the label words in the expansion range;
The selection unit is used to select the needed popular words according to the source file information, as query expansion words for the search term.
With the technical solution of the present invention, closely related search terms can be expanded from a small number of search terms, which helps the user build a complete and accurate search expression and improves retrieval efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the query expansion word method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the query expansion word system provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the embodiments of the present invention are not limited thereto.
The main idea of the technical solution is, for a given word, to find the words associated with it from different aspects. When building a search expression, the user can select the words needed from these associated words as query expansion words. Herein, these words are called popular words, and the words that denote the different aspects are called label words.
Fig. 1 is a flow chart of the query expansion word method provided by an embodiment of the present invention. As shown in Fig. 1, the flow comprises the following steps:
Step 101: set one or more label words for each popular word, forming a label word dictionary.
On the one hand, the label word dictionary collects a massive number of words. These words come from various data sources, including news, forums and/or microblogs, or data from different technical or business fields. Once segmented out of the data sources, they become popular words. On the other hand, according to the properties of the popular words, a number of words are specified to represent the attributes of popular words; these are the label words. Each popular word is assigned one or more corresponding label words, which forms the label word dictionary. The label word dictionary can be generated through human-computer interaction.
For example, the popular word "technology" can be given the two label words "product information" and "license"; the popular word "study course" can be given the two label words "product information" and "teaching"; and the popular word "Siemens" can be given the two label words "industry control brand" and "business name".
As these examples show, a popular word can correspond to one or more label words, and conversely a label word can correspond to one or more popular words.
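The many-to-many relation between popular words and label words can be sketched as a dictionary plus its inverse. This is only an illustration, not the data structure used by the patent: the names `LABEL_DICT` and `invert` are hypothetical, and the entries are the three examples given above.

```python
from collections import defaultdict

# Label word dictionary: popular word -> its label words (entries from the example above).
LABEL_DICT = {
    "technology": {"product information", "license"},
    "study course": {"product information", "teaching"},
    "Siemens": {"industry control brand", "business name"},
}


def invert(label_dict):
    """Build the reverse mapping: label word -> popular words that carry that label."""
    by_label = defaultdict(set)
    for popular, labels in label_dict.items():
        for label in labels:
            by_label[label].add(popular)
    return dict(by_label)


BY_LABEL = invert(LABEL_DICT)
```

Keeping both directions makes the later steps cheap: weighting walks from popular words to labels, while display walks from labels back to popular words.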
Step 102: according to the relation between the first popular word and the second popular words around it, weight each second popular word and each label word belonging to it.
This step finds the words related to a popular word and quantifies the degree of relatedness between them.
Here the first popular word is any popular word in the data source, and the second popular words are the other words that appear around that popular word in the data source, including words that appear before it and words that appear after it.
The relation between the first popular word and the second popular words around it can be determined in several ways, for example by the position (distance) at which a second popular word appears around the first popular word in the data source, or by the frequency with which a second popular word appears around the first popular word.
By computing these quantitative indicators over the data source, each second popular word in the data source can be weighted, and each label word belonging to each second popular word can likewise be weighted, yielding their weights.
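The text names frequency and distance as indicators but does not give a formula. One possible form, stated purely as an assumption, lets every nearby occurrence contribute the inverse of its distance and then passes each word weight on to the word's label words:

```latex
w(p_2 \mid p_1) = \sum_{o \in O(p_1, p_2)} \frac{1}{d(o)},
\qquad
W(t \mid p_1) = \sum_{p_2 \,:\, t \in L(p_2)} w(p_2 \mid p_1)
```

Here $p_1$ is the first popular word, $p_2$ a second popular word, $O(p_1, p_2)$ the set of occurrences of $p_2$ within some window around an occurrence of $p_1$, $d(o) \ge 1$ the distance in words for occurrence $o$, $L(p_2)$ the set of label words of $p_2$, and $t$ a label word. All of these symbols are introduced for illustration only. Under this form, a label word shared by several second popular words accumulates the weights of all of them.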
For example, suppose "Siemens technology" and "Siemens study course" appear in the data source. By counting the positions and/or frequencies at which the second popular words "technology" and "study course" appear around the first popular word "Siemens", these two second popular words can be weighted. At the same time, the two label words "product information" and "license" corresponding to "technology" can be weighted, as can the two label words "product information" and "teaching" corresponding to "study course". Because "technology" and "study course" both correspond to the label word "product information", the weight of "product information" derives from the relations of both "technology" and "study course" with "Siemens".
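A minimal sketch of this weighting step follows. It assumes the text is already segmented into tokens, that only words within a fixed window count as "around" the first popular word, and that each co-occurrence contributes 1/distance; none of these choices comes from the patent, and `weight_neighbors`, `LABEL_DICT` and `window` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical label word dictionary for the example above.
LABEL_DICT = {
    "technology": ["product information", "license"],
    "study course": ["product information", "teaching"],
}


def weight_neighbors(tokens, first_word, label_dict, window=3):
    """Weight second popular words near `first_word`, then pass the weights to their labels.

    Each co-occurrence adds 1/distance (an assumed scoring rule), so frequent and
    nearby words score higher. Words before and after `first_word` count alike.
    """
    word_weights = defaultdict(float)
    for i, token in enumerate(tokens):
        if token != first_word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in label_dict:
                word_weights[tokens[j]] += 1.0 / abs(j - i)
    # A label word shared by several second popular words sums all their weights.
    label_weights = defaultdict(float)
    for word, weight in word_weights.items():
        for label in label_dict[word]:
            label_weights[label] += weight
    return dict(word_weights), dict(label_weights)


# Pre-segmented sample text: "Siemens study course" twice, "Siemens technology" once.
tokens = ["Siemens", "study course", "news", "Siemens", "technology", "and", "Siemens", "study course"]
word_w, label_w = weight_neighbors(tokens, "Siemens", LABEL_DICT)
```

In this sample, "product information" receives contributions from both "technology" and "study course", which mirrors the shared-label case described above.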
Step 103: sort all label words belonging to the second popular words by the weight of each label word.
In the example above, "product information", "license" and "teaching" are sorted by the weights they obtained.
Step 104: from all label words belonging to the second popular words, extract a predetermined number of high-weight label words as the expansion range of the first popular word.
In the data source, a large number of second popular words appear around the first popular word, and each second popular word in turn corresponds to one or more label words, so the first popular word has a large number of corresponding label words. For practical reasons, only a certain number of label words, for example 10 or 20, are selected as the expansion range of the first popular word; these label words best reflect the properties of the first popular word.
For example, for the first popular word "Siemens", the two label words "product information" and "teaching" are selected from the three label words "product information", "license" and "teaching" as the expansion range.
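Steps 103 and 104 amount to a top-N selection over the label weights. The sketch below uses Python's `heapq.nlargest`; the function name `expansion_range` and the weight values are hypothetical, chosen so that the outcome matches the "Siemens" example.

```python
import heapq


def expansion_range(label_weights, top_n):
    """Return the `top_n` label words with the highest weights, best first."""
    return heapq.nlargest(top_n, label_weights, key=label_weights.get)


# Hypothetical weights for the three label words in the example above.
label_weights = {"product information": 4.0, "license": 1.5, "teaching": 2.5}
selected = expansion_range(label_weights, top_n=2)
```

A full sort followed by a slice gives the same result; `nlargest` simply avoids sorting every label word when only 10 or 20 are kept.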
Step 105: arrange the popular words corresponding to each label word in the expansion range of the first popular word in order of popular word weight.
As mentioned above, a popular word can correspond to one or more label words, and conversely a label word can correspond to one or more popular words; each label word in the expansion range of the first popular word likewise corresponds to multiple popular words. For example, the label word "product information" corresponds to the two popular words "technology" and "study course", which can then be sorted by their weights. A popular word with a higher weight reflects the properties of the first popular word under this label word better than one with a lower weight.
Step 106: when the first popular word is entered as a search term, display the popular words corresponding to the label words in the expansion range of the first popular word. In the display, the label words are first arranged in order of label word weight, and the popular words under each label word are then arranged by their respective weights.
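The two-level ordering used for display can be sketched as a nested sort: label words by label weight, then the popular words under each label by word weight. The function name `build_display` and all weights below are hypothetical.

```python
def build_display(selected_labels, label_weights, by_label, word_weights):
    """Order labels by label weight and the popular words under each label by word weight."""
    ordered_labels = sorted(selected_labels, key=lambda t: label_weights[t], reverse=True)
    return [
        (label, sorted(by_label[label], key=lambda w: word_weights.get(w, 0.0), reverse=True))
        for label in ordered_labels
    ]


# Hypothetical weights for the running "Siemens" example.
label_weights = {"product information": 4.0, "teaching": 2.5}
word_weights = {"study course": 2.5, "technology": 1.5}
by_label = {
    "product information": ["technology", "study course"],
    "teaching": ["study course"],
}
display = build_display(["teaching", "product information"], label_weights, by_label, word_weights)
```

The result is a list of (label word, popular words) pairs that a user interface could render as grouped suggestions.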
Step 107: from the popular words corresponding to the label words in the expansion range, select the needed popular words as query expansion words for the search term.
For example, suppose a user wishes to learn about Siemens technology but knows nothing about it. The user can first determine the data source, for example forum data, and then enter "Siemens" as the search term. The user then obtains the words corresponding to the aspects most relevant to "Siemens": for example, the label word "product information" includes "technology" and "study course", and the label word "teaching" includes "study course". According to need, the user can then add "study course" as a query expansion word.
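Once the user has chosen expansion words, they can be combined with the original search term into a search expression. The text does not specify an operator syntax, so the sketch below assumes a single boolean operator and double quotes around multi-word terms; `build_expression` is a hypothetical helper.

```python
def build_expression(search_term, expansion_words, operator="AND"):
    """Join the original search term and the chosen expansion words with one operator."""
    parts = [search_term, *expansion_words]
    # Quote multi-word terms so they stay together in the expression (assumed syntax).
    quoted = [f'"{p}"' if " " in p else p for p in parts]
    return f" {operator} ".join(quoted)


expression = build_expression("Siemens", ["study course"])
```

Passing `operator="OR"` would broaden the search instead of narrowing it; which operator suits a given need is left to the user.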
If the user is concerned that these query expansion words may be unrelated to the original search term and lead to errors, the source file information of the popular words corresponding to the label words in the expansion range can be displayed. The user can then judge from this source file information whether the expansion words are relevant to the original search term, and select the needed popular words as query expansion words for the search term.
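One way to keep such source file information available is to record, for each candidate expansion word, the sources in which it appears together with the search term. This is a sketch under assumptions: the document layout (`source` and `tokens` keys), the function name `index_sources` and the sample sources are all hypothetical.

```python
from collections import defaultdict


def index_sources(documents, candidates, first_word):
    """Map each candidate expansion word to the sources where it co-occurs with `first_word`."""
    sources = defaultdict(list)
    for doc in documents:
        tokens = doc["tokens"]
        if first_word not in tokens:
            continue  # only sources that mention the search term are evidence
        for word in candidates:
            if word in tokens:
                sources[word].append(doc["source"])
    return dict(sources)


# Hypothetical pre-segmented documents; the "source" strings are placeholders.
documents = [
    {"source": "forum post 1", "tokens": ["Siemens", "study course", "download"]},
    {"source": "news item 7", "tokens": ["Siemens", "technology", "release"]},
    {"source": "forum post 2", "tokens": ["study course", "for", "beginners"]},
]
sources = index_sources(documents, ["technology", "study course"], "Siemens")
```

Showing these sources next to each suggestion lets the user check that an expansion word really relates to the original search term before adopting it.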
To implement the above flow, another embodiment of the present invention further provides a system for query expansion words. As shown in Fig. 2, the system comprises a tag unit 201, a label word dictionary unit 202, a weighting unit 203, a sorting unit 204, an input unit 205, a selection unit 206 and a source file storage unit 207.
The tag unit sets one or more label words for each popular word.
The label word dictionary unit stores the popular words and their corresponding label words.
The weighting unit weights each second popular word and each label word belonging to it, according to the relation between the first popular word and the second popular words around it.
The sorting unit sorts all label words belonging to the second popular words by the weight of each label word, extracts a predetermined number of high-weight label words from them as the expansion range of the first popular word, and arranges the popular words corresponding to each label word in that expansion range in order of popular word weight.
The input unit enters the first popular word as a search term.
The source file storage unit stores the source file information of the popular words corresponding to the label words in the expansion range.
The selection unit selects the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term; alternatively, it further selects the needed popular words according to the source file information, as query expansion words for the search term.
With the technical solution of the present invention, the relations between words are quantified as weights, so that closely related search terms can be expanded from a small number of search terms. This helps the user build a complete and accurate search expression and improves retrieval efficiency.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (10)

  1. A method for query expansion words, characterized by comprising the following steps:
    Setting at least one label word for each popular word, forming a label word dictionary;
    Weighting each label word belonging to each second popular word according to the relation between the first popular word and the second popular words around the first popular word;
    Sorting all label words belonging to the second popular words by the weight of each label word;
    Extracting a predetermined number of high-weight label words from all label words belonging to the second popular words, as the expansion range of the first popular word;
    When the first popular word is entered as a search term, displaying the popular words corresponding to the label words in the expansion range of the first popular word;
    Selecting the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term.
  2. The method for query expansion words according to claim 1, characterized by further comprising the following steps:
    Weighting each second popular word according to the relation between the first popular word and the second popular words around the first popular word;
    Arranging the second popular words corresponding to each label word in the expansion range of the first popular word in order of second popular word weight.
  3. The method for query expansion words according to claim 2, characterized in that each second popular word and each label word belonging to it are weighted according to the frequency with which the second popular words occur around the first popular word and/or the distance between occurrences of the first popular word and the second popular words around it.
  4. The method for query expansion words according to claim 1, characterized in that the label word dictionary is generated through human-computer interaction.
  5. The method for query expansion words according to claim 1, characterized by further comprising the following step:
    Displaying the source file information of the popular words corresponding to the label words in the expansion range, and selecting the needed popular words according to the source file information, as query expansion words for the search term.
  6. The method for query expansion words according to claim 1 or 5, characterized in that the source files searched include at least one data source.
  7. The method for query expansion words according to claim 6, characterized in that the data source is news, forums and/or microblogs.
  8. The method for query expansion words according to claim 6, characterized in that the data source is data from different technical fields or different business fields.
  9. A system for query expansion words, characterized by comprising a tag unit, a label word dictionary unit, a weighting unit, a sorting unit, an input unit and a selection unit, wherein:
    The tag unit is used to set at least one label word for each popular word;
    The label word dictionary unit is used to store the popular words and their corresponding label words;
    The weighting unit is used to weight each second popular word and each label word belonging to it, according to the relation between the first popular word and the second popular words around the first popular word;
    The sorting unit is used to sort all label words belonging to the second popular words by the weight of each label word, to extract a predetermined number of high-weight label words from them as the expansion range of the first popular word, and to arrange the popular words corresponding to each label word in that expansion range in order of popular word weight;
    The input unit is used to enter the first popular word as a search term;
    The selection unit is used to select the needed popular words from the popular words corresponding to the label words in the expansion range, as query expansion words for the search term.
  10. The system for query expansion words according to claim 9, characterized by further comprising a source file storage unit, which is used to store the source file information of the popular words corresponding to the label words in the expansion range;
    The selection unit is used to select the needed popular words according to the source file information, as query expansion words for the search term.
CN201310231653.6A 2013-06-09 2013-06-09 A kind of method and system of query expansion word Expired - Fee Related CN104239314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310231653.6A CN104239314B (en) 2013-06-09 2013-06-09 A kind of method and system of query expansion word

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310231653.6A CN104239314B (en) 2013-06-09 2013-06-09 A kind of method and system of query expansion word

Publications (2)

Publication Number Publication Date
CN104239314A (en) 2014-12-24
CN104239314B (en) 2018-01-19

Family

ID=52227405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310231653.6A Expired - Fee Related CN104239314B (en) 2013-06-09 2013-06-09 A kind of method and system of query expansion word

Country Status (1)

Country Link
CN (1) CN104239314B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897290B (en) * 2015-12-17 2020-04-24 中国移动通信集团上海有限公司 Method and device for establishing keyword model
CN108228643A (en) * 2016-12-21 2018-06-29 北京视联动力国际信息技术有限公司 A kind of search method and system
CN113742459B (en) * 2021-11-05 2022-03-04 北京世纪好未来教育科技有限公司 Vocabulary display method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375885A (en) * 2011-10-21 2012-03-14 北京百度网讯科技有限公司 Method and device for providing search suggestions corresponding to query sequence
CN102622358A (en) * 2011-01-27 2012-08-01 天脉聚源(北京)传媒科技有限公司 Method and system for information searching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100931025B1 (en) * 2008-03-18 2009-12-10 한국과학기술원 Query expansion method using additional terms to improve accuracy without compromising recall

Also Published As

Publication number Publication date
CN104239314A (en) 2014-12-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
  Address after: 300090 Tianjin City Huayuan Industrial Zone Rong Yuan Road No. 1 North B room 322-323
  Applicant after: Tianjin mass information technology Limited by Share Ltd
  Address before: Beijing version information No. 3 port 100029 Beijing city Xicheng District Yumin Road two
  Applicant before: Tianjin Hylanda Information Technology Co.,Ltd.
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
  Granted publication date: 20180119
  Termination date: 20200609