CN106372226A - Information retrieval device and method - Google Patents

Information retrieval device and method

Publication number
CN106372226A
Authority
CN
China
Prior art keywords
retrieval
retrieval type
classification number
key word
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610809109.9A
Other languages
Chinese (zh)
Other versions
CN106372226B (en
Inventor
朱欣昱
崔国振
程序
孔文娟
谢虹霞
张素兰
赵亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intellectual Property Press LLC
Original Assignee
Intellectual Property Press LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intellectual Property Press LLC
Priority to CN201610809109.9A
Publication of CN106372226A
Application granted
Publication of CN106372226B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3349 Reuse of stored results of previous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information retrieval device and method. The device comprises a receiving unit, a keyword obtaining unit, a comparison unit, a classification number obtaining unit and a retrieval formula building unit. The receiving unit receives a specific patent number input by a user. The keyword obtaining unit automatically extracts keywords from the patent information corresponding to the specific patent number. The comparison unit divides the extracted keywords into a plurality of priority levels according to the degree of correlation between each keyword and the specific patent. The classification number obtaining unit extracts classification numbers from the patent information corresponding to the specific patent number and divides the extracted classification numbers into a plurality of priority levels. The retrieval formula building unit builds retrieval formulas from the keywords and the classification numbers in order of priority from high to low, until it builds a retrieval formula whose retrieval result meets a preset threshold condition. Because the keywords and classification numbers are extracted automatically, graded by priority according to their degree of correlation, and combined into a retrieval formula in priority order, patent information similar to the desired search subject is retrieved precisely, efficiently and automatically.

Description

Information retrieval device and method
Technical field
The present invention relates to an information retrieval device and method, and more particularly to a device and method for retrieving technical information such as patent information.
Background art
Technical information, especially patent information, is an indispensable and valuable resource for the development of enterprises and research institutions. For example, before carrying out research and development or making an investment, an enterprise or research institution can gain a thorough understanding of the state of the art in a particular technical field, determine the right research direction, avoid duplicated development, and save time and research funding.
In recent years, however, the volume of patent information has grown rapidly: more than one million patent documents are published worldwide every year. Existing patent retrieval is generally carried out in a patent database. Based on the subject to be searched and on personal experience, the searcher first enters relevant keywords with their synonyms, or the corresponding classification numbers, to build a retrieval formula, and then repeatedly adjusts the formula by manually reviewing the results until the required data are obtained. With the explosive growth of patent data, this manual way of retrieving relevant information is becoming increasingly time-consuming and laborious. In particular, for technical personnel unfamiliar with database structures and with building database retrieval formulas, finding the needed information quickly and accurately is becoming more and more difficult. It is therefore desirable to provide a device and method that automatically retrieve patent information similar to the subject the user wishes to search.
Patent Document 1 (publication number: JP2005-234868A) discloses a similar-patent-specification retrieval system that uses keywords extracted from a patent specification under examination to retrieve similar patent specifications from the multiple patent specifications stored in a database. The system includes: a search term extraction unit that extracts the terms stated in the claims of the patent under examination and outputs them as search terms; a concept-explanation extraction unit that extracts the explanatory text describing the concepts underlying the invention that are expressed by the search terms; an associated term extraction unit that extracts the terms stated in the explanatory text and outputs them as associated terms; and a document retrieval unit that uses the search terms and associated terms to retrieve similar patent specifications from the database.
Although Patent Document 1 can retrieve similar applications automatically, without manual intervention, the words a machine extracts automatically usually include some meaningless ones. In the computer field, for example, "function" is a word that carries no meaning. In addition, different words are related to the subject to different degrees. For example, when the patent under examination relates to a lens of an imaging device and the extracted search terms include "CCD", the scheme of Patent Document 1 requires "CCD" to take part in building the retrieval formula as well. The degree of association between "CCD" and "lens" is clearly low, however, so if "CCD" is also placed in the retrieval formula, the excess of search terms may instead cause relevant documents to be missed.
It is therefore desirable to provide a device and method that can automatically retrieve patent information similar to the desired search subject more accurately and efficiently.
Summary of the invention
The technical problem to be solved by the present invention is to provide an information retrieval device and method, in particular a patent information retrieval device and method, that can automatically retrieve patent information similar to the desired search subject more accurately and efficiently.
The information retrieval device of the present invention comprises: a receiving unit that receives a specific patent number input by a user; a keyword obtaining unit that automatically extracts keywords from the patent information corresponding to the specific patent number; a comparison unit that divides the extracted keywords into multiple priority levels according to the degree of correlation between each keyword and the specific patent; a classification number obtaining unit that extracts classification numbers from the patent information corresponding to the specific patent number and divides the extracted classification numbers into multiple priority levels; and a retrieval formula building unit that builds retrieval formulas from the keywords and/or classification numbers in order of priority from high to low, until it builds a retrieval formula whose retrieval result meets a predetermined threshold condition.
The information retrieval method of the present invention comprises: a receiving step of receiving a specific patent number input by a user; a keyword obtaining step of automatically extracting keywords from the patent information corresponding to the specific patent number; a comparison step of dividing the extracted keywords into multiple priority levels according to the degree of correlation between each keyword and the specific patent; a classification number obtaining step of extracting classification numbers from the patent information corresponding to the specific patent number and dividing the extracted classification numbers into multiple priority levels; and a retrieval formula building step of building retrieval formulas from the keywords and/or classification numbers in order of priority from high to low, until a retrieval formula whose retrieval result meets a predetermined threshold condition is built.
In the present invention, keywords are extracted automatically and graded by priority according to their degree of correlation; classification numbers are likewise extracted automatically and graded by priority; and the retrieval formula is then built according to the priority order of the keywords and classification numbers. Compared with the prior art, retrieval is thus carried out with the keywords and classification numbers closest to the specific patent input by the user, so patent information similar to the desired search subject can be retrieved automatically with greater accuracy and efficiency.
In the present invention, the keyword obtaining unit grades keywords by priority as follows: keywords are obtained from the manually processed data of the specific patent input by the user and used as high-priority words; the patent information is then segmented into words according to semantics to obtain semantic keywords, which are used as general keywords. The classification number obtaining unit divides the obtained classification numbers into multiple priority levels according to whether a classification number was manually assigned, whether it is a main classification number, or which classification system it belongs to. Compared with automatic semantic segmentation, manually processed keywords better reflect the core concept of the patent, and manually determined classification numbers better reflect where the invention is positioned; using these manually produced data as high-priority terms therefore improves retrieval precision. In addition, the main classification reflects the core concept of the invention better than secondary classifications do, and some classification systems are more finely subdivided, so classification numbers can also be graded by priority on these bases.
In the present invention, the predetermined threshold condition is that the number of retrieval results is greater than or equal to a fourth threshold and less than or equal to a fifth threshold, and the fourth and fifth thresholds are dynamically variable. Because the volume of data differs from field to field, making these thresholds dynamically variable can further improve retrieval precision.
The information retrieval device of the present invention further comprises: a similarity calculation unit that calculates the similarity between each document in the retrieval result and the specific patent input by the user, the retrieval result being the result of searching with the retrieval formula built in the retrieval formula building step; and a sorting unit that sorts the documents in the retrieval result according to similarity. The retrieval result can thus be ordered by similarity, which improves browsing efficiency.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings, in which:
Fig. 1 schematically shows an embodiment of the information retrieval system according to the present invention;
Fig. 2 schematically shows an embodiment of the keyword grading flow in the information retrieval system according to the present invention;
Fig. 3 schematically shows an embodiment of the classification number grading flow implemented by the classification number obtaining unit in the information retrieval system according to the present invention;
Fig. 4 schematically shows an embodiment of the retrieval formula building flow implemented by the information retrieval system according to the present invention;
Fig. 5 schematically shows a structural block diagram of the retrieval formula building unit of the second embodiment;
Fig. 6 (a), (b), (c) and (d) schematically show the retrieval formula building flow implemented by the retrieval formula building unit of the second embodiment;
Fig. 7 schematically shows the dynamic threshold determining unit in the information retrieval system according to the present invention;
Fig. 8 schematically shows another embodiment of the information retrieval system according to the present invention;
Fig. 9 schematically shows an embodiment of the computer system according to the present invention.
Detailed description of the embodiments
First embodiment
Fig. 1 shows an embodiment of the information retrieval system of the present invention. Fig. 2 shows an embodiment of the keyword grading flow implemented by the information retrieval system according to the present invention. Fig. 3 shows an embodiment of the classification number grading flow implemented by the information retrieval system according to the present invention. Fig. 4 shows an embodiment of the retrieval formula building flow implemented by the retrieval formula building unit in the information retrieval system according to the present invention. These are described below with reference to Figs. 1 to 4.
As shown in Fig. 1, the information retrieval system includes an input device 101, a data retrieval device 201 and an information database 301. The input device 101 receives information input by the user, for example a specific patent number. A collection of technical document information is stored in advance in the information database 301, including but not limited to patent gazettes, patent announcements and utility model publications of various countries, specific standards, core journal documents, and the like.
As shown in Fig. 1, the data retrieval device of the present invention includes a receiving unit 202, a patent information obtaining unit 203, a high-priority word obtaining unit 204, a semantic word segmentation unit 205, a filtering unit 206, a comparison unit 207, a classification number obtaining unit 208, a retrieval formula building unit 209, a thesaurus 211 and a retrieval result storage unit 210. In Fig. 1, the high-priority word obtaining unit 204, the semantic word segmentation unit 205 and the filtering unit 206 constitute the keyword obtaining unit 213 of the data retrieval device. The semantic word segmentation unit 205 and the filtering unit 206 constitute the semantic word obtaining unit 212.
As shown in Fig. 2 in step s2020, receiving unit 202 receiving user's input in data searcher 201 Information, the information of this user input is, for example, certain specific patent No..
In step s2030, the specific patent No. that patent information acquiring unit 203 receives according to receiving unit 202, inspection Rope information database 301, thus obtain patent information corresponding with this specific patent No..
After obtaining the corresponding patent information of the specific patent No. in step s2030, carry out key word and classification number respectively Classification, Fig. 3 shows the concrete hierarchical approaches of classification number.The s2040-s2074 of Fig. 2 shows the concrete classification side of key word Formula.
In step s2040 of Fig. 2, the high-priority word obtaining unit 204 obtains the high-priority words in the patent information of the specific patent. A high-priority word is, for example, a word extracted after the specific patent has undergone patent processing. In the Derwent database, for instance, every patent has a keyword list record; the high-priority word obtaining unit 204 obtains this keyword list record and uses it as the high-priority words. Alternatively, the high-priority words may be words in a dedicated search record made for the specific patent number.
In step s2050, the semantic word segmentation unit 205 segments the specific patent information into words according to meaning. The patent information may be the specification and the claims. Optionally, since the claims contain more legal information, they may be segmented preferentially. Compared with the dependent claims, the independent claims set out the scope of protection of the specific patent, define the scope of its legal right, and better reflect the inventive concept of the application; the semantic word segmentation unit 205 may therefore segment only the independent claims of the invention. Of course, those skilled in the art should understand that segmentation may also proceed in graded order: for example, the title of the invention may be segmented first, then the independent claims, then the dependent claims, and finally the specification. Because the specification contains a great many sentences, word frequency information may be used in combination when segmenting it.
In step s2060, the filtering unit 206 compares the semantic segmentation result with a filter dictionary and filters out shielding words and single characters, that is, words that carry no concrete meaning in retrieval. For example, for "high-temperature camera having a panoramic function", the semantic analysis result of step s2050 is "having", "all", "view", "function", "of", "high temperature" and "camera". The filtering unit 206 filters out the words without technical meaning, namely "having", "function", "all", "view" and "of", so that in this example only the keywords "high temperature" and "camera" remain after filtering.
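The filtering of step s2060 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `filter_segments`, the contents of the filter dictionary, and the English glosses of the example tokens are all assumptions. In the original language some of the removed tokens are single characters, which the length rule would also catch.

```python
def filter_segments(segments, filter_dictionary):
    """Drop shielding words (those in the filter dictionary) and single characters."""
    return [
        token
        for token in segments
        if token not in filter_dictionary and len(token) > 1
    ]


# English glosses of the segmentation result described for step s2050 (assumed).
segments = ["having", "all", "view", "function", "of", "high temperature", "camera"]

# Hypothetical filter dictionary; the source does not list its contents.
filter_dictionary = {"having", "all", "view", "function", "of"}

keywords = filter_segments(segments, filter_dictionary)
print(keywords)  # ['high temperature', 'camera']
```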
In step s2070, the comparison unit 207 compares the high-priority words obtained in step s2040 and the semantic segments obtained by the semantic word segmentation unit 205 in step s2060 with the title of the invention and/or the subject-matter name of the independent claims of the specific patent, and grades the high-priority words and semantic segments accordingly. The comparison may consist of analysing the similarity between each high-priority word or semantic segment and the title of the invention and/or the subject-matter name, and grading the words on the basis of the result.
In step s2071, it is judged whether the similarity between the high-priority words obtained in step s2040, or the semantic segments obtained in step s2060, and the title of the invention and/or the subject-matter name of the independent claims is greater than or equal to a first threshold.
In this similarity analysis, for example, when a high-priority word or semantic segment coincides completely with the title of the invention and/or the subject-matter name of the independent claims, the similarity is considered highest and equals 1. When the word under comparison shares only some characters with the title and/or subject-matter name, the similarity is taken to be (number of coinciding characters) / (length of the word under comparison). For example, when the high-priority word is a two-character word meaning "shooting" and the title of the invention contains a different two-character word that also means "shooting", and the two words share exactly one character, the similarity is 1/2, i.e. 0.5.
The similarity analysis may also refer to the thesaurus 211, in which different similarities have been assigned in advance to synonyms according to their actual meaning. For example, for the word "shooting", a close synonym has a similarity of 0.8, whereas "camera", although related to shooting, mostly denotes the equipment used for shooting, so its similarity is lower, perhaps 0.4.
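A minimal sketch of this similarity measure is given below. The order of the checks (exact match, then thesaurus lookup, then character overlap), the function name `word_similarity`, and the representation of the thesaurus as a dictionary keyed by unordered word pairs are assumptions made for illustration; the source specifies only the scores themselves. In the example, each letter of "AB" and "BC" stands for one character of a two-character word.

```python
def word_similarity(candidate, reference, thesaurus=None):
    """Similarity between a candidate keyword and a title or subject-name word."""
    if not candidate:
        return 0.0
    if candidate == reference:
        return 1.0  # complete coincidence
    if thesaurus:
        # Assumed layout: pre-assigned scores keyed by an unordered word pair.
        score = thesaurus.get(frozenset((candidate, reference)))
        if score is not None:
            return score
    # Coinciding characters divided by the length of the word under comparison.
    shared = sum(1 for ch in candidate if ch in reference)
    return shared / len(candidate)


# Hypothetical thesaurus entries using the scores quoted in the text.
thesaurus = {
    frozenset(("shoot", "film")): 0.8,
    frozenset(("shoot", "camera")): 0.4,
}

print(word_similarity("AB", "BC"))                    # 0.5
print(word_similarity("film", "shoot", thesaurus))    # 0.8
print(word_similarity("camera", "shoot", thesaurus))  # 0.4
```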
When the judgment in step s2071 is that the similarity is greater than or equal to the first threshold, i.e. the result of step s2071 is "Yes", in step s2072 the high-priority words whose similarity reaches the first threshold are defined as first priority words, i.e. class a words. The semantic segments whose similarity reaches the first threshold and which are not already included in class a are then defined as second priority words, i.e. class b words.
When the result of step s2071 is "No", step s2073 judges whether the similarity between the high-priority words obtained in step s2040 and the title of the invention and/or the independent claims lies between the first threshold and a second threshold, i.e. whether it is less than the first threshold but greater than or equal to the second threshold.
When the result of step s2073 is "Yes", in step s2074 the high-priority words whose similarity lies between the first and second thresholds are defined as third priority words, i.e. class c words, and the semantic segments whose similarity lies between the first and second thresholds are defined as fourth priority words, i.e. class d words.
When the result of step s2073 is "No", the flow ends. That is, for high-priority words and semantic segments whose similarity is below the second threshold, it is considered that introducing them would restrict the retrieval result to too narrow a range, which does not help the subsequent retrieval of similar subjects; these words are therefore not graded further, and the flow proceeds to the subsequent building of the retrieval formula. Of course, these words may also be compared with a third threshold (where third threshold < second threshold < first threshold), so that the words that satisfy neither step s2071 nor step s2073 are graded further into, for example, class e, f, g, ... words.
There may be multiple keywords in each of classes a, b, c, d, e, f, g, ..., and the classes do not overlap. That is, when a particular word qualifies both as a class a word and as a class c word, it is assigned only to the higher-priority class, namely class a. In this example, for ease of description, only class a to d keywords are set, i.e. the first to fourth priority words.
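The grading of steps s2071 to s2074 can be sketched as below. This is an illustration under stated assumptions: the function name `grade_keywords`, the input format (dictionaries mapping each word to a precomputed similarity), the example words, and the threshold values 0.8 and 0.3 are all hypothetical, since the source gives no values for the first and second thresholds.

```python
def grade_keywords(high_priority_sim, semantic_sim, first_threshold, second_threshold):
    """Assign each word to at most one class (A-D), highest priority first."""
    def top_band(s):
        return s >= first_threshold

    def middle_band(s):
        return second_threshold <= s < first_threshold

    rules = [
        ("A", high_priority_sim, top_band),     # first priority words
        ("B", semantic_sim, top_band),          # second priority words
        ("C", high_priority_sim, middle_band),  # third priority words
        ("D", semantic_sim, middle_band),       # fourth priority words
    ]
    grades = {label: [] for label, _, _ in rules}
    assigned = set()
    for label, similarities, in_band in rules:
        for word, score in similarities.items():
            # A word already placed in a higher class is not placed again.
            if word not in assigned and in_band(score):
                grades[label].append(word)
                assigned.add(word)
    return grades


# Hypothetical similarities against the title / subject-matter name.
high_priority_sim = {"camera": 1.0, "lens": 0.5, "bracket": 0.1}
semantic_sim = {"camera": 1.0, "high temperature": 0.9, "housing": 0.4, "screw": 0.05}

grades = grade_keywords(high_priority_sim, semantic_sim, 0.8, 0.3)
print(grades)
# {'A': ['camera'], 'B': ['high temperature'], 'C': ['lens'], 'D': ['housing']}
```

Words whose similarity falls below the second threshold ("bracket" and "screw" here) end up in no class, which mirrors the "No" branch of step s2073.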
As shown in Fig. 3, after the patent information obtaining unit 203 has, in step s2030 of Fig. 2, searched the information database 301 using the specific patent number received by the receiving unit 202 and obtained the corresponding patent information, the flow enters step s2080 of Fig. 3, in which the classification number obtaining unit 208 of Fig. 1 judges whether the patent information contains a high-priority classification number.
Classification numbers can be graded according to the type of classification system. Since CPC classification numbers are generally more precise than IPC classification numbers, the high-priority classification numbers may be the CPC information in the patent information. That is, when the patent information contains CPC classification numbers, a high-priority classification number is considered to exist.
When it is judged in step s2080 that a high-priority classification number exists, i.e. the result of step s2080 is "Yes", in step s2081 the high-priority classification number is defined as a first priority classification number, i.e. a class a classification number. There may be one or more class a classification numbers.
Then, in step s2082, a logical OR search is performed on the first priority (class a) classification numbers, the classification numbers in the retrieval result are counted and sorted in descending order of frequency, and the top ten classification numbers are defined as second priority classification numbers, i.e. class b classification numbers. The class b classification numbers exclude the class a classification numbers. The number taken from the descending statistics can of course be customised as needed, for example the top twenty or the top fifteen.
When the result of step s2080 is "No", i.e. the patent information contains no high-priority classification number, in step s2083 the classification numbers contained in the patent information of the specific patent number input by the user are defined as third priority classification numbers, i.e. class c classification numbers. The class c classification numbers exclude the class a and class b classification numbers; they are, for example, the classification numbers carried in the published text of the patent. There may be one or more class c classification numbers.
In step s2084, a logical OR search is performed on the third priority (class c) classification numbers, the classification numbers in the retrieval result are counted and sorted in descending order of frequency, and the top ten classification numbers are defined as fourth priority classification numbers, i.e. class d classification numbers. The class d classification numbers exclude the class a, class b and class c classification numbers. The number taken from the descending statistics can of course be customised as needed, for example the top twenty or the top fifteen.
In addition, classification numbers may be graded further; for example, the main groups to which the class a and class b classification numbers belong may be extracted and used as class e and class f classification numbers respectively. In this example, for ease of description, only class a to d classification numbers are set, i.e. the first to fourth priority classification numbers.
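The classification number grading of steps s2080 to s2084 can be sketched as follows, reading the flow literally (class a and b on the "Yes" branch, class c and d on the "No" branch). The function names, the `search_or` callback standing in for the logical OR search against the information database, the choice to count each classification number once per retrieved document, and the example hit list (including the code G02B7/02) are assumptions for illustration only.

```python
from collections import Counter


def top_cooccurring(seed_numbers, hits, top_n=10):
    """Most frequent classification numbers in the hits, excluding the seeds."""
    seeds = set(seed_numbers)
    counts = Counter(
        number
        for doc_numbers in hits
        for number in dict.fromkeys(doc_numbers)  # count once per document
        if number not in seeds
    )
    return [number for number, _ in counts.most_common(top_n)]


def grade_classification_numbers(cpc_numbers, other_numbers, search_or, top_n=10):
    """search_or(numbers) is assumed to return one list of numbers per hit."""
    grades = {"A": [], "B": [], "C": [], "D": []}
    if cpc_numbers:  # step s2080 "Yes": high-priority numbers exist
        grades["A"] = list(cpc_numbers)
        grades["B"] = top_cooccurring(cpc_numbers, search_or(cpc_numbers), top_n)
    else:            # step s2080 "No"
        grades["C"] = list(other_numbers)
        grades["D"] = top_cooccurring(other_numbers, search_or(other_numbers), top_n)
    return grades


# Hypothetical search result: the classification numbers of four retrieved documents.
hits = [
    ["H04N5/225", "G03B17/55", "G03B17/02"],
    ["H04N5/225", "G03B17/55", "G03B17/02"],
    ["H04N5/225", "G03B17/55"],
    ["H04N5/225", "G02B7/02"],
]

grades = grade_classification_numbers(["H04N5/225"], [], lambda numbers: hits, top_n=2)
print(grades["A"])  # ['H04N5/225']
print(grades["B"])  # ['G03B17/55', 'G03B17/02']
```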
Fig. 4 shows an embodiment of the retrieval formula building flow implemented by the retrieval formula building unit 209 in the information retrieval system according to the present invention.
The retrieval formula of this embodiment is built according to the following criterion: keywords of all levels are related to one another by logical AND, and all classification numbers are related to one another by logical OR, regardless of the priority levels of the keywords and classification numbers. Of course, those of ordinary skill in the art should understand that the synonyms of a keyword are related to one another by logical OR. For example, suppose there are three keyword levels, class a, class b and class c, and class a contains the keywords a1 and a2. Then a1 and its synonyms are related by logical OR, and a2 and its synonyms are related by logical OR, but a1 and a2, and likewise keywords of different levels, i.e. classes a, b and c, are related by logical AND.
As shown in Fig. 4, in step s20749, i and j are each set to 1, where i and j are natural numbers.
In step s20750, the retrieval formula building unit 209 accesses the thesaurus 211 and obtains the synonyms of the priority words of levels i, i-1, ..., 1 (each of these values being greater than or equal to 1). Since i = 1 in the initial state, only the synonyms of the highest-priority first priority words, i.e. of each of the class a keywords, are obtained at this point.
In step s20751, priority level is 1 grade of priority word of i-th, i-1 ..., is now multiple a class words And its synonym, carry out logic or operation between synonym, but carry out logical AND operation between multiple a class words.For example, if A class word is " shooting ", " rotation ", and after inquiry thesaurus 211, the synonym obtaining a class word " shooting " is " shooting ", " photograph " Deng the synonym of " rotation " is " rotation ", and therefore, in step s20751, the keyword expression of structure is that " (shooting or claps Take the photograph or photograph) and (rotation or rotates) ".
In step s20752, the classification numbers of priorities j, j-1, ..., 1 are obtained. Since j = 1 in the initial state, only the first priority classification numbers, which have the highest priority, are obtained at this point, i.e. the multiple a class classification numbers, and a logical OR operation is performed on them. For example, if the a class classification numbers obtained are H04N5/225, G03B17/55 and G03B17/02, the expression built in step s20752 is "H04N5/225 or G03B17/55 or G03B17/02".
In step s20753, the expressions built in steps s20751 and s20752 are combined by logical AND to form the retrieval type, and a search is performed in the information database 301 with the retrieval type so formed.
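The construction rule of steps s20751 to s20753 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `or_group` and `build_retrieval_type` and the plain-string output format are assumptions made here, not part of the described system. The sample words and classification numbers are those of the example above.

```python
def or_group(terms):
    """Join terms with 'or'; add parentheses only when there are several."""
    terms = list(terms)
    if len(terms) == 1:
        return terms[0]
    return "(" + " or ".join(terms) + ")"


def build_retrieval_type(keywords, synonyms, classification_numbers):
    """Build '(keyword part) and (classification part)' as a plain string.

    keywords: keywords of every grade used so far (combined by AND)
    synonyms: dict mapping a keyword to its synonyms (combined by OR)
    classification_numbers: classification numbers (combined by OR)
    """
    keyword_part = " and ".join(
        or_group([kw] + synonyms.get(kw, [])) for kw in keywords
    )
    class_part = " or ".join(classification_numbers)
    return f"({keyword_part}) and ({class_part})"


expr = build_retrieval_type(
    ["shooting", "rotation"],
    {"shooting": ["shoots", "photograph"], "rotation": ["rotates"]},
    ["H04N5/225", "G03B17/55", "G03B17/02"],
)
print(expr)
```

The keyword part and the classification part are each wrapped in parentheses so that the final AND binds the two parts as a whole.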
In step s20754, it is judged whether the number of retrieval results from step s20753 is greater than or equal to the fourth threshold and less than or equal to the fifth threshold, or whether i > 4 or j > 4. The latter condition checks whether the keywords or classification numbers of all four grades have already taken part in building the retrieval type: when i > 4 or j > 4, keywords or classification numbers of every grade have been used, so the retrieval type is saved and the flow ends.
When the number of retrieval results is greater than or equal to the fourth threshold and less than or equal to the fifth threshold, the result set is considered appropriate: the retrieved documents are considered close in field and relevant to the content of the specific patent entered by the user. The retrieval type is then saved and the flow ends.
When there are too few retrieval results, the scope of the search is considered too narrow; when there are too many, the search is considered to have introduced noise, so that the documents obtained are only weakly relevant. In this example, the fourth and fifth thresholds can be set according to the characteristics of the technical field; for example, the fourth threshold may be set to 1500 and the fifth threshold to 2000.
Therefore, in step s20755, it is judged whether the number of retrieval results is less than the fourth threshold. When the judgment is "Yes", i.e. the number of retrieval results is less than the fourth threshold, j is set to j + 1 and the flow returns to step s20752, where a logical OR operation is performed on the classification numbers of priorities j, j-1, ..., 1. Since j = 2 at this point, the classification numbers of the second and first priorities are combined by logical OR. For example, if the second priority classification numbers obtained are H04N5/222 and H04N5/235, they are combined by logical OR with the first priority classification numbers H04N5/225, G03B17/55 and G03B17/02, so that the expression built in step s20752 of this iteration is "H04N5/225 or G03B17/55 or G03B17/02 or H04N5/222 or H04N5/235".
Afterwards, the retrieval type is formed again in step s20753 and a search is performed in the information database 301. Step s20754 then judges again whether the number of retrieval results is greater than or equal to the fourth threshold and less than or equal to the fifth threshold, or whether i > 4 or j > 4.
When the judgment in step s20755 is "No", the number of retrieval results is greater than the fifth threshold. In that case i is set to i + 1 and, in step s20750, the thesaurus is accessed to obtain the synonyms of the priority words of levels i, i-1, ..., 1. Since i = 2 at this point, the first and second priority words are obtained, i.e. the a class words and b class words together with their synonyms.
Afterwards, in step s20751, a logical AND operation is performed among the priority words of levels i, i-1, ..., 1, and a logical OR operation among the synonyms of each keyword. In this example, the first priority words (a class words) are "shooting" and "rotation" and the second priority word (b class word) is "cooling"; querying the thesaurus for "cooling" returns the synonyms "high temperature resistant", "reducing temperature" and "cooling down". The keyword expression built in step s20751 is therefore "(shooting or shoots or photograph) and (rotation or rotates) and (cooling or high temperature resistant or reducing temperature or cooling down)".
Afterwards, the retrieval type is formed again in step s20753 and a search is performed in the information database 301, and step s20754 judges whether the number of retrieval results is less than or equal to the fifth threshold and greater than or equal to the fourth threshold, or whether i > 4 or j > 4.
The above loop is repeated according to the judgment results until the number of retrieval results is less than or equal to the fifth threshold and greater than or equal to the fourth threshold, or until i > 4 or j > 4. The flow is set to end when i > 4 or j > 4 because, in this example, only four priority levels (a class, b class, c class and d class) are provided for keywords, and likewise only four for classification numbers. Of course, it will be apparent to those skilled in the art that when more priority levels are set for keywords and classification numbers, the values of i and j at which the flow ends are correspondingly larger and correspond to the number of priority levels so divided.
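The loop of Fig. 4 can be summarized by the following Python sketch. It is a simplified illustration under stated assumptions: `count_hits` is a hypothetical callable standing in for a search of the information database 301, synonym expansion is omitted for brevity, and the termination test uses the number of levels supplied rather than the fixed value 4. The default thresholds are the example values 1500 and 2000.

```python
def build(keyword_levels, class_levels, i, j):
    """AND all keywords of levels 1..i; OR all classification numbers of levels 1..j."""
    keywords = [kw for level in keyword_levels[:i] for kw in level]
    numbers = [n for level in class_levels[:j] for n in level]
    return "(" + " and ".join(keywords) + ") and (" + " or ".join(numbers) + ")"


def construct(keyword_levels, class_levels, count_hits, low=1500, high=2000):
    """Widen with classification numbers, narrow with keywords, until in range."""
    i = j = 1
    while True:
        expr = build(keyword_levels, class_levels, i, j)
        hits = count_hits(expr)
        # Every level has been used once i or j exceeds the number of levels.
        all_used = i > len(keyword_levels) or j > len(class_levels)
        if low <= hits <= high or all_used:
            return expr          # save the retrieval type and stop
        if hits < low:
            j += 1               # too few results: add the next classification level
        else:
            i += 1               # too many results: add the next keyword level
```

Adding a classification level widens the OR group and so can only increase the result count, while adding a keyword level lengthens the AND chain and can only decrease it; this is why the two counters are advanced in opposite situations.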
After the retrieval type construction unit 209 has constructed the retrieval type according to the steps of Figs. 2 to 4 and a search has been performed in the information database 301, the retrieval results are stored in the retrieval result memory element 210.
In this example, the keyword acquiring unit 213 obtains words from data processing and word segmentation results, compares them with the title of the invention and/or the subject matter of the application, and classifies the keywords by their similarity. Keywords may, however, also be classified in other ways. If other keywords are found during data processing or big-data retrieval, a further priority classification can be made according to the accuracy of the keywords. For example, the available keywords can be divided into multiple grades according to how closely their meaning matches that determined by manual identification, and then compared with the title of the invention and the subject matter of the application, so that the keywords are divided into more priority levels.
Furthermore, as regards judging the relevance of a keyword to the invention in order to assign keyword priorities, this example compares the similarity of each keyword with the title of the invention and/or the title of the claimed subject matter to obtain its priority level. Other ways of assigning keyword priority levels may also be used. For example, the entire application document of the invention can be analyzed and a word frequency analysis performed to obtain a word frequency list; the automatically obtained keywords are then compared with this word frequency list to determine their priority levels.
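As a rough illustration of the word-frequency alternative, the sketch below ranks keywords by how often they occur in the application text and cuts the ranking into priority levels. The text does not specify how the comparison with the word frequency list maps to levels, so the equal-sized buckets, the function name `grade_by_frequency` and the toy token list are all assumptions made for this example.

```python
from collections import Counter
from math import ceil


def grade_by_frequency(document_tokens, keywords, n_levels=4):
    """Split keywords into up to n_levels priority levels by word frequency."""
    if not keywords:
        return []
    freq = Counter(document_tokens)
    # Most frequent first; sorted() is stable, so ties keep their input order.
    ranked = sorted(keywords, key=lambda kw: -freq[kw])
    # Assumption: equal-sized buckets, highest-frequency bucket = highest priority.
    size = ceil(len(ranked) / n_levels)
    return [ranked[k:k + size] for k in range(0, len(ranked), size)]


tokens = "camera camera camera rotate rotate cooling".split()
levels = grade_by_frequency(tokens, ["cooling", "camera", "rotate"], n_levels=3)
print(levels)
```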
In the above example, if a confirmed CPC classification number exists in the application, it is used as a high priority classification number. Of course, other classification systems may be adopted, such as the FT or UC classification systems. Moreover, the degree of subdivision of each classification system differs from field to field; for example, the FT classification system is more finely subdivided in the field of cameras. Thus, in the camera field, when the patent information of the specific patent contains an FT classification number, that FT classification number can be set as a high priority classification number.
Alternatively, some databases also contain confirmed classification number information; the database of the European Patent Office, for example, has a field for confirmed CPC classification numbers. Some databases may also contain the classification numbers once used by the searcher of the specific patent. The confirmed classification numbers or the classification numbers used by the searcher can therefore also be used as high priority classification numbers.
Alternatively, if only IPC classification numbers exist in an application, the priority level can also be determined from the IPC classification numbers. In that case, the classification numbers confirmed by manual examination, or those of the granted text, are generally used as the basis for the high priority classification numbers, and the classification numbers in the published application as the basis for the lower priority levels.
The above example builds the retrieval type according to the priority levels of both keywords and classification numbers, but the retrieval type can also be built according to the priority of only the keywords or only the classification numbers. When the retrieval type is built only according to the keyword grades, the automatically extracted classification numbers can first be combined by logical OR, and the keywords then added step by step in order of priority. When the retrieval type is built only according to the classification number grades, keywords can first be extracted automatically from certain specific content, for example only from the title of the invention or the independent claims, and, after shielding words have been removed, the retrieval type is built by adding classification numbers step by step.
Second embodiment
The frame structure of the second embodiment is the same as Fig. 1 of the first embodiment. It differs from the first embodiment only in the way the retrieval type construction unit 209 builds the retrieval type; therefore, only the differences from the first embodiment are described here and other explanations are omitted.
Fig. 5 shows a module diagram of the retrieval type construction unit 209' according to the second embodiment of the present invention. Fig. 6 shows another embodiment of the retrieval type construction flow implemented by the retrieval type construction unit 209' in the information retrieval system of the present invention.
In this second embodiment, keywords and classification numbers are graded in the same way as in the first embodiment; that is, keywords are again classified into multiple priority levels. For convenience of explanation, only the case of a class, b class and c class priority words is given here. Classification numbers are likewise classified into multiple priority levels and, for convenience of explanation, only the case of a class and b class is given. Of course, the a class, b class and c class words each contain multiple keywords, and the a class and b class classification numbers each contain multiple classification numbers. Suppose a1 and a2 are a class words, b1, b2 and b3 are b class words, and c1, c2 and c3 are c class words, and that a1 and a2 are a class classification numbers and b1 and b2 are b class classification numbers.
It should be noted that the division of keywords into 3 priorities and of classification numbers into 2 priorities is shown here only as an example. Those skilled in the art will understand that keywords and classification numbers can be divided into more grades as needed, and that the number of grades for keywords and for classification numbers may be the same or different; for example, as in embodiment 1, keywords and classification numbers may each be divided into 4 priorities.
The retrieval type construction unit of the second embodiment differs from that of the first embodiment in its retrieval type construction criteria. In the first embodiment, keywords are combined by logical AND and classification numbers by logical OR, regardless of their priority levels, and the keywords and the classification numbers are combined with each other by logical AND. In the second embodiment, the retrieval type construction unit builds the retrieval type by a first retrieval type construction criterion or a second retrieval type construction criterion. Under the first criterion, classification numbers are combined by logical OR regardless of priority level; keywords of the same grade are combined by logical AND and keywords of different grades by logical OR; and the keywords and the classification numbers are combined by logical AND. Under the second criterion, classification numbers are combined by logical OR, keywords are combined by logical AND regardless of priority level, and the keywords and the classification numbers are combined by logical AND.
Of course, when building the retrieval type, the synonyms of each keyword a1 to a2, b1 to b3 and c1 to c3 should also be taken into account; those skilled in the art will understand that the synonyms of each keyword are combined by logical OR.
For convenience of explanation, where only a change of grade is involved and not a change in the number of keywords or classification numbers within a grade, "a class word" is used below to stand for "a1 and a2" and "a class classification number" to stand for "a1 or a2" when the retrieval type is built by the first retrieval type construction criterion. The b class and c class words are treated in the same way as the a class words, and the b class classification numbers in the same way as the a class classification numbers.
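The two construction criteria of the second embodiment can be compared side by side in a short Python sketch. The function names are hypothetical, synonym expansion is left out, and the classification numbers are written in upper case (A1, B1, ...) purely to keep them visually distinct from the keyword labels in this illustration.

```python
def group(parts, op):
    """Join parts with the operator; parenthesize only when there are several."""
    parts = list(parts)
    return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"


def first_criterion(keyword_levels, class_numbers):
    """AND within a keyword grade, OR across grades, OR over classification numbers."""
    per_level = [group(level, "and") for level in keyword_levels]
    return group(per_level, "or") + " and " + group(class_numbers, "or")


def second_criterion(keyword_levels, class_numbers):
    """AND over all keywords regardless of grade, OR over classification numbers."""
    words = [kw for level in keyword_levels for kw in level]
    return group(words, "and") + " and " + group(class_numbers, "or")


levels = [["a1", "a2"], ["b1", "b2"]]
print(first_criterion(levels, ["A1", "A2", "B1", "B2"]))
print(second_criterion(levels, ["A1"]))
```

Under the first criterion each additional keyword grade adds an OR alternative and therefore widens the search; under the second criterion each additional keyword adds an AND condition and narrows it.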
As shown in Fig. 5, the retrieval type construction unit 209' of the second embodiment includes a first unit 20913, a second unit 20914, a third unit 20915, a fourth unit 20916, a fifth unit 20917, a sixth unit 20918, a second comparing unit 2094 and a retrieval type acquiring unit 2093.
The first unit 20913 operates when, after the first priority keywords (a class keywords) and the first priority classification numbers (a class classification numbers) have been built into a retrieval type by the first retrieval type construction criterion, i.e. "a class word and a class classification number", the second comparing unit 2094 judges that the number of retrieval results is less than the fourth threshold. The first unit 20913 first adds keywords of successive grades in order of priority (b class, then c class, since only three keyword priorities are divided in this example). That is, when the number of retrieval results is less than the fourth threshold, the b class keywords are added first and built, together with the first priority classification numbers (a class classification numbers), into a retrieval type by the first criterion, giving "(a class word or b class word) and a class classification number". If the number of retrieval results is still less than the fourth threshold, the c class keywords are added and a retrieval type is built with the a class classification numbers by the first criterion. If the number is still less than the fourth threshold, classification numbers of successive grades are then added in order of priority (here the second priority b class classification numbers, since only two classification number priorities are divided in this example) and the retrieval type is built by the first criterion, until the second comparing unit 2094 judges that the number of retrieval results satisfies the threshold condition of being greater than or equal to the fourth threshold and less than or equal to the fifth threshold.
The second unit 20914 operates when, after the first unit 20913 has added the keywords or classification numbers of a specific grade, the second comparing unit 2094 judges that the number of retrieval results is greater than the fifth threshold.
For example, suppose that in this example, when the third priority keywords (c class words) are added, the number of retrieval results is found to be greater than the fifth threshold. The retrieval type built after adding the c class keywords is "(a class word or b class word or c class word) and a class classification number".
The second unit 20914 refers to the retrieval type built before the keywords or classification numbers of that specific grade were added and the number of retrieval results exceeded the fifth threshold, i.e. "(a class word or b class word) and a class classification number". It keeps the keywords in that retrieval type unchanged, adds lower-grade classification numbers (here the b class classification numbers) in order of priority, and builds the retrieval type by the first criterion, giving "(a class word or b class word) and (a class classification number or b class classification number)", until the second comparing unit 2094 judges that the number of retrieval results satisfies the above threshold condition, i.e. is greater than or equal to the fourth threshold and less than or equal to the fifth threshold.
The third unit 20915 operates when, after the first unit 20913 has added the keywords and classification numbers of all grades, or after the second unit 20914 has added the classification numbers of all grades, the second comparing unit 2094 judges that the number of retrieval results is still less than the fourth threshold. In this example, suppose that after searching with the retrieval type "(a class word or b class word) and (a class classification number or b class classification number)" the number of retrieval results is still less than the fourth threshold. The third unit 20915 then refers to this retrieval type and, proceeding from the lowest priority to the highest (c class, b class, then a class keywords), deletes keywords within each grade one at a time from back to front until only a predetermined number of keywords remain in that grade.
In this example, since the retrieval type referred to is "(a class word or b class word) and (a class classification number or b class classification number)" and contains no c class keywords, deletion starts from the b class keywords, which have the lowest priority present. The keyword b3 is deleted first and the retrieval type is rebuilt by the first criterion, leaving "((a1 and a2) or (b1 and b2)) and (a1 or a2 or b1 or b2)". If the threshold condition is still not satisfied, b2 is deleted, so that only b1 remains in the b grade; a2 is then deleted, and the retrieval type becomes "(a1 or b1) and (a1 or a2 or b1 or b2)". Before each deletion, the second comparing unit 2094 judges whether the number of retrieval results satisfies the above threshold condition.
The retrieval type acquiring unit 2093 obtains the final retrieval type as follows.
When the second comparing unit 2094 judges that the number of retrieval results satisfies the above threshold condition, the retrieval type that satisfies the condition is obtained. When the number of retrieval results of the retrieval type built after the third unit 20915 has deleted keywords of all grades (retaining, of course, a predetermined number of keywords in each grade according to a predetermined rule) is still less than the fourth threshold, the last retrieval type built is obtained; that is, if the number of retrieval results of "(a1 or b1) and (a1 or a2 or b1 or b2)" is still less than the fourth threshold, this retrieval type is taken as the final retrieval type.
When the number of retrieval results of the retrieval type built after the third unit 20915 has deleted a keyword of a specific grade is greater than the fifth threshold, for example if the number of retrieval results of "(a1 or b1) and (a1 or a2 or b1 or b2)" is greater than the fifth threshold, the retrieval type built before that keyword was deleted is obtained, i.e. "((a1 and a2) or b1) and (a1 or a2 or b1 or b2)" is taken as the final retrieval type.
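The deletion procedure of the third unit 20915, together with the selection rules of the retrieval type acquiring unit 2093, can be sketched as follows. This is a minimal sketch under assumptions: `count_hits` is a hypothetical stand-in for a database search, `keep` is the predetermined number of keywords retained per grade, the starting retrieval type is assumed to have already returned fewer results than the fourth threshold, and classification numbers are written in upper case only to distinguish them from keywords.

```python
def group(parts, op):
    """Join parts with the operator; parenthesize only when there are several."""
    parts = list(parts)
    return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"


def first_criterion(keyword_levels, class_numbers):
    """AND within a keyword grade, OR across grades, OR over classification numbers."""
    per_level = [group(level, "and") for level in keyword_levels]
    return group(per_level, "or") + " and " + group(class_numbers, "or")


def prune_keywords(keyword_levels, class_numbers, count_hits, low, high, keep=1):
    """Delete keywords from the lowest grade upward, last keyword first."""
    levels = [list(level) for level in keyword_levels]
    previous = first_criterion(levels, class_numbers)
    for idx in reversed(range(len(levels))):      # lowest priority grade first
        while len(levels[idx]) > keep:
            levels[idx].pop()                     # delete from back to front
            candidate = first_criterion(levels, class_numbers)
            hits = count_hits(candidate)
            if low <= hits <= high:
                return candidate                  # threshold condition met
            if hits > high:
                return previous                   # overshoot: keep the earlier retrieval type
            previous = candidate
    return previous                               # still too few: last retrieval type built
```

Removing a keyword from an AND group inside a grade loosens that grade's condition, so each deletion can only increase the number of results; the overshoot check therefore needs to look back only one step.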
The fourth unit 20916 operates when, after the first priority keywords (a class keywords) and the first priority classification numbers (a class classification numbers) have been built into a retrieval type by the first retrieval type construction criterion, i.e. "a class word and a class classification number", the second comparing unit 2094 judges that the number of retrieval results is greater than the fifth threshold. The fourth unit 20916 then deletes first priority classification numbers one at a time from back to front; in this example it deletes a2, the later of the highest priority classification numbers a1 and a2, until only a predetermined number of classification numbers remain in that grade (set to 1 in this example). The keywords are kept unchanged and the retrieval type is built by the second retrieval type construction criterion, giving "(a1 and a2) and a1", until the second comparing unit judges that the number of retrieval results satisfies the above threshold condition.
The fifth unit 20917 operates when, after the fourth unit 20916 has deleted a specific classification number, the second comparing unit 2094 judges that the number of retrieval results is less than the fourth threshold. Suppose the retrieval type "(a1 and a2) and a1" obtained after deleting classification number a2 returns fewer results than the fourth threshold. The fifth unit then refers to the retrieval type built before the specific classification number was deleted, i.e. "a class word and a class classification number", keeps the classification numbers in that retrieval type unchanged, adds lower-grade keywords in order of priority (a class, b class, then c class words in this example), and builds the retrieval type by the second criterion, giving "(a class word and b class word) and a class classification number", until the second comparing unit 2094 judges that the number of retrieval results satisfies the above threshold condition.
The sixth unit 20918 operates when, after the fourth unit 20916 has deleted classification numbers until only the predetermined number remain and has built the retrieval type, the second comparing unit 2094 judges that the number of retrieval results is still greater than the fifth threshold. Suppose the retrieval type "(a1 and a2) and a1" obtained after deleting classification number a2 returns more results than the fifth threshold. The sixth unit 20918 then refers to this retrieval type, keeps its classification numbers unchanged, and adds lower-grade keywords in order of priority (a class, b class, then c class words), adding the keywords within each grade one at a time from front to back and building the retrieval type by the second criterion. That is, when adding b class words, b1 is added first, giving "(a1 and a2 and b1) and a1", then b2, then b3, until the second comparing unit 2094 judges that the number of retrieval results satisfies the above threshold condition.
Afterwards, the retrieval type is obtained by the retrieval type acquiring unit 2093 as follows.
When the second comparing unit 2094 judges that the number of retrieval results satisfies the above threshold condition, the retrieval type that satisfies the condition is obtained.
When, after the sixth unit 20918 has added the keywords of all grades, the second comparing unit 2094 judges that the number of retrieval results of the retrieval type built is still greater than the fifth threshold, i.e. in this embodiment if "(a class word and b class word and c class word) and a1" still returns more results than the fifth threshold, the last retrieval type built is obtained.
When, after the fifth unit 20917 has added the keywords of a specific grade, the second comparing unit 2094 judges that the number of retrieval results of the retrieval type built is less than the fourth threshold, the retrieval type built before the keywords of that grade were added is obtained. That is, in the above example, if the number of retrieval results of "(a class word and b1) and a class classification number" is less than the fourth threshold, "(a1 and a2) and a1" is taken as the final retrieval type.
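The one-at-a-time keyword addition performed by the sixth unit 20918 can be sketched as below. The names are hypothetical and `count_hits` stands in for a database search. The text does not say what the sixth unit does if an addition drops the count below the fourth threshold, so this sketch simply stops as soon as the count is no longer above the fifth threshold and leaves any further check to the caller; that stopping rule is an assumption.

```python
def group(parts, op):
    """Join parts with the operator; parenthesize only when there are several."""
    parts = list(parts)
    return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"


def second_criterion_flat(words, class_numbers):
    """AND over all keywords, OR over classification numbers."""
    return group(words, "and") + " and " + group(class_numbers, "or")


def add_keywords_one_by_one(base_words, lower_levels, class_numbers, count_hits, high):
    """Narrow the search by appending lower-grade keywords front to back."""
    words = list(base_words)
    expr = second_criterion_flat(words, class_numbers)
    for level in lower_levels:          # b class, then c class, ...
        for kw in level:                # b1, then b2, then b3, ...
            words.append(kw)
            expr = second_criterion_flat(words, class_numbers)
            if count_hits(expr) <= high:
                return expr             # no longer above the fifth threshold
    return expr                         # all keywords added: last retrieval type built
```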
Fig. 6 shows an embodiment of the retrieval type construction flow of the retrieval type construction unit of this embodiment.
In step s20760, the retrieval type construction unit 209' accesses the thesaurus 211 and obtains the synonyms of the first priority words, i.e. the respective synonyms of the multiple a class keywords.
In step s20761, for the first priority words, i.e. the a class words and their synonyms, a logical OR operation is performed among the synonyms of each word and a logical AND operation among the a class words. For example, if the a class words are a1 and a2, the synonyms of a1 are a1' and a1'', and the synonym of a2 is a2', the retrieval type built in step s20761 is "(a1 or a1' or a1'') and (a2 or a2')". Below, for convenience of description, "a class word" alone, or "a1 and a2" alone, is used to stand for the above expression with its synonyms expanded.
In step s20762, the first priority classification numbers, i.e. the a class classification numbers, are obtained and combined by logical OR. If the a class classification numbers are a1 and a2, the expression built in this step is "a1 or a2"; for convenience of explanation, it is represented below simply as "a class classification number".
In step s20763, the first priority keywords (a class words) from step s20761 and the a class classification numbers from step s20762 are combined by logical AND to form expression 1, i.e. "a class word and a class classification number", and a search is performed in the information database 301 with the retrieval type so formed.
In step s20764, it is judged whether the retrieval quantity of expression 1 is greater than or equal to the fourth threshold and less than or equal to the fifth threshold, i.e. whether it satisfies the predetermined threshold condition. If the judgment in step s20764 is "Yes", the retrieval quantity is appropriate; the retrieval type is saved and the flow ends.
In step s20765, it is judged whether the retrieval quantity is less than the fourth threshold. When the judgment is "Yes", the classification numbers are kept unchanged and the second priority b class words are combined by logical OR with the first priority words to obtain expression 2. Built by the first retrieval type construction criterion, expression 2 is "(a class word or b class word) and a class classification number".
Afterwards, in step s20766, it is judged whether the retrieval quantity of expression 2 satisfies the predetermined threshold condition. If so, the retrieval type is saved and the flow ends. If step s20767 then judges that the retrieval quantity is less than the fourth threshold, the classification numbers are again kept unchanged in step s20768 and the third priority keywords (c class words) are added; the retrieval type is built by the first criterion to obtain expression 3, i.e. "(a class word or b class word or c class word) and a class classification number".
When the judgment in step s20767 is "No", i.e. the retrieval quantity is greater than the fifth threshold, the expression preceding expression 2, i.e. expression 1, is saved as expression 1' in step s20771.
Afterwards, steps s20769 and s20770 judge, respectively, whether the retrieval quantity satisfies the predetermined threshold condition and whether it is less than the fourth threshold. When the retrieval quantity in step s20769 satisfies the predetermined threshold condition, the retrieval type is saved and the flow ends. When the judgment in step s20770 is "No", i.e. the retrieval quantity is greater than the fifth threshold, the expression preceding expression 3, i.e. expression 2, is taken as expression 1' in step s20772.
When step s20770 judges that the retrieval quantity is still less than the fourth threshold, expression 3 is taken as expression 1' in step s20773.
Afterwards, in step s20795, expression 1' is saved.
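The branch of Fig. 6 that leads to expression 1' (steps s20765 to s20773 and s20795) can be sketched as the following Python function. The names are hypothetical, `count_hits` stands in for a search of the information database 301, and the sketch assumes that expression 1 has already been found to return fewer results than the fourth threshold. It returns the chosen expression together with a flag saying whether the threshold condition was met, in which case the flow would end.

```python
def group(parts, op):
    """Join parts with the operator; parenthesize only when there are several."""
    parts = list(parts)
    return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"


def first_criterion(keyword_levels, class_numbers):
    """AND within a keyword grade, OR across grades, OR over classification numbers."""
    per_level = [group(level, "and") for level in keyword_levels]
    return group(per_level, "or") + " and " + group(class_numbers, "or")


def widen_with_keyword_grades(keyword_levels, class_numbers, count_hits, low, high):
    """Add keyword grades one by one; return (expression, condition_met)."""
    previous = first_criterion(keyword_levels[:1], class_numbers)   # expression 1
    for n in range(2, len(keyword_levels) + 1):
        expr = first_criterion(keyword_levels[:n], class_numbers)   # expression n
        hits = count_hits(expr)
        if low <= hits <= high:
            return expr, True        # condition met: save and end the flow
        if hits > high:
            return previous, False   # overshoot: the preceding expression becomes 1'
        previous = expr
    return previous, False           # all grades added, still too few: last one becomes 1'
```

When the flag is false, the returned expression plays the role of expression 1' and would be handed on to the classification-number step that follows.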
If the judgment in step s20765 is "No", i.e. the retrieval quantity is greater than the fifth threshold, then in step s20774 the keywords are kept unchanged and first priority (a class) classification numbers are deleted one at a time in reverse order, until only the predetermined number remain, to build expression 4. In this example the a class classification numbers are a1 and a2, with a1 ranked before a2, so a2 is deleted, and the retrieval type built in step s20774 is "a class word and a1".
Since in this example there are only two a class classification numbers, a1 and a2, no further first priority classification number is deleted; that is, the predetermined number of remaining classification numbers is 1. Of course, those skilled in the art will understand that the number of remaining classification numbers can be chosen as needed; for example, it may be 2 or 3.
Steps s20775 and s20776 judge, respectively, whether the retrieval quantity satisfies the predetermined threshold condition and whether it is greater than the fifth threshold. When step s20775 judges that the predetermined threshold condition is satisfied, the current retrieval type is saved and the flow ends.
When step s20776 judges that the retrieval quantity is greater than the fifth threshold, no classification number remains that can be deleted, so expression 4 is taken as expression 1'' in step s20778.
When the judgment in step s20776 is "No", i.e. the retrieval quantity is less than the fourth threshold, the expression preceding expression 4, i.e. expression 1, is taken as expression 1'' in step s20777.
Afterwards, in step s20796, expression 1'' is saved.
In step s20779, the key words in expression 1' are kept unchanged and classification numbers are added one priority level at a time. In this example the classification numbers are divided into only two priority levels, so step s20779 adds the second-priority classification numbers, i.e. the b-class classification numbers, builds the retrieval type according to the first retrieval type build criterion, and obtains expression 2'.
Afterwards, steps s20780 and s20781 judge, respectively, whether the retrieval quantity meets the predetermined threshold condition and whether the retrieval quantity is less than the fourth threshold. When the judgment in step s20780 is YES, the current retrieval type is saved and the flow ends.
When step s20781 judges that the retrieval quantity is less than the fourth threshold, step s20783 takes expression 2' as expression 1'''. Those skilled in the art will understand that this example divides the classification numbers into only two priority levels, a-class and b-class. When there are more levels, for example four priority levels a-class, b-class, c-class and d-class, then as long as the retrieval quantity is judged to remain less than the fourth threshold in the above flow, the c-class and d-class classification numbers are added in turn while the key words are kept unchanged, and the retrieval type is built according to the first retrieval type build criterion.
When the judgment in step s20781 is "No", that is, the retrieval quantity is greater than the fifth threshold, step s20782 takes the expression preceding expression 2', namely expression 1', as expression 1'''.
Afterwards, in step s20797, expression 1''' is saved.
In step s20784, the classification numbers are kept unchanged, the number of key words is reduced, and the retrieval type is built according to the first retrieval type build criterion. Key words are deleted in order of priority from low to high and, among the multiple key words of the same level, in reverse order from back to front, until only a predetermined number of key words remains at that level. Here, assume that expression 1''' is "(a-class words or b-class words or c-class words) and (a-class classification numbers or b-class classification numbers)", where the a-class words are a1 and a2, the b-class words are b1, b2 and b3, and the c-class words are c1 and c2. In step s20784, c2, the last-ranked word in the lowest-priority c class, is deleted first. Because the c class has only the two key words c1 and c2 in this example, the predetermined number retained is 1. The expression 2'' now built is "((a1 and a2) or (b1 and b2 and b3) or c1) and (a-class classification numbers or b-class classification numbers)".
During deletion, the retrieval quantity of the current expression is judged repeatedly, as in steps s20785 and s20786, to determine whether it meets the predetermined threshold condition and whether it is less than the fourth threshold. If the retrieval quantity is still less than the fourth threshold, the key word b3 of the second-priority b-class words is deleted next, and the expression built is "((a1 and a2) or (b1 and b2) or c1) and (a-class classification numbers or b-class classification numbers)". If the retrieval quantity is still less than the fourth threshold, key word b2 is deleted, and if it is still less than the fourth threshold after that, key word a2 is deleted. The resulting expression is "(a1 or b1 or c1) and (a-class classification numbers or b-class classification numbers)".
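The deletion order described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `build_first_criterion` and `deletion_steps` are hypothetical, the classification numbers are stood in for by the placeholder strings "A" and "B", and the threshold checks of steps s20785 and s20786 are left out so that every intermediate expression is visible.

```python
def build_first_criterion(keywords_by_level, class_numbers):
    """AND key words within a level, OR the levels, then AND with OR-ed classification numbers."""
    groups = []
    for words in keywords_by_level:
        if len(words) == 1:
            groups.append(words[0])
        elif words:
            groups.append("(" + " and ".join(words) + ")")
    keyword_part = "(" + " or ".join(groups) + ")"
    class_part = "(" + " or ".join(class_numbers) + ")"
    return keyword_part + " and " + class_part


def deletion_steps(keywords_by_level, keep=1):
    """Yield the key word levels after each deletion.

    Levels are visited from lowest to highest priority; inside a level the
    last-ranked key word is removed first, until `keep` words remain.
    """
    levels = [list(words) for words in keywords_by_level]
    for level in reversed(levels):
        while len(level) > keep:
            level.pop()
            yield [list(words) for words in levels]


# Hypothetical data mirroring the example in the text.
levels = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]
for state in deletion_steps(levels):
    print(build_first_criterion(state, ["A", "B"]))
```

In a real flow the loop would stop as soon as the retrieval quantity of the current expression satisfied the threshold condition or exceeded the fifth threshold.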
If step s20785 judges that the above expression meets the predetermined threshold condition, the retrieval type is saved and the flow ends.
If the judgment in step s20786 is "No", that is, the retrieval quantity is greater than the fifth threshold, the expression preceding this expression is taken as expression 1''''. If the judgment in step s20786 is "Yes", that is, the retrieval quantity is still less than the fourth threshold, this expression is taken as expression 1''''. In step s20798, expression 1'''' is saved and the flow ends.
In step s20787, the classification numbers in expression 1'' are kept unchanged, the number of key words is increased, and the retrieval type is built according to the second retrieval type build criterion. Key words are added in order of priority from high to low and, among the multiple key words of the same level, one by one from front to back, i.e. in ascending order. Here, assume that expression 1'' is "a-class words and a-class classification numbers" and that the a-class words are the two key words a1 and a2. The b1 of the b-class words is added first, and the expression built is "((a1 and a2) and b1) and a-class classification numbers". The result is then judged. If the retrieval quantity is still greater than the fifth threshold, b2 of the b-class words is added, giving "((a1 and a2) and (b1 and b2)) and a-class classification numbers", followed by b3. If the retrieval quantity is still greater than the fifth threshold after that, the c-class words are added in the order c1, c2, c3.
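The addition order under the second retrieval type build criterion can be sketched in the same spirit. The names `build_second_criterion` and `addition_steps` are hypothetical, "A" stands in for the a-class classification numbers, and the threshold checks are omitted; this is an illustration under those assumptions only.

```python
def build_second_criterion(keywords_by_level, class_numbers):
    """AND key words within and across levels, then AND with OR-ed classification numbers."""
    groups = []
    for words in keywords_by_level:
        if len(words) == 1:
            groups.append(words[0])
        elif words:
            groups.append("(" + " and ".join(words) + ")")
    if len(groups) == 1:
        keyword_part = groups[0]
    else:
        keyword_part = "(" + " and ".join(groups) + ")"
    if len(class_numbers) == 1:
        class_part = class_numbers[0]
    else:
        class_part = "(" + " or ".join(class_numbers) + ")"
    return keyword_part + " and " + class_part


def addition_steps(base_levels, extra_levels):
    """Yield the key word levels after each addition.

    Extra levels are visited from highest to lowest priority; inside a
    level the key words are added from front to back.
    """
    levels = [list(words) for words in base_levels]
    for extra in extra_levels:
        levels.append([])
        for word in extra:
            levels[-1].append(word)
            yield [list(words) for words in levels]


# Hypothetical data mirroring the example in the text.
base = [["a1", "a2"]]
extra = [["b1", "b2", "b3"], ["c1", "c2", "c3"]]
for state in addition_steps(base, extra):
    print(build_second_criterion(state, ["A"]))
```

Each added key word narrows the result set, which is why this branch is used when the retrieval quantity is above the fifth threshold.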
In step s20788, if the retrieval quantity is judged to meet the predetermined threshold condition, the expression is saved and the flow ends.
If the judgment in step s20789 is "Yes", that is, all key words have been used but the retrieval quantity is still greater than the fifth threshold, this expression is saved as expression 1'''''. If the judgment in step s20789 is "No", that is, the retrieval quantity is now less than the fourth threshold, the expression preceding this expression is taken as expression 1'''''. In step s20799, expression 1''''' is saved and the flow ends.
Third embodiment
In the first and second embodiments above, the fourth and fifth thresholds are fixed values. However, the number of documents under a given classification number differs from field to field. For example, judging by the distribution of applications across fields in recent years, the number of applications in the electrical field is clearly higher than in the chemical field. Fixing the fourth and fifth thresholds at a single value for every field is therefore unreasonable.
Therefore, in the third embodiment, the fourth and fifth thresholds are set to be dynamic values. Its block diagram is the same as that of the first embodiment; the only difference is that the third embodiment has a dynamic threshold determining unit for determining the fourth and fifth thresholds. Fig. 7 shows the concrete structure of the dynamic threshold determining unit of the third embodiment. In the third embodiment, structures and units identical to those in Fig. 1 are given the same reference numerals and their description is omitted; only the differences from Fig. 1 are described here.
As shown in Fig. 7, the structures of semantic participle unit 205 and filter element 206 are the same as in Fig. 1. After filtering by filter element 206, the key words that carry actual meaning are output. In this example, as shown in Fig. 1, only the words "high temperature" and "video camera" are retained after filtering by filter element 206.
The classification number acquiring unit 2081' in Fig. 7 only obtains the multiple classification numbers contained in the specific patent and does not grade them. Afterwards, the second retrieval type construction unit 2091 of Fig. 7 takes the key words output by the filter element and, by referring to thesaurus 211, obtains the synonyms of the words retained after filtering.
Afterwards, a logical OR operation is performed between each retained key word and its synonyms, and a logical OR operation is likewise performed between the multiple key words, to form a key word expression. A logical OR operation is also performed between the multiple classification numbers obtained, to form a classification number expression. The second retrieval type construction unit 2091 then performs a logical AND operation between the key word expression and the classification number expression to build the retrieval type.
For example, when the key word retaining is " high temperature " and " video camera ", after inquiring about thesaurus 211, find " high temperature " Synonym be " temperature high ", the synonym of " video camera " is " photographing unit ", " camera ", and this specific patent classification number of itself is During h04n5/222 and h04n5/235, now, the retrieval type constructed by the second retrieval type construction unit 2091 is " ((high temperature or temperature Degree is high) or (video camera or photographing unit or camera)) and (h04n5/222or h04n5/235) ".
Afterwards, the retrieval type built by the second retrieval type construction unit 2091 is run against information database 301 and the resulting retrieval hit count is recorded. Data offset unit 2092 then shifts this hit count up and down by a certain offset and uses the resulting values as the upper and lower thresholds, i.e. the fifth and fourth thresholds. The offset can be 50%, 25%, and so on. For example, if the retrieval hit count is 5000 and the offset is 50%, the fourth threshold is 5000*(1-0.5)=2500 and the fifth threshold is 5000*(1+0.5)=7500.
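The threshold arithmetic can be sketched as follows. The function names and the default offset of 0.5 are illustrative assumptions; the inclusive bounds in `meets_threshold_condition` follow the threshold condition stated in claim 6.

```python
def dynamic_thresholds(hit_count, offset=0.5):
    """Return (fourth_threshold, fifth_threshold) as the hit count shifted down and up by `offset`."""
    if not 0 <= offset < 1:
        raise ValueError("offset is assumed to be a fraction in [0, 1)")
    fourth = hit_count * (1 - offset)
    fifth = hit_count * (1 + offset)
    return fourth, fifth


def meets_threshold_condition(retrieval_quantity, fourth, fifth):
    """True when the retrieval quantity lies between the two thresholds, inclusive."""
    return fourth <= retrieval_quantity <= fifth


fourth, fifth = dynamic_thresholds(5000, offset=0.5)
print(fourth, fifth)
print(meets_threshold_condition(3000, fourth, fifth))
```

With an offset of 25% the same hit count would give a narrower window, so the choice of offset trades off how strictly the final retrieval type is tuned.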
After the dynamic threshold range has been determined, the dynamic thresholds of Fig. 7 can be used as the fourth and fifth thresholds when the retrieval type is built according to Figs. 4 to 6, so that a suitable retrieval type is constructed and patent information similar to the desired subject is retrieved automatically.
Fourth embodiment
In the first to third embodiments above, the retrieval results are merely stored by retrieval result memory element 210. However, each patent record in the final hit results can also be compared for similarity with the specific patent input by the user, and the records in the retrieval result can be sorted by similarity. The user can then browse the most similar files first, which improves browsing efficiency.
Fig. 8 shows the fourth embodiment of the information retrieval system of the present invention. In Fig. 8, structures or modules identical to those in Fig. 1 are given the same reference numerals and the corresponding description is omitted; only the differences are described here.
Similarity calculation unit 214 calculates the similarity between each file stored in retrieval result memory element 210 and the specific patent input by the user. The similarity calculation can use a vector comparison method commonly used in the art. For example, a list of words and their weights can serve as a file vector. Suppose a simplified vector for the specific patent input by the user is [video camera 1] [high temperature 0.5] [rotation 0.2], and the vector of one retrieved file in the retrieval result is [camera lens 1] [CCD 0.7] [high temperature 0.6] [rotation 0.5]. Analysis shows that the words shared by the two file vectors are "high temperature" and "rotation", so multiplying the vectors gives similarity = 0.5*0.6 + 0.2*0.5 = 0.4.
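The worked example corresponds to a dot product over the words the two vectors share. The sketch below assumes each file vector is a dictionary from word to weight; the function name `similarity` is illustrative.

```python
def similarity(vector_a, vector_b):
    """Sum the products of weights for the words that appear in both vectors."""
    return sum(
        weight * vector_b[word]
        for word, weight in vector_a.items()
        if word in vector_b
    )


# Vectors from the example in the text.
query_patent = {"video camera": 1.0, "high temperature": 0.5, "rotation": 0.2}
retrieved_file = {"camera lens": 1.0, "CCD": 0.7, "high temperature": 0.6, "rotation": 0.5}

score = similarity(query_patent, retrieved_file)
print(round(score, 6))  # approximately 0.4, matching the worked example
```

The score here is unnormalized; a cosine-style normalization by vector length would be a different technique from the one the text describes.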
Afterwards, sequencing unit 215 sorts the files stored in retrieval result memory element 210 by similarity and displays the ranking of each file in the retrieval result.
Thus, when browsing the retrieval result, the relevant personnel can easily browse the files from front to back in order of similarity, which greatly improves efficiency.
Of course, in the first and second embodiments of the present invention, when the retrieval type built from the multi-priority key words and classification numbers according to the flow of Fig. 4 or Fig. 6 still cannot meet the predetermined threshold condition, files can also be deleted or added according to similarity.
For example, when the retrieval quantity of the retrieval type is still greater than the fifth threshold after the entire flow of Fig. 4 or Fig. 6, the files in retrieval result memory element 210 can be sorted by similarity and the files with the lowest similarity deleted, so that the retrieval quantity equals the fifth threshold.
Alternatively, when the retrieval quantity of the retrieval type is still less than the fourth threshold after the entire flow of Fig. 4 or Fig. 6, a retrieval type can be built in the manner of the second retrieval type construction unit 2091 of Fig. 7 of the third embodiment. Its retrieval results are sorted by similarity, and files are then added to the retrieval result memory element in order of similarity from high to low; that is, they supplement the retrieval result of the retrieval type construction unit. Duplicate files are removed, so that the total number of files finally equals the fourth threshold.
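Both adjustments can be sketched as simple list operations. The sketch assumes each result is a `(document_id, similarity)` pair and that the thresholds are integers; the function names `trim_to_fifth_threshold` and `supplement_to_fourth_threshold` are hypothetical.

```python
def trim_to_fifth_threshold(results, fifth_threshold):
    """Keep only the most similar files so that at most `fifth_threshold` remain."""
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    return ranked[:fifth_threshold]


def supplement_to_fourth_threshold(results, fallback_results, fourth_threshold):
    """Add files from the broader fallback query, most similar first, skipping duplicates."""
    seen = {doc_id for doc_id, _ in results}
    merged = list(results)
    ranked = sorted(fallback_results, key=lambda item: item[1], reverse=True)
    for doc_id, score in ranked:
        if len(merged) >= fourth_threshold:
            break
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append((doc_id, score))
    return merged


# Hypothetical document ids and similarity scores.
too_many = [("x", 0.2), ("y", 0.9), ("z", 0.5)]
print(trim_to_fifth_threshold(too_many, 2))

too_few = [("p1", 0.9), ("p2", 0.8)]
fallback = [("p2", 0.8), ("p3", 0.7), ("p4", 0.95), ("p5", 0.1)]
print(supplement_to_fourth_threshold(too_few, fallback, 4))
```

If the fallback query itself returns too few new files, the merged list simply stays below the fourth threshold; the source does not say how that case is handled.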
Fifth embodiment
The information retrieval system of the present invention can be realized by computer system 501 as shown in Fig. 9. As shown in Fig. 9, computer system 501 of the present invention includes input device 5013, memory 5011 and processor 5012. The user inputs information to input device 5013. Memory 5011 stores computer instruction information and a thesaurus; the computer instruction information consists of instructions that can execute the flows of Figs. 2 to 6 and the flows corresponding to the third and fourth embodiments. Processor 5012 reads the computer instruction information and the thesaurus from memory 5011 and processes them so as to: receive the specific patent number input by the user; automatically extract key words from the patent information corresponding to the specific patent number; divide the extracted key words into multiple priorities according to the degree of correlation between each key word and the specific patent; extract classification numbers from the patent information corresponding to the specific patent number and divide the extracted classification numbers into multiple priorities; and build retrieval types from the key words and classification numbers in order of priority from high to low until a retrieval type whose retrieval result meets the predetermined threshold condition is constructed.
The information retrieval device and method of the present invention can be applied in the following ways:
For example, in an enterprise, when a technician inputs the corresponding patent number, files similar to the subject that the patent is expected to retrieve can be obtained easily. The technician can thus quickly browse the related art and raise the starting point of research and development.
Patent analysts can also rely on this technique to analyze the retrieval results easily and thereby identify the inventors, applicants, main competitors and so on of the related technology.
Patent searchers can easily obtain, in the manner described above, files similar to the subject to be retrieved and can browse those files first, thereby improving retrieval efficiency.
The embodiments of the present invention have been described above with reference to the drawings, but the scope of the present invention is not limited to the above embodiments. Structures obtained by suitably combining or replacing elements of the embodiments are also included in the scope of the present invention. Those of ordinary skill in the art may, based on their knowledge, combine or replace the structures or components of the above embodiments, and such modified embodiments are also included in the scope of the present invention.

Claims (39)

1. An information retrieval device, characterized by comprising:
a receiving unit that receives a specific patent number input by a user;
a key word acquiring unit that automatically extracts key words from the patent information corresponding to the specific patent number;
a comparing unit that divides the extracted key words into multiple priorities according to the degree of correlation between each key word and the specific patent;
a classification number acquiring unit that extracts classification numbers from the patent information corresponding to the specific patent number and divides the extracted classification numbers into multiple priorities; and
a retrieval type construction unit that builds retrieval types from the key words and/or classification numbers in order of priority from high to low until a retrieval type whose retrieval result meets a predetermined threshold condition is constructed.
2. The information retrieval device according to claim 1, characterized in that:
the retrieval type construction unit builds the retrieval type with reference to a thesaurus of the key words.
3. The information retrieval device according to claim 1, characterized in that:
the key word acquiring unit includes a high priority word acquiring unit and a semantic word acquiring unit, wherein the high priority word acquiring unit obtains high-priority words from manually processed data of the specific patent, and the semantic word acquiring unit performs word segmentation on the patent information according to semantics, thereby obtaining semantic key words.
4. The information retrieval device according to claim 3, characterized in that:
the semantic word acquiring unit includes a semantic participle unit and a filter element, wherein the filter element removes shielding words and single characters from the semantic word segmentation result of the patent information.
5. The information retrieval device according to claim 1, characterized in that:
the classification number acquiring unit divides the acquired classification numbers into multiple priorities according to one or more of: whether the classification number was assigned by manual processing, whether the classification number is the main classification number, and the type of a predetermined classification system.
6. The information retrieval device according to claim 1, characterized in that:
the predetermined threshold condition is that the retrieval result is greater than or equal to a fourth threshold and less than or equal to a fifth threshold.
7. The information retrieval device according to claim 6, characterized in that:
the fourth threshold and the fifth threshold are dynamically variable.
8. The information retrieval device according to claim 7, characterized in that:
the information retrieval device further comprises a dynamic threshold determining unit for adjusting the fourth and fifth thresholds.
9. The information retrieval device according to claim 8, characterized in that:
the key word acquiring unit includes a semantic word acquiring unit that performs word segmentation on the patent information and removes shielding words and single characters from the word segmentation result, thereby obtaining semantic key words;
the dynamic threshold determining unit includes a second retrieval type construction unit and a data offset unit, wherein the second retrieval type construction unit takes the semantic key words obtained by the semantic word acquiring unit and the multiple classification numbers extracted by the classification number acquiring unit, builds a retrieval type and performs retrieval with it, and obtains a retrieval hit count; and
the data offset unit offsets the retrieval hit count positively and negatively by a predetermined amount, and uses the positively offset value as the fifth threshold and the negatively offset value as the fourth threshold.
10. The information retrieval device according to claim 6, characterized in that:
when building the retrieval type, the retrieval type construction unit applies a logical AND relation between key words of the same or different levels, a logical OR relation between classification numbers of the same or different levels, and a logical AND relation between the key words and the classification numbers.
11. The information retrieval device according to claim 10, characterized in that:
when the retrieval result is less than the fourth threshold, the retrieval type construction unit adds lower-priority classification numbers in order of priority to build the retrieval type, until the retrieval result meets the predetermined threshold condition or no classification number remains that can be added; and
when the retrieval result is greater than the fifth threshold, the retrieval type construction unit adds lower-priority key words one by one in order of priority to build the retrieval type, until the retrieval result meets the predetermined threshold condition or no key word remains that can be added.
12. The information retrieval device according to claim 6, characterized in that:
the retrieval type construction unit builds the retrieval type according to a first retrieval type build criterion or a second retrieval type build criterion, wherein,
in the first retrieval type build criterion, a logical OR operation is performed between classification numbers of the same or different levels, a logical AND operation is performed between key words of the same level, a logical OR operation is performed between key words of different levels, and a logical AND operation is performed between the key words and the classification numbers; and
in the second retrieval type build criterion, a logical OR operation is performed between classification numbers of the same or different levels, a logical AND operation is performed between key words of the same or different levels, and a logical AND operation is performed between the key words and the classification numbers.
13. The information retrieval device according to claim 12, characterized in that:
the retrieval type construction unit builds the retrieval type according to the first retrieval type build criterion when the retrieval result is less than the fourth threshold, and according to the second retrieval type build criterion when the retrieval result is greater than the fifth threshold.
14. The information retrieval device according to claim 12, characterized in that:
when building the retrieval type, the retrieval type construction unit first uses the highest-priority key words and the highest-priority classification numbers to build the retrieval type, and judges whether the retrieval result meets the predetermined threshold condition.
15. The information retrieval device according to claim 13, characterized in that:
the retrieval type construction unit includes a first unit, which operates when, after the retrieval type has been built from the highest-priority key words and the highest-priority classification numbers according to the first retrieval type build criterion, a second comparing unit judges that the retrieval result is less than the fourth threshold; the first unit first adds key words of different levels one level at a time in order of priority and builds the retrieval type from the added key words and the highest-priority classification numbers according to the first retrieval type build criterion, and then adds classification numbers of different levels one level at a time in order of priority and builds the retrieval type according to the first retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition;
a second unit, which operates when, after the first unit has added the key words or classification numbers of a specific level, the second comparing unit judges that the retrieval result is greater than the fifth threshold; the second unit takes the retrieval type built before the retrieval result exceeded the fifth threshold as a result of adding the key words or classification numbers of the specific level, keeps the key words in that retrieval type unchanged, adds classification numbers of lower levels one level at a time in order of priority, and builds the retrieval type according to the first retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition;
a third unit, which operates when, after the first unit has added the key words and classification numbers of all levels, or after the second unit has added the classification numbers of all levels, the second comparing unit judges that the retrieval result is still less than the fourth threshold; the third unit deletes key words in order of priority from low to high and, among the multiple key words of the same level, one by one from back to front, until only a predetermined number of key words remains at that level, and then builds the retrieval type according to the first retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition; and
a retrieval type acquiring unit that acquires the retrieval type, wherein: when the second comparing unit judges that the retrieval result meets the threshold condition, the retrieval type that meets the condition is acquired; when the retrieval result of the retrieval type built after the third unit has deleted key words at all levels is still less than the fourth threshold, the finally built retrieval type is acquired; and when the retrieval result of the retrieval type built after the third unit has deleted key words of a specific level is greater than the fifth threshold, the retrieval type built before the retrieval result exceeded the fifth threshold as a result of that deletion is acquired.
16. The information retrieval device according to claim 13, characterized in that:
the retrieval type construction unit includes a fourth unit, which operates when, after the retrieval type has been built from the highest-priority key words and the highest-priority classification numbers according to the first retrieval type build criterion, a second comparing unit judges that the retrieval result is greater than the fifth threshold; the fourth unit deletes the highest-priority classification numbers one at a time in order from back to front until only a predetermined number of classification numbers remains at that level, keeps the key words unchanged, and builds the retrieval type according to the second retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition;
a fifth unit, which operates when, after the fourth unit has deleted a specific classification number, the second comparing unit judges that the retrieval result is less than the fourth threshold; the fifth unit takes the retrieval type built before the retrieval result fell below the fourth threshold as a result of deleting the specific classification number, keeps the classification numbers in that retrieval type unchanged, adds key words of lower levels one at a time in order of priority, and builds the retrieval type according to the second retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition;
a sixth unit, which operates when, after the fourth unit has deleted classification numbers down to the predetermined number and built the retrieval type, the second comparing unit judges that the retrieval result is still greater than the fifth threshold; the sixth unit takes that retrieval type, keeps the classification numbers in it unchanged, adds key words in order of priority from high to low and, among the multiple key words of the same level, one by one from front to back, and builds the retrieval type according to the second retrieval type build criterion, until the second comparing unit judges that the retrieval result meets the threshold condition; and
a retrieval type acquiring unit that acquires the retrieval type, wherein: when the second comparing unit judges that the retrieval result meets the threshold condition, the retrieval type that meets the condition is acquired; when the retrieval result of the retrieval type built after the sixth unit has added key words of all levels is judged by the second comparing unit to be still greater than the fifth threshold, the finally built retrieval type is acquired; and when the retrieval result of the retrieval type built after the fifth unit has added key words of a specific level is judged by the second comparing unit to be less than the fourth threshold, the retrieval type built before the retrieval result fell below the fourth threshold as a result of that addition is acquired.
17. The information retrieval device according to claim 1, 9, 11, 15 or 16, characterized in that:
the information retrieval device further comprises a similarity calculation unit that calculates the similarity between each file in the retrieval result and the specific patent input by the user, the retrieval result being the result of retrieval with the retrieval type built by the retrieval type construction unit; and
a sequencing unit that sorts the files in the retrieval result according to similarity.
18. The information retrieval device according to claim 17, characterized in that:
when the retrieval result of the retrieval type built after the third unit has deleted key words at all levels is judged by the second comparing unit to be still less than the fourth threshold, the retrieval result obtained with the retrieval type built by the second retrieval type construction unit is acquired;
the similarity calculation unit calculates the similarity of each file in the retrieval result obtained with the retrieval type built by the second retrieval type construction unit; and
the files are added, in order of similarity from high to low, to the retrieval result of the retrieval type built by the retrieval type construction unit, and duplicate files are removed, so that the number of files after supplementation equals the fourth threshold.
19. The information retrieval device according to claim 17, characterized in that:
when the retrieval result of the retrieval type built after the sixth unit has added key words of all levels is judged by the second comparing unit to be still greater than the fifth threshold, the files in the retrieval result of the retrieval type built by the retrieval type construction unit are deleted one by one in ascending order of the similarity calculated by the similarity calculation unit, until the number of files equals the fifth threshold.
20. A computer system, characterized by comprising:
an input device through which a user inputs a specific patent number;
a memory that stores a thesaurus and predetermined computer instructions; and
a processor that reads the corresponding computer instructions and synonyms from the memory so that the computer system: receives the specific patent number input by the user; automatically extracts key words from the patent information corresponding to the specific patent number; divides the extracted key words into multiple priorities according to the degree of correlation between each key word and the specific patent; extracts classification numbers from the patent information corresponding to the specific patent number and divides the extracted classification numbers into multiple priorities; and, with reference to the synonyms read from the memory, builds retrieval types from the key words and/or classification numbers in order of priority from high to low until a retrieval type whose retrieval result meets a predetermined threshold condition is constructed.
21. An information retrieval method, characterized by comprising:
A receiving step, which receives a specific patent number input by the user;
A keyword obtaining step, which automatically extracts keywords from the patent information corresponding to the specific patent number;
A comparison step, which divides the extracted keywords into multiple priorities according to the degree of correlation between each keyword and the specific patent;
A classification number obtaining step, which extracts classification numbers from the patent information corresponding to the specific patent number, and divides the extracted classification numbers into multiple priorities;
A retrieval type construction step, which builds retrieval types from the keywords and/or classification numbers in order of priority from high to low, until a retrieval type whose retrieval result satisfies a predetermined threshold condition is built.
22. The information retrieval method according to claim 21, characterized in that:
The retrieval type construction step builds the retrieval type with reference to a thesaurus of the keywords.
23. The information retrieval method according to claim 21, characterized in that:
The keyword obtaining step includes a high-priority word obtaining step and a semantic word obtaining step, wherein the high-priority word obtaining step obtains high-priority words from the manually processed data of the specific patent, and the semantic word obtaining step performs semantic word segmentation on the patent information, thereby obtaining semantic keywords.
24. The information retrieval method according to claim 23, characterized in that:
The semantic word obtaining step includes a semantic word segmentation step and a filtering step, wherein the filtering step removes shield words and single characters from the semantic word segmentation result of the patent information.
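The filtering step of claim 24 can be illustrated with a minimal Python sketch. It assumes the semantic word segmentation has already produced a list of tokens; the function name `filter_tokens` and the sample shield-word list are hypothetical and do not come from the patent.

```python
from typing import Iterable, List


def filter_tokens(tokens: Iterable[str], shield_words: Iterable[str]) -> List[str]:
    """Drop shield words and single-character tokens, preserving token order.

    Assumption: `tokens` is the output of an upstream segmentation step,
    which is not implemented here.
    """
    blocked = set(shield_words)
    return [t for t in tokens if len(t) > 1 and t not in blocked]


# Hypothetical segmentation result for a Chinese patent abstract.
segmented = ["一种", "的", "检索", "装置", "所述"]
semantic_keywords = filter_tokens(segmented, shield_words={"一种", "所述"})
```

In this sketch the single character "的" is dropped by the length check, while "一种" and "所述" are dropped because they appear in the shield-word list.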
25. The information retrieval method according to claim 21, characterized in that:
The classification number obtaining step divides the obtained classification numbers into classification numbers of multiple priorities according to one or more of the following: whether the obtained classification number is a manually processed classification number, whether the classification number is a main classification number, and the type of classification system.
26. The information retrieval method according to claim 21, characterized in that:
The predetermined threshold condition is that the retrieval result is greater than or equal to the fourth threshold and less than or equal to the fifth threshold.
27. The information retrieval method according to claim 26, characterized in that:
The fourth threshold and the fifth threshold are dynamically variable.
28. The information retrieval method according to claim 27, characterized in that:
The information retrieval method further includes a dynamic threshold determination step for adjusting the fourth and fifth thresholds.
29. The information retrieval method according to claim 28, characterized in that:
The keyword obtaining step includes a semantic word obtaining step, which performs word segmentation on the patent information and removes shield words and single characters from the word segmentation result, thereby obtaining semantic keywords;
The dynamic threshold determination step includes a second retrieval type construction step and a data offset step, wherein the second retrieval type construction step takes the semantic keywords obtained by the semantic word obtaining step and the multiple classification numbers extracted by the classification number obtaining step, builds a retrieval type, performs retrieval, and obtains the retrieval hit count;
The data offset step offsets the retrieval hit count positively and negatively by a predetermined amount, taking the positively offset value as the fifth threshold and the negatively offset value as the fourth threshold.
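The data offset step of claim 29 amounts to simple arithmetic on the hit count, sketched below in Python. The claim does not say whether the predetermined amount is absolute or relative; this sketch assumes an absolute offset, and it clamps the lower bound at zero as an additional assumption. The function name `dynamic_thresholds` is hypothetical.

```python
from typing import Tuple


def dynamic_thresholds(hit_count: int, offset: int) -> Tuple[int, int]:
    """Return (fourth_threshold, fifth_threshold) around a baseline hit count.

    Assumptions: `offset` is an absolute number of hits, and the lower
    bound is clamped at zero because a negative result count is meaningless.
    """
    fourth_threshold = max(0, hit_count - offset)  # negative offset
    fifth_threshold = hit_count + offset           # positive offset
    return fourth_threshold, fifth_threshold


# A baseline retrieval with 500 hits and a predetermined offset of 100.
fourth, fifth = dynamic_thresholds(500, 100)
```

With these inputs the acceptable window for the retrieval result is 400 to 600 files.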
30. The information retrieval method according to claim 26, characterized in that:
When the retrieval type construction step builds a retrieval type, keywords of the same or different grades are combined by logical AND, classification numbers of the same or different grades are combined by logical OR, and the keywords and the classification numbers are combined by logical AND.
31. The information retrieval method according to claim 30, characterized in that:
When the retrieval result is less than the fourth threshold, the retrieval type construction step adds lower-priority classification numbers in priority order to build the retrieval type, until the retrieval result satisfies the predetermined threshold condition or no further classification number can be added;
When the retrieval result is greater than the fifth threshold, the retrieval type construction step adds lower-priority keywords successively in priority order to build the retrieval type, until the retrieval result satisfies the predetermined threshold condition or no further keyword can be added.
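The adjustment described in claim 31 can be sketched as a loop in Python. Because classification numbers are OR-joined (claim 30), adding a lower-priority grade of classification numbers broadens the result; because keywords are AND-joined, adding a lower-priority grade of keywords narrows it. The sketch below is a simplification that interleaves both branches in a single loop. The names `adjust_retrieval_type` and `toy_count_hits` are hypothetical, and the hit counter is a stand-in for a real search backend, which the claim does not specify.

```python
from typing import Callable, Sequence, Tuple

Grades = Sequence[Sequence[str]]  # one inner list per priority grade, highest first


def adjust_retrieval_type(
    keyword_grades: Grades,
    class_grades: Grades,
    count_hits: Callable[[Grades, Grades], int],
    fourth_threshold: int,
    fifth_threshold: int,
) -> Tuple[int, int, int]:
    """Return (keyword grades used, classification grades used, hit count)."""
    n_kw, n_cls = 1, 1  # start from the highest-priority grade of each kind
    while True:
        hits = count_hits(keyword_grades[:n_kw], class_grades[:n_cls])
        if fourth_threshold <= hits <= fifth_threshold:
            return n_kw, n_cls, hits      # threshold condition satisfied
        if hits < fourth_threshold and n_cls < len(class_grades):
            n_cls += 1                    # too few hits: add OR-joined classification numbers
        elif hits > fifth_threshold and n_kw < len(keyword_grades):
            n_kw += 1                     # too many hits: add AND-joined keywords
        else:
            return n_kw, n_cls, hits      # nothing left to add


def toy_count_hits(kw_grades: Grades, cls_grades: Grades) -> int:
    """Hypothetical backend: more classification grades widen, more keyword grades narrow."""
    return 40 * len(cls_grades) // len(kw_grades)


keywords = [["k1"], ["k2"], ["k3"]]
classes = [["c1"], ["c2"], ["c3"]]
broadened = adjust_retrieval_type(keywords, classes, toy_count_hits, 50, 100)
narrowed = adjust_retrieval_type(keywords, classes, toy_count_hits, 10, 30)
```

The loop always terminates, since every iteration either returns or consumes one more grade of keywords or classification numbers.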
32. The information retrieval method according to claim 26, characterized in that:
The retrieval type construction step builds the retrieval type by a first retrieval type construction criterion or a second retrieval type construction criterion, wherein:
In the first retrieval type construction criterion, classification numbers of the same or different grades are combined by logical OR, keywords of the same grade are combined by logical AND, keywords of different grades are combined by logical OR, and the keywords and the classification numbers are combined by logical AND;
In the second retrieval type construction criterion, classification numbers of the same or different grades are combined by logical OR, keywords of the same or different grades are combined by logical AND, and the keywords and the classification numbers are combined by logical AND.
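The two construction criteria of claim 32 differ only in how keywords of different grades are joined, which the following Python sketch makes concrete. The textual query syntax (`AND`, `OR`, parentheses) and the function name `build_retrieval_type` are assumptions for illustration; the claim does not prescribe a query language.

```python
from typing import List, Sequence


def _join(terms: Sequence[str], op: str) -> str:
    """Join terms with a boolean operator, parenthesising only when needed."""
    terms = [t for t in terms if t]
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return "(" + f" {op} ".join(terms) + ")"


def build_retrieval_type(
    keyword_grades: Sequence[Sequence[str]],
    class_grades: Sequence[Sequence[str]],
    criterion: int,
) -> str:
    """Build a boolean query under criterion 1 or criterion 2 of claim 32."""
    # Both criteria: classification numbers of all grades are OR-joined.
    class_part = _join([c for grade in class_grades for c in grade], "OR")
    if criterion == 1:
        # AND within a grade, OR between grades.
        per_grade: List[str] = [_join(grade, "AND") for grade in keyword_grades]
        keyword_part = _join(per_grade, "OR")
    elif criterion == 2:
        # AND across all keywords regardless of grade.
        keyword_part = _join([k for grade in keyword_grades for k in grade], "AND")
    else:
        raise ValueError("criterion must be 1 or 2")
    # Both criteria: keywords AND classification numbers.
    return " AND ".join(p for p in (keyword_part, class_part) if p)


kw = [["retrieval", "patent"], ["threshold"]]
cls = [["G06F16/30"], ["G06F17/30"]]
first = build_retrieval_type(kw, cls, criterion=1)
second = build_retrieval_type(kw, cls, criterion=2)
```

Under these assumptions the first criterion yields the broader query, since a document matching any one keyword grade qualifies, while the second criterion requires every keyword to match.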
33. The information retrieval method according to claim 32, characterized in that:
The retrieval type construction step builds the retrieval type by the first retrieval type construction criterion when the retrieval result is less than the fourth threshold, and builds the retrieval type by the second retrieval type construction criterion when the retrieval result is greater than the fifth threshold.
34. The information retrieval method according to claim 32, characterized in that:
When building a retrieval type, the retrieval type construction step first uses the highest-priority keywords and the highest-priority classification numbers to build the retrieval type, and judges whether the retrieval result satisfies the predetermined threshold condition.
35. The information retrieval method according to claim 33, characterized in that:
The retrieval type construction step includes a first step, which operates when, after a retrieval type has been built from the highest-priority keywords and the highest-priority classification numbers by the first retrieval type construction criterion, the second comparison step judges that the retrieval result is less than the fourth threshold; the first step first keeps the classification numbers unchanged, adds keywords of different grades successively in priority order, and builds a retrieval type from the added keywords and the classification numbers by the first retrieval type construction criterion; it then adds classification numbers of different grades successively in priority order and builds a retrieval type by the first retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A second step, which operates when, after keywords or classification numbers of a specific grade have been added in the first step, the second comparison step judges that the retrieval result is greater than the fifth threshold; the second step takes the retrieval type built in the step before the retrieval result exceeded the fifth threshold, that is, before the keywords or classification numbers of the specific grade were added, keeps the keywords in that retrieval type unchanged, adds lower-grade classification numbers successively in priority order, and builds a retrieval type by the first retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A third step, which operates when, after the keywords and classification numbers of all grades have been added in the first step, or after the classification numbers of all grades have been added in the second step, the second comparison step judges that the retrieval result is still less than the fourth threshold; the third step proceeds in order of priority from low to high and, among the multiple keywords of the same grade, deletes keywords successively from back to front until only a predetermined number of keywords remain in that grade, and then builds a retrieval type by the first retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A retrieval type obtaining step, which obtains the retrieval type, wherein: when the second comparison step judges that the retrieval result satisfies the above threshold condition, the retrieval type that produced that retrieval result is obtained; when the retrieval result of the retrieval type built after the third step has deleted the keywords of all grades is still less than the fourth threshold, the finally built retrieval type is obtained; and when the retrieval result of the retrieval type built after the third step has deleted the keywords of a specific grade is greater than the fifth threshold, the retrieval type built in the step before the retrieval result exceeded the fifth threshold is obtained.
36. The information retrieval method according to claim 33, characterized in that:
The retrieval type construction step includes a fourth step, which operates when, after a retrieval type has been built from the highest-priority keywords and the highest-priority classification numbers by the first retrieval type construction criterion, the second comparison step judges that the retrieval result is greater than the fifth threshold; the fourth step deletes the classification numbers of the highest priority one at a time, in order from back to front, until only a predetermined number of classification numbers remain in that grade, keeps the keywords unchanged, and builds a retrieval type by the second retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A fifth step, which operates when, after a specific classification number has been deleted in the fourth step, the second comparison step judges that the retrieval result is less than the fourth threshold; the fifth step takes the retrieval type as it was before the specific classification number was deleted, keeps the classification numbers in that retrieval type unchanged, adds lower-grade keywords successively in priority order, and builds a retrieval type by the second retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A sixth step, which operates when, after the fourth step has deleted classification numbers until only the predetermined number remain and has built a retrieval type, the second comparison step judges that the retrieval result is still greater than the fifth threshold; the sixth step takes that retrieval type, keeps the classification numbers in it unchanged, and, in order of priority from high to low, adds keywords successively from front to back among the multiple keywords of the same grade, and builds a retrieval type by the second retrieval type construction criterion, until the second comparison step judges that the retrieval result satisfies the above threshold condition;
A retrieval type obtaining step, which obtains the retrieval type, wherein: when the second comparison step judges that the retrieval result satisfies the above threshold condition, the retrieval type that produced that retrieval result is obtained; when the second comparison step judges that the retrieval result of the retrieval type built after the sixth step has added the keywords of all grades is still greater than the fifth threshold, the finally built retrieval type is obtained; and when the second comparison step judges that the retrieval result of the retrieval type built after the fifth step has added the keywords of a specific grade is less than the fourth threshold, the retrieval type built in the step before the retrieval result fell below the fourth threshold is obtained.
37. The information retrieval method according to claim 21, 29, 31 or 36, characterized in that:
The information retrieval method further includes a similarity calculation step, which calculates the similarity between each file in the retrieval result and the specific patent input by the user, the retrieval result being the result of retrieving with the retrieval type built by the retrieval type construction step;
A sorting step, which sorts the files in the above retrieval result according to similarity.
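The similarity calculation and sorting steps of claim 37 can be sketched in Python as follows. The claim does not name a similarity measure, so Jaccard similarity over keyword sets is used here purely as an assumed stand-in; the names `jaccard` and `rank_by_similarity` and the sample file identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two term collections (assumed measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_by_similarity(
    query_terms: Iterable[str],
    results: Dict[str, Iterable[str]],
) -> List[Tuple[str, float]]:
    """Score every file against the query patent and sort by descending similarity."""
    query = set(query_terms)
    scored = [(file_id, jaccard(query, terms)) for file_id, terms in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Keywords of the user's patent and of three hypothetical retrieved files.
ranked = rank_by_similarity(
    {"a", "b", "c"},
    {"D1": {"a"}, "D2": {"a", "b", "c"}, "D3": {"a", "b", "x"}},
)
ranked_ids = [file_id for file_id, _ in ranked]
```

Any other document similarity measure could be substituted for `jaccard` without changing the sorting step.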
38. The information retrieval method according to claim 37, characterized in that:
When the second comparison step judges that the retrieval result of the retrieval type built after the third step has deleted the keywords of all grades is still less than the fourth threshold, the retrieval result obtained with the retrieval type built by the second retrieval type construction step is obtained;
The above similarity calculation step performs similarity calculation on each file in the retrieval result obtained with the retrieval type built by the second retrieval type construction step;
In descending order of similarity, these files are used to supplement the files in the retrieval result of the retrieval type built by the retrieval type construction step, and duplicate files are removed, so that the number of files after supplementing equals the fourth threshold.
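The supplementing and de-duplication of claim 38 can be sketched in Python as below. It assumes files are identified by unique IDs and that the secondary result has already been sorted by descending similarity; the function name `supplement_results` is hypothetical.

```python
from typing import List, Sequence


def supplement_results(
    primary: Sequence[str],
    ranked_secondary: Sequence[str],
    fourth_threshold: int,
) -> List[str]:
    """Top up `primary` from `ranked_secondary` until it holds `fourth_threshold` files.

    Assumptions: `primary` contains no duplicates, and `ranked_secondary`
    is ordered from most to least similar to the user's patent.
    """
    merged = list(primary)
    seen = set(primary)
    for file_id in ranked_secondary:
        if len(merged) >= fourth_threshold:
            break
        if file_id not in seen:  # de-duplicate against files already kept
            merged.append(file_id)
            seen.add(file_id)
    return merged


# Two files from the main retrieval type, topped up to a fourth threshold of 4.
topped_up = supplement_results(["A", "B"], ["B", "C", "D", "E"], 4)
```

If the secondary result runs out before the fourth threshold is reached, the sketch simply returns the shorter list.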
39. The information retrieval method according to claim 37, characterized in that:
When the second comparison step judges that the retrieval result of the retrieval type built after the sixth step has added the keywords of all grades is still greater than the fifth threshold, the files in the retrieval result of the retrieval type built by the retrieval type construction step are deleted one by one, in ascending order of similarity, until the number of files equals the fifth threshold.
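Deleting files in ascending order of similarity until the fifth threshold is reached, as in claim 39, is equivalent to keeping the most similar files. A minimal Python sketch follows, assuming each file already carries a similarity score; the function name `trim_results` is hypothetical.

```python
from typing import List, Sequence, Tuple


def trim_results(
    scored: Sequence[Tuple[str, float]],
    fifth_threshold: int,
) -> List[Tuple[str, float]]:
    """Keep the `fifth_threshold` most similar files, dropping the least similar first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:fifth_threshold]


# Three scored files trimmed to a fifth threshold of 2.
kept = trim_results([("A", 0.2), ("B", 0.9), ("C", 0.5)], 2)
```

When the result already contains no more files than the fifth threshold, the slice leaves it unchanged apart from the sort order.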
CN201610809109.9A 2016-09-07 2016-09-07 Information retrieval device and method Active CN106372226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610809109.9A CN106372226B (en) 2016-09-07 2016-09-07 Information retrieval device and method

Publications (2)

Publication Number Publication Date
CN106372226A (en) 2017-02-01
CN106372226B (en) 2020-08-25

Family

ID=57898935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610809109.9A Active CN106372226B (en) 2016-09-07 2016-09-07 Information retrieval device and method

Country Status (1)

Country Link
CN (1) CN106372226B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005234868A (en) * 2004-02-19 2005-09-02 Ntt Data Corp Similar patent specification retrieval system, method therefor and program
CN101539916A (en) * 2008-03-17 2009-09-23 亿维讯软件(北京)有限公司 Initial patent retrieving device, secondary patent retrieving device and patent retrieving system
CN101546306A (en) * 2008-03-27 2009-09-30 上海市知识产权服务中心 Method and system for searching patent documentation by utilizing IPC classification

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460066A (en) * 2017-02-17 2018-08-28 云拓科技有限公司 Search keyword suggestion method for patent search
CN106934010A (en) * 2017-03-09 2017-07-07 深圳市华第时代科技有限公司 Automatic duplicate checking method and device
WO2018161309A1 (en) * 2017-03-09 2018-09-13 深圳市华第时代科技有限公司 Automatic duplication checking method and device
CN108664508A (en) * 2017-03-31 2018-10-16 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN108664508B (en) * 2017-03-31 2021-12-24 百度在线网络技术(北京)有限公司 Information pushing method and device
CN108920484A (en) * 2018-04-28 2018-11-30 广州市百果园网络科技有限公司 Search for content processing method, device and storage equipment, computer equipment
CN108920484B (en) * 2018-04-28 2022-06-10 广州市百果园网络科技有限公司 Search content processing method and device, storage device and computer device
CN110503281A (en) * 2018-05-16 2019-11-26 北京牡丹电子集团有限责任公司 Innovative product value-added tax function develops assistant system and its method
CN110895556A (en) * 2018-09-13 2020-03-20 深圳市蓝灯鱼智能科技有限公司 Text retrieval method and device, storage medium and electronic device
CN109344224A (en) * 2018-09-18 2019-02-15 江苏润桐数据服务有限公司 A kind of automatic denoising method of patent retrieval and device
CN109359299A (en) * 2018-09-28 2019-02-19 中国电子科技集团公司信息科学研究院 A kind of internet of things equipment ability ontology based on commodity data is from construction method
CN110083674A (en) * 2019-03-04 2019-08-02 温州涌润信息科技有限公司 A kind of intellectual property information treating method and apparatus
CN110083674B (en) * 2019-03-04 2023-05-12 深圳云联智汇物联科技有限公司 Intellectual property information processing method and device
CN110597863A (en) * 2019-09-25 2019-12-20 上海依图网络科技有限公司 Retrieval system and method for keeping stable performance in control library through dynamic threshold
CN110597863B (en) * 2019-09-25 2023-01-24 上海依图网络科技有限公司 Retrieval system and method for keeping stable performance in control library through dynamic threshold
CN111538880A (en) * 2020-04-28 2020-08-14 中南林业科技大学 Intelligent analysis and retrieval system for tenon structural design
CN112131455A (en) * 2020-09-28 2020-12-25 贝壳技术有限公司 List page retrieval degradation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106372226B (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN106372226A (en) Information retrieval device and method
CN110059311B (en) Judicial text data-oriented keyword extraction method and system
CN104765769B (en) The short text query expansion and search method of a kind of word-based vector
JP6231668B2 (en) Keyword expansion method and system and classification corpus annotation method and system
US20090307213A1 (en) Suffix Tree Similarity Measure for Document Clustering
CN111368088A (en) Text emotion classification method based on deep learning
CN110188349A (en) A kind of automation writing method based on extraction-type multiple file summarization method
CA3166094A1 (en) Commodity short title generation method and apparatus
CN104834739B (en) Internet information storage system
Jayaram et al. A review: Information extraction techniques from research papers
CN113282834A (en) Web search intelligent ordering method, system and computer storage medium based on mobile internet data deep mining
CN103853797B (en) A kind of picture retrieval method and system based on n member picture indices structures
US20130052619A1 (en) Method for building information on emotion lexicon and apparatus for the same
CN109446399A (en) A kind of video display entity search method
CN109446313A (en) A kind of ordering system and method based on natural language analysis
CN107908749A (en) A kind of personage's searching system and method based on search engine
Huda et al. Text Summarization of Hadits in Indonesian Language Using The Combination of Fuzzy Logic Scoring And Latent Semantic Analysis (LSA)
CN110532538A (en) Property dispute judgement document's critical entities extraction algorithm
CN108062563A (en) A kind of representative sample based on classification equilibrium finds method
CN110555196B (en) Method, apparatus, device and storage medium for automatically generating article
Cheng et al. Content-based video retrieval using the shot cluster tree
CN113449195B (en) Intelligent knowledge pushing method and system
CN112463918B (en) Information recommendation method, system, storage medium and terminal equipment
CN115982319A (en) Method and device for full-text retrieval of media
CN110555198B (en) Method, apparatus, device and computer readable storage medium for generating articles

Legal Events

Code Title
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant