CN102930022A - User-oriented information search engine system and method - Google Patents

User-oriented information search engine system and method

Info

Publication number
CN102930022A
Authority
CN
China
Prior art keywords
word
search
user
collection
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104337316A
Other languages
Chinese (zh)
Other versions
CN102930022B (en)
Inventor
贾倩
张巍
杨秋皓
许怡婷
张冶
王志勇
章乐平
杨玉堃
毕经元
王立伟
杜俊鹏
褚厚斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Launch Vehicle Technology CALT
Original Assignee
China Academy of Launch Vehicle Technology CALT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Launch Vehicle Technology CALT filed Critical China Academy of Launch Vehicle Technology CALT
Priority to CN201210433731.6A priority Critical patent/CN102930022B/en
Publication of CN102930022A publication Critical patent/CN102930022A/en
Application granted granted Critical
Publication of CN102930022B publication Critical patent/CN102930022B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user-oriented information search engine system and method. The system comprises eight modules: a search term pushing module, a user initiating search module, a user concern updating module, a primary search module, a user interest deducing module, a user concern result word segmentation module, a search word refactoring module and a secondary search module. The method comprises analyzing and pushing search words the user can select, performing the user-initiated search, updating the user's concerns, executing the primary search, deducing user interest, segmenting the results the user focuses on, refactoring the search words and executing the secondary search. The system offers a more complete query scope and higher query accuracy. In addition, the user can select search words for input and order them, and interactive operation improves the accuracy of subsequent query results, giving the user a flexible, convenient and intelligent interface for information search.

Description

User-oriented information search engine system and method
Technical field
The present invention relates to a user-oriented information search engine system and method, and belongs to the field of information search technology.
Background art
Search engines have become the main tool for information queries. As the volume of information grows explosively, intelligent and efficient search methods can greatly increase query speed and improve recall and precision, letting users obtain as much relevant information as possible in as little time as possible.
Depending on where they start, current methods for improving search engine design fall into two classes: methods oriented toward expanding search-word semantics and methods oriented toward inferring user interest.
Methods oriented toward expanding search-word semantics parse the semantic network of a search word by means of an ontology in order to expand the search word and widen the query scope. This approach has two shortcomings. First, semantic analysis is applied only to the search word, without considering key information with aggregated semantics that may exist in the full text of the search results. Second, it tends to focus on the semantics of the search word itself and neglects the user's intent, so the search results rarely meet the user's requirements.
Methods oriented toward inferring user interest record the user's operations on historical search results, analyze the information the user is interested in, and thereby infer the user's areas of concern. The shortcoming is that only the user's points of interest are considered, with no expansion at the semantic level. Because users often grasp their own real intent only partially and imprecisely, this approach also often fails to deliver search results that truly match that intent.
In addition, existing search engine systems all require the user to enter keywords manually. Even when search suggestions are provided, they merely list the user's past searches in order: the suggestions are not parsed and pushed by frequency of use, and the user cannot select and order individual words, which makes user interaction more cumbersome.
Summary of the invention
Technical problem solved by the invention: to remedy the deficiencies of the prior art by providing a search engine system and method that make the scope of query results more complete and their precision higher. The method reconstructs search words on the basis of inferred user interest and, during reconstruction, performs semantic expansion with reference to an authoritative thesaurus, which widens the search scope. In addition, the system lets the user select search words for input and order them independently, and interactive operation improves the accuracy of subsequent query results, giving the user a flexible, convenient and intelligent interface for information search.
Technical solution of the invention: the user-oriented information search engine system, shown in Fig. 1, consists of a client and a server. The server handles back-end parsing and processing of the data sent by the client; the search word pushing module, user concern update module, first search module, user interest inference module, search word reconstruction module and secondary search module are deployed on the server side. The client host interacts with the server in B/S (browser/server) mode; the user-initiated search module and the first search module are deployed on the client side. Each module is implemented as follows:
Search word pushing module: according to the current user's identity information, the server queries the user concern library. The user concern library consists of two parts, the user's own historical concerns and the historical concerns of same-interest users; each part consists of historical search words and the frequency of use of each search word. The module first parses the user's historical search words, sorts them by frequency of use from high to low, selects the historical search words whose frequency of use exceeds a threshold, and writes them in order into the user historical concern word set, the searchVoc_past set. It then traverses the searchVoc_past set, obtains for each historical search word the other users (excluding the current user) who have searched it, and writes them into the same-interest user set, the user_sameInt set. It obtains in turn the historical search words of each user in the user_sameInt set, queries the frequency of use of each, and writes the words from high to low frequency into the same-interest user historical concern word set, the searchVoc_past_other set. Finally it traverses the searchVoc_past_other set and appends its words in order to the searchVoc_past set, skipping duplicates. The resulting searchVoc_past set forms the search word push list, which is output to the client for the user-initiated search module to call;
User-initiated search module: receives the search word push list output by the search word pushing module, parses the search words in it, and displays them in order on the client together with a check button and a rank button, so that the user can select or deselect each search word and set its priority. The search word set changes dynamically with the user's selections, and the user can also supplement or modify it manually. The result is the final submitted search request, which is used by the user concern update module and the first search module;
User concern update module: receives the search request and records the user-initiated search behavior, which consists of the search words entered by the user and their order. The search words are written in order into the user-selected search word set, the searchVoc_select set. The module traverses the searchVoc_select set and checks whether each search word exists in the user concern library. If it does, the current frequency of use of the word is updated; otherwise the word is written into the user's own historical concern set in the user concern library and its current frequency of use is set to the initial value;
First search module: performs the first search according to the user-initiated search behavior. All search words in the searchVoc_select set are first fully permuted and combined according to their priority; the resulting set, denoted the searchVoc_select recombination set, contains independent words and combined words. The module traverses the searchVoc_select recombination set and queries in turn the search results that match each word: a result matches an independent word if it contains that word, and matches a combined word if it contains every element of the combination. For the matching results of each search word, the matching frequency of the search word in the full text is counted and the results are sorted by matching frequency from high to low. The search result lists of all matches are combined in the word order of the searchVoc_select recombination set and written into the first search result set, the result_first set. Each entry of a search result list consists of the title, abstract and source of the result, where the abstract is the passage of the result's full text that matches the search word most often. The result_first set is output to the client for the user to view;
User interest inference module: records the user's operations on the result_first set and writes the user's screening behavior into the first-search user screening set, the result_userSelect set. The screening behavior consists of the ID of each result the user selected, the number of clicks on the result and the result viewing time. For each result, "number of clicks × result viewing time" is summed to obtain the user's degree of attention to that result. The results are sorted by degree of attention from high to low, the abstract of each result is parsed, and the abstracts are written in order into the user-selected result abstract set, the result_abstract set, which is output to the user concern result word segmentation module;
User concern result word segmentation module: traverses the result_abstract set and parses in turn the abstracts of the results the user focused on, segmenting them against the dictionary with the reverse matching algorithm. The dictionary is a hash table, i.e. an array of HashMaps: the array length is the number of Chinese characters that can serve as the first character of a word in the dictionary, the array index is the region-position code of that character, and each array element is a HashMap of all words beginning with that character, with the word itself as the key and the word frequency as the value. After segmentation, meaningless words are removed by comparison with a meaningless-word dictionary. The segmentation result of each abstract is written as a separate array into the abstract segmentation result discrete set, the abstract_cut_apart set. The union of the segmentation results, i.e. the largest set without repeated words, is also extracted and written into the abstract segmentation result combination set, the abstract_cut_unit set. Both the abstract_cut_apart set and the abstract_cut_unit set are output to the search word reconstruction module;
Search word reconstruction module: traverses the words in the abstract_cut_unit set and, by comparison with the abstract_cut_apart set, determines the number of different abstracts in which each word occurs; repeated occurrences within the same abstract are not counted. Words whose occurrence count equals the number of abstract records, i.e. words that occur in every abstract, are collected and written into the abstract segmentation result intersection, the abstract_cut_same set. The abstract_cut_same set is then analyzed against the Chinese Classified Thesaurus, and words that have a use/used-for relation or a related-term relation with the words in it are written into the abstract segmentation result recombination set, the abstract_cut_reorg set. Both the abstract_cut_same set and the abstract_cut_reorg set are output to the secondary search module;
Secondary search module: first parses the abstract_cut_same set and permutes and combines its words using the method of the first search module. It traverses each search word of the abstract_cut_same set and obtains in turn the documents whose full text matches and the pictures and videos whose titles match; for a combined word, a match means that every element of the combination is satisfied. It then parses the abstract_cut_reorg set and obtains the documents, pictures and videos that match each independent word in it. All document files are written in search order into the secondary search document result set, the result_second_doc set; all picture files are written in search order into the secondary search picture result set, the result_second_image set; and all video files are written in search order into the secondary search video result set, the result_second_vedio set. The three sets are returned to the client, where the search results are shown to the user by category, with a prompt that these results may better match the user's intent, for the user to examine in depth.
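As an illustration of the word segmentation module in the list above, the following Python sketch builds a dictionary indexed by first character and segments a string by reverse maximum matching. It is a minimal sketch under stated assumptions, not the patented implementation: a Python dict keyed by the first character stands in for the array indexed by region-position code; the names `build_dictionary` and `reverse_max_match`, the sample words and the meaningless-word list are hypothetical; and on a failed match the leftmost character of the field is dropped, as in the usual reverse maximum matching scheme.

```python
def build_dictionary(word_freqs):
    """Index words by first character: first char -> {word: frequency}.

    Assumption: a dict keyed by the first character replaces the
    array indexed by region-position code described in the text.
    """
    index = {}
    for word, freq in word_freqs.items():
        index.setdefault(word[0], {})[word] = freq
    return index


def reverse_max_match(text, index, meaningless=frozenset()):
    """Segment text from its end, preferring the longest dictionary word."""
    max_len = max(
        (len(w) for bucket in index.values() for w in bucket), default=1
    )
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the field from the left until it is a dictionary word
        # or only a single character remains.
        while end - start > 1:
            field = text[start:end]
            if field in index.get(field[0], {}):
                break
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    # Remove meaningless words after segmentation, as the module does.
    return [w for w in words if w not in meaningless]


# Hypothetical sample dictionary (word -> frequency).
sample_index = build_dictionary(
    {"研究": 10, "研究生": 5, "生命": 8, "起源": 6, "的": 20}
)
segments = reverse_max_match("研究生命的起源", sample_index, meaningless={"的"})
```

Scanning from the end is what lets the sample string split as "研究 / 生命 / 起源" instead of starting with the longer word "研究生", which a forward scan would pick first.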
The search word pushing module is implemented as follows:
(1) Capture the user information: from the session that stores identity information at login, obtain the user name and user number (ID) of the currently logged-in user;
(2) Query the user concern library by user ID and extract the historical search words and search word frequencies of use that match this ID. Denote a search word by V and its frequency of use by F, and sort the results by F in descending order;
(3) Let the preset word frequency threshold be E, and compare each frequency of use F with the threshold E:
a. if F >= E, write the V corresponding to F into the user historical concern word set, denoted the searchVoc_past set;
b. if F < E, do nothing;
(4) Parse the searchVoc_past set and traverse its search words V in turn; query the user concern library to obtain the IDs of the other users (excluding the current user) that match V, and write them into the same-interest user set, the user_sameInt set;
(5) For each user ID in the user_sameInt set, query the user concern library to obtain the historical search word records matching that ID; traverse the historical search words in the records, count the frequency of use of each search word in the user concern library, and write the words from high to low frequency into the same-interest user historical concern word set, the searchVoc_past_other set;
(6) Traverse the searchVoc_past_other set and check in turn whether each word already exists in the searchVoc_past set:
a. if it exists, leave the word alone and continue with the next word;
b. if it does not exist, add the word to the searchVoc_past set;
(7) Store the searchVoc_past set in the cache as an array and output it to the client as the search word push list, for the user-initiated search module to call.
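Steps (2) to (7) above can be sketched in Python as follows. This is an illustrative sketch only: the user concern library is modelled as an in-memory dict mapping each user ID to a `{search word: frequency of use}` dict, which is an assumption about its layout, and the function name `build_push_list` is hypothetical. Step (5) is read here as summing each word's frequency across the same-interest users, which is one possible interpretation of the text.

```python
from collections import defaultdict


def build_push_list(concern_db, current_user, threshold):
    """Return the search word push list for current_user.

    concern_db: {user_id: {search_word: frequency}} (assumed layout).
    """
    own = concern_db.get(current_user, {})

    # Steps (2)-(3): sort own words by frequency, keep those with F >= E.
    search_voc_past = [
        word
        for word, freq in sorted(own.items(), key=lambda kv: -kv[1])
        if freq >= threshold
    ]

    # Step (4): other users who have searched any retained word.
    user_same_int = []
    for word in search_voc_past:
        for user, words in concern_db.items():
            if user != current_user and word in words and user not in user_same_int:
                user_same_int.append(user)

    # Step (5): frequency of each of their words, sorted high to low.
    totals = defaultdict(int)
    for user in user_same_int:
        for word, freq in concern_db[user].items():
            totals[word] += freq
    search_voc_past_other = [
        word for word, _ in sorted(totals.items(), key=lambda kv: -kv[1])
    ]

    # Step (6): append in order, skipping words already present.
    push_list = list(search_voc_past)
    for word in search_voc_past_other:
        if word not in push_list:
            push_list.append(word)

    # Step (7): the caller would cache this list and send it to the client.
    return push_list
```

The user's own words always come first, so words borrowed from same-interest users can only extend the list and never displace the user's own history.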
The search word reconstruction module is implemented as follows:
(1) Traverse the user-selected result abstract set, the result_abstract set, and parse in turn the abstracts of the results the user focused on. Segment them against the dictionary using the reverse matching algorithm. The segmentation result of each abstract is written as a separate array into the abstract segmentation result discrete set, the abstract_cut_apart set; the number of arrays is denoted N;
(2) Extract the union of the segmentation results, i.e. the largest set without repeated words, and write it into the abstract segmentation result combination set, the abstract_cut_unit set;
(3) Traverse the abstract_cut_unit set and perform the following operations on each search word in it:
(3.1) initialize the occurrence frequency of the current search word, F_abs = 0;
(3.2) traverse each array element of the abstract_cut_apart set and check whether it contains the current search word:
a. if it does, set F_abs = F_abs + 1 and continue with the next array element;
b. if it does not, F_abs is unchanged;
(3.3) compare the F_abs value of the current search word with the number of arrays N in the abstract_cut_apart set:
a. if F_abs = N, write the current search word into the abstract segmentation result intersection, the abstract_cut_same set;
b. if F_abs < N, do nothing and continue with the next search word;
(4) Traverse the abstract_cut_same set and, for each search word in it, retrieve from the Chinese Classified Thesaurus the semantic network whose entry descriptor is this word:
(4.1) if the semantic network contains a related term marked "Y", the word has a formal expression; write the formal expression into the abstract_cut_reorg set;
(4.2) if the semantic network contains a related term marked "D", the word has an informal expression; write the informal expression into the abstract_cut_reorg set;
(4.3) if the semantic network contains a related term marked "C", the word has a semantically related expression; write the related expression into the abstract_cut_reorg set;
(5) Output both the abstract_cut_same set and the abstract_cut_reorg set as arrays to the secondary search module.
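Steps (2) to (5) above amount to an intersection over the per-abstract word lists followed by a thesaurus lookup. The Python sketch below shows one way to express them. It assumes the thesaurus is available as a dict of the form `{word: {"Y": [...], "D": [...], "C": [...]}}`; that layout, the sample entries and the function name `reconstruct_search_words` are hypothetical and do not describe the actual data format of the Chinese Classified Thesaurus.

```python
def reconstruct_search_words(abstract_cut_apart, thesaurus):
    """Return (abstract_cut_same, abstract_cut_reorg).

    abstract_cut_apart: list of word lists, one list per abstract.
    thesaurus: {word: {"Y": [...], "D": [...], "C": [...]}} (assumed layout).
    """
    n = len(abstract_cut_apart)

    # Step (2): union of all segmentation results, without repeats.
    abstract_cut_unit = []
    for words in abstract_cut_apart:
        for word in words:
            if word not in abstract_cut_unit:
                abstract_cut_unit.append(word)

    # Step (3): keep words that occur in every abstract. Each abstract
    # counts at most once, however often the word repeats inside it.
    abstract_cut_same = []
    for word in abstract_cut_unit:
        f_abs = sum(1 for words in abstract_cut_apart if word in words)
        if f_abs == n:
            abstract_cut_same.append(word)

    # Step (4): collect terms related through the "Y", "D" and "C" marks.
    abstract_cut_reorg = []
    for word in abstract_cut_same:
        relations = thesaurus.get(word, {})
        for mark in ("Y", "D", "C"):
            for related in relations.get(mark, []):
                if related not in abstract_cut_reorg:
                    abstract_cut_reorg.append(related)

    # Step (5): both sets go to the secondary search module.
    return abstract_cut_same, abstract_cut_reorg
```

Counting each abstract once keeps a word that is repeated many times in a single abstract from passing the F_abs = N test on its own.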
The user-oriented information search method is implemented in the following steps:
(1) According to the current user's identity information, the server queries the user concern library. The user concern library consists of two parts, the user's own historical concerns and the historical concerns of same-interest users; each part consists of historical search words and the frequency of use of each search word. First parse the user's historical search words, sort them by frequency of use from high to low, select the historical search words whose frequency of use exceeds a threshold, and write them in order into the user historical concern word set, the searchVoc_past set. Then traverse the searchVoc_past set, obtain for each historical search word the other users (excluding the current user) who have searched it, and write them into the same-interest user set, the user_sameInt set. Obtain in turn the historical search words of each user in the user_sameInt set, query the frequency of use of each, and write the words from high to low frequency into the same-interest user historical concern word set, the searchVoc_past_other set. Traverse the searchVoc_past_other set and append its words in order to the searchVoc_past set, skipping duplicates. Form the search word push list from the searchVoc_past set and output it to the client for the user-initiated search module to call;
(2) Receive the search word push list, parse the search words in it, and display them in order on the client together with a check button and a rank button, so that the user can select or deselect each search word and set its priority. The search word set changes dynamically with the user's selections, and the user can also supplement or modify it manually, to form the final submitted search request;
(3) Receive the search request and record the user-initiated search behavior, which consists of the search words entered by the user and their order. Write the search words in order into the user-selected search word set, the searchVoc_select set. Traverse the searchVoc_select set and check whether each search word exists in the user concern library. If it does, update the current frequency of use of the word; otherwise write the word into the user's own historical concern set in the user concern library and set its current frequency of use to the initial value. This provides the data basis for subsequent search word pushing;
(4) Perform the first search according to the user-initiated search behavior. First fully permute and combine all search words in the searchVoc_select set according to their priority; the searchVoc_select set after permutation and combination contains independent words and combined words. Traverse the searchVoc_select set after permutation and combination and query in turn the search results that match each word: a result matches an independent word if it contains that word, and matches a combined word if it contains every element of the combination. For the matching results of each search word, count the matching frequency of the search word in the full text and sort the results by matching frequency from high to low. Combine the search result lists of all matches in the word order of the searchVoc_select set and write them into the first search result set, the result_first set. Each entry of a search result list consists of the title, abstract and source of the result, where the abstract is the passage of the result's full text that matches the search word most often. Output the result_first set to the client for the user to view;
(5) Record the user's operations on the result_first set and write the user's screening behavior into the first-search user screening set, the result_userSelect set. The screening behavior consists of the ID of each result the user selected, the number of clicks on the result and the result viewing time. For each result, sum "number of clicks × result viewing time" to obtain the user's degree of attention to that result. Sort the results by degree of attention from high to low, parse the abstract of each result, and write the abstracts in order into the user-selected result abstract set, the result_abstract set, for word segmentation;
(6) Traverse the result_abstract set and parse in turn the abstracts of the results the user focused on, segmenting them against the dictionary with the reverse matching algorithm. The dictionary is a hash table, i.e. an array of HashMaps: the array length is the number of Chinese characters that can serve as the first character of a word in the dictionary, the array index is the region-position code of that character, and each array element is a HashMap of all words beginning with that character, with the word itself as the key and the word frequency as the value. After segmentation, remove meaningless words by comparison with a meaningless-word dictionary. Write the segmentation result of each abstract as a separate array into the abstract segmentation result discrete set, the abstract_cut_apart set. Also extract the union of the segmentation results, i.e. the largest set without repeated words, and write it into the abstract segmentation result combination set, the abstract_cut_unit set;
(7) Traverse the words in the abstract_cut_unit set and, by comparison with the abstract_cut_apart set, determine the number of different abstracts in which each word occurs; repeated occurrences within the same abstract are not counted. Collect the words whose occurrence count equals the number of abstract records, i.e. the words that occur in every abstract, and write them into the abstract segmentation result intersection, the abstract_cut_same set. Analyze the abstract_cut_same set against the Chinese Classified Thesaurus, and write the words that have a use/used-for relation or a related-term relation with the words in it into the abstract segmentation result recombination set, the abstract_cut_reorg set, for the secondary search;
(8) First parse the abstract_cut_same set and permute and combine its words using the method of the first search module. Traverse each search word of the abstract_cut_same set and obtain in turn the documents whose full text matches and the pictures and videos whose titles match; for a combined word, a match means that every element of the combination is satisfied. Then parse the abstract_cut_reorg set and obtain the documents, pictures and videos that match each independent word in it. Write all document files in search order into the secondary search document result set, the result_second_doc set; write all picture files in search order into the secondary search picture result set, the result_second_image set; and write all video files in search order into the secondary search video result set, the result_second_vedio set. Return the result_second_doc, result_second_image and result_second_vedio sets to the client and show the search results to the user by category, providing the user with more accurate search results.
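The attention calculation in step (5) above reduces to a small aggregation. The Python sketch below is one possible reading, not the patented implementation: it assumes the result_userSelect set is a list of `(result ID, number of clicks, viewing time in seconds)` records in which the same result may appear more than once, and it sums clicks × viewing time per result. The record layout, the time unit and the function name `rank_abstracts` are hypothetical.

```python
from collections import defaultdict


def rank_abstracts(result_user_select, abstracts):
    """Order abstracts by the user's degree of attention, high to low.

    result_user_select: list of (result_id, clicks, view_seconds)
        records (assumed layout).
    abstracts: {result_id: abstract text}.
    """
    attention = defaultdict(float)
    for result_id, clicks, view_seconds in result_user_select:
        # Sum "number of clicks x viewing time" for each result.
        attention[result_id] += clicks * view_seconds

    ranked_ids = sorted(attention, key=attention.get, reverse=True)
    # This ordered list plays the role of the result_abstract set.
    return [abstracts[result_id] for result_id in ranked_ids]
```

A result that was opened once and read at length can outrank one that was clicked several times and closed quickly, which is the point of multiplying clicks by viewing time.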
Compared with the prior art, the invention has the following advantages:
(1) The invention combines semantic expansion of keywords with inference of user interest. On the basis of captured user interaction, it extracts and parses key information in the full text and performs semantic expansion to reconstruct the search words, which improves the authority and convergence of the search words and makes the search results better match the user's true intent.
(2) By capturing user information, the invention pushes historical search words automatically and lets the user select search words for input and order them independently. This reduces the work of entering search words required by existing search engines and gives the user a flexible and convenient interface for information search.
(3) By recording the search requests the user initiates, the invention continually supplements and refines the user's concerns, which strengthens the accuracy of subsequent query results and raises the intelligence of the search engine system.
(4) When the user submits a search request, the invention first returns a certain number of search results based on the initial search words, responding quickly to the request. While the user views the information, it reconstructs the search words and performs the secondary search according to the user's operation feedback, and returns deeper search results to the user as recommendations, improving recall and precision while preserving search efficiency.
Description of drawings
Fig. 1 is the architecture diagram of the system of the invention;
Fig. 2 shows the implementation process of the search word pushing module in the system of the invention;
Fig. 3 shows the implementation process of the user-initiated search module in the system of the invention;
Fig. 4 shows the implementation process of the user concern update module and the first search module in the system of the invention;
Fig. 5 shows the implementation process of the user interest inference module in the system of the invention;
Fig. 6 shows the implementation process of the user concern result word segmentation, search word reconstruction and secondary search modules of the invention.
Embodiment
The user oriented information search engine of the present invention system, its system is comprised of the server and client side, database server adopts the Xeon2.8 dual core processor, the 16G internal memory, the 2TB hard disk, be responsible for all data messages of storage, dispose simultaneously tape library and backup software, use as historical data backup and recovery; Application server adopts LinuX operating system, the data management software that Oracle9i is above, comprise that search word pushing module, user's focus upgrade and first search module, user interest inference module, search word reconstruct and binary search module, be responsible for the rear end of data that client is transmitted and resolve and work for the treatment of; Client host adopts 3.0CPU, 4G internal memory, 500G hard disk, use Windows XP operating system, undertaken alternately by B/S mode and server, major function is that front end is showed, comprise that the user initiates search module, and first Search Results and binary search result's displaying work.
To aid understanding of the present invention, some basic concepts are explained first.
Search word push list: an array of recommended search words. Its elements are drawn from the user's own historical search words and the historical search words of same-interest users. Each element is one search word, and the elements are ordered from highest to lowest frequency of use.
Search word user selection set: the list of search words formed when the user, following his or her own search intent, manually screens the search words pushed by the system. Screening operations include selecting a pushed word as a search word, removing a pushed word from the current search word list, adjusting the word order in the search word list, and appending new search words. The search word user selection set is denoted the searchVoc_select set.
Recombined searchVoc_select set: the set obtained by fully permuting and combining the search words in the searchVoc_select set, where the word order after recombination follows the priority of each word in the original set. If the searchVoc_select set is (A, B, C), the recombined searchVoc_select set is (ABC, AB, AC, BC, A, B, C). The recombined set contains autonomous words and compound words. An autonomous word is a word with an independent meaning (A, B and C in this example); a compound word is several words combined together (ABC, AB, AC and BC in this example).
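The recombination in the (A, B, C) example can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recombine` is hypothetical, and the sketch assumes that "full permutation and combination" means every non-empty combination, longest first, with words kept in their original priority order, which is what the example output suggests.

```python
from itertools import combinations


def recombine(search_words):
    """Return every non-empty combination of the search words.

    Longer combinations come first, and inside each combination the words
    keep the priority order of the input list (assumption based on the
    (ABC, AB, AC, BC, A, B, C) example).
    """
    recombined = []
    for size in range(len(search_words), 0, -1):
        # combinations() preserves the input order of the words
        for combo in combinations(search_words, size):
            recombined.append(combo)
    return recombined


if __name__ == "__main__":
    # Tuples of length 1 are autonomous words, longer tuples are compound words
    print(recombine(["A", "B", "C"]))
```

A list of n search words yields 2^n - 1 elements, which is one reason the first search module later limits how many search words it extracts.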
Reverse matching algorithm: a basic word segmentation algorithm. Its basic idea is as follows. Suppose the longest entry in the dictionary contains n Chinese characters. Starting from the end of the string to be processed, the last n characters are taken as the matching field and looked up in the segmentation dictionary. If the dictionary contains this word, the match succeeds and the word is split off; then, starting from the (n+1)-th character from the end of the string, another field of n characters is taken and matched against the dictionary again. If the match fails, one character is removed from the field and the remaining n-1 characters are matched against the dictionary, and so on until the segmentation succeeds. For example, suppose during segmentation that the string in the text is ABC and W is the dictionary. If C ∈ W, BC ∈ W and ABC ∉ W, the segmentation A/BC is obtained.
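The reverse matching idea can be sketched in a few lines. This is an illustrative sketch only: the function name `reverse_max_match`, the use of a plain Python set as the dictionary, and the rule that an unmatched single character is emitted as its own word are assumptions. The sketch shortens a failed field by dropping its leading character, which is the choice that reproduces the A/BC segmentation in the example.

```python
def reverse_max_match(text, dictionary, max_len):
    """Segment `text` from its end, preferring the longest dictionary word.

    `dictionary` is any container supporting `in`; `max_len` is the length
    of the longest dictionary entry (n in the description).
    """
    words = []
    end = len(text)
    while end > 0:
        # Candidate field: up to max_len characters ending at `end`
        start = max(0, end - max_len)
        # Shorten the field from the left until it matches or is one character
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()  # words were collected from the end of the string
    return words


if __name__ == "__main__":
    # C and BC are in the dictionary, ABC is not
    print(reverse_max_match("ABC", {"C", "BC"}, 3))
```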
Summary word segmentation result discrete set: each summary in the user-screened search result set is segmented separately, and the segmentation results of the individual summaries form an array. The array length is the number of search result records, and each array element is the set of segmented words of one summary. For example, if the summary word segmentation result discrete set is denoted abstract_cut_apart, its first element has the form: abstract_cut_apart[0]={ liquid engine forms, comprises ... }.
Summary word segmentation result combination set: the array formed by the union of the segmentation results of all summaries. The array length is 1, and its single element is the union containing all segmented words.
Chinese Classified Thesaurus: a standardized, dynamic retrieval-language vocabulary that expresses subject terms and the semantic relations between terms. It is the main tool for subject indexing, retrieval, and the organization of catalogs and indexes. Its subject coverage includes the disciplines and subject concepts of philosophy, the social sciences, the natural sciences, engineering and all other fields. In the present invention, the semantics of a search word are expanded by querying the semantic relations of that search word in the Chinese Classified Thesaurus.
Semantic net: the combination of the relations between terms in the Chinese Classified Thesaurus. The relations between descriptors are use, used-for, broader, narrower, family and reference, and the corresponding relation symbols are "Y", "D", "S", "F", "Z" and "C" respectively. The term after "Y" is the formal expression of the entry descriptor; the term after "D" is an informal expression of the entry descriptor; the term after "S" is a broader term of the entry descriptor, one level above it; the term after "F" is a narrower term of the entry descriptor, one level below it; the term after "Z" is the top term of the entry descriptor; the term after "C" is a reference (related) term of the entry descriptor.
The present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the user-oriented information search engine system of the present invention is composed of the search word pushing module, the user-initiated search module, the user focus update module, the first search module, the user interest inference module, the user-focus result word segmentation module, the search word reconstruction module and the secondary search module.
The overall implementation process is as follows:
(1) Based on the identity information of the current user, the server queries the user focus library. The user focus library consists of two parts, the user's own historical focus and the historical focus of same-interest users, each made up of historical search words and the frequencies of use of those search words. The server first parses the user's historical search words, sorts them from high to low frequency of use, selects the historical search words whose frequency of use exceeds a certain threshold, and writes them in order into the user's historical focus word set, i.e. the searchVoc_past set. It then traverses the searchVoc_past set, obtains for each historical search word the other historical users besides the current user, and writes them into the same-interest user set, i.e. the user_sameInt set. It obtains in turn the historical search words of each user in the user_sameInt set, queries the frequency of use of each historical search word, and writes the words from high to low frequency of use into the same-interest users' historical focus word set, i.e. the searchVoc_past_other set. Finally it traverses the searchVoc_past_other set, appends its words in order to the searchVoc_past set while avoiding duplicates, forms the search word push list from the searchVoc_past set, and outputs it to the client;
(2) The client receives the search word push list, parses the search words in it and displays them in order. It provides check buttons and sort buttons so that the user can select or deselect each search word and set the priority of the search words. The search word set changes dynamically with the user's selections, and the user may also manually supplement or modify the search word set, forming the search request that is finally submitted;
(3) The server receives the search request and records the user-initiated search behavior, which consists of the search words entered by the user and the order of those search words. The search words entered by the user are written in order into the search word user selection set, i.e. the searchVoc_select set. The searchVoc_select set is traversed to judge whether each search word exists in the user focus library. If it exists, the current frequency of use of the word is updated; otherwise, the word is written into the user's own historical focus set in the user focus library and its current frequency of use is set to the initial value, providing the data basis for subsequent search word pushing. At the same time, the first search is executed according to the user-initiated search behavior. All search words in the searchVoc_select set are first fully permuted and combined according to search word priority; the recombined searchVoc_select set contains autonomous words and compound words. The recombined searchVoc_select set is traversed and the search results matching each word in it are queried in turn: matching an autonomous word means that the search result contains that autonomous word, and matching a compound word means that the search result contains each of its elements. For the matching results of each search word, the matching frequency of the search word in the full text is counted and the results are sorted from high to low matching frequency. The matched search result lists are combined following the word order of the searchVoc_select set and written into the first search result set, i.e. the result_first set. Each entry in a search result list consists of the title, summary and source of the result, where the summary is the passage of the result's full text that matches the search word most often. The resulting result_first set is output to the client for the user to view;
(4) The user's operations on the result_first set are recorded, and the user's screening behavior is written into the first-search-result user screening set, i.e. the result_userSelect set. The screening behavior consists of the ID of the result selected by the user, the number of clicks on the result and the result viewing time. For each result, "number of clicks × result viewing time" is summed to obtain the user's degree of attention to that result. The results are sorted from high to low degree of attention, the summary information of each result is parsed, and the summaries are written in order into the user-screened result summary set, i.e. the result_abstract set, for word segmentation;
(5) The result_abstract set is traversed, the summary information of each result the user focused on is parsed in turn, and the summaries are segmented with the reverse matching algorithm against the segmentation dictionary. The segmentation dictionary is a hash table, i.e. an array of HashMaps: the array length is the number of Chinese characters that can serve as the first character of a word in the dictionary, the array index is the region-position code of that character, and each array element is a HashMap of all the words that begin with that character, with the word itself as the key of the HashMap and the word frequency as the value. After segmentation, meaningless words are removed by checking against a dictionary of meaningless words. The segmentation result of each summary is written as a separate array into the summary word segmentation result discrete set, i.e. the abstract_cut_apart set. The union of the segmentation results, i.e. the largest set without repeated words, is extracted at the same time and written into the summary word segmentation result combination set, i.e. the abstract_cut_unit set. The words in the abstract_cut_unit set are traversed and compared against the abstract_cut_apart set to determine the number of different summaries in which each word occurs; repeated occurrences of a word within the same summary are not counted. The words whose occurrence count equals the number of summary records, i.e. the words that occur in every summary, are collected and written into the summary word segmentation result intersection, i.e. the abstract_cut_same set. The abstract_cut_same set is then analyzed against the Chinese Classified Thesaurus, and the words that have a use/used-for relation or a reference relation with the words in it are written into the summary word segmentation result recombination set, i.e. the abstract_cut_reorg set. The abstract_cut_same set is parsed and its words are permuted and combined by the method used in the first search module. Each search word in the abstract_cut_same set is traversed, and the documents matched in full text and the pictures and videos matched in title are obtained in turn; for a compound word, a match means that each of its elements is satisfied. Afterwards, the abstract_cut_reorg set is parsed to obtain the documents, pictures and videos that match each autonomous word in it. All document files are written in search order into the secondary search document result set, i.e. the result_second_doc set; all picture files are written in search order into the secondary search picture result set, i.e. the result_second_image set; all video files are written in search order into the secondary search video result set, i.e. the result_second_vedio set. The result_second_doc, result_second_image and result_second_vedio sets are returned to the client, and the search results are displayed to the user by category, giving the user more accurate search results.
The specific implementation process of each of the above modules is as follows:
1. Search word pushing module
The implementation process of this module is shown in Fig. 2:
(1) Capture the user information: from the session that stores the identity information at login, obtain the user name and ID of the currently logged-in user;
(2) Query the user focus table, denoted searchVoc_past_table, by user ID; extract the historical search words and search word frequencies of use that match this ID, denote a search word as V and its frequency of use as F, and sort the results in descending order of F;
(3) Let the preset word frequency threshold be E, and compare the frequency of use F with the threshold E;
a. If F>=E, write the V corresponding to F into the user's historical focus word set, denoted the searchVoc_past set;
b. If F<E, do nothing;
(4) Parse the searchVoc_past set, traverse the search words V in it in turn, query the searchVoc_past_table table, obtain the IDs of the other users besides the current user that match V, and write them into the user_sameInt set;
(5) For each user ID in the user_sameInt set, query searchVoc_past_table to obtain the matching historical search word records, count the frequency of use of each search word in searchVoc_past_table, and write the words from high to low frequency into the searchVoc_past_other set;
(6) Traverse the searchVoc_past_other set and judge in turn whether each word already exists in the searchVoc_past set;
a. If it exists, do nothing with this word and continue with the next word;
b. If it does not exist, write this word into the searchVoc_past set.
(7) After the traversal finishes, the resulting searchVoc_past set is the search word push list.
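Steps (2) to (7) above can be sketched as follows. This is a simplified illustration under stated assumptions: the user focus table is modeled as an in-memory list of hypothetical `(user_id, word, frequency)` tuples rather than a database table, the function name `build_push_list` is invented for the example, and the frequency of a word among same-interest users is taken to be the sum of those users' frequencies, which the description does not specify.

```python
def build_push_list(table, user_id, threshold):
    """Build the search word push list (the searchVoc_past set)."""
    # Steps (2)-(3): own words with F >= E, sorted by descending frequency
    own = sorted(
        (row for row in table if row[0] == user_id and row[2] >= threshold),
        key=lambda row: -row[2],
    )
    search_voc_past = [word for _, word, _ in own]

    # Step (4): other users who searched for any of those words
    user_same_int = {
        uid for uid, word, _ in table
        if uid != user_id and word in search_voc_past
    }

    # Step (5): words of same-interest users, ordered by summed frequency
    totals = {}
    for uid, word, freq in table:
        if uid in user_same_int:
            totals[word] = totals.get(word, 0) + freq
    search_voc_past_other = sorted(totals, key=lambda w: -totals[w])

    # Steps (6)-(7): append words that are not already in the push list
    for word in search_voc_past_other:
        if word not in search_voc_past:
            search_voc_past.append(word)
    return search_voc_past


if __name__ == "__main__":
    focus_table = [
        ("u1", "engine", 5), ("u1", "nozzle", 1), ("u1", "valve", 3),
        ("u2", "engine", 2), ("u2", "turbine", 4),
        ("u3", "valve", 1), ("u3", "tank", 6), ("u3", "turbine", 1),
    ]
    print(build_push_list(focus_table, "u1", threshold=2))
```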
2. User-initiated search module
The implementation process of this module is shown in Fig. 3:
(1) Receive the search word push list, i.e. the searchVoc_past set, and read it into the buffer;
(2) Determine the length of the searchVoc_past set, denoted L;
(3) If L>0, then
(3.1) With L as the loop limit, read the search words in the search word push list in turn, including the search word ID and the search word content; display the search word content on the client and generate a check box button in front of each search word, where the ID of the check box button is the ID of the search word currently being read;
(3.2) After the traversal finishes, generate the sort buttons and display them on the client;
(4) Store the content of the search box as a character string, denoted str_searchVoc;
(5) Judge whether str_searchVoc is empty; if it is empty, initialize str_searchVoc to a single space character;
(6) Parse the user's operations,
(6.1) When a search word check box is checked, judge whether str_searchVoc contains the selected search word; if it does, do nothing; if it does not, append this search word to str_searchVoc followed by a space separator;
(6.2) When a search word check box is unchecked, judge whether str_searchVoc contains the selected search word; if it does, remove this search word and the space separator that follows it; if it does not, do nothing;
(6.3) When a move-up or move-down sort operation is performed, judge whether a search word has been selected,
(6.3.1) If none has been selected, prompt the user to select one;
(6.3.2) If more than one search word has been selected, prompt the user that only one search word can be operated on at a time;
(6.3.3) If exactly one search word has been selected, then
a. Move this search word up or down by one position in the order, and form a character string from all search words in the current sort order, denoted str_searchVoc_newSeq, with the words separated by spaces;
b. Compare str_searchVoc with str_searchVoc_newSeq: parse the search words in str_searchVoc_newSeq in turn and judge whether each exists in str_searchVoc; if a word does not exist in str_searchVoc, remove it from str_searchVoc_newSeq;
c. Replace str_searchVoc with the processed character string, and write the words in str_searchVoc back into the search box on the client;
(6.4) After the user confirms the operation, submit the search request by sending str_searchVoc to the server.
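The string handling in step (6) can be sketched as follows. The described module runs in a browser; this sketch uses Python only to illustrate the logic, and the function names `check_word`, `uncheck_word` and `move_word` are hypothetical. It assumes search words contain no spaces, since the space character is the separator.

```python
def check_word(str_search_voc, word):
    """Step (6.1): append the word unless the query already contains it."""
    words = str_search_voc.split()
    if word not in words:
        words.append(word)
    return " ".join(words)


def uncheck_word(str_search_voc, word):
    """Step (6.2): remove the word (and its separator) if present."""
    return " ".join(w for w in str_search_voc.split() if w != word)


def move_word(displayed_order, str_search_voc, word, up=True):
    """Step (6.3.3): move one word by one position, then rebuild the query.

    Returns the new displayed order and the new query string, which keeps
    only the words that were already in the query, in the new order.
    """
    order = list(displayed_order)
    i = order.index(word)
    j = i - 1 if up else i + 1
    if 0 <= j < len(order):
        order[i], order[j] = order[j], order[i]
    selected = set(str_search_voc.split())
    new_query = " ".join(w for w in order if w in selected)
    return order, new_query


if __name__ == "__main__":
    query = check_word("", "engine")
    query = check_word(query, "valve")
    order, query = move_word(["engine", "valve", "tank"], query, "valve", up=True)
    print(order, "|", query)
```

As the steps read, a word that the user typed by hand but that is absent from the displayed list would be dropped when the query is rebuilt after a reorder; the sketch reproduces that behavior rather than working around it.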
3. User focus update and first search module
The user focus update and first search module consists of two submodules, the user focus update module and the first search module. Its implementation is shown in Fig. 4:
(1) Receive the search request, i.e. str_searchVoc, and store it in the buffer;
(2) Parse str_searchVoc, split the search words in str_searchVoc into an array by the separator, and write them in order into the searchVoc_select set;
(3) Traverse the searchVoc_select set and judge whether each search word exists in the user focus table of the user focus library, i.e. searchVoc_past_table,
a. If it exists, read the current frequency of use of this word, denoted f, convert f to an integer and add 1;
b. If it does not exist, write this word into the user focus table of the user focus library; the inserted values include the user ID, the search word content and the search word frequency of use, with the frequency of use set to the initial value.
(4) Parse the searchVoc_select set; because too many search words would cause heavy redundancy in the search results, extract only a certain number of search words from front to back;
(5) Form all permutations and combinations of the extracted search words into a rearrangement string, in which compound words are separated by semicolons (";") and the elements of a compound word (i.e. the autonomous words that make up the compound word) are separated by commas (",");
(6) Split the rearrangement string by the semicolon separator (";") and write the parts into the recombined searchVoc_select set, denoted searchVoc_select_reArr;
(7) Traverse searchVoc_select_reArr and perform the following operations on each element:
(7.1) Split the element by the comma separator (",") to generate the arr_conVoc array;
(7.2) Determine the length L of the arr_conVoc array,
(7.2.1) If L=1, this element is an autonomous word; run a matching search for this word and perform the following operations on each search result:
a. Parse the full text of the search result and split it into natural paragraphs using the carriage return as the separator, denoted the arr_result_para array;
b. Traverse the arr_result_para array, count in turn how many times each element contains this autonomous word, and extract the paragraph with the highest count as the summary;
c. Combine the title, summary and source into a search result list and write it into the first search result set, i.e. the result_first set;
(7.2.2) If L>1, this element is a compound word; extract in turn the sub-elements of this element in the arr_conVoc array, i.e. the autonomous words, and execute the search according to the steps in (7.2.1), obtaining the results that satisfy all of the autonomous words;
(8) After the traversal finishes, the final result_first set is formed and output to the client for the user to view.
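The matching and summary extraction in step (7) can be sketched as follows. This is a minimal illustration, not the patented implementation: documents are modeled as hypothetical dictionaries with `title`, `source` and `text` keys, matching is plain substring search, and for a compound word the sketch scores a paragraph by the total number of occurrences of all its elements, which the description does not spell out. The function name `match_element` is invented for the example.

```python
def match_element(documents, element):
    """Search for one element of the recombined set.

    `element` is a tuple of autonomous words: length 1 for an autonomous
    word, longer for a compound word. A document matches only if its full
    text contains every word of the element.
    """
    def occurrences(text):
        return sum(text.count(word) for word in element)

    hits = [doc for doc in documents
            if all(word in doc["text"] for word in element)]
    # Sort by matching frequency in the full text, highest first
    hits.sort(key=lambda doc: -occurrences(doc["text"]))

    results = []
    for doc in hits:
        # Split the full text into natural paragraphs on line breaks
        paragraphs = [p for p in doc["text"].split("\n") if p.strip()]
        # The summary is the paragraph with the most matches
        summary = max(paragraphs, key=occurrences)
        results.append({"title": doc["title"],
                        "summary": summary,
                        "source": doc["source"]})
    return results


if __name__ == "__main__":
    docs = [
        {"title": "Engine overview", "source": "library",
         "text": "liquid engine basics\nengine engine nozzle\nnozzle notes"},
        {"title": "Nozzle design", "source": "library",
         "text": "nozzle shapes\nnozzle cooling for engine"},
    ]
    for result in match_element(docs, ("engine", "nozzle")):
        print(result["title"], "->", result["summary"])
```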
4. User interest inference module
The implementation process of the user interest inference module is shown in Fig. 5:
(1) Initialize the first-search-result user screening set, i.e. the result_userSelect set;
(2) For each user operation, record the ID of the selected result;
(3) Judge whether this ID already exists in the result_userSelect set,
(3.1) If it does not exist, then
a. Initialize the serial number P of the search result with this ID to 1 and the number of clicks N to 1, record the current operation time T_current, and write them into the result_userSelect set;
b. Take the largest T value in the result_userSelect set, denoted T_maxLast; if no T value exists, take T_maxLast to be 0;
c. Compute the browsing time of the search result corresponding to T_maxLast as T_current - T_maxLast, and write it into the result_userSelect set;
(3.2) If it exists, take the number of clicks N currently associated with this ID, increase N by 1, and update the corresponding value in the result_userSelect set;
(4) Traverse the result_userSelect set and calculate the user attention as "first-click order of the result / sum(number of clicks on the result × result viewing time)", obtaining the user's attention to each screened result;
(5) Sort the result_userSelect set from high to low user attention, extract a certain number of top-ranked entries, take the summary information corresponding to each ID, and write it into the user-screened result summary set, i.e. the result_abstract set.
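The click bookkeeping in step (3) can be sketched as follows. This is a hedged illustration only: the class name `ClickLog` and its fields are hypothetical, timestamps are plain numbers supplied by the caller, and the serial number is assumed to be the order of first click. For ranking, the sketch uses the simpler "number of clicks × result viewing time" score from step (4) of the overall process; it does not attempt the first-click-order term that step (4) of this module mentions, because the description does not make its role clear.

```python
class ClickLog:
    """In-memory stand-in for the result_userSelect set."""

    def __init__(self):
        self.records = {}  # result ID -> bookkeeping for that result

    def click(self, result_id, now):
        """Record a click on `result_id` at time `now` (e.g. seconds)."""
        if result_id in self.records:
            # Step (3.2): already known, only the click count grows
            self.records[result_id]["clicks"] += 1
            return
        if self.records:
            # Steps (3.1) b-c: the most recently opened result was viewed
            # until this new result was opened
            last = max(self.records.values(), key=lambda r: r["opened_at"])
            last["view_time"] = now - last["opened_at"]
        # Step (3.1) a: first click on this result
        self.records[result_id] = {
            "order": len(self.records) + 1,  # assumed meaning of P
            "clicks": 1,
            "opened_at": now,
            "view_time": 0,
        }

    def ranking(self):
        """Result IDs sorted by clicks × viewing time, highest first."""
        return sorted(
            self.records,
            key=lambda rid: -(self.records[rid]["clicks"]
                              * self.records[rid]["view_time"]),
        )


if __name__ == "__main__":
    log = ClickLog()
    log.click("r1", 0)
    log.click("r2", 30)
    log.click("r1", 40)
    log.click("r3", 50)
    print(log.ranking())
```

One property of this bookkeeping is worth noting: the last result opened has no later click to close its viewing interval, so its viewing time stays at the initial value unless some other event ends it.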
5. User-focus result word segmentation, search word reconstruction and secondary search module
This module consists of three submodules: the user-focus result word segmentation module, the search word reconstruction module and the secondary search module. The implementation process is shown in Fig. 6:
(1) Traverse the result_abstract set, parse in turn the summary information of the results the user focused on, and segment it with the reverse matching algorithm against the segmentation dictionary; write the segmentation result of each summary as a separate array into the summary word segmentation result discrete set, i.e. the abstract_cut_apart set, and denote the number of arrays as N;
(2) Extract the union of the segmentation results, i.e. the largest set without repeated words, and write it into the summary word segmentation result combination set, i.e. the abstract_cut_unit set;
(3) Traverse the set formed from all summary word segmentation results, i.e. the abstract_cut_unit set, and perform the following operations on each search word in it;
(3.1) Initialize the occurrence frequency of the current search word, F_abs=0;
(3.2) Traverse each array element in the abstract_cut_apart set and judge whether this array element contains the current search word;
a. If it does, set F_abs=F_abs+1 and continue with the next array element;
b. If it does not, leave F_abs unchanged.
(3.3) Compare the F_abs value of the current search word with the number of arrays N in the abstract_cut_apart set;
a. If F_abs=N, write the current search word into the summary word segmentation result intersection, i.e. abstract_cut_same;
b. If F_abs<N, do nothing and continue with the next search word.
(4) Traverse the abstract_cut_same set and, for each search word in it, retrieve from the Chinese Classified Thesaurus the semantic net that has this word as its entry descriptor;
(4.1) If the semantic net contains a related term marked "Y", this word has a formal expression; write the formal expression into the summary word segmentation result recombination set, i.e. abstract_cut_reorg;
(4.2) If the semantic net contains a related term marked "D", this word has an informal expression; write the informal expression into the abstract_cut_reorg set;
(4.3) If the semantic net contains a related term marked "C", this word has a semantically related expression; write the related expression into the abstract_cut_reorg set.
(5) Permute and combine the words in the abstract_cut_same set by the method used in the first search module, traverse each search word in the abstract_cut_same set and, following the search method of the first search module, obtain in turn the documents matched in full text and the pictures and videos matched in title;
(6) Traverse the abstract_cut_reorg set, read each autonomous word, and obtain in turn the documents matched in full text and the pictures and videos matched in title;
(7) Write all document files in search order into the result_second_doc set, all picture files in search order into the result_second_image set, and all video files in search order into the result_second_vedio set;
(8) Return the three sets result_second_doc, result_second_image and result_second_vedio to the client, display the search results to the user by category, and prompt the user that these search results may better match his or her intent and are available for closer reading.
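Steps (2) to (4) above, which derive the union, the intersection and the thesaurus expansion, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names `union_and_common` and `expand_with_thesaurus` are hypothetical, and the thesaurus is modeled as a nested dictionary of the form `{descriptor: {relation symbol: [terms]}}` with made-up entries, not real data from the Chinese Classified Thesaurus.

```python
def union_and_common(abstract_cut_apart):
    """Return (abstract_cut_unit, abstract_cut_same) for the given summaries."""
    # Step (2): union of all segmented words, without repeats
    abstract_cut_unit = []
    for words in abstract_cut_apart:
        for word in words:
            if word not in abstract_cut_unit:
                abstract_cut_unit.append(word)

    # Step (3): keep the words that occur in every summary; a word counts
    # once per summary no matter how often it repeats inside it
    n = len(abstract_cut_apart)
    abstract_cut_same = [
        word for word in abstract_cut_unit
        if sum(1 for words in abstract_cut_apart if word in words) == n
    ]
    return abstract_cut_unit, abstract_cut_same


def expand_with_thesaurus(abstract_cut_same, thesaurus):
    """Step (4): collect terms related by "Y", "D" or "C"."""
    abstract_cut_reorg = []
    for word in abstract_cut_same:
        relations = thesaurus.get(word, {})
        for symbol in ("Y", "D", "C"):
            for related in relations.get(symbol, []):
                if related not in abstract_cut_reorg:
                    abstract_cut_reorg.append(related)
    return abstract_cut_reorg


if __name__ == "__main__":
    summaries = [
        ["liquid", "engine", "nozzle"],
        ["engine", "nozzle", "engine"],
        ["nozzle", "tank", "engine"],
    ]
    unit, same = union_and_common(summaries)
    # Hypothetical thesaurus entry; "S" (broader term) is ignored by step (4)
    toy_thesaurus = {"engine": {"D": ["motor"], "C": ["propulsion"],
                                "S": ["machine"]}}
    print(unit, same, expand_with_thesaurus(same, toy_thesaurus))
```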
Application example: the system and method of the present invention have been successfully applied in spacecraft model development at the China Academy of Launch Vehicle Technology, helping research and design personnel obtain the knowledge and information they need most quickly and conveniently. This demonstrates that the system and method of the present invention are flexible, convenient and intelligent.
The parts of the present invention that are not described in detail belong to techniques well known in the art.

Claims (4)

1. user oriented information search engine system, it is characterized in that described information search engine system is made of client and server, dispose search word pushing module, user's focus update module, first search module, user interest inference module, search word reconstruct at server end and look into piece and binary search module; Client host is undertaken alternately by B/S mode and server, initiates search module, first search module the user of client's mouth end administration; Wherein above-mentioned each module is achieved as follows:
The search word pushing module: server is according to active user's identity information, and inquiring user is paid close attention to the storehouse, and described user pays close attention to the storehouse and forms by my historical focus and with the historical focus two parts of interest user; Described my historical focus and form with the frequency of utilization of the historical focus of interest user by historical search word and search word; At first resolve user's historical search word, sort from high to low according to the search word frequency of utilization, the choice for use frequency surpasses the historical search word of setting threshold, write according to the order of sequence the historical set of words of paying close attention to of access customer, it is the searchVoc_past collection, travel through afterwards the searchVoc_past collection, obtain each historical search word other historical users except the active user, write with the interest user and gather, it is the user_sameInt collection, obtain successively the historical search word that user_sameInt concentrates each user, inquire about respectively the frequency of utilization of each historical search word, write from high to low historical set of words, i.e. the searchVoc_past_other collection paid close attention to the interest user according to frequency of utilization, the searchVoc_past_other collection is traveled through, avoiding under the prerequisite of repetition, the word order adding searchVoc_past collection with wherein forms search word according to the searchVoc_past collection and pushes tabulation, export client to, initiate search module for the user and call;
The user initiates search module: the search word that receives the output of search word pushing module pushes tabulation, resolve search word wherein, be presented in order client, and provide check button and rank button, the permission user selects each search word or cancels, and the priority that search word is set, dynamically change the search word set according to user's selection result, support simultaneously the user that search word is gathered and carry out artificial supplementation or modification, to form the search application of final submission, call for user's focus update module and first search module;
User's focus update module: receive the search application, the Client-initiated search behavior is carried out record, described search behavior is comprised of the search word of user's input and the order of search word, the search word of user's input is write the set of search word user selection according to the order of sequence, it is the searchVoc_select collection, traversal searchVoc_select collection, whether judgement search word wherein is present in the user is paid close attention in the storehouse, if exist, then upgrade the current frequency of utilization of this word, otherwise my the historical focus of then this word being write in the access customer concern storehouse is gathered, and it is initial value that current frequency of utilization is set simultaneously;
First search module: carry out first search according to the Client-initiated search behavior, at first according to the priority of search word whole search words that searchVoc_select concentrates are carried out full permutation and combination, searchVoc_select collection after the permutation and combination is denoted as searchVoc_select restructuring collection, comprising autonomous word and portmanteau word, traversal searchVoc_select restructuring collection, the Search Results that is complementary of inquiry and each word wherein successively, namely represent to comprise in the Search Results this autonomous word with the autonomous word coupling, represent namely that with the portmanteau word coupling Search Results comprises each element, matching result for each search word, statistics in full in the matching frequency of search word, sort from high to low by matching frequency, press the word order of searchVoc_select restructuring collection with the search result list combination of all couplings, write the initial search result set, it is the result_first collection, described search result list is by the object information title, summary, the source forms, wherein, summary is passages maximum with the search word coupling in result's full text, export the result_first collection that forms to client, check for the user;
User interest inference module: records the user's operations on the result_first set and writes the user's screening behavior into the first-search user screening set, i.e. the result_userSelect set; the screening behavior consists of the ID of each result selected by the user, the number of clicks on the result and the time spent viewing the result; for each result, sums "number of clicks × viewing time" to obtain the user's degree of attention to that result; sorts the results by degree of attention from high to low, parses the summary information of each result, writes the summaries in that order into the user-selected result summary set, i.e. the result_abstract set, and outputs it to the user concern result word segmentation module;
User concern result word segmentation module: traverses the result_abstract set and parses, in turn, the summary information of the results the user paid attention to; segments each summary against the wordbook using a reverse matching algorithm; the wordbook is a hash table, namely an array of HashMaps, in which the array length is the number of Chinese characters that can serve as the first character of a word in the dictionary, the array index is the region-position code of that character, and each array element is the HashMap of all words beginning with that character, with the word itself as the HashMap key and its word frequency as the HashMap value; after segmentation, removes meaningless words by checking against the meaningless-word dictionary; writes the segmentation result of each summary, as a separate array, into the summary segmentation discrete set, i.e. the abstract_cut_apart set; at the same time, extracts the union of the segmentation results, i.e. the largest set without repeated words, and writes it into the summary segmentation union set, i.e. the abstract_cut_unit set; outputs both the abstract_cut_apart set and the abstract_cut_unit set to the search word reconstruction module;
Search word reconstruction module: traverses the words in the abstract_cut_unit set and, against the abstract_cut_apart set, determines the number of different summaries in which each word occurs, where repeated occurrences of a word within the same summary are not counted; writes the words whose occurrence count equals the number of summary records, i.e. the words that occur in every summary, into the summary segmentation intersection set, i.e. the abstract_cut_same set; analyzes the abstract_cut_same set against the Chinese Classified Thesaurus and writes the words that have a use/used-for (substitution) relation or a related-term relation with the words in it into the summary segmentation recombination set, i.e. the abstract_cut_reorg set; outputs both the abstract_cut_same set and the abstract_cut_reorg set to the secondary search module;
Secondary search module: first parses the abstract_cut_same set and forms permutations and combinations of the words in the set by the method used in the first search module; traverses each search word in the abstract_cut_same set and obtains, in turn, the documents matched in their full text and the pictures and videos matched in their titles, where, for a combined word, a match means that every element of the combination is satisfied; then parses the abstract_cut_reorg set and obtains the documents, pictures and videos that match each independent word in it; writes all document files, in search order, into the secondary search document result set, i.e. the result_second_doc set; writes all picture files, in search order, into the secondary search picture result set, i.e. the result_second_image set; writes all video files, in search order, into the secondary search video result set, i.e. the result_second_vedio set; returns the result_second_doc, result_second_image and result_second_vedio sets to the client, displays the search results to the user by category, and prompts the user that these results may better match the user's intent, for further viewing by the user.
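The reverse matching segmentation against a wordbook keyed by first character, as described for the word segmentation module in claim 1, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `build_wordbook` and `reverse_max_match`, the `max_len` window, and the use of a Python dict of dicts in place of an array indexed by region-position code are all hypothetical, and the specific variant shown is reverse maximum matching.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the claimed wordbook:
# first character -> {word: word frequency}.
Wordbook = Dict[str, Dict[str, int]]


def build_wordbook(words: Dict[str, int]) -> Wordbook:
    """Group dictionary words by their first character."""
    book: Wordbook = {}
    for word, freq in words.items():
        book.setdefault(word[0], {})[word] = freq
    return book


def reverse_max_match(text: str, book: Wordbook,
                      stop_words: Set[str], max_len: int = 4) -> List[str]:
    """Segment text by scanning from its end, taking the longest dictionary word each time."""
    tokens: List[str] = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in book.get(candidate[0], {}):
                tokens.append(candidate)
                end -= size
                break
    tokens.reverse()  # tokens were collected back to front
    # Drop meaningless words, as the claim does after segmentation.
    return [t for t in tokens if t not in stop_words]
```

With a wordbook containing 研究, 研究生, 生命 and 起源, and 的 treated as a meaningless word, the text 研究生命的起源 is segmented as 研究 / 生命 / 起源, because scanning from the end avoids grouping 研究生 first.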
2. The user-oriented information search engine system according to claim 1, characterized in that the search word pushing module is implemented as follows:
(1) capture the user information: from the session that stores the identity information at login, obtain the user name and user number, i.e. the ID, of the currently logged-in user;
(2) query the user concern database by user ID and extract the historical search words matching this ID together with their usage frequencies; denote a search word as V and its usage frequency as F, and sort the results in descending order of F;
(3) let the preset word frequency threshold be E, and compare the usage frequency F with the threshold E;
a. if F >= E, write the V corresponding to F into the user's historical concern word set, denoted the searchVoc_past set;
b. if F < E, do nothing;
(4) parse the searchVoc_past set and traverse the search words V in it in turn; query the user concern database to obtain the IDs of the users, other than the current user, that match V, and write them into the same-interest user set, i.e. the user_sameInt set;
(5) for each user ID in the user_sameInt set, query the user concern database and obtain the historical search word records matching that user ID; traverse the historical search words in the records, total the usage frequency of each search word in the user concern database, and write the words, by frequency from high to low, into the same-interest users' historical concern word set, i.e. the searchVoc_past_other set;
(6) traverse the searchVoc_past_other set and determine, in turn, whether each word already exists in the searchVoc_past set;
a. if it exists, do nothing with the word and continue with the next word;
b. if it does not exist, add the word to the searchVoc_past set;
(7) store the searchVoc_past set as an array in the cache and output it to the client as the search word push list, to be called by the user-initiated search module.
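Steps (2) to (7) above can be illustrated with a short sketch. It is a simplified reading, not the claimed implementation: the user concern database is modelled as a hypothetical in-memory mapping `ConcernDB`, the function name `build_push_list` is invented for illustration, and step (5) is assumed to total each word's frequency over the same-interest users only.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the user concern database:
# user ID -> {search word V: usage frequency F}.
ConcernDB = Dict[str, Dict[str, int]]


def build_push_list(db: ConcernDB, user_id: str, threshold: int) -> List[str]:
    """Return the search word push list (the searchVoc_past set) for one user."""
    own = db.get(user_id, {})
    # Steps (2)-(3): own words in descending order of F, keeping those with F >= E.
    search_voc_past = [w for w, f in sorted(own.items(), key=lambda kv: -kv[1])
                       if f >= threshold]
    # Step (4): other users who share at least one of those words (user_sameInt).
    user_same_int = [uid for uid, words in db.items()
                     if uid != user_id and any(w in words for w in search_voc_past)]
    # Step (5): total the frequency of each word used by the same-interest users
    # (assumption: totals are taken over those users only).
    totals: Dict[str, int] = {}
    for uid in user_same_int:
        for word, freq in db[uid].items():
            totals[word] = totals.get(word, 0) + freq
    search_voc_past_other = [w for w, _ in sorted(totals.items(), key=lambda kv: -kv[1])]
    # Step (6): append only the words that are not already present.
    for word in search_voc_past_other:
        if word not in search_voc_past:
            search_voc_past.append(word)
    # Step (7): the caller would cache this list and send it to the client.
    return search_voc_past
```

In this sketch a user whose own words all fall below the threshold receives an empty list, since no same-interest users can be found from an empty searchVoc_past set.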
3. The user-oriented information search engine system according to claim 1, characterized in that the search word reconstruction module is implemented as follows:
(1) traverse the user-selected result summary set, i.e. the result_abstract set, and parse, in turn, the summary information of the results the user paid attention to; segment each summary against the wordbook using the reverse matching algorithm; write the segmentation result of each summary, as a separate array, into the summary segmentation discrete set, i.e. the abstract_cut_apart set, and denote the number of arrays as N;
(2) extract the union of the segmentation results, i.e. the largest set without repeated words, and write it into the summary segmentation union set, i.e. the abstract_cut_unit set;
(3) traverse the abstract_cut_unit set and perform the following operations on each search word in it:
(3.1) initialize the occurrence frequency of the current search word: F_abs = 0;
(3.2) traverse each array element of the abstract_cut_apart set and determine whether that array element contains the current search word;
a. if it does, set F_abs = F_abs + 1 and continue with the next array element;
b. if it does not, leave F_abs unchanged;
(3.3) compare the F_abs value of the current search word with N, the number of arrays in the abstract_cut_apart set;
a. if F_abs = N, write the current search word into the summary segmentation intersection set, i.e. the abstract_cut_same set;
b. if F_abs < N, do nothing and continue with the next search word;
(4) traverse the abstract_cut_same set and, for each search word in it, retrieve from the Chinese Classified Thesaurus the semantic network whose entry descriptor is that word;
(4.1) if the semantic network contains a related term marked "Y", the word has a formal expression; write the formal expression into the abstract_cut_reorg set;
(4.2) if the semantic network contains a related term marked "D", the word has an informal expression; write the informal expression into the abstract_cut_reorg set;
(4.3) if the semantic network contains a related term marked "C", the word has a semantically related expression; write the related expression into the abstract_cut_reorg set;
(5) output both the abstract_cut_same set and the abstract_cut_reorg set, as arrays, to the secondary search module.
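Steps (2) to (5) above amount to computing the words common to all N summaries and then expanding them through the thesaurus. The sketch below shows one way this might look; the thesaurus is modelled as a hypothetical mapping from a descriptor to (relation label, related word) pairs, the function name `reconstruct_search_words` is invented for illustration, and skipping duplicates in abstract_cut_reorg is an assumption not stated in the claim.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus model: descriptor -> [(relation label, related word)].
# The labels "Y", "D" and "C" follow steps (4.1) to (4.3).
Thesaurus = Dict[str, List[Tuple[str, str]]]


def reconstruct_search_words(
    abstract_cut_apart: List[List[str]], thesaurus: Thesaurus
) -> Tuple[List[str], List[str]]:
    """Return (abstract_cut_same, abstract_cut_reorg) for the segmented summaries."""
    n = len(abstract_cut_apart)
    # Step (2): union of all segmentation results, without repeated words.
    abstract_cut_unit = list(dict.fromkeys(
        word for tokens in abstract_cut_apart for word in tokens))
    # Step (3): F_abs counts the summaries containing the word, once per summary.
    abstract_cut_same: List[str] = []
    for word in abstract_cut_unit:
        f_abs = sum(1 for tokens in abstract_cut_apart if word in tokens)
        if f_abs == n:
            abstract_cut_same.append(word)
    # Step (4): collect the related terms marked "Y", "D" or "C".
    abstract_cut_reorg: List[str] = []
    for word in abstract_cut_same:
        for label, related in thesaurus.get(word, []):
            # Skipping duplicates is an assumption made for this sketch.
            if label in ("Y", "D", "C") and related not in abstract_cut_reorg:
                abstract_cut_reorg.append(related)
    # Step (5): both sets are handed on as arrays (lists).
    return abstract_cut_same, abstract_cut_reorg
```

Because F_abs is incremented at most once per summary, a word repeated inside a single summary does not reach N unless it also appears in every other summary.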
4. A user-oriented information search engine implementation method, characterized in that the steps are as follows:
(1) the server queries the user concern database according to the current user's identity information; the user concern database consists of two parts, the user's own historical concerns and the historical concerns of same-interest users, each of which consists of historical search words and the usage frequencies of those search words; first, parse the user's historical search words, sort them by usage frequency from high to low, select the historical search words whose usage frequency exceeds a given threshold, and write them, in order, into the user's historical concern word set, i.e. the searchVoc_past set; then traverse the searchVoc_past set, obtain for each historical search word the other historical users, excluding the current user, and write them into the same-interest user set, i.e. the user_sameInt set; obtain, in turn, the historical search words of each user in the user_sameInt set, query the usage frequency of each historical search word, and write the words, by usage frequency from high to low, into the same-interest users' historical concern word set, i.e. the searchVoc_past_other set; traverse the searchVoc_past_other set and append its words, in order and without duplicates, to the searchVoc_past set; form the search word push list from the searchVoc_past set and output it to the client, to be called by the user-initiated search module;
(2) receive the search word push list, parse the search words in it and present them in order on the client, providing check buttons and ranking buttons that allow the user to select or deselect each search word and to set the priority of the search words; dynamically change the search word set according to the user's selections, and also allow the user to manually supplement or modify the search word set, so as to form the search request that is finally submitted;
(3) receive the search request and record the user-initiated search behavior, which consists of the search words entered by the user and the order of those search words; write the entered search words, in order, into the user-selected search word set, i.e. the searchVoc_select set; traverse the searchVoc_select set and determine whether each search word in it already exists in the user concern database; if it exists, update the current usage frequency of that word; otherwise, write the word into the user's own historical concerns set in the user concern database and set its current usage frequency to the initial value, thereby providing the data basis for subsequent search word pushing;
(4) perform the first search according to the user-initiated search behavior; first, following the priority of the search words, form all permutations and combinations of the search words in the searchVoc_select set; the set obtained after permutation and combination is denoted the searchVoc_select recombination set and contains independent words and combined words; traverse the searchVoc_select recombination set and query, in turn, the search results that match each word in it, where matching an independent word means that the search result contains that word, and matching a combined word means that the search result contains every element of the combination; for the matching results of each search word, count the matching frequency of the search word in the full text and sort the results by matching frequency from high to low; combine the lists of all matching search results in the word order of the searchVoc_select recombination set and write them into the first search result set, i.e. the result_first set; each entry of a search result list consists of the title, summary and source of the result, where the summary is the passage of the result's full text that matches the search word most often; output the resulting result_first set to the client for the user to view;
(5) record the user's operations on the result_first set and write the user's screening behavior into the first-search user screening set, i.e. the result_userSelect set; the screening behavior consists of the ID of each result selected by the user, the number of clicks on the result and the time spent viewing the result; for each result, sum "number of clicks × viewing time" to obtain the user's degree of attention to that result; sort the results by degree of attention from high to low, parse the summary information of each result, and write the summaries in that order into the user-selected result summary set, i.e. the result_abstract set, for word segmentation;
(6) traverse the result_abstract set and parse, in turn, the summary information of the results the user paid attention to; segment each summary against the wordbook using a reverse matching algorithm; the wordbook is a hash table, namely an array of HashMaps, in which the array length is the number of Chinese characters that can serve as the first character of a word in the dictionary, the array index is the region-position code of that character, and each array element is the HashMap of all words beginning with that character, with the word itself as the HashMap key and its word frequency as the HashMap value; after segmentation, remove meaningless words by checking against the meaningless-word dictionary; write the segmentation result of each summary, as a separate array, into the summary segmentation discrete set, i.e. the abstract_cut_apart set; at the same time, extract the union of the segmentation results, i.e. the largest set without repeated words, and write it into the summary segmentation union set, i.e. the abstract_cut_unit set;
(7) traverse the words in the abstract_cut_unit set and, against the abstract_cut_apart set, determine the number of different summaries in which each word occurs, where repeated occurrences of a word within the same summary are not counted; write the words whose occurrence count equals the number of summary records, i.e. the words that occur in every summary, into the summary segmentation intersection set, i.e. the abstract_cut_same set; analyze the abstract_cut_same set against the Chinese Classified Thesaurus and write the words that have a use/used-for (substitution) relation or a related-term relation with the words in it into the summary segmentation recombination set, i.e. the abstract_cut_reorg set, for the secondary search;
(8) first parse the abstract_cut_same set and form permutations and combinations of the words in the set by the method used in the first search; traverse each search word in the abstract_cut_same set and obtain, in turn, the documents matched in their full text and the pictures and videos matched in their titles, where, for a combined word, a match means that every element of the combination is satisfied; then parse the abstract_cut_reorg set and obtain the documents, pictures and videos that match each independent word in it; write all document files, in search order, into the secondary search document result set, i.e. the result_second_doc set; write all picture files, in search order, into the secondary search picture result set, i.e. the result_second_image set; write all video files, in search order, into the secondary search video result set, i.e. the result_second_vedio set; return the result_second_doc, result_second_image and result_second_vedio sets to the client and display the search results to the user by category, so as to provide the user with more accurate search results.
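The degree-of-attention calculation in step (5) of the method can be illustrated as follows. This is a minimal sketch under assumptions: each screening record is taken to be a (result ID, click count, viewing time) tuple, the viewing time is assumed to be in seconds, and the function name `rank_by_attention` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_attention(
    events: Iterable[Tuple[str, int, float]]
) -> List[Tuple[str, float]]:
    """Sum clicks x viewing time per result and sort from high to low.

    Each event is (result ID, number of clicks, viewing time in seconds);
    this record layout is an assumption made for the sketch.
    """
    attention: Dict[str, float] = defaultdict(float)
    for result_id, clicks, view_seconds in events:
        attention[result_id] += clicks * view_seconds
    # Highest degree of attention first; summaries would be extracted in this order.
    return sorted(attention.items(), key=lambda kv: kv[1], reverse=True)
```

The ordering returned here is the order in which the summaries would be written into the result_abstract set before word segmentation.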
CN201210433731.6A 2012-10-31 2012-10-31 User oriented information search engine system and method Active CN102930022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210433731.6A CN102930022B (en) 2012-10-31 2012-10-31 User oriented information search engine system and method

Publications (2)

Publication Number Publication Date
CN102930022A true CN102930022A (en) 2013-02-13
CN102930022B CN102930022B (en) 2015-11-25

Family

ID=47644820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210433731.6A Active CN102930022B (en) 2012-10-31 2012-10-31 User oriented information search engine system and method

Country Status (1)

Country Link
CN (1) CN102930022B (en)

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268312A (en) * 2013-05-03 2013-08-28 同济大学 Training corpus collection system and method based on user feedback
CN103294814A (en) * 2013-06-07 2013-09-11 百度在线网络技术(北京)有限公司 Search result recommendation method, system and search engine
CN103391320A (en) * 2013-07-18 2013-11-13 百度在线网络技术(北京)有限公司 Content recommending method and device based on interest point change
CN103593195A (en) * 2013-11-22 2014-02-19 安一恒通(北京)科技有限公司 Method and device for customizing personalized software
CN103617266A (en) * 2013-12-03 2014-03-05 北京奇虎科技有限公司 Personalized extension search method, device and system
CN103631929A (en) * 2013-12-09 2014-03-12 江苏金智教育信息技术有限公司 Intelligent prompt method, module and system for search
CN104009970A (en) * 2013-09-17 2014-08-27 宁波公众信息产业有限公司 Network information acquisition method
CN104102847A (en) * 2014-07-25 2014-10-15 中国科学技术信息研究所 Chinese descriptor list building system
CN104166700A (en) * 2014-08-01 2014-11-26 百度在线网络技术(北京)有限公司 Search term recommendation method and device
CN104346160A (en) * 2013-08-09 2015-02-11 联想(北京)有限公司 Method for processing information and electronic equipment
CN104933092A (en) * 2015-05-19 2015-09-23 苏州工讯科技有限公司 Screening type searching method aiming at industrial product search
CN105009115A (en) * 2013-11-29 2015-10-28 华为终端有限公司 Method and apparatus for obtaining network resources
CN105069032A (en) * 2015-07-20 2015-11-18 东南大学 Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage
CN105117479A (en) * 2015-09-11 2015-12-02 北京金山安全软件有限公司 Acquisition method and processing method of user search behavior information and electronic equipment
CN105302897A (en) * 2015-10-21 2016-02-03 无锡天脉聚源传媒科技有限公司 Search result acquisition method and apparatus
CN105447192A (en) * 2015-12-21 2016-03-30 北京奇虎科技有限公司 Method and device for recommending personalized search terms on navigation page
CN105574176A (en) * 2015-12-21 2016-05-11 北京奇虎科技有限公司 Hot word recommending method and device with combination of multiple data sources
CN105808737A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Information retrieval method and server
CN106156256A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 A kind of user profile classification transmitting method and system
CN106407337A (en) * 2016-09-05 2017-02-15 深圳震有科技股份有限公司 Quick search method and system
CN106776743A (en) * 2016-11-18 2017-05-31 广东小天才科技有限公司 A kind of reminding method and device for searching for content
CN106919693A (en) * 2017-03-07 2017-07-04 广州优视网络科技有限公司 It is a kind of to improve the method and apparatus that hot word exposes coverage rate
CN107330023A (en) * 2017-06-21 2017-11-07 北京百度网讯科技有限公司 Content of text based on focus recommends method and apparatus
CN107341251A (en) * 2017-07-10 2017-11-10 江西博瑞彤芸科技有限公司 A kind of extraction and the processing method of medical folk prescription and keyword
CN107346336A (en) * 2017-06-29 2017-11-14 北京百度网讯科技有限公司 Information processing method and device based on artificial intelligence
CN107423355A (en) * 2017-05-26 2017-12-01 北京三快在线科技有限公司 Information recommendation method and device, electronic equipment
CN107562747A (en) * 2016-06-30 2018-01-09 上海博泰悦臻网络技术服务有限公司 Method for information display and system, electronic equipment and database
CN107679211A (en) * 2017-10-17 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN107748745A (en) * 2017-11-08 2018-03-02 厦门美亚商鼎信息科技有限公司 A kind of enterprise name keyword extraction method
CN107832332A (en) * 2017-09-29 2018-03-23 北京奇虎科技有限公司 The method, apparatus and electronic equipment for recommending word are generated in navigating search frame
CN108121731A (en) * 2016-11-29 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108270840A (en) * 2017-01-04 2018-07-10 阿里巴巴集团控股有限公司 A kind of business monitoring, the searching method of business datum, device and electronic equipment
CN109543113A (en) * 2018-12-21 2019-03-29 北京字节跳动网络技术有限公司 Determine method, apparatus, storage medium and the electronic equipment clicked and recommend word
WO2019072007A1 (en) * 2017-10-12 2019-04-18 阿里巴巴集团控股有限公司 Data processing method and device
CN110019646A (en) * 2017-10-12 2019-07-16 北京京东尚科信息技术有限公司 A kind of method and apparatus for establishing index
CN110222265A (en) * 2019-05-28 2019-09-10 深圳市轱辘汽车维修技术有限公司 A kind of method, system, user terminal and the server of information push
CN111046221A (en) * 2019-12-17 2020-04-21 腾讯科技(深圳)有限公司 Song recommendation method and device, terminal equipment and storage medium
CN111190947A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Ordered hierarchical sorting method based on feedback
CN111190993A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Hierarchical sorting method based on ordered set of keywords
CN111190948A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Retrieval coding method based on keyword sorting
CN111209378A (en) * 2019-12-26 2020-05-29 航天信息股份有限公司企业服务分公司 Ordered hierarchical ordering method based on business dictionary weight
CN111222030A (en) * 2018-11-27 2020-06-02 阿里巴巴集团控股有限公司 Information recommendation method and device and electronic equipment
CN111831884A (en) * 2020-07-14 2020-10-27 深圳市众创达企业咨询策划有限公司 Matching system and method based on information search
CN112765494A (en) * 2017-06-20 2021-05-07 创新先进技术有限公司 Search method and search device
CN112905927A (en) * 2021-03-19 2021-06-04 北京字节跳动网络技术有限公司 Searching method, device, equipment and medium
CN112966177A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Method, device, equipment and storage medium for identifying consultation intention
CN111046221B (en) * 2019-12-17 2024-06-07 腾讯科技(深圳)有限公司 Song recommendation method, device, terminal equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002048921A1 (en) * 2000-12-13 2002-06-20 Znow, Inc Method and apparatus for searching a database and providing relevance feedback
WO2006017364A1 (en) * 2004-07-13 2006-02-16 Google, Inc. Personalization of placed content ordering in search results
CN101201838A (en) * 2007-08-21 2008-06-18 新百丽鞋业(深圳)有限公司 Method for improving searching engine based on keyword index using phrase index technique

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002048921A1 (en) * 2000-12-13 2002-06-20 Znow, Inc Method and apparatus for searching a database and providing relevance feedback
WO2006017364A1 (en) * 2004-07-13 2006-02-16 Google, Inc. Personalization of placed content ordering in search results
CN101019118A (en) * 2004-07-13 2007-08-15 谷歌股份有限公司 Personalization of placed content ordering in search results
CN101201838A (en) * 2007-08-21 2008-06-18 新百丽鞋业(深圳)有限公司 Method for improving searching engine based on keyword index using phrase index technique

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
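Xu Xiaole (徐小乐): "Design and Implementation of Personalized Retrieval and User Recommendation Functions for Search Engines" (搜索引擎个性化检索及用户推荐功能的设计与实现), China Master's Theses Full-text Database, Information Science and Technology Series *
Huang Lei (黄磊): "Design and Implementation of a Search Engine Result Optimization System Based on Instance Learning" (基于实例学习的搜索引擎结果优化系统设计与实现), China Master's Theses Full-text Database, Information Science and Technology Series *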

Cited By (71)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268312B (en) * 2013-05-03 2016-04-06 同济大学 A kind of corpus collection system based on user feedback and method thereof
CN103268312A (en) * 2013-05-03 2013-08-28 同济大学 Training corpus collection system and method based on user feedback
CN103294814A (en) * 2013-06-07 2013-09-11 百度在线网络技术(北京)有限公司 Search result recommendation method, system and search engine
WO2014194844A1 (en) * 2013-06-07 2014-12-11 百度在线网络技术(北京)有限公司 Method and system for recommending search result and search engine
CN103391320A (en) * 2013-07-18 2013-11-13 百度在线网络技术(北京)有限公司 Content recommending method and device based on interest point change
CN104346160A (en) * 2013-08-09 2015-02-11 联想(北京)有限公司 Method for processing information and electronic equipment
CN104346160B (en) * 2013-08-09 2018-02-27 联想(北京)有限公司 The method and electronic equipment of information processing
CN104009970A (en) * 2013-09-17 2014-08-27 宁波公众信息产业有限公司 Network information acquisition method
CN103593195A (en) * 2013-11-22 2014-02-19 安一恒通(北京)科技有限公司 Method and device for customizing personalized software
CN105009115A (en) * 2013-11-29 2015-10-28 华为终端有限公司 Method and apparatus for obtaining network resources
CN105009115B (en) * 2013-11-29 2019-06-11 华为终端有限公司 The method and apparatus for obtaining Internet resources
US9965468B2 (en) 2013-11-29 2018-05-08 Huawei Device Co., Ltd. Method and apparatus for acquiring network resource
CN103617266A (en) * 2013-12-03 2014-03-05 北京奇虎科技有限公司 Personalized extension search method, device and system
CN103631929A (en) * 2013-12-09 2014-03-12 江苏金智教育信息技术有限公司 Intelligent prompt method, module and system for search
CN103631929B (en) * 2013-12-09 2016-08-31 江苏金智教育信息股份有限公司 A kind of method of intelligent prompt, module and system for search
CN104102847B (en) * 2014-07-25 2017-11-10 中国科学技术信息研究所 Chinese thesaurus constructing system
CN104102847A (en) * 2014-07-25 2014-10-15 中国科学技术信息研究所 Chinese descriptor list building system
CN104166700A (en) * 2014-08-01 2014-11-26 百度在线网络技术(北京)有限公司 Search term recommendation method and device
CN106156256A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 A kind of user profile classification transmitting method and system
CN104933092B (en) * 2015-05-19 2018-09-21 苏州工讯科技有限公司 A kind of screening type searching method for industrial products search
CN104933092A (en) * 2015-05-19 2015-09-23 苏州工讯科技有限公司 Screening type searching method aiming at industrial product search
CN105069032A (en) * 2015-07-20 2015-11-18 东南大学 Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage
CN105117479A (en) * 2015-09-11 2015-12-02 北京金山安全软件有限公司 Acquisition method and processing method of user search behavior information and electronic equipment
CN105302897B (en) * 2015-10-21 2018-11-20 无锡天脉聚源传媒科技有限公司 A kind of acquisition methods and device of search result
CN105302897A (en) * 2015-10-21 2016-02-03 无锡天脉聚源传媒科技有限公司 Search result acquisition method and apparatus
CN105574176A (en) * 2015-12-21 2016-05-11 北京奇虎科技有限公司 Hot word recommending method and device with combination of multiple data sources
CN105447192A (en) * 2015-12-21 2016-03-30 北京奇虎科技有限公司 Method and device for recommending personalized search terms on navigation page
CN105808737A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Information retrieval method and server
CN107562747A (en) * 2016-06-30 2018-01-09 上海博泰悦臻网络技术服务有限公司 Method for information display and system, electronic equipment and database
CN106407337A (en) * 2016-09-05 2017-02-15 深圳震有科技股份有限公司 Quick search method and system
CN106407337B (en) * 2016-09-05 2019-08-20 深圳震有科技股份有限公司 A kind of method and system of fast search
CN106776743A (en) * 2016-11-18 2017-05-31 广东小天才科技有限公司 A kind of reminding method and device for searching for content
CN108121731A (en) * 2016-11-29 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108270840A (en) * 2017-01-04 2018-07-10 阿里巴巴集团控股有限公司 A kind of business monitoring, the searching method of business datum, device and electronic equipment
CN106919693A (en) * 2017-03-07 2017-07-04 广州优视网络科技有限公司 It is a kind of to improve the method and apparatus that hot word exposes coverage rate
CN106919693B (en) * 2017-03-07 2020-12-01 阿里巴巴(中国)有限公司 Method and device for improving hot word exposure coverage rate
CN107423355A (en) * 2017-05-26 2017-12-01 北京三快在线科技有限公司 Information recommendation method and device, electronic equipment
CN112765494A (en) * 2017-06-20 2021-05-07 创新先进技术有限公司 Search method and search device
CN107330023A (en) * 2017-06-21 2017-11-07 北京百度网讯科技有限公司 Content of text based on focus recommends method and apparatus
CN107330023B (en) * 2017-06-21 2021-02-12 北京百度网讯科技有限公司 Text content recommendation method and device based on attention points
US10671656B2 (en) 2017-06-21 2020-06-02 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for recommending text content based on concern, and computer device
CN107346336A (en) * 2017-06-29 2017-11-14 北京百度网讯科技有限公司 Information processing method and device based on artificial intelligence
US11620321B2 (en) 2017-06-29 2023-04-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial intelligence based method and apparatus for processing information
CN107341251A (en) * 2017-07-10 2017-11-10 江西博瑞彤芸科技有限公司 A kind of extraction and the processing method of medical folk prescription and keyword
CN107832332A (en) * 2017-09-29 2018-03-23 北京奇虎科技有限公司 The method, apparatus and electronic equipment for recommending word are generated in navigating search frame
CN110019646A (en) * 2017-10-12 2019-07-16 北京京东尚科信息技术有限公司 A kind of method and apparatus for establishing index
WO2019072007A1 (en) * 2017-10-12 2019-04-18 阿里巴巴集团控股有限公司 Data processing method and device
TWI710917B (en) * 2017-10-12 2020-11-21 開曼群島商創新先進技術有限公司 Data processing method and device
CN107679211A (en) * 2017-10-17 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN107679211B (en) * 2017-10-17 2021-12-28 百度在线网络技术(北京)有限公司 Method and device for pushing information
US11151206B2 (en) 2017-10-17 2021-10-19 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for pushing information
CN107748745A (en) * 2017-11-08 2018-03-02 厦门美亚商鼎信息科技有限公司 A kind of enterprise name keyword extraction method
CN107748745B (en) * 2017-11-08 2021-08-03 厦门美亚商鼎信息科技有限公司 Enterprise name keyword extraction method
CN111222030A (en) * 2018-11-27 2020-06-02 阿里巴巴集团控股有限公司 Information recommendation method and device and electronic equipment
CN111222030B (en) * 2018-11-27 2023-10-20 阿里巴巴集团控股有限公司 Information recommendation method and device and electronic equipment
CN109543113A (en) * 2018-12-21 2019-03-29 北京字节跳动网络技术有限公司 Method, apparatus, storage medium and electronic device for determining click recommendation words
CN109543113B (en) * 2018-12-21 2022-02-01 北京字节跳动网络技术有限公司 Method and device for determining click recommendation words, storage medium and electronic equipment
CN110222265A (en) * 2019-05-28 2019-09-10 深圳市轱辘汽车维修技术有限公司 Information pushing method, system, user terminal and server
CN110222265B (en) * 2019-05-28 2022-02-08 深圳市轱辘车联数据技术有限公司 Information pushing method, system, user terminal and server
CN111046221B (en) * 2019-12-17 2024-06-07 腾讯科技(深圳)有限公司 Song recommendation method, device, terminal equipment and storage medium
CN111046221A (en) * 2019-12-17 2020-04-21 腾讯科技(深圳)有限公司 Song recommendation method and device, terminal equipment and storage medium
CN111190947B (en) * 2019-12-26 2024-02-23 航天信息股份有限公司企业服务分公司 Orderly hierarchical ordering method based on feedback
CN111190948A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Retrieval coding method based on keyword sorting
CN111209378A (en) * 2019-12-26 2020-05-29 航天信息股份有限公司企业服务分公司 Ordered hierarchical ordering method based on business dictionary weight
CN111190993A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Hierarchical sorting method based on ordered set of keywords
CN111190947A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Ordered hierarchical sorting method based on feedback
CN111209378B (en) * 2019-12-26 2024-03-12 航天信息股份有限公司企业服务分公司 Ordered hierarchical ordering method based on business dictionary weights
CN111831884A (en) * 2020-07-14 2020-10-27 深圳市众创达企业咨询策划有限公司 Matching system and method based on information search
CN112966177A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Method, device, equipment and storage medium for identifying consultation intention
CN112966177B (en) * 2021-03-05 2022-07-26 北京百度网讯科技有限公司 Method, device, equipment and storage medium for identifying consultation intention
CN112905927A (en) * 2021-03-19 2021-06-04 北京字节跳动网络技术有限公司 Searching method, device, equipment and medium

Also Published As

Publication number Publication date
CN102930022B (en) 2015-11-25

Similar Documents

Publication Publication Date Title
CN102930022B (en) User oriented information search engine system and method
US11126647B2 (en) System and method for hierarchically organizing documents based on document portions
Deshpande et al. Building, maintaining, and using knowledge bases: a report from the trenches
Gao et al. Navigating the data lake with datamaran: Automatically extracting structure from log datasets
Nagwani Summarizing large text collection using topic modeling and clustering based on MapReduce framework
US20080077570A1 (en) Full Text Query and Search Systems and Method of Use
CN103092943B (en) Advertisement scheduling method and advertisement scheduling server
KR101285721B1 (en) System and method for generating content tag with web mining
Efstathiou et al. Semantic source code models using identifier embeddings
EP2013788A2 (en) Full text query and search systems and method of use
CN104699841A (en) Method and device for providing list summary information of search results
CN104408115A (en) Semantic link based recommendation method and device for heterogeneous resource of TV platform
Garrido et al. Temporally anchored relation extraction
Benitez et al. Semantic knowledge construction from annotated image collections
Chen et al. WTR: A test collection for web table retrieval
Jones Improving collection understanding for web archives with storytelling: shining light into dark and stormy archives
CN116304347A (en) Git command recommendation method based on crowd-sourced knowledge
Csurka et al. Unsupervised visual and textual information fusion in multimedia retrieval-a graph-based point of view
Gazendam et al. Automatic annotation suggestions for audiovisual archives: Evaluation aspects
Boguraev et al. A natural language front end to databases with evaluative feedback
CN113377896A (en) Full-text quick retrieval method and device, electronic equipment and storage medium
Ding et al. Hierarchical clustering for micro-learning units based on discovering cluster center by LDA
Rakib et al. Fast clustering of short text streams using efficient cluster indexing and dynamic similarity thresholds
Van Britsom et al. Automatically generating multi-document summarizations
Nadim et al. A Comparative Assessment of Unsupervised Keyword Extraction Tools

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant