CN107122447A - Preference-based network search system with multi-data-source fusion and control method thereof - Google Patents

Info

Publication number
CN107122447A
Authority
CN
China
Prior art keywords
user
data source
unit
information
preference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710274669.3A
Other languages
Chinese (zh)
Inventor
徐名海
王笃会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201710274669.3A
Publication of CN107122447A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Abstract

The invention discloses a preference-based network search system with multi-data-source fusion and a control method for it. The system comprises a keyword input unit, a first data source unit, a second data source unit, a third data source unit, a first pre-processing unit, a second pre-processing unit, a third pre-processing unit, a pre-integration unit, an integration unit, a user preference model unit and a terminal adaptation unit. With the invention, a user obtains search results that match his or her own preferences and a better user experience than other search platforms provide. First, the results are more accurate: the user does not have to check many entries to find a satisfactory result, and the accuracy of the results is assured to a certain degree. Second, results are presented in line with the user's usual habits, so frequent manual adjustment is unnecessary. Finally, social networks are incorporated, which promotes exchange between users and enriches the ways in which users can search.

Description

Preference-based network search system with multi-data-source fusion and control method thereof
Technical field
The invention belongs to the field of data processing, and in particular relates to a preference-based network search system with multi-data-source fusion and a control method for it.
Background
With the rapid development of computer and Internet technology, obtaining information has become far more convenient. In particular, with the spread of the Internet and mobile terminals, searching the massive data on the network has become an important research topic. To help users find the information they want, a variety of network search systems have emerged. By entering keywords into such a system, a user can find web pages that contain those keywords.
However, because the volume of Internet data keeps growing, a search through a network search system returns a very large number of matching results, and these results are often not ranked in a reasonable order. The user usually has to click through and inspect many of them before finding one that meets expectations.
Moreover, existing search results do not take other users' evaluations of the information into account, so they cannot offer the user more accurate results. Existing search also ignores the integration of social networks. As social networking technology develops, communication between people becomes more frequent, yet after a user finds a desired result with an existing search system, users cannot exchange experience with one another immediately. If users could discuss search results on a platform, they could share their understanding of and insights into those results. For example, xxx arrives in an unfamiliar city for a visit and wants to find a sight that suits his or her requirements. xxx opens the browser on a mobile phone, but the information returned is so varied that searching is time-consuming. xxx then opens a review website and checks the user reviews of relevant sights one by one, but information obtained from a single review website is not accurate enough, and xxx still has to screen and filter a large amount of it. Getting a result this way is troublesome, and the reliability of the information remains doubtful.
It then occurs to xxx to browse Weibo, where some users have posted recommendations and reviews of local sights that can also serve as a reference. xxx considers asking friends, but few friends know the area and the information they can provide is limited; xxx could consult people nearby online, but cannot fully trust their answers. This search system improves the situation. It first builds a preference model from the user's preferences. When the user searches for a keyword, the system calls the interface of each search engine and screens each set of results, filtering out information that does not match the user's preferences, so that the information obtained best fits the user's preferences and habits. The user can also draw on user review data from sight-review websites, which makes the data more reliable. In addition, the user can obtain data from social networks to correct the results accordingly, and can consult online, either with online friends or with people nearby, to obtain recommendations. The system then computes a preference relevance score for the data from these sources according to the user preference model and ranks the data by score. Finally, the system presents the results according to the user's selection, preference habits and terminal adaptation. The user's final choice of result and choice of presentation mode both feed back into the user preference model as updates. As a result, the user obtains data closer to his or her expectations and enjoys a very comfortable user experience.
Prior art related to the present invention:
Prior art one: a network search method and system combining search with social networking.
Its technical scheme is as follows. The user obtains initial search results and related user information by entering keywords. The related user information is then sorted by similarity and by the usage state of the matched users to form a matched-user sequence, and the matched-user sequence and the search results are combined into one page shown to the user. Finally, based on the page shown, the user selects matched users to discuss with, further screens the search results, and obtains the final search results.
Shortcomings of prior art one:
1. The data source is single; compared with ordinary search it merely adds a matched-user sequence and does not improve the search results themselves.
2. The lexical similarity calculation is rather vague, which limits its use. In addition, the definition of address proximity is not entirely accurate: for example, it breaks down when the user searches from a different place.
3. The scheme considers only the final choice made with the matched users and does not consider the individual user's preference model, although a personal preference model is a more valuable reference for the user's selection of search results.
Prior art two related to the present invention
Technical scheme of prior art two:
An information search method and system based on multiple data sources.
The scheme divides the data sources into primary data sources and secondary data sources. Target data are searched for in each primary and secondary data source according to the query word, and the data entries of the secondary data sources are determined from the target data found in each secondary data source. The target data found in each primary data source and the data entries of each secondary data source are then ranked together, and the search results are determined from the combined ranking.
Shortcomings of prior art two:
1. Only the number of data sources is considered, while the data sources are all of the same kind; other kinds of data source are not used to verify the search results.
2. The search process does not take user preference into account, so the results do not necessarily match the user's preferences.
3. The division into primary and secondary data sources is not distinct and lacks a clear definition.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the deficiencies described in the background, a preference-based network search system with multi-data-source fusion and a control method for it.
The invention adopts the following technical scheme to solve the above technical problem:
A preference-based network search system with multi-data-source fusion comprises a keyword input unit, a first data source unit, a second data source unit, a third data source unit, a first pre-processing unit, a second pre-processing unit, a third pre-processing unit, a pre-integration unit, an integration unit, a user preference model unit and a terminal adaptation unit;
wherein the keyword input unit is used to obtain the data entered by the user;
the first data source unit is used to search by keyword;
the second data source unit is used to search review (word-of-mouth) websites for evaluations related to the user's keyword;
the third data source unit is used to search social networks for information on the keyword;
the first pre-processing unit is used to screen all search results from the multiple data sources belonging to the first data source unit according to the user preference model, filtering out information that does not match the user's preferences;
the second pre-processing unit is used, under the control of the user preference model, to rank the search results from the multiple data sources belonging to the second data source unit, either by the score of each result from high to low or by the quality of each result from best to worst;
the third pre-processing unit is used to obtain information for the user through social networks and to rank that information by completeness;
the pre-integration unit is used to integrate the pre-processed information of the first data source unit and the second data source unit;
the integration unit is used to integrate the output of the pre-integration unit with the output of the third pre-processing unit;
the user preference model unit is used to build the user preference model and to correct the ranking of the information in the pre-integration unit and the integration unit;
the terminal adaptation unit is used to display the screened information according to the user's preference settings and the terminal device.
A control method of the network search system with multi-data-source fusion specifically comprises the following steps:
Step 1: obtain the user's preferences and build the user preference model;
Step 2: obtain the keyword entered by the user and, according to the keyword, obtain the search results of the first data source unit, the second data source unit and the third data source unit together with related user information, thereby obtaining the search results or consultation results of the different data sources;
Step 3: pre-process the output of step 2 according to the user's preference model;
Step 4: pre-integrate the outputs of the first pre-processing unit and the second pre-processing unit;
Step 5: integrate the output of the third pre-processing unit with the result of step 4;
Step 6: display the information integrated in step 5 according to the performance of the terminal adaptation unit and the user's preference habits.
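As a minimal sketch of how steps 2 to 6 chain together, the following Python function wires the stages in order. Every callable, the source names "engine", "review" and "social", and the result fields are hypothetical placeholders introduced for illustration; the user preference model of step 1 is assumed to be captured inside the pre-processing and integration callables.

```python
from typing import Callable, Dict, List

Result = Dict[str, object]  # one search hit; the field names are illustrative
Stage = Callable[[List[Result]], List[Result]]


def run_search(
    keyword: str,
    sources: Dict[str, Callable[[str], List[Result]]],
    preprocessors: Dict[str, Stage],
    pre_integrate: Callable[[List[Result], List[Result]], List[Result]],
    integrate: Callable[[List[Result], List[Result]], List[Result]],
    adapt: Stage,
) -> List[Result]:
    """Wire steps 2-6 together; every callable is a hypothetical placeholder."""
    # Step 2: query the three kinds of data source with the same keyword.
    raw = {name: fetch(keyword) for name, fetch in sources.items()}
    # Step 3: pre-process each source's results with its own unit.
    cleaned = {name: preprocessors[name](hits) for name, hits in raw.items()}
    # Step 4: pre-integrate search-engine and review-site results.
    merged = pre_integrate(cleaned["engine"], cleaned["review"])
    # Step 5: integrate again with the social-network results.
    final = integrate(merged, cleaned["social"])
    # Step 6: adapt the presentation to the terminal and the user's habits.
    return adapt(final)
```

Passing the stages in as callables keeps the sketch independent of any particular search engine, review site or social network API.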
As a further preferred scheme of the control method of the invention, in step 1 user interest is extracted by combining explicit and implicit methods.
As a further preferred scheme of the control method of the invention, in step 2 the user information comes from the following sources:
(1) the user interest preference model that the user defines, sets and modifies;
(2) the search keywords the user enters in the search interface;
(3) the user's favorites;
(4) the user's browsing behavior.
As a further preferred scheme of the control method of the invention, step 3 specifically comprises the following steps:
Step 3.1: the first pre-processing unit screens the search results from the multiple data sources of the first data source unit according to the user preference model;
Step 3.2: the second pre-processing unit ranks the search results from the multiple data sources of the second data source unit by the score of each result, from high to low;
Step 3.3: information is obtained for the user through social networks and ranked by completeness.
As a further preferred scheme of the control method of the invention, step 4 specifically comprises the following steps:
Step 4.1: integrate the information pre-processed by the first pre-processing unit and the second pre-processing unit;
Step 4.2: determine how acceptable the user finds the pre-processed information of the second data source unit;
Step 4.3: multiply the relevance score of each item of information from step 4.1 by the user-evaluation score obtained for it in step 4.2 to give a new score for each item, and re-rank the items by this new score to obtain the pre-integrated result;
Step 4.4: compute relevance scores for the information set obtained in step 3.3 using the above information and the preference relevance algorithm, combine it with the data set obtained in step 4.3, and rank again by score.
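The multiplication and re-ranking in step 4.3 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: items are identified by string ids, and an item without any review is given a neutral rating of 1.0 (the `default_rating` parameter is hypothetical).

```python
from typing import Dict, List, Tuple


def pre_integrate(
    relevance: Dict[str, float],
    rating: Dict[str, float],
    default_rating: float = 1.0,
) -> List[Tuple[str, float]]:
    """Multiply each item's relevance score by its review rating and re-rank.

    `relevance` maps an item id to its preference-relevance score (step 4.1);
    `rating` maps an item id to the user-evaluation score (step 4.2).
    `default_rating` for items with no review is an assumption of this sketch.
    """
    scored = {
        item: rel * rating.get(item, default_rating)
        for item, rel in relevance.items()
    }
    # Highest combined score first; ties are broken by item id for determinism.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
```

With relevance scores `{"a": 0.5, "b": 0.8}` and ratings `{"a": 4.0, "b": 2.0}`, item `a` overtakes `b` because its combined score is 2.0 against 1.6.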
As a further preferred scheme of the control method of the invention, in step 2 the user's preference habits include the display font size, the brightness and the number of search results displayed per page; the performance of the terminal adaptation unit includes the processor performance and the resolution.
Compared with the prior art, the invention, by adopting the above technical scheme, has the following technical effects:
1. Building and updating the user preference model: the model is first built from the preferences the user sets at registration, and is then updated according to the user's browsing behavior whenever the user makes a query;
2. Fusion of multiple data sources: the invention mainly distinguishes three kinds of data source. The first is search engines, such as Baidu and Google. The second is review and group-buying websites, or data sensors in a smart city; the review and group-buying websites include Meituan-Dianping, Ctrip and the like, from which the invention mainly obtains evaluations of the searched results, while the smart-city sensors include air quality and traffic sensors, whose collected data the invention mainly uses to correct the relevant search. The third is social networks, including Weibo, WeChat and QQ, where the user can consult online and seek recommendations, and can also obtain statuses and articles that others have posted. Fusing the data according to the characteristics of the three kinds of source addresses both the inaccuracy of single-source search results and the failure to consider individual user preference, giving the user better search results and a better user experience;
3. With this search system, a user obtains search results that match his or her own preferences and a better user experience than other search platforms provide. First, the results are more accurate: the user does not have to check many entries to find a satisfactory result, and the accuracy of the results is assured to a certain degree. Second, results are presented in line with the user's usual habits, so frequent manual adjustment is unnecessary. Finally, social networks are incorporated, which promotes exchange between users and enriches the ways in which users can search.
Brief description of the drawings
Fig. 1 is an overall structural diagram of the multi-data-source fusion system provided by the invention;
Fig. 2 is a processing structure diagram of the multi-data-source fusion system provided by the invention;
Fig. 3 is a terminal adaptation flow chart of the multi-data-source fusion system provided by the invention;
Fig. 4 is a structural diagram of the user preference modeling provided by the invention;
Fig. 5 is a flow chart of the user preference modeling provided by the invention;
Fig. 6 is a flow chart of the user preference model update provided by the invention.
Detailed description
The technical scheme of the invention is described in further detail below with reference to the accompanying drawings:
The invention first builds a preference model from the user's preferences. When the user performs a keyword search, the invention calls the interface of each search engine and screens each set of search results, filtering out information that does not match the user's preferences, so that the information obtained best fits the user's preferences and habits. The user can also draw on user review data from review websites for the keyword, which makes the data more reliable. In addition, the user can obtain data from social networks to correct the results accordingly, and can consult online, either with online friends or with people nearby, to obtain recommendations. Alternatively, the keyword can be searched on Weibo, which returns a large number of related results; these are screened according to the user preference model to select the information matching the user's preferences. The selected information is then ranked by its propagation effect, measured mainly by the sum of likes and forwards, so that the information ranked first is reliable. The invention then pre-processes the data provided by the first data source unit and the second data source unit according to the user preference model and ranks them by score, after which the social network data are used to verify and re-rank the pre-processed data. Finally, the system presents the results according to the user's selection, preference habits and terminal adaptation. The user's final choice of result and choice of presentation mode both update the user preference model. As a result, the user obtains data closer to his or her expectations and enjoys a very comfortable user experience.
First, a preference-based network search system with multi-data-source fusion is provided. Its architecture is shown in Fig. 1 and comprises a keyword input unit, data source units, pre-processing units, a pre-integration unit, an integration unit, a user preference model unit and a terminal adaptation unit. The data source units search each data source using the keyword obtained by the keyword input unit. The pre-processing units pre-process the information obtained from the first and second data source units according to its characteristics and the user preference model. The integration unit combines the pre-processed data of the third data source unit with the pre-integrated data and re-ranks them according to the user preference model. The terminal adaptation unit adapts to the processing capability of the user's terminal and presents a friendly interface according to the user preference model. The user preference model unit builds the model from the user's manual settings and behavior; the model is involved in every data-source processing stage, where it guides the data processing, and it is updated in real time by learning from user behavior.
The first pre-processing unit screens all search results from the multiple data sources belonging to the first data source unit according to the user preference model, filtering out information that does not match the user's preferences. Data source 1 can be search engines such as Baidu and Google. For example, searching the same keyword on several different search engines yields different result sets.
The second pre-processing unit, under the control of the user preference model, ranks the search results from the multiple data sources belonging to the second data source unit, either by the score of each result from high to low or by the quality of each result from best to worst. Data source 2 can be review websites such as Meituan, Dianping or Ctrip, or sensor data in a smart city, such as traffic or air quality conditions. For example, the same keyword receives different user ratings on different review websites. The rating scales are first unified to a 5-point scale: a rating of 8 on a website whose full mark is 10 is converted to 4. The unified ratings that the websites give the same keyword are then averaged, and the results are ranked by average rating. The results are then compared against the user preference model; information that does not match the user's preferences is filtered out, leaving the information that meets the user's requirements. As another example, a route search returns several candidate routes, which are ranked from best to worst according to the priority given by the user preference model; if the user wants the shortest travel time, the routes are ranked by time in ascending order.
The third pre-processing unit obtains information for the user through social networks and ranks it by completeness. Data source 3 can be users of this platform, people located near the terminal, and social networks such as WeChat, QQ and Weibo, as well as nearby people and online registered users provided by this platform. For example, posting a request for help may bring several replies, and asking different friends and nearby people brings different answers. Alternatively, the keyword can be searched on Weibo, which returns a large number of related results; these are screened according to the user preference model to select the information matching the user's preferences. The selected information is then ranked by its propagation effect, measured mainly by the sum of likes and forwards, so that the information ranked first is reliable.
The pre-integration unit integrates the pre-processed information of data source 1 and data source 2. A relevance score between the result document information of data source 1 and the user preference information is computed first, so that the results are genuinely personalized.
The integration unit computes relevance scores for the information set obtained in step S33 using the above information and the preference relevance algorithm, combines it with the data set obtained in step S43, and ranks again by score. During ranking it first checks whether items refer to the same subject, such as the same hotel or the same sight. If they do, the two scores are averaged; if not, the items are ranked directly by score. This gives the integrated result.
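The same-subject rule of the integration unit can be sketched as below. How two items are recognized as the same subject is not specified in the text, so this sketch simply assumes that both result sets are keyed by a shared subject name; that keying is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


def integrate(
    pre_integrated: Dict[str, float],
    social: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Merge two scored result sets keyed by subject (e.g. a hotel name).

    A subject present in both sets gets the mean of its two scores;
    a subject present in only one set keeps its score unchanged.
    """
    merged: Dict[str, float] = {}
    for subject in pre_integrated.keys() | social.keys():
        scores = [s[subject] for s in (pre_integrated, social) if subject in s]
        merged[subject] = sum(scores) / len(scores)
    # Highest score first; ties are broken by subject name for determinism.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, a hotel scored 4.0 in the pre-integrated set and 2.0 in the social set ends up with 3.0, while a sight that appears in only one set is ranked on its single score.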
Processing structure of the multi-data-source fusion system:
On the basis of the above structure, the invention further refines the processing of the pre-processing units, the pre-integration unit and the integration unit, as shown in Fig. 2. The specific steps are as follows:
S1: obtain the user's preferences and build the user preference model; obtain the keyword entered by the user, and obtain preliminary search results and related user information according to the keyword;
S2: obtain the keyword entered by the user, obtain preliminary search results according to the keyword, and obtain the search results or consultation results of the different data sources;
S3: pre-process the results of each data source according to the user's preference model;
S4: pre-integrate the pre-processed results of the first data source unit and the second data source unit;
S5: integrate the result of the third data source unit with the result of S4;
S6: display the integrated information according to the terminal's performance and the user's preference habits, so that the user obtains the desired information.
Preferably, step S1 further comprises the following steps:
S11: the system extracts user interest by combining explicit and implicit methods and mines the user's interest preferences. Mining and extracting user interest is the first step in building the user preference model; an efficient and effective user interest model helps improve the efficiency and accuracy of the whole system.
The user information mainly comes from the following sources:
(1) the user's own definition, setting and modification of the user interest preference model;
(2) the search keywords the user enters in the search interface;
(3) the user's favorites: users usually add documents they are interested in to their favorites so that they can visit them again later, so the invention assumes that if the user adds the current document to favorites, the document contains content of interest to the user;
(4) the user's browsing behavior: when the user has the current document active and the system is active, the longer the user stays on the document, the greater the user's interest in that page.
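To make the combination of explicit and implicit signals concrete, the sketch below folds the four sources into a single interest score per document. The text does not say how the signals are weighted, so the weights, the 300-second dwell cap and all field names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocSignals:
    """Observed behaviour for one document; field names are illustrative."""
    declared: bool        # topic explicitly set in the user's preference profile
    searched: bool        # document matched a keyword the user searched for
    favorited: bool       # user added the document to favorites
    dwell_seconds: float  # active reading time on the document


def interest_score(s: DocSignals, dwell_cap: float = 300.0) -> float:
    """Combine the four signals into a 0..1 score (weights are assumptions)."""
    # Longer dwell time means more interest, saturating at `dwell_cap` seconds.
    dwell = min(s.dwell_seconds, dwell_cap) / dwell_cap
    return (0.4 * s.declared + 0.2 * s.searched
            + 0.2 * s.favorited + 0.2 * dwell)
```

Capping the dwell time keeps a page left open in the background from dominating the score.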
S12: the invention expresses the user's interests with a set of domain theme words and a keyword set for each domain theme word. The domain theme nodes can be represented by the set {k_1, k_2, k_3, …, k_n}, and the keyword set of the i-th theme is {k_i1, k_i2, k_i3, …, k_in}. Finding the user's interest preferences is thus a search over the domain theme keyword sets. For the keyword weights, the invention uses the TF-IDF representation:
where W_ij is the weight of the j-th keyword k_ij in the i-th theme, TF(k_ij, d) is the number of times keyword k_ij occurs in document d, n is the total number of documents, and DF(k_ij) is the number of documents containing the keyword. Computing the weight of each keyword in turn gives the user's interest theme vector; the weight of a domain theme node is the sum of the weights of its child nodes;
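The weight expression itself is not reproduced above. Assuming the standard TF-IDF form that matches the defined symbols, W_ij = TF(k_ij, d) × log(n / DF(k_ij)), the keyword weights and the theme-node weight could be computed as in this sketch. The exact expression used by the invention may differ (for example in normalization), and the function names are hypothetical.

```python
import math
from typing import Dict, List


def keyword_weights(
    keywords: List[str], doc: List[str], corpus: List[List[str]]
) -> Dict[str, float]:
    """TF-IDF weight of each keyword for `doc`, given a tokenised `corpus`."""
    n = len(corpus)  # total number of documents
    weights = {}
    for k in keywords:
        tf = doc.count(k)                              # TF(k, d)
        df = sum(1 for other in corpus if k in other)  # DF(k)
        # A keyword absent from every document gets weight 0 (an assumption).
        weights[k] = tf * math.log(n / df) if df else 0.0
    return weights


def theme_weight(weights: Dict[str, float]) -> float:
    """Weight of a domain theme node: the sum of its child keyword weights."""
    return sum(weights.values())
```

A keyword that appears in every document receives weight 0 under this form, since log(n / n) = 0; such a keyword does not distinguish one theme from another.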
Wherein more preferably, step S3 further comprises following steps:
S31, pretreatment 1 be the problem of solve to belong to the search result of the multi-data source of data source 1 first according to Family preference pattern carries out the information sifting of all search results, will not meet the information filtering of user preference.Data source 1 can be Search engine:Such as Baidu, Google etc..For example:Same keyword scans for obtaining above in several different search engines Obtain different search result sets;
S32, the second pretreatment unit problem to be solved is to belonging to the second data under user preferences modeling control The search result of the multi-data source of source unit is arranged or according to the knot of every result from high to low according to the score of every result Sequence of the fruit from excellent to bad.Second data source unit can be comment website, such as:Group of U.S., masses comment on or take journey etc., also may be used To be the sensing data under smart city, such as traffic conditions, air quality situation etc..For example:Same keyword is different Comment on the score that user evaluates on website different.First by its evaluation criterion 5 points of systems of unified chemical conversion, such as commenting in 10 points of full marks Valency website obtains 8 points, then the evaluation score is converted into 4 points, then the averaging by unitized each website to same keyword Point, it is ranked up according to average mark height;User preferences modeling is contrasted again, will not met the information filtering of user preference, is left Meet the information of user's requirement.Again such as:Route is scanned for, the selection of a plurality of route is had, according to user preferences modeling Sequence of user's priority selection progress from optimal to worst is provided.As user wishes that the used time is minimum, according to the time from it is few to Many sequences.
S33, the 3rd pretreatment unit problem to be solved is that user can enter the acquisition of row information by social networks And information is ranked up according to integrity degree.Data source 3 can be this platform user or terminal positioning near people with And the people and online registration user of social networks, such as wechat, QQ, microblogging, or the vicinity of this platform offer.Such as:Issue One problem of seeking help, might have the return of multiple information;Or seek help different good friends and neighbouring people, have different answer Case.Or the search of associative key is carried out in microblogging, the search result of great deal of related information is had, this part is searched for and tied Fruit is screened according to user preferences modeling, selects the information for meeting user preference.Then to relevant information according to the information Propagation effect sorts, and mainly thumb up and forwarding quantity sum is used as measurement index, it can be ensured that come information above can By property.
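Two of the rules in S32 and S33 lend themselves to a short illustration: converting ratings from different sites to a common 5-point scale before averaging, and ranking social-network posts by the sum of likes and forwards. The function names and the `likes` and `forwards` field names below are hypothetical.

```python
from typing import Dict, List, Tuple


def to_five_point(score: float, full_marks: float) -> float:
    """Rescale a site's rating to a 5-point scale, e.g. 8 out of 10 becomes 4."""
    return score * 5.0 / full_marks


def average_rating(site_scores: List[Tuple[float, float]]) -> float:
    """Mean of (score, full_marks) pairs from several sites after rescaling."""
    unified = [to_five_point(s, full) for s, full in site_scores]
    return sum(unified) / len(unified)


def rank_posts(posts: List[Dict]) -> List[Dict]:
    """Order social-network posts by likes plus forwards, highest first."""
    return sorted(posts, key=lambda p: p["likes"] + p["forwards"], reverse=True)
```

A place rated 8 out of 10 on one site and 4.5 out of 5 on another averages (4 + 4.5) / 2 = 4.25 on the unified scale.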
S41: integrate the information pre-processed from the first data source unit and the second data source unit. First, a relevance score is computed between the result documents of data source 1 and the user preference information, so that the results are genuinely personalized.
Preferably, the document-preference relevance score algorithm is as follows:
(1) For the query keyword, take the pre-processed search result set S1;
(2) Set the iteration counter i = 0;
(3) For the i-th document in S1, reduce the document to a set of feature words, describe the keywords of the user interest preference model as a feature word sequence, and build the domain-theme feature space vectors from the feature word sets;
(4) Compute the similarity Sim(p_i, D_j) between the document feature space vector and the user interest feature space vector,
where D_j is the feature word set representing each document and p is the set of domain themes in the user's preferences;
p_i is a domain theme, k_ij is the j-th keyword of the i-th domain theme, d_j is document j, n is the total number of relevant search documents, and j runs over the keywords of domain theme i;
(5) Sort the themes in descending order of Sim(p_i, D_j) and assign the document to the domain theme p_i with the highest similarity;
(6) If document i is the last document in S1, go to (7); otherwise set i = i + 1 and return to (3);
(7) Sort by the user's preference degree H_i for each domain theme p_i, so that the documents ranked first belong to the themes the user is most interested in, and generate the new result list S2.
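Steps (1) to (7) can be sketched as below. This is a sketch under stated assumptions, not the patented formula: the similarity is taken to be the standard cosine between a term-count vector of the document and a keyword-weight vector of each theme, and the names `cosine`, `classify_and_rank`, `themes` and `preference` are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify_and_rank(docs, themes, preference):
    """docs: list of token lists (set S1).
    themes: {theme: {keyword: weight}}.
    preference: {theme: H_i}.

    Returns (document index, assigned theme) pairs ordered by H_i (list S2).
    """
    assigned = []
    for idx, tokens in enumerate(docs):
        vec = Counter(tokens)                                     # step (3)
        best = max(themes, key=lambda t: cosine(vec, themes[t]))  # steps (4)-(5)
        assigned.append((idx, best))
    # step (7): themes the user prefers most come first; sorted() is stable
    return sorted(assigned, key=lambda pair: preference[pair[1]], reverse=True)

# Hypothetical data: two themes and three tokenized result documents.
themes = {
    "hotel": {"hotel": 1.0, "room": 0.5},
    "sight": {"scenic": 1.0, "ticket": 0.8},
}
preference = {"hotel": 0.3, "sight": 0.9}
docs = [["hotel", "room", "price"], ["scenic", "ticket"], ["hotel", "breakfast"]]
s2 = classify_and_rank(docs, themes, preference)
```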
Preferably, documents are scored using the following factors:
(1) The cosine between the two normalized vectors, Sim(p_i, D_j);
(2) The entry weight w_ij in the user interest preference model, calculated with the TF-IDF algorithm mentioned above;
(3) The frequency with which keywords of the user interest preference model occur in the document, and the proportion of the document taken up by keyword entries. Scoring a result document therefore also requires: the user interest preference vector and the document vector; the frequency of the interest topic word p_i in the document; and the ratio of the length of all keyword entries in the document to the document length. In the user interest preference model, segmenting the topic word p_i yields the entry set p_i = (k1, k2, ..., kn). The occurrence frequency and entry coverage of topic word p_i in document D_j, Gra(p_i, D_j), is calculated as follows:
Gra(p_i, D_j) = Fre(p_i, D_j) * DL(p_i/D_j) * Σ tf(p_i/D_j)
where Fre(p_i, D_j) is the number of times the interest topic word p_i occurs in document D_j divided by the total number of entries n obtained by segmenting p_i. For example, if p_i occurs m times in result document D_j, then Fre(p_i, D_j) = m/n.
tf(p_i/D_j) is the frequency in document D_j of an entry parsed from p_i; the present invention defines this frequency as the square root of the number of times the entry occurs in the result document.
DL(p_i/D_j) is the document length factor, calculated as follows:
DL(p_i/D_j) = 1.0 / math.sqrt(num)
where num is the total number of entries in the result document.
(4) Combining the factors above, the score of a result document can be expressed as:
Score(U, D_j) = Gra(p_i, D_j) * w_ij * Sim(p_i, D_j)
The higher the score, the more relevant the result document is to the user. Results are then arranged by relevance score from high to low.
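The Gra and Score formulas above can be sketched as follows. One point is an assumption of this sketch: the occurrence count m of the topic word is taken to be the total number of occurrences of its segmented entries in the document, since the text does not say how m is counted. The function names `gra` and `score` are likewise illustrative.

```python
import math

def gra(topic_terms, doc_tokens):
    """Frequency and coverage of a topic word in a document.

    topic_terms: entries obtained by segmenting the topic word p_i.
    doc_tokens: entries of the result document D_j.
    """
    n = len(topic_terms)
    num = len(doc_tokens)
    if n == 0 or num == 0:
        return 0.0
    counts = [doc_tokens.count(t) for t in topic_terms]
    fre = sum(counts) / n                       # Fre = m / n (m assumed as above)
    dl = 1.0 / math.sqrt(num)                   # document length factor DL
    tf_sum = sum(math.sqrt(c) for c in counts)  # tf = sqrt(occurrences), summed
    return fre * dl * tf_sum

def score(topic_terms, doc_tokens, weight, sim):
    """Score(U, D_j) = Gra * w_ij * Sim."""
    return gra(topic_terms, doc_tokens) * weight * sim

# Hypothetical example: topic entries and a four-entry document.
example_gra = gra(["hotel", "price"], ["hotel", "price", "hotel", "near"])
```

The square-root term frequency and the 1/sqrt(num) length factor have the same shape as the classic TF-IDF scoring used in some full-text search libraries, which may explain the `math.sqrt` notation in the source.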
S42: determine the user's acceptance standard for the pre-processed information of the second data source unit. Taking hotel ratings as an example: if the full score is 5 and the lowest score the user will accept is 4, then a hotel rated 4 has weight 1, a hotel rated 4.5 has weight 4.5/4 = 1.125, and a hotel rated 3 has weight 3/4 = 0.75. The weight of each hotel is calculated in turn.
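The weighting rule of S42 amounts to dividing each rating by the user's minimum acceptable rating. A minimal sketch, with the hypothetical function name `acceptance_weight`:

```python
def acceptance_weight(rating, minimum_acceptable):
    """Weight of an item relative to the lowest rating the user accepts."""
    return rating / minimum_acceptable

# The three hotel ratings used in the text, with a minimum acceptable score of 4.
weights = [acceptance_weight(r, 4) for r in (4, 4.5, 3)]
```

A weight above 1 promotes an item in the later multiplication, and a weight below 1 demotes it.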
S43: multiply the S42 weight of each item by the user relevance score of that item obtained in S41 to get a new score for each item, then re-sort by this new score to obtain the pre-integrated result.
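A sketch of the S43 pre-integration, assuming the S41 relevance scores and S42 weights are held in dictionaries keyed by item. Giving an item without a rating the neutral weight 1.0 is an assumption of this sketch; the specification does not cover that case.

```python
def pre_integrate(relevance, weights):
    """relevance: {item: S41 relevance score}; weights: {item: S42 weight}.

    Returns (item, new score) pairs sorted from high to low.
    """
    combined = {
        item: rel * weights.get(item, 1.0)  # assumed default for unrated items
        for item, rel in relevance.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: B is more relevant, but A is rated above the user's minimum.
pre_integrated = pre_integrate(
    {"Hotel A": 0.8, "Hotel B": 0.9},
    {"Hotel A": 1.125, "Hotel B": 0.75},
)
```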
S5: compute relevance scores for the information set obtained in step S33 using the information-preference relevance algorithm above, then sort it together with the data set obtained in step S43 by score. During sorting, first check whether items correspond to the same subject, such as the same hotel or the same scenic spot. If they do, take the average of the two scores; if not, sort directly by score. This yields the integrated result.
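The merge in S5 can be sketched as follows, assuming both inputs are dictionaries mapping a subject (for example a hotel name) to its score. How two records are recognized as the same subject is not specified; exact key equality is used here purely for illustration.

```python
def integrate(pre_integrated, social):
    """Merge two {subject: score} mappings.

    Subjects present in both are averaged; the rest keep their own score.
    Returns (subject, score) pairs sorted from high to low.
    """
    merged = dict(pre_integrated)
    for subject, s in social.items():
        if subject in merged:
            merged[subject] = (merged[subject] + s) / 2
        else:
            merged[subject] = s
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores from step S43 and from the social data of step S33.
final = integrate(
    {"Hotel A": 1.0, "Hotel B": 0.625},
    {"Hotel A": 0.5, "Hotel C": 0.875},
)
```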
2.2.3 Terminal adaptation flow chart of the multi-data-source fusion system
Building on the structure diagram above, the terminal adaptation flow of the multi-data-source fusion system provided by the present invention is shown in Fig. 3. For the integrated information set, the system first determines whether the user's default selection applies. For example, when the user searches for nearby hotels and hotel brand is usually the primary consideration, the established user preference model gives priority to brand-based recommendations. If the user is tired at that moment and would rather choose a hotel that is close by, the user can select distance as the priority, and the preference model is updated immediately according to that selection. The information scores are then recalculated from the preference and the existing scores, and the results are re-sorted. Finally, the interface is rendered according to the user's preferences, such as the font size the user normally uses and the number of results displayed.
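The default-versus-override behaviour can be illustrated with the sketch below. The model layout (a single `priority` attribute), the per-attribute scores `brand` and `proximity`, and the function `rerank` are all hypothetical; the specification only says that the user's immediate choice updates the preference model and triggers a re-sort.

```python
def rerank(items, model, override=None):
    """Sort items by the attribute the user currently prioritizes.

    model: {"priority": attribute name}; override: attribute chosen right now.
    """
    if override is not None:
        model["priority"] = override  # preference model is updated immediately
    key = model["priority"]
    return sorted(items, key=lambda item: item[key], reverse=True)

# Hypothetical hotels with a brand score and a proximity score (higher is better).
hotels = [
    {"name": "Hotel A", "brand": 0.9, "proximity": 0.2},
    {"name": "Hotel B", "brand": 0.4, "proximity": 0.8},
]
model = {"priority": "brand"}
by_default = [h["name"] for h in rerank(hotels, model)]
by_distance = [h["name"] for h in rerank(hotels, model, override="proximity")]
```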
2.2.4 User preference modeling structure diagram
The preference modeling structure of the multi-data-source fusion system provided by the present invention is shown in Fig. 4. User preference interests are first identified from the preferences selected at registration, and the preferences are classified; the set of domain theme words and the keyword set of each domain theme word together express the user's interests. The set of domain theme nodes can be written {k1, k2, k3, ..., kn}, and the keyword set of the i-th theme {ki1, ki2, ki3, ..., kin}. Finding the user's interest preferences is thus a search over the domain theme keyword sets. Taking scenic spots as a domain, its keywords include all attributes related to scenic spots, such as price, location and features. All collected user preference information is classified, and keyword weights are calculated with the TF-IDF algorithm. When the user enters a search term, the system first determines which domain it belongs to and then calculates relevant document scores from the weighted attributes; the modeling process is detailed in 2.2.5. The user preference database can also be updated from user habits such as query and browsing behavior; see 2.2.6 for details.
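For the keyword weighting step, a common textbook form of TF-IDF is sketched below. The text names the TF-IDF algorithm but does not reproduce its formula here, so the variant shown (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency) is an assumption rather than the formula of the invention.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF weight of a term in one tokenized document.

    doc: list of tokens; corpus: list of tokenized documents containing doc.
    """
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # number of documents with the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Hypothetical collected preference documents for one domain.
corpus = [["hotel", "price"], ["hotel", "view"], ["ticket", "price", "price"]]
w_view = tf_idf("view", corpus[1], corpus)
w_hotel = tf_idf("hotel", corpus[0], corpus)
```

A keyword that appears in few documents, such as "view" here, receives a higher weight than one that appears in most of them.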
2.2.5 User preference modeling flow chart:
The user preference modeling method is shown in Fig. 5. At login the system determines whether the user is new. A new user must register and sets interest preferences according to the system's default rules; the back end then automatically updates the user's interest preference database according to the user's behavior. A registered user enters query keywords, and the system tracks the user's browsing behavior, such as how long information is viewed and which attribute fields are viewed. The attribute keywords of the domain are extracted from this behavior analysis, and the weight of each attribute is compared with a preset threshold. If the weight exceeds the threshold, the database is updated; otherwise the keyword is discarded. The update method is detailed in 2.2.6.
2.2.6 User preference model update flow chart, as shown in Fig. 6:
When the user reads document D_j, the system counts the feature keywords k_j that occur in the document and computes the weight coefficient w_j of each keyword k_j with the TF-IDF algorithm. If k_j already exists in the model database, its previous weight and the newly computed weight w_j are added to give the new weight coefficient of k_j. If k_j does not exist in the model database, k_j and its weight w_j are added to the database. The user's preference model is thereby updated.
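The update rule of 2.2.6, combined with the threshold check of 2.2.5, can be sketched as follows. Storing the model as a plain `{keyword: weight}` dictionary and the name `update_preference_model` are assumptions made for illustration.

```python
def update_preference_model(model, observed, threshold=0.0):
    """Fold newly observed keyword weights into the preference model.

    model: {keyword: weight}, modified in place and returned.
    observed: {keyword: newly computed TF-IDF weight w_j}.
    Keywords whose weight does not exceed the threshold are discarded.
    """
    for keyword, weight in observed.items():
        if weight <= threshold:
            continue
        # existing keyword: add the weights; new keyword: insert it
        model[keyword] = model.get(keyword, 0.0) + weight
    return model

# Hypothetical model and weights observed while the user reads a document.
model_db = {"price": 0.5}
update_preference_model(
    model_db,
    {"price": 0.25, "view": 0.75, "noise": 0.0625},
    threshold=0.125,
)
```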

Claims (7)

1. A network search system with preference-based multi-data-source fusion, characterized in that it comprises a keyword input unit, a first data source unit, a second data source unit, a third data source unit, a first pre-processing unit, a second pre-processing unit, a third pre-processing unit, a pre-integration unit, an integration unit, a user preference model unit and a terminal adaptation unit;
wherein the keyword input unit is for obtaining the data entered by the user;
the first data source unit is for searching on the keyword;
the second data source unit is for searching review-website evaluation information for the user's keyword;
the third data source unit is for searching for keyword information in social networks;
the first pre-processing unit is for screening all multi-source search results belonging to the first data source unit according to the user preference model and filtering out information that does not match the user's preferences;
the second pre-processing unit is for sorting, under the control of the user preference model, the multi-source search results belonging to the second data source unit either by the score of each result from high to low or by the quality of each result from best to worst;
the third pre-processing unit is for obtaining information through the user's social networks and sorting the information by completeness;
the pre-integration unit is for integrating the information pre-processed from the first data source unit and the second data source unit;
the integration unit is for integrating the output of the pre-integration unit with the output of the third pre-processing unit;
the user preference model unit is for establishing the user preference model and correcting the ranking of information in the pre-integration unit and the integration unit;
the terminal adaptation unit is for displaying the screened information according to the user's preference settings and the terminal device.
2. A control method for the network search system with multi-data-source fusion according to claim 1, characterized in that it comprises the following steps:
Step 1: obtain the user's preferences and establish the user preference model;
Step 2: obtain the keyword entered by the user and, according to the keyword, obtain the search results of the first data source unit, the second data source unit and the third data source unit together with the relevant user information, thereby obtaining the search results or consultation results of the different data sources;
Step 3: pre-process the output of step 2 according to the user's preference model;
Step 4: pre-integrate the output of the first pre-processing unit and the second pre-processing unit;
Step 5: integrate the output of the third pre-processing unit with the result of step 4;
Step 6: display the information integrated in step 5 according to the performance of the terminal adaptation unit and the user's preference habits.
3. The control method for the network search system with multi-data-source fusion according to claim 2, characterized in that in step 1, user interests are extracted by a combination of explicit and implicit methods.
4. The control method for the network search system with multi-data-source fusion according to claim 2, characterized in that in step 2, the sources of user information specifically include:
(1) the user interest preference model defined, set and modified by the user;
(2) the search keywords the user enters in the search interface;
(3) the user's saved favorites;
(4) the user's browsing behavior.
5. The control method for the network search system with multi-data-source fusion according to claim 2, characterized in that step 3 specifically comprises the following steps:
Step 3.1: the first pre-processing unit screens the multi-source search results of the first data source unit according to the user preference model;
Step 3.2: the second pre-processing unit sorts the multi-source search results of the second data source by the score of each result from high to low;
Step 3.3: information is obtained through the user's social networks and sorted by completeness.
6. The control method for the network search system with multi-data-source fusion according to claim 2, characterized in that step 4 specifically comprises the following steps:
Step 4.1: integrate the information pre-processed by the first data source pre-processing unit and the second data source pre-processing unit;
Step 4.2: determine the user's acceptance standard for the pre-processed information of the second data source unit;
Step 4.3: multiply the step 4.2 evaluation of each item by the user relevance score of that item obtained in step 4.1 to get a new score for each item, then re-sort by this new score to obtain the pre-integrated result;
Step 4.4: compute relevance scores for the information set obtained in step 3.3 using the information-preference relevance algorithm above, then sort it together with the data set obtained in step 4.3 by score.
7. The control method for the network search system with multi-data-source fusion according to claim 2, characterized in that in step 2, the user's preference habits include the display font size, the brightness and the number of search results displayed per page; the performance of the terminal adaptation unit includes processor performance and resolution.
CN201710274669.3A 2017-04-25 2017-04-25 The network searching system and control method of a kind of multi-data source fusion based on preference Pending CN107122447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710274669.3A CN107122447A (en) 2017-04-25 2017-04-25 The network searching system and control method of a kind of multi-data source fusion based on preference

Publications (1)

Publication Number Publication Date
CN107122447A true CN107122447A (en) 2017-09-01

Family

ID=59726290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710274669.3A Pending CN107122447A (en) 2017-04-25 2017-04-25 The network searching system and control method of a kind of multi-data source fusion based on preference

Country Status (1)

Country Link
CN (1) CN107122447A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170901