CN101075229A - Method and system for analyzing phrase semantic tendency - Google Patents


Info

Publication number
CN101075229A
Authority
CN
China
Prior art keywords
retrieval
result
word
analyzed
key word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200710075013
Other languages
Chinese (zh)
Inventor
张会鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN 200710075013
Publication of CN101075229A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for analyzing the semantic tendency of a word includes: selecting one or more reference words; combining the word to be analyzed with the one or more reference words to form one or more keywords; retrieving the one or more keywords and/or the word to be analyzed to produce one or more groups of retrieval results; and analyzing the one or more groups of retrieval results to obtain an analysis result describing the semantic tendency of the word to be analyzed. An analysis system for implementing the method is also disclosed.

Description

Method and system for analyzing the semantic tendency of words
Technical field
The present invention relates to the field of electric digital data processing, and in particular to a method and system for analyzing the semantic tendency of words.
Background
The rapid development of information technology has made it easy to aggregate, disseminate, retrieve, and consult large amounts of information. People retrieve information about the objects they care about, read the related material or reviews, and form their own views. Because information resources are so vast, collecting, reading, and evaluating them is tedious and time-consuming, so tools that automate searching, analysis, and summarization are urgently needed.
Existing word-level sentiment classification techniques rely on a semantic relation dictionary: they compute the semantic distance between the word to be analyzed and commendatory and derogatory reference words to determine how commendatory or derogatory the word is. The usual practice is to compute how closely the word to be analyzed is related to the commendatory reference words and to the derogatory reference words, and to derive a metric value expressing the word's semantic tendency. The reference words are chosen from the semantic relation dictionary as words whose commendatory or derogatory attitude is obvious, strong, and representative. The metric value is usually defined as a real number between -1 and +1. When the metric value is higher than a certain threshold, the word is considered commendatory; otherwise it is considered derogatory. The tendency is then read off the computed metric value: the closer the word is related to the commendatory reference words, the larger the metric value and the more obvious its commendatory tendency; the closer it is related to the derogatory reference words, the smaller the metric value and the more obvious its derogatory tendency.
In this approach, both the word to be analyzed and the reference words are limited to what the semantic relation dictionary contains, so the analysis result depends heavily on the dictionary. Because the dictionary's coverage is limited and hard to update promptly, existing word-level sentiment classification is suitable only for standard everyday words (such as those in the Xinhua Dictionary). It cannot handle newly coined words, specialized words, or new senses of existing words, and so cannot keep up with rapidly changing information or meet the broad demand for word analysis.
Summary of the invention
The present invention provides a method and system for analyzing the semantic tendency of words. The semantic tendency of a word, and the degree of that tendency, can be analyzed flexibly, with very few restrictions on the word to be analyzed or on the reference words. The method is applicable to newly coined words, specialized words, and new senses of existing words, and so meets the broad demand for word analysis created by rapidly changing information. Analysis is efficient, the method is flexible, and the results have high reference value and are easy to apply.
The invention provides a method for analyzing the semantic tendency of a word, comprising: selecting one or more reference words and combining the word to be analyzed with the one or more reference words to form one or more keywords; retrieving the one or more keywords and/or the word to be analyzed, producing one or more groups of retrieval results; and analyzing the one or more groups of retrieval results to obtain an analysis result describing the semantic tendency of the word to be analyzed. The one or more keywords comprise at least one commendatory keyword and/or at least one derogatory keyword; or at least one neutral keyword; or at least one pair of a first neutral keyword and a second neutral keyword with opposite meanings.
Analyzing the one or more groups of retrieval results means: comparing the total number of retrieval results obtained for the commendatory keywords with the total number obtained for the derogatory keywords; or comparing the total number obtained for the commendatory keywords or the derogatory keywords with the total number obtained for the word to be analyzed; or comparing the total number obtained for the neutral keywords with the total number obtained for the word to be analyzed; or comparing the total number obtained for the first neutral keywords with the total number obtained for the second neutral keywords.
Retrieval means searching for a keyword in information resources, which include Internet resources, databases, files, and/or document collections. The reference words and the word to be analyzed may be combined using the AND, OR, or NOT logical relations, or using compound logical relations built from AND, OR, and NOT.
Analyzing the one or more groups of retrieval results may further comprise deduplicating the retrieval results to screen out repeated results, and weighting the retrieval results to produce weighted retrieval results. The analysis then compares the weighted total obtained for the commendatory keywords with the weighted total obtained for the derogatory keywords; or compares the weighted total obtained for the commendatory keywords or the derogatory keywords with the weighted total obtained for the word to be analyzed; or compares the weighted total obtained for the first neutral keywords with the weighted total obtained for the second neutral keywords; or compares the weighted total obtained for the neutral keywords with the weighted total obtained for the word to be analyzed.
The retrieval results are weighted according to one or more of the following factors: the semantic tendency of the reference word; the field to which the word to be analyzed belongs; the semantic relation between the word to be analyzed and the reference word within the retrieval result; the positions of, or distance between, the word to be analyzed and the reference word within the retrieval result; the date on which the retrieval result record was produced; the position of the retrieval result within the result set; and/or the search channel or retrieval tool used.
The invention also provides a system for analyzing the semantic tendency of a word, comprising: a combined retrieval module, which combines the word to be analyzed with one or more reference words to form one or more keywords, retrieves the one or more keywords, and produces one or more groups of retrieval results; an analysis module, which analyzes the one or more groups of retrieval results and obtains an analysis result describing the semantic tendency of the word to be analyzed; and a reference word selection module, which selects the one or more reference words. The analysis module compares the retrieval results obtained for one or more keywords with one another, or compares the retrieval results obtained for one or more keywords with the retrieval results obtained for the word to be analyzed, to obtain the analysis result. The analysis module further comprises a weighting module, which weights the retrieval results to produce weighted retrieval results.
Description of the drawings
Fig. 1 is a flow chart of the first embodiment of the invention;
Fig. 2 is a flow chart of the second embodiment of the invention;
Fig. 3 is a module diagram of the fourth embodiment of the invention;
Fig. 4 is a module diagram of the fifth embodiment of the invention.
The realization of the objects, functional features, and advantages of the invention is further described below in conjunction with the embodiments and with reference to the drawings.
Detailed description of the embodiments
In a first embodiment of the invention, semantic analysis is performed on a word W to be analyzed. Several commendatory reference words and several derogatory reference words are selected, and W is combined with each of them to form commendatory keywords and derogatory keywords. These keywords are retrieved separately, yielding one group of commendatory keyword retrieval results and one group of derogatory keyword retrieval results. The number of commendatory keyword retrieval results is compared with the number of derogatory keyword retrieval results to obtain the semantic tendency of W: if the commendatory count is greater than the derogatory count, W is considered commendatory; if it is smaller, W is considered derogatory.
The information resource used in this embodiment is the Internet, the retrieval tool is a search engine, and the retrieval results are web page contents. Retrieving a keyword means using the keyword as the search condition, searching Internet information with the search engine, and recording the retrieval results.
This embodiment uses a metric value x as the analysis result describing the semantic tendency of the word to be analyzed. x is defined as a real number in the range [-1, 1]. A threshold a, a real number within the range of the metric value, is predefined as the cut-off for judging the semantic tendency. When x is higher than a, W is considered commendatory; otherwise W is considered derogatory. The absolute value |x| expresses the degree of the semantic tendency. Here the threshold a is predefined as 0.
Reference words are chosen according to the actual situation, language habits, and the field to which W belongs. The word to be analyzed and a reference word are combined with the AND logical relation, written "and". Depending on the word to be analyzed and the reference word, the resulting keyword may be one or more characters, words, phrases, or even sentences.
Referring to Fig. 1, the first embodiment analyzes the semantic tendency of a word in the following steps:
S11: select k commendatory reference words, whose semantic tendency is commendatory, and k derogatory reference words, whose semantic tendency is derogatory;
S12: combine W with each commendatory reference word to form k commendatory keywords, and combine W with each derogatory reference word to form k derogatory keywords;
S13: retrieve each commendatory keyword in the information resources and record the retrieval results; retrieve each derogatory keyword and record the retrieval results;
S14: analyze the retrieval results to obtain an analysis result describing the semantic tendency of W.
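Steps S11 to S14 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `count_hits` is a hypothetical stand-in for a search engine that counts documents in an in-memory list, and returning 0.0 when nothing is found is an assumption the text does not address.

```python
from typing import Callable, Sequence, Tuple

Keyword = Tuple[str, str]  # (word to be analyzed, reference word), joined by AND


def count_hits(corpus: Sequence[str], keyword: Keyword) -> int:
    """Hypothetical retrieval tool: number of documents containing both terms."""
    return sum(1 for doc in corpus if all(term in doc for term in keyword))


def analyze_tendency(
    word: str,
    commendatory_refs: Sequence[str],   # S11: k commendatory reference words
    derogatory_refs: Sequence[str],     # S11: k derogatory reference words
    search: Callable[[Keyword], int],
) -> float:
    # S12: combine the word with each reference word (AND relation).
    commendatory_keywords = [(word, ref) for ref in commendatory_refs]
    derogatory_keywords = [(word, ref) for ref in derogatory_refs]
    # S13: retrieve every keyword and total the result counts.
    c_total = sum(search(k) for k in commendatory_keywords)
    d_total = sum(search(k) for k in derogatory_keywords)
    # S14: metric value in [-1, 1]; 0.0 for "no results" is an assumption.
    if c_total + d_total == 0:
        return 0.0
    return (c_total - d_total) / (c_total + d_total)
```

Passing the search function as a parameter keeps the sketch independent of any particular information resource, in line with the text's remark that the resource may be the Internet, a database, files, or a document collection.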
Formula (1), used in this embodiment to analyze the retrieval results, is:
$$x = \frac{\sum_{i=1}^{k} \mathrm{GetSearchResultNum}(C_i) - \sum_{i=1}^{k} \mathrm{GetSearchResultNum}(D_i)}{\sum_{i=1}^{k} \mathrm{GetSearchResultNum}(C_i) + \sum_{i=1}^{k} \mathrm{GetSearchResultNum}(D_i)} \qquad (1)$$
Here k is the number of commendatory keywords (and of derogatory keywords), and i runs from 1 to k. $C_i$ denotes the i-th commendatory keyword, and GetSearchResultNum($C_i$) is the number of retrieval results obtained by retrieving $C_i$; summing these k counts with $\sum_{i=1}^{k}$ gives the total number of retrieval results for the commendatory keywords. $D_i$ denotes the i-th derogatory keyword, and GetSearchResultNum($D_i$) is the number of retrieval results obtained by retrieving $D_i$; summing these k counts gives the total number of retrieval results for the derogatory keywords. The metric value x of W is the difference between the commendatory total and the derogatory total, divided by the sum of the two totals.
For example, the 5 commendatory reference words and 5 derogatory reference words selected, and the 10 keywords formed from them, are as follows:

| Commendatory reference word | Commendatory keyword | Derogatory reference word | Derogatory keyword |
|---|---|---|---|
| good | W and good | inferior | W and inferior |
| pretty good | W and pretty good | bad | W and bad |
| handy | W and handy | hard to use | W and hard to use |
| great | W and great | poor | W and poor |
| good | W and good | lousy | W and lousy |

The numbers of retrieval results obtained for each keyword are as follows:

| Commendatory keyword | Number of retrieval results | Derogatory keyword | Number of retrieval results |
|---|---|---|---|
| W and good | 200 | W and inferior | 100 |
| W and pretty good | 80 | W and bad | 30 |
| W and handy | 50 | W and hard to use | 20 |
| W and great | 70 | W and poor | 10 |
| W and good | 20 | W and lousy | 0 |

Calculating, $\sum_{i=1}^{k} \mathrm{GetSearchResultNum}(C_i) = 200 + 80 + 50 + 70 + 20 = 420$ and $\sum_{i=1}^{k} \mathrm{GetSearchResultNum}(D_i) = 100 + 30 + 20 + 10 + 0 = 160$, so x = (420 - 160)/(420 + 160) ≈ 0.448. The metric value x is greater than the threshold a, which indicates that W is commendatory: the evaluations or perceptions of W in the information resources lean positive or approving.
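The arithmetic of this example can be checked with a few lines of Python. The counts are the ones listed in the table above and the threshold a = 0 is the value stated for this embodiment; the function and variable names are illustrative only.

```python
# Retrieval result counts from the worked example (first embodiment).
commendatory_counts = [200, 80, 50, 70, 20]
derogatory_counts = [100, 30, 20, 10, 0]


def metric_value(commendatory, derogatory):
    """Formula (1): (sum C - sum D) / (sum C + sum D)."""
    c_total, d_total = sum(commendatory), sum(derogatory)
    return (c_total - d_total) / (c_total + d_total)


threshold_a = 0.0
x = metric_value(commendatory_counts, derogatory_counts)
tendency = "commendatory" if x > threshold_a else "derogatory"
print(round(x, 3), tendency)  # prints: 0.448 commendatory
```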
In the above embodiment the reference word and the word to be analyzed are combined with the AND relation. Depending on the analysis requirements, the information resources, and the word to be analyzed, one or more other logical relations or compound logical relations may be used instead, such as OR, NOT, NAND, NOR, contains, equals, not equal to, greater than, and less than.
In the above embodiment each reference word is combined with the word to be analyzed to form one keyword. In practice, several related or similar reference words may be combined together with the word to be analyzed to form a single keyword, using one or more of the logical relations and compound logical relations described above.
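A keyword that groups several reference words can be composed as a query string. The sketch below is a hedged illustration: the `and`/`or` syntax with parentheses is an assumed notation, not the query language of any particular search engine, and `build_keyword` is a hypothetical helper name.

```python
def build_keyword(word: str, reference_words: list[str], inner_op: str = "or") -> str:
    """Combine the word to be analyzed with one or more reference words.

    A single reference word gives "W and ref"; several are grouped with
    inner_op inside parentheses, e.g. "W and (ref1 or ref2)".
    """
    if not reference_words:
        raise ValueError("at least one reference word is required")
    if len(reference_words) == 1:
        return f"{word} and {reference_words[0]}"
    grouped = f" {inner_op} ".join(reference_words)
    return f"{word} and ({grouped})"


print(build_keyword("W", ["good"]))
print(build_keyword("W", ["good", "pretty good"]))
```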
The information resources retrieved are not limited to Internet resources; they may also be databases, files, or document collections. Likewise the retrieval tool is not limited to a web search engine; it may be a database query tool, a file search tool, or the search function of a document collection.
In the above embodiment the reference words are commendatory and derogatory words with an obvious emotional tendency. In some professional fields or practical applications, neutral words without an obvious emotional tendency can serve as reference words. In the automotive field, for example, pairs of neutral reference words with opposite meanings can be used, such as high chassis/low chassis or long wheelbase/short wheelbase. Analysis with neutral reference words is similar to the above embodiment. At least one pair consisting of a first neutral reference word and a second neutral reference word with opposite meanings is selected, and the word to be analyzed is combined with them to form at least one first neutral keyword and at least one second neutral keyword. These keywords are retrieved separately, yielding one group of first neutral keyword retrieval results and one group of second neutral keyword retrieval results. The two result counts are compared to obtain the semantic tendency of W: if the first neutral keyword count is greater than the second, the semantic tendency of W is considered to lean toward the first neutral reference word; if it is smaller, toward the second neutral reference word. With this method, the information resources, the retrieval tool, the definitions of the metric value and threshold, the way the word to be analyzed is combined with the reference words, the analysis steps, and the formula used are all similar to the foregoing embodiment and are not described again.
To accommodate specific analysis requirements and the diversity of information resources, the invention does not limit the analysis of retrieval results to a simple comparison of result counts. The retrieval results may also be screened, deduplicated, weighted, and/or identified to obtain a more accurate analysis result.
A second embodiment of the invention is now proposed for semantic analysis of a word W. Several commendatory words and several derogatory words are selected as weighted reference words. W is combined with one or more commendatory weighted reference words to form several commendatory keywords, and with one or more derogatory weighted reference words to form several derogatory keywords. The commendatory and derogatory keywords are retrieved separately, yielding one group of commendatory keyword retrieval results and one group of derogatory keyword retrieval results. Each group is deduplicated, and the deduplicated results are then weighted to obtain the weighted retrieval results of the commendatory keywords and of the derogatory keywords. The weighted totals of the two groups are compared to obtain the semantic tendency of W: if the commendatory weighted total is greater than the derogatory weighted total, W is considered commendatory; if it is smaller, W is considered derogatory.
The information resource used in this embodiment is a database, and the retrieval tool is the database's search function: retrieving a keyword means using it as the search term and searching the information in the database with that function. Each retrieval result returned by the database has two parts, a title and an abstract, either of which may contain the keyword. The retrieval results are recorded.
Referring to Fig. 2, the second embodiment analyzes the semantic tendency of a word in the following steps:
S21: select at least one commendatory reference word, whose semantic tendency is commendatory, and at least one derogatory reference word, whose semantic tendency is derogatory; the total weight of all commendatory reference words equals the total weight of all derogatory reference words;
S22: combine W with one or more commendatory reference words to form p commendatory keywords, and combine W with one or more derogatory reference words to form q derogatory keywords;
S23: retrieve each commendatory keyword in the information resources and record p groups of commendatory keyword retrieval results; retrieve each derogatory keyword and record q groups of derogatory keyword retrieval results;
S24: determine whether all commendatory and derogatory keywords have been retrieved; if not, return to S22 and continue with the next keyword;
S25: deduplicate the commendatory keyword retrieval results and the derogatory keyword retrieval results;
S26: weight each deduplicated commendatory and derogatory keyword retrieval result, and record the weighted retrieval results of each commendatory keyword and of each derogatory keyword;
S27: compare the weighted total of the commendatory keywords with the weighted total of the derogatory keywords to obtain an analysis result describing the semantic tendency of W.
The rule for selecting reference words differs from the previous embodiment: the numbers of commendatory and derogatory reference words need not be equal, but the total weight of all commendatory reference words should equal the total weight of all derogatory reference words.
The mechanism for combining the word to be analyzed with reference words is also more flexible than in the previous embodiment: the word to be analyzed can form a keyword with one or more reference words, and different logical combinations can be used.
This embodiment deduplicates retrieval results with a segment signature method. A threshold b is preset. The content of each retrieval result is cut into several segments according to some rule, and each segment is signed to form its information fingerprint. If the number of identical information fingerprints shared by two retrieval results exceeds the threshold b, the contents of the two results are considered repeated or identical, and one of them is screened out.
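The segment signature idea can be sketched as below. Several details are assumptions, since the text only says that content is cut "according to some rule" and each segment is "signed": fixed-length character segments stand in for the segmentation rule, SHA-256 stands in for the signature, and the first of two duplicates is the one kept.

```python
import hashlib


def fingerprints(text: str, segment_len: int = 20) -> set[str]:
    """Cut text into fixed-length segments (assumed rule) and hash each one."""
    segments = [text[i:i + segment_len] for i in range(0, len(text), segment_len)]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in segments}


def deduplicate(results: list[str], threshold_b: int = 2,
                segment_len: int = 20) -> list[str]:
    """Screen out a result when it shares more than threshold_b fingerprints
    with a result that has already been kept."""
    kept: list[str] = []
    kept_fingerprints: list[set[str]] = []
    for text in results:
        fps = fingerprints(text, segment_len)
        if any(len(fps & seen) > threshold_b for seen in kept_fingerprints):
            continue  # considered a repeat of an earlier result
        kept.append(text)
        kept_fingerprints.append(fps)
    return kept
```

Fixed-length segments are sensitive to insertions near the start of a text; a rule that cuts at sentence or punctuation boundaries would be more robust, at the cost of a slightly longer sketch.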
This embodiment weights retrieval results by the date on which they were stored in the database. A time factor is defined for each retrieval result; it decreases as the storage date recedes into the past, so the most recently stored result has the largest time factor. The time factor is a real number between 0 and 1: the longer ago a result was stored in the database, the closer its time factor is to 0; the more recently it was stored, the closer its time factor is to 1.
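One way to obtain such a time factor is exponential decay on the age of the record. The text does not specify a decay function, so the half-life form and the `half_life_days` parameter below are assumptions chosen only because they keep the factor between 0 and 1 and make it shrink with age.

```python
from datetime import date


def time_factor(stored_on: date, today: date, half_life_days: float = 365.0) -> float:
    """Assumed decay: 1.0 for a record stored today, halving every half_life_days."""
    age_days = max((today - stored_on).days, 0)  # future dates are treated as age 0
    return 0.5 ** (age_days / half_life_days)


today = date(2024, 1, 1)
print(time_factor(date(2024, 1, 1), today))  # newest record
print(time_factor(date(2014, 1, 1), today))  # ten-year-old record, close to 0
```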
The metric value x and the threshold c are defined as in the previous embodiment and are not described again. Formula (2), used in this embodiment to analyze the weighted retrieval results, is:
$$x = \frac{\sum_{i=1}^{p} \mathrm{Score}(C_i) - \sum_{j=1}^{q} \mathrm{Score}(D_j)}{\sum_{i=1}^{p} \mathrm{Score}(C_i) + \sum_{j=1}^{q} \mathrm{Score}(D_j)} \qquad (2)$$
In formula (2), p is the number of commendatory keywords and i runs from 1 to p; $C_i$ denotes the i-th commendatory keyword, and Score($C_i$) is the weighted retrieval result total of that keyword. q is the number of derogatory keywords and j runs from 1 to q; $D_j$ denotes the j-th derogatory keyword, and Score($D_j$) is the weighted retrieval result total of that keyword. The metric value x of W is the difference between the weighted total of the commendatory keywords and the weighted total of the derogatory keywords, divided by the sum of the two weighted totals.
Score($C_i$) is computed with formula (3):
$$\mathrm{Score}(C_i) = m \cdot \mathrm{GetSearchResultNum}(C_i) + n \cdot \sum_{k=1}^{\mathrm{GetSearchResultNum}(C_i)} \left(\text{time factor of result } k \times \text{number of reference words in the title of result } k\right) \qquad (3)$$
and Score($D_j$) with formula (4):
$$\mathrm{Score}(D_j) = m \cdot \mathrm{GetSearchResultNum}(D_j) + n \cdot \sum_{k=1}^{\mathrm{GetSearchResultNum}(D_j)} \left(\text{time factor of result } k \times \text{number of reference words in the title of result } k\right) \qquad (4)$$
Here m and n are predefined weighting weights, and GetSearchResultNum($C_i$) and GetSearchResultNum($D_j$) are the numbers of retrieval results obtained by retrieving the commendatory keyword $C_i$ and the derogatory keyword $D_j$ respectively. For each retrieval result, a score is calculated as its time factor multiplied by the number of reference words contained in its title. The sum in formula (3) is the total of these scores over all retrieval results obtained for a commendatory keyword; the sum in formula (4) is the total over all retrieval results obtained for a derogatory keyword.
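Formulas (2) to (4) can be sketched in Python as follows. Each retrieval result is represented here as a `(time_factor, title_reference_word_count)` pair, which is an assumed data layout; the default weights m = n = 1 are placeholders, since the text leaves them to be predefined.

```python
from typing import Sequence, Tuple

# (time factor, number of reference words in the result title): assumed layout
Result = Tuple[float, int]


def score(results: Sequence[Result], m: float = 1.0, n: float = 1.0) -> float:
    """Formulas (3)/(4): m * result count + n * sum(time factor * title count)."""
    return m * len(results) + n * sum(tf * count for tf, count in results)


def weighted_metric(commendatory_scores: Sequence[float],
                    derogatory_scores: Sequence[float]) -> float:
    """Formula (2): (sum Score(C) - sum Score(D)) / (sum Score(C) + sum Score(D))."""
    c_total, d_total = sum(commendatory_scores), sum(derogatory_scores)
    return (c_total - d_total) / (c_total + d_total)


# Illustrative values only, not data from the patent.
s = score([(1.0, 2), (0.5, 1)], m=1.0, n=2.0)  # 1*2 + 2*(2.0 + 0.5)
print(s)
```

The m term rewards the sheer number of results, while the n term rewards recent results whose titles mention the reference words, so the ratio m/n controls how much the weighting departs from the plain count comparison of the first embodiment.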
Following this embodiment, the 4 commendatory reference words and 5 derogatory reference words selected, with their weights, are for example:

| Commendatory reference word | Weight | Derogatory reference word | Weight |
|---|---|---|---|
| good | 1 | inferior | 0.9 |
| pretty good | 0.8 | bad | 0.7 |
| handy | 0.5 | hard to use | 0.7 |
| great | 0.7 | poor | 0.5 |
| | | lousy | 0.2 |
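The weights in this example satisfy the rule from step S21 that the two total weights must be equal: both columns sum to 3.0. A quick check, using a tolerance because the weights are floating-point values (the helper name is illustrative):

```python
import math

commendatory_weights = [1.0, 0.8, 0.5, 0.7]        # 4 commendatory reference words
derogatory_weights = [0.9, 0.7, 0.7, 0.5, 0.2]     # 5 derogatory reference words


def totals_balanced(first, second) -> bool:
    """True when the two total weights are equal up to rounding error."""
    return math.isclose(sum(first), sum(second), rel_tol=1e-9)


print(totals_balanced(commendatory_weights, derogatory_weights))  # prints: True
```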
Using compound logical relations built from AND, OR, and NOT, the reference words and the word to be analyzed are combined into 2 commendatory keywords and 3 derogatory keywords. These keywords and the corresponding retrieval result counts are, for example:

| Commendatory keyword | Number of retrieval results | Derogatory keyword | Number of retrieval results |
|---|---|---|---|
| W and (good or pretty good) | 1020 | W and (inferior or bad) | 2300 |
| W and (handy or great) | 560 | W and (hard to use or poor) | 1800 |
| | | W and lousy | 56 |
Deduplication screens out the retrieval results that contain repeated content. The retrieval result counts after deduplication are as follows:

| Commendatory keyword | Number of retrieval results | Derogatory keyword | Number of retrieval results |
|---|---|---|---|
| W and (good or pretty good) | 1000 | W and (inferior or bad) | 2000 |
| W and (handy or great) | 510 | W and (hard to use or poor) | 1200 |
| | | W and lousy | 40 |
The date of each retrieval result and the number of reference words in each result title are then identified, and the x value is calculated. The x value is less than the threshold c and |x| is close to 1, which indicates that W is derogatory: the evaluations or perceptions of W in the information resources lean strongly negative or disparaging.
In the above embodiment, depending on the word to be analyzed and the reference words, the resulting keyword may be one or more characters, words, phrases, or even sentences. The information resources retrieved are not limited to databases; they may also be Internet resources, files, or document collections. The definitions of the metric value x and of the thresholds a, b, and c, the weighting weights, the number of reference words, the number of keywords, and the way they are combined can all be determined according to actual requirements. The rule for weighting retrieval results is likewise not limited to the date of storage in the database. Weighting may also be based on one or more of the following factors: the semantic tendency of the reference word; the field to which the word to be analyzed belongs; the semantic relation between the word to be analyzed and the reference word within the retrieval result; the positions of, or distance between, the word to be analyzed and the reference word within the retrieval result; the date on which the retrieval result record was stored in the database; the position of the retrieval result within the result set; and/or the search channel or retrieval tool used. The principle is the same as the date-based weighting mechanism described in the above embodiment.
Deduplication in the above embodiment is also not limited to the segment signature method. One or more other methods may be used, such as keyword-based duplicate web page detection, the Shingle method, the HP Labs method, Bloom filter methods, content plus link-relationship methods, and the I-Match method.
To simplify the semantic analysis process and obtain a result quickly, a third embodiment is proposed. The number of retrieval results for a keyword is compared with the number of retrieval results for the word to be analyzed, and the proportion of keyword results among the results for the word to be analyzed is used to judge the semantic tendency of the word. Specifically, one or more commendatory reference words may be selected and combined with the word to be analyzed into one or more commendatory keywords, and these commendatory keywords are retrieved in the information resources to obtain a group of commendatory keyword retrieval results. The word to be analyzed is then used as the search condition and retrieved in the information resources to obtain a group of retrieval results for the word itself. The proportion of the number of commendatory keyword results to the number of results for the word to be analyzed is calculated. If the proportion is high, the word to be analyzed is considered to lean commendatory; if the proportion is low, the word is considered not to lean commendatory. The choice of information resources, the number of reference words, the number of keywords, and the combination mode are all similar to the first and second embodiments. Besides commendatory reference words, derogatory or neutral reference words may also be selected. In this method the retrieval results may also be deduplicated and weighted before analysis. A threshold d may also be preset according to actual conditions: when the proportion of the number of retrieval results for the keyword formed from a given reference word and the word to be analyzed, relative to the number of retrieval results for the word to be analyzed, is higher than threshold d, the semantic tendency of the word to be analyzed is considered to lean toward that reference word.
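The ratio test of the third embodiment can be sketched in a few lines. The function names and the sample numbers are hypothetical; the application leaves the value of threshold d to be set according to actual conditions.

```python
def keyword_ratio(keyword_hits: int, word_hits: int) -> float:
    """Proportion of keyword results among results for the word to be analyzed."""
    if word_hits <= 0:
        raise ValueError("the word to be analyzed returned no results")
    return keyword_hits / word_hits


def leans_toward(keyword_hits: int, word_hits: int, d: float) -> bool:
    """True when the ratio exceeds threshold d, i.e. the word leans toward the reference word."""
    return keyword_ratio(keyword_hits, word_hits) > d
```

For example, with 300 results for a commendatory keyword, 1000 results for the word alone, and an assumed d of 0.25, `leans_toward(300, 1000, 0.25)` is true, since the ratio is 0.3.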
Referring to Fig. 3, the present invention also proposes a fourth embodiment, a word semantic tendency analysis system, comprising: a reference word selection module 11 for selecting one or more reference words, including at least one commendatory reference word and/or at least one derogatory reference word, or at least one neutral reference word, or at least one pair consisting of a first neutral reference word and a second neutral reference word with opposite meanings; a combined retrieval module 12, which combines the word to be analyzed with the one or more reference words into one or more keywords and retrieves the one or more keywords to produce one or more groups of retrieval results; and an analysis module 13 for analyzing the one or more groups of retrieval results to obtain an analysis result describing the semantic tendency of the word to be analyzed.
The workflow and retrieval-result analysis logic of this embodiment follow the word semantic tendency analysis method of the first embodiment and are not repeated here.
To meet ever-rising demands for word semantic analysis and to keep pace with rapidly changing information, a fifth embodiment is proposed on the basis of the fourth embodiment. The reference word selection module 11 of this embodiment is similar to that of the previous embodiment and is not described again. The combined retrieval module 12 comprises a combination module 121 and a retrieval module 122. The combination module 121 combines the reference words selected by the reference word selection module 11 with the word to be analyzed into keywords using AND, OR, or NOT logical relations, or compound logical relations built from AND, OR, and NOT. A keyword formed from a commendatory reference word and the word to be analyzed is a commendatory keyword; a keyword formed from a derogatory reference word and the word to be analyzed is a derogatory keyword; a keyword formed from a neutral reference word and the word to be analyzed is a neutral keyword; a keyword formed from a first neutral reference word and the word to be analyzed is a first neutral keyword; and a keyword formed from a second neutral reference word, opposite in meaning to the first, and the word to be analyzed is a second neutral keyword. The retrieval module 122 retrieves all keywords produced by the combination module 121 in the information resources and produces retrieval results.
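A minimal sketch of what the combination module 121 does might look as follows. The `Keyword` class, the function names, and the quoted `"word" OP "reference"` query syntax are assumptions for illustration; actual Boolean query syntax depends on the search engine or database being queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    query: str     # Boolean query string sent to the retrieval module
    polarity: str  # label inherited from the reference word


def combine(word: str, reference: str, polarity: str, op: str = "AND") -> Keyword:
    """Join the word to be analyzed and one reference word with a logical operator."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return Keyword(f'"{word}" {op} "{reference}"', polarity)


def build_keywords(word: str, references: dict[str, str], op: str = "AND") -> list[Keyword]:
    """Build one keyword per reference word; `references` maps reference word -> polarity."""
    return [combine(word, ref, pol, op) for ref, pol in references.items()]
```

Calling `build_keywords("W", {"good": "commendatory", "bad": "derogatory"})` yields one commendatory keyword and one derogatory keyword, each carrying the label of the reference word it was built from.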
Based on the retrieval results produced by the retrieval module 122, the analysis module 13 compares the total number of results obtained by retrieving the commendatory keywords with the total number obtained by retrieving the derogatory keywords, or compares the total number of results obtained by retrieving the first neutral keyword with the total number obtained by retrieving the second neutral keyword, or compares the total number of results obtained by retrieving the commendatory, derogatory, or neutral keywords with the total number obtained by retrieving the word to be analyzed, and thereby obtains an analysis result describing the semantic tendency of the word to be analyzed.
The analysis module 13 comprises an information resource selection and combination module 131, a deduplication module 132, a weighting module 133, an algorithm module 134, and a result output module 135. The resource selection and combination module 131 selects some or all of multiple information resources, for the reference word selection module 11 to consult when selecting reference words and for the retrieval module 122 to consult when selecting the information resources to search. The deduplication module 132 deduplicates the retrieval results, filtering out retrieval results of higher precision. The weighting module 133 weights the deduplicated retrieval results according to one or more factors, including the semantic tendency of the reference word, the field to which the word to be analyzed belongs, the semantic relation between the word to be analyzed and the reference word in the retrieval result, the position of or distance between the word to be analyzed and the reference word in the retrieval result, the date on which the retrieval result record was produced, the position of the retrieval result within the result set, and/or the search channel or search tool used. The weighted retrieval results are sent to the algorithm module 134 for processing. The result output module 135 receives the analysis result or error message sent by the algorithm module 134 and sends it to the user of the word semantic tendency analysis system.
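To make the role of the weighting module 133 concrete, here is a hedged sketch that weights each retrieval result by the date its record was produced and sums the weights into a weighted total. The exponential decay and the one-year half-life are assumptions for this example only; the application lists the weighting factors but does not prescribe a formula.

```python
from datetime import date


def recency_weight(result_date: date, today: date, half_life_days: float = 365.0) -> float:
    """Weight a result by age: 1.0 for a result dated today, halving every half_life_days."""
    age = max((today - result_date).days, 0)
    return 0.5 ** (age / half_life_days)


def weighted_total(result_dates: list[date], today: date, half_life_days: float = 365.0) -> float:
    """Sum of per-result weights, used in place of the raw result count."""
    return sum(recency_weight(d, today, half_life_days) for d in result_dates)
```

The algorithm module could then compare the weighted totals of, say, the commendatory and derogatory keyword results instead of their raw counts, so that recent records count for more than old ones.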
The workflow and analysis logic of this embodiment follow the word semantic tendency analysis methods of the second and third embodiments and are not repeated here.
The retrieval module 122 of this embodiment may further comprise an information resource interface 1221, through which the retrieval module 122 connects to the information resources 10 and retrieves from them. The information resources of this embodiment include network resources, databases, files, or document collections, one or more of which are chosen for retrieval according to actual needs. The information resource interface 1221 uses the API provided by a search engine. Taking Google as an example, the Google service follows the SOAP and WSDL standards, and the combined retrieval module 12 of this embodiment can be implemented in a variety of development environments such as Java, Perl, or .NET. Using the API of the information resource interface 1221, the retrieval module 122 can access the search engine, perform retrieval, and count the retrieval results. The information resource interface 1221 may also adopt other schemes to suit the retrieval standards of different information resources. Through the information resource interface 1221, the retrieval module 122 can access multiple information resources, select among them and search them in combination, and combine the retrieval results from multiple information resources for the semantic tendency analysis of a word.
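One way to picture the information resource interface 1221 is as a small abstraction that every information resource implements, so the retrieval module can count results across several resources uniformly. The sketch below is an assumption-laden illustration: `InformationResource`, `InMemoryResource`, and `total_hits` are hypothetical names, and the in-memory substring matcher merely stands in for a real search engine API or database query.

```python
from typing import Protocol


class InformationResource(Protocol):
    """Anything that can report how many results a query returns."""

    def count_hits(self, query: str) -> int: ...


class InMemoryResource:
    """Toy stand-in for a database or file collection (not a real search API)."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = [d.lower() for d in documents]

    def count_hits(self, query: str) -> int:
        # A document matches when it contains every whitespace-separated query term.
        terms = query.lower().split()
        return sum(all(t in doc for t in terms) for doc in self.documents)


def total_hits(resources: list[InformationResource], query: str) -> int:
    """Combine result counts from several information resources."""
    return sum(r.count_hits(query) for r in resources)
```

A wrapper around an actual search engine API would expose the same `count_hits` method, which lets the analysis code stay unchanged when resources are added or swapped.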
This embodiment may be implemented in hardware, in software, or in a combination of software and hardware. It may further comprise a reference word library 14, from which the reference word selection module 11 selects reference words, and a retrieval result library 15, into which the combined retrieval module 12 stores retrieval results for the analysis module 13 to call.
The above are only preferred embodiments of the present invention and do not thereby limit the scope of its claims. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (13)

1. A word semantic tendency analysis method, comprising:
selecting one or more reference words, and combining a word to be analyzed with the one or more reference words into one or more keywords;
retrieving the one or more keywords and/or retrieving the word to be analyzed, to produce one or more groups of retrieval results; and
analyzing the one or more groups of retrieval results to obtain an analysis result describing the semantic tendency of the word to be analyzed.
2. The word semantic tendency analysis method according to claim 1, characterized in that the one or more keywords comprise at least one commendatory keyword and/or at least one derogatory keyword; or the one or more keywords comprise at least one neutral keyword; or the one or more keywords comprise at least one pair consisting of a first neutral keyword and a second neutral keyword with opposite meanings.
3. The word semantic tendency analysis method according to claim 2, characterized in that analyzing the one or more groups of retrieval results comprises comparing the total number of retrieval results obtained by retrieving the commendatory keyword with the total number of retrieval results obtained by retrieving the derogatory keyword, or comparing the total number of retrieval results obtained by retrieving the commendatory keyword or the derogatory keyword with the total number of retrieval results obtained by retrieving the word to be analyzed; or comparing the total number of retrieval results obtained by retrieving the neutral keyword with the total number of retrieval results obtained by retrieving the word to be analyzed; or comparing the total number of retrieval results obtained by retrieving the first neutral keyword with the total number of retrieval results obtained by retrieving the second neutral keyword.
4. The word semantic tendency analysis method according to any one of claims 1 to 3, characterized in that the retrieving comprises retrieving the keywords in information resources, the information resources comprising network resources, databases, files, and/or document collections.
5. The word semantic tendency analysis method according to any one of claims 1 to 3, characterized in that the mode of combining the reference word with the word to be analyzed comprises an AND, OR, or NOT logical relation, or a compound logical relation built from AND, OR, and NOT.
6. The word semantic tendency analysis method according to any one of claims 1 to 3, characterized in that analyzing the one or more groups of retrieval results further comprises deduplicating the retrieval results to screen out duplicate retrieval results.
7. The word semantic tendency analysis method according to any one of claims 1 to 3, characterized in that analyzing the one or more groups of retrieval results further comprises weighting the retrieval results to produce weighted retrieval results.
8. The word semantic tendency analysis method according to claim 7, characterized in that analyzing the one or more groups of retrieval results comprises comparing the total of the weighted retrieval results obtained by retrieving the commendatory keyword with the total of the weighted retrieval results obtained by retrieving the derogatory keyword, or comparing the total of the weighted retrieval results obtained by retrieving the commendatory keyword or the derogatory keyword with the total of the weighted retrieval results obtained by retrieving the word to be analyzed; or comparing the total of the weighted retrieval results obtained by retrieving the first neutral keyword with the total of the weighted retrieval results obtained by retrieving the second neutral keyword, or comparing the total of the weighted retrieval results obtained by retrieving the neutral keyword with the total of the weighted retrieval results obtained by retrieving the word to be analyzed.
9. The word semantic tendency analysis method according to claim 7, characterized in that the retrieval results are weighted according to one or more of the following factors: the semantic tendency of the reference word, the field to which the word to be analyzed belongs, the semantic relation between the word to be analyzed and the reference word in the retrieval result, the position of or distance between the word to be analyzed and the reference word in the retrieval result, the date on which the retrieval result record was produced, the position of the retrieval result within the result set, and/or the search channel or search tool used.
10. A word semantic tendency analysis system, comprising:
a combined retrieval module, which combines a word to be analyzed with one or more reference words into one or more keywords and retrieves the one or more keywords to produce one or more groups of retrieval results; and
an analysis module, which analyzes the one or more groups of retrieval results to obtain an analysis result describing the semantic tendency of the word to be analyzed.
11. The word semantic tendency analysis system according to claim 10, characterized by further comprising a reference word selection module for selecting one or more reference words.
12. The word semantic tendency analysis system according to claim 10, characterized in that the analysis module compares the retrieval results obtained by retrieving one or more keywords with each other, or compares the retrieval results obtained by retrieving one or more keywords with the retrieval results obtained by retrieving the word to be analyzed, to obtain the analysis result describing the semantic tendency of the word to be analyzed.
13. The word semantic tendency analysis system according to any one of claims 10 to 12, characterized in that the analysis module further comprises a weighting module, which weights the retrieval results to produce weighted retrieval results.
CN 200710075013 2007-06-09 2007-06-09 Method and system for analyzing phrase semantic tendency Pending CN101075229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710075013 CN101075229A (en) 2007-06-09 2007-06-09 Method and system for analyzing phrase semantic tendency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710075013 CN101075229A (en) 2007-06-09 2007-06-09 Method and system for analyzing phrase semantic tendency

Publications (1)

Publication Number Publication Date
CN101075229A true CN101075229A (en) 2007-11-21

Family

ID=38976283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710075013 Pending CN101075229A (en) 2007-06-09 2007-06-09 Method and system for analyzing phrase semantic tendency

Country Status (1)

Country Link
CN (1) CN101075229A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361038B (en) * 2008-03-13 2018-06-05 商业合伙人有限公司 Improved search engine

Similar Documents

Publication Publication Date Title
CN101046820A (en) System and method for prioritizing websites during a webcrawling process
CN1112647C (en) Feature diffusion across hyperlinks
Kersten et al. The researcher's guide to the data deluge: Querying a scientific database in just a few seconds
CN1240011C (en) File classifying management system and method for operation system
CN1904886A (en) Method and apparatus for establishing link structure between multiple documents
CN1290036C (en) Computer system and method for establishing concept knowledge according to machine readable dictionary
CN101079064A (en) Web page sequencing method and device
CN1755682A (en) System and method for ranking search results using link distance
CN1609859A (en) Search result clustering method
CN110543595B (en) In-station searching system and method
CN101963965B (en) Document indexing method, data query method and server based on search engine
CN1858737A (en) Method and system for data searching
CN101051313A (en) Integrated data source finding method for deep layer net page data source
CN101079056A (en) Retrieving method and system
CN103902597A (en) Method and device for determining search relevant categories corresponding to target keywords
CN1750002A (en) Method for providing research result
CN102456016B (en) Method and device for sequencing search results
CN1614594A (en) Clustering method and system of XML documents
CN1818908A (en) Feedbakc information use of searcher in search engine
CN1967536A (en) Region based multiple features Integration and multiple-stage feedback latent semantic image retrieval method
CN103714149A (en) Self-adaptive incremental deep web data source discovery method
CN1716246A (en) Multi-column multi-data type internationalized sort extension method for WEB applications
CN1492361A (en) Processing method for embedded data bank searching
CN103064841A (en) Retrieval device and retrieval method
CN1499403A (en) Method and system of computer aided analyzing patent data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: Liu Yang

Document name: Written notice of preliminary examination of application for patent for invention

Addressee: Liu Yang

Document name: Deemed not to advise

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20071121