CN108415903B - Evaluation method, storage medium, and apparatus for judging validity of search intention recognition - Google Patents
- Publication number
- CN108415903B
- Authority
- CN
- China
- Prior art keywords
- intention
- search
- participle
- search intention
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention provides an evaluation method for judging the validity of search intention recognition, which comprises the following steps: S1, acquiring the total number of participles (the tokens produced by word segmentation) and the information of each participle in the search intention recognition process to be evaluated; S2, calculating, from this information, the sum of word frequencies and the inverse document frequency corresponding to each participle; and S3, calculating the validity score of the search intention recognition process to be evaluated from the sum of word frequencies and the inverse document frequency obtained for each participle in step S2, and then judging whether the search intention recognition process is valid. The invention also relates to a related computer-readable storage medium and an electronic device.
Description
Technical Field
The invention relates to the field of big data search, in particular to an evaluation method for judging the identification effectiveness of a search intention, a related storage medium and electronic equipment.
Background
On a live-streaming platform, the user's real intention can be inferred from the search query, and more accurate search results can be returned based on that intention. However, it is unclear how well the returned results actually reflect the user's real intention: if the relevance is poor, the practical value of such weakly identified intent may be very low. The problem to be solved is therefore how to measure the relevance of intention matching so as to judge the validity of intention recognition.
Unlike the scenario in which results are returned directly by text matching against the search terms, the search results returned by an intent recognition algorithm may have no textual relevance to the search terms, so it is difficult to measure relevance using text edit distance.
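The limitation described above can be illustrated with a minimal sketch. The query and result strings below are hypothetical and are not taken from the patent; the function is the standard Levenshtein edit distance, used here only to show that an intent-matched result can score poorly on textual similarity even when it may be what the user wanted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical strings, for illustration only.
query = "funny cat clips"
text_matched_result = "funny cat clips 2018"   # returned by plain text matching
intent_matched_result = "Pet Zone"             # returned via a matched intention domain

print(text_similarity(query, text_matched_result))
print(text_similarity(query, intent_matched_result))
```

The second similarity is much lower than the first, although the result may still satisfy the user's intention, which is why a measure that does not depend on textual overlap is needed.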
Therefore, it is necessary to propose a new evaluation method for judging the validity of the search intention recognition.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, embodiments of the present invention provide an evaluation method for determining validity of search intention recognition based on TF-IDF.
According to an aspect of the present invention, there is provided an evaluation method of judging validity of search intention recognition, including the steps of:
S1, acquiring the total number of participles (the tokens produced by word segmentation) and the information of each participle in the search intention recognition process to be evaluated;
S2, calculating, from this information, the sum of word frequencies and the inverse document frequency corresponding to each participle;
and S3, calculating the validity score of the search intention recognition process to be evaluated from the sum of word frequencies and the inverse document frequency obtained for each participle in step S2, and then judging whether the search intention recognition process is valid.
For example, the information includes: the intention domains matched by each participle, each intention domain having a preset weight; the number of times each participle is matched in each such intention domain; and the total number of user searches in a preset time period together with the number of those searches that contain each participle.
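For example, the sum of word frequencies corresponding to each participle $t_i$ is calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f}\, w_f$$
where $H$ is the set of matched intention domains and consists of several different intention domains, and $f$ is one intention domain in $H$;
$c_f(t_i)$ is the number of times participle $t_i$ is matched in intention domain $f$;
$n_f$ is the number of words in intention domain $f$;
$w_f$ is the weight of intention domain $f$.
For example, the inverse document frequency corresponding to each participle is calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where $N$ is the total number of user searches in the preset time period; $n(t_i)$ is the number of those searches that contain participle $t_i$; and $\log$ is the natural logarithm.
For example, the validity score $R$ of the search intention recognition process to be evaluated is calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where $\mathrm{TF}(t_i)$ is the sum of word frequencies of participle $t_i$, $\mathrm{IDF}(t_i)$ is its inverse document frequency, and $n$ is the total number of participles.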
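Substituting the two per-participle quantities into the score gives a single expression in the raw inputs. The notation is partly an assumption made for illustration: $c_f(t_i)$ stands for the number of times participle $t_i$ is matched in intention domain $f$, a quantity the text describes but does not name.

$$R = \sum_{i=1}^{n} \left( \sum_{f \in H} \frac{c_f(t_i)}{n_f}\, w_f \right) \log \frac{N}{n(t_i)}$$

One consequence of this form is that a participle contained in every search, with $n(t_i) = N$, contributes $\log 1 = 0$ to $R$ no matter how many intention-domain words it matches.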
Further, step S3 further includes:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; and if the validity score is smaller than a preset threshold value, judging that the search intention identification process is invalid.
The present invention also provides a computer-readable storage medium having stored thereon executable instructions, characterized in that the instructions, when executed by a processor, implement the steps of any of the evaluation methods for judging the validity of search intention recognition as described above.
The present invention also provides an electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory to implement the steps of any of the evaluation methods for determining the validity of search intention recognition as described above.
Compared with the prior art, the method can scientifically and accurately judge whether the search intention identification is effective or not, and solves the problem that the traditional correlation evaluation method cannot be applied.
Drawings
Other objects and advantages of the present invention will become apparent from the following description of the invention which refers to the accompanying drawings, and may assist in a comprehensive understanding of the invention.
FIG. 1 is a flowchart illustrating the implementation steps of an evaluation method for determining validity of search intention recognition according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention. It should be apparent that the described embodiment is one embodiment of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
In this context, the expression search intention refers to the content of intention that the user actually wants to search, as judged from the user search query phrase.
The expression intention domain refers to a set of intentions obtained by dividing users' search intentions according to business experience; in live-streaming search, a streamer (anchor) intention domain and a category (partition) intention domain are common examples. Each intention domain is composed of several index words.
According to one aspect of the invention, an evaluation method for judging the effectiveness of search intention recognition is provided, and the specific implementation idea is as follows:
based on TF-IDF, the word frequency of every participle obtained by word segmentation during the search intention recognition process, and the frequency with which each participle appears in searches within a preset time period, are calculated; from these, an evaluation score for the search intention recognition process is obtained, and whether the recognition process is valid can then be judged.
More specifically, the evaluation method for determining the validity of search intention recognition according to the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, an evaluation method for determining validity of search intention recognition according to an embodiment of the present invention may include the following steps:
S1, acquiring the total number of participles and the information of each participle in the search intention recognition process to be evaluated;
In this embodiment, the information of each participle may include the intention domains matched by that participle. It should be noted that the number of intention domains matched by each participle may differ, and so may their categories. For example, participle $t_1$ may match intention domain A, while participle $t_2$ may match intention domain B and intention domain C. Each intention domain has a preset weight, which can be set according to prior business experience.
The information of each participle may also include the number of times the participle is matched in each of its matched intention domains, i.e. the number of times the participle matches a word in that intention domain. For example, participle $t_1$ may match words in intention domain A 5 times, and participle $t_2$ may match words in intention domain B 2 times and words in intention domain C 3 times.
The information of each participle may also include the total number of user searches in a preset time period and the number of those searches that contain the participle. In the present embodiment, the preset time period may be 30 days. Of course, in other embodiments, other lengths of time are possible.
For example, over 30 days all users perform a total of 100000 searches, of which 100 contain participle $t_1$ and 200 contain participle $t_2$.
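The information collected in step S1 can be pictured as a small record per recognition pass. The sketch below is one possible layout, not the patent's data model: the class and field names (`ParticipleInfo`, `RecognitionRecord`, `matches`, `searches_containing`) are hypothetical, while the numbers are the ones given in the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParticipleInfo:
    """Per-participle information gathered in step S1 (illustrative field names)."""
    text: str
    # intention-domain name -> number of times the participle matched a word there
    matches: Dict[str, int] = field(default_factory=dict)
    # number of searches in the time window that contain this participle, n(t_i)
    searches_containing: int = 0


@dataclass
class RecognitionRecord:
    """Everything S1 collects for one recognition pass."""
    participles: List[ParticipleInfo]
    total_searches: int        # N, all user searches in the preset time period
    window_days: int = 30      # preset time period used in this embodiment

    @property
    def participle_count(self) -> int:
        """Total number of participles, n."""
        return len(self.participles)


# Values from the example in the text: t1 matches A 5 times; t2 matches B twice
# and C three times; 100000 searches in 30 days, 100 with t1 and 200 with t2.
record = RecognitionRecord(
    participles=[
        ParticipleInfo("t1", {"A": 5}, searches_containing=100),
        ParticipleInfo("t2", {"B": 2, "C": 3}, searches_containing=200),
    ],
    total_searches=100_000,
)
```

Keeping the per-domain match counts in a mapping lets a participle match any number of intention domains, which the text explicitly allows.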
S2, calculating, from this information, the sum of word frequencies and the inverse document frequency corresponding to each participle;
In this embodiment, the sum of word frequencies corresponding to each participle $t_i$ may be calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f}\, w_f$$
where $H$ is the set of intention domains that can be matched by all the participles and consists of several different intention domains, and $f$ is one of these intention domains;
$c_f(t_i)$ is the number of times participle $t_i$ is matched in intention domain $f$;
$n_f$ is the number of words in intention domain $f$;
$w_f$ is the weight of intention domain $f$.
In this embodiment, the inverse document frequency corresponding to each participle may be calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where $N$ is the total number of user searches in the preset time period; $n(t_i)$ is the number of those searches that contain participle $t_i$; and $\log$ is the natural logarithm.
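The two quantities of step S2 can be sketched as two small functions. This is a minimal illustration under stated assumptions: the function names, the `(n_f, w_f)` tuple layout, and the error raised when a participle appears in no search are choices made here, since the text does not say how that case is handled.

```python
import math
from typing import Mapping, Tuple


def weighted_tf(matches: Mapping[str, int],
                domains: Mapping[str, Tuple[int, float]]) -> float:
    """Sum over matched intention domains of (match count / n_f) * w_f.

    `matches` maps a domain name to the participle's match count in it;
    `domains` maps a domain name to (n_f, w_f). Both layouts are assumptions.
    """
    total = 0.0
    for name, count in matches.items():
        word_count, weight = domains[name]
        total += count / word_count * weight
    return total


def inverse_document_frequency(total_searches: int, searches_containing: int) -> float:
    """Natural log of N / n(t_i), as stated in the text."""
    if searches_containing <= 0:
        # The source does not cover this case; raising is an assumption.
        raise ValueError("participle must appear in at least one search")
    return math.log(total_searches / searches_containing)
```

Because the logarithm is natural, a participle that appears in 1 of every $e \approx 2.718$ searches has an inverse document frequency of 1.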
S3, calculating the validity score of the search intention recognition process to be evaluated from the sum of word frequencies and the inverse document frequency obtained for each participle in step S2, and then judging whether the search intention recognition process is valid.
In this embodiment, the validity score $R$ of the search intention recognition process to be evaluated may be calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where $\mathrm{TF}(t_i)$ is the sum of word frequencies of participle $t_i$, $\mathrm{IDF}(t_i)$ is its inverse document frequency, and $n$ is the total number of participles.
In a further preferred embodiment, step S3 may further include:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; if the validity score is less than the preset threshold, it may be determined that the search intent recognition process is invalid.
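The threshold comparison can be sketched as follows. The names `Verdict` and `judge` are hypothetical, and the threshold values in the usage lines are made up for illustration. Note that the text only covers scores strictly greater than or strictly smaller than the threshold; the `UNDECIDED` outcome for an exact tie is an assumption added here to make that gap explicit.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNDECIDED = "undecided"   # score equals threshold; not specified in the text


def judge(score: float, threshold: float) -> Verdict:
    """Compare a validity score with a preset threshold."""
    if score > threshold:
        return Verdict.VALID
    if score < threshold:
        return Verdict.INVALID
    return Verdict.UNDECIDED


# Hypothetical thresholds, for illustration only.
print(judge(0.0967, 0.05))   # Verdict.VALID
print(judge(0.0967, 0.20))   # Verdict.INVALID
```

In practice an implementer would likely fold the tie into one of the two outcomes; which one is a business decision that the text leaves open.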
The following practical example illustrates how the validity of search intention recognition is evaluated.
Suppose there are three intention domains, with the following word counts and weights:
Intention domain A: $n_A = 1000$, $w_A = 1.0$
Intention domain B: $n_B = 500$, $w_B = 0.5$
Intention domain C: $n_C = 100$, $w_C = 0.8$
In one recognition pass, word segmentation yields two participles, $t_1$ and $t_2$.
$t_1$ matches words in intention domain A 5 times; $t_2$ matches words in intention domain B 2 times and words in intention domain C 1 time.
Users perform a total of 100000 searches in 30 days, of which 100 contain participle $t_1$ and 200 contain participle $t_2$.
The intent match relevance score for this recognition pass is then:
5/1000*1.0*log(100000/100) + (2/500*0.5 + 1/100*0.8)*log(100000/200) ≈ 0.0967
0.0967 is then compared with a preset threshold to determine whether the search intent recognition is valid.
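The arithmetic of this example can be checked with a short script. It is a sketch rather than the patent's implementation: the variable names and the data layout are assumptions, while every number comes from the example above.

```python
import math

# Intention domains from the example: name -> (number of words n_f, weight w_f).
DOMAINS = {"A": (1000, 1.0), "B": (500, 0.5), "C": (100, 0.8)}
TOTAL_SEARCHES = 100_000  # N, searches in the 30-day window

# One entry per participle: (match counts per domain, searches containing it).
PARTICIPLES = [
    ({"A": 5}, 100),           # t1
    ({"B": 2, "C": 1}, 200),   # t2
]


def validity_score(participles, domains, total_searches):
    """Sum of weighted term frequency times natural-log IDF over all participles."""
    score = 0.0
    for matches, containing in participles:
        tf = sum(count / domains[name][0] * domains[name][1]
                 for name, count in matches.items())
        score += tf * math.log(total_searches / containing)
    return score


R = validity_score(PARTICIPLES, DOMAINS, TOTAL_SEARCHES)
print(f"{R:.4f}")  # prints 0.0967
```

The contribution of $t_1$ is about 0.0345 and that of $t_2$ about 0.0621, so the rarer participle does not dominate here: $t_2$'s larger weighted term frequency outweighs its smaller inverse document frequency.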
The evaluation method for judging the validity of the search intention recognition provided by the embodiment can solve the problem that the traditional correlation evaluation method cannot be applied, and can judge the validity of the intention recognition more scientifically and effectively.
Based on the same inventive concept, as shown in fig. 2, the embodiment of the present invention further provides a computer-readable storage medium 201, on which executable instructions 202 are stored, and when the executable instructions 202 are executed by one or more processors, the steps of the evaluation method for judging the validity of the search intention identification as the above embodiment can be implemented.
Based on the same inventive concept, referring to fig. 3, an embodiment of the present invention further provides an electronic device 301, where the electronic device 301 may include:
a memory 310 for storing executable instructions 311; and
a processor 320 for executing the executable instructions 311 stored in the memory 310 to implement the steps of the evaluation method for judging the validity of the search intention identification as any one of the above embodiments.
It should also be noted that, provided there is no conflict, features of the embodiments and examples of the present invention may be combined with each other to obtain new embodiments.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (4)
1. An evaluation method for judging the effectiveness of search intention recognition comprises the following steps:
S1, acquiring the total number of participles and information of each participle in the search intention recognition process to be evaluated, wherein the information comprises: at least one intention domain matched by each participle, each intention domain having a preset weight; the number of times each participle is matched in the intention domain; and the total number of user searches in a preset time period together with the number of those searches that contain each participle;
S2, calculating, from the information, the sum of word frequencies and the inverse document frequency corresponding to each participle;
S3, calculating the validity score of the search intention recognition process to be evaluated from the sum of word frequencies and the inverse document frequency obtained for each participle in step S2, and then judging whether the search intention recognition process is valid;
wherein the sum of word frequencies corresponding to each participle $t_i$ is calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f}\, w_f$$
where $H$ is the set of matched intention domains and consists of several different intention domains, and $f$ is one intention domain in $H$;
$c_f(t_i)$ is the number of times participle $t_i$ is matched in intention domain $f$;
$n_f$ is the number of words in intention domain $f$;
$w_f$ is the weight of intention domain $f$;
the inverse document frequency corresponding to each participle is calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where $N$ is the total number of user searches in the preset time period; $n(t_i)$ is the number of those searches that contain participle $t_i$; and $\log$ is the natural logarithm;
the validity score $R$ of the search intention recognition process to be evaluated is calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where $n$ is the total number of participles.
2. The method of claim 1, wherein step S3 further comprises:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; and if the validity score is smaller than a preset threshold value, judging that the search intention identification process is invalid.
3. A computer-readable storage medium having stored thereon executable instructions, wherein the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-2.
4. An electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing executable instructions stored in the memory to implement the steps of the method of any of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810202366.5A CN108415903B (en) | 2018-03-12 | 2018-03-12 | Evaluation method, storage medium, and apparatus for judging validity of search intention recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810202366.5A CN108415903B (en) | 2018-03-12 | 2018-03-12 | Evaluation method, storage medium, and apparatus for judging validity of search intention recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108415903A | 2018-08-17 |
CN108415903B | 2021-09-07 |
Family
ID=63131129
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810202366.5A (Active) | 2018-03-12 | 2018-03-12 | Evaluation method, storage medium, and apparatus for judging validity of search intention recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108415903B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101661474A (en) * | 2008-08-26 | 2010-03-03 | 华为技术有限公司 | Search method and system |
CN102768679A (en) * | 2012-06-25 | 2012-11-07 | 深圳市汉络计算机技术有限公司 | Searching method and searching system |
CN102999521A (en) * | 2011-09-15 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for identifying search requirement |
CN103186574A (en) * | 2011-12-29 | 2013-07-03 | 北京百度网讯科技有限公司 | Method and device for generating searching result |
CN103823906A (en) * | 2014-03-19 | 2014-05-28 | 北京邮电大学 | Multi-dimension searching sequencing optimization algorithm and tool based on microblog data |
CN104679778A (en) * | 2013-11-29 | 2015-06-03 | 腾讯科技(深圳)有限公司 | Search result generating method and device |
CN104838375A (en) * | 2012-11-13 | 2015-08-12 | 微软技术许可有限责任公司 | Intent-based presentation of search results |
CN105005589A (en) * | 2015-06-26 | 2015-10-28 | 腾讯科技(深圳)有限公司 | Text classification method and text classification device |
CN106959971A (en) * | 2016-01-12 | 2017-07-18 | 阿里巴巴集团控股有限公司 | The processing method and processing device of user behavior data |
CN107133259A (en) * | 2017-03-22 | 2017-09-05 | 北京晓数聚传媒科技有限公司 | A kind of searching method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101820592A (en) * | 2009-02-27 | 2010-09-01 | 华为技术有限公司 | Method and device for mobile search |
CN103246681B (en) * | 2012-02-13 | 2018-10-26 | 深圳市世纪光速信息技术有限公司 | A kind of searching method and device |
CN104050163B (en) * | 2013-03-11 | 2017-08-25 | 广州帷策智能科技有限公司 | Content recommendation system |
US11392629B2 (en) * | 2014-11-18 | 2022-07-19 | Oracle International Corporation | Term selection from a document to find similar content |
CN106021626A (en) * | 2016-07-27 | 2016-10-12 | 成都四象联创科技有限公司 | Data search method based on data mining |
CN106502980B (en) * | 2016-10-09 | 2019-05-17 | 武汉斗鱼网络科技有限公司 | A kind of search method and system based on text morpheme cutting |
- 2018-03-12: CN application CN201810202366.5A filed (patent CN108415903B); status: Active
Also Published As
Publication number | Publication date |
---|---|
CN108415903A (en) | 2018-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107329949B (en) | Semantic matching method and system | |
Hartawan et al. | Using vector space model in question answering system | |
WO2020140373A1 (en) | Intention recognition method, recognition device and computer-readable storage medium | |
WO2021212801A1 (en) | Evaluation object identification method and apparatus for e-commerce product, and storage medium | |
CN106874441A (en) | Intelligent answer method and apparatus | |
CN109062912B (en) | Translation quality evaluation method and device | |
CN110990533B (en) | Method and device for determining standard text corresponding to query text | |
WO2015085805A1 (en) | Method and apparatus for determining core word of image cluster description text | |
CN108763272B (en) | A kind of event information analysis method, computer readable storage medium and terminal device | |
CN110765760A (en) | Legal case distribution method and device, storage medium and server | |
US9087122B2 (en) | Corpus search improvements using term normalization | |
CN111274366A (en) | Search recommendation method and device, equipment and storage medium | |
JP2009193219A (en) | Indexing apparatus, method thereof, program, and recording medium | |
CN110362813B (en) | Search relevance measuring method, storage medium, device and system based on BM25 | |
CN110674388A (en) | Mapping method and device for push item, storage medium and terminal equipment | |
CN108415903B (en) | Evaluation method, storage medium, and apparatus for judging validity of search intention recognition | |
CN106202127B (en) | Method and device for processing retrieval request by vertical search engine | |
CN111476026A (en) | Statement vector determination method and device, electronic equipment and storage medium | |
CN112182448A (en) | Page information processing method, device and equipment | |
CN113240322B (en) | Climate risk disclosure quality method, apparatus, electronic device, and storage medium | |
CN112115237B (en) | Construction method and device of tobacco science and technology literature data recommendation model | |
CN110909532B (en) | User name matching method and device, computer equipment and storage medium | |
CN110851560B (en) | Information retrieval method, device and equipment | |
CN114491056A (en) | Method and system for improving POI (Point of interest) search in digital police scene | |
CN113591004A (en) | Game tag generation method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||