CN108415903B - Evaluation method, storage medium, and apparatus for judging validity of search intention recognition

Info

Publication number: CN108415903B
Authority: CN (China)
Prior art keywords: intention, search, participle, search intention, domain
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201810202366.5A
Other languages: Chinese (zh)
Other versions: CN108415903A (en)
Inventors: 王璐, 陈少杰, 张文明
Current Assignee: Wuhan Douyu Network Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Wuhan Douyu Network Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention provides an evaluation method for judging the validity of search intention recognition, comprising the following steps: S1, acquiring the total number of tokens and the information of each token in the search intention recognition process to be evaluated; S2, calculating, from that information, the word-frequency sum and the inverse document frequency corresponding to each token; and S3, calculating a validity score for the search intention recognition process from the word-frequency sums and inverse document frequencies obtained in step S2, and judging from the score whether the recognition process is valid. The invention also relates to a related computer-readable storage medium and an electronic device.

Description

Evaluation method, storage medium, and apparatus for judging validity of search intention recognition
Technical Field
The invention relates to the field of big data search, and in particular to an evaluation method for judging the validity of search intention recognition, together with a related storage medium and electronic device.
Background
On a live broadcast platform, the user's real intention can be inferred from the user's search query, and more accurate search results can be returned on the basis of that intention. However, it is unclear to what extent the returned results reflect the user's true intention: if the correlation is poor, such a weakly identified intention may have very little practical value. The problem to be solved is therefore how to measure the relevance of intention matching so as to judge the validity of intention recognition.
Unlike the scenario where results are returned directly from text matching of search terms, the search results returned after applying an intention recognition algorithm may have no textual relevance to the search terms, so it is very difficult to measure relevance using text edit distance.
Therefore, it is necessary to propose a new evaluation method for judging the validity of the search intention recognition.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, embodiments of the present invention provide an evaluation method for determining validity of search intention recognition based on TF-IDF.
According to an aspect of the present invention, there is provided an evaluation method of judging validity of search intention recognition, including the steps of:
S1, acquiring the total number of tokens and the information of each token in the search intention recognition process to be evaluated;
S2, calculating, from that information, the word-frequency sum and the inverse document frequency corresponding to each token;
and S3, calculating the validity score of the search intention recognition process to be evaluated from the word-frequency sum and the inverse document frequency of each token obtained in step S2, and then judging whether the search intention recognition process is valid.
For example, the information includes: the intention domains matched by each token, each intention domain having a preset weight; the number of times each token is matched in each intention domain; the total number of user searches in a preset time period; and the number of those searches that contain each token.
For example, the word-frequency sum corresponding to each token is calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{m_{t_i,f}}{n_f}\, w_f$$
where H is the set of matched intention domains, consisting of several different intention domains, and f is one of those intention domains;
$m_{t_i,f}$ is the number of times token $t_i$ can be matched in intention domain f;
$n_f$ is the number of words in intention domain f;
$w_f$ is the weight of intention domain f.
For example, the inverse document frequency corresponding to each token is calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where N is the total number of user searches in a preset time period; $n(t_i)$ is the number of those searches that contain token $t_i$; and log is the natural logarithm.
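For example, the validity score R of the search intention recognition process to be evaluated is calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where n is the total number of tokens, and $\mathrm{TF}(t_i)$ and $\mathrm{IDF}(t_i)$ are the word-frequency sum and the inverse document frequency of token $t_i$ obtained in step S2.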
Further, step S3 further includes:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; and if the validity score is smaller than a preset threshold value, judging that the search intention identification process is invalid.
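Taken together, the word-frequency sum, the inverse document frequency, and the threshold comparison can be written as a single expression. The following is a reconstruction from the symbol definitions above, consistent with the worked example in the Detailed Description. The symbol $m_{t_i,f}$ (the number of times token $t_i$ is matched in intention domain $f$) and the symbol $T$ for the preset threshold are notation assumed here, not taken from the source.

```latex
R \;=\; \sum_{i=1}^{n} \left( \sum_{f \in H} \frac{m_{t_i,f}}{n_f}\, w_f \right) \log \frac{N}{n(t_i)},
\qquad
\text{recognition is judged}
\begin{cases}
\text{valid}   & \text{if } R > T,\\
\text{invalid} & \text{if } R < T.
\end{cases}
```

The case $R = T$ is not addressed by the text and is left open here as well.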
The present invention also provides a computer-readable storage medium having stored thereon executable instructions, characterized in that the instructions, when executed by a processor, implement the steps of any of the evaluation methods for judging the validity of search intention recognition as described above.
The present invention also provides an electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory to implement the steps of any of the evaluation methods for determining the validity of search intention recognition as described above.
Compared with the prior art, the method can scientifically and accurately judge whether the search intention identification is effective or not, and solves the problem that the traditional correlation evaluation method cannot be applied.
Drawings
Other objects and advantages of the present invention will become apparent from the following description of the invention which refers to the accompanying drawings, and may assist in a comprehensive understanding of the invention.
Fig. 1 is a flowchart illustrating implementation steps of an evaluation method for determining validity of search intention recognition according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention. It should be apparent that the described embodiment is one embodiment of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
In this context, the expression "search intention" refers to the content that the user actually wants to search for, as judged from the user's search query phrase.
The expression "intention domain" refers to a set of intentions obtained by dividing users' search intentions according to business experience; in live broadcast search, common examples are a streamer (anchor) intention domain and a partition (category) intention domain. Each intention domain is composed of several index words.
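As a rough illustration of an intention domain as a collection of index words, the sketch below counts how many index words a token matches. The domain contents, the function name `count_matches`, and the use of substring matching are all assumptions made here for illustration; the text does not specify how a match is determined.

```python
from typing import Iterable


def count_matches(token: str, index_words: Iterable[str]) -> int:
    """Count the index words of one intention domain that a token matches.

    Assumption: a "match" means the token occurs as a substring of the
    index word. The source does not define the matching rule.
    """
    return sum(1 for word in index_words if token in word)


# Hypothetical streamer-name domain with three index words.
streamer_domain = ["alice_live", "alice_games", "bob_stream"]

print(count_matches("alice", streamer_domain))  # 2
print(count_matches("carol", streamer_domain))  # 0
```

A different matching rule (exact match, prefix match, and so on) would only change the body of `count_matches`.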
According to one aspect of the invention, an evaluation method for judging the effectiveness of search intention recognition is provided, and the specific implementation idea is as follows:
Based on TF-IDF, the word frequency of every token produced by word segmentation during the search intention recognition process, and the frequency with which each token appears in searches within a preset time period, are calculated to obtain an evaluation score for the recognition process, from which it can be judged whether the recognition process is valid.
More specifically, the evaluation method for determining the validity of search intention recognition according to the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, an evaluation method for determining validity of search intention recognition according to an embodiment of the present invention may include the following steps:
S1, acquiring the total number of tokens and the information of each token in the search intention recognition process to be evaluated;
In this embodiment, the information of each token may include the intention domains matched by that token. Note that the number and category of matched intention domains may differ between tokens: for example, token $t_1$ may match intention domain A, while token $t_2$ may match intention domain B and intention domain C. Each intention domain has a preset weight, which can be set according to prior business experience.
The information of each token may also include the number of times the token can be matched in each of its matched intention domains, i.e. the number of times the token matches words in that intention domain. For example, token $t_1$ may match words in intention domain A 5 times, while token $t_2$ matches words in intention domain B 2 times and words in intention domain C 3 times.
The information of each token may also include the total number of searches by users in a preset time period and the number of those searches that contain the token. In this embodiment, the preset time period may be 30 days; in other embodiments, other lengths of time are possible.
For example, if all users perform a total of 100000 searches in 30 days, the number of searches containing token $t_1$ may be 100 and the number containing token $t_2$ may be 200.
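The information gathered in step S1 can be pictured as a small record per recognition run. The sketch below is one possible layout, not the patent's own data model: the class and field names are hypothetical, and the values are the illustrative figures from this step's description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IntentDomain:
    """One intention domain (hypothetical layout)."""
    name: str
    word_count: int   # number of words in the domain
    weight: float     # preset weight, set from business experience


@dataclass
class TokenInfo:
    """Per-token information collected in step S1."""
    token: str
    # Intention domain name -> number of times the token matched in it.
    matches: Dict[str, int] = field(default_factory=dict)
    # Number of searches in the time window that contain this token.
    searches_containing: int = 0


@dataclass
class RecognitionRecord:
    """Everything step S1 hands to steps S2 and S3."""
    tokens: List[TokenInfo]
    total_searches: int  # all searches in the preset time period

    @property
    def token_count(self) -> int:
        return len(self.tokens)


record = RecognitionRecord(
    tokens=[
        TokenInfo("t1", {"A": 5}, searches_containing=100),
        TokenInfo("t2", {"B": 2, "C": 3}, searches_containing=200),
    ],
    total_searches=100_000,
)
print(record.token_count)  # 2
```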
S2, calculating, from that information, the word-frequency sum and the inverse document frequency corresponding to each token;
In this embodiment, the word-frequency sum corresponding to each token may be calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{m_{t_i,f}}{n_f}\, w_f$$
where H is the set of intention domains that the tokens can match, consisting of several different intention domains, and f is one of those intention domains;
$m_{t_i,f}$ is the number of times token $t_i$ can be matched in intention domain f;
$n_f$ is the number of words in intention domain f;
$w_f$ is the weight of intention domain f.
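The word-frequency sum described above can be sketched as a short function: for each matched intention domain, divide the match count by the number of words in the domain, multiply by the domain weight, and add the results. The function and parameter names are hypothetical; the sample numbers are those of the worked example later in this description.

```python
from typing import Dict


def weighted_tf(
    matches: Dict[str, int],
    domain_sizes: Dict[str, int],
    domain_weights: Dict[str, float],
) -> float:
    """Sum match_count / domain_size * domain_weight over matched domains."""
    total = 0.0
    for domain, count in matches.items():
        total += count / domain_sizes[domain] * domain_weights[domain]
    return total


sizes = {"A": 1000, "B": 500, "C": 100}
weights = {"A": 1.0, "B": 0.5, "C": 0.8}

tf_t1 = weighted_tf({"A": 5}, sizes, weights)          # 5/1000 * 1.0
tf_t2 = weighted_tf({"B": 2, "C": 1}, sizes, weights)  # 2/500 * 0.5 + 1/100 * 0.8
```

Here `tf_t1` is approximately 0.005 and `tf_t2` approximately 0.01, matching the two bracketed terms of the worked example.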
In this embodiment, the inverse document frequency corresponding to each token may be calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where N is the total number of user searches in a preset time period; $n(t_i)$ is the number of those searches that contain token $t_i$; and log is the natural logarithm.
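A minimal sketch of the inverse document frequency as described, using the natural logarithm of the ratio between total searches and searches containing the token. The function name and the guard against non-positive counts are additions made here; the text does not say how a token that never appears in any search should be handled.

```python
import math


def inverse_document_frequency(total_searches: int, searches_with_token: int) -> float:
    """Natural log of total searches over searches containing the token."""
    if total_searches <= 0 or searches_with_token <= 0:
        # Assumption: undefined inputs are rejected rather than smoothed.
        raise ValueError("search counts must be positive")
    return math.log(total_searches / searches_with_token)


idf_t1 = inverse_document_frequency(100_000, 100)  # log(1000)
idf_t2 = inverse_document_frequency(100_000, 200)  # log(500)
```

A token that appears in every search gets a value of 0, so it contributes nothing to the final score.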
S3, calculating the validity score of the search intention recognition process to be evaluated from the word-frequency sum and the inverse document frequency of each token obtained in step S2, and then judging whether the search intention recognition process is valid.
In this embodiment, the validity score R of the search intention recognition process to be evaluated may be calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where n is the total number of tokens, and $\mathrm{TF}(t_i)$ and $\mathrm{IDF}(t_i)$ are the word-frequency sum and the inverse document frequency of token $t_i$ obtained in step S2.
In a further preferred embodiment, step S3 may further include:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; if the validity score is less than the preset threshold, it may be determined that the search intent recognition process is invalid.
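The threshold comparison can be sketched as a small helper. The function name and the example threshold values are hypothetical. The text does not state how a score exactly equal to the threshold is treated, so this sketch assumes it is not valid.

```python
def is_recognition_valid(score: float, threshold: float) -> bool:
    """Return True when the validity score strictly exceeds the threshold.

    Assumption: a score equal to the threshold counts as not valid,
    a case the source leaves unspecified.
    """
    return score > threshold


print(is_recognition_valid(0.0967, 0.05))  # True
print(is_recognition_valid(0.0967, 0.10))  # False
```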
The following practical example illustrates how the validity of search intention recognition is evaluated.
Suppose there are three intention domains, with the following numbers of words and weights:
Intention domain A: $n_A = 1000$, $w_A = 1.0$
Intention domain B: $n_B = 500$, $w_B = 0.5$
Intention domain C: $n_C = 100$, $w_C = 0.8$
In one recognition, word segmentation splits the query into two tokens, $t_1$ and $t_2$,
where $t_1$ matches words in intention domain A 5 times, and $t_2$ matches words in intention domain B 2 times and words in intention domain C 1 time.
Users perform a total of 100000 searches in 30 days, of which 100 contain token $t_1$ and 200 contain token $t_2$.
The intention-matching relevance score for this search is then:
5/1000*1.0*log(100000/100)+(2/500*0.5+1/100*0.8)*log(100000/200)=0.0967
The score 0.0967 is then compared with a preset threshold to determine whether the search intention recognition is valid.
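The arithmetic of this example can be checked with a few lines of Python. This is a sketch under the stated figures only: the variable and function names are hypothetical, and the score is taken as the plain sum of word-frequency sum times inverse document frequency over the tokens, which is what the expression above computes.

```python
import math

# Intention domain -> (number of words, weight), from the example.
domains = {"A": (1000, 1.0), "B": (500, 0.5), "C": (100, 0.8)}

# Token -> (match counts per domain, searches containing the token).
tokens = {
    "t1": ({"A": 5}, 100),
    "t2": ({"B": 2, "C": 1}, 200),
}

total_searches = 100_000  # all searches in the 30-day window


def validity_score(tokens, domains, total_searches):
    """Sum over tokens of (weighted word-frequency sum) * ln(N / n(t))."""
    score = 0.0
    for matches, searches_with_token in tokens.values():
        tf_sum = sum(
            count / domains[name][0] * domains[name][1]
            for name, count in matches.items()
        )
        idf = math.log(total_searches / searches_with_token)
        score += tf_sum * idf
    return score


score = validity_score(tokens, domains, total_searches)
print(round(score, 4))  # 0.0967
```

The result agrees with the stated 0.0967 only when the natural logarithm is used, which confirms the reading of "log" given in the definitions.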
The evaluation method for judging the validity of the search intention recognition provided by the embodiment can solve the problem that the traditional correlation evaluation method cannot be applied, and can judge the validity of the intention recognition more scientifically and effectively.
Based on the same inventive concept, as shown in fig. 2, the embodiment of the present invention further provides a computer-readable storage medium 201, on which executable instructions 202 are stored, and when the executable instructions 202 are executed by one or more processors, the steps of the evaluation method for judging the validity of the search intention identification as the above embodiment can be implemented.
Based on the same inventive concept, referring to fig. 3, an embodiment of the present invention further provides an electronic device 301, where the electronic device 301 may include:
a memory 310 for storing executable instructions 311; and
a processor 320 for executing the executable instructions 311 stored in the memory 310 to implement the steps of the evaluation method for judging the validity of the search intention identification as any one of the above embodiments.
It should also be noted that, provided there is no conflict, features of the embodiments and examples of the present invention may be combined with each other to obtain new embodiments.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (4)

1. An evaluation method for judging the effectiveness of search intention recognition comprises the following steps:
S1, acquiring the total number of tokens and the information of each token in the search intention recognition process to be evaluated, wherein the information comprises: at least one intention domain matched by each token, each intention domain having a preset weight; the number of times each token is matched in the intention domain; and the total number of user searches in a preset time period together with the number of those searches that contain each token;
S2, calculating, from that information, the word-frequency sum and the inverse document frequency corresponding to each token;
S3, calculating the validity score of the search intention recognition process to be evaluated from the word-frequency sum and the inverse document frequency of each token obtained in step S2, and then judging whether the search intention recognition process is valid;
wherein the word-frequency sum corresponding to each token is calculated according to the following formula:
$$\mathrm{TF}(t_i) = \sum_{f \in H} \frac{m_{t_i,f}}{n_f}\, w_f$$
where H is the set of matched intention domains, consisting of several different intention domains, and f is one of those intention domains;
$m_{t_i,f}$ is the number of times token $t_i$ can be matched in intention domain f;
$n_f$ is the number of words in intention domain f;
$w_f$ is the weight of intention domain f;
the inverse document frequency corresponding to each token is calculated according to the following formula:
$$\mathrm{IDF}(t_i) = \log \frac{N}{n(t_i)}$$
where N is the total number of user searches in a preset time period; $n(t_i)$ is the number of those searches that contain token $t_i$; and log is the natural logarithm;
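and the validity score R of the search intention recognition process to be evaluated is calculated according to the following formula:
$$R = \sum_{i=1}^{n} \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$
where n is the total number of tokens, and $\mathrm{TF}(t_i)$ and $\mathrm{IDF}(t_i)$ are the word-frequency sum and the inverse document frequency of token $t_i$ obtained in step S2.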
2. The method of claim 1, wherein step S3 further comprises:
comparing the validity score with a preset threshold, and if the validity score is greater than the preset threshold, judging that the search intention identification process is valid; and if the validity score is smaller than a preset threshold value, judging that the search intention identification process is invalid.
3. A computer-readable storage medium having stored thereon executable instructions, wherein the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-2.
4. An electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing executable instructions stored in the memory to implement the steps of the method of any of claims 1-2.
Priority Applications (1)

Application number: CN201810202366.5A; publication: CN108415903B (en); priority date: 2018-03-12; filing date: 2018-03-12; title: Evaluation method, storage medium, and apparatus for judging validity of search intention recognition

Applications Claiming Priority (1)

Application number: CN201810202366.5A; publication: CN108415903B (en); priority date: 2018-03-12; filing date: 2018-03-12; title: Evaluation method, storage medium, and apparatus for judging validity of search intention recognition

Publications (2)

CN108415903A (en), published 2018-08-17
CN108415903B (en), published 2021-09-07

Family

ID=63131129

Family Applications (1)

Application number: CN201810202366.5A; status: Active; publication: CN108415903B (en); priority date: 2018-03-12; filing date: 2018-03-12; title: Evaluation method, storage medium, and apparatus for judging validity of search intention recognition

Country Status (1)

CN: CN108415903B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant