CN112749246B - Evaluation method and device of search phrase, server and storage medium - Google Patents


Info

Publication number
CN112749246B
Authority
CN
China
Prior art keywords
integrity
search phrase
relevance
search
phrase
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911048275.1A
Other languages
Chinese (zh)
Other versions
CN112749246A (en)
Inventor
田沐燃
郝心
李晓亮
黄艺华
刘一岑
曹晟
龙柏炜
张懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911048275.1A
Publication of CN112749246A
Application granted
Publication of CN112749246B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses a search phrase evaluation method, a search phrase evaluation device, a server and a storage medium. The evaluation method of the search phrase comprises the following steps: acquiring a corpus; obtaining a search phrase corresponding to the corpus, wherein the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words whose search volume is greater than a preset threshold; detecting the search phrase to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data. By evaluating the search phrase automatically along multiple dimensions, the embodiment of the invention effectively improves the efficiency of search phrase evaluation.

Description

Evaluation method and device of search phrase, server and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for evaluating a search phrase, a server, and a storage medium.
Background
Contextual search is an important direction in the development of search for content products. By comprehensively considering the context together with the interests and hobbies of the user, contextual search deeply understands the user's intent and fully mines the user's potential search needs, so that it can provide the content the user wants more intelligently and conveniently, breaking through the traditional 'search word-search result' search mode. Recommending related search phrases based on text content, which meets the user's need for extended reading, is one of the main application scenarios of contextual search.
Currently, recommended search phrases are mainly generated by an algorithm model. However, the corpus on which the algorithm model depends is not fully reliable, so the search phrases generated by the algorithm model are unreliable as well. To address this problem, the prior art evaluates the search phrases generated by the algorithm model manually, one by one, but manual evaluation is inefficient.
Disclosure of Invention
The invention provides a search phrase evaluation method, a search phrase evaluation device, a server and a storage medium, which can effectively improve the evaluation efficiency of search phrases.
In a first aspect, the present invention provides a method for evaluating a search phrase, including:
acquiring corpus;
acquiring search phrases recommended based on the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words whose search volume is greater than a preset threshold;
detecting the search phrase to obtain multidimensional feature data of the search phrase;
and evaluating the search phrase according to the multidimensional feature data.
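As a minimal sketch of how the search phrases named in the steps above might be formed, the following example filters entity words by search volume and pairs corpus keywords with matched hot search words. The function names, the `matched` lookup table, and the space-joined phrase format are assumptions made for illustration; the text does not specify how keywords are matched or combined.

```python
from typing import Dict, List


def select_hot_search_words(search_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return entity words whose search volume is strictly greater than the threshold."""
    return sorted(word for word, count in search_counts.items() if count > threshold)


def build_search_phrases(
    keywords: List[str],
    hot_words: List[str],
    matched: Dict[str, List[str]],
) -> List[str]:
    """Combine each corpus keyword with its matched hot search words.

    `matched` is a hypothetical keyword -> candidate-words table; joining with a
    space is an illustrative choice, not a rule stated in the text.
    """
    hot = set(hot_words)
    phrases = []
    for keyword in keywords:
        for candidate in matched.get(keyword, []):
            if candidate in hot:  # only words above the volume threshold qualify
                phrases.append(f"{keyword} {candidate}")
    return phrases


# Toy data (invented for the example).
counts = {"final": 1200, "schedule": 300, "tickets": 900}
hot_words = select_hot_search_words(counts, threshold=500)
phrases = build_search_phrases(
    ["badminton open"], hot_words, {"badminton open": ["final", "schedule"]}
)
```

Here "schedule" is dropped because its search volume does not exceed the threshold, so only one phrase is produced.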
In some embodiments of the present invention, the detecting the search phrase to obtain multidimensional feature data of the search phrase specifically includes:
Detecting the relevance of the search phrase and the corpus to obtain a relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to preset availability conditions to obtain availability;
adding the relevance, the integrity, and the availability to the multidimensional feature data of the search phrase.
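The three dimensions can be pictured as one small record per search phrase. The sketch below is illustrative only: the `FeatureData` container, the integer level encoding, and the rule that a phrase reaches the second availability only when every preset condition holds are all assumptions, and the two sample conditions are invented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FeatureData:
    """Multidimensional feature data of one search phrase (levels as integers)."""
    relevance: int     # 1..3
    integrity: int     # 1..3
    availability: int  # 1..2


def detect_availability(phrase: str, conditions: Iterable[Callable[[str], bool]]) -> int:
    """Return 2 (second availability) if all preset conditions hold, else 1.

    Requiring *all* conditions is an assumption made for this sketch.
    """
    return 2 if all(condition(phrase) for condition in conditions) else 1


# Hypothetical availability conditions.
conditions = [
    lambda p: 0 < len(p) <= 20,       # length limit (invented)
    lambda p: "banned" not in p,      # blocked-word check (invented)
]

features = FeatureData(
    relevance=3,
    integrity=3,
    availability=detect_availability("badminton open final", conditions),
)
```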
In some embodiments of the present invention, the detecting the relevance between the search phrase and the corpus to obtain the relevance specifically includes:
detecting the topic relevance of the search phrase and the corpus to obtain topic relevance;
detecting the entity correlation between the search phrase and the corpus to obtain an entity correlation degree;
and determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance.
In some embodiments of the present invention, the detecting the topic relevance between the search phrase and the corpus to obtain the topic relevance specifically includes:
identifying a central word of the corpus;
identifying core words in the search phrase, wherein the core words comprise the key words or the popular search words;
detecting, according to a pre-established knowledge graph, whether the central word and the core word meet at least one of the preset topic conditions; the topic conditions include belonging to the same concept, being associated with the same event, having an affiliation relationship, or belonging to the same topic;
if not, determining the topic relevance of the search phrase and the corpus as a first topic relevance;
if yes, determining that the topic relevance of the search phrase and the corpus is second topic relevance, wherein the second topic relevance is larger than the first topic relevance.
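The topic check above can be sketched against a toy knowledge graph. The dictionary layout (`concepts`, `events`, `affiliated_with`, `topics`) and the sample entries are assumptions for illustration; the text only states that a pre-established knowledge graph is consulted and that meeting any one condition yields the higher level.

```python
from typing import Dict, Set

# Hypothetical knowledge-graph layout: entity -> attribute -> set of values.
KnowledgeGraph = Dict[str, Dict[str, Set[str]]]


def topic_relevance(central_word: str, core_word: str, kg: KnowledgeGraph) -> int:
    """Return 2 (second topic relevance) if any topic condition holds, else 1."""
    a = kg.get(central_word, {})
    b = kg.get(core_word, {})
    same_concept = bool(a.get("concepts", set()) & b.get("concepts", set()))
    same_event = bool(a.get("events", set()) & b.get("events", set()))
    affiliated = (
        core_word in a.get("affiliated_with", set())
        or central_word in b.get("affiliated_with", set())
    )
    same_topic = bool(a.get("topics", set()) & b.get("topics", set()))
    return 2 if (same_concept or same_event or affiliated or same_topic) else 1


# Toy graph (invented for the example).
kg: KnowledgeGraph = {
    "badminton open": {"concepts": {"sports event"}, "events": {"2019 open"}, "topics": {"sports"}},
    "men's singles final": {"events": {"2019 open"}},
    "stock index": {"concepts": {"finance"}, "topics": {"economy"}},
}

related = topic_relevance("badminton open", "men's singles final", kg)
unrelated = topic_relevance("badminton open", "stock index", kg)
```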
In some embodiments of the present invention, the detecting the entity relevance between the search phrase and the corpus to obtain the entity relevance specifically includes:
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if yes, determining the entity correlation degree of the search phrase and the corpus as a first entity correlation degree;
if not, determining that the entity correlation degree of the search phrase and the corpus is a second entity correlation degree, wherein the second entity correlation degree is larger than the first entity correlation degree.
In some embodiments of the present invention, the determining, according to the topic relevance and the entity relevance, the relevance between the search phrase and the corpus specifically includes:
If the topic relevance is a first topic relevance, determining that the relevance of the search phrase and the corpus is the first relevance;
if the topic relevance is a second topic relevance and the entity relevance is a first entity relevance, determining that the relevance between the search phrase and the corpus is a second relevance;
if the topic relevance is a second topic relevance and the entity relevance is a second entity relevance, determining that the relevance of the search phrase and the corpus is a third relevance, wherein the first relevance, the second relevance and the third relevance are sequentially increased.
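The entity check and the three combination rules reduce to a small decision function. In this sketch the levels are encoded as integers (1 = first, 2 = second, 3 = third), which is an assumed encoding, and the ambiguity test itself is taken as a ready-made boolean because the text does not say how ambiguity is detected.

```python
def entity_relevance(ambiguous: bool) -> int:
    """Return 1 (first entity relevance) if ambiguity exists, else 2."""
    return 1 if ambiguous else 2


def combine_relevance(topic_level: int, entity_level: int) -> int:
    """Map topic and entity relevance (each 1 or 2) to the overall relevance (1..3)."""
    if topic_level not in (1, 2) or entity_level not in (1, 2):
        raise ValueError("levels must be 1 or 2")
    if topic_level == 1:
        return 1  # topic mismatch dominates, whatever the entity level
    return 2 if entity_level == 1 else 3


# All four input combinations.
table = {
    (topic, entity): combine_relevance(topic, entity)
    for topic in (1, 2)
    for entity in (1, 2)
}
```

Note that the entity level only matters once the topic check has passed, which is why both rows with `topic_level == 1` map to the first relevance.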
In some embodiments of the present invention, the detecting the integrity of the search phrase to obtain the integrity specifically includes:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
In some embodiments of the present invention, the detecting the text integrity of the search phrase to obtain the text integrity specifically includes:
Detecting whether the grammar structure of the search phrase is complete;
if the grammar structure is incomplete, determining that the text integrity of the search phrase is the first text integrity;
if the grammar structure is complete, detecting whether the search phrase is a parallel phrase or a bias phrase;
if yes, determining that the text integrity of the search phrase is second text integrity;
if not, determining that the text integrity of the search phrase is third text integrity, wherein the first text integrity, the second text integrity and the third text integrity are sequentially increased.
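The text integrity steps form a two-stage decision, sketched below. The grammar check and the phrase-type classification are assumed to be done elsewhere and passed in; the labels "parallel" and "bias" follow the wording of the text, and any other label (such as "subject-predicate") is a hypothetical placeholder.

```python
def text_integrity(grammar_complete: bool, phrase_type: str) -> int:
    """Return the text integrity level (1..3) of a search phrase."""
    if not grammar_complete:
        return 1  # incomplete grammar structure
    if phrase_type in {"parallel", "bias"}:
        return 2  # complete, but a parallel phrase or a bias phrase
    return 3      # complete and of another phrase type


levels = [
    text_integrity(False, "parallel"),
    text_integrity(True, "parallel"),
    text_integrity(True, "bias"),
    text_integrity(True, "subject-predicate"),
]
```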
In some embodiments of the present invention, the detecting the semantic integrity of the search phrase to obtain the semantic integrity specifically includes:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified to correspond to at least two semantic information, determining that the semantic integrity of the search phrase is a second semantic integrity;
if the search phrase is identified to correspond to one piece of semantic information, determining that the semantic integrity of the search phrase is third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity are sequentially increased.
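Semantic integrity as described above depends only on how many pieces of semantic information are identified. A minimal sketch, assuming the identified interpretations arrive as a list of strings from some upstream recognizer that is not shown here:

```python
from typing import Sequence


def semantic_integrity(interpretations: Sequence[str]) -> int:
    """Return the semantic integrity level (1..3) from the identified interpretations."""
    count = len(interpretations)
    if count == 0:
        return 1  # no semantic information identified
    if count >= 2:
        return 2  # at least two readings: the phrase is semantically ambiguous
    return 3      # exactly one reading


none_found = semantic_integrity([])
two_found = semantic_integrity(["fruit", "company"])
one_found = semantic_integrity(["sports event"])
```

An unambiguous phrase ranks above an ambiguous one, which in turn ranks above a phrase with no recognizable meaning.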
In some embodiments of the present invention, the determining the integrity of the search phrase according to the text integrity and the semantic integrity specifically includes:
if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, determining that the integrity of the search phrase is the first integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the second semantic integrity, determining that the integrity of the search phrase is the second integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the third semantic integrity, determining that the integrity of the search phrase is the third integrity, wherein the first integrity, the second integrity and the third integrity are sequentially increased.
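The combination rules above can be written as a short function and checked over all nine input pairs. The integer encoding of the levels is an assumption of this sketch.

```python
def combine_integrity(text_level: int, semantic_level: int) -> int:
    """Map text integrity and semantic integrity (each 1..3) to overall integrity (1..3)."""
    if text_level not in (1, 2, 3) or semantic_level not in (1, 2, 3):
        raise ValueError("levels must be 1, 2 or 3")
    if text_level == 1 or semantic_level == 1:
        return 1  # either dimension at its lowest level
    if semantic_level == 2:
        return 2  # text level is 2 or 3, semantics ambiguous
    return 3      # text level is 2 or 3, exactly one reading


grid = {
    (text, semantic): combine_integrity(text, semantic)
    for text in (1, 2, 3)
    for semantic in (1, 2, 3)
}
```

Under these rules the second and third text integrity levels lead to the same overall result; only the semantic level separates the second integrity from the third.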
In some embodiments of the present invention, the relevance comprises a first relevance, a second relevance, and a third relevance that are sequentially increased, the integrity comprises a first integrity, a second integrity, and a third integrity that are sequentially increased, and the availability comprises a first availability and a second availability that are sequentially increased;
The step of evaluating the search phrase according to the multidimensional feature data specifically comprises the following steps:
if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree;
if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating that the recommendation of the search phrase is a third recommendation;
otherwise, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
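The final evaluation rule can be sketched as follows, with levels encoded as integers (an assumption of this example). Enumerating every combination shows how strict the third recommendation is.

```python
from collections import Counter
from itertools import product


def recommendation(relevance: int, integrity: int, availability: int) -> int:
    """Return the recommendation level (1..3) from the three feature levels."""
    if relevance == 1 or integrity == 1 or availability == 1:
        return 1  # any dimension at its lowest level
    if relevance == 3 and integrity == 3 and availability == 2:
        return 3  # every dimension at its highest level
    return 2      # everything in between


# Distribution over all 3 x 3 x 2 = 18 possible feature combinations.
distribution = Counter(
    recommendation(r, i, a) for r, i, a in product((1, 2, 3), (1, 2, 3), (1, 2))
)
```

Of the 18 combinations, 14 fall to the first recommendation, 3 to the second, and only 1 reaches the third.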
In some embodiments of the invention, the method further comprises:
acquiring an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into a training sample.
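A sketch of the sample-collection step follows. The preset evaluation condition is not specified in the text, so it is passed in as a predicate; the example condition (keep only clear-cut results, i.e. the first or third recommendation) and the `EvaluationResult` record are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EvaluationResult:
    phrase: str
    recommendation: int  # 1..3


def collect_training_samples(
    results: List[EvaluationResult],
    condition: Callable[[EvaluationResult], bool],
) -> List[Tuple[str, int]]:
    """Return (phrase, label) pairs for results that meet the preset condition."""
    return [(r.phrase, r.recommendation) for r in results if condition(r)]


results = [
    EvaluationResult("badminton open final", 3),
    EvaluationResult("badminton open", 2),
    EvaluationResult("open banned", 1),
]

# Hypothetical preset condition: keep only unambiguous labels.
samples = collect_training_samples(results, lambda r: r.recommendation in (1, 3))
```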
In some embodiments of the invention, the method further comprises:
and storing the evaluation result of the search phrase in the block chain in the form of a block.
In a second aspect, the present invention provides an evaluation apparatus for search phrases, including:
The corpus acquisition module is used for acquiring corpus;
the search phrase acquisition module is used for acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value;
the detection module is used for detecting the search phrase to obtain multidimensional feature data of the search phrase; and
and the evaluation module is used for evaluating the search phrase according to the multidimensional characteristic data.
In a third aspect, the present invention provides a server comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
acquiring corpus;
acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value;
detecting the search phrase to obtain multidimensional feature data of the search phrase;
and evaluating the search phrase according to the multidimensional feature data.
In a fourth aspect, the present invention provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the method of evaluating a search phrase of any of the first aspects.
According to the embodiment of the invention, the corpus is obtained, the search phrase corresponding to the corpus is obtained, the search phrase is detected, the multidimensional feature data of the search phrase is obtained, the multidimensional intelligent evaluation of the search phrase is realized according to the multidimensional feature data, and the evaluation efficiency of the search phrase is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of one scenario of an evaluation system for search phrases provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative architecture of a distributed system for use in a blockchain system provided in an embodiment of the present invention;
FIG. 3 is an alternative schematic diagram of a block structure provided in an embodiment of the present invention;
FIG. 4 is a flow diagram of a method for evaluating a search phrase provided in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a recommendation interface for searching phrases in an embodiment of the present invention;
FIG. 6 is another flow diagram of a method for evaluating a search phrase provided in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a search phrase evaluation apparatus provided in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
In the description that follows, embodiments of the invention will be described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the present invention are described in the foregoing context, this is not meant to be limiting, and one skilled in the art will recognize that the various steps and operations described below may also be implemented in hardware.
The term "module" or "unit" as used herein may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as implementing objects on the computing system. The apparatus and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all within the scope of the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
The embodiment of the invention provides a search phrase evaluation method, a search phrase evaluation device, a server and a storage medium.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The scheme provided by the embodiment of the invention can be an evaluation method of search phrases related to artificial intelligence, namely the embodiment of the invention provides an evaluation method of search phrases based on artificial intelligence, which comprises the following steps: acquiring corpus; acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase by using a machine learning algorithm to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a search phrase evaluation system according to an embodiment of the present invention, where the search phrase evaluation system may include a server 10, and an evaluation device of a search phrase is integrated in the server 10. The server 10 in the embodiment of the invention is mainly used for acquiring corpus; acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data.
In the embodiment of the present invention, the server 10 may be an independent server, or may be a server network or a server cluster formed by servers. For example, the server 10 described in the embodiment of the present invention includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server formed by a plurality of servers, where the cloud server is composed of a large number of computers or network servers based on cloud computing.
It will be appreciated by those skilled in the art that the application environment shown in FIG. 1 is merely one application scenario of the present application and does not limit the application scenarios of the present application. Other application environments may include more or fewer servers than shown in FIG. 1, or a different server network connection relationship. For example, only one server is shown in FIG. 1, but the evaluation system of the search phrase may also include one or more other servers, and/or one or more clients connected to the server network, which is not limited herein.
In addition, as shown in FIG. 1, the evaluation system of the search phrase may further include a memory 20 for storing data. The memory 20 may include, for example, a corpus database that stores corpora (such as news items and articles) and the search phrases corresponding to the corpora, a feature database that stores the multidimensional feature data of the search phrases, and an evaluation result database that stores the evaluation results of the search phrases.
It should be noted that the scenario of the evaluation system of the search phrase shown in FIG. 1 is only an example. The evaluation system and scenario described in the embodiment of the present application are intended to describe the technical solution of the embodiment of the present application more clearly and do not limit the technical solution provided by the embodiment of the present application. As those of ordinary skill in the art will appreciate, with the evolution of the evaluation system of the search phrase and the appearance of new service scenarios, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
The evaluation system of the search phrase according to the embodiment of the present invention may be a distributed system formed by connecting a plurality of nodes (any form of computing devices in an access network, such as the server 10, etc.) through a form of network communication.
Taking the case where the distributed system is a blockchain system as an example, refer to FIG. 2, which is a schematic diagram of an alternative architecture of the distributed system 100 applied to a blockchain system according to an embodiment of the present invention. The system is formed by a plurality of nodes 200 (computing devices of any form in an access network, such as servers) and clients 300. The nodes form a peer-to-peer (P2P) network, where the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, may join to become a node, and a node includes a hardware layer, an intermediate layer, an operating system layer, and an application layer. The server 10 in the embodiment of the present invention is a node in a blockchain system.
Referring to the functionality of each node in the blockchain system shown in fig. 2, the functions involved include:
1) Routing: a basic function of a node, used to support communication between nodes.
Besides the routing function, the node can also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual service requirements. The application records the data related to the implemented function to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once they have successfully verified the source and integrity of the record data.
For example, the services implemented by the application include:
2.1) Wallet: provides electronic money transaction functions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, they store the record data of the transaction in a temporary block of the blockchain as a response acknowledging that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger: provides functions such as storing, querying and modifying account data. Record data of the operations on the account data is sent to other nodes in the blockchain system; after the other nodes verify that it is valid, they store the record data in a temporary block as a response acknowledging that the account data is valid, and may also send a confirmation to the node that initiated the operation.
2.3) Smart contract: a computerized agreement that can execute the terms of a contract. It is implemented by code deployed on the shared ledger that executes when certain conditions are met, and is used to complete automated transactions according to actual business requirements, such as querying the logistics status of the goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. The smart contract is not limited to contracts for executing transactions and may also execute contracts that process received information.
3) Blockchain: comprises a series of blocks that are connected to one another in the chronological order of their generation. Once a new block is added to the blockchain it is not removed, and the blocks record the record data submitted by nodes in the blockchain system.
Referring to FIG. 3, FIG. 3 is an optional schematic diagram of a block structure provided in an embodiment of the present invention. Each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by these hash values to form a blockchain. In addition, a block may include information such as a timestamp of the time the block was generated. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains associated information that is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
When the evaluation system of the search phrase in the embodiment of the present invention is a blockchain system, in the embodiment of the present invention, the server is a node in the blockchain system, and the evaluation result of the search phrase can be stored in the blockchain. Specifically, in an embodiment of the present invention, the method further includes: acquiring an evaluation result of the search phrase; and storing the evaluation result of the search phrase in the block chain in the form of a block. The specific manner of adding the blocks may refer to the description of the blockchain system described above, and will not be repeated here.
Specific embodiments are described in detail below.
This embodiment is described from the perspective of a search phrase evaluation device, which may specifically be integrated in the server 10.
The invention provides a search phrase evaluation method, which comprises the following steps: acquiring corpus; acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data.
Referring to fig. 4, a flowchart of a method for evaluating a search phrase according to an embodiment of the present invention includes:
401. Acquiring a corpus.
A corpus refers to language material. In the embodiment of the present invention, a corpus is text information on a website, such as a news item or an article, that is provided for users to read and view. Each news item or article forms one corpus. As shown in fig. 5, the article displayed in the main display area 51 of the browser, titled "direct broadcast, australian, du, jiedu, 14 days, and Chaetocery, xu Mou, which is unique, and Sunzhen, is hopeful to take a crown", forms one corpus.
402. Acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value.
Here, entity words are proper nouns with a unique referent. To meet users' needs for extended reading, a recommendation model is constructed that generates relevant search phrases based on the specific content of the corpus and recommends them to the user. Specifically, a plurality of candidate keywords are extracted from the corpus according to a pre-constructed lexicon, and the importance of each candidate keyword in the corpus is calculated; candidate keywords whose importance is higher than a preset value are used as the keywords of the corpus. Meanwhile, a plurality of hot search words are acquired from the search engine; these hot search words are obtained by the search engine from statistics of the actual searches of a large number of users. Then, the keywords of the corpus are matched against each hot search word to obtain the matching degree between the keywords and each hot search word, and hot search words whose matching degree is greater than a preset matching value are combined with the keywords of the corpus to generate search phrases, which are recommended to the user.
After reading the corpus, the user does not need to manually enter content related to the corpus; by clicking the search phrase corresponding to the corpus, the user can quickly search for related content, which greatly improves the user experience. However, because hot search words are words actually searched by users, they are unreliable to some degree, and so are the search phrases generated from them. Multi-dimensional detection therefore needs to be performed on each search phrase, for example to evaluate its recommendation degree.
Before multi-dimensional detection, the search phrases may be preliminarily screened to remove unsuitable phrases. Specifically, the recommendation model may generate a plurality of search phrases by the above method, each corresponding to a keyword of the corpus. Whether the keyword corresponding to each search phrase is a sensitive word is detected, and any search phrase whose keyword is a sensitive word is removed, that is, it is excluded from the subsequent evaluation operations.
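The generation flow described above (filter keywords by importance, filter hot search words by search volume, then combine the pairs whose matching degree is high enough) can be sketched as follows. The function names, the caller-supplied `match` scoring function, the toy data, and the choice to join the two words with a space are all assumptions for illustration; the embodiment does not specify how importance or matching degree is computed.

```python
def select_keywords(importance, min_importance):
    """Keep candidate keywords whose importance exceeds the preset value."""
    return [word for word, score in importance.items() if score > min_importance]


def select_hot_words(search_counts, min_searches):
    """Keep entity words whose search volume exceeds the preset threshold."""
    return [word for word, count in search_counts.items() if count > min_searches]


def generate_search_phrases(keywords, hot_words, match, min_match):
    """Combine each keyword with every hot search word that matches it well enough.

    `match(keyword, hot_word)` is a hypothetical, caller-supplied scoring function.
    Joining the two words with a space is also an assumption of this sketch.
    """
    return [
        f"{keyword} {hot_word}"
        for keyword in keywords
        for hot_word in hot_words
        if match(keyword, hot_word) > min_match
    ]


# Toy data, invented purely for illustration.
importance = {"tennis": 0.9, "weather": 0.2}
search_counts = {"Player A": 12000, "Player B": 9000, "Player C": 40}
scores = {("tennis", "Player A"): 0.8, ("tennis", "Player B"): 0.3}

keywords = select_keywords(importance, min_importance=0.5)
hot_words = select_hot_words(search_counts, min_searches=1000)
phrases = generate_search_phrases(
    keywords, hot_words, lambda k, h: scores.get((k, h), 0.0), min_match=0.5
)
```

With this toy data only the pair ("tennis", "Player A") passes every threshold, so a single search phrase is produced.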
As shown in fig. 5, based on the content of the article, the recommendation model generates six search phrases and displays the six search phrases "grand-on-child coach, grand-on-child vs-on-child live, 2019 wen-net boy bill, cctv5+ what, xu Mou coach" in the bottom area 52 of the main display area 51.
403. Detecting the search phrase to obtain multidimensional feature data of the search phrase.
To improve the accuracy of search phrase detection, multi-dimensional detection can be performed on the search phrase. Detection in each dimension yields corresponding feature data, so that multidimensional feature data is obtained. The multi-dimensional detection may include relevance detection, integrity detection, and availability detection; accordingly, the multidimensional feature data may include relevance, integrity, and availability.
Specifically, detecting the search phrase in step 403 to obtain the multidimensional feature data of the search phrase includes: detecting the relevance between the search phrase and the corpus to obtain the relevance; detecting the integrity of the search phrase to obtain the integrity; detecting the availability of the search phrase according to preset availability conditions to obtain the availability; and adding the relevance, the integrity, and the availability to the multidimensional feature data of the search phrase. The relevance, the integrity, and the availability can each be obtained by a different detection method, as follows:
(1) Relevance detection
Relevance detection analyzes the relevance between the search phrase and the corpus. It can be implemented by detecting the relevance between words in the search phrase and words in the corpus. Specifically, detecting the relevance between the search phrase and the corpus to obtain the relevance includes: extracting a center sentence of the corpus; identifying center words in the center sentence; identifying core words in the search phrase; and detecting the relevance between the search phrase and the corpus according to the center words and the core words to obtain the relevance.
The center sentence is the most concise sentence that can summarize the core content of the corpus; that is, deleting any part of the center sentence would make the key information incomplete. A center word in the center sentence is a proper noun (subject or object) or a distinctive action or state (predicate or predicative) in the sentence. Center words can be identified by a neural network: candidate center words are first extracted from the center sentence, the weight of each candidate is calculated, and candidates whose weight is greater than a threshold (such as 0.5) are selected as center words. Core words in the search phrase can be identified using posterior probability and information gain: candidate core words are extracted from the search phrase, the weight of each candidate is calculated, and candidates whose weight is greater than a threshold (such as 0.5) are selected as core words.
It should be noted that a plurality of center words may be identified in the center sentence, and a plurality of core words may be identified in the search phrase, for example the corpus keyword and the hot search word that make up the search phrase. The relevance between each center word and each core word is detected separately; each pair of a center word and a core word yields a relevance, and the maximum of these is taken as the relevance between the search phrase and the corpus. For example, if 2 center words A and B are identified in the center sentence and 3 core words C, D, and E are identified in the search phrase, the relevance of center word A to each of C, D, and E and the relevance of center word B to each of C, D, and E are detected, giving 6 relevance values. If the relevance between center word B and core word C is the largest of the 6, it is taken as the relevance between the search phrase and the corpus. In addition, relevance detection in the embodiment of the present invention refers to detecting positive correlation: the larger the detected relevance, the more relevant the search phrase is to the corpus.
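The word selection and the maximum-over-pairs rule can be sketched as below. The candidate weights and pair scores are invented toy values, `pair_relevance` stands in for whatever pairwise detector is used, and returning 0 when no words are available is an assumption that the text does not address.

```python
from itertools import product


def select_words(candidate_weights, threshold=0.5):
    """Keep candidates whose weight is strictly greater than the threshold."""
    return [word for word, weight in candidate_weights.items() if weight > threshold]


def phrase_relevance(center_words, core_words, pair_relevance):
    """Return the largest relevance over all (center word, core word) pairs."""
    # default=0 for empty input is an assumption of this sketch.
    return max(
        (pair_relevance(center, core) for center, core in product(center_words, core_words)),
        default=0,
    )


# Toy weights and pair scores, invented for illustration.
center_words = select_words({"A": 0.9, "B": 0.7, "X": 0.1})
core_words = select_words({"C": 0.8, "D": 0.6, "E": 0.55})
pair_scores = {("B", "C"): 2, ("A", "C"): 1, ("A", "D"): 0}
best = phrase_relevance(center_words, core_words, lambda c, k: pair_scores.get((c, k), 0))
```

Here 2 center words and 3 core words give 6 pairs, and the pair (B, C) supplies the maximum, mirroring the example in the text.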
Relevance detection may include two kinds of detection: topic relevance detection and entity relevance detection. Specifically, detecting the relevance between the search phrase and the corpus according to the center word and the core word to obtain the relevance includes: detecting the topic relevance between the search phrase and the corpus according to the center word and the core word to obtain the topic relevance; detecting the entity relevance between the search phrase and the corpus according to the center word and the core word to obtain the entity relevance; and determining the relevance between the search phrase and the corpus according to the topic relevance and the entity relevance.
Topic relevance detection between the search phrase and the corpus can be implemented by detecting the topic relevance between the center word and the core word. Topic relevance refers to the relevance of the main content of the two. Specifically, detecting the topic relevance between the search phrase and the corpus according to the center word and the core word to obtain the topic relevance includes: detecting, according to a pre-established knowledge graph, whether the center word and the core word satisfy at least one of the preset topic conditions, where the topic conditions include belonging to the same concept, being associated with the same event, having an affiliation relationship, describing the same thing, or belonging to the same subject; if not, determining that the topic relevance between the search phrase and the corpus is a first topic relevance; if so, determining that the topic relevance between the search phrase and the corpus is a second topic relevance, where the second topic relevance is greater than the first topic relevance. The greater the topic relevance, the more relevant the search phrase is to the topic of the corpus.
It should be noted that a knowledge graph is intended to describe the various entities and concepts that exist in the real world and the relationships between them. It forms a huge semantic network graph in which nodes represent entities or concepts and edges represent attributes or relationships. Because the center word and the core word are entity words, the association between them can be obtained from the knowledge graph, and their relevance can then be detected according to that association.
A plurality of topic conditions are preset according to the kinds of association that two related words must have, so that the topic relevance between the search phrase and the corpus is detected by checking whether the center word and the core word satisfy any topic condition. For example, when the topic condition is belonging to the same concept, it is detected whether the center word and the core word are directly associated in the knowledge graph and whether their entity attributes are the same, where the entity attribute is the type an entity belongs to, such as the automobile class or the event class. If the center word and the core word are directly associated in the knowledge graph and have the same entity attribute, they are judged to satisfy the topic condition. For instance, if the center word is Volkswagen CC and the core word is Toyota SUV, both belong to the automobile class, so the topic condition is satisfied. When the topic condition is being associated with the same event, it is detected whether the center word and the core word are directly associated in the knowledge graph and are both noun entities; if so, they are judged to satisfy the topic condition. For instance, if the center word is the World Trade Center and the core word is the World Trade Building, both are associated with the 9/11 event, so the topic condition is satisfied.
When the topic condition is having an affiliation relationship, it is detected whether the core word is a hypernym (upper-level word) of the center word in the knowledge graph, that is, whether a relation arrow points from the core word to the center word; if so, the center word and the core word are judged to satisfy the topic condition. For instance, if the center word is Li-in-one and the core word is fantasy novel, and Li-in-one is the name of a character in the fantasy novel, then Li-in-one has an affiliation relationship with the fantasy novel and the topic condition is satisfied. When the topic condition is belonging to the same subject, for example describing the same subject, or a specific action or specific state belonging to the same subject, it is detected whether the center word and the core word are associated with the same word; if so, they are judged to satisfy the topic condition. For instance, if the center word is a beast that can evolve into the beast named by the core word, the two are different stages of the same subject and the topic condition is satisfied; whereas if the center word is the "cold" in "dog cold" and the core word is the "cold" in "human cold", the dog and the human are not the same subject and the topic condition is not satisfied.
If the association between the center word and the core word satisfies any one topic condition, the center word and the core word are determined to be topically related, so the search phrase and the corpus are determined to be topically related and their topic relevance is set to the second topic relevance, such as 1. If none of the topic conditions is satisfied, the center word and the core word are determined to be topically unrelated, so the search phrase and the corpus are determined to be topically unrelated and their topic relevance is set to the first topic relevance, such as 0.
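The "any topic condition" rule can be sketched against a toy knowledge graph. The graph layout (`type`, `linked`, `hypernyms` fields), the entity names, and the two condition functions shown are hypothetical; only two of the five topic conditions are modelled, and the values 0 and 1 follow the examples given in the text.

```python
# Toy knowledge graph, invented for illustration: entity -> attributes.
GRAPH = {
    "car model A": {"type": "car", "linked": {"car model B"}, "hypernyms": set()},
    "car model B": {"type": "car", "linked": {"car model A"}, "hypernyms": set()},
    "character X": {"type": "person", "linked": set(), "hypernyms": {"novel Y"}},
    "novel Y": {"type": "book", "linked": set(), "hypernyms": set()},
}


def same_concept(center, core):
    """Directly linked in the graph and sharing the same entity type."""
    return core in GRAPH[center]["linked"] and GRAPH[center]["type"] == GRAPH[core]["type"]


def affiliated(center, core):
    """The core word is a hypernym (upper-level word) of the center word."""
    return core in GRAPH[center]["hypernyms"]


def topic_relevance(center, core, conditions=(same_concept, affiliated)):
    """Return 1 if any topic condition holds for the pair, otherwise 0."""
    return 1 if any(condition(center, core) for condition in conditions) else 0
```

Further conditions (same event, same thing, same subject) could be added to the `conditions` tuple as additional predicates without changing `topic_relevance`.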
Entity relevance detection between the search phrase and the corpus can be implemented by detecting the entity relevance between the center word and the core word. Entity relevance refers to the relevance of the entities described by the two. Specifically, detecting the entity relevance between the search phrase and the corpus according to the center word and the core word to obtain the entity relevance includes: detecting, according to a pre-established knowledge graph, whether ambiguity exists between the center word and the core word; if so, determining that the entity relevance is a first entity relevance; if not, determining that the entity relevance is a second entity relevance, where the second entity relevance is greater than the first entity relevance. The greater the entity relevance, the more relevant the search phrase is to the entity of the corpus.
It should be noted that ambiguity between two words can be detected through their hypernyms (upper-level words) in the knowledge graph: the hypernym of the center word and the hypernym of the core word are obtained from the knowledge graph and compared. If they are the same, the center word and the core word are not ambiguous; if they differ, the two words are ambiguous. For example, if the center word is the West Lake in "Guangzhou West Lake" and the core word is the West Lake in "Hangzhou West Lake", the hypernyms of the two West Lakes are different, so the words are ambiguous.
If ambiguity exists between the center word and the core word, the center word and the core word are determined to be unrelated as entities, so the search phrase and the corpus are determined to be unrelated as entities, and their entity relevance is set to the first entity relevance, such as 0. If no ambiguity exists between the center word and the core word, the center word and the core word are determined to be related as entities, so the search phrase and the corpus are determined to be related as entities, and their entity relevance is set to the second entity relevance, such as 1.
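The hypernym comparison used for ambiguity detection can be sketched as follows. The `hypernym_of` mapping and the entity identifiers are hypothetical stand-ins for a knowledge graph lookup; the values 0 and 1 follow the examples in the text.

```python
def entity_relevance(center_id, core_id, hypernym_of):
    """Return 0 if the two words are ambiguous (different hypernyms), otherwise 1."""
    ambiguous = hypernym_of[center_id] != hypernym_of[core_id]
    return 0 if ambiguous else 1


# Hypothetical lookup table: entity identifier -> hypernym in the knowledge graph.
hypernym_of = {
    "West Lake (entity 1)": "Guangzhou",
    "West Lake (entity 2)": "Hangzhou",
    "Scenic spot Z": "Hangzhou",
}
```

For instance, `entity_relevance("West Lake (entity 1)", "West Lake (entity 2)", hypernym_of)` compares "Guangzhou" with "Hangzhou" and therefore treats the two words as ambiguous.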
After the topic relevance and the entity relevance between the search phrase and the corpus are obtained, the relevance between the search phrase and the corpus can be determined from them, specifically: if the topic relevance is the first topic relevance, the relevance between the search phrase and the corpus is determined to be a first relevance; if the topic relevance is the second topic relevance and the entity relevance is the first entity relevance, the relevance is determined to be a second relevance; if the topic relevance is the second topic relevance and the entity relevance is the second entity relevance, the relevance is determined to be a third relevance, where the first relevance, the second relevance, and the third relevance increase in that order.
Because topic relevance carries more weight in detecting the relevance between the search phrase and the corpus, the relevance is set mainly on the basis of topic relevance. If the topic relevance is the first topic relevance, such as 0, the search phrase and the corpus are topically unrelated, so they are determined to be unrelated and their relevance is set to the first relevance, such as 0. If the topic relevance is the second topic relevance, such as 1, and the entity relevance is the first entity relevance, such as 0, the search phrase is determined to be related to the corpus but with low relevance, and the relevance is set to the second relevance, such as 1. If the topic relevance is the second topic relevance, such as 1, and the entity relevance is the second entity relevance, such as 1, the search phrase is determined to be related to the corpus with high relevance, and the relevance is set to the third relevance, such as 2. The specific relevance determination rule is shown in table 1.
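TABLE 1
Topic relevance      Entity relevance     Relevance
First (e.g. 0)       Any                  First (e.g. 0)
Second (e.g. 1)      First (e.g. 0)       Second (e.g. 1)
Second (e.g. 1)      Second (e.g. 1)      Third (e.g. 2)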
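The relevance determination rule described for table 1 can be written as a small function. The function name is hypothetical, and the numeric values are the example values from the text (0 and 1 for the inputs; 0, 1, and 2 for the result).

```python
def combine_relevance(topic_relevance, entity_relevance):
    """Combine topic relevance (0 or 1) and entity relevance (0 or 1) into 0, 1, or 2."""
    if topic_relevance == 0:
        # Topically unrelated: entity relevance is not considered.
        return 0
    return 2 if entity_relevance == 1 else 1
```

Topic relevance acts as a gate here, which reflects the statement that it carries more weight than entity relevance.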
(2) Integrity detection
Integrity detection refers to detecting the completeness of the structure and the semantic information of a search phrase. Integrity detection may include two kinds of detection: text integrity detection and semantic integrity detection. Specifically, detecting the integrity of the search phrase to obtain the integrity includes: detecting the text integrity of the search phrase to obtain the text integrity; detecting the semantic integrity of the search phrase to obtain the semantic integrity; and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
Text integrity detection of the search phrase can be implemented by detecting the grammatical structure of the search phrase. Specifically, detecting the text integrity of the search phrase to obtain the text integrity includes: detecting whether the grammatical structure of the search phrase is complete; if the grammatical structure is incomplete, determining that the text integrity of the search phrase is a first text integrity; if the grammatical structure is complete, detecting whether the search phrase is a parallel phrase or a modifier-head (bias) phrase; if so, determining that the text integrity of the search phrase is a second text integrity; if not, determining that the text integrity of the search phrase is a third text integrity, where the first text integrity, the second text integrity, and the third text integrity increase in that order. The greater the text integrity, the more complete the text of the search phrase.
It should be noted that the completeness of the grammatical structure covers both structural completeness and word completeness, and can be judged by a probability model. A probability model is built in advance; search words actively entered by users are collected and used to train the model, so that it can estimate the probability that a user actively searches for a given word. The search phrase is then segmented into words, the last word of the phrase is input into the probability model, and the probability that a user actively searches for that word is obtained. If the probability is higher than a certain threshold, the grammatical structure of the search phrase is judged to be complete; otherwise it is judged to be incomplete. If the grammatical structure is incomplete, the text of the search phrase can be directly determined to be incomplete, as in "Shenzhen present" and "Shenzhen day oil price". When the text of the search phrase is determined to be incomplete, its text integrity is set to the first text integrity, such as 0.
If the grammatical structure of the search phrase is complete, the text can be determined to be complete, but text integrity may still differ between search phrases, so the structure type of the search phrase needs to be identified. It is detected whether the search phrase is a simple word-to-word concatenation, that is, a parallel phrase, a modifier-head (bias) phrase, or the like; if so, its text integrity is relatively low, as in "tiger oil consumption" and "a friend of king", and the text integrity is determined to be the second text integrity, such as 1. If the search phrase is a short text formed by combining a subject-predicate phrase, a verb-object phrase, or the like with a verb-complement phrase, an adjective-complement phrase, a preposition-object phrase, or the like, its text integrity is relatively high, and the text integrity is determined to be the third text integrity, such as 2.
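The three-level text integrity rule can be sketched as below. The frequency-count "probability model", whitespace word segmentation, the structure labels, and the threshold of 0.1 are all simplifying assumptions standing in for the trained model and the phrase-structure recognizer that the text leaves unspecified.

```python
from collections import Counter


def train_word_probabilities(user_queries):
    """Estimate how often each word appears in queries that users typed themselves."""
    counts = Counter(word for query in user_queries for word in query.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def text_integrity(phrase, structure, word_prob, threshold):
    """Return 0 (incomplete), 1 (parallel or modifier-head), or 2 (other complete phrase)."""
    last_word = phrase.split()[-1]
    if word_prob.get(last_word, 0.0) <= threshold:
        return 0  # last word is rarely searched: grammatical structure incomplete
    if structure in {"parallel", "modifier-head"}:
        return 1
    return 2


# Toy query log, invented for illustration.
probs = train_word_probabilities(
    ["shenzhen oil price", "beijing oil price", "shenzhen weather"]
)
```

A real system would replace the whitespace split with a proper word segmenter and supply `structure` from a grammatical analysis of the phrase.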
Semantic integrity detection can be implemented by identifying the semantic information corresponding to the search phrase. Specifically, detecting the semantic integrity of the search phrase to obtain the semantic integrity includes: identifying the semantic information corresponding to the search phrase; if no corresponding semantic information is identified, determining that the semantic integrity of the search phrase is a first semantic integrity; if the search phrase is identified as corresponding to at least two pieces of semantic information, determining that the semantic integrity of the search phrase is a second semantic integrity; if the search phrase is identified as corresponding to one piece of semantic information, determining that the semantic integrity of the search phrase is a third semantic integrity, where the first semantic integrity, the second semantic integrity, and the third semantic integrity increase in that order. The higher the semantic integrity, the more complete the semantics of the search phrase.
It should be noted that semantic information can be identified by segmenting the search phrase into words and determining the meaning of each word and the associations between the words, so as to determine whether the search phrase describes a specific content, event, scene, behavior, concept, method, or the like. If not, the search phrase does not correspond to any semantic information: its meaning is difficult to understand, or it is meaningless, so its semantics are incomplete, as in "a parent student", and its semantic integrity is determined to be the first semantic integrity, such as 0. If so, the semantics of the search phrase are complete. However, semantic integrity may still differ between search phrases, so it is detected whether the search phrase has multiple possible continuations. If so, the search phrase corresponds to a plurality of pieces of semantic information and its semantic integrity is relatively low. For example, "tiger oil consumption" may correspond both to the semantic information "is the tiger's oil consumption high" and to the semantic information "the tiger's oil consumption differs greatly between winter and summer", so its semantic integrity is determined to be the second semantic integrity, such as 1. If not, the search phrase corresponds to only one piece of semantic information and its semantic integrity is high, as in "full text of Su Shi's Die Lian Hua", so its semantic integrity is determined to be the third semantic integrity, such as 2.
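Once the possible interpretations of a search phrase have been identified, the semantic integrity rule reduces to counting them. In this sketch the list of interpretations is assumed to come from some upstream semantic analysis that is not shown; the function name is hypothetical and the values 0, 1, and 2 follow the examples in the text.

```python
def semantic_integrity(interpretations):
    """Map the number of identified interpretations to a semantic integrity of 0, 1, or 2."""
    count = len(interpretations)
    if count == 0:
        return 0  # no semantic information identified
    if count >= 2:
        return 1  # several possible continuations: low semantic integrity
    return 2      # exactly one interpretation: high semantic integrity
```

Note that the ordering is not monotonic in the count: a single interpretation scores higher than several, because several interpretations mean the phrase is underspecified.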
After the text integrity and the semantic integrity of the search phrase are obtained, the integrity of the search phrase can be determined from them, specifically: if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, the integrity of the search phrase is determined to be a first integrity; if the text integrity is the second or third text integrity and the semantic integrity is the second semantic integrity, the integrity of the search phrase is determined to be a second integrity; if the text integrity is the second or third text integrity and the semantic integrity is the third semantic integrity, the integrity of the search phrase is determined to be a third integrity, where the first integrity, the second integrity, and the third integrity increase in that order. The higher the integrity, the more complete the search phrase.
Because semantic integrity carries more weight in detecting the integrity of a search phrase, the integrity of the search phrase is set mainly on the basis of semantic integrity. If the text integrity is the first text integrity, such as 0, or the semantic integrity is the first semantic integrity, such as 0, that is, if either of them is 0, the search phrase is determined to be incomplete and its integrity is set to the first integrity, such as 0. If the text integrity is the second or third text integrity, that is, the text integrity is not 0 and the text is complete, the integrity of the search phrase follows the semantic integrity: if the semantic integrity is the second semantic integrity, such as 1, the integrity of the search phrase is the second integrity, such as 1; if the semantic integrity is the third semantic integrity, such as 2, the integrity of the search phrase is the third integrity, such as 2. The specific integrity determination rule is shown in table 2.
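TABLE 2
Text integrity                   Semantic integrity    Integrity
First (e.g. 0)                   Any                   First (e.g. 0)
Any                              First (e.g. 0)        First (e.g. 0)
Second or third (e.g. 1 or 2)    Second (e.g. 1)       Second (e.g. 1)
Second or third (e.g. 1 or 2)    Third (e.g. 2)        Third (e.g. 2)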
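The integrity determination rule described for table 2 can be written as a small function. The function name is hypothetical, and the numeric values are the example values from the text (0, 1, and 2 for each input and for the result).

```python
def combine_integrity(text_integrity, semantic_integrity):
    """Combine text integrity and semantic integrity (each 0, 1, or 2) into 0, 1, or 2."""
    if text_integrity == 0 or semantic_integrity == 0:
        return 0
    # With complete text, the overall integrity follows the semantic integrity.
    return semantic_integrity
```

Text integrity therefore only distinguishes complete from incomplete in the final result; the difference between its second and third levels does not change the outcome.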
(3) Availability detection
Availability detection refers to detecting whether a search phrase can be used. The detection of the availability of the search phrase may be accomplished by preset availability conditions. Specifically, the detecting the availability of the search phrase to obtain the availability degree includes: detecting whether the search phrase meets a preset available condition or not; the available conditions include not belonging to query intent class phrases, not having sensitive words, not having word defects, and not belonging to rumors; if not, determining the availability of the search phrase as a first availability; if yes, determining the availability of the search phrase as second availability, wherein the second availability is larger than the first availability. Wherein a greater availability indicates a more available search phrase.
Wherein the availability condition detects the availability of the search phrase by setting four conditions. For the first condition, detecting whether the search phrase belongs to a query intention class phrase, wherein the query intention class phrase can comprise a content acquisition class phrase, such as Hangzhou fire control registration network; resource acquisition class phrases such as Axure8 activate code, legend single edition; information inquiry phrases such as Shenzhen 92 gasoline price, liu some wife and Liu some daughter. The detection of the query intention class phrase can be achieved by obtaining the suffix of the search phrase and detecting whether the suffix of the search phrase is a preset query intention class word, if the suffix of the search phrase is the query intention class word, the search phrase is indicated to be the query intention class phrase, the search phrase is judged not to meet the first condition, and otherwise, the search phrase is judged to meet the first condition.
For the second condition, detecting whether the search phrase has sensitive words, wherein the sensitive words can comprise words such as yellow anti-violence, such as pornography, violence, reaction and politics sensitive words, and other words which do not accord with the core value of sociality; sensitive words may also include words that cause discomfort to the user, such as nausea, horror, hypoquivalence, visceral speech, words with a strong negative emotion. The detection of the sensitive words can be realized by respectively detecting whether each word in the search phrase is a preset sensitive word after the word segmentation is carried out on the search phrase, if the word in the search phrase is the sensitive word, the search phrase is indicated to have the sensitive word, the search phrase is judged not to meet the second condition, and otherwise, the search phrase is judged to meet the second condition.
For the third condition, it is detected whether there is a word defect in the search phrase, which means that there is a loss of content in the search phrase, resulting in the internal word having no meaning, such as a learner. The detection of word defects can be achieved by identifying words in the search phrase, if the search phrase has unrecognizable words, the word defects are indicated in the search phrase, the search phrase is judged not to meet the third condition, and otherwise, the search phrase is judged to meet the third condition.
For the fourth condition, whether the search phrase belongs to rumors is detected, and the rumors refer to information with true authenticity of contents or common sense errors, such as sugar cane is taken to help weight loss and smoking is beneficial to health. The rumor detection may be performed by detecting whether the search phrase corresponds to a search result in the search engine, and if the search phrase does not have a corresponding search result, indicating that the search phrase belongs to the rumor, determining that the search phrase does not satisfy the fourth condition, otherwise, determining that the search phrase satisfies the fourth condition.
The search phrase is determined to be available only if it meets all four conditions simultaneously, in which case its availability is set to a second availability, such as 1. If the search phrase fails any one of the conditions, it is determined to be unavailable, and its availability is set to a first availability, such as 0.
After the relevance of the search phrase and the corpus, the integrity and the availability of the search phrase are obtained, the search phrase can be evaluated according to the relevance, the integrity and the availability, specifically: if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree; if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating that the recommendation of the search phrase is a third recommendation; otherwise, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
It should be noted that if the relevance is the first relevance, or the integrity is the first integrity, or the availability is the first availability, that is, the search phrase is unrelated to the corpus, or is incomplete, or is unavailable, the search phrase is evaluated as a non-recommended phrase and its recommendation degree is set to the first recommendation degree, for example 0. If the relevance is the third relevance, the integrity is the third integrity, and the availability is the second availability, that is, the search phrase is highly relevant to the corpus, highly complete, and available, the search phrase is evaluated as a recommended phrase with a high recommendation degree, which is set to the third recommendation degree, for example 2. Otherwise, for example when the relevance is the second relevance, the integrity is the third integrity, and the availability is the second availability (low relevance, high integrity, available), or when the relevance is the third relevance, the integrity is the second integrity, and the availability is the second availability (high relevance, low integrity, available), the search phrase is evaluated as a recommended phrase with a low recommendation degree, which is set to the second recommendation degree, for example 1.
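The three-way rule above can be written as a small function. The sketch below assumes the example values from the text (0, 1, 2 for the first, second, and third levels); the `Level` enum and the function name are illustrative and not part of the patent.

```python
from enum import IntEnum


class Level(IntEnum):
    FIRST = 0
    SECOND = 1
    THIRD = 2


def recommendation(relevance, integrity, availability):
    """Map relevance, integrity, and availability levels to a recommendation level.

    Availability has only two levels: FIRST (unavailable) and SECOND (available).
    """
    if Level.FIRST in (relevance, integrity, availability):
        return Level.FIRST
    if (relevance == Level.THIRD and integrity == Level.THIRD
            and availability == Level.SECOND):
        return Level.THIRD
    return Level.SECOND
```

Any combination with no first-level value that is not the all-high case falls through to the second recommendation degree, which also covers the case where both relevance and integrity are at the second level.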
In addition, in the process of evaluating the search phrase, the embodiment of the invention can also select a high-quality search phrase to realize iterative optimization of various algorithm models. Specifically, the method further comprises: acquiring an evaluation result of the search phrase; and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into a training sample.
The evaluation result of the search phrase may include the recommendation degree of the search phrase, and the preset evaluation condition may be that the recommendation degree is greater than a preset recommendation degree threshold. The recommendation degree of the search phrase is compared with the preset recommendation degree threshold, and if it is greater than the threshold, the search phrase and its recommendation degree are added to a training sample of a recommendation model.
For example, the recommendation degree can be divided into three levels, namely a first recommendation degree (e.g. 0), a second recommendation degree (e.g. 1) and a third recommendation degree (e.g. 2), and if the preset recommendation degree threshold is 0.5, the search phrase evaluated as the second recommendation degree or the third recommendation degree and the recommendation degree thereof can be added into a training sample of the recommendation model, so that iterative optimization of the recommendation model is realized, and the recommendation model generates the search phrase more conforming to the user expectation.
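As a rough sketch of this selection step, assuming evaluation results are held as `(phrase, recommendation)` pairs (a representation chosen here for illustration, not specified by the patent):

```python
def select_training_samples(evaluated, threshold=0.5):
    """Keep (phrase, recommendation) pairs whose recommendation exceeds the threshold."""
    return [(phrase, rec) for phrase, rec in evaluated if rec > threshold]


# Placeholder phrases with recommendation degrees 0, 1 and 2.
evaluated = [("phrase a", 0), ("phrase b", 1), ("phrase c", 2)]
samples = select_training_samples(evaluated)
```

With the threshold of 0.5 from the example, only phrases at the second or third recommendation degree are kept.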
Similarly, according to the embodiment of the invention, the search phrase with the relevance being larger than the relevance threshold value can be selected according to the relevance between the search phrase and the corpus, and the search phrase is added into the training sample of the relevance model, so that iterative optimization of the relevance model is realized; selecting search phrases with the integrity larger than an integrity threshold according to the integrity of the search phrases, and adding the search phrases to a training sample of an integrity model to realize iterative optimization of the integrity model; according to the availability of the search phrase, selecting the search phrase with the availability larger than the availability threshold value, and adding the search phrase to a training sample of the availability model to realize iterative optimization of the availability model.
More specifically, according to the embodiment of the invention, the search phrase with the topic relevance greater than the topic relevance threshold can be selected according to the topic relevance of the search phrase and the corpus, and added into the training sample of the topic relevance model to realize iterative optimization of the topic relevance model; according to the entity correlation degree of the search phrase and the corpus, selecting a search phrase with the entity correlation degree larger than an entity correlation degree threshold value, and adding the search phrase into a training sample of an entity correlation model to realize iterative optimization of the entity correlation model; selecting a search phrase with the text integrity larger than a text integrity threshold according to the text integrity of the search phrase, and adding the search phrase into a training sample of a text integrity model to realize iterative optimization of the text integrity model; according to the semantic integrity of the search phrase, selecting the search phrase with the semantic integrity larger than the semantic integrity threshold value, and adding the search phrase into a training sample of the semantic integrity model to realize iterative optimization of the semantic integrity model. In addition, the embodiment of the invention can select corresponding search phrases according to the recognition result of the sensitive words and add the corresponding search phrases into the training samples of the sensitive word recognition model to realize iterative optimization of the sensitive word recognition model; and selecting a corresponding search phrase according to the identification result of the rumor, and adding the search phrase into a training sample of the rumor identification model to realize iterative optimization of the rumor identification model.
The method for evaluating the search phrase in the embodiment of the invention is described below with reference to a specific application scenario.
Referring to fig. 6, a flowchart of another embodiment of a method for evaluating a search phrase according to an embodiment of the present invention is shown, where the method for evaluating a search phrase is applied to a server, and the method for evaluating a search phrase includes:
601. The article content is obtained.
For example, the content of the article in FIG. 5 titled "direct broadcast Australian-broadcast 14-day-resolution-forecast, Xu Mou unique out-war, Sun Mou hopefully capturing" is obtained.
602. The search phrase corresponding to the article content is obtained.
The search phrase is generated based on keywords in the article content. When the search phrase is obtained, whether the keyword corresponding to the search phrase is a sensitive word may be detected, and if so, the search phrase is removed. For example, the search phrase "Sun Mou coach" recommended at the bottom of the article content in FIG. 5 is obtained; its keyword "Sun Mou" is not a sensitive word, so the search phrase is retained and the subsequent evaluation continues.
603. The topic relevance between the article content and the search phrase is detected and scored.
If the topics are related, the topic relevance score is 1; if the topics are not related, the topic relevance score is 0. For example, if the core word "Sun Mou" in the search phrase "Sun Mou coach" and the central word "Sun Mou" in the article content belong to the same subject, the search phrase is topically related to the article content, and the score is 1.
604. The entity relevance between the article content and the search phrase is detected and scored.
If the entities are related, the entity relevance score is 1; if the entities are not related, the entity relevance score is 0. For example, if the core word "Sun Mou" in the search phrase "Sun Mou coach" and the central word "Sun Mou" in the article content refer to the same person without ambiguity, the search phrase is entity-related to the article content, and the score is 1.
605. The relevance score between the article content and the search phrase is determined according to the topic relevance score and the entity relevance score.
If the topic relevance score is 0, the overall relevance score is 0; if the topic relevance score is 1 and the entity relevance score is 0, the overall relevance score is 1; if the topic relevance score is 1 and the entity relevance score is 1, the overall relevance score is 2. For example, the relevance score between the article content in FIG. 5 and the search phrase "Sun Mou coach" is 2.
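The combination rule in step 605 can be sketched as below; the function name is illustrative.

```python
def relevance_score(topic_score, entity_score):
    """Combine the 0/1 topic and entity scores into an overall relevance score of 0, 1 or 2."""
    if topic_score == 0:
        return 0  # unrelated topics override the entity score
    return 2 if entity_score == 1 else 1
```

Note that the entity score only matters once the topics are related, so a phrase that is off-topic always scores 0.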
606. The text integrity of the search phrase is detected and scored.
If the text is incomplete, the text integrity score is 0; if the text is complete but the text integrity is low, the text integrity score is 1; if the text is complete and the text integrity is high, the text integrity score is 2. For example, if the grammar structure of the search phrase "Sun Mou coach" is complete but the phrase is a bias phrase, the text of the search phrase is complete but its text integrity is low, and the text integrity score is 1.
607. The semantic integrity of the search phrase is detected and scored.
If the semantics are incomplete, the semantic integrity score is 0; if the semantics are complete but the semantic integrity is low, the semantic integrity score is 1; if the semantics are complete and the semantic integrity is high, the semantic integrity score is 2. For example, if the search phrase "Sun Mou coach" may correspond to multiple pieces of semantic information, the search phrase is semantically complete but its semantic integrity is low, and the semantic integrity score is 1.
608. The integrity score of the search phrase is determined according to the text integrity score and the semantic integrity score.
If either the text integrity score or the semantic integrity score is 0, the overall integrity score of the search phrase is 0; if neither score is 0, the overall integrity score of the search phrase equals the semantic integrity score. For example, if the search phrase "Sun Mou coach" scores 1 for both text integrity and semantic integrity, its integrity score is 1.
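Step 608 reduces to a short rule, sketched here with an illustrative function name:

```python
def integrity_score(text_score, semantic_score):
    """Overall integrity: 0 if either part is 0, otherwise the semantic integrity score."""
    if text_score == 0 or semantic_score == 0:
        return 0
    return semantic_score
```

Under this rule a nonzero text integrity score acts only as a gate; the semantic integrity score decides between 1 and 2.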
609. Whether the search phrase is a query intent phrase is detected, and the query intent is scored according to the detection result.
If the search phrase is a query intent phrase, the query intent score is 0; if the search phrase is not a query intent phrase, the query intent score is 1. For example, if the search phrase "Sun Mou coach" is an information query phrase, that is, a query intent phrase, the query intent score of the search phrase is 0.
610. Whether the search phrase contains sensitive words is detected, and the sensitive words are scored according to the detection result.
If the search phrase contains a sensitive word, the sensitive word score is 0; if the search phrase does not contain a sensitive word, the sensitive word score is 1. For example, if the search phrase "Sun Mou coach" does not contain a sensitive word, the sensitive word score of the search phrase is 1.
611. Whether the search phrase has a word defect is detected, and the word defect is scored according to the detection result.
If the search phrase has a word defect, the word defect score is 0; if the search phrase does not have a word defect, the word defect score is 1. For example, if the search phrase "Sun Mou coach" has no word defect, the word defect score of the search phrase is 1.
612. Whether the search phrase is a rumor is detected, and the rumor is scored according to the detection result.
If the search phrase is a rumor, the rumor score is 0; if the search phrase is not a rumor, the rumor score is 1. For example, if the search phrase "Sun Mou coach" is not a rumor, the rumor score of the search phrase is 1.
613. The availability score of the search phrase is determined according to the query intent score, the sensitive word score, the word defect score, and the rumor score.
If any one of the query intent score, the sensitive word score, the word defect score, and the rumor score is 0, the availability score of the search phrase is 0; if the query intent score, the sensitive word score, the word defect score, and the rumor score are all 1, the availability score of the search phrase is 1. For example, if the query intent score of the search phrase "Sun Mou coach" is 0, the availability score of the search phrase is 0.
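Step 613 is a logical AND over the four 0/1 scores. A minimal sketch, with illustrative parameter names:

```python
def availability_score(query_intent, sensitive_word, word_defect, rumor):
    """Return 1 only when all four 0/1 condition scores are 1, otherwise 0."""
    scores = (query_intent, sensitive_word, word_defect, rumor)
    return int(all(score == 1 for score in scores))
```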
614. The overall score of the search phrase is evaluated according to the relevance score, the integrity score, and the availability score.
If any one of the relevance score, the integrity score, and the availability score is 0, the overall score of the search phrase is 0; if none of the relevance score, the integrity score, and the availability score is 0, the overall score of the search phrase is the lower of the relevance score and the integrity score. For example, if the relevance score of the search phrase "Sun Mou coach" is 1, the integrity score is 1, and the availability score is 0, the overall score of the search phrase is 0.
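The overall score in step 614 can be sketched as follows, assuming relevance and integrity scores in {0, 1, 2} and an availability score in {0, 1}; the function name is illustrative.

```python
def overall_score(relevance, integrity, availability):
    """Overall score: 0 if any input is 0, otherwise the lower of relevance and integrity."""
    if 0 in (relevance, integrity, availability):
        return 0
    return min(relevance, integrity)
```

Taking the minimum means the overall score reaches 2 only when both relevance and integrity are 2 and the phrase is available.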
After the overall score is obtained, an evaluation record table may further be created to record the URL (Uniform Resource Locator) address of the article, the article title, the article content, the search phrase, the source word, the topic relevance score, the entity relevance score, the text integrity score, the semantic integrity score, the query intent score, the sensitive word score, the word defect score, the rumor score, the relevance score, the integrity score, the availability score, the overall score, and the like, so as to provide the necessary classified training samples for subsequent iterative optimization of the various algorithm models. The evaluation record table may be as shown in Table 3.
Table 3
In summary, the embodiment of the invention acquires a corpus, acquires the search phrase corresponding to the corpus, and detects the search phrase to obtain its multidimensional feature data; the search phrase is then evaluated intelligently in multiple dimensions according to that data, which effectively improves the evaluation efficiency of search phrases. In addition, by performing systematic, process-based, multi-dimensional quality evaluation of search phrases and mining the common low-quality problems of search phrases, the embodiment of the invention effectively promotes the classified optimization of the various algorithm models.
To facilitate better implementation of the search phrase evaluation method provided by the embodiment of the invention, the embodiment of the invention also provides a device based on that method. The terms used have the same meanings as in the search phrase evaluation method described above, and for specific implementation details, reference may be made to the description of the method embodiments.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an evaluation device for search phrases according to an embodiment of the present invention, where the evaluation device for search phrases may include:
a corpus acquisition module 701, configured to acquire a corpus;
a search phrase obtaining module 702, configured to obtain a search phrase corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value;
a detection module 703, configured to detect the search phrase to obtain multidimensional feature data of the search phrase; and
and the evaluation module 704 is used for evaluating the search phrase according to the multidimensional feature data.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the relevance of the search phrase and the corpus to obtain a relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to preset availability conditions to obtain availability;
adding the relevance, the integrity, and the availability to the multidimensional feature data of the search phrase.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the topic relevance of the search phrase and the corpus to obtain topic relevance;
detecting the entity correlation between the search phrase and the corpus to obtain an entity correlation degree;
and determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
identifying a central word of the corpus;
identifying core words in the search phrase, wherein the core words comprise the keywords or the hot search words;
detecting whether the central word and the core word meet at least one of preset topic conditions according to a pre-established knowledge graph; the topic conditions include belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same subject;
if not, determining the topic relevance of the search phrase and the corpus as a first topic relevance;
if yes, determining that the topic relevance of the search phrase and the corpus is second topic relevance, wherein the second topic relevance is larger than the first topic relevance.
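The topic-condition check performed by the detection module might be sketched as below. The patent does not specify how the knowledge graph is stored, so the dictionary layout, the field names, and the sample entries here are all hypothetical; the return values 0 and 1 stand for the first and second topic relevance.

```python
# Hypothetical toy knowledge graph; entries and field names are illustrative only.
KNOWLEDGE_GRAPH = {
    "athlete_a": {"concepts": {"swimmer"}, "events": {"championship_x"},
                  "affiliated_with": {"team_y"}, "subjects": {"sports"}},
    "coach_b": {"concepts": {"coach"}, "events": {"championship_x"},
                "affiliated_with": set(), "subjects": {"sports"}},
    "team_y": {"concepts": {"team"}, "events": set(),
               "affiliated_with": set(), "subjects": {"management"}},
    "stock_c": {"concepts": {"security"}, "events": set(),
                "affiliated_with": set(), "subjects": {"finance"}},
}


def topic_relevance(central_word, core_word, graph=KNOWLEDGE_GRAPH):
    """Return 1 if at least one topic condition holds for the two words, else 0."""
    a, b = graph.get(central_word), graph.get(core_word)
    if a is None or b is None:
        return 0  # words missing from the graph are treated as unrelated
    same_concept = bool(a["concepts"] & b["concepts"])
    same_event = bool(a["events"] & b["events"])
    affiliated = (core_word in a["affiliated_with"]
                  or central_word in b["affiliated_with"])
    same_subject = bool(a["subjects"] & b["subjects"])
    return 1 if (same_concept or same_event or affiliated or same_subject) else 0
```

Treating words that are absent from the graph as unrelated is a design choice of this sketch; a production system might instead fall back to another signal.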
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if yes, determining the entity correlation degree of the search phrase and the corpus as a first entity correlation degree;
if not, determining that the entity correlation degree of the search phrase and the corpus is a second entity correlation degree, wherein the second entity correlation degree is larger than the first entity correlation degree.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
if the topic relevance is a first topic relevance, determining that the relevance of the search phrase and the corpus is the first relevance;
if the topic relevance is a second topic relevance and the entity relevance is a first entity relevance, determining that the relevance between the search phrase and the corpus is a second relevance;
if the topic relevance is a second topic relevance and the entity relevance is a second entity relevance, determining that the relevance of the search phrase and the corpus is a third relevance, wherein the first relevance, the second relevance and the third relevance are sequentially increased.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting whether the grammar structure of the search phrase is complete;
if the grammar structure is incomplete, determining that the text integrity of the search phrase is the first text integrity;
if the grammar structure is complete, detecting whether the search phrase is a parallel phrase or a bias phrase;
if yes, determining that the text integrity of the search phrase is second text integrity;
if not, determining that the text integrity of the search phrase is third text integrity, wherein the first text integrity, the second text integrity and the third text integrity are sequentially increased.
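The text-integrity decision above can be sketched as a function of two upstream results. The sketch assumes that a grammar check and a phrase-type classifier already exist and supply `grammar_complete` and `phrase_type`; those inputs, the type labels, and the 0/1/2 values are illustrative.

```python
def text_integrity(grammar_complete, phrase_type):
    """Return 0, 1 or 2 for the first, second or third text integrity.

    phrase_type is a label from a hypothetical upstream classifier,
    for example "parallel", "bias" or "other".
    """
    if not grammar_complete:
        return 0
    if phrase_type in ("parallel", "bias"):
        return 1
    return 2
```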
In some embodiments of the present invention, the detection module 703 is specifically configured to:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified to correspond to at least two semantic information, determining that the semantic integrity of the search phrase is a second semantic integrity;
if the search phrase is identified to correspond to one piece of semantic information, determining that the semantic integrity of the search phrase is third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity are sequentially increased.
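The semantic-integrity rule depends only on how many pieces of semantic information are identified for the phrase. A minimal sketch, assuming an upstream step returns the identified interpretations as a list (the representation and the 0/1/2 values are illustrative):

```python
def semantic_integrity(interpretations):
    """Return 0, 1 or 2 for the first, second or third semantic integrity."""
    count = len(interpretations)
    if count == 0:
        return 0  # no semantic information identified
    if count >= 2:
        return 1  # ambiguous: at least two pieces of semantic information
    return 2  # exactly one piece of semantic information
```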
In some embodiments of the present invention, the detection module 703 is specifically configured to:
if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, determining that the integrity of the search phrase is the first integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the second semantic integrity, determining that the integrity of the search phrase is the second integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the third semantic integrity, determining that the integrity of the search phrase is the third integrity, wherein the first integrity, the second integrity and the third integrity are sequentially increased.
In some embodiments of the present invention, the relevance comprises a first relevance, a second relevance, and a third relevance that are sequentially increased, the integrity comprises a first integrity, a second integrity, and a third integrity that are sequentially increased, and the availability comprises a first availability and a second availability that are sequentially increased; the detection module 703 is specifically configured to:
if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree;
if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating that the recommendation of the search phrase is a third recommendation;
otherwise, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
In some embodiments of the invention, the apparatus further comprises a sample addition module, specifically for:
acquiring an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into a training sample.
In some embodiments of the present invention, the apparatus further comprises a storage module, specifically configured to:
acquiring an evaluation result of the search phrase;
the evaluation results are saved in the blockchain in the form of blocks.
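The patent does not specify the block format or the blockchain platform. Purely as an illustration of saving evaluation results "in the form of blocks", the sketch below builds a local hash-linked list with SHA-256; the field names and helper functions are assumptions, and a real deployment would use an actual blockchain service rather than an in-memory list.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def make_block(evaluation, prev_hash):
    """Create a block whose hash covers the evaluation data and the previous hash."""
    payload = json.dumps({"data": evaluation, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"data": evaluation, "prev_hash": prev_hash, "hash": digest}


def append_result(chain, evaluation):
    """Append an evaluation result to the chain as a new block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(make_block(evaluation, prev_hash))


def is_valid(chain):
    """Check that every block links to its predecessor and its hash is intact."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = make_block(block["data"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, altering a stored evaluation result invalidates that block and every block after it.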
In the implementation, each module may be implemented as an independent entity, or may be combined arbitrarily, and implemented as the same entity or several entities, and the implementation of each module may be referred to the foregoing method embodiment, which is not described herein again.
According to the embodiment of the invention, a corpus is acquired, the search phrase corresponding to the corpus is acquired, and the search phrase is detected to obtain its multidimensional feature data; the search phrase is then evaluated in multiple dimensions according to that data, which effectively improves the evaluation efficiency and evaluation accuracy of search phrases. In addition, by performing systematic, process-based, multi-dimensional quality evaluation of search phrases and mining the common low-quality problems of search phrases, the embodiment of the invention effectively promotes the classified optimization of the various algorithm models.
The embodiment of the invention also provides a server. FIG. 8 shows a schematic structural diagram of the server according to the embodiment of the invention, specifically:
the server may include components such as a processor 801 with one or more processing cores, a memory 802 comprising one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will appreciate that the server structure shown in FIG. 8 does not limit the server, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The processor 801 is the control center of the server. It connects the various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 802 and calling the data stored in the memory 802. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, and the like, with a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 801.
The memory 802 may be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. In addition, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The server also includes a power supply 803 for powering the various components. Preferably, the power supply 803 may be logically coupled to the processor 801 via a power management system, so that functions such as charge management, discharge management, and power consumption management are performed via the power management system. The power supply 803 may also include any components such as one or more direct current or alternating current power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The server may further comprise an input unit 804, which input unit 804 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the server may further include a display unit or the like, which is not described herein. In this embodiment, the processor 801 in the server loads executable files corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 801 executes the application programs stored in the memory 802, so as to implement various functions as follows:
Acquiring corpus; acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the search phrase evaluation methods provided by embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring corpus; acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multidimensional feature data of the search phrase; and evaluating the search phrase according to the multidimensional feature data.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Because the instructions stored in the storage medium can perform the steps of any search phrase evaluation method provided by the embodiments of the present invention, they can achieve the beneficial effects of any such method; see the foregoing embodiments for details, which are not repeated here.
The search phrase evaluation method, apparatus, server, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, in light of the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (11)

1. A method of evaluating a search phrase, comprising:
acquiring corpus;
acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value;
identifying a central word of the corpus;
identifying core words in the search phrase, wherein the core words comprise the keywords or the hot search words;
detecting, according to a pre-established knowledge graph, whether the central word and the core word meet at least one of the preset topic conditions; the topic conditions include belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same subject;
if the central word and the core word do not meet any of the preset topic conditions, determining that the topic relevance of the search phrase and the corpus is a first topic relevance;
if the central word and the core word meet at least one of the preset topic conditions, determining that the topic relevance of the search phrase and the corpus is a second topic relevance, wherein the second topic relevance is larger than the first topic relevance;
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if ambiguity exists between the central word and the core word, determining that the entity correlation degree between the search phrase and the corpus is a first entity correlation degree;
if the central word and the core word have no ambiguity, determining that the entity correlation degree of the search phrase and the corpus is a second entity correlation degree, wherein the second entity correlation degree is larger than the first entity correlation degree;
determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to preset availability conditions to obtain availability;
adding the relevance, the integrity, and the availability to multi-dimensional feature data of the search phrase;
and evaluating the search phrase according to the multidimensional feature data.
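The topic-relevance step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Entity` record, the dictionary-based toy knowledge graph, and the numeric levels 1 and 2 standing in for the first and second topic relevance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical knowledge-graph node; every field is illustrative."""
    concept: Optional[str] = None          # concept the entity belongs to
    events: FrozenSet[str] = frozenset()   # events the entity is associated with
    parent: Optional[str] = None           # entity this one is affiliated with
    subject: Optional[str] = None          # subject area of the entity


def meets_topic_condition(graph: Dict[str, Entity], central: str, core: str) -> bool:
    """Return True if the two words satisfy at least one of the four conditions."""
    a, b = graph.get(central), graph.get(core)
    if a is None or b is None:
        return False  # assumption: words missing from the graph meet no condition
    return (
        (a.concept is not None and a.concept == b.concept)     # same concept
        or bool(a.events & b.events)                           # same event
        or a.parent == core or b.parent == central             # affiliation
        or (a.subject is not None and a.subject == b.subject)  # same subject
    )


def topic_relevance(graph: Dict[str, Entity], central: str, core: str) -> int:
    """Map the condition check to level 2 (condition met) or level 1 (not met)."""
    return 2 if meets_topic_condition(graph, central, core) else 1
```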
2. The method for evaluating search phrases according to claim 1, wherein the determining the relevance of the search phrase to the corpus according to the topic relevance and the entity relevance specifically comprises:
if the topic relevance is a first topic relevance, determining that the relevance of the search phrase and the corpus is a first relevance;
if the topic relevance is a second topic relevance and the entity relevance is a first entity relevance, determining that the relevance between the search phrase and the corpus is a second relevance;
if the topic relevance is a second topic relevance and the entity relevance is a second entity relevance, determining that the relevance of the search phrase and the corpus is a third relevance, wherein the first relevance, the second relevance and the third relevance are sequentially increased.
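The three branches of claim 2 amount to a small decision table. A sketch, assuming the first, second, and third levels are encoded as the integers 1, 2, and 3 (an encoding chosen here for illustration only):

```python
def combine_relevance(topic_relevance: int, entity_relevance: int) -> int:
    """Combine topic relevance (1 or 2) and entity relevance (1 or 2) into 1..3."""
    if topic_relevance not in (1, 2) or entity_relevance not in (1, 2):
        raise ValueError("both inputs must be level 1 or level 2")
    if topic_relevance == 1:
        return 1  # first topic relevance: first relevance, whatever the entity level
    # second topic relevance: the entity relevance decides between levels 2 and 3
    return 2 if entity_relevance == 1 else 3
```

Note that the entity relevance only matters once a topic condition has been met.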
3. The method for evaluating a search phrase according to claim 1, wherein the detecting the integrity of the search phrase to obtain the integrity specifically includes:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
4. The method for evaluating a search phrase according to claim 3, wherein the detecting the text integrity of the search phrase to obtain the text integrity specifically includes:
detecting whether the grammar structure of the search phrase is complete;
if the grammar structure is incomplete, determining that the text integrity of the search phrase is the first text integrity;
if the grammar structure is complete, detecting whether the search phrase is a coordinate phrase or a modifier-head phrase;
if yes, determining that the text integrity of the search phrase is second text integrity;
if not, determining that the text integrity of the search phrase is third text integrity, wherein the first text integrity, the second text integrity and the third text integrity are sequentially increased.
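The text-integrity check of claim 4 is a two-stage test. The sketch below assumes that the grammar check and the phrase-type check have already been carried out elsewhere (for example by a parser) and are passed in as booleans; the function name and the integer levels 1 to 3 are hypothetical.

```python
def text_integrity(grammar_complete: bool, matches_phrase_type: bool) -> int:
    """Return the text-integrity level (1, 2 or 3) described in claim 4.

    grammar_complete:    result of the grammar-structure check.
    matches_phrase_type: True if the phrase is one of the two phrase types
                         named in claim 4; only consulted when the grammar
                         structure is complete.
    """
    if not grammar_complete:
        return 1
    return 2 if matches_phrase_type else 3
```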
5. The method for evaluating a search phrase according to claim 4, wherein the detecting the semantic integrity of the search phrase to obtain the semantic integrity specifically comprises:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified to correspond to at least two semantic information, determining that the semantic integrity of the search phrase is a second semantic integrity;
if the search phrase is identified to correspond to one piece of semantic information, determining that the semantic integrity of the search phrase is third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity are sequentially increased.
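Claim 5 grades semantic integrity by how many pieces of semantic information are identified. A minimal sketch, assuming an upstream recognizer returns the identified interpretations as a list (the recognizer itself, the function name, and the integer levels are assumptions):

```python
from typing import Sequence


def semantic_integrity(interpretations: Sequence[str]) -> int:
    """Return level 1 for no interpretation, 2 for several, 3 for exactly one."""
    count = len(interpretations)
    if count == 0:
        return 1  # no semantic information identified
    if count >= 2:
        return 2  # at least two pieces of semantic information
    return 3      # exactly one piece of semantic information
```

A phrase with exactly one interpretation ranks highest, above a phrase with several.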
6. The method for evaluating a search phrase according to claim 5, wherein the determining the integrity of the search phrase according to the text integrity and the semantic integrity comprises:
if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, determining that the integrity of the search phrase is the first integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the second semantic integrity, determining that the integrity of the search phrase is the second integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the third semantic integrity, determining that the integrity of the search phrase is the third integrity, wherein the first integrity, the second integrity and the third integrity are sequentially increased.
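The combination rule of claim 6 can be written compactly once the levels are encoded as integers 1 to 3 (an illustrative encoding, not part of the claims):

```python
def combine_integrity(text_integrity: int, semantic_integrity: int) -> int:
    """Combine text and semantic integrity (each 1..3) into an overall level."""
    for level in (text_integrity, semantic_integrity):
        if level not in (1, 2, 3):
            raise ValueError("integrity levels must be 1, 2 or 3")
    if text_integrity == 1 or semantic_integrity == 1:
        return 1
    # Text integrity is now 2 or 3, so the semantic level (2 or 3) decides.
    return semantic_integrity
```

Under this rule, the second and third text integrity are treated alike; only the first text integrity changes the outcome.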
7. The method of claim 1, wherein the relevance comprises sequentially increasing first, second and third relevance, the integrity comprises sequentially increasing first, second and third integrity, and the availability comprises sequentially increasing first and second availability;
the evaluating the search phrase according to the multidimensional feature data specifically comprises:
if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree;
if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating that the recommendation of the search phrase is a third recommendation;
otherwise, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
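The evaluation rule of claim 7 can be sketched in the same way. The function name and the integer encoding (relevance and integrity in 1..3, availability in 1..2) are assumptions made for this example:

```python
def recommendation_degree(relevance: int, integrity: int, availability: int) -> int:
    """Return the recommendation level (1, 2 or 3) from the three features."""
    if relevance == 1 or integrity == 1 or availability == 1:
        return 1  # any lowest-level feature gives the first recommendation degree
    if relevance == 3 and integrity == 3 and availability == 2:
        return 3  # all features at their highest level
    return 2      # every remaining combination
```

For instance, `recommendation_degree(3, 2, 2)` falls through to the second recommendation degree because the integrity is not at its highest level.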
8. The method of evaluating a search phrase of claim 1, further comprising:
acquiring an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into a training sample.
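Claim 8 leaves the preset evaluation condition open, so the sketch below takes it as a caller-supplied predicate. The function name, the `(phrase, result)` tuple layout, and the example condition in the usage note are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def collect_training_samples(
    evaluated: Iterable[Tuple[str, int]],
    meets_condition: Callable[[int], bool],
) -> List[Tuple[str, int]]:
    """Keep (search phrase, evaluation result) pairs whose result meets the condition."""
    return [(phrase, result) for phrase, result in evaluated if meets_condition(result)]
```

One possible condition, purely as an example, is to keep only clearly good and clearly bad phrases: `collect_training_samples(pairs, lambda r: r in (1, 3))`.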
9. An evaluation device for search phrases, comprising:
the corpus acquisition module is used for acquiring corpus;
the search phrase acquisition module is used for acquiring search phrases corresponding to the corpus; the search phrase is a phrase generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with the search quantity larger than a preset threshold value;
the detection module is used for identifying the central word of the corpus;
identifying core words in the search phrase, wherein the core words comprise the keywords or the hot search words;
detecting, according to a pre-established knowledge graph, whether the central word and the core word meet at least one of the preset topic conditions; the topic conditions include belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same subject;
if the central word and the core word do not meet any of the preset topic conditions, determining that the topic relevance of the search phrase and the corpus is a first topic relevance;
if the central word and the core word meet at least one of the preset topic conditions, determining that the topic relevance of the search phrase and the corpus is a second topic relevance, wherein the second topic relevance is larger than the first topic relevance;
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if ambiguity exists between the central word and the core word, determining that the entity correlation degree between the search phrase and the corpus is a first entity correlation degree;
if the central word and the core word have no ambiguity, determining that the entity correlation degree of the search phrase and the corpus is a second entity correlation degree, wherein the second entity correlation degree is larger than the first entity correlation degree;
determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to preset availability conditions to obtain availability;
adding the relevance, the integrity, and the availability to multi-dimensional feature data of the search phrase; and
the evaluation module is used for evaluating the search phrase according to the multidimensional feature data.
10. A server comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps in the method of evaluating a search phrase of claim 1.
11. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the method of evaluating a search phrase of any of claims 1 to 8.
CN201911048275.1A 2019-10-30 2019-10-30 Evaluation method and device of search phrase, server and storage medium Active CN112749246B (en)

Publications (2)

Publication Number Publication Date
CN112749246A (en) 2021-05-04
CN112749246B (en) 2023-11-28

Family

ID=75640999

Country Status (1)

Country Link
CN (1) CN112749246B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40048359; Country of ref document: HK)
GR01 Patent grant