CN112749246A - Search phrase evaluation method, device, server and storage medium - Google Patents

Info

Publication number
CN112749246A
Authority
CN
China
Prior art keywords
search phrase
integrity
relevance
search
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911048275.1A
Other languages
Chinese (zh)
Other versions
CN112749246B (en)
Inventor
田沐燃
郝心
李晓亮
黄艺华
刘一岑
曹晟
龙柏炜
张懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911048275.1A
Publication of CN112749246A
Application granted
Publication of CN112749246B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method and a device for evaluating a search phrase, a server and a storage medium. The evaluation method of the search phrase comprises the following steps: obtaining a corpus; acquiring a search phrase corresponding to the corpus, wherein the search phrase is generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with search quantity larger than a preset threshold value; detecting the search phrase to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data. According to the embodiment of the invention, the evaluation efficiency of the search phrase is effectively improved through multi-dimensional intelligent evaluation of the search phrase.

Description

Search phrase evaluation method, device, server and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a search phrase evaluation method, a search phrase evaluation device, a search phrase evaluation server and a storage medium.
Background
Contextual search is an important direction in the development of search content products. By comprehensively considering the background and interests of the user, deeply understanding the user's intent, and fully mining the user's potential search needs, contextual search can provide the desired content more intelligently and conveniently, breaking through the traditional 'search word-search result' mode. Recommending related search phrases based on text content meets users' extended-reading needs and is one of the main application scenarios of contextual search.
Currently, recommended search phrases are mainly generated by an algorithm model, but the corpus on which the algorithm model depends is unreliable, so the search phrases generated by the model are unreliable as well. To address this problem, the prior art evaluates the generated search phrases manually, one by one, but manual evaluation is inefficient.
Disclosure of Invention
The invention provides a search phrase evaluation method, a search phrase evaluation device, a server and a storage medium, which can effectively improve the evaluation efficiency of search phrases.
In a first aspect, the present invention provides a method for evaluating a search phrase, comprising:
obtaining a corpus;
acquiring a search phrase recommended based on the corpus; the search phrase is generated by combining keywords in the corpus with matched hot search words, and the hot search words are entity words with a search quantity larger than a preset threshold value;
detecting the search phrase to obtain multi-dimensional feature data of the search phrase;
evaluating the search phrase according to the multi-dimensional feature data.
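The generation rule stated above, in which hot search words are entity words whose search quantity exceeds a preset threshold, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the function names, the `matches` mapping from a keyword to its candidate entity words, and the space-joined phrase format are all hypothetical and do not come from the source.

```python
from typing import Dict, List


def select_hot_words(search_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return entity words whose search quantity is strictly greater than the threshold."""
    return sorted(word for word, count in search_counts.items() if count > threshold)


def generate_phrases(keywords: List[str],
                     matches: Dict[str, List[str]],
                     hot_words: List[str]) -> List[str]:
    """Combine each keyword with the hot search words it is matched to.

    `matches` (hypothetical) maps a keyword to candidate entity words; only
    candidates that are also hot search words are combined with the keyword.
    """
    hot = set(hot_words)
    phrases = []
    for keyword in keywords:
        for candidate in matches.get(keyword, []):
            if candidate in hot:
                # Assumed phrase format: keyword followed by the hot search word.
                phrases.append(f"{keyword} {candidate}")
    return phrases
```

How keywords are matched to hot search words is not specified in this section, so the matching is passed in as data instead of being computed.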
In some embodiments of the present invention, the detecting the search phrase to obtain the multidimensional feature data of the search phrase specifically includes:
detecting the relevance of the search phrase to the corpus to obtain the relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to a preset availability condition to obtain the availability;
adding the relevance, the integrity, and the availability to the multi-dimensional feature data of the search phrase.
In some embodiments of the present invention, the detecting the relevance of the search phrase to the corpus to obtain the relevance specifically includes:
detecting the topic relevance of the search phrase and the corpus to obtain topic relevance;
detecting the entity relevance of the search phrase and the corpus to obtain entity relevance;
and determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance.
In some embodiments of the present invention, the detecting the topic relevance between the search phrase and the corpus to obtain the topic relevance specifically includes:
identifying a central word of the corpus;
identifying core words in the search phrase, the core words including the keywords or the popular search words;
detecting, according to a pre-established knowledge graph, whether the central word and the core word meet at least one of the preset topic conditions; the topic conditions comprise belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same topic;
if not, determining the topic relevance of the search phrase and the corpus as a first topic relevance;
if yes, determining the topic relevance of the search phrase and the corpus as a second topic relevance, wherein the second topic relevance is greater than the first topic relevance.
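The topic check above can be illustrated with a toy knowledge graph stored as a set of (head, relation, tail) triples. The sketch below is an assumption-laden example, not the patented detection logic: the relation names (`is_a`, `involved_in`, `has_topic`, `affiliated_with`) and the integer levels 1 and 2 standing for the first and second topic relevance are hypothetical choices made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical relation names; a real knowledge graph would define its own schema.
SHARED_RELATIONS = ("is_a", "involved_in", "has_topic")  # same concept / event / topic
AFFILIATION = "affiliated_with"

FIRST_TOPIC_RELEVANCE = 1   # no topic condition is met
SECOND_TOPIC_RELEVANCE = 2  # at least one topic condition is met


def _targets(graph: Set[Triple], entity: str, relation: str) -> Set[str]:
    """Collect the tails reachable from `entity` through `relation`."""
    return {tail for head, rel, tail in graph if head == entity and rel == relation}


def meets_topic_condition(graph: Set[Triple], central_word: str, core_word: str) -> bool:
    """True if the two words are affiliated or share a concept, event, or topic."""
    if (central_word, AFFILIATION, core_word) in graph:
        return True
    if (core_word, AFFILIATION, central_word) in graph:
        return True
    return any(
        _targets(graph, central_word, rel) & _targets(graph, core_word, rel)
        for rel in SHARED_RELATIONS
    )


def topic_relevance(graph: Set[Triple], central_word: str, core_word: str) -> int:
    """Map the yes/no outcome of the topic check to a relevance level."""
    if meets_topic_condition(graph, central_word, core_word):
        return SECOND_TOPIC_RELEVANCE
    return FIRST_TOPIC_RELEVANCE
```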
In some embodiments of the present invention, the detecting the entity relevance between the search phrase and the corpus to obtain the entity relevance specifically includes:
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if so, determining the entity relevance of the search phrase and the corpus as a first entity relevance;
if not, determining that the entity relevance of the search phrase and the corpus is a second entity relevance, wherein the second entity relevance is greater than the first entity relevance.
In some embodiments of the present invention, the determining the relevance between the search phrase and the corpus according to the topic relevance and the entity relevance specifically includes:
if the topic relevance is a first topic relevance, determining the relevance of the search phrase and the corpus as a first relevance;
if the topic relevance is a second topic relevance and the entity relevance is a first entity relevance, determining that the relevance of the search phrase and the corpus is a second relevance;
and if the topic relevance is a second topic relevance and the entity relevance is a second entity relevance, determining that the relevance of the search phrase and the corpus is a third relevance, wherein the first relevance, the second relevance and the third relevance increase in that order.
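The entity check and the combination rule above amount to a small decision table, sketched below. Encoding the first, second, and third levels as the integers 1, 2, and 3 is an assumption made for illustration, as are the function names; how ambiguity is actually detected from the knowledge graph is not shown.

```python
def entity_relevance(ambiguous: bool) -> int:
    """First entity relevance (1) if ambiguity exists, otherwise second (2)."""
    return 1 if ambiguous else 2


def overall_relevance(topic_level: int, entity_level: int) -> int:
    """Combine topic relevance (1 or 2) and entity relevance (1 or 2) into 1, 2, or 3."""
    if topic_level not in (1, 2) or entity_level not in (1, 2):
        raise ValueError("levels must be 1 or 2")
    if topic_level == 1:
        return 1  # a failed topic check dominates, whatever the entity check says
    return 2 if entity_level == 1 else 3
```

Note that the entity level only matters once the topic check has passed.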
In some embodiments of the present invention, the detecting the integrity of the search phrase to obtain the integrity specifically includes:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
In some embodiments of the present invention, the detecting the text integrity of the search phrase to obtain the text integrity specifically includes:
detecting whether a grammatical structure of the search phrase is complete;
if the grammar structure is incomplete, determining the text integrity of the search phrase as a first text integrity;
if the grammar structure is complete, detecting whether the search phrase is a parallel (coordinate) phrase or a modifier-head phrase;
if so, determining the text integrity of the search phrase as a second text integrity;
if not, determining that the text integrity of the search phrase is a third text integrity, wherein the first text integrity, the second text integrity and the third text integrity increase in that order.
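The text-integrity steps above form a two-stage check, which might be written as below. The two boolean inputs stand in for the grammar analysis and the phrase-type detection, neither of which is specified here; the parameter names and the integer levels 1 to 3 are assumptions for illustration.

```python
def text_integrity(grammar_complete: bool, is_listed_phrase_type: bool) -> int:
    """Return the text integrity level of a search phrase.

    grammar_complete:      result of the (unspecified) grammatical-structure check
    is_listed_phrase_type: True if the phrase is one of the two phrase types
                           named in the step above
    """
    if not grammar_complete:
        return 1  # first text integrity
    if is_listed_phrase_type:
        return 2  # second text integrity
    return 3      # third text integrity
```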
In some embodiments of the present invention, the detecting the semantic integrity of the search phrase to obtain the semantic integrity specifically includes:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified to correspond to at least two semantic information, determining the semantic integrity of the search phrase as a second semantic integrity;
and if the search phrase is identified to correspond to exactly one piece of semantic information, determining the semantic integrity of the search phrase to be a third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity increase in that order.
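Read this way, semantic integrity depends only on how many distinct pieces of semantic information are identified for the phrase. A minimal sketch, assuming the identified interpretations are supplied as a list of strings by some upstream step that is not shown:

```python
from typing import Sequence


def semantic_integrity(interpretations: Sequence[str]) -> int:
    """Map the number of identified interpretations to a semantic integrity level."""
    count = len(interpretations)
    if count == 0:
        return 1  # nothing identified: first semantic integrity
    if count >= 2:
        return 2  # ambiguous phrase: second semantic integrity
    return 3      # exactly one interpretation: third semantic integrity
```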
In some embodiments of the present invention, the determining the integrity of the search phrase according to the text integrity and the semantic integrity specifically includes:
if the text integrity is a first text integrity or the semantic integrity is a first semantic integrity, determining the integrity of the search phrase as the first integrity;
if the text integrity is a second text integrity or a third text integrity and the semantic integrity is a second semantic integrity, determining the integrity of the search phrase to be the second integrity;
and if the text integrity is a second text integrity or a third text integrity and the semantic integrity is a third semantic integrity, determining the integrity of the search phrase to be a third integrity, wherein the first integrity, the second integrity and the third integrity increase in that order.
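With levels encoded as integers 1 to 3 (an assumption made for illustration), the three cases above reduce to one observation: if either dimension is at its first level the result is the first integrity, and otherwise the result follows the semantic integrity level. A short sketch:

```python
def overall_integrity(text_level: int, semantic_level: int) -> int:
    """Combine text integrity (1..3) and semantic integrity (1..3) into 1..3."""
    if text_level not in (1, 2, 3) or semantic_level not in (1, 2, 3):
        raise ValueError("levels must be 1, 2, or 3")
    if text_level == 1 or semantic_level == 1:
        return 1
    # Text level is 2 or 3 here, so the result tracks the semantic level (2 or 3).
    return semantic_level
```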
In some embodiments of the present invention, the relevance includes a first relevance, a second relevance and a third relevance which increase in that order, the integrity includes a first integrity, a second integrity and a third integrity which increase in that order, and the availability includes a first availability and a second availability which increase in that order;
the evaluating the search phrase according to the multi-dimensional feature data specifically includes:
if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree;
if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating the recommendation degree of the search phrase as a third recommendation degree;
and if not, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
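The final evaluation rule can be sketched as a function over the multi-dimensional feature data. The `FeatureData` container and the integer encoding of the levels are hypothetical conveniences introduced for this example; only the three-way rule itself is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureData:
    """Multi-dimensional feature data of one search phrase (assumed encoding)."""
    relevance: int     # 1, 2, or 3
    integrity: int     # 1, 2, or 3
    availability: int  # 1 or 2


def recommendation_degree(features: FeatureData) -> int:
    """Return 1, 2, or 3 for the first, second, or third recommendation degree."""
    if features.relevance == 1 or features.integrity == 1 or features.availability == 1:
        return 1  # any dimension at its lowest level rules the phrase out
    if features.relevance == 3 and features.integrity == 3 and features.availability == 2:
        return 3  # every dimension at its highest level
    return 2      # all remaining combinations
```

For example, `recommendation_degree(FeatureData(relevance=2, integrity=3, availability=2))` falls into the remaining case and yields the second recommendation degree.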
In some embodiments of the invention, the method further comprises:
obtaining an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into the training sample.
In some embodiments of the invention, the method further comprises:
saving the evaluation result of the search phrase in a block form in a block chain.
In a second aspect, the present invention provides an apparatus for evaluating a search phrase, comprising:
the corpus acquiring module is used for acquiring a corpus;
a search phrase obtaining module, configured to obtain a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
the detection module is used for detecting the search phrase to obtain multi-dimensional feature data of the search phrase; and
the evaluation module is used for evaluating the search phrase according to the multi-dimensional feature data.
In a third aspect, the present invention provides a server comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
obtaining a corpus;
acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
detecting the search phrase to obtain multi-dimensional feature data of the search phrase;
evaluating the search phrase according to the multi-dimensional feature data.
In a fourth aspect, the present invention provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method for evaluating a search phrase according to any one of the first aspect.
According to the embodiment of the invention, the corpus is obtained, the search phrase corresponding to the corpus is further obtained, the search phrase is detected, the multi-dimensional feature data of the search phrase is obtained, the multi-dimensional intelligent evaluation of the search phrase is realized according to the multi-dimensional feature data, and the evaluation efficiency of the search phrase is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of one scenario of a system for evaluating search phrases provided in embodiments of the present invention;
FIG. 2 is an alternative structural diagram of the distributed system applied to the blockchain system according to the embodiment of the present invention;
FIG. 3 is an alternative block structure provided in the embodiments of the present invention;
FIG. 4 is a schematic flow chart diagram of a method for evaluating search phrases provided in embodiments of the present invention;
FIG. 5 is a schematic diagram of a recommendation interface for search phrases in an embodiment of the present invention;
FIG. 6 is another flow diagram of a method for evaluating search phrases provided in embodiments of the present invention;
FIG. 7 is a schematic diagram of an arrangement of an apparatus for evaluating search phrases provided in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a server provided in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description that follows, specific embodiments of the present invention are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations will be referred to several times as being computer-executed; computer execution as referred to herein includes the manipulation, by the computer's processing unit, of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may reconfigure or otherwise alter the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the invention are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will appreciate that the various steps and operations described hereinafter may also be implemented in hardware.
The term "module" or "unit" as used herein may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as objects implemented on the computing system. The apparatus and method described herein are preferably implemented in software, but may also be implemented in hardware, and are within the scope of the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiment of the invention provides a method and a device for evaluating a search phrase, a server and a storage medium.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
The scheme provided by the embodiment of the invention relates to the evaluation of search phrases based on artificial intelligence; that is, the embodiment of the invention provides an artificial-intelligence-based method for evaluating search phrases, which comprises the following steps: obtaining a corpus; acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase by using a machine learning algorithm to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of a search phrase evaluation system according to an embodiment of the present invention, where the search phrase evaluation system may include a server 10, and an evaluation device for a search phrase is integrated in the server 10. In the embodiment of the present invention, the server 10 is mainly used for obtaining corpora; acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data.
In this embodiment of the present invention, the server 10 may be an independent server, or may be a server network or a server cluster composed of servers, for example, the server 10 described in this embodiment of the present invention includes, but is not limited to, a computer, a network host, a single network server, a plurality of network server sets, or a cloud server composed of a plurality of servers. Among them, the Cloud server is constituted by a large number of computers or web servers based on Cloud Computing (Cloud Computing).
Those skilled in the art will appreciate that the application environment shown in fig. 1 is only one application scenario related to the present application and does not limit the application scenarios of the present application; other application environments may include more or fewer servers than shown in fig. 1, or a different network connection relationship among servers. For example, only one server is shown in fig. 1, but it will be understood that the system for evaluating a search phrase may further include one or more other servers and/or one or more clients connected to the servers over a network, which is not limited herein.
In addition, as shown in fig. 1, the system for evaluating search phrases may further include a memory 20 for storing data. The memory 20 may include a corpus database, in which corpora such as information and articles, and the search phrases corresponding to the corpora, are stored; a feature database, in which the multi-dimensional feature data of the search phrases are stored; and an evaluation result database, in which the evaluation results of the search phrases are stored.
It should be noted that the scenario diagram of the search phrase evaluation system shown in fig. 1 is only an example, and the search phrase evaluation system and the scenario described in the embodiment of the present invention are for more clearly illustrating the technical solution of the embodiment of the present invention, and do not form a limitation on the technical solution provided in the embodiment of the present invention.
The system for evaluating the search phrase according to the embodiment of the present invention may be a distributed system formed by a plurality of nodes (any form of computing device in an access network, such as the server 10, etc.) connected in a network communication manner.
Taking the case where the distributed system is a blockchain system as an example, referring to fig. 2, fig. 2 is an optional structural schematic diagram of the distributed system 100 applied to a blockchain system. The system is formed by a plurality of nodes 200 (computing devices of any form in an access network, such as servers) and clients 300; a peer-to-peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join to become a node, and a node comprises a hardware layer, a middle layer, an operating system layer and an application layer. In the embodiment of the present invention, each server 10 is a node in the blockchain system.
Referring to the functions of each node in the blockchain system shown in fig. 2, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application, which is deployed in the blockchain to implement specific services according to actual service requirements; it records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once the source and integrity of the record data are verified successfully.
For example, the services implemented by the application include:
2.1) Wallet, which provides electronic money transaction functions, including initiating a transaction (i.e., sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as a response acknowledging that the transaction is valid); of course, the wallet also supports querying the electronic money remaining at an electronic money address;
2.2) Shared ledger, which provides operations such as storing, querying and modifying account data; record data of the operations on the account data is sent to other nodes in the blockchain system, and after the other nodes verify its validity, the record data is stored in a temporary block as a response acknowledging that the account data is valid, and a confirmation may be sent to the node that initiated the operation;
2.3) Smart contract, a computerized agreement that can enforce the terms of a contract, implemented by code that is deployed on the shared ledger and executed when certain conditions are met, and used to complete automated transactions according to actual business requirements, such as querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods; of course, smart contracts are not limited to contracts for trading and may also execute contracts that process received information.
3) Blockchain, which comprises a series of blocks (Blocks) connected to one another in the chronological order of their generation; once a new block is added to the blockchain it cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system.
Referring to fig. 3, fig. 3 is an optional schematic diagram of a Block Structure (Block Structure) according to an embodiment of the present invention, where each Block includes a hash value of a transaction record stored in the Block (hash value of the Block) and a hash value of a previous Block, and the blocks are connected by the hash values to form a Block chain. The block may include information such as a time stamp at the time of block generation. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using cryptography, and each data block contains related information for verifying the validity (anti-counterfeiting) of the information and generating a next block.
When the evaluation system for search phrases in the embodiment of the present invention is a blockchain system and the server in the embodiment of the present invention is a node in the blockchain system, the evaluation result of the search phrase may be stored in the blockchain. Specifically, in the embodiment of the present invention, the method further includes: obtaining an evaluation result of the search phrase; and saving the evaluation result of the search phrase in the blockchain in the form of a block. For the specific way of adding a block, reference may be made to the description of the blockchain system above, which is not repeated here.
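As a rough illustration of how blocks linked by hash values could hold evaluation results, the following Python sketch builds and verifies a tiny chain. The field names (`records`, `prev_hash`, `timestamp`, `hash`), the use of SHA-256 and the choice to hash the whole block body are assumptions made for the example; the embodiment does not specify them.

```python
import hashlib
import json
import time


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the block body (assumed scheme)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(records, prev_hash):
    """Build a block holding records, a timestamp, the previous block's hash and its own hash."""
    body = {"records": records, "prev_hash": prev_hash, "timestamp": time.time()}
    return {**body, "hash": _digest(body)}


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        if _digest(body) != block["hash"]:
            return False  # block content was altered after creation
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Changing a stored evaluation result changes the recomputed digest, so the altered block no longer matches its recorded hash.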
The following is a detailed description of specific embodiments.
In the present embodiment, the description is given from the perspective of a search phrase evaluation device, which may be integrated in the server 10.
The invention provides an evaluation method of a search phrase, which comprises the following steps: obtaining a corpus; acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data.
Please refer to fig. 4, which is a flowchart illustrating an evaluation method of a search phrase according to an embodiment of the present invention, the evaluation method of a search phrase includes:
401. Obtaining the corpus.
The corpus refers to language material. In the embodiments of the present invention, the corpus is text information provided for a user to read and view, such as news and articles on a website. As shown in FIG. 5, the main display area 51 of the browser displays an article titled "Per day play in the Forward broadcast Aux match at the center, in 14 days, and allows a unique foreign war, and one article is expected to catch the lead"; this article constitutes a corpus.
402. Acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value.
An entity word is a proper noun with a unique meaning. To meet the user's need for extended reading, a recommendation model is built that generates related search phrases based on the specific content of the corpus and recommends them to the user. Specifically, a plurality of candidate keywords are extracted from the corpus according to a pre-constructed lexicon, and the importance of each candidate keyword in the corpus is calculated; the candidate keywords whose importance is higher than a preset importance value are used as the keywords of the corpus. Meanwhile, a plurality of popular search terms are obtained from the search engine; the popular search terms are derived by the search engine from statistics of the actual searches of a large number of users. The keywords of the corpus are then matched against each popular search term to obtain the matching degree between the keywords and each popular search term, and the popular search terms whose matching degree is greater than a preset matching value are combined with the keywords of the corpus to generate the search phrases to be recommended to the user.
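The generation flow just described can be sketched in Python as follows. This is only an illustration: using relative frequency as the importance measure, the caller-supplied `match_degree` function and joining keyword and term with a space are all assumptions, since the embodiment does not fix how importance or matching degree is computed.

```python
from collections import Counter


def extract_keywords(tokens, lexicon, min_importance):
    """Keep lexicon words whose importance (here: relative frequency) exceeds the threshold."""
    candidates = [t for t in tokens if t in lexicon]
    counts = Counter(candidates)
    total = len(tokens) or 1  # avoid division by zero for an empty corpus
    return [word for word, count in counts.items() if count / total > min_importance]


def generate_phrases(keywords, popular_terms, match_degree, min_match):
    """Combine each keyword with every popular search term that matches it well enough."""
    return [
        f"{keyword} {term}"
        for keyword in keywords
        for term in popular_terms
        if match_degree(keyword, term) > min_match
    ]
```

A caller would pass the segmented corpus as `tokens` and a scoring function of its own as `match_degree`.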
After reading the corpus, the user does not need to manually enter content related to the corpus; by clicking a search phrase corresponding to the corpus, the user can quickly search for related content, which greatly improves the user experience. However, popular search terms are terms actually searched by users and are therefore unreliable to some extent, so the search phrases generated from them are also unreliable to some extent. Multi-dimensional detection therefore needs to be performed on the search phrase in order to evaluate, for example, its recommendation degree.
Before multi-dimensional detection is performed, the search phrases may be preliminarily filtered to remove phrases that do not meet the requirements. Specifically, the recommendation model may generate a plurality of search phrases based on the above method, each search phrase corresponding to one keyword of the corpus. Whether the keyword corresponding to each search phrase is a sensitive word is detected, and any search phrase whose keyword is a sensitive word is removed, that is, it is excluded from the subsequent evaluation operations.
As shown in fig. 5, the recommendation model generates six search phrases based on the content of the article, and displays the six search phrases "coach in grandchild, vs. Zhu in grandchild, Men's half-duke live, Wenyweb Men in 2019, what cctv5+ is, and a coach" in the bottom area 52 of the main display area 51.
403. Detecting the search phrase to obtain the multi-dimensional feature data of the search phrase.
To improve the accuracy of detection, multi-dimensional detection may be performed on the search phrase: each dimension of the search phrase is detected to obtain corresponding feature data, thereby obtaining the multi-dimensional feature data. The multi-dimensional detection may include relevance detection, integrity detection and availability detection; accordingly, the multi-dimensional feature data may include relevance, integrity and availability.
Specifically, detecting the search phrase in step 403 to obtain the multi-dimensional feature data of the search phrase includes: detecting the relevance between the search phrase and the corpus to obtain the relevance; detecting the integrity of the search phrase to obtain the integrity; detecting the availability of the search phrase according to preset availability conditions to obtain the availability; and adding the relevance, the integrity and the availability to the multi-dimensional feature data of the search phrase. Relevance, integrity and availability are each detected in a different way, as follows:
(1) Relevance detection
Relevance detection analyzes the relevance between the search phrase and the corpus. It may be implemented by detecting the relevance between words in the search phrase and words in the corpus. Specifically, detecting the relevance between the search phrase and the corpus to obtain the relevance includes: extracting a central sentence of the corpus; identifying a central word in the central sentence; identifying a core word in the search phrase; and detecting the relevance between the search phrase and the corpus according to the central word and the core word to obtain the relevance.
The central sentence is the most concise sentence that can summarize the core content of the corpus, that is, deleting any content from the central sentence would leave the key information incomplete. A central word in the central sentence is a proper noun (subject or object) or an action or state of an object (predicate or predicative) in the sentence. The central word can be identified by a neural network: candidate central words are extracted from the central sentence, the weight of each candidate central word is calculated, and the candidate central words whose weight is greater than a threshold (such as 0.5) are selected as the central words. The core words in the search phrase can be identified by posterior probability and information gain: candidate core words are extracted from the search phrase, the weight of each candidate core word is calculated, and the candidate core words whose weight is greater than a threshold (such as 0.5) are selected as the core words.
It should be noted that a plurality of central words may be identified in the central sentence, and a plurality of core words may be identified in the search phrase, for example the keyword of the corpus corresponding to the search phrase, the popular search term, and so on. The relevance between each central word and each core word is detected separately; each pairing of a central word with a core word yields a corresponding relevance, and the largest of the obtained relevance values is selected as the relevance between the search phrase and the corpus. For example, if 2 central words A and B are identified in the central sentence and 3 core words C, D and E are identified in the search phrase, the relevance between central word A and each of core words C, D and E is detected, and the relevance between central word B and each of core words C, D and E is detected, giving 6 relevance values; if the relevance between central word B and core word C is the largest of the 6, it is used as the relevance between the search phrase and the corpus. In addition, relevance detection in the embodiments of the present invention refers to the detection of positive correlation, that is, the greater the detected relevance, the more relevant the search phrase is to the corpus.
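The pairwise comparison in the A, B and C, D, E example can be sketched in Python as below. The `pair_relevance` callback stands in for whatever word-level relevance detection is used, and returning 0 when either word list is empty is an assumption added for the sketch.

```python
from itertools import product


def phrase_corpus_relevance(central_words, core_words, pair_relevance):
    """Score every (central word, core word) pair and keep the largest relevance."""
    scores = [pair_relevance(central, core) for central, core in product(central_words, core_words)]
    return max(scores, default=0)  # assumption: no words identified means not relevant
```

With 2 central words and 3 core words the callback is invoked 6 times, matching the example above.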
Relevance detection may include two aspects, namely topic relevance detection and entity relevance detection. Specifically, detecting the relevance between the search phrase and the corpus according to the central word and the core word to obtain the relevance includes: detecting the topic relevance between the search phrase and the corpus according to the central word and the core word to obtain the topic relevance; detecting the entity relevance between the search phrase and the corpus according to the central word and the core word to obtain the entity relevance; and determining the relevance between the search phrase and the corpus according to the topic relevance and the entity relevance.
Topic relevance detection between the search phrase and the corpus can be implemented by topic relevance detection between the central word and the core word. Topic relevance refers to how related the main content of the two is. Specifically, detecting the topic relevance between the search phrase and the corpus according to the central word and the core word to obtain the topic relevance includes: detecting, according to a pre-established knowledge graph, whether the core word and the central word satisfy at least one of the preset topic conditions, the topic conditions comprising belonging to the same concept, being associated with the same event, having an affiliation, describing the same thing, or belonging to the same subject; if not, determining the topic relevance between the search phrase and the corpus to be a first topic relevance; if so, determining the topic relevance between the search phrase and the corpus to be a second topic relevance, wherein the second topic relevance is greater than the first topic relevance. A greater topic relevance indicates that the search phrase is more relevant to the topic of the corpus.
It should be noted that the knowledge graph is intended to describe the various entities or concepts that exist in the real world and the relationships between them. It forms a huge semantic network graph in which nodes represent entities or concepts and edges represent attributes or relationships. Because the central words and the core words are all entity words, the association between a central word and a core word can be obtained from the knowledge graph, and the relevance between them can then be detected according to that association.
A plurality of topic conditions are preset according to the association required between two related words, so that the topic relevance between the search phrase and the corpus can be detected by checking whether the central word and the core word satisfy any topic condition. For example, if a topic condition is set as belonging to the same concept, it is detected whether the central word and the core word are directly associated in the knowledge graph and whether their entity attributes are the same, where an entity attribute is the type the entity refers to, such as car or event. If the central word and the core word are directly associated in the knowledge graph and their entity attributes are the same, they are determined to satisfy the topic condition; for example, if the central word is the Volkswagen CC and the core word is the Toyota SUV, both belong to cars and the topic condition is satisfied. If a topic condition is set as being associated with the same event and the central word and the core word are associated with the same event, they are determined to satisfy the topic condition; for example, if the central word is a certain person and the core word is the World Trade Center, both appear in the 9/11 event, so the topic condition is satisfied. If a topic condition is set as having an affiliation: for example, if the central word is a certain Li and the core word is a fantasy novel, and Li is the name of a character in that fantasy novel, then Li and the fantasy novel have an affiliation and the topic condition is satisfied.
If a topic condition is set as belonging to the same subject: for example, if the central word is the eighty beast and the core word is the celestial beast, and the eighty beast can evolve into the celestial beast, the two belong to different stages of the same subject and the topic condition is satisfied; whereas if the central word is the "cold" in "dog cold" and the core word is the "cold" in "human cold", the dog and the human are not the same subject, so the topic condition is not satisfied.
As long as the association between the central word and the core word satisfies any one topic condition, the central word and the core word are determined to be topically related, so the search phrase and the corpus are determined to be topically related, and the topic relevance is set to the second topic relevance, e.g. 1. If none of the topic conditions is satisfied, the central word and the core word are determined to be topically unrelated, so the search phrase and the corpus are determined to be topically unrelated, and the topic relevance is set to the first topic relevance, e.g. 0.
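A minimal Python sketch of the "any topic condition" rule follows. The toy knowledge graph (`ENTITY_TYPE`, `DIRECT_LINKS`) and its entries are hypothetical data invented for the example, and only the "same concept" condition is implemented; the other topic conditions would be further predicates of the same shape.

```python
# Toy knowledge graph (hypothetical data): entity types and direct associations.
ENTITY_TYPE = {"car A": "car", "car B": "car", "race X": "event"}
DIRECT_LINKS = {frozenset(("car A", "car B")), frozenset(("car A", "race X"))}


def same_concept(central_word, core_word):
    """Topic condition: directly associated in the graph and of the same entity type."""
    linked = frozenset((central_word, core_word)) in DIRECT_LINKS
    type_a = ENTITY_TYPE.get(central_word)
    type_b = ENTITY_TYPE.get(core_word)
    return linked and type_a is not None and type_a == type_b


def topic_relevance(central_word, core_word, conditions):
    """Return 1 (second topic relevance) if any condition holds, else 0 (first)."""
    return 1 if any(cond(central_word, core_word) for cond in conditions) else 0
```

Passing the conditions as a list keeps the rule "satisfy at least one" independent of how each condition queries the graph.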
Entity relevance detection between the search phrase and the corpus can be implemented by entity relevance detection between the core word and the central word. Entity relevance refers to how related the entities described by the two are. Specifically, detecting the entity relevance between the search phrase and the corpus according to the central word and the core word to obtain the entity relevance includes: detecting, according to a pre-established knowledge graph, whether there is ambiguity between the central word and the core word; if so, determining the entity relevance to be a first entity relevance; if not, determining the entity relevance to be a second entity relevance, wherein the second entity relevance is greater than the first entity relevance. A greater entity relevance indicates that the search phrase is more relevant to the entity of the corpus.
It should be noted that ambiguity between two words can be detected through their hypernyms in the knowledge graph: the hypernym of the central word and the hypernym of the core word are obtained from the knowledge graph and compared. If the two hypernyms are the same, there is no ambiguity between the central word and the core word; if they are different, there is ambiguity between them. For example, if the central word is the West Lake of "West Lake in Guangzhou" and the core word is the West Lake of "West Lake in Hangzhou", the hypernyms of the two West Lakes are different, so there is ambiguity.
If there is ambiguity between the central word and the core word, the central word and the core word are determined not to be entity-related, so the search phrase is determined not to be related to the entity of the corpus, and the entity relevance is set to the first entity relevance, e.g. 0. If there is no ambiguity between the central word and the core word, they are determined to be entity-related, so the search phrase is determined to be related to the entity of the corpus, and the entity relevance is set to the second entity relevance, e.g. 1.
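The hypernym comparison can be illustrated with the Python sketch below. The `hypernym_of` mapping is a hypothetical stand-in for a knowledge graph lookup, and treating a word with no known hypernym as ambiguous is an assumption of the sketch rather than something the embodiment states.

```python
def entity_relevance(central_word, core_word, hypernym_of):
    """Return 1 (second entity relevance) when the hypernyms agree, else 0 (first)."""
    central_hypernym = hypernym_of.get(central_word)
    core_hypernym = hypernym_of.get(core_word)
    # Assumption: a missing hypernym is treated the same as a mismatch (ambiguous).
    ambiguous = (
        central_hypernym is None
        or core_hypernym is None
        or central_hypernym != core_hypernym
    )
    return 0 if ambiguous else 1
```

In practice the keys would be disambiguated entity identifiers, so that two entities sharing the surface form "West Lake" map to different hypernyms.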
After the topic relevance and the entity relevance between the search phrase and the corpus are obtained, the relevance between the search phrase and the corpus can be determined according to them, specifically: if the topic relevance is the first topic relevance, determining the relevance between the search phrase and the corpus to be a first relevance; if the topic relevance is the second topic relevance and the entity relevance is the first entity relevance, determining the relevance between the search phrase and the corpus to be a second relevance; and if the topic relevance is the second topic relevance and the entity relevance is the second entity relevance, determining the relevance between the search phrase and the corpus to be a third relevance, wherein the first relevance, the second relevance and the third relevance increase in that order.
Since topic relevance carries more weight in detecting the relevance between the search phrase and the corpus, the relevance is set mainly on the basis of the topic relevance. If the topic relevance is the first topic relevance, e.g. 0, the search phrase is not related to the topic of the corpus, so the search phrase is determined not to be related to the corpus and the relevance is set to the first relevance, e.g. 0. If the topic relevance is the second topic relevance, e.g. 1, and the entity relevance is the first entity relevance, e.g. 0, the search phrase is determined to be related to the corpus but with low relevance, and the relevance is set to the second relevance, e.g. 1. If the topic relevance is the second topic relevance, e.g. 1, and the entity relevance is the second entity relevance, e.g. 1, the search phrase is determined to be related to the corpus with high relevance, and the relevance is set to the third relevance, e.g. 2. The specific relevance determination rule is shown in table 1.
Topic relevance | Entity relevance | Relevance
0 | 0 | 0
0 | 1 | 0
1 | 0 | 1
1 | 1 | 2
TABLE 1
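The rule in table 1 can be written as a small Python function. The function name and the use of the example values 0, 1 and 2 for the first, second and third levels are assumptions taken from the "e.g." values in the description.

```python
def combine_relevance(topic_relevance, entity_relevance):
    """Map (topic relevance, entity relevance) to the overall relevance per table 1."""
    if topic_relevance == 0:
        return 0  # topic unrelated: not relevant, whatever the entity relevance
    return 2 if entity_relevance == 1 else 1
```

The early return reflects that topic relevance dominates: entity relevance only distinguishes low from high relevance once the topics are related.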
(2) Integrity detection
Integrity detection refers to integrity detection of structural and semantic information of a search phrase. Integrity detection may include two aspects of detection, namely text integrity detection and semantic integrity detection. Specifically, the detecting the integrity of the search phrase to obtain the integrity includes: detecting the text integrity of the search phrase to obtain the text integrity; detecting the semantic integrity of the search phrase to obtain the semantic integrity; and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
Text integrity detection of the search phrase may be implemented by detecting the grammatical structure of the search phrase. Specifically, detecting the text integrity of the search phrase to obtain the text integrity includes: detecting whether the grammatical structure of the search phrase is complete; if the grammatical structure is incomplete, determining the text integrity of the search phrase to be a first text integrity; if the grammatical structure is complete, detecting whether the search phrase is a coordinate (parallel) phrase or a modifier-head phrase; if so, determining the text integrity of the search phrase to be a second text integrity; if not, determining the text integrity of the search phrase to be a third text integrity, wherein the first text integrity, the second text integrity and the third text integrity increase in that order. The greater the text integrity, the more complete the text of the search phrase.
It should be noted that a complete grammatical structure means both a complete structure and complete words. Whether the grammatical structure is complete can be judged by a probability model. A probability model is constructed in advance, search terms actively entered by users are collected, and the collected search terms are input into the probability model for training, so that the probability model can give the probability that a user actively searches for a given word. Word segmentation is then performed on the search phrase to obtain its last word, and this word is input into the probability model to obtain the probability that a user actively searches for it. If the probability is higher than a certain threshold, the grammatical structure of the search phrase is judged to be complete; otherwise it is judged to be incomplete. An incomplete grammatical structure directly means that the text of the search phrase is incomplete, for example search phrases such as: Shenzhen today, Shenzhen natural oil price. When the search phrase is determined to be incomplete, its text integrity is set to the first text integrity, e.g. 0.
A complete grammatical structure means that the text of the search phrase is complete, but the text integrity differs between search phrases, so the structural type of the search phrase needs to be identified: whether the search phrase is a mere concatenation of words, that is, a coordinate phrase, a modifier-head phrase, or the like. If so, the text integrity of the search phrase is low (for example: the oil consumption of pterogorgia, a friend of a king), and the text integrity is determined to be the second text integrity, e.g. 1. If the search phrase is a short text formed by combining a subject-predicate phrase, a verb-object phrase or the like with a verb-complement phrase, an adjective-complement phrase, a prepositional phrase or the like, the text integrity is high and is determined to be the third text integrity, e.g. 2.
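The text integrity decision can be sketched in Python as below. Estimating the probability of a word as its relative frequency in collected user queries is a deliberately simple stand-in for the probability model, and `is_word_concatenation` is a hypothetical classifier for coordinate or modifier-head phrases; neither is specified by the embodiment.

```python
from collections import Counter


def train_word_probs(user_queries):
    """Estimate how likely users are to search each word (relative frequency; an assumption)."""
    counts = Counter(word for query in user_queries for word in query.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def text_integrity(tokens, word_probs, min_prob, is_word_concatenation):
    """Return 0, 1 or 2 for the first, second or third text integrity."""
    # Grammar is judged incomplete unless the last word's probability exceeds the threshold.
    if not tokens or word_probs.get(tokens[-1], 0.0) <= min_prob:
        return 0
    # Complete grammar: mere word concatenations score lower than fuller short texts.
    return 1 if is_word_concatenation(tokens) else 2
```

Only the last word is scored, mirroring the description, so a phrase ending in a rarely searched word is rejected regardless of its other words.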
Semantic integrity detection may be implemented by identifying the semantic information corresponding to the search phrase. Specifically, detecting the semantic integrity of the search phrase to obtain the semantic integrity includes: identifying the semantic information corresponding to the search phrase; if no corresponding semantic information is identified, determining the semantic integrity of the search phrase to be a first semantic integrity; if the search phrase is identified as corresponding to at least two pieces of semantic information, determining the semantic integrity of the search phrase to be a second semantic integrity; and if the search phrase is identified as corresponding to exactly one piece of semantic information, determining the semantic integrity of the search phrase to be a third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity increase in that order. The higher the semantic integrity, the more complete the semantics of the search phrase.
It should be noted that, when identifying semantic information, word segmentation may be performed on the search phrase, and the meaning of each word and the associations between the words are determined, so as to determine whether the search phrase describes object content, an event, a scene, a behavior, a concept, a method, or the like. If not, the search phrase does not correspond to any semantic information: its semantics are hard to understand, or the phrase is meaningless, semantically empty, of no exploration value, and so on. That is, the semantics of the search phrase are incomplete (for example: a parent student), and the semantic integrity of the search phrase is determined to be the first semantic integrity, e.g. 0. If so, the semantics of the search phrase are complete. However, semantic integrity differs between search phrases, so it is also necessary to detect whether the search phrase has multiple possible continuations. If it does, the search phrase corresponds to multiple pieces of semantic information and its semantic integrity is low. For example, "oil consumption of tigers" may correspond to "the oil consumption of tigers is not large" or to "the difference between winter and summer oil consumption of tigers"; the semantic integrity is therefore determined to be the second semantic integrity, e.g. 1. If it does not, the search phrase corresponds to only one piece of semantic information and its semantic integrity is high (for example: a butterfly loves a full text), so the semantic integrity is determined to be the third semantic integrity, e.g. 2.
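Once the candidate interpretations of a search phrase have been identified, the three-level decision reduces to counting them, as in this Python sketch. How the interpretations are obtained is outside the sketch; the `interpretations` argument is a hypothetical list produced by some upstream semantic analysis.

```python
def semantic_integrity(interpretations):
    """Return 0, 1 or 2 depending on how many pieces of semantic information were identified."""
    count = len(interpretations)
    if count == 0:
        return 0  # no semantic information: semantics incomplete
    if count == 1:
        return 2  # exactly one reading: high semantic integrity
    return 1      # several possible readings: low semantic integrity
```

Note that the scale is not monotonic in the count: a single reading scores highest, while zero readings and many readings both score lower.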
After the text integrity and the semantic integrity of the search phrase are obtained, the integrity of the search phrase can be determined according to them, specifically: if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, determining the integrity of the search phrase to be a first integrity; if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the second semantic integrity, determining the integrity of the search phrase to be a second integrity; and if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the third semantic integrity, determining the integrity of the search phrase to be a third integrity, wherein the first integrity, the second integrity and the third integrity increase in that order. A higher integrity indicates a more complete search phrase.
Since semantic integrity carries more weight in detecting the integrity of the search phrase, the integrity of the search phrase is set mainly on the basis of the semantic integrity. If the text integrity is the first text integrity, e.g. 0, or the semantic integrity is the first semantic integrity, e.g. 0 (that is, either of the two is 0), the search phrase is determined to be incomplete and its integrity is set to the first integrity, e.g. 0. If the text integrity is the second or the third text integrity (that is, the text integrity is not 0 and the text is complete), the integrity of the search phrase follows the semantic integrity: if the semantic integrity is the second semantic integrity, e.g. 1, the integrity of the search phrase is the second integrity, e.g. 1; if the semantic integrity is the third semantic integrity, e.g. 2, the integrity of the search phrase is the third integrity, e.g. 2. The specific integrity determination rule is shown in table 2.
Text integrity | Semantic integrity | Integrity
0 | 0 | 0
0 | 1 | 0
0 | 2 | 0
1 | 0 | 0
1 | 1 | 1
1 | 2 | 2
2 | 0 | 0
2 | 1 | 1
2 | 2 | 2
TABLE 2
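The rule in table 2 can be expressed as a short Python function. The function name and the numeric levels 0, 1 and 2 are assumptions based on the example values given in the description.

```python
def combine_integrity(text_integrity, semantic_integrity):
    """Map (text integrity, semantic integrity) to the overall integrity per table 2."""
    if text_integrity == 0 or semantic_integrity == 0:
        return 0  # either dimension incomplete: the phrase is incomplete
    return semantic_integrity  # text is complete, so the result follows the semantics
```

All nine rows of the table are reproduced by these two branches.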
(3) Availability detection
Availability detection refers to detecting whether a search phrase can be used. It may be implemented by means of preset availability conditions. Specifically, detecting the availability of the search phrase to obtain the availability includes: detecting whether the search phrase satisfies the preset availability conditions, the availability conditions including not being a query-intent phrase, containing no sensitive words, containing no incomplete words, and not being a rumor; if not, determining the availability of the search phrase to be a first availability; if so, determining the availability of the search phrase to be a second availability, wherein the second availability is greater than the first availability. A greater availability indicates that the search phrase is more usable.
The availability of the search phrase is detected by means of four availability conditions. For the first condition, it is detected whether the search phrase is a query-intent phrase. Query-intent phrases may include content acquisition phrases, such as Hangzhou fire-fighting registration network; resource acquisition phrases, such as Axiure 8 activation codes and Legend single versions; and information query phrases, such as the price of Shenzhen 92 gasoline, a Laoyao Liu and a daughter Li. Query-intent phrases can be detected by obtaining the suffix of the search phrase and checking whether it is a preset query-intent term. If the suffix of the search phrase is a query-intent term, the search phrase is a query-intent phrase and is judged not to satisfy the first condition; otherwise, the search phrase is judged to satisfy the first condition.
For the second condition, it is detected whether the search phrase contains sensitive words. Sensitive words may include pornographic, violent, reactionary and politically sensitive words and other words that do not accord with core socialist values; they may also include words that cause discomfort to the user, such as nauseating, horrifying, vulgar or dirty words and words carrying strongly negative emotion. Sensitive words can be detected by segmenting the search phrase and then checking whether each word in it is a preset sensitive word. If any word in the search phrase is a sensitive word, the search phrase contains a sensitive word and is judged not to satisfy the second condition; otherwise, the search phrase is judged to satisfy the second condition.
For the third condition, it is detected whether the search phrase contains incomplete words. An incomplete word means that content in the search phrase has been lost so that a word inside it has no meaning (for example: a specialist). Incomplete words can be detected by recognizing the words in the search phrase: if the search phrase contains an unrecognizable word, the search phrase contains an incomplete word and is judged not to satisfy the third condition; otherwise, the search phrase is judged to satisfy the third condition.
For the fourth condition, it is detected whether the search phrase is a rumor. A rumor is information whose authenticity is in doubt or which contains common-sense errors, such as "eating sugarcane helps lose weight" or "smoking is good for health". Rumors can be detected by checking whether the search phrase has corresponding search results in the search engine. If the search phrase has no corresponding search result, it is determined to be a rumor and is judged not to satisfy the fourth condition; otherwise, the search phrase is judged to satisfy the fourth condition.
The search phrase should satisfy the above four conditions at the same time to determine that the search phrase is available, and the availability of the search phrase is determined to be a second availability, such as 1. And if the search phrase does not meet any one of the conditions, determining that the search phrase is unavailable, and setting the availability of the search phrase to a first availability, such as 0.
After obtaining the relevancy between the search phrase and the corpus and the integrity and the availability of the search phrase, the search phrase can be evaluated according to the relevancy, the integrity and the availability, specifically: if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree; if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating the recommendation degree of the search phrase as a third recommendation degree; and if not, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
It should be noted that if the relevance is the first relevance, or the integrity is the first integrity, or the availability is the first availability (that is, the search phrase is unrelated to the corpus, incomplete, or unavailable), the search phrase is evaluated as a non-recommended phrase and its recommendation degree is set to the first recommendation degree, such as 0. If the relevance is the third relevance, the integrity is the third integrity, and the availability is the second availability (that is, the search phrase is highly relevant to the corpus, highly complete, and available), the search phrase is evaluated as a recommended phrase and its recommendation degree is set to the third recommendation degree, such as 2. The remaining cases receive the second recommendation degree, such as 1: for example, the relevance is the second relevance, the integrity is the third integrity, and the availability is the second availability (low relevance, high integrity, available), or the relevance is the third relevance, the integrity is the second integrity, and the availability is the second availability (high relevance, low integrity, available). In these cases the search phrase is still evaluated as a recommended phrase, but with a low recommendation degree.
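The three-way rule above can be sketched as a small function. The `Level` enumeration and the function name are assumptions introduced for illustration; the numeric values 0, 1, and 2 follow the examples given in the text, and availability only takes the first and second levels.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical encoding of the first/second/third levels as 0/1/2."""
    FIRST = 0
    SECOND = 1
    THIRD = 2


def recommendation_degree(relevance, integrity, availability):
    """Combine the three dimensions into a recommendation degree."""
    # Any dimension at its lowest level makes the phrase non-recommended.
    if Level.FIRST in (relevance, integrity, availability):
        return Level.FIRST
    # All dimensions at their highest level (availability has only two levels).
    if (relevance == Level.THIRD and integrity == Level.THIRD
            and availability == Level.SECOND):
        return Level.THIRD
    # Every remaining combination is recommended with a low degree.
    return Level.SECOND


best = recommendation_degree(Level.THIRD, Level.THIRD, Level.SECOND)
```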
In addition, in the evaluation process of the search phrase, the embodiment of the invention can also select the high-quality search phrase to realize the iterative optimization of various algorithm models. Specifically, the method further comprises: obtaining an evaluation result of the search phrase; and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into the training sample.
The evaluation result of the search phrase may include the recommendation degree of the search phrase, and the preset evaluation condition may be that the recommendation degree is greater than a preset recommendation degree threshold. The recommendation degree of the search phrase is compared with the preset recommendation degree threshold, and if it is greater than the threshold, the search phrase and its recommendation degree are added to a training sample of a recommendation model.
For example, the recommendation degree may be divided into three levels, that is, a first recommendation degree (e.g., 0), a second recommendation degree (e.g., 1), and a third recommendation degree (e.g., 2), and a preset recommendation degree threshold is 0.5, then the search phrase evaluated as the second recommendation degree or the third recommendation degree and the recommendation degree thereof may be added to a training sample of the recommendation model, so as to implement iterative optimization of the recommendation model, and enable the recommendation model to generate a search phrase more in line with the expectation of the user.
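The threshold-based sample selection in this example can be sketched as follows. The function name and the list-of-pairs data layout are assumptions for illustration; the threshold 0.5 and the degrees 0, 1, and 2 come from the example above.

```python
def select_training_samples(evaluated, threshold=0.5):
    """Keep (phrase, degree) pairs whose degree exceeds the threshold.

    `evaluated` is assumed to be an iterable of (search_phrase,
    recommendation_degree) pairs.
    """
    return [(phrase, degree) for phrase, degree in evaluated
            if degree > threshold]


# Hypothetical evaluated phrases with degrees 0, 1, and 2.
evaluated = [("phrase a", 0), ("phrase b", 1), ("phrase c", 2)]
samples = select_training_samples(evaluated)
```

With a threshold of 0.5, only the phrases evaluated at the second or third recommendation degree are kept, which matches the example in the text.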
Similarly, the embodiment of the invention can also select the search phrase with the correlation degree larger than the correlation degree threshold value according to the correlation degree of the search phrase and the corpus, and add the search phrase into the training sample of the correlation model to realize the iterative optimization of the correlation model; selecting a search phrase with the integrity degree larger than an integrity degree threshold value according to the integrity degree of the search phrase, and adding the search phrase into a training sample of the integrity model to realize iterative optimization of the integrity model; and selecting the search phrase with the availability degree larger than the availability degree threshold value according to the availability degree of the search phrase, and adding the search phrase into a training sample of the availability model to realize the iterative optimization of the availability model.
More specifically, according to the topic relevance between the search phrase and the corpus, the search phrase with the topic relevance larger than the topic relevance threshold value is selected and added into a training sample of the topic relevance model, so that iterative optimization of the topic relevance model is realized; selecting a search phrase with the entity correlation degree larger than the entity correlation degree threshold value according to the entity correlation degree of the search phrase and the corpus, and adding the search phrase into a training sample of the entity correlation model to realize iterative optimization of the entity correlation model; selecting a search phrase with the text integrity degree larger than a text integrity degree threshold value according to the text integrity degree of the search phrase, and adding the search phrase into a training sample of a text integrity model to realize iterative optimization of the text integrity model; and selecting the search phrase with the semantic integrity degree larger than the semantic integrity degree threshold value according to the semantic integrity degree of the search phrase, and adding the search phrase into a training sample of the semantic integrity model to realize the iterative optimization of the semantic integrity model. In addition, according to the identification result of the sensitive word, the embodiment of the invention can select the corresponding search phrase and add the corresponding search phrase into the training sample of the sensitive word identification model to realize the iterative optimization of the sensitive word identification model; and selecting a corresponding search phrase according to the rumor recognition result, and adding the search phrase into a training sample of the rumor recognition model to realize the iterative optimization of the rumor recognition model.
The evaluation method of the search phrase in the embodiment of the present invention is described below with reference to a specific application scenario.
Referring to fig. 6, a flowchart of another embodiment of a method for evaluating a search phrase according to an embodiment of the present invention is shown, where the method for evaluating a search phrase is applied to a server, and the method for evaluating a search phrase includes:
601. Acquire article content.
For example, the article content in figure 5 entitled "central view live Australian match 14 day play forenotice, allow a unique foreign war, and hopefully capture a grand" is obtained.
602. Acquire a search phrase corresponding to the article content.
The search phrase is generated based on keywords in the article content. When the search phrase is obtained, whether the keyword corresponding to the search phrase is a sensitive word can be detected, and if the keyword is a sensitive word, the search phrase is removed. For example, the search phrase "coach of a certain Sun" recommended at the bottom of the article content in fig. 5 is obtained; the keyword corresponding to the search phrase is "a certain Sun", which is not a sensitive word, so the search phrase is retained and the subsequent evaluation continues.
603. Topic relevance of article content to the search phrase is detected and scored.
If the topics are related, the topic relevance score is 1; if the topics are not related, the topic relevance score is 0. For example, if the core word "a certain Sun" in the search phrase "coach of a certain Sun" belongs to the same topic as the central word "a certain Sun" in the article content, the search phrase is topically related to the article content and the topic relevance score is 1.
604. Entity relevance of the article content to the search phrase is detected and scored.
If the entities are related, the entity relevance score is 1; if the entities are not related, the entity relevance score is 0. For example, if there is no ambiguity between the core word "a certain Sun" in the search phrase "coach of a certain Sun" and the central word "a certain Sun" in the article content, the search phrase is entity-related to the article content and the entity relevance score is 1.
605. Determine a relevance score of the article content and the search phrase according to the topic relevance score and the entity relevance score.
If the topic relevance score is 0, the overall relevance score is 0; if the topic relevance score is 1 and the entity relevance score is 0, the overall relevance score is 1; if the topic relevance score is 1 and the entity relevance score is 1, the overall relevance score is 2. For example, the relevance score of the article content in fig. 5 and the search phrase "coach of a certain Sun" is 2.
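The combination rule of step 605 can be written as a short function. This is a minimal sketch; the function name is an assumption, while the 0/1 inputs and the 0/1/2 output follow the scoring described above.

```python
def relevance_score(topic_score, entity_score):
    """Combine binary topic and entity scores into a 0/1/2 relevance score."""
    if topic_score == 0:
        # An unrelated topic makes the phrase irrelevant regardless of entity.
        return 0
    # Topic is related: entity ambiguity decides between low and high.
    return 1 if entity_score == 0 else 2


example = relevance_score(topic_score=1, entity_score=1)
```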
606. The text integrity of the search phrase is detected and scored.
If the text is incomplete, the text integrity score is 0; if the text is complete but the text integrity is low, the text integrity score is 1; if the text is complete and the text integrity is high, the text integrity score is 2. For example, the search phrase "coach of a certain Sun" is complete in grammatical structure but is a bias (modifier-head) phrase, which indicates that the text of the search phrase is complete but its text integrity is low, so the text integrity score is 1.
607. Semantic integrity of the search phrase is detected and scored.
If the semantics are incomplete, the semantic integrity score is 0; if the semantics are complete but the semantic integrity is low, the semantic integrity score is 1; if the semantics are complete and the semantic integrity is high, the semantic integrity score is 2. For example, the search phrase "coach of a certain Sun" may correspond to several pieces of semantic information, which indicates that the search phrase is semantically complete but its semantic integrity is low, so the semantic integrity score is 1.
608. Determine the integrity score of the search phrase according to the text integrity score and the semantic integrity score.
If either the text integrity score or the semantic integrity score is 0, the overall integrity score of the search phrase is 0; if neither is 0, the overall integrity score equals the semantic integrity score. For example, if the text integrity score and the semantic integrity score of the search phrase "coach of a certain Sun" are both 1, the integrity score of the search phrase is 1.
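The rule of step 608 can be sketched as follows. The function name is an assumption; the rule itself (zero if either score is zero, otherwise the semantic score) is the one stated above.

```python
def integrity_score(text_score, semantic_score):
    """Combine text and semantic integrity scores (each 0, 1, or 2)."""
    if text_score == 0 or semantic_score == 0:
        return 0
    # When both are non-zero, the semantic score determines the result.
    return semantic_score


example = integrity_score(text_score=1, semantic_score=1)
```

Note that under this rule a text score of 1 with a semantic score of 2 still yields 2, since only the semantic score is used once both are non-zero.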
609. Detect whether the search phrase is a query intention phrase, and score the query intention according to the detection result.
If the search phrase is a query intention phrase, the query intention score is 0; if the search phrase is not a query intention phrase, the query intention score is 1. For example, if the search phrase "coach of a certain Sun" is an information query phrase, that is, a query intention phrase, the query intention score of the search phrase is 0.
610. Detect whether the search phrase contains a sensitive word, and score the sensitive word according to the detection result.
If the search phrase contains a sensitive word, the sensitive word score is 0; if the search phrase contains no sensitive word, the sensitive word score is 1. For example, if the search phrase "coach of a certain Sun" contains no sensitive word, the sensitive word score of the search phrase is 1.
611. Detect whether word incompleteness exists in the search phrase, and score the word incompleteness according to the detection result.
If word incompleteness exists in the search phrase, the word incompleteness score is 0; if no word incompleteness exists, the word incompleteness score is 1. For example, if there is no word incompleteness in the search phrase "coach of a certain Sun", the word incompleteness score of the search phrase is 1.
612. Detect whether the search phrase is a rumor, and score the rumor according to the detection result.
If the search phrase is a rumor, the rumor score is 0; if the search phrase is not a rumor, the rumor score is 1. For example, if the search phrase "coach of a certain Sun" is not a rumor, the rumor score of the search phrase is 1.
613. Determine an availability score of the search phrase based on the query intention score, the sensitive word score, the word incompleteness score, and the rumor score.
If any one of the query intention score, the sensitive word score, the word incompleteness score, and the rumor score is 0, the availability score of the search phrase is 0; if the query intention score, the sensitive word score, the word incompleteness score, and the rumor score are all 1, the availability score of the search phrase is 1. For example, if the query intention score of the search phrase "coach of a certain Sun" is 0, the availability score of the search phrase is 0.
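The availability rule of step 613 amounts to a logical AND over the four binary scores. A minimal sketch, with a function name and parameter names that are assumptions for illustration:

```python
def availability_score(query_intent, sensitive_word, word_incompleteness, rumor):
    """Return 1 only if all four binary condition scores are 1, else 0."""
    scores = (query_intent, sensitive_word, word_incompleteness, rumor)
    return int(all(score == 1 for score in scores))


# Scores from the worked example: only the query intention score is 0.
example = availability_score(query_intent=0, sensitive_word=1,
                             word_incompleteness=1, rumor=1)
```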
614. The overall score of the search phrase is evaluated based on the relevance score, the completeness score, and the usability score.
If any one of the relevance score, the integrity score, and the availability score is 0, the overall score of the search phrase is 0; if none of them is 0, the overall score is the lower of the relevance score and the integrity score. For example, if the search phrase "coach of a certain Sun" has a relevance score of 1, an integrity score of 1, and an availability score of 0, the overall score of the search phrase is 0.
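The overall scoring rule of step 614 can be sketched in a few lines. The function name is an assumption; the rule (zero if any score is zero, otherwise the lower of relevance and integrity) is taken from the text.

```python
def overall_score(relevance, integrity, availability):
    """Overall score: 0 if any dimension is 0, else min(relevance, integrity)."""
    if 0 in (relevance, integrity, availability):
        return 0
    return min(relevance, integrity)


# Worked example from the text: relevance 1, integrity 1, availability 0.
example = overall_score(relevance=1, integrity=1, availability=0)
```

With relevance and integrity in {0, 1, 2} and availability in {0, 1}, this produces 2 only when both relevance and integrity are 2 and the phrase is available, which agrees with the first, second, and third recommendation degrees described earlier.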
After the overall score is obtained, an evaluation record table can be created that records, for each article, the URL (Uniform Resource Locator) address, article title, article content, search phrase, source word, topic relevance score, entity relevance score, text integrity score, semantic integrity score, query intention score, sensitive word score, word incompleteness score, rumor score, relevance score, integrity score, availability score, overall score, and the like, so as to provide the necessary classified training samples for the subsequent iterative optimization of the various algorithm models. The evaluation record table may be as shown in table 3.
TABLE 3: evaluation record table (presented as an image in the original publication)
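One row of such an evaluation record table could be represented as below. This is a sketch under assumptions: the class name, field names, URL, and placeholder strings are hypothetical, and the scores are those of the worked example in steps 603 to 614 as described in the text.

```python
from dataclasses import dataclass, asdict


@dataclass
class EvaluationRecord:
    """Hypothetical layout of one row of the evaluation record table."""
    url: str
    article_title: str
    article_content: str
    search_phrase: str
    source_word: str
    topic_relevance: int
    entity_relevance: int
    text_integrity: int
    semantic_integrity: int
    query_intent: int
    sensitive_word: int
    word_incompleteness: int
    rumor: int
    relevance: int
    integrity: int
    availability: int
    overall: int


record = EvaluationRecord(
    url="https://example.com/article",  # placeholder URL, not from the source
    article_title="(article title)",
    article_content="(article content)",
    search_phrase="(search phrase)",
    source_word="(source word)",
    topic_relevance=1, entity_relevance=1,
    text_integrity=1, semantic_integrity=1,
    query_intent=0, sensitive_word=1, word_incompleteness=1, rumor=1,
    relevance=2, integrity=1, availability=0, overall=0,
)
row = asdict(record)  # plain dict, convenient for writing to a table or file
```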
In summary, the embodiment of the present invention obtains a corpus, obtains the search phrase corresponding to the corpus, detects the search phrase to obtain its multi-dimensional feature data, and evaluates the search phrase intelligently along multiple dimensions according to the multi-dimensional feature data, thereby effectively improving the evaluation efficiency of the search phrase. In addition, by performing systematic, process-based quality evaluation on multiple dimensions of the search phrase, the embodiment of the invention uncovers the common low-quality problems of search phrases and effectively promotes the classified optimization of the various algorithm models.
In order to better implement the evaluation method of the search phrase provided by the embodiment of the invention, the embodiment of the invention also provides a device based on the evaluation method of the search phrase. Wherein the meaning of nouns is the same as in the above evaluation method of search phrases, and the details of implementation can refer to the description in the method embodiments.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an apparatus for evaluating a search phrase according to an embodiment of the present invention, where the apparatus for evaluating a search phrase may include:
a corpus obtaining module 701 configured to obtain a corpus;
a search phrase obtaining module 702, configured to obtain a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
a detection module 703, configured to detect the search phrase to obtain multidimensional feature data of the search phrase; and
an evaluation module 704 configured to evaluate the search phrase according to the multi-dimensional feature data.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the correlation between the search phrase and the corpus to obtain the degree of correlation;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to a preset availability condition to obtain the availability;
adding the relevance, the completeness, and the availability to the multi-dimensional feature data of the search phrase.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the topic relevance of the search phrase and the corpus to obtain topic relevance;
detecting the entity relevance of the search phrase and the corpus to obtain entity relevance;
and determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
identifying a central word of the corpus;
identifying core words in the search phrase, the core words including the keywords or the popular search words;
detecting whether the central word and the core word meet at least one theme condition in preset theme conditions or not according to a pre-established knowledge graph; the subject conditions comprise belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same subject;
if not, determining the topic relevance of the search phrase and the corpus as a first topic relevance;
if yes, determining the topic relevance of the search phrase and the corpus as a second topic relevance, wherein the second topic relevance is greater than the first topic relevance.
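The topic-relevance check above can be sketched with a toy knowledge graph. Everything here is an assumption for illustration: the dictionary-based graph layout, the attribute keys, and the sample entries are hypothetical, and modelling "having an affiliation" as a shared affiliation value is a simplification of whatever the pre-established knowledge graph actually encodes.

```python
# Hypothetical attribute keys, one per preset topic condition.
TOPIC_CONDITIONS = ("concept", "event", "affiliation", "topic")


def topic_relevance(central_word, core_word, graph):
    """Return 1 if the two words share a value under any topic condition, else 0.

    `graph` maps a word to a dict of condition name -> set of values.
    """
    central = graph.get(central_word, {})
    core = graph.get(core_word, {})
    for condition in TOPIC_CONDITIONS:
        if central.get(condition, set()) & core.get(condition, set()):
            return 1  # second (higher) topic relevance
    return 0  # first (lower) topic relevance


# Toy graph with made-up entries.
graph = {
    "player": {"topic": {"tennis"}, "event": {"open tournament"}},
    "coach": {"topic": {"tennis"}},
    "recipe": {"topic": {"cooking"}},
}
related = topic_relevance("player", "coach", graph)
unrelated = topic_relevance("player", "recipe", graph)
```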
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if so, determining the entity relevance of the search phrase and the corpus as a first entity relevance;
if not, determining that the entity relevance of the search phrase and the corpus is a second entity relevance, wherein the second entity relevance is greater than the first entity relevance.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
if the topic relevance is a first topic relevance, determining the relevance of the search phrase and the corpus as a first relevance;
if the topic relevance is a second topic relevance and the entity relevance is a first entity relevance, determining that the relevance of the search phrase and the corpus is a second relevance;
and if the topic relevance is a second topic relevance and the entity relevance is a second entity relevance, determining that the relevance of the search phrase and the corpus is a third relevance, and sequentially increasing the first relevance, the second relevance and the third relevance.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
In some embodiments of the present invention, the detection module 703 is specifically configured to:
detecting whether a grammatical structure of the search phrase is complete;
if the grammar structure is incomplete, determining the text integrity of the search phrase as a first text integrity;
if the grammar structure is complete, detecting whether the search phrase is a parallel (coordinate) phrase or a bias (modifier-head) phrase;
if so, determining the text integrity of the search phrase as a second text integrity;
if not, determining that the text integrity of the search phrase is a third text integrity, wherein the first text integrity, the second text integrity and the third text integrity increase sequentially.
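The text-integrity decision above can be sketched as follows, assuming that an upstream grammar analysis has already produced a completeness flag and a phrase-type label. The function name and the labels "coordinate" and "modifier_head" (for parallel and bias phrases) are hypothetical, as is the "subject_predicate" label used in the usage line.

```python
# Hypothetical labels for parallel (coordinate) and bias (modifier-head) phrases.
LOW_INTEGRITY_TYPES = {"coordinate", "modifier_head"}


def text_integrity(grammar_complete, phrase_type):
    """Return 0, 1, or 2 for the first, second, or third text integrity."""
    if not grammar_complete:
        return 0  # incomplete grammatical structure
    if phrase_type in LOW_INTEGRITY_TYPES:
        return 1  # complete structure, but low text integrity
    return 2  # complete structure and high text integrity


low = text_integrity(True, "modifier_head")
high = text_integrity(True, "subject_predicate")
```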
In some embodiments of the present invention, the detection module 703 is specifically configured to:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified to correspond to at least two pieces of semantic information, determining the semantic integrity of the search phrase as a second semantic integrity;
and if the search phrase is identified to correspond to exactly one piece of semantic information, determining the semantic integrity of the search phrase to be a third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity increase sequentially.
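The semantic-integrity decision above depends only on how many pieces of semantic information are identified for the search phrase. A minimal sketch, assuming a hypothetical upstream recognizer that returns the identified meanings as a list:

```python
def semantic_integrity(semantic_info):
    """Return 0, 1, or 2 based on the number of identified meanings.

    `semantic_info` is assumed to be the list of meanings produced by some
    semantic recognizer, which is not specified here.
    """
    if len(semantic_info) == 0:
        return 0  # nothing identified: first semantic integrity
    if len(semantic_info) >= 2:
        return 1  # ambiguous: second semantic integrity
    return 2  # exactly one meaning: third semantic integrity


ambiguous = semantic_integrity(["meaning a", "meaning b"])
```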
In some embodiments of the present invention, the detection module 703 is specifically configured to:
if the text integrity is a first text integrity or the semantic integrity is a first semantic integrity, determining the integrity of the search phrase as the first integrity;
if the text integrity is a second text integrity or a third text integrity and the semantic integrity is a second semantic integrity, determining the integrity of the search phrase to be the second integrity;
and if the text integrity is a second text integrity or a third text integrity and the semantic integrity is a third semantic integrity, determining the integrity of the search phrase to be a third integrity, and sequentially increasing the first integrity, the second integrity and the third integrity.
In some embodiments of the present invention, the correlation includes a first correlation, a second correlation and a third correlation which are sequentially increased, the integrity includes a first integrity, a second integrity and a third integrity which are sequentially increased, and the availability includes a first availability and a second availability which are sequentially increased; the detection module 703 is specifically configured to:
if the relevance is first relevance, or the integrity is first integrity, or the availability is first availability, evaluating the recommendation degree of the search phrase as first recommendation degree;
if the relevance is a third relevance, the integrity is a third integrity, and the availability is a second availability, evaluating the recommendation degree of the search phrase as a third recommendation degree;
and if not, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree are sequentially increased.
In some embodiments of the present invention, the apparatus further includes a sample adding module, and the sample adding module is specifically configured to:
obtaining an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into the training sample.
In some embodiments of the present invention, the apparatus further includes a storage module, where the storage module is specifically configured to:
obtaining an evaluation result of the search phrase;
and storing the evaluation result in a block form in a block chain.
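Storing an evaluation result in block form can be illustrated with a hash-linked record. This is only a sketch under assumptions: the text does not specify a block layout or a blockchain platform, so the field names, the use of SHA-256, and the JSON serialization below are illustrative choices, and no consensus or networking is modelled.

```python
import hashlib
import json


def _digest(evaluation, previous_hash):
    """Hash the evaluation together with the previous block's hash."""
    payload = json.dumps(
        {"evaluation": evaluation, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(evaluation, previous_hash):
    """Wrap an evaluation result in a block linked to its predecessor."""
    return {
        "evaluation": evaluation,
        "previous_hash": previous_hash,
        "hash": _digest(evaluation, previous_hash),
    }


def verify_block(block):
    """Check that the stored hash still matches the block contents."""
    return block["hash"] == _digest(block["evaluation"], block["previous_hash"])


# Two hypothetical evaluation results chained together.
genesis = make_block({"search_phrase": "phrase a", "overall": 0}, "0" * 64)
second = make_block({"search_phrase": "phrase b", "overall": 2}, genesis["hash"])
```

Because each block's hash covers the previous hash, altering a stored evaluation result would invalidate that block and every block after it.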
In specific implementation, the above modules may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and specific implementation of the above modules may refer to the foregoing method embodiments, which are not described herein again.
According to the embodiment of the invention, the corpus is obtained, the search phrase corresponding to the corpus is obtained, the search phrase is detected to obtain its multi-dimensional feature data, and the search phrase is evaluated along multiple dimensions according to the multi-dimensional feature data, so that the evaluation efficiency and the evaluation accuracy of the search phrase are effectively improved. In addition, by performing systematic, process-based quality evaluation on multiple dimensions of the search phrase, the embodiment of the invention uncovers the common low-quality problems of search phrases and effectively promotes the classified optimization of the various algorithm models.
An embodiment of the present invention further provides a server, as shown in fig. 8, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:
the server may include components such as a processor 801 of one or more processing cores, memory 802 of one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will appreciate that the server architecture shown in FIG. 8 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 801 is the control center of the server, connects the various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 802 and calling data stored in the memory 802, thereby performing overall monitoring of the server. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into the processor 801.
The memory 802 may be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 802 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The server further comprises a power supply 803 for supplying power to each component. Preferably, the power supply 803 can be logically connected with the processor 801 through a power management system, so that functions such as charging, discharging, and power consumption management are managed through the power management system. The power supply 803 may also include any component such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The server may further include an input unit 804, and the input unit 804 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 801 in the server loads the executable file corresponding to the process of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802, thereby implementing various functions as follows:
obtaining a corpus; acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium having stored therein a plurality of instructions that can be loaded by a processor to perform the steps of any of the methods for evaluating a search phrase provided by embodiments of the present invention. For example, the instructions may perform the steps of:
obtaining a corpus; acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value; detecting the search phrase to obtain multi-dimensional feature data of the search phrase; evaluating the search phrase according to the multi-dimensional feature data.
For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Since the instructions stored in the storage medium can execute the steps of any of the methods for evaluating a search phrase provided by the embodiments of the present invention, they can achieve the beneficial effects of any of those methods. These effects are detailed in the foregoing embodiments and are not repeated here.
The method, apparatus, server, and storage medium for evaluating a search phrase provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (15)

1. A method for evaluating a search phrase, comprising:
obtaining a corpus;
acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
detecting the search phrase to obtain multi-dimensional feature data of the search phrase;
evaluating the search phrase according to the multi-dimensional feature data.
2. The method for evaluating a search phrase according to claim 1, wherein the detecting the search phrase to obtain the multi-dimensional feature data of the search phrase specifically comprises:
detecting the relevance between the search phrase and the corpus to obtain the relevance;
detecting the integrity of the search phrase to obtain the integrity;
detecting the availability of the search phrase according to a preset availability condition to obtain the availability;
adding the relevance, the integrity, and the availability to the multi-dimensional feature data of the search phrase.
3. The method for evaluating a search phrase according to claim 2, wherein the detecting the relevance between the search phrase and the corpus to obtain the relevance specifically comprises:
detecting the topic relevance of the search phrase and the corpus to obtain topic relevance;
detecting the entity relevance of the search phrase and the corpus to obtain entity relevance;
and determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance.
4. The method according to claim 3, wherein the detecting the topic relevance between the search phrase and the corpus to obtain the topic relevance specifically comprises:
identifying a central word of the corpus;
identifying core words in the search phrase, the core words including the keywords or the popular search words;
detecting, according to a pre-established knowledge graph, whether the central word and the core word meet at least one of preset topic conditions; the topic conditions comprise belonging to the same concept, being associated with the same event, having an affiliation, or belonging to the same subject;
if not, determining the topic relevance of the search phrase and the corpus as a first topic relevance;
if yes, determining the topic relevance of the search phrase and the corpus as a second topic relevance, wherein the second topic relevance is greater than the first topic relevance.
5. The method according to claim 4, wherein the detecting the entity relevance between the search phrase and the corpus to obtain the entity relevance specifically comprises:
detecting whether ambiguity exists between the central word and the core word according to a pre-established knowledge graph;
if so, determining the entity relevance of the search phrase and the corpus as a first entity relevance;
if not, determining that the entity relevance of the search phrase and the corpus is a second entity relevance, wherein the second entity relevance is greater than the first entity relevance.
6. The method according to claim 5, wherein the determining the relevance of the search phrase and the corpus according to the topic relevance and the entity relevance specifically comprises:
if the topic relevance is the first topic relevance, determining the relevance of the search phrase and the corpus as a first relevance;
if the topic relevance is the second topic relevance and the entity relevance is the first entity relevance, determining the relevance of the search phrase and the corpus as a second relevance;
and if the topic relevance is the second topic relevance and the entity relevance is the second entity relevance, determining the relevance of the search phrase and the corpus as a third relevance, wherein the first relevance, the second relevance and the third relevance increase sequentially.
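The relevance grading of claims 4 to 6 can be sketched in Python as follows. This is only an illustrative reading: the integer levels 1, 2 and 3 stand in for the "first", "second" and "third" grades (the claims require only that they increase), and the boolean inputs are assumed to come from a knowledge-graph lookup that is not shown.

```python
def topic_relevance(meets_topic_condition: bool) -> int:
    """Claim 4: second (higher) topic relevance if any topic condition is met."""
    return 2 if meets_topic_condition else 1


def entity_relevance(is_ambiguous: bool) -> int:
    """Claim 5: ambiguity between central word and core word lowers the grade."""
    return 1 if is_ambiguous else 2


def relevance(topic: int, entity: int) -> int:
    """Claim 6: combine topic and entity relevance into one of three grades."""
    if topic == 1:
        return 1
    return 2 if entity == 1 else 3


# Topic condition met and no ambiguity gives the highest grade.
print(relevance(topic_relevance(True), entity_relevance(False)))  # 3
```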
7. The method for evaluating a search phrase according to claim 2, wherein the detecting the integrity of the search phrase to obtain the integrity specifically comprises:
detecting the text integrity of the search phrase to obtain the text integrity;
detecting the semantic integrity of the search phrase to obtain the semantic integrity;
and determining the integrity of the search phrase according to the text integrity and the semantic integrity.
8. The method for evaluating a search phrase according to claim 7, wherein the detecting the text integrity of the search phrase to obtain the text integrity specifically comprises:
detecting whether a grammatical structure of the search phrase is complete;
if the grammatical structure is incomplete, determining the text integrity of the search phrase as a first text integrity;
if the grammatical structure is complete, detecting whether the search phrase is a parallel phrase or a modifier-head phrase;
if so, determining the text integrity of the search phrase as a second text integrity;
if not, determining the text integrity of the search phrase as a third text integrity, wherein the first text integrity, the second text integrity and the third text integrity increase sequentially.
9. The method for evaluating search phrases according to claim 8, wherein the detecting semantic integrity of the search phrase to obtain semantic integrity specifically includes:
identifying semantic information corresponding to the search phrase;
if the corresponding semantic information is not identified, determining the semantic integrity of the search phrase as a first semantic integrity;
if the search phrase is identified as corresponding to at least two pieces of semantic information, determining the semantic integrity of the search phrase as a second semantic integrity;
and if the search phrase is identified as corresponding to one piece of semantic information, determining the semantic integrity of the search phrase as a third semantic integrity, wherein the first semantic integrity, the second semantic integrity and the third semantic integrity increase sequentially.
10. The method for evaluating a search phrase according to claim 9, wherein the determining the integrity of the search phrase according to the text integrity and the semantic integrity specifically comprises:
if the text integrity is the first text integrity or the semantic integrity is the first semantic integrity, determining the integrity of the search phrase as a first integrity;
if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the second semantic integrity, determining the integrity of the search phrase as a second integrity;
and if the text integrity is the second text integrity or the third text integrity and the semantic integrity is the third semantic integrity, determining the integrity of the search phrase as a third integrity, wherein the first integrity, the second integrity and the third integrity increase sequentially.
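The integrity grading of claims 8 to 10 can be sketched in the same way. The integer levels 1 to 3 are an assumed encoding of the "first" to "third" grades, and the inputs (whether the grammatical structure is complete, whether the phrase is a parallel or modifier-head phrase, and how many meanings were identified) are assumed to be supplied by detectors that are not shown.

```python
def text_integrity(grammar_complete: bool, is_parallel_or_modifier_head: bool) -> int:
    """Claim 8: grade the textual completeness of the search phrase."""
    if not grammar_complete:
        return 1
    return 2 if is_parallel_or_modifier_head else 3


def semantic_integrity(meaning_count: int) -> int:
    """Claim 9: no meaning -> 1, at least two meanings -> 2, exactly one -> 3."""
    if meaning_count == 0:
        return 1
    return 2 if meaning_count >= 2 else 3


def integrity(text: int, semantic: int) -> int:
    """Claim 10: the lowest grade on either axis dominates; otherwise the
    semantic grade decides between the second and third integrity."""
    if text == 1 or semantic == 1:
        return 1
    return 2 if semantic == 2 else 3


# A grammatically complete phrase with exactly one meaning.
print(integrity(text_integrity(True, False), semantic_integrity(1)))  # 3
```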
11. The method for evaluating a search phrase according to claim 2, wherein the relevance includes a first relevance, a second relevance, and a third relevance that increase sequentially, the integrity includes a first integrity, a second integrity, and a third integrity that increase sequentially, and the availability includes a first availability and a second availability that increase sequentially;
the evaluating the search phrase according to the multi-dimensional feature data specifically includes:
if the relevance is the first relevance, or the integrity is the first integrity, or the availability is the first availability, evaluating the recommendation degree of the search phrase as a first recommendation degree;
if the relevance is the third relevance, the integrity is the third integrity, and the availability is the second availability, evaluating the recommendation degree of the search phrase as a third recommendation degree;
and otherwise, evaluating the recommendation degree of the search phrase as a second recommendation degree, wherein the first recommendation degree, the second recommendation degree and the third recommendation degree increase sequentially.
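The decision rule of claim 11 reduces to a short function. The sketch below assumes, for illustration only, that each grade is encoded as an integer (relevance and integrity from 1 to 3, availability 1 or 2); the claim itself fixes only the ordering of the grades.

```python
def recommendation(relevance: int, integrity: int, availability: int) -> int:
    """Claim 11: map the three feature grades to a recommendation degree."""
    # Any lowest grade yields the first (lowest) recommendation degree.
    if relevance == 1 or integrity == 1 or availability == 1:
        return 1
    # All highest grades yield the third (highest) recommendation degree.
    if relevance == 3 and integrity == 3 and availability == 2:
        return 3
    # Every remaining combination yields the second recommendation degree.
    return 2


print(recommendation(3, 3, 2))  # 3
print(recommendation(2, 3, 2))  # 2
print(recommendation(3, 3, 1))  # 1
```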
12. The method for evaluating a search phrase as recited in claim 1, further comprising:
obtaining an evaluation result of the search phrase;
and if the evaluation result meets the preset evaluation condition, adding the search phrase and the evaluation result thereof into the training sample.
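Claim 12 can be read as a filter over evaluated search phrases. In the sketch below, the function name `collect_training_samples`, the sample data, and the example condition (keeping only the lowest and highest recommendation degrees) are hypothetical; the claim does not state what the preset evaluation condition is.

```python
def collect_training_samples(evaluated, meets_condition):
    """Keep (search phrase, evaluation result) pairs whose result meets the condition."""
    return [
        (phrase, result)
        for phrase, result in evaluated
        if meets_condition(result)
    ]


# Illustrative data: evaluation results encoded as recommendation degrees 1 to 3.
evaluated = [
    ("final score world cup", 3),
    ("score of", 1),
    ("world cup team", 2),
]

# Hypothetical preset condition: keep only clear-cut results (degree 1 or 3).
samples = collect_training_samples(evaluated, lambda result: result in (1, 3))
print(samples)  # [('final score world cup', 3), ('score of', 1)]
```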
13. An apparatus for evaluating a search phrase, comprising:
a corpus obtaining module, configured to obtain a corpus;
a search phrase obtaining module, configured to obtain a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
a detection module, configured to detect the search phrase to obtain multi-dimensional feature data of the search phrase; and
an evaluation module, configured to evaluate the search phrase according to the multi-dimensional feature data.
14. A server comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
obtaining a corpus;
acquiring a search phrase corresponding to the corpus; the search phrase is generated by combining the keywords in the corpus with matched popular search words, and the popular search words are entity words with the search quantity larger than a preset threshold value;
detecting the search phrase to obtain multi-dimensional feature data of the search phrase;
evaluating the search phrase according to the multi-dimensional feature data.
15. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method for evaluating a search phrase of any of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911048275.1A CN112749246B (en) 2019-10-30 2019-10-30 Evaluation method and device of search phrase, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911048275.1A CN112749246B (en) 2019-10-30 2019-10-30 Evaluation method and device of search phrase, server and storage medium

Publications (2)

Publication Number Publication Date
CN112749246A true CN112749246A (en) 2021-05-04
CN112749246B CN112749246B (en) 2023-11-28

Family

ID=75640999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911048275.1A Active CN112749246B (en) 2019-10-30 2019-10-30 Evaluation method and device of search phrase, server and storage medium

Country Status (1)

Country Link
CN (1) CN112749246B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160004766A1 (en) * 2006-10-10 2016-01-07 Abbyy Infopoisk Llc Search technology using synonims and paraphrasing
CN105138511A (en) * 2015-08-10 2015-12-09 北京思特奇信息技术股份有限公司 Method and system for semantically analyzing search keyword
CN109460499A (en) * 2018-10-16 2019-03-12 青岛聚看云科技有限公司 Target search word generation method and device, electronic equipment, storage medium
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map
CN110377817A (en) * 2019-06-13 2019-10-25 百度在线网络技术(北京)有限公司 Search entry method for digging and device and its application in multimedia resource

Also Published As

Publication number Publication date
CN112749246B (en) 2023-11-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 40048359; Country of ref document: HK

GR01 Patent grant