CN113220838A - Method and device for determining key information, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113220838A
Authority
CN
China
Prior art keywords
phrase
target
speech
determining
service
Prior art date
Legal status
Pending
Application number
CN202110520029.2A
Other languages
Chinese (zh)
Inventor
冯君陶
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110520029.2A
Publication of CN113220838A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method, a device, electronic equipment and a storage medium for determining key information, which are applied to the field of electronic technology, in particular to the fields of natural language processing, deep learning and data mining. The specific implementation scheme of the method for determining the key information is as follows: determining query information related to a target service as target query information; extracting candidate key phrases from the target query information; and determining a target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.

Description

Method and device for determining key information, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to the field of natural language processing, deep learning, and data mining, and more particularly, to a method, an apparatus, a device, and a storage medium for determining key information.
Background
With the development of electronic technology and internet technology, people increasingly prefer to query information over networks. To improve the user experience, how to provide service information that satisfies the user's demand according to the query information input by the user has become a research hotspot.
In order to provide service information satisfying the demand to the user, trigger phrases may be set for various types of services. As such, the type of service information provided to the user may be determined from the trigger phrase included in the query information.
Disclosure of Invention
A method, an apparatus, an electronic device, and a storage medium for determining key information with improved accuracy are provided.
According to an aspect of the present disclosure, there is provided a method of determining key information, the method including: determining query information related to a target service as target query information; extracting candidate key phrases from the target query information; and determining a target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.
According to another aspect of the present disclosure, there is provided an apparatus for determining key information, the apparatus including: the query information determining module is used for determining query information related to the target service as target query information; the phrase extraction module is used for extracting candidate key phrases from the target query information; and the key information determining module is used for determining a target phrase in the candidate key phrases based on the similarity between the candidate key phrases and the service name of the target service, so as to serve as the key information aiming at the target service.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining key information provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of determining key information provided by the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of determining key information provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic view of an application scenario of a method, an apparatus, an electronic device and a storage medium for determining key information according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of determining key information according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating the principle of determining query information relating to a target service according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating the principle of extracting candidate key-phrases from target query information, according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the principle of determining a target phrase in candidate key phrases in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating a method of determining key information according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an apparatus for determining key information according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram of an electronic device used to implement a method of determining key information according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a method for determining key information, which includes a query information determining stage, a phrase extracting stage and a key information determining stage. In the query information determining stage, query information related to a target service is determined as target query information. In the phrase extracting stage, candidate key phrases are extracted from the target query information. In the key information determining stage, a target phrase among the candidate key phrases is determined as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a method and an apparatus for determining key information according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 includes a user 110, a terminal device 120, and a server 130. The terminal device 120 may be communicatively coupled to the server 130 via a network, which may include wired or wireless communication links.
User 110 may interact with server 130, for example, over a network using terminal device 120, to receive or send messages, etc. The terminal device 120 may be a terminal device having a display screen and having a processing function, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
Illustratively, the terminal device 120 may obtain the query information 140 in response to a user operation, or may recognize voice information of the user through a voice recognition technology, thereby obtaining the query information 140. The terminal device 120 may also send the query information 140 to the server 130 to obtain the service information 150 satisfying the user's requirement from the server 130. The query information 140 may be, for example, a query in a search scenario.
The server 130 may be a server that provides various services, such as a background management server that provides support for websites or client applications that users browse with the terminal device 120. The server 130 may be a cloud server, a server of a distributed system, or a server with a blockchain.
Illustratively, the server 130 may, for example, in response to the received query information 140, match the query information 140 according to trigger phrases of a plurality of types of service information maintained in advance, determine a trigger phrase matching the query information 140, and feed back the service information 150 corresponding to the matched trigger phrase to the terminal device 120.
According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a database 160. The database 160 stores historical query information. The server 130 may access the database 160, for example, to determine trigger phrases for multiple types of service information based on historical query information.
The aforementioned types of service information may include, for example, "attraction tickets" information, "full-time recruitment" information, "home appliance maintenance" information, and other service information related to the fields of work, life, education, and the like.
It should be noted that the key information determined by the present disclosure may be the aforementioned trigger phrase. The method of determining key information provided by the present disclosure may be performed by the server 130. Accordingly, the means for determining key information provided by the present disclosure may be provided in the server 130.
It should be understood that the number and type of terminal devices, servers, and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases, as the implementation requires.
Fig. 2 is a flow chart of a method of determining key information according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 of determining key information of this embodiment may include operations S210 to S230.
In operation S210, query information related to the target service is determined as target query information.
The target service may be a service capable of providing various types of service information, and may include, for example, a "scenery spot ticket" service, a "home appliance maintenance" service, a "full-time recruitment" service, and/or a "house rental" service. The target service may be any one of the service items provided in fields such as life, work, education and medical treatment.
Illustratively, query information related to the target service may be recalled from the historical query information. The historical query information may, for example, be stored by category according to the service involved and indexed by that service. The embodiment may first determine the target service and then use the historical query information indexed by the target service as the target query information. Alternatively, query information in the historical query information that has a high similarity to the service name of the target service may be used as the target query information. One or more pieces of target query information may be determined. To improve the accuracy of the determined key information, multiple pieces of target query information may be recalled.
In operation S220, candidate key-phrases are extracted from the target query information.
According to embodiments of the present disclosure, a phrase extraction algorithm may be employed to extract candidate key phrases. The phrase extraction algorithm may be the Rapid Automatic Keyword Extraction (RAKE) algorithm, an extraction algorithm based on mutual information and left and right information entropy, or a key phrase extraction operator provided by a natural language processing platform, etc. A phrase, also called a word group, is a language unit without sentence intonation, formed by combining language units that can collocate at the three levels of syntax, semantics and pragmatics. In other words, a phrase is a grammatical unit that is larger than a word but does not constitute a sentence.
It is to be understood that the above phrase extraction algorithm is merely an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
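As an illustration of operation S220, the following is a minimal sketch of RAKE-style extraction, not the implementation used by the disclosure. The stopword list, the function names and the sample queries are assumptions made for this example. Candidate phrases are runs of consecutive non-stopwords, and each phrase is scored by summing the degree-to-frequency ratio of its words.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"how", "to", "the", "a", "an", "of", "for", "in", "near", "is", "and", "me"}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopwords (RAKE-style candidates)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases, current = [], []
    for word in words:
        if word in STOPWORDS:
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(tuple(current))
    return phrases


def rake_scores(queries):
    """Score each candidate phrase by the sum of degree/frequency of its words."""
    phrases = [p for q in queries for p in candidate_phrases(q)]
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, the word itself included.
            degree[word] += len(phrase)
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in set(phrases)}


scores = rake_scores(["how to book scenic spot tickets", "scenic spot tickets near me"])
```

Higher-scoring phrases would be kept as candidate key phrases; how many to keep is a design choice that the source leaves open.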
In operation S230, a target phrase in the candidate key phrases is determined as key information for the target service based on a similarity between the candidate key phrase and a service name of the target service.
After the candidate key phrase is obtained, a semantic similarity calculation method may be used to determine the similarity between the candidate key phrase and the service name. And selecting phrases with similarity greater than a similarity threshold value with the service name from the candidate key phrases as key information for the target service. The similarity threshold value may be set according to actual requirements, which is not limited by the present disclosure.
Illustratively, the semantic similarity calculation method may include a method based on Word Mover's Distance (WMD), a method based on Smooth Inverse Frequency (SIF), and a method using Deep Structured Semantic Models (DSSM). It is to be understood that the above semantic similarity calculation methods are only examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
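The thresholding in operation S230 can be sketched as follows. This is only an illustration: it swaps in a cosine similarity over character bigrams, which is a surface-level stand-in and not one of the semantic methods named above. The function names, sample phrases and threshold value are assumptions.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Count character bigrams of the text with spaces removed."""
    s = text.replace(" ", "").lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine_similarity(a, b):
    """Cosine similarity of two bigram count vectors (stand-in for a semantic model)."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def select_target_phrases(candidates, service_name, threshold):
    """Keep candidate key phrases whose similarity to the service name exceeds the threshold."""
    return [p for p in candidates if cosine_similarity(p, service_name) > threshold]


targets = select_target_phrases(
    ["spot ticket price", "weather today"], "scenic spot ticket", threshold=0.5
)
```

In practice `cosine_similarity` would be replaced by whichever semantic similarity model is chosen; the selection logic stays the same.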
After the key information is determined, if the query obtained by the terminal device includes the key information, the embodiment may feed back the service information of the target service to the terminal device, and show the service information as the feedback information of the response query to the user.
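How the determined key information might be used at query time can be sketched as a simple substring lookup. The data layout (a mapping from service name to its trigger phrases), the function name and the sample values are assumptions for illustration.

```python
def match_services(query, key_info):
    """Return the services whose key information (trigger phrases) occurs in the query."""
    return [
        service
        for service, phrases in key_info.items()
        if any(phrase in query for phrase in phrases)
    ]


# Hypothetical key information mined for two services.
key_info = {
    "scenic spot tickets": ["ticket price", "admission"],
    "home appliance maintenance": ["fridge repair"],
}
matched = match_services("great wall ticket price", key_info)
```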
In the related art, key information is obtained by rewriting the service name (such as the service item name) of the target service. By contrast, this embodiment recalls query information related to the target service from the historical query information and mines phrases with high similarity to the service name from that query information as the key information. This expands the range of source text from which key information is drawn, so that key information that differs textually from the service name but is semantically related to it can be mined. The embodiment can therefore obtain key information that is more accurate and better suited to the usage scenario, which in turn makes it easier to provide a better search service to the user.
Fig. 3 is a schematic diagram of a principle of determining query information related to a target service according to an embodiment of the present disclosure.
The embodiment of the disclosure may generate a dictionary tree in advance based on the service names of a plurality of services, and when target query information is recalled, query information related to the target service is searched from the historical query information based on the dictionary tree. Therefore, the efficiency and the accuracy of recalling the target query information are improved. The plurality of services include the target service, and the plurality of services may be set according to actual requirements, which is not limited by the present disclosure.
When generating the dictionary tree, the service name of each of the plurality of services may be segmented word by word to obtain a word group for each service. After the word group for each service is obtained, a branch for each service is generated based on the word group, so that a plurality of branches are obtained, and the predetermined dictionary tree is constructed from the plurality of branches.
Illustratively, word segmentation may be performed on the service name of each service according to a custom dictionary or a word segmentation tool. For example, for the service name "sight spot entrance ticket", word segmentation yields the word group { sight spot, entrance ticket }. Nodes are then established in sequence following the order within the word group, each node corresponding to one character of the original service name, and the nodes are connected in that order to obtain the branch for the service. For the word group { sight spot, entrance ticket }, a first node indicating "sight", a second node indicating "spot", a third node indicating "entrance", and a fourth node indicating "ticket" may be established in sequence, and the first, second, third and fourth nodes are connected in sequence to obtain the branch for the "sight spot entrance ticket" service. Connecting the plurality of branches to the root node yields the dictionary tree. The root node itself does not indicate any character information.
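The construction described above can be sketched as a character-level trie. This is a minimal illustration: the class and function names are assumptions, and English strings stand in for the original service names, with each character of the string playing the role of one node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.service = None  # set on the last node of a branch


def build_trie(service_names):
    """Build one branch per service name under a root that carries no character."""
    root = TrieNode()
    for name in service_names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.service = name
    return root


def contains(root, name):
    """Return True if the trie holds a complete branch for the given name."""
    node = root
    for ch in name:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.service is not None


root = build_trie(["spot ticket", "home repair"])
```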
Illustratively, after connecting the multiple branches to the root node, the embodiment may also employ a string matching algorithm to add mismatch pointers to the nodes in the branches. The string matching algorithm may be, for example, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore (BM) algorithm, or the like. Adding mismatch pointers improves the query efficiency of the dictionary tree, because after a mismatch the matching can resume from the node the pointer refers to instead of restarting from the root node.
When searching for query information relating to a target service from the historical query information based on the dictionary tree, the historical query information may be matched with a predetermined dictionary tree, and query information matched with a branch of the predetermined dictionary tree for the target service may be used as the target query information.
Illustratively, the predetermined dictionary tree may be queried with each piece of historical query information. If a piece of historical query information includes a character string that matches the word group indicated by a branch of the predetermined dictionary tree, that historical query information is retained, and it forms a query-service pair with the service corresponding to the word group. In this way, multiple query-service pairs can be obtained. The historical query information that forms a query-service pair with the target service is taken as the target query information. If mismatch pointers have been added to the predetermined dictionary tree, the tree may be queried using the aforementioned string matching algorithm. For example, the predetermined dictionary tree may be generated and queried based on the Aho-Corasick automaton (AC automaton) algorithm, which is not limited by this disclosure.
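A compact sketch of recalling query-service pairs with an Aho-Corasick automaton is given below. It is one possible realization under stated assumptions, not the disclosure's implementation: the array-based state layout, the function names and the sample data are all chosen for this example. The `fail` array plays the role of the mismatch pointers.

```python
from collections import deque


def build_automaton(service_names):
    """Build a trie over the names plus failure (mismatch) pointers."""
    goto, fail, out = [{}], [0], [[]]
    for name in service_names:
        state = 0
        for ch in name:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit matches ending at the fail state
    return goto, fail, out


def match_services(query, automaton):
    """Return the set of service names that occur as substrings of the query."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in query:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found


def query_service_pairs(history, service_names):
    """Form (query, service) pairs for every historical query that matches a branch."""
    automaton = build_automaton(service_names)
    return [(q, s) for q in history for s in sorted(match_services(q, automaton))]


pairs = query_service_pairs(
    ["cheap spot ticket online", "fridge repair cost", "weather today"],
    ["spot ticket", "repair"],
)
```

Queries that match no branch, such as the third one, produce no pair and are dropped from the recall set.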
In an embodiment, after the query information matching the branch for the target service is recalled by querying the predetermined dictionary tree, the recalled query information may be further filtered according to, for example, its similarity to the service name of the target service, and the filtered query information is used as the target query information. In this way, the relevance between the target query information and the target service can be improved, which in turn improves the accuracy of the key information determined for the target service.
For example, as shown in FIG. 3, when determining query information related to a target service, the embodiment 300 may first query a predetermined dictionary tree 320 based on each piece of historical query information in the query information base 310, that is, match the historical query information against the predetermined dictionary tree, and take the query information that matches the branch of the predetermined dictionary tree for the target service as candidate query information 330. The similarity 350 between each piece of candidate query information 330 and the service name 340 of the target service is then determined, and the candidate query information whose similarity to the service name is greater than or equal to a first threshold is selected as the target query information 360. The similarity 350 may be determined, for example, with the semantic similarity calculation method described above, which is not limited in this disclosure. The first threshold may be set according to actual requirements, for example to any value not less than 0.5, which is not limited in this disclosure. For example, if the candidate query information is "modern minimalist style", the service name of the target service is "decoration", and the similarity between the two is 0.58, the candidate query information "modern minimalist style" is taken as a piece of target query information.
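The first-threshold filter can be sketched as below. The similarity function is passed in as a parameter so that any semantic model can be plugged in; here a lookup table of made-up scores stands in for it. All names and score values are assumptions, with 0.58 chosen to echo the example above.

```python
def select_target_queries(candidates, similarity, service_name, first_threshold=0.5):
    """Keep candidate queries whose similarity to the service name is >= the first threshold."""
    return [q for q in candidates if similarity(q, service_name) >= first_threshold]


# Hypothetical precomputed scores standing in for a semantic similarity model.
SCORES = {
    "living room decoration cost": 0.83,
    "modern minimalist style": 0.58,
    "decoration company stock price": 0.31,
}


def lookup_similarity(query, service_name):
    """Stand-in similarity: look the score up instead of computing it."""
    return SCORES[query]


target_queries = select_target_queries(list(SCORES), lookup_similarity, "decoration")
```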
In an embodiment, after obtaining the candidate query information 330 or obtaining the target query information, the obtained query information may be filtered based on a pre-constructed blacklist, for example, and the query information including words in the blacklist is filtered out. The pre-constructed blacklist may include forbidden words issued by an authority or sensitive words violating laws, social norms and hindering public interests, etc.
FIG. 4 is a schematic diagram illustrating the principle of extracting candidate key phrases from target query information according to an embodiment of the present disclosure.
The embodiment of the disclosure may perform word segmentation, part-of-speech analysis and the like on the target query information, and obtain, according to the processing result, a predetermined blacklist of items that cannot be used as key information. When candidate key phrases are extracted from the target query information, words belonging to the predetermined blacklist can be removed from the key phrases extracted from the target query information, and the phrases remaining after removal are used as the candidate key phrases. In this way, the determined candidate key phrases better meet the requirements set for key information, which improves the efficiency and accuracy of determining the key information.
According to an embodiment of the present disclosure, as shown in fig. 4, the target query information may be multiple, and when extracting candidate key phrases from the target query information, the embodiment 400 may extract respective key phrases of multiple target query information 410 to obtain multiple key phrases 420. Candidate key-phrases of the plurality of key-phrases 420 are then determined based on the predefined blacklist 430. When candidate key phrases in the plurality of key phrases are determined, words belonging to a predetermined blacklist in the key phrases can be removed based on the predetermined blacklist, and the candidate key phrases are obtained.
In an embodiment, the predefined blacklist 430 may include non-allowed parts-of-speech 431. In determining the candidate key-phrases in the plurality of key-phrases, the part-of-speech of each word included in each of the plurality of key-phrases 420 may be determined. And then removing words with the part of speech being not allowed part of speech from each phrase to obtain candidate key phrases. The part of speech analysis may be performed on the key phrases by using a Lexical Analyzer (LA) or a natural language analysis tool in the related art, so as to determine the part of speech of each word included in each phrase. The non-allowed part of speech may be, for example, at least one of a group/organizational noun, quantifier, adverb, english abbreviation, and the like.
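Removing words by part of speech can be sketched as follows. A small lookup table stands in for a lexical analyzer, and the tag names are invented for this example; neither reflects a specific tool.

```python
# Hypothetical tag set and lexicon standing in for a real lexical analyzer.
NON_ALLOWED_POS = {"ORG", "QUANT", "ADV"}
POS_LEXICON = {
    "museum": "NOUN",
    "ticket": "NOUN",
    "cheap": "ADJ",
    "two": "QUANT",
    "very": "ADV",
    "acme": "ORG",
}


def tag(words):
    """Tag each word, defaulting to NOUN for words missing from the toy lexicon."""
    return [(w, POS_LEXICON.get(w, "NOUN")) for w in words]


def strip_non_allowed_pos(phrase):
    """Drop the words whose part of speech is on the non-allowed list."""
    kept = [w for w, pos in tag(phrase.split()) if pos not in NON_ALLOWED_POS]
    return " ".join(kept)


key_phrases = ["two museum ticket", "very cheap ticket", "acme"]
stripped = [strip_non_allowed_pos(p) for p in key_phrases]
remaining = [p for p in stripped if p]  # phrases left empty are discarded
```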
In an embodiment, after the above embodiment removes the words with the part of speech being not allowed part of speech from each phrase, the obtained key phrase is used as the third phrase 440. Candidate key phrases are then screened from the third phrase 440 based on the similarity between the third phrase and the service name 450 of the target service. Specifically, a third phrase having a greater similarity with the service name 450 of the target service may be used as the candidate key phrase. Wherein, the third phrase with larger similarity can be selected according to a preset fourth threshold. The fourth threshold may be greater than the first threshold described above, thereby gradually increasing the requirement for the relevance of the filtered phrases to the target service. According to the embodiment, after the words are removed from the key phrases, the third phrases are screened based on the similarity between the third phrases and the business names, so that the accuracy of subsequently determined key information can be improved conveniently, and the waste of unnecessary computing resources is reduced to a certain extent.
In an embodiment, the predefined blacklist 430 may further include non-allowed words 432. When determining candidate key phrases from the plurality of key phrases, selecting the third phrases with greater similarity according to the fourth threshold, as in the foregoing embodiment, yields the fourth phrase 460, that is, a third phrase whose similarity to the service name 450 of the target service is greater than or equal to the fourth threshold. After the fourth phrase 460 is obtained, non-allowed words may be removed from it to obtain a candidate key phrase 470. For example, the fourth phrase 460 may be compared with the non-allowed words 432 to identify and eliminate the non-allowed words it contains. The non-allowed words in this embodiment may include, for example, words belonging to the non-allowed parts of speech described above. Because part-of-speech analysis of a phrase may be inaccurate, further removing words from the phrase based on the non-allowed words makes the determined candidate key phrases conform better to the rules set for key information, and improves the efficiency and accuracy of determining the key information.
According to an embodiment of the disclosure, in order to facilitate removing words from the key phrases based on the non-allowed parts of speech, the embodiment may generate the non-allowed parts of speech in advance based on the plurality of target query information. For example, part-of-speech analysis may be performed on the target query information 410 to obtain the part of speech of each word included in each piece of target query information. By counting the parts of speech of the words in the target query information, a vocabulary can be generated for each part of speech, listing the words in the target query information that belong to that part of speech. In this way, a vocabulary for each of a plurality of parts of speech is obtained. The average similarity between the words in the vocabulary for each part of speech and the service name of the target service is then determined as the average similarity for that part of speech, and each part of speech whose average similarity is smaller than a third threshold is listed as a non-allowed part of speech.
For example, after a vocabulary for each part of speech is obtained, the similarity between each word in the vocabulary and the service name of the target service may be calculated. And calculating the average value of the similarity between all the words in the word list and the business name as the average similarity aiming at the part of speech. The third threshold may be set according to actual requirements, and may be, for example, 0.5, and the like, which is not limited by the present disclosure.
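Deriving the non-allowed parts of speech from per-part-of-speech vocabularies can be sketched as follows. The tagged words and the word-to-service-name similarity scores are made up for illustration, and the function name is an assumption.

```python
from collections import defaultdict
from statistics import mean


def non_allowed_pos(tagged_words, similarity_to_name, third_threshold=0.5):
    """Return the parts of speech whose average word similarity is below the threshold."""
    vocab = defaultdict(set)
    for word, pos in tagged_words:
        vocab[pos].add(word)  # one vocabulary per part of speech
    averages = {
        pos: mean(similarity_to_name[w] for w in words) for pos, words in vocab.items()
    }
    return {pos for pos, avg in averages.items() if avg < third_threshold}


# Hypothetical tagging result and similarity scores against the service name.
tagged = [("ticket", "NOUN"), ("museum", "NOUN"), ("two", "QUANT"), ("very", "ADV")]
sims = {"ticket": 0.9, "museum": 0.7, "two": 0.2, "very": 0.1}
banned_pos = non_allowed_pos(tagged, sims)
```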
According to an embodiment of the disclosure, in order to facilitate removing words from the third phrase based on the non-allowed words, the embodiment may generate the non-allowed words based on the plurality of target query information after determining the non-allowed parts of speech. For example, a target part of speech among the non-allowed parts of speech may be determined, and all words in the word list for the target part of speech may be treated as non-allowed words. A non-allowed part of speech is taken as the target part of speech when every word in its word list has a similarity to the service name of the target service that is less than a fifth threshold. The fifth threshold may be any value smaller than the third threshold described above, for example 0.3, which is not limited in this disclosure.
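A minimal sketch of deriving the non-allowed words from the word lists of the non-allowed parts of speech is given below. The function name, the lookup-table similarity, and the sample word lists are hypothetical.

```python
from typing import Callable, Dict, Set


def non_allowed_words(
    word_lists: Dict[str, Set[str]],
    blocked_pos: Set[str],
    service_name: str,
    similarity: Callable[[str, str], float],
    fifth_threshold: float = 0.3,
) -> Set[str]:
    """Collect all words of every target part of speech.

    A non-allowed part of speech is a target part of speech when each word in
    its word list has a similarity to the service name below the fifth threshold.
    """
    words: Set[str] = set()
    for pos in blocked_pos:
        vocabulary = word_lists.get(pos, set())
        if vocabulary and all(
            similarity(word, service_name) < fifth_threshold for word in vocabulary
        ):
            words |= vocabulary
    return words


# Toy data (assumptions for illustration only).
toy_scores = {"how": 0.1, "why": 0.2, "please": 0.2, "quickly": 0.4}
toy_similarity = lambda word, name: toy_scores[word]
lists = {"pronoun": {"how", "why"}, "adverb": {"please", "quickly"}}

blocked_words = non_allowed_words(
    lists, {"pronoun", "adverb"}, "refrigerator repair", toy_similarity)
```

In this toy data the adverb list is skipped because "quickly" scores 0.4, which is not below the fifth threshold of 0.3.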
FIG. 5 is a schematic diagram of the principle of determining a target phrase in candidate key phrases according to an embodiment of the present disclosure.
According to the embodiment of the disclosure, when determining the target phrase among the candidate key phrases, phrases with low perplexity (also rendered as "confusion") may be selected as target phrases, filtering out phrases with high perplexity, so that the determined target phrases better match the way query information is expressed and the accuracy of the service information fed back to the user is improved. For a natural language processing model, lower perplexity means that the model assigns a higher probability to each word in the phrase, and that the text is more coherent and fluent.
For example, in determining the target phrase among the candidate key phrases, the perplexity of each candidate key phrase may be determined first, and the candidate key phrases whose perplexity is less than a perplexity threshold are taken as the target phrases. A statistical language model or a deep learning language model may be employed, for example, to determine the perplexity of the candidate key phrases. The statistical language model may be an N-Gram model. The deep learning language model may be a recurrent neural network model, a Bidirectional Encoder Representations from Transformers (BERT) model, and the like, and the recurrent neural network model may be, for example, a Long Short-Term Memory (LSTM) network model, a Bidirectional Long Short-Term Memory (Bi-LSTM) network model, or a deep contextualized word representation (ELMo) model, and the like. It is to be understood that these models for determining the perplexity are only examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto. The perplexity threshold may be set according to actual requirements, for example 0.5, which is not limited in this disclosure.
Illustratively, for a candidate key phrase "bad fit", the determined perplexity may be high because the phrase is semantically ambiguous, and the candidate key phrase may be filtered out. For the candidate key phrase "brief wind decoration", the determined perplexity is low because its semantics are clear, so the candidate key phrase may be used as the target phrase.
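As one illustration of a statistical language model, the sketch below scores phrases with a bigram model using add-one smoothing and computes perplexity as the exponential of the negative average log-probability per word. The class name, the toy corpus, and the threshold of 5.0 are assumptions; the scale of a suitable threshold depends on the model and on how the perplexity is normalized.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: Sequence[Sequence[str]]):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocabulary = set()
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence)
            vocabulary.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocabulary)

    def perplexity(self, words: Sequence[str]) -> float:
        if not words:
            raise ValueError("perplexity is undefined for an empty phrase")
        tokens = ["<s>"] + list(words)
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            prob = (self.bigram_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.vocab_size)
            log_prob += math.log(prob)
        # exp of the negative average log-probability per word
        return math.exp(-log_prob / len(words))


def low_perplexity_phrases(model: BigramModel,
                           phrases: Sequence[Sequence[str]],
                           threshold: float) -> List[List[str]]:
    """Keep the phrases whose perplexity is below the threshold."""
    return [list(p) for p in phrases if model.perplexity(p) < threshold]


# Toy training corpus of segmented queries (hypothetical).
model = BigramModel([
    ["refrigerator", "repair"],
    ["refrigerator", "repair", "price"],
    ["air", "conditioner", "repair"],
])
kept = low_perplexity_phrases(
    model, [["refrigerator", "repair"], ["repair", "refrigerator"]], threshold=5.0)
```

With this corpus the fluent word order receives a lower perplexity than the reversed one, so only the former passes the filter.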
According to the embodiment of the disclosure, considering that the candidate key phrases include phrases with partial words removed based on the blacklist, the similarity between the candidate key phrases and the service name of the target service is different from the similarity between the target query information and the service name of the target service. After selecting the phrases with low confusion, the embodiment may select the phrases with high similarity to the service name of the target service from the phrases with low confusion as the target phrases. For example, phrases having a similarity to the business name of the target business greater than or equal to a second threshold may be selected. The second threshold may be any value greater than the first threshold described above, and the second threshold may also be greater than the fourth threshold described above, so as to gradually increase the requirement on the relevance between the filtered phrases and the target service, and improve the accuracy of the determined target phrases. Through the screening based on the similarity, the relevance between the determined target phrase and the target service can be further improved, and the accuracy of the determined target phrase is improved.
According to the embodiment of the disclosure, in order to improve the hit rate of the service information on the basis of accurately providing the service information based on the determined target phrase, phrase extraction may be further performed on a longer candidate key phrase or words that do not satisfy the key information rule of the target service may be eliminated.
Illustratively, as shown in fig. 5, the embodiment 500 may divide the candidate key phrases 510, according to the length of each candidate key phrase, into fifth phrases 520 whose length is less than or equal to a predetermined length and first phrases 530 whose length is greater than the predetermined length. For a first phrase 530, a phrase whose length is less than or equal to the predetermined length may be extracted from the first phrase 530 as a second phrase 550 based on the target part of speech 540 for the target service, and the first phrase is replaced with the second phrase 550 to obtain the replaced candidate key phrases 560. Then, the perplexity of each phrase in the replaced candidate key phrases 560 is determined, the phrases 570 with lower perplexity (i.e., phrases whose perplexity is less than the perplexity threshold) are selected from the replaced candidate key phrases 560, and the phrases with higher similarity to the service name of the target service are selected from the phrases 570 as the target phrases 580.
The predetermined length may be set according to actual requirements, and may be set to any value such as 6, for example. The target part-of-speech 540 may be set in advance according to actual requirements, and may include verbs, nouns, and the like. For example, if the candidate key phrase is "bad refrigerator needs to be repaired", and the length of the phrase exceeds a predetermined length, word segmentation processing and part-of-speech analysis may be performed on the candidate key phrase, and the second phrase "refrigerator repair" is obtained by reserving "refrigerator" and "repair" whose parts-of-speech are the target parts-of-speech.
For example, if the phrase formed by the words belonging to the target part of speech in the candidate key phrase is still longer than the predetermined length, the leading word string of the predetermined length may be selected as the second phrase. Alternatively, phrases that are still longer than the predetermined length may be filtered out directly.
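The shortening step can be sketched as follows. For simplicity this sketch measures length in words and uses a maximum of 3 words, whereas the disclosure leaves the unit open and mentions 6 as an example value; the part-of-speech tags and the function name are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def shorten_phrase(tagged_phrase: Sequence[Tuple[str, str]],
                   target_pos: Set[str],
                   max_len: int) -> List[str]:
    """Return the phrase unchanged if it is short enough; otherwise keep only
    words of the target parts of speech, truncated to the leading max_len words."""
    if len(tagged_phrase) <= max_len:
        return [word for word, _ in tagged_phrase]
    kept = [word for word, pos in tagged_phrase if pos in target_pos]
    return kept[:max_len]


# Hypothetical segmentation and tags for "bad refrigerator needs repair".
first_phrase = [("bad", "adj"), ("refrigerator", "noun"),
                ("needs", "aux"), ("repair", "verb")]
second_phrase = shorten_phrase(first_phrase, {"noun", "verb"}, max_len=3)
```

Truncating to the leading words corresponds to the first option described above; the alternative of discarding over-long phrases would replace the final slice with a length check.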
According to an embodiment of the present disclosure, after obtaining the replaced candidate key-phrase 560, a deduplication operation may be performed on the replaced candidate key-phrase 560. The confusion is then calculated for the candidate key-phrases remaining after the deduplication operation. By the method, repeated processing of the same phrase can be reduced, and processing efficiency is improved to a certain extent.
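A deduplication step that preserves the original order of the phrases could look like the following sketch; the function name and sample phrases are hypothetical.

```python
from typing import Iterable, List


def deduplicate(phrases: Iterable[str]) -> List[str]:
    """Remove repeated phrases while keeping the first occurrence of each."""
    return list(dict.fromkeys(phrases))


unique_phrases = deduplicate(
    ["refrigerator repair", "air conditioner repair", "refrigerator repair"])
```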
Fig. 6 is a schematic diagram illustrating the principle of a method of determining key information according to an embodiment of the present disclosure.
As shown in fig. 6, in an application scenario of providing a service item to a user in response to query information, the method 600 for determining key information of this embodiment may be divided into a target query determination flow 610, a candidate key phrase determination flow 620 and a trigger phrase determination flow 630 as a whole.
In the target query determination process 610, word segmentation processing may be performed on each service item name in the service item category 611 to obtain a word group 612 for each service item. After the word groups 612 are obtained, a dictionary tree (Trie tree 613) may be generated based on the word groups 612. This operation is similar to the operation of generating a dictionary tree described above. After the dictionary tree is obtained, the Trie tree 613 may be queried with each query in the query library 614, and each query that matches the Trie tree, together with the service item of the branch it matches, forms a query-service item pair 615. The queries in the query-service item pairs 615 are then screened based on the similarity between each query and the name of its matched service item, and the queries with higher similarity are taken as target queries 616 for use in the subsequent flows, completing the target query determination process 610.
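The dictionary tree construction and query matching can be sketched as follows. The matching rule used here, namely that a query matches a branch when the branch's word sequence occurs contiguously in the segmented query, is an assumption made for illustration; the class and function names and the sample data are likewise hypothetical.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.service: Optional[str] = None  # set on the last node of a branch


def build_trie(segmented_names: Dict[str, List[str]]) -> TrieNode:
    """Create one branch per service item from its segmented name."""
    root = TrieNode()
    for name, words in segmented_names.items():
        node = root
        for word in words:
            node = node.children.setdefault(word, TrieNode())
        node.service = name
    return root


def match_services(root: TrieNode, query_words: List[str]) -> List[str]:
    """Return the service items whose branch occurs contiguously in the query."""
    matched: List[str] = []
    for start in range(len(query_words)):
        node = root
        for word in query_words[start:]:
            node = node.children.get(word)
            if node is None:
                break
            if node.service is not None and node.service not in matched:
                matched.append(node.service)
    return matched


trie = build_trie({
    "refrigerator repair": ["refrigerator", "repair"],
    "refrigerator cleaning": ["refrigerator", "cleaning"],
})
pairs = match_services(trie, ["cheap", "refrigerator", "repair", "nearby"])
```

Each matched service would then be paired with the query and screened by similarity to the service item name.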
After the target query determination process 610 is completed, the target queries may be grouped based on the service items paired with the target queries, so as to divide the queries paired with the same service item into one group, and obtain a query group for each service item. The candidate key phrase determination process 620 and the trigger phrase determination process 630 are then performed with each service item as a target service as described previously.
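Grouping the target queries by their paired service item can be sketched with a dictionary of lists; the function name and sample pairs below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_queries_by_service(
        pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each service item to the list of target queries paired with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for query, service in pairs:
        groups[service].append(query)
    return dict(groups)


query_groups = group_queries_by_service([
    ("cheap refrigerator repair", "refrigerator repair"),
    ("wall painting cost", "wall painting"),
    ("refrigerator repair nearby", "refrigerator repair"),
])
```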
In the candidate key phrase determination process 620, word segmentation and part-of-speech analysis may be performed on each query in the query group for each service item, and according to a similarity statistical result between each word obtained by word segmentation and the service item name, a predetermined blacklist 621 is generated, where the predetermined blacklist 621 includes a non-allowed part-of-speech 6211 and a non-allowed word 6212. The predetermined black list 621 is generated in a similar manner as described above. A phrase extraction algorithm as described above is used to extract phrases from each target query in the query set for the target service, resulting in key phrases 622. After the predetermined blacklist 621 and the key phrases 622 are obtained, words with parts of speech being non-allowed parts of speech 6211 in the key phrases 622 may be removed based on the predetermined blacklist 621, and non-allowed words 6212 may be removed, so as to obtain a candidate key phrase 623. Wherein the method of obtaining the candidate key-phrases 623 based on the predetermined blacklist is similar to the method described above.
In the trigger phrase determination process 630, operation S631 is performed to determine whether the length of each phrase in the candidate key phrases 623 is less than or equal to a predetermined length. If the length is less than or equal to the predetermined length, the perplexity 633 of the phrase is calculated directly. If the length is greater than the predetermined length, words of the target part of speech may first be selected from the phrase and combined into a second phrase 632 corresponding to that phrase. The length of the second phrase 632 is less than or equal to the predetermined length. The perplexity 633 of the second phrase 632 is then calculated. After the perplexity of each phrase whose length is less than or equal to the predetermined length is obtained, the phrases 634 with low perplexity (which may be the phrases whose perplexity is less than the perplexity threshold described above) are selected from all the phrases. Finally, the low-perplexity phrases 634 are de-duplicated and filtered by similarity to obtain the trigger phrases 635. The similarity filtering selects, from the de-duplicated phrases, the phrases with higher similarity to the service item name.
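The trigger phrase determination flow can be sketched end to end as below. The perplexity and similarity functions are injected as callables because the disclosure allows several models; the toy lookup tables, the thresholds, the word-based length, and all names are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

TaggedPhrase = Sequence[Tuple[str, str]]


def select_trigger_phrases(
    candidates: Sequence[TaggedPhrase],
    target_pos: Set[str],
    max_len: int,
    perplexity: Callable[[List[str]], float],
    perplexity_threshold: float,
    similarity: Callable[[str, str], float],
    service_name: str,
    similarity_threshold: float,
) -> List[str]:
    low_perplexity: List[str] = []
    for tagged in candidates:
        if len(tagged) > max_len:
            # Keep only target-part-of-speech words, capped at max_len words.
            words = [w for w, pos in tagged if pos in target_pos][:max_len]
        else:
            words = [w for w, _ in tagged]
        if words and perplexity(words) < perplexity_threshold:
            low_perplexity.append(" ".join(words))
    unique = list(dict.fromkeys(low_perplexity))  # order-preserving dedup
    return [p for p in unique
            if similarity(p, service_name) >= similarity_threshold]


# Toy scoring functions (assumptions for illustration only).
toy_ppl = {"refrigerator repair": 2.0, "repair refrigerator": 9.0,
           "refrigerator price": 3.0}
toy_perplexity = lambda words: toy_ppl[" ".join(words)]
toy_similarity = lambda phrase, name: 1.0 if phrase == name else 0.4

trigger_phrases = select_trigger_phrases(
    candidates=[
        [("refrigerator", "noun"), ("repair", "verb")],
        [("bad", "adj"), ("refrigerator", "noun"), ("needs", "aux"), ("repair", "verb")],
        [("repair", "verb"), ("refrigerator", "noun")],
        [("refrigerator", "noun"), ("price", "noun")],
    ],
    target_pos={"noun", "verb"},
    max_len=3,
    perplexity=toy_perplexity,
    perplexity_threshold=5.0,
    similarity=toy_similarity,
    service_name="refrigerator repair",
    similarity_threshold=0.8,
)
```

In this toy run the second candidate is shortened to a duplicate of the first, the third is dropped for high perplexity, and the fourth is dropped for low similarity.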
Based on the above method for determining the key information, the present disclosure also provides an apparatus for determining the key information, which will be described in detail below with reference to fig. 7.
Fig. 7 is a block diagram of a structure of an apparatus for determining key information according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for determining key information may include a query information determining module 710, a phrase extracting module 720, and a key information determining module 730.
The query information determination module 710 is configured to determine query information related to the target service as target query information. In an embodiment, the query information determining module 710 may be configured to perform the operation S210 described above, which is not described herein again.
The phrase extraction module 720 is used to extract candidate key phrases from the target query information. In an embodiment, the phrase extraction module 720 may be configured to perform the operation S220 described above, which is not described herein again.
The key information determining module 730 is configured to determine a target phrase in the candidate key phrases as key information for the target service based on a similarity between the candidate key phrase and a service name of the target service. In an embodiment, the key information determining module 730 may be configured to perform the operation S230 described above, which is not described herein again.
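The three-module structure of the apparatus can be sketched as a simple composition of callables. The class name `KeyInfoApparatus`, its field names, and the placeholder lambdas are hypothetical; they only show how modules 710, 720, and 730 hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyInfoApparatus:
    determine_queries: Callable[[str], List[str]]               # module 710
    extract_phrases: Callable[[List[str]], List[str]]           # module 720
    determine_key_info: Callable[[List[str], str], List[str]]   # module 730

    def run(self, service_name: str) -> List[str]:
        target_queries = self.determine_queries(service_name)
        candidates = self.extract_phrases(target_queries)
        return self.determine_key_info(candidates, service_name)


# Placeholder module implementations (assumptions for illustration only).
apparatus = KeyInfoApparatus(
    determine_queries=lambda name: ["cheap refrigerator repair",
                                    "refrigerator price"],
    extract_phrases=lambda queries: [q.replace("cheap ", "") for q in queries],
    determine_key_info=lambda phrases, name: [p for p in phrases if p == name],
)
key_info = apparatus.run("refrigerator repair")
```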
According to an embodiment of the present disclosure, the query information determination module 710 may include a matching sub-module and an information determination sub-module. The matching sub-module is used for matching historical query information against a predetermined dictionary tree to obtain candidate query information. The information determination sub-module is used for determining candidate query information whose similarity to the service name of the target service is greater than or equal to a first threshold as the target query information. The predetermined dictionary tree includes a plurality of branches, including a branch for the target service, and the candidate query information matches the branch for the target service.
According to an embodiment of the present disclosure, the key information determination module 730 may include a confusion determination sub-module and a target phrase determination sub-module. The confusion determination sub-module is used to determine the confusion of the candidate key-phrases. The target phrase determination sub-module is configured to determine a target phrase based on the candidate key-phrases having a perplexity less than the perplexity threshold.
According to an embodiment of the present disclosure, the target phrase determination sub-module is specifically configured to determine, as the target phrases, those phrases which are among the candidate key phrases having a perplexity less than the perplexity threshold and whose similarity to the service name of the target service is greater than or equal to a second threshold.
According to an embodiment of the present disclosure, the above-described perplexity determination sub-module may include a first phrase determination unit, a phrase extraction unit, and a perplexity determination unit. The first phrase determination unit is used for determining a first phrase, among the candidate key phrases, whose length is greater than a predetermined length. The phrase extraction unit is used for extracting, from the first phrase, a second phrase whose length is less than or equal to the predetermined length based on the target part of speech for the target service, and replacing the first phrase with the second phrase. The perplexity determination unit is used for determining the perplexity of the second phrase.
According to an embodiment of the present disclosure, the target query information is multiple, and the phrase extraction module 720 may include a phrase extraction sub-module and a candidate phrase determination sub-module. The phrase extraction submodule is used for extracting key phrases of the target query information to obtain a plurality of key phrases. The candidate phrase determination sub-module is configured to determine a candidate key-phrase of the plurality of key-phrases based on a predetermined blacklist. Wherein the predetermined blacklist is generated based on a plurality of target query information.
According to an embodiment of the present disclosure, the predetermined blacklist includes non-allowed parts of speech. The candidate phrase determination sub-module comprises a part-of-speech determining unit, a word removing unit, and a second phrase determining unit. The part-of-speech determining unit is used for determining, for each phrase in the plurality of key phrases, the part of speech of each word in that phrase. The word removing unit is used for removing words whose part of speech is a non-allowed part of speech from each phrase to obtain a third phrase. The second phrase determining unit is used for determining candidate key phrases based on the similarity between the third phrase and the service name of the target service.
According to an embodiment of the present disclosure, the apparatus 700 for determining key information may further include a part-of-speech generating module, configured to generate the non-allowed parts of speech based on the plurality of target query information. The part-of-speech generating module may include a part-of-speech statistics sub-module, a similarity determination sub-module, and a first part-of-speech determination sub-module. The part-of-speech statistics sub-module is used for counting the parts of speech of the words in the target query information to obtain a word list for each of multiple parts of speech. The similarity determination sub-module is used for determining the average similarity between the words in the word list for each part of speech and the service name of the target service, as the average similarity for that part of speech. The first part-of-speech determination sub-module is used for determining a part of speech whose average similarity is less than a third threshold as a non-allowed part of speech.
According to an embodiment of the present disclosure, the predetermined blacklist may further include non-allowed words. The second phrase determining unit may include a phrase determining sub-unit and a word eliminating sub-unit. The phrase determining subunit is configured to determine a third phrase, of which the similarity with the service name of the target service is greater than or equal to a fourth threshold, and obtain a fourth phrase. And the word removing subunit is used for removing the non-allowed words from the fourth phrase to obtain candidate key phrases, wherein the non-allowed words comprise words belonging to non-allowed parts of speech.
According to an embodiment of the present disclosure, the apparatus 700 for determining key information may further include a word generating module, configured to generate a non-allowed word based on the plurality of target query information. The word generation module may include a second part-of-speech determination submodule and a word determination submodule. And the second part-of-speech determination submodule is used for determining a target part-of-speech in the non-allowed part-of-speech, and the similarity between each word in the word list aiming at the target part-of-speech and the business name of the target business is smaller than a fifth threshold value. The word determination submodule is used for determining that the words in the word list aiming at the target part of speech are non-allowed words.
According to an embodiment of the present disclosure, the apparatus 700 for determining key information may further include a dictionary tree generating module, configured to generate a predetermined dictionary tree based on the service names of the plurality of services. The dictionary tree generating module may include a phrase obtaining sub-module and a dictionary tree generating sub-module. The phrase obtaining submodule is used for dividing the service name of each service in the plurality of services by taking a word as a unit to obtain a phrase aiming at each service. And the dictionary tree generation submodule is used for generating branches aiming at each service according to the word group aiming at each service to obtain a plurality of branches so as to form a preset dictionary tree.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the users involved all comply with the relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an electronic device 800 that may be used to implement the method of determining key information of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the method of determining key information. For example, in some embodiments, the method of determining key information may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into the RAM 803 and executed by the computing unit 801, the computer program may perform one or more of the steps of the method of determining key information described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method of determining key information by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a traditional physical host and a VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A method of determining key information, comprising:
determining query information related to a target service as target query information;
extracting candidate key phrases from the target query information; and
determining a target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.
2. The method of claim 1, wherein determining query information related to a target service comprises:
matching the historical query information with a predetermined dictionary tree to obtain candidate query information; and
determining candidate query information having a similarity greater than or equal to a first threshold with the service name of the target service as the target query information,
wherein the predetermined dictionary tree includes a plurality of branches including a branch for the target service, the candidate query information matching the branch for the target service.
3. The method of claim 1 or 2, wherein determining a target phrase of the candidate key phrases comprises:
determining a perplexity of the candidate key-phrase; and
determining the target phrase based on the candidate key-phrases for which the perplexity is less than a perplexity threshold.
4. The method of claim 3, wherein determining the target phrase comprises:
determining, as the target phrase, a key phrase which is among the candidate key phrases having a perplexity less than the perplexity threshold and whose similarity to the service name of the target service is greater than or equal to a second threshold.
5. The method of claim 3 or 4, wherein determining a perplexity of the candidate key-phrase comprises:
determining a first phrase of the candidate key phrases having a length greater than a predetermined length;
extracting a second phrase with the length less than or equal to the preset length from the first phrase based on the target part of speech aiming at the target service, and replacing the first phrase with the second phrase; and
determining a perplexity of the second phrase.
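For illustration only (not part of the claims), the shortening step of claim 5 can be sketched as follows. The sketch assumes that length is measured in words and that part-of-speech tags come from a hand-written lookup table; both are assumptions, as are the tag names and the function name. A real system would use a part-of-speech tagger.

```python
# Hypothetical part-of-speech lookup standing in for a real tagger.
POS = {
    "how": "adv", "to": "part", "apply": "verb", "for": "prep",
    "a": "det", "housing": "noun", "fund": "noun", "loan": "noun",
    "online": "adv",
}


def shorten(phrase, target_pos, max_len):
    """Return the phrase unchanged if it is short enough; otherwise keep
    only words with a target part of speech, capped at max_len words."""
    words = phrase.split()
    if len(words) <= max_len:
        return phrase
    kept = [w for w in words if POS.get(w) in target_pos]
    return " ".join(kept[:max_len])


long_phrase = "how to apply for a housing fund loan online"
print(shorten(long_phrase, target_pos={"noun", "verb"}, max_len=4))
```

The shortened phrase would then be passed to the perplexity computation in place of the original long phrase.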
6. The method according to claim 1 or 2, wherein the target query information is plural; the extracting candidate key phrases from the target query information comprises:
extracting respective key phrases of the target query information to obtain a plurality of key phrases; and
determining a candidate key-phrase of the plurality of key-phrases based on a predetermined blacklist,
wherein the predetermined blacklist is generated based on the plurality of target query information.
7. The method of claim 6, wherein the predetermined blacklist includes non-allowed parts of speech; the determining candidate key-phrases of the plurality of key-phrases comprises:
for each phrase in the plurality of key phrases, determining the part of speech of each word in the phrase;
removing, from each phrase, words whose part of speech is the non-allowed part of speech, to obtain a third phrase; and
determining the candidate key phrase based on a similarity between the third phrase and the service name of the target service.
8. The method of claim 7, further comprising generating the non-allowed part of speech based on the plurality of target query information by:
counting the parts of speech of the words in the plurality of target query information to obtain a word list for each of a plurality of parts of speech;
determining the average similarity between the words in the word list for each part of speech and the service name of the target service as the average similarity for that part of speech; and
determining a part of speech whose average similarity is less than a third threshold to be the non-allowed part of speech.
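For illustration only (not part of the claims), the blacklist generation of claim 8 can be sketched as follows. The sketch assumes the words arrive already tagged as `(word, part_of_speech)` pairs, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and stores each word list as a set; the tag names, threshold value, and function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; the claims do not fix a measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def non_allowed_pos(tagged_words, service_name, threshold):
    """Group words by part of speech, average each group's similarity to
    the service name, and return the parts of speech below the threshold."""
    word_lists = defaultdict(set)
    for word, pos in tagged_words:
        word_lists[pos].add(word)
    blocked = set()
    for pos, words in word_lists.items():
        average = sum(similarity(w, service_name) for w in words) / len(words)
        if average < threshold:
            blocked.add(pos)
    return blocked


# Toy tagged words collected from target queries.
tagged = [("how", "func"), ("to", "func"),
          ("credit", "noun"), ("card", "noun"), ("repayment", "noun")]
print(non_allowed_pos(tagged, "credit card repayment", threshold=0.2))
```

Function words score close to zero against the service name, so their part of speech falls below the threshold and is blacklisted, while nouns drawn from the service name are retained.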
9. The method according to claim 7 or 8, wherein the predetermined blacklist further comprises non-allowed words; determining the candidate key-phrase comprises:
determining a third phrase whose similarity to the service name of the target service is greater than or equal to a fourth threshold, to obtain a fourth phrase; and
removing the non-allowed words from the fourth phrase to obtain the candidate key phrase,
wherein the non-allowed words include words belonging to the non-allowed part of speech.
10. The method of claim 9, further comprising generating a non-allowed word based on the plurality of target query information by:
determining a target part of speech among the non-allowed parts of speech, wherein the similarity between each word in the word list for the target part of speech and the service name of the target service is less than a fifth threshold; and
determining the words in the word list for the target part of speech to be the non-allowed words.
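For illustration only (not part of the claims), the word-level blacklist of claim 10 can be sketched as follows. The sketch assumes per-part-of-speech word lists and a set of non-allowed parts of speech are already available; the data, the threshold value, and the function names are hypothetical, and `difflib.SequenceMatcher` stands in for the unspecified similarity measure.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; the claims do not fix a measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def non_allowed_words(word_lists, blocked_pos, service_name, threshold):
    """A blocked part of speech is a 'target' part of speech only if every
    word in its list is below the threshold; all words of such a part of
    speech become non-allowed words."""
    words = set()
    for pos in blocked_pos:
        if all(similarity(w, service_name) < threshold
               for w in word_lists[pos]):
            words.update(word_lists[pos])
    return words


word_lists = {"func": {"how", "to"}, "verb": {"check", "repay"}}
print(non_allowed_words(word_lists, {"func", "verb"},
                        "credit card repayment", threshold=0.1))
```

Here the verb list is spared because one of its words ("repay") is still reasonably close to the service name, so only the function words are blacklisted.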
11. The method of claim 2, further comprising generating the predetermined dictionary tree based on service names of a plurality of services by:
dividing the service name of each service in the plurality of services by taking a word as a unit to obtain a phrase for each service; and
generating a branch for each service according to the phrase for that service, to obtain the plurality of branches forming the predetermined dictionary tree.
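For illustration only (not part of the claims), the dictionary tree of claims 2 and 11 can be sketched as a nested-dictionary trie. The sketch assumes whitespace tokenisation and treats a query as matching a branch when the branch's words appear consecutively in the query; the claims do not define the matching rule, so that rule, the sentinel key, and the function names are all assumptions.

```python
END = "\0service"  # sentinel key marking the end of a branch


def build_trie(service_names):
    """Build one branch per service name, one node per word."""
    root = {}
    for name in service_names:
        node = root
        for word in name.split():
            node = node.setdefault(word, {})
        node[END] = name
    return root


def match(query, trie):
    """Return the service names whose whole branch occurs as a run of
    consecutive words in the query."""
    words = query.split()
    hits = []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if END in node:
                hits.append(node[END])
    return hits


trie = build_trie(["credit card repayment", "credit card application"])
print(match("how to do credit card repayment online", trie))
```

Because the two service names share the prefix "credit card", they share the first two nodes of the tree and diverge only at the third word, which is what makes the structure compact for many similar service names.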
12. An apparatus to determine key information, comprising:
a query information determining module, configured to determine query information related to a target service as target query information;
a phrase extraction module, configured to extract candidate key phrases from the target query information; and
a key information determining module, configured to determine a target phrase among the candidate key phrases as key information for the target service, based on the similarity between the candidate key phrases and the service name of the target service.
13. The apparatus of claim 12, wherein the query information determination module comprises:
a matching sub-module, configured to match historical query information against a predetermined dictionary tree to obtain candidate query information; and
an information determination sub-module, configured to determine candidate query information having a similarity greater than or equal to a first threshold with the service name of the target service, as the target query information,
wherein the predetermined dictionary tree includes a plurality of branches including a branch for the target service, the candidate query information matching the branch for the target service.
14. The apparatus of claim 12 or 13, wherein the key information determining module comprises:
a perplexity determination sub-module, configured to determine a perplexity of the candidate key phrases; and
a target phrase determination sub-module, configured to determine the target phrase based on the candidate key phrases whose perplexity is less than a perplexity threshold.
15. The apparatus of claim 14, wherein the target phrase determination sub-module is specifically configured to:
determine, as the target phrases, those candidate key phrases whose perplexity is less than the perplexity threshold and whose similarity to the service name of the target service is greater than or equal to a second threshold.
16. The apparatus of claim 14 or 15, wherein the perplexity determination sub-module comprises:
a first phrase determining unit, configured to determine a first phrase with a length greater than a predetermined length from among the candidate key phrases;
a phrase extracting unit, configured to extract a second phrase having a length less than or equal to the predetermined length from the first phrase based on a target part of speech for the target service, and replace the first phrase with the second phrase; and
a perplexity determination unit, configured to determine a perplexity of the second phrase.
17. The apparatus according to claim 12 or 13, wherein the target query information is plural; the phrase extraction module comprises:
a phrase extraction sub-module, configured to extract respective key phrases of the plurality of target query information to obtain a plurality of key phrases; and
a candidate phrase determination sub-module, configured to determine candidate key phrases among the plurality of key phrases based on a predetermined blacklist,
wherein the predetermined blacklist is generated based on the plurality of target query information.
18. The apparatus of claim 17, wherein the predetermined blacklist includes non-allowed parts of speech; the phrase extraction submodule includes:
a part-of-speech determining unit, configured to determine, for each phrase of the plurality of key phrases, a part of speech of each word in the phrase;
a word removing unit, configured to remove, from each phrase, words whose part of speech is the non-allowed part of speech, to obtain a third phrase; and
a second phrase determining unit, configured to determine the candidate key phrase based on a similarity between the third phrase and the service name of the target service.
19. The apparatus of claim 18, further comprising a part of speech generation module to generate the non-allowed part of speech based on the plurality of target query information; the part of speech generation module comprises:
a part-of-speech statistics sub-module, configured to count the parts of speech of the words in the plurality of target query information to obtain a word list for each of a plurality of parts of speech;
a similarity determining sub-module, configured to determine the average similarity between the words in the word list for each part of speech and the service name of the target service, as the average similarity for that part of speech; and
a first part-of-speech determination sub-module, configured to determine a part of speech whose average similarity is less than a third threshold to be the non-allowed part of speech.
20. The apparatus of claim 18 or 19, wherein the predetermined blacklist further comprises non-allowed words; the second phrase determination unit includes:
a phrase determining subunit, configured to determine a third phrase whose similarity to the service name of the target service is greater than or equal to a fourth threshold, to obtain a fourth phrase; and
a word removing subunit, configured to remove the non-allowed words from the fourth phrase to obtain the candidate key phrase,
wherein the non-allowed words include words belonging to the non-allowed part of speech.
21. The apparatus of claim 20, further comprising a word generation module to generate a non-allowed word based on the plurality of target query information; the word generation module comprises:
a second part-of-speech determination sub-module, configured to determine a target part of speech among the non-allowed parts of speech, wherein the similarity between each word in the word list for the target part of speech and the service name of the target service is less than a fifth threshold; and
a word determining sub-module, configured to determine the words in the word list for the target part of speech to be the non-allowed words.
22. The apparatus of claim 13, further comprising a dictionary tree generation module for generating the predetermined dictionary tree based on service names of a plurality of services; the dictionary tree generation module comprises:
a phrase obtaining sub-module, configured to divide the service name of each service in the plurality of services by taking a word as a unit to obtain a phrase for each service; and
a dictionary tree generation sub-module, configured to generate a branch for each service according to the phrase for that service, to obtain the plurality of branches forming the predetermined dictionary tree.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 11.
CN202110520029.2A 2021-05-12 2021-05-12 Method and device for determining key information, electronic equipment and storage medium Pending CN113220838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110520029.2A CN113220838A (en) 2021-05-12 2021-05-12 Method and device for determining key information, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113220838A true CN113220838A (en) 2021-08-06

Family

ID=77095275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110520029.2A Pending CN113220838A (en) 2021-05-12 2021-05-12 Method and device for determining key information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113220838A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070143278A1 (en) * 2005-12-15 2007-06-21 Microsoft Corporation Context-based key phrase discovery and similarity measurement utilizing search engine query logs
CN112115227A (en) * 2020-08-14 2020-12-22 咪咕文化科技有限公司 Data query method and device, electronic equipment and storage medium
CN112632292A (en) * 2020-12-23 2021-04-09 深圳壹账通智能科技有限公司 Method, device and equipment for extracting service keywords and storage medium

Similar Documents

Publication Publication Date Title
CN111460083B (en) Method and device for constructing document title tree, electronic equipment and storage medium
CN110569370B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN111435406A (en) Method and device for correcting database statement spelling errors
US20210216710A1 (en) Method and apparatus for performing word segmentation on text, device, and medium
CN116484826B (en) Operation ticket generation method, device, equipment and storage medium
CN111666417B (en) Method, device, electronic equipment and readable storage medium for generating synonyms
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN113204613B (en) Address generation method, device, equipment and storage medium
CN113220838A (en) Method and device for determining key information, electronic equipment and storage medium
CN114491232A (en) Information query method and device, electronic equipment and storage medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN113051896A (en) Method and device for correcting text, electronic equipment and storage medium
CN113868508A (en) Writing material query method and device, electronic equipment and storage medium
CN112926297A (en) Method, apparatus, device and storage medium for processing information
CN114186552B (en) Text analysis method, device and equipment and computer storage medium
CN113569027B (en) Document title processing method and device and electronic equipment
CN113656592B (en) Data processing method and device based on knowledge graph, electronic equipment and medium
CN113971216B (en) Data processing method and device, electronic equipment and memory
CN114398469A (en) Method and device for determining search term weight and electronic equipment
CN114329212A (en) Information recommendation method and device and electronic equipment
CN116226039A (en) File name index generation method and device and file searching method and device
CN114969485A (en) Search prompting method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination