CN113325959A - Input corpus recommendation method and device - Google Patents

Input corpus recommendation method and device

Info

Publication number
CN113325959A
Authority
CN
China
Prior art keywords
user
corpora
user input
corpus
local
Legal status
Pending
Application number
CN202110576949.6A
Other languages
Chinese (zh)
Inventor
朱彬
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110576949.6A
Publication of CN113325959A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Abstract

The invention discloses a method and device for recommending input corpora, and relates to the technical field of computers. One embodiment of the method comprises: receiving a user input; generating a phrase identifier based on the user input and the index number of a local lexicon, so as to determine whether the local lexicon contains a phrase matching the user input; if so, obtaining one or more corpora from a local corpus according to the phrase and recommending them to the user; if not, sending the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user. This embodiment improves the efficiency of obtaining corpora and keeps corpus recommendation real-time.

Description

Input corpus recommendation method and device
Technical Field
The invention relates to the technical field of computers, and in particular to a method and device for recommending input corpora.
Background
Text communication is widely used in customer service consultation, social platforms, information retrieval, and similar fields. To improve the user experience, simplify input, or shorten input time, input association is often used to assist the user: the user's likely input is predicted with text rules or an intelligent algorithm and recommended to the user.
At present, a common input association method is built on a search engine: the full-text search capability of a non-relational data store such as Elasticsearch (ES) is used to search an existing corpus for the user input and return the predicted corpora corresponding to that input.
In implementing the invention, the inventor found at least the following problem in the prior art: input association based on a search engine has a relatively long response time and cannot meet scenarios with strict real-time requirements, such as online customer service or online consultation, which results in a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and apparatus for recommending input corpora that first try to obtain, locally, the corpora corresponding to the user input and recommend them to the user, before falling back to corpus recommendation via a search engine. This improves the efficiency of obtaining corpora, reduces response time, keeps the recommended corpora real-time, and improves the user experience.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for recommending input corpora is provided, including:
receiving a user input;
generating a phrase identifier based on the user input and the index number of the local lexicon, so as to determine whether the local lexicon contains a phrase matching the user input:
if so, obtaining one or more corpora from a local corpus according to the phrase and recommending them to the user;
if not, sending the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user.
Optionally, obtaining one or more corpora from the local corpus according to the phrase and recommending them to the user includes:
obtaining, from the local corpus, one or more corpora corresponding to the phrase and one or more preset weight parameters of each corpus;
obtaining one or more business parameters corresponding to the user input;
where a business parameter is among the preset weight parameters of a corpus, calculating the weight score of that corpus from the weights of the matching business parameters;
selecting one or more corpora in descending order of weight score and recommending them to the user.
Optionally, the method further comprises:
before determining whether the local lexicon contains a phrase matching the user input, deleting one or more of the following from the user input: stop words, punctuation marks, special characters, and emoji.
Optionally, the method further comprises:
before determining whether the local lexicon contains a phrase matching the user input, determining whether the local lexicon is the latest version, and, if it is not, obtaining the latest version of the lexicon from the search engine.
Optionally, the method further comprises:
before determining whether the local lexicon contains a phrase matching the user input, determining whether the local lexicon is available: if so, proceeding to determine whether it contains a matching phrase; if not, sending the user input to the search engine.
Optionally, the local lexicon is set as unavailable when any one of the following occurs: the user is a blacklisted user; the current time falls within a preset unavailable time period; the response time of the local lexicon exceeds a threshold response time; or the number of received user inputs exceeds a threshold number of user inputs.
Optionally, the user input, the one or more corpora recommended to the user for that input, and the corpus clicked by the user are collected by event tracking, so as to update the local lexicon and the local corpus.
Optionally, the local lexicon is set as unavailable by one or more pluggable components, and the user input, the one or more corpora recommended to the user for that input, and the corpus clicked by the user are collected by a pluggable service event-tracking component.
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for recommending input corpora is provided, comprising a user input receiving module and a corpus acquiring module, wherein:
the user input receiving module is configured to receive a user input;
the corpus acquiring module is configured to generate a phrase identifier based on the user input and the index number of the local lexicon, so as to determine whether the local lexicon contains a phrase matching the user input:
if so, obtain one or more corpora from a local corpus according to the phrase and recommend them to the user;
if not, send the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user.
Optionally, obtaining one or more corpora from the local corpus according to the phrase and recommending them to the user includes:
obtaining, from the local corpus, one or more corpora corresponding to the phrase and one or more preset weight parameters of each corpus;
obtaining one or more business parameters corresponding to the user input;
where a business parameter is among the preset weight parameters of a corpus, calculating the weight score of that corpus from the weights of the matching business parameters;
selecting one or more corpora in descending order of weight score and recommending them to the user.
Optionally, the apparatus further comprises a user input processing module, wherein
the user input processing module is configured to delete one or more of the following from the user input before it is determined whether the local lexicon contains a phrase matching the user input: stop words, punctuation marks, special characters, and emoji.
Optionally, the apparatus further comprises a lexicon preprocessing module, wherein
the lexicon preprocessing module is configured to determine, before it is determined whether the local lexicon contains a phrase matching the user input, whether the local lexicon is the latest version, and, if it is not, to obtain the latest version of the lexicon from the search engine.
Optionally, the lexicon preprocessing module is further configured to,
before it is determined whether the local lexicon contains a phrase matching the user input, determine whether the local lexicon is available: if so, proceed to determine whether it contains a matching phrase; if not, send the user input to the search engine.
Optionally, the lexicon preprocessing module is further configured to set the local lexicon as unavailable when any one of the following occurs: the user is a blacklisted user; the current time falls within a preset unavailable time period; the response time of the local lexicon exceeds a threshold response time; or the number of received user inputs exceeds a threshold number of user inputs.
Optionally, the lexicon preprocessing module further comprises one or more of the following pluggable components: a switch component, a circuit-breaker/rate-limiting component, and a traffic distribution component, wherein
the switch component is configured to determine whether the user is a blacklisted user or whether the current time falls within an unavailable time period;
the circuit-breaker/rate-limiting component is configured to determine whether the response time of the local lexicon exceeds the threshold response time;
the traffic distribution component is configured to determine whether the number of received user inputs exceeds the threshold number of user inputs.
Optionally, the lexicon preprocessing module further comprises a service event-tracking component, wherein
the service event-tracking component is configured to collect, by event tracking, the user input, the one or more corpora recommended to the user for that input, and the corpus clicked by the user, so as to update the local lexicon and the local corpus.
To achieve the above object, according to another aspect of the embodiments of the present invention, an electronic device for recommending input corpora is provided, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the methods for recommending input corpora described above.
To achieve the above object, according to another aspect of the embodiments of the present invention, a computer-readable medium storing a computer program is provided; when executed by a processor, the program implements any of the methods for recommending input corpora described above.
One embodiment of the above invention has the following advantages or benefits: before the search engine is used for corpus recommendation, the phrase corresponding to the user input is matched in the local lexicon, and one or more corpora are obtained from the local corpus according to that phrase and recommended to the user. This improves corpus acquisition efficiency, reduces response time, keeps the recommended corpora real-time, and improves the user experience. Meanwhile, when no matching phrase is found in the local lexicon, the search engine is still used to obtain one or more corpora for the user, which keeps corpus recommendation reliable and general.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the main flow of a method for recommending input corpora according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a phrase matching method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the mapping relationship of a local lexicon according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow of a method for obtaining corpora from a local corpus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the mapping relationship of a local corpus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the main flow of another method for recommending input corpora according to an embodiment of the present invention;
FIG. 7a is a schematic diagram of the main modules of an apparatus for recommending input corpora according to an embodiment of the present invention;
FIG. 7b is a schematic diagram of the main structure of a lexicon preprocessing module according to an embodiment of the present invention;
FIG. 8 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a method for recommending input corpora according to an embodiment of the present invention. As shown in FIG. 1, the method may specifically include the following steps:
Step S101, receiving a user input.
The user input is any text entered by the user during text communication, for example a greeting such as "hello" that users often type when chatting with online customer service.
Step S102, generating a phrase identifier based on the user input and the index number of the local lexicon, so as to determine whether the local lexicon contains a phrase matching the user input: if it does, proceed to step S103; if it does not, proceed to step S104.
The local lexicon consists of one or more phrases extracted from high-frequency corpora that users often type. For example, two greeting variants that users frequently type both correspond to the phrase "hello", while the corpora "How is this product discounted?" and "How are these clothes discounted?" correspond to the phrase "discount". Users may type thousands of distinct corpora, with thousands of corresponding phrases. To further speed up local retrieval of the corpora corresponding to a user input, the local corpus may store only some high-frequency or trending corpora that users often type, and correspondingly the local lexicon may store only the popular phrases associated with them, such as "hello", "coupon", "discount", "return", and "exchange". More specifically, the local lexicon and the local corpus may be stored directly in a local cache so that the corresponding corpora can be fetched quickly when a user input is received. Note that the local cache stores only high-frequency corpora and phrases, whereas the storage used by the search engine (for example, Redis) stores the full set of user-input corpora and their phrases.
Specifically, generating a phrase identifier based on the user input and the index number of the local lexicon to determine whether the local lexicon contains a phrase matching the user input includes: querying the local lexicon for a phrase corresponding to the phrase identifier; if such a phrase exists, the local lexicon contains a phrase matching the user input; otherwise, it does not.
The index number logically isolates multiple local lexicons from one another to make lookups more efficient. Each lexicon has a unique index number, which may consist of any combination of digits, letters, and special symbols. Specifically, a hash operation may be applied to the user input together with the index number of the currently available local lexicon to generate a fixed-length phrase identifier, that is, a token value. The local lexicon is then queried with this token: if a phrase corresponding to the token exists, the local lexicon contains a phrase matching the user input; if not, it does not.
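The token lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256, the ":" separator, and a lexicon stored as a dictionary from token to phrase (built by hashing known high-frequency inputs under the lexicon's index number) are all assumptions, and the function names and index numbers are hypothetical. Because the index number is part of the hashed string, the same input yields different tokens under different index numbers, which is one way the logical isolation between lexicons could be realized.

```python
import hashlib
from typing import Dict, Optional


def phrase_token(index_number: str, user_input: str) -> str:
    """Hash the lexicon index number together with the user input into a
    fixed-length token. SHA-256 and the ':' separator are assumptions; the
    description only specifies 'a hash operation'."""
    data = f"{index_number}:{user_input}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()  # always 64 hex characters


def build_lexicon(index_number: str, inputs_to_phrase: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical lexicon layout: token -> phrase, keyed by known
    high-frequency inputs hashed under this lexicon's index number."""
    return {phrase_token(index_number, text): phrase
            for text, phrase in inputs_to_phrase.items()}


def match_phrase(lexicon: Dict[str, str], index_number: str,
                 user_input: str) -> Optional[str]:
    """Return the matching phrase, or None when the local lexicon has no match."""
    return lexicon.get(phrase_token(index_number, user_input))


# Usage with a hypothetical index number and two high-frequency inputs
INDEX = "retail-01"
lexicon = build_lexicon(INDEX, {
    "how is this product discounted": "discount",
    "hello there": "hello",
})
print(match_phrase(lexicon, INDEX, "how is this product discounted"))           # discount
print(match_phrase(lexicon, "logistics-01", "how is this product discounted"))  # None
```

A dictionary lookup keeps each check at constant time on average, which matches the goal of answering high-frequency inputs from a local cache.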
It can be understood that if a phrase corresponding to the user input is matched in the local lexicon, the input is most likely a frequently used high-frequency corpus, so the corresponding corpora can be obtained from the local corpus without remotely calling the search engine, which speeds up retrieval of the recommended corpora. If no matching phrase is found in the local lexicon, the input is most likely not a high-frequency corpus and no recommendation can be obtained locally; to keep assisting the user, the user input is sent directly to the search engine and the recommended corpora are obtained from it. This ensures that user input is always assisted and improves the user experience.
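The local-first, search-engine-fallback decision can be sketched as below, assuming the local lexicon lookup and the search engine are passed in as plain callables and the local corpus is a dictionary from phrase to corpora (hypothetical stand-ins for the cache and the remote service). Treating a matched phrase that has no local corpora as a miss is an additional assumption the description does not state.

```python
from typing import Callable, Dict, List, Optional, Tuple


def recommend(
    user_input: str,
    lookup_phrase: Callable[[str], Optional[str]],   # local lexicon lookup
    local_corpus: Dict[str, List[str]],              # phrase -> corpora
    search_engine: Callable[[str], List[str]],       # remote fallback
    limit: int = 3,
) -> Tuple[str, List[str]]:
    """Try the local lexicon and corpus first; fall back to the search engine."""
    phrase = lookup_phrase(user_input)
    # Assumption: a matched phrase with no local corpora also falls back.
    if phrase is not None and local_corpus.get(phrase):
        return "local", local_corpus[phrase][:limit]
    return "search_engine", search_engine(user_input)[:limit]


def engine(query: str) -> List[str]:
    """Stub standing in for a remote search-engine call (hypothetical)."""
    return [f"search result for '{query}'"]


# Usage with in-memory stand-ins (hypothetical data)
phrases = {"hello there": "hello"}
corpus = {"hello": ["Hello, how can I help you?", "Hi, what can I do for you?"]}

print(recommend("hello there", phrases.get, corpus, engine))
# ('local', ['Hello, how can I help you?', 'Hi, what can I do for you?'])
print(recommend("track my parcel", phrases.get, corpus, engine))
# ('search_engine', ["search result for 'track my parcel'"])
```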
In an optional embodiment, the method further comprises: before determining whether the local lexicon contains a phrase matching the user input, deleting one or more of the following from the user input: stop words, punctuation marks, special characters, and emoji. This removes meaningless characters from the user input so that the corresponding phrase can be matched in the local lexicon more accurately.
In an optional embodiment, the method further comprises: before determining whether the local lexicon contains a phrase matching the user input, determining whether the local lexicon is the latest version, and, if it is not, obtaining the latest version of the lexicon from the search engine.
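A minimal sketch of this preprocessing step, assuming Unicode general categories are used to detect punctuation (P*) and symbols (S*, which covers most special characters and emoji), and assuming a small hypothetical stop-word list. It splits on whitespace, so it suits space-delimited text; Chinese input would first need word segmentation.

```python
import unicodedata
from typing import AbstractSet

# Hypothetical stop-word list; a real deployment would use a curated list.
STOP_WORDS = frozenset({"a", "an", "is", "the", "there"})


def clean_input(text: str, stop_words: AbstractSet[str] = STOP_WORDS) -> str:
    """Remove punctuation, special characters, emoji, and stop words."""
    # Unicode categories P* are punctuation; S* are symbols, which include
    # most emoji (e.g. U+1F600 is 'So') and characters such as '$' or '+'.
    kept = "".join(ch for ch in text if unicodedata.category(ch)[0] not in "PS")
    # Whitespace tokenisation only suits space-delimited text; Chinese input
    # would need a word segmenter before stop words can be filtered.
    words = [w for w in kept.casefold().split() if w not in stop_words]
    return " ".join(words)


print(clean_input("Hello, is there a discount?! 😀"))  # hello discount
```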
The latest version of the lexicon may be one recently updated with newly emerging trending or high-frequency words, or a lexicon tailored to the current business period. In retail, for example, the high-frequency corpora that users type vary by period: during the Double Eleven (Singles' Day) promotion, order volume is large and delivery is relatively slow, so users frequently ask about logistics, delivery time, and similar topics. A new lexicon version can therefore be generated from the high-frequency corpora used during the Double Eleven promotion and used throughout that period, which improves the efficiency of matching corpora locally.
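The version check can be sketched as follows, assuming lexicon versions are monotonically increasing integers and that the search-engine side exposes two operations, one returning the latest version number and one returning the lexicon for a given version. Both callables, the class, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class LocalLexicon:
    version: int                                            # assumed monotonically increasing
    entries: Dict[str, str] = field(default_factory=dict)   # token -> phrase


def ensure_latest(
    local: LocalLexicon,
    fetch_latest_version: Callable[[], int],          # hypothetical remote call
    fetch_lexicon: Callable[[int], Dict[str, str]],   # hypothetical remote call
) -> LocalLexicon:
    """Replace the local lexicon if the search-engine side has a newer version."""
    latest = fetch_latest_version()
    if local.version < latest:
        return LocalLexicon(version=latest, entries=fetch_lexicon(latest))
    return local


# Usage: the remote side publishes a promotion-period lexicon as version 2.
remote = {1: {"t1": "discount"}, 2: {"t1": "discount", "t2": "logistics"}}
current = LocalLexicon(version=1, entries=remote[1])
current = ensure_latest(current, lambda: max(remote), lambda v: remote[v])
print(current.version, sorted(current.entries.values()))  # 2 ['discount', 'logistics']
```

Returning the existing object when no update is needed avoids rebuilding the cache on every request; only a version bump triggers a download.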
In an optional embodiment, the method further comprises: before determining whether the local lexicon contains a phrase matching the user input, determining whether the local lexicon is available: if so, proceeding to determine whether it contains a matching phrase; if not, sending the user input to the search engine. In other words, the local lexicon can be selectively enabled for associative input assistance, which makes the input corpus recommendation method of this embodiment more versatile.
Further, the local lexicon is set as unavailable when any one of the following occurs: the user is a blacklisted user; the current time falls within a preset unavailable time period; the response time of the local lexicon exceeds a threshold response time; or the number of received user inputs exceeds a threshold number of user inputs.
A blacklisted user is a user who may only use the search engine and cannot obtain corpora for their input through local matching. Setting blacklisted users, or a blacklisted proportion of users, reasonably limits the number of user inputs served from the local lexicon and thus relieves processing pressure. The unavailable time period is set according to business needs or the pattern of change in user input volume, and may be expressed in years, months, days, hours, minutes, seconds, and so on. For example, between 1:00 and 5:00 a.m. each day few users make inquiries and input volume is low, so to save resources the local lexicon need not be enabled and corpora can be obtained from the search engine. Taking a threshold response time of 500 ms as an example: if retrieving the corpora for a user input locally takes longer than 500 ms, local retrieval is no longer clearly faster than the search engine, so the corpora are obtained directly from the search engine instead. The threshold number of user inputs reasonably limits how many user inputs must be processed locally, which relieves processing pressure and keeps local corpus retrieval efficient.
More specifically, the local lexicon is set as unavailable by one or more pluggable components: for example, a switch component determines whether the user is blacklisted or whether the current time falls within an unavailable time period; a circuit-breaker/rate-limiting component determines whether the response time of the local lexicon exceeds the threshold response time; and a traffic distribution component determines whether the number of received user inputs exceeds the threshold number of user inputs.
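One way to model the pluggable components is a common interface that each component implements, so components can be added or removed without changing the caller. The sketch below assumes that interface (a hypothetical blocks method), a blacklist held as a set, an unavailable window that does not cross midnight, and simple threshold comparisons. The 1:00 to 5:00 window and the 500 ms threshold come from the examples above, while the 1000-input limit is hypothetical. The circuit-breaker component here is a plain latency comparison, not a full circuit breaker with open and half-open states.

```python
from dataclasses import dataclass
from datetime import time
from typing import AbstractSet, Protocol, Sequence


@dataclass
class RequestContext:
    user_id: str
    now: time                  # local wall-clock time of the request
    recent_latency_ms: float   # observed response time of local lookups
    inputs_in_window: int      # user inputs received in the current window


class AvailabilityCheck(Protocol):
    def blocks(self, ctx: RequestContext) -> bool: ...


@dataclass
class SwitchComponent:
    blacklist: AbstractSet[str]
    unavailable_from: time     # window assumed not to cross midnight
    unavailable_to: time

    def blocks(self, ctx: RequestContext) -> bool:
        in_window = self.unavailable_from <= ctx.now < self.unavailable_to
        return ctx.user_id in self.blacklist or in_window


@dataclass
class CircuitBreakerComponent:
    threshold_ms: float = 500.0

    def blocks(self, ctx: RequestContext) -> bool:
        return ctx.recent_latency_ms > self.threshold_ms


@dataclass
class TrafficDistributionComponent:
    max_inputs: int

    def blocks(self, ctx: RequestContext) -> bool:
        return ctx.inputs_in_window > self.max_inputs


def local_lexicon_available(ctx: RequestContext,
                            components: Sequence[AvailabilityCheck]) -> bool:
    """The local lexicon is usable only if no plugged-in component blocks it."""
    return not any(c.blocks(ctx) for c in components)


components = [
    SwitchComponent(blacklist={"user-42"}, unavailable_from=time(1), unavailable_to=time(5)),
    CircuitBreakerComponent(threshold_ms=500),
    TrafficDistributionComponent(max_inputs=1000),  # hypothetical threshold
]
print(local_lexicon_available(RequestContext("user-7", time(10, 30), 120.0, 300), components))  # True
print(local_lexicon_available(RequestContext("user-7", time(2, 15), 120.0, 300), components))   # False
```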
Step S103, obtaining one or more corpora from the local corpus according to the phrase and recommending them to the user.
In an optional implementation, obtaining one or more corpora from the local corpus according to the phrase and recommending them to the user includes: obtaining, from the local corpus, one or more corpora corresponding to the phrase and the one or more preset weight parameters of each corpus; obtaining one or more business parameters corresponding to the user input; where a business parameter is among the preset weight parameters of a corpus, calculating the weight score of that corpus from the weights of the matching business parameters; and selecting one or more corpora in descending order of weight score and recommending them to the user.
The business parameters include, but are not limited to, the entry point and channel through which the user enters text communication, the terminal type, item information, merchant information, order information, logistics information, and the business field. The entry point is the page the user clicked to enter text communication, such as an item detail page, the item home page, or the order home page; the channel is, for example, an H5 page, a mini program, or an app; the terminal type is, for example, iOS or Android; the item information includes the item identifier, color, size, price, model, third-level category, and similar details of the item the user is asking about; the merchant information includes the merchant identifier, merchant level, and so on; and the business field is a division by business type, such as logistics, retail, or blockchain. The preset weight parameters are information such as item identifiers, merchant identifiers, and item categories set according to actual requirements; a given business parameter of the user input may or may not be one of a corpus's preset weight parameters. Each preset weight parameter has a weight, and different preset weight parameters may have the same or different weights.
As an example, suppose the item identifier, merchant identifier, and item category in the business parameters are SKUA, merchant A, and mobile phone, respectively, and the preset weight parameters of the corpora obtained from the local corpus are as shown in Table 1 below. For corpus 1, SKUA and mobile phone in the business parameters are preset weight parameters of corpus 1, so summing their weights gives corpus 1 a weight score of 4. For corpus 2, SKUA, merchant A, and mobile phone are all preset weight parameters of corpus 2, so summing their weights gives a weight score of 6. For corpus 3, only merchant A is a preset weight parameter, giving a weight score of 2. For corpus 4, none of the business parameters is a preset weight parameter, so its weight score defaults to 0. The weight scores of corpora 1 to 4 are therefore 4, 6, 2, and 0, and sorting from high to low gives the order corpus 2, corpus 1, corpus 3, corpus 4. Corpora are recommended to the user in this order: if one corpus is recommended, it is corpus 2; if two are recommended, they are corpus 2 and corpus 1.
In this way, the corpora obtained from the local corpus are filtered by business parameters, which further improves the accuracy of the corpora recommended to the user.
Table 1 Example preset weight parameters and weights of the corpora
[Image: Table 1, preset weight parameters and weights of corpora 1 to 4]
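The weight-score calculation in the example above can be sketched as follows. Because the contents of Table 1 are not available here, the per-parameter weights below are hypothetical values chosen only so that the totals match the scores in the text (4, 6, 2, and 0); the function names are also illustrative.

```python
from typing import AbstractSet, Dict, List, Tuple


def weight_score(preset_weights: Dict[str, int], business_params: AbstractSet[str]) -> int:
    """Sum the weights of preset weight parameters present in the business parameters."""
    return sum(w for param, w in preset_weights.items() if param in business_params)


def rank_corpora(corpora: Dict[str, Dict[str, int]], business_params: AbstractSet[str],
                 top_n: int) -> Tuple[List[str], Dict[str, int]]:
    scores = {name: weight_score(weights, business_params) for name, weights in corpora.items()}
    # sorted() is stable, so corpora with equal scores keep their original order.
    ranked = sorted(scores, key=scores.__getitem__, reverse=True)
    return ranked[:top_n], scores


# Hypothetical per-parameter weights chosen to reproduce the totals in the text.
corpora = {
    "corpus 1": {"SKUA": 2, "mobile phone": 2},
    "corpus 2": {"SKUA": 2, "merchant A": 2, "mobile phone": 2},
    "corpus 3": {"merchant A": 2, "SKUB": 1},
    "corpus 4": {"SKUB": 3, "merchant B": 1},
}
params = {"SKUA", "merchant A", "mobile phone"}
top, scores = rank_corpora(corpora, params, top_n=2)
print(scores)  # {'corpus 1': 4, 'corpus 2': 6, 'corpus 3': 2, 'corpus 4': 0}
print(top)     # ['corpus 2', 'corpus 1']
```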
Step S104, sending the user input to a search engine so as to obtain one or more corpora from the search engine to recommend to the user.
Specifically, as business fields diversify (for example retail, logistics, and finance), the user-input corpora, high-frequency corpora, local lexicons, and so on of each business field are specific to that field and differ from one another. To improve input assistance across business fields, a different search engine may be used for each field, and a received user input is routed according to its business field. Correspondingly, when a user input is received, phrases may be matched using the local lexicon of the business field corresponding to that input.
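Routing by business field could be sketched as a lookup table from each field to that field's lexicon index number and search-engine client. The field names, index numbers, and stub engines are hypothetical, and falling back to a default field for unknown fields is an assumption not stated in the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FieldRoute:
    lexicon_index: str                          # index number of the field's local lexicon
    search_engine: Callable[[str], List[str]]   # field-specific search engine client


def pick_route(business_field: str, routes: Dict[str, FieldRoute],
               default_field: str) -> FieldRoute:
    """Route a user input by business field; unknown fields use a default (assumption)."""
    return routes.get(business_field, routes[default_field])


def retail_engine(query: str) -> List[str]:
    return [f"[retail] {query}"]       # stub for a retail search engine


def logistics_engine(query: str) -> List[str]:
    return [f"[logistics] {query}"]    # stub for a logistics search engine


routes = {
    "retail": FieldRoute("retail-01", retail_engine),
    "logistics": FieldRoute("logistics-01", logistics_engine),
}
route = pick_route("logistics", routes, default_field="retail")
print(route.lexicon_index, route.search_engine("where is my parcel"))
# logistics-01 ['[logistics] where is my parcel']
```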
It can be understood that business fields keep developing and products keep being updated, so the high-frequency corpora, hot words, and so on that users enter change over time. To keep up with these changes in text communication, the local lexicon and the local corpus, as well as the full corpus and the phrases stored in the search engine, need to be updated continuously. Accordingly, in an optional implementation, event tracking (instrumentation, also called "buried points") collects three items: the user input, the one or more corpora recommended to the user for that input, and the corpus the user clicked. These are used to update the local lexicon and the local corpus.

More specifically, a pluggable service event-tracking component collects the user input, the one or more corpora recommended to the user for that input, and the corpus the user clicked. In other words, the lexicon and corpus for user inputs are continuously updated according to which recommended corpora users click. This improves the reliability of input association assistance.
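The following is a minimal, hypothetical sketch of how tracked events might feed back into the local lexicon and local corpus. The patent does not specify the update rule. Promoting a corpus after a fixed number of clicks (`promote_after`) is an assumption made for illustration, as are the class and field names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrackingEvent:
    """One tracked interaction (hypothetical schema)."""
    user_input: str
    recommended: List[str]
    clicked: Optional[str]  # None if the user clicked no recommendation

class EventTracker:
    """Assumed update rule: once a recommended corpus has been clicked for the
    same input `promote_after` times, store it in the local corpus and add
    the input to the local lexicon."""

    def __init__(self, promote_after: int = 2) -> None:
        self.promote_after = promote_after
        self.clicks: Counter = Counter()
        self.local_lexicon: set = set()
        self.local_corpus = defaultdict(list)

    def record(self, event: TrackingEvent) -> None:
        # Ignore events with no click, or clicks on something not recommended.
        if event.clicked is None or event.clicked not in event.recommended:
            return
        key = (event.user_input, event.clicked)
        self.clicks[key] += 1
        if self.clicks[key] >= self.promote_after:
            self.local_lexicon.add(event.user_input)
            if event.clicked not in self.local_corpus[event.user_input]:
                self.local_corpus[event.user_input].append(event.clicked)

tracker = EventTracker(promote_after=2)
event = TrackingEvent("hello", ["hello, is this in stock?", "hello, when will it ship?"],
                      "hello, when will it ship?")
tracker.record(event)
tracker.record(event)
print(tracker.local_corpus["hello"])  # ['hello, when will it ship?']
```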
Based on this embodiment, the local lexicon is consulted before a search engine is used for corpus recommendation. The phrase corresponding to the user input is matched in the local lexicon, and one or more corpora are obtained from the local corpus according to that phrase and recommended to the user. This makes obtaining corpora more efficient, reduces response time, keeps the recommended corpora timely, and improves the user experience. When no corresponding phrase can be matched in the local lexicon, the search engine is still used to obtain one or more corpora for recommendation. This keeps corpus recommendation reliable and broadly applicable.
Referring to FIG. 2, on the basis of the foregoing embodiment, an embodiment of the present invention provides a phrase matching method that explains step S102 in detail. The phrase matching method includes the following steps:
Step S1021: Generate a phrase identifier from the user input and the index number corresponding to the local lexicon.
Specifically, a hash operation may be performed on the index number of the currently available local lexicon together with the user input. This generates a fixed-length phrase identifier, that is, a token value, which is then used to determine whether a phrase corresponding to the token value exists in the local lexicon.
Step S1022: Query the local lexicon to determine whether a phrase corresponding to the phrase identifier exists. If it exists, go to step S1023. If not, go to step S1024.
Specifically, referring to the local lexicon mapping diagram shown in FIG. 3, the index number of the local lexicon and the user input are first hashed into a fixed-length phrase identifier (for example, 994754697 in the figure). The phrase identifier is then used to query the local lexicon for a corresponding phrase (for example, "hello").
Step S1023: A phrase matching the user input exists in the local lexicon.
Still taking the phrase identifier 994754697 as an example, the corresponding phrase "hello" exists in the local lexicon. One or more corpora corresponding to "hello" can therefore be obtained for recommendation.
Step S1024: No phrase matching the user input exists in the local lexicon.
Specifically, the user input is sent directly to the search engine of the service field to which the user input belongs, to obtain one or more corpora for recommendation.
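The patent does not name the hash function. As an illustrative assumption, the sketch below uses CRC-32 from Python's standard `zlib` module, which always yields an unsigned 32-bit value. The `index:input` concatenation format and the function name `phrase_token` are also assumptions.

```python
import zlib

def phrase_token(index_number: int, user_input: str) -> int:
    """Hash the lexicon index number together with the user input into a
    fixed-width (32-bit) phrase identifier (assumed scheme)."""
    payload = f"{index_number}:{user_input}".encode("utf-8")
    return zlib.crc32(payload)  # always in the range [0, 2**32 - 1]

token = phrase_token(1, "hello")
print(token == phrase_token(1, "hello"))  # True: the token is deterministic
print(token == phrase_token(2, "hello"))  # False: a new lexicon index changes the token
```

Including the index number in the hash means that tokens computed against one lexicon version do not carry over to the next. That appears consistent with the text's use of "the currently available local lexicon".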
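Steps S1021 to S1024 can be combined into one lookup with a search-engine fallback. This is a sketch under the same assumed CRC-32 token scheme. The lexicon is modeled as a map from phrase identifier to phrase, following FIG. 3. The index number, sample corpus, and stub engine are hypothetical.

```python
import zlib
from typing import Callable, Dict, List

def phrase_token(index_number: int, text: str) -> int:
    # Assumed token scheme: CRC-32 of "index:text".
    return zlib.crc32(f"{index_number}:{text}".encode("utf-8"))

def build_lexicon(index_number: int, phrases: List[str]) -> Dict[int, str]:
    """Map phrase identifier -> phrase, mirroring the FIG. 3 mapping."""
    return {phrase_token(index_number, p): p for p in phrases}

def match_or_fallback(
    user_input: str,
    index_number: int,
    lexicon: Dict[int, str],
    local_corpus: Dict[str, List[str]],
    search_engine: Callable[[str], List[str]],
) -> List[str]:
    token = phrase_token(index_number, user_input)  # S1021
    phrase = lexicon.get(token)                      # S1022
    if phrase is not None:                           # S1023: local hit
        return local_corpus.get(phrase, [])
    return search_engine(user_input)                 # S1024: fall back

INDEX = 7  # hypothetical index number of the current local lexicon
lexicon = build_lexicon(INDEX, ["hello"])
corpus = {"hello": ["hello, is this in stock?"]}
engine = lambda text: [f"search engine result for {text!r}"]

print(match_or_fallback("hello", INDEX, lexicon, corpus, engine))
print(match_or_fallback("goodbye", INDEX, lexicon, corpus, engine))
```

Different strings can in principle share a 32-bit hash. A production lookup would therefore likely also compare the stored phrase against the input before treating the result as a hit.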
Referring to FIG. 4, on the basis of the foregoing embodiment, an embodiment of the present invention provides a method for obtaining corpora from a local corpus that explains step S103 in detail. The method includes the following steps:
Step S1031: Obtain, from the local corpus, one or more corpora corresponding to the phrase and one or more preset weight parameters of each corpus.
Specifically, refer to the local corpus mapping diagram shown in FIG. 5 and again take the matched phrase "hello" as an example. All corpora under "hello" can be obtained from the local corpus. It can be understood that different services or scenarios may involve user inputs that all match the phrase "hello", yet the complete sentences users intend to enter are likely to differ. Therefore, to make the recommended input corpora more accurate, the corpora can be sorted and screened based on the service parameters corresponding to the user input. To this end, each corpus has one or more preset weight parameters, such as VenderID (supplier identifier) and ThirdCat (third-level product category).
Step S1032: Obtain one or more service parameters corresponding to the user input.
Step S1033: When a service parameter belongs to the preset weight parameters of a corpus, calculate the weight score of the corpus from the weights corresponding to those service parameters.
Specifically, still referring to FIG. 5, take two of the corpora under "hello" as an example.
- The first corpus has the preset weight parameters VenderID and ThirdCat, with weights 1 and 5, respectively.
- The second corpus has the preset weight parameter color (the color of the item), with weight 3.
If the service parameters for the user input "hello" include VenderID, ThirdCat, and color, the weight score of the first corpus is 1 + 5 = 6 and that of the second corpus is 3. If only one corpus is pushed to the user, the corpus with the highest weight score, namely the first corpus, is recommended.
Step S1034: Select one or more corpora in descending order of weight score and recommend them to the user.
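Steps S1031 to S1034 can be sketched as follows, using the weights from the FIG. 5 example. In this sketch a service parameter counts as matching when its name appears among a corpus's preset weight parameters, which is an assumption. The corpus texts and parameter values are placeholders, and `Corpus`, `weight_score`, and `recommend` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Corpus:
    text: str
    weights: Dict[str, int]  # preset weight parameter name -> weight

def weight_score(corpus: Corpus, service_params: Dict[str, str]) -> int:
    """S1033: sum the weights of preset parameters present in the service parameters."""
    return sum(w for name, w in corpus.weights.items() if name in service_params)

def recommend(corpora: List[Corpus], service_params: Dict[str, str], top_n: int) -> List[str]:
    """S1034: return the top_n corpora in descending order of weight score."""
    ranked = sorted(corpora, key=lambda c: weight_score(c, service_params), reverse=True)
    return [c.text for c in ranked[:top_n]]

# Weights from the FIG. 5 example; texts and parameter values are placeholders.
first = Corpus("first corpus", {"VenderID": 1, "ThirdCat": 5})
second = Corpus("second corpus", {"color": 3})
params = {"VenderID": "vendor-001", "ThirdCat": "mobile phone", "color": "black"}

print(weight_score(first, params), weight_score(second, params))  # 6 3
print(recommend([second, first], params, top_n=1))                # ['first corpus']
```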
Referring to FIG. 6, on the basis of the foregoing embodiments, an embodiment of the present invention provides another method for recommending input corpora, which may include the following steps:
Step S601: Receive a user input.
Step S602: Determine whether the local lexicon is available. If so, go to step S603. If not, go to step S608.
Step S603: Determine whether the local lexicon is the latest version. If so, go to step S605. If not, go to step S604.
Step S604: Obtain the latest version of the lexicon from the search engine.
Specifically, the latest version of the lexicon is loaded from the search engine into the local cache through remote full synchronization. It can be understood that the local corpus corresponding to the latest lexicon is loaded at the same time.
Step S605: Delete one or more of the following from the user input: stop words, punctuation marks, special characters, and emoticons.
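A minimal sketch of the version check and full synchronization in steps S603 and S604 follows. It assumes that the search engine exposes two operations: one that reports the latest lexicon version and one that returns a full snapshot of the lexicon together with its local corpus. Both callables, and the class names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

@dataclass
class LexiconSnapshot:
    version: int
    phrases: Set[str]
    corpus: Dict[str, List[str]]  # local corpus belonging to this lexicon version

class LocalLexiconCache:
    """Hypothetical local cache kept in sync with the search engine."""

    def __init__(self, fetch_latest_version: Callable[[], int],
                 fetch_full_snapshot: Callable[[], LexiconSnapshot]) -> None:
        self._fetch_latest_version = fetch_latest_version
        self._fetch_full_snapshot = fetch_full_snapshot
        self.snapshot: Optional[LexiconSnapshot] = None

    def ensure_latest(self) -> LexiconSnapshot:
        latest = self._fetch_latest_version()
        if self.snapshot is None or self.snapshot.version < latest:
            # Full synchronization: replace lexicon and corpus together so
            # both always come from the same version.
            self.snapshot = self._fetch_full_snapshot()
        return self.snapshot

remote = LexiconSnapshot(2, {"hello"}, {"hello": ["hello, is this in stock?"]})
downloads: List[int] = []

def fetch_snapshot() -> LexiconSnapshot:
    downloads.append(remote.version)
    return remote

cache = LocalLexiconCache(lambda: remote.version, fetch_snapshot)
cache.ensure_latest()
cache.ensure_latest()  # already up to date, so no second download
print(cache.snapshot.version, len(downloads))  # 2 1
```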
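Step S605 can be sketched with Python's standard `unicodedata` module. Characters in the Unicode punctuation (P*) and symbol (S*) categories are dropped, and the symbol categories cover most emoji. The stop-word list is hypothetical. Splitting on whitespace is also an assumption: Chinese input would need a word segmenter before stop-word removal.

```python
import unicodedata

# Hypothetical stop-word list; a real system would use a per-language list.
STOP_WORDS = {"the", "a", "an", "please"}

def clean_input(text: str) -> str:
    """Drop punctuation, symbols (including most emoji), and stop words."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # 'P*' = punctuation; 'S*' = symbols such as '$', '+', and most emoji ('So').
        if category.startswith(("P", "S")):
            continue
        kept.append(ch)
    words = "".join(kept).split()
    return " ".join(w for w in words if w.lower() not in STOP_WORDS)

print(clean_input("Hello, the phone! 😀"))  # Hello phone
print(clean_input("你好，在吗？"))           # 你好在吗
```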
Step S606: Determine whether a phrase matching the user input exists in the local lexicon. If so, go to step S607. If not, go to step S608.
Step S607: Obtain one or more corpora from the local corpus according to the phrase and recommend them to the user.
Specifically, a hash operation may be performed on the index number of the currently available local lexicon together with the user input. This generates a fixed-length phrase identifier, that is, a token value, which is used to determine whether a corresponding phrase exists in the local lexicon. If such a phrase exists, a phrase matching the user input exists in the local lexicon. If not, no phrase matching the user input exists in the local lexicon.
Step S608: Send the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user.
Specifically, corpora are obtained from the local corpus in step S607 as follows:
- One or more corpora corresponding to the phrase, and one or more preset weight parameters of each corpus, are obtained from the local corpus.
- One or more service parameters corresponding to the user input are obtained.
- When a service parameter belongs to the preset weight parameters of a corpus, the weight score of the corpus is calculated from the weights corresponding to the service parameters.
- One or more corpora are selected in descending order of weight score and recommended to the user.
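The complete flow of steps S601 to S608 can be sketched as a single function. This is a simplified, hypothetical sketch in several respects:
- Phrase matching is shown as an exact dictionary lookup rather than the hashed token lookup.
- Availability, versioning, cleaning, and the search engine are passed in as plain values or callables.
- All names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Each corpus entry is (text, {preset weight parameter: weight}).
CorpusEntry = Tuple[str, Dict[str, int]]

@dataclass
class LocalStore:
    available: bool
    version: int
    corpus: Dict[str, List[CorpusEntry]]  # phrase -> corpora; keys act as the lexicon

def recommend_flow(
    user_input: str,
    store: LocalStore,
    latest_version: int,
    fetch_latest_store: Callable[[], LocalStore],
    clean: Callable[[str], str],
    search_engine: Callable[[str], List[str]],
    service_params: Dict[str, str],
    top_n: int = 1,
) -> List[str]:
    if not store.available:                   # S602: unavailable -> S608
        return search_engine(user_input)
    if store.version < latest_version:        # S603 -> S604
        store = fetch_latest_store()
    phrase = clean(user_input)                # S605
    candidates = store.corpus.get(phrase)     # S606 (exact match, simplified)
    if not candidates:                        # no match -> S608
        return search_engine(user_input)

    def score(entry: CorpusEntry) -> int:     # S607 with weight scoring
        return sum(w for name, w in entry[1].items() if name in service_params)

    ranked = sorted(candidates, key=score, reverse=True)
    return [text for text, _ in ranked[:top_n]]

engine = lambda text: ["<search engine result>"]
old = LocalStore(True, 1, {})
new = LocalStore(True, 2, {"hello": [("first corpus", {"ThirdCat": 5}),
                                     ("second corpus", {"color": 3})]})
params = {"ThirdCat": "mobile phone", "color": "black"}
print(recommend_flow("hello", old, 2, lambda: new, str.strip, engine, params))  # ['first corpus']
```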
Referring to FIG. 7a, on the basis of the above embodiments, an embodiment of the present invention provides an apparatus 700 for recommending input corpora. The apparatus includes a user input receiving module 701 and a corpus obtaining module 704, wherein:
the user input receiving module 701 is configured to receive a user input;
the corpus obtaining module 704 is configured to generate a phrase identifier based on the user input and the index number corresponding to the local lexicon, so as to determine whether a phrase matching the user input exists in the local lexicon:
if so, obtain one or more corpora from the local corpus according to the phrase and recommend them to the user;
if not, send the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user.
In an optional implementation manner, the obtaining one or more corpora from a local corpus according to the phrases to recommend to the user includes:
acquiring one or more corpora corresponding to the phrases and one or more preset weight parameters corresponding to the corpora from the local corpus;
acquiring one or more service parameters corresponding to the user input;
when a service parameter belongs to the preset weight parameters of a corpus, calculating the weight score of the corpus according to the weights corresponding to the service parameters;
and selecting one or more corpora according to the weight scores corresponding to the corpora from high to low to recommend the corpora to the user.
In an optional embodiment, the apparatus further includes a user input processing module 703, wherein:
the user input processing module 703 is configured to delete one or more of the following from the user input before it is determined whether a phrase matching the user input exists in the local lexicon: stop words, punctuation marks, special characters, and emoticons.
In an optional embodiment, the apparatus further includes a lexicon preprocessing module 702, wherein:
the lexicon preprocessing module 702 is configured to determine whether the local lexicon is the latest version before it is determined whether a phrase matching the user input exists in the lexicon. If the local lexicon is not the latest version, the module obtains the latest version of the lexicon from the search engine.
In an optional embodiment, the lexicon preprocessing module 702 is further configured to:
determine, before it is determined whether a phrase matching the user input exists in the local lexicon, whether the local lexicon is available. If so, proceed to determine whether a phrase matching the user input exists in the local lexicon. If not, send the user input to the search engine.
In an optional embodiment, the lexicon preprocessing module 702 is further configured to set the local lexicon as unavailable when any of the following conditions occurs:
- the user is a blacklisted user;
- the current time period is a preset unavailable time period;
- the response time of the local lexicon exceeds a threshold response time;
- the number of received user inputs exceeds a threshold number of user inputs.
Specifically, referring to FIG. 7b, the lexicon preprocessing module 703 further includes one or more of the following pluggable components: a switch component 7031, a circuit-breaker and rate-limiting component 7032, and a traffic distribution component 7033, wherein:
the switch component 7031 is configured to determine whether the user is a blacklisted user or whether the current time period is an unavailable time period;
the circuit-breaker and rate-limiting component 7032 is configured to determine whether the response time of the local lexicon exceeds the threshold response time;
the traffic distribution component 7033 is configured to determine whether the number of received user inputs exceeds the threshold number of user inputs.
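The pluggable availability checks can be sketched as a list of independent predicates. Each one trips (returns True) when its condition for disabling the local lexicon holds, so components can be added to or removed from the list freely. The thresholds, the blacklist, and the unavailable hours are hypothetical. The "circuit breaker" here is reduced to a single latency threshold. Real circuit breakers usually also track failure rates and a recovery (half-open) state.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass
class RequestContext:
    user_id: str
    hour: int                  # current hour of day, 0-23
    lexicon_latency_ms: float  # recent response time of the local lexicon
    inputs_in_window: int      # user inputs received in the current time window

# A component returns True when the local lexicon must be treated as unavailable.
Component = Callable[[RequestContext], bool]

def switch_component(blacklist: FrozenSet[str], unavailable_hours: FrozenSet[int]) -> Component:
    return lambda ctx: ctx.user_id in blacklist or ctx.hour in unavailable_hours

def circuit_breaker_component(threshold_ms: float) -> Component:
    return lambda ctx: ctx.lexicon_latency_ms > threshold_ms

def traffic_component(threshold_inputs: int) -> Component:
    return lambda ctx: ctx.inputs_in_window > threshold_inputs

def lexicon_available(ctx: RequestContext, components: List[Component]) -> bool:
    """The lexicon is available only if no plugged-in component trips."""
    return not any(trips(ctx) for trips in components)

components = [
    switch_component(frozenset({"user-42"}), frozenset({3, 4})),
    circuit_breaker_component(threshold_ms=50.0),
    traffic_component(threshold_inputs=1000),
]
ok = RequestContext("user-7", hour=10, lexicon_latency_ms=12.0, inputs_in_window=200)
slow = RequestContext("user-7", hour=10, lexicon_latency_ms=80.0, inputs_in_window=200)
print(lexicon_available(ok, components), lexicon_available(slow, components))  # True False
```

Unplugging a component is just removing it from the list. For example, without the circuit breaker, the `slow` request above would be served from the local lexicon.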
Further, still referring to FIG. 7b, the lexicon preprocessing module 703 further includes a pluggable service event-tracking component 7034, wherein:
the service event-tracking component 7034 is configured to collect, through event tracking, the user input, the one or more corpora recommended to the user for that input, and the corpus clicked by the user, so as to update the local lexicon and the local corpus.
It should be noted that the pluggable components provided in this embodiment can be assembled into, or removed from, the apparatus for recommending input corpora in any combination. This makes the apparatus broadly applicable.
FIG. 8 illustrates an exemplary system architecture 800 to which the method or apparatus for recommending input corpora according to embodiments of the present invention may be applied.
As shown in FIG. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 provides the medium for communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 801, 802, 803. The background management server can analyze and process the received user input, and feed back the processing result, such as one or more corpora corresponding to the user input, to the terminal device.
It should be noted that the recommendation method for input corpora provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the recommendation device for input corpora is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 9, the computer system 900 includes a Central Processing Unit (CPU) 901. The CPU 901 can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the system 900. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a user input receiving module and a corpus obtaining module. The names of these modules do not in some cases constitute a limitation on the module itself, for example, the user input receiving module may also be described as a "module for receiving user input".
As another aspect, the present invention also provides a computer-readable medium. The medium may be included in the apparatus described in the above embodiments, or it may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
- receiving a user input;
- generating a phrase identifier based on the user input and the index number corresponding to the local lexicon, to determine whether a phrase matching the user input exists in the local lexicon;
- if so, obtaining one or more corpora from the local corpus according to the phrase and recommending them to the user;
- if not, sending the user input to a search engine to obtain one or more corpora from the search engine and recommend them to the user.

According to the technical solution of the embodiments of the present invention, the local lexicon is consulted before a search engine is used for corpus recommendation. The phrase corresponding to the user input is matched in the local lexicon, and one or more corpora are obtained from the local corpus according to that phrase and recommended to the user. This makes obtaining corpora more efficient, reduces response time, keeps the recommended corpora timely, and improves the user experience. When no corresponding phrase can be matched in the local lexicon, the search engine is still used to obtain one or more corpora for recommendation. This keeps corpus recommendation reliable and broadly applicable.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for recommending input corpora, comprising:
receiving a user input;
generating a phrase identifier based on the user input and an index number corresponding to a local lexicon, to determine whether a phrase matching the user input exists in the local lexicon:
if yes, obtaining one or more corpora from a local corpus according to the phrase and recommending them to the user;
and if not, sending the user input to a search engine so as to obtain one or more corpora from the search engine to recommend to the user.
2. The method according to claim 1, wherein said obtaining one or more corpora from a local corpus according to the phrase to recommend the corpus to the user comprises:
acquiring one or more corpora corresponding to the phrases and one or more preset weight parameters corresponding to the corpora from the local corpus;
acquiring one or more service parameters corresponding to the user input;
when a service parameter belongs to the preset weight parameters of a corpus, calculating the weight score of the corpus according to the weights corresponding to the service parameters;
and selecting one or more corpora according to the weight scores corresponding to the corpora from high to low to recommend the corpora to the user.
3. The method for recommending input corpora according to claim 1, further comprising:
before determining whether a phrase matching the user input exists in the local lexicon, deleting one or more of the following from the user input: stop words, punctuation marks, special characters, and emoticons.
4. The method for recommending input corpus of claim 1, further comprising:
before judging whether a phrase matched with the user input exists in a local word stock, judging whether the version corresponding to the local word stock is the latest version, and acquiring the word stock of the latest version from the search engine under the condition that the version corresponding to the local word stock is not the latest version.
5. The method for recommending input corpus of claim 1, further comprising:
before judging whether a phrase matched with the user input exists in a local word stock, judging whether the local word stock is available: if yes, continuing to judge whether a phrase matched with the user input exists in a local word stock; if not, sending the user input to the search engine.
6. The method for recommending input corpus according to claim 5, wherein,
the local lexicon is set as unavailable when any one of the following conditions occurs: the user is a blacklisted user, the current time period is a preset unavailable time period, the response time of the local lexicon is greater than a threshold response time, or the number of received user inputs is greater than a threshold number of user inputs.
7. The method of recommending input corpus according to claim 6, wherein,
the user input, the one or more corpora recommended to the user for the user input, and the corpus clicked by the user are collected through event tracking to update the local lexicon and the local corpus.
8. The method of recommending input corpus according to claim 7, wherein,
the local lexicon is set as unavailable based on one or more pluggable components; and
the user input, the one or more corpora recommended to the user for the user input, and the corpus clicked by the user are collected based on a pluggable service event-tracking component.
9. An apparatus for recommending input corpora, comprising a user input receiving module and a corpus acquiring module, wherein:
the user input receiving module is used for receiving user input;
the corpus acquiring module is configured to generate a phrase identifier based on the user input and an index number corresponding to the local lexicon, so as to determine whether a phrase matching the user input exists in the local lexicon:
if yes, obtaining one or more corpora from a local corpus according to the phrase and recommending them to the user;
and if not, sending the user input to a search engine so as to obtain one or more corpora from the search engine to recommend to the user.
10. The apparatus for recommending input corpus of claim 9, wherein said obtaining one or more corpora from a local corpus according to said phrases for recommending to said user comprises:
acquiring one or more corpora corresponding to the phrases and one or more preset weight parameters corresponding to the corpora from the local corpus;
acquiring one or more service parameters corresponding to the user input;
when a service parameter belongs to the preset weight parameters of a corpus, calculating the weight score of the corpus according to the weights corresponding to the service parameters;
and selecting one or more corpora according to the weight scores corresponding to the corpora from high to low to recommend the corpora to the user.
11. The apparatus for recommending input corpora according to claim 9, further comprising a user input processing module, wherein:
the user input processing module is configured to delete one or more of the following from the user input before it is determined whether a phrase matching the user input exists in the local lexicon: stop words, punctuation marks, special characters, and emoticons.
12. The apparatus for recommending input corpora according to claim 9, further comprising a lexicon preprocessing module, wherein:
the word stock preprocessing module is used for judging whether the version corresponding to the local word stock is the latest version or not before judging whether the word stock has the phrase matched with the user input or not, so that the word stock of the latest version is obtained from the search engine under the condition that the version corresponding to the local word stock is not the latest version.
13. The apparatus for recommending input corpus of claim 9, wherein said lexicon preprocessing module is further configured to,
before judging whether a phrase matched with the user input exists in a local word stock, judging whether the local word stock is available: if yes, continuing to judge whether a phrase matched with the user input exists in a local word stock; if not, sending the user input to the search engine.
14. The apparatus for recommending input corpus of claim 9, wherein said lexicon preprocessing module is further configured to,
set the local lexicon as unavailable when any one of the following conditions occurs: the user is a blacklisted user, the current time period is a preset unavailable time period, the response time of the local lexicon is greater than a threshold response time, or the number of received user inputs is greater than a threshold number of user inputs.
15. The apparatus for recommending input corpus according to claim 14, wherein
the lexicon preprocessing module further comprises one or more of the following pluggable components: a switch component, a circuit-breaker and rate-limiting component, and a traffic distribution component; wherein
the switch component is configured to determine whether the user is a blacklisted user or whether the current time period is the unavailable time period;
the circuit-breaker and rate-limiting component is configured to determine whether the response time of the local lexicon is greater than the threshold response time; and
the traffic distribution component is configured to determine whether the number of received user inputs is greater than the threshold number of user inputs.
16. The apparatus for recommending input corpus according to claim 15, wherein said lexicon preprocessing module further comprises a pluggable service event-tracking component; wherein
the service event-tracking component is configured to collect, by means of event tracking (embedded tracking points), the user input, the one or more corpora recommended to the user for that input, and the corpora clicked by the user, so as to update the local lexicon and the local corpus.
17. An electronic device for recommending input corpora, comprising:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
18. A computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
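The input cleaning of claim 11 and the final step of selecting corpora in descending order of weight score can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than something the claims specify: the stop-word list, the use of Unicode general categories to drop punctuation, special characters and emoticons, whitespace tokenization (Chinese input would need a word segmenter), the prefix-matching rule, and the shape of the hypothetical `lexicon` (phrase to corpora) and `corpus_weights` (corpus to weight score) dictionaries.

```python
import unicodedata
from typing import Dict, List

# Hypothetical stop-word list; the claims do not name one.
STOP_WORDS = {"the", "a", "an", "of", "please"}


def clean_user_input(text: str) -> str:
    """Drop punctuation, special characters, emoticons and stop words (claim 11)."""
    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]
        # Keep letters (L*), digits (N*) and whitespace; drop everything else:
        # punctuation (P*), symbols such as emoji (S*), combining marks (M*)
        # and non-whitespace control characters (C*).
        if major in ("L", "N") or ch.isspace():
            kept.append(ch)
    # Assumption: tokens are whitespace-separated and matching is case-insensitive.
    tokens = "".join(kept).lower().split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def recommend(user_input: str,
              lexicon: Dict[str, List[str]],
              corpus_weights: Dict[str, float],
              top_k: int = 3) -> List[str]:
    """Return up to top_k corpora for the matched phrases, highest weight first."""
    query = clean_user_input(user_input)
    if not query:
        return []
    # Hypothetical matching rule: a lexicon phrase matches if it starts with the query.
    matched = [phrase for phrase in lexicon if phrase.startswith(query)]
    candidates = {corpus for phrase in matched for corpus in lexicon[phrase]}
    # Sort by weight descending; ties are broken alphabetically for a stable result.
    ranked = sorted(candidates, key=lambda c: (-corpus_weights.get(c, 0.0), c))
    return ranked[:top_k]
```

For example, with a lexicon mapping `"iphone case"` and `"iphone charger"` to their corpora, the input `"iPhone!! 😀"` is cleaned to `"iphone"`, both phrases match, and their corpora come back ordered by weight. Sorting on `(-weight, corpus)` keeps the output deterministic when two corpora share a score.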
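Claims 13 to 15 gate the local lexicon behind pluggable checks and fall back to the search engine when any check fires. The sketch below models each pluggable component as a callable that returns `True` when the local lexicon should be treated as unavailable. The class names, the `route` helper, the `[start, end)` time window, the sliding-window input count in the traffic component, and the use of the last measured response time in the breaker are all hypothetical choices, not details taken from the claims.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, time as dtime
from typing import Callable, List, Optional, Sequence, Set, Tuple


@dataclass
class RequestContext:
    user_id: str
    now: datetime


@dataclass
class SwitchComponent:
    """Unavailable for blacklisted users or inside a preset time window."""
    blacklist: Set[str]
    # Hypothetical [start, end) window; assumes start < end (no midnight wrap).
    unavailable_window: Optional[Tuple[dtime, dtime]] = None

    def __call__(self, ctx: RequestContext) -> bool:
        if ctx.user_id in self.blacklist:
            return True
        if self.unavailable_window is not None:
            start, end = self.unavailable_window
            return start <= ctx.now.time() < end
        return False


@dataclass
class CircuitBreakerComponent:
    """Unavailable while the last local-lexicon response time exceeds the threshold."""
    threshold_ms: float
    last_response_ms: float = 0.0

    def record(self, elapsed_ms: float) -> None:
        self.last_response_ms = elapsed_ms

    def __call__(self, ctx: RequestContext) -> bool:
        return self.last_response_ms > self.threshold_ms


@dataclass
class TrafficComponent:
    """Unavailable when more than max_inputs inputs arrive within window_seconds."""
    max_inputs: int
    window_seconds: float = 1.0
    _arrivals: List[float] = field(default_factory=list)

    def __call__(self, ctx: RequestContext) -> bool:
        now = ctx.now.timestamp()
        # Keep only arrivals inside the sliding window, then count this one.
        self._arrivals = [t for t in self._arrivals if t > now - self.window_seconds]
        self._arrivals.append(now)
        return len(self._arrivals) > self.max_inputs


Check = Callable[[RequestContext], bool]


def route(ctx: RequestContext,
          user_input: str,
          checks: Sequence[Check],
          query_local: Callable[[str], List[str]],
          query_search_engine: Callable[[str], List[str]],
          breaker: Optional[CircuitBreakerComponent] = None) -> Tuple[str, List[str]]:
    # Any single check firing marks the local lexicon unavailable (claim 14);
    # any() short-circuits, so later checks are skipped once one fires.
    if any(check(ctx) for check in checks):
        return "search_engine", query_search_engine(user_input)
    start = time.perf_counter()
    result = query_local(user_input)
    if breaker is not None:
        # Feed the measured local response time back into the breaker.
        breaker.record((time.perf_counter() - start) * 1000.0)
    return "local", result
```

A production circuit breaker would also need a recovery (half-open) state so that the local lexicon is retried after a cool-down; this sketch omits it for brevity.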
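Claim 12 checks the local lexicon's version before lookup and pulls the latest lexicon from the search engine when the local copy is stale. A minimal sketch, assuming integer version numbers and a hypothetical `LexiconSource` interface exposing `latest_version()` and `fetch_lexicon()`; the claims do not describe how the search engine publishes lexicon versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class LexiconSource(Protocol):
    """Hypothetical interface to the search engine's lexicon service."""

    def latest_version(self) -> int: ...

    def fetch_lexicon(self, version: int) -> Dict[str, List[str]]: ...


@dataclass
class LocalLexicon:
    version: int = 0
    phrases: Dict[str, List[str]] = field(default_factory=dict)

    def ensure_latest(self, source: LexiconSource) -> bool:
        """Refresh from `source` if a newer version exists; return True if refreshed."""
        latest = source.latest_version()
        if self.version >= latest:
            return False  # already current: keep serving the local copy
        # Stale: download the newest lexicon and replace the local copy wholesale.
        self.phrases = source.fetch_lexicon(latest)
        self.version = latest
        return True
```

Replacing the whole lexicon keeps the sketch short; an incremental (diff-based) update would be an equally valid reading of the claim.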
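Claim 16 collects the user input, the corpora shown, and the corpus clicked, and uses them to update the local lexicon and the local corpus. The claims do not say how the update is computed. The sketch below assumes one plausible rule: each corpus's weight score is its click-through rate with add-one (Laplace) smoothing, and a clicked corpus is linked to the exact input that produced it. `TrackingEvent` and `TrackingCollector` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrackingEvent:
    user_input: str
    recommended: List[str]   # corpora shown for this input
    clicked: Optional[str]   # corpus the user clicked, or None


@dataclass
class TrackingCollector:
    impressions: Dict[str, int] = field(default_factory=dict)
    clicks: Dict[str, int] = field(default_factory=dict)
    lexicon: Dict[str, List[str]] = field(default_factory=dict)

    def collect(self, event: TrackingEvent) -> None:
        for corpus in event.recommended:
            self.impressions[corpus] = self.impressions.get(corpus, 0) + 1
        if event.clicked is not None:
            self.clicks[event.clicked] = self.clicks.get(event.clicked, 0) + 1
            # Link the clicked corpus to the input that led to it (hypothetical rule).
            linked = self.lexicon.setdefault(event.user_input, [])
            if event.clicked not in linked:
                linked.append(event.clicked)

    def weight(self, corpus: str) -> float:
        # Laplace-smoothed click-through rate: (clicks + 1) / (impressions + 2).
        return (self.clicks.get(corpus, 0) + 1) / (self.impressions.get(corpus, 0) + 2)
```

Under this rule, a corpus shown twice and clicked once scores (1 + 1) / (2 + 2) = 0.5, while an unseen corpus starts at 0.5 and drifts down each time it is shown without being clicked.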
CN202110576949.6A 2021-05-26 2021-05-26 Input corpus recommendation method and device Pending CN113325959A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110576949.6A (CN113325959A) | 2021-05-26 | 2021-05-26 | Input corpus recommendation method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110576949.6A (CN113325959A) | 2021-05-26 | 2021-05-26 | Input corpus recommendation method and device

Publications (1)

Publication Number | Publication Date
CN113325959A | 2021-08-31

Family

ID=77416911

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202110576949.6A (CN113325959A) | Input corpus recommendation method and device | 2021-05-26 | 2021-05-26 | Pending

Country Status (1)

Country | Link
CN (1) | CN113325959A

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115840510A * | 2023-02-21 | 2023-03-24 | 中航信移动科技有限公司 | Input association method, electronic equipment and storage medium for civil aviation intelligent question answering

Similar Documents

Publication Publication Date Title
CN107172151B (en) Method and device for pushing information
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN109614402B (en) Multidimensional data query method and device
CN108628830B (en) Semantic recognition method and device
CN109992766B (en) Method and device for extracting target words
US11436446B2 (en) Image analysis enhanced related item decision
CN107609192A (en) The supplement searching method and device of a kind of search engine
CN112818111B (en) Document recommendation method, device, electronic equipment and medium
CN112925900B (en) Search information processing method, device, equipment and storage medium
CN110874532A (en) Method and device for extracting keywords of feedback information
CN112100396A (en) Data processing method and device
CN111861596A (en) Text classification method and device
CN113657113A (en) Text processing method and device and electronic equipment
CN109753424B (en) AB test method and device
CN107247798B (en) Method and device for constructing search word bank
CN110245357B (en) Main entity identification method and device
CN110750707A (en) Keyword recommendation method and device and electronic equipment
CN111538817A (en) Man-machine interaction method and device
CN107908662B (en) Method and device for realizing search system
CN113325959A (en) Input corpus recommendation method and device
CN111435406A (en) Method and device for correcting database statement spelling errors
CN113761565A (en) Data desensitization method and apparatus
CN111126073A (en) Semantic retrieval method and device
CN107679030B (en) Method and device for extracting synonyms based on user operation behavior data
EP4071633A1 (en) Task query method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination