CN116975428A - Associated word recommendation method and device, electronic equipment and storage medium - Google Patents

Associated word recommendation method and device, electronic equipment and storage medium

Info

Publication number
CN116975428A
Authority
CN
China
Legal status
Pending
Application number
CN202310138951.4A
Other languages
Chinese (zh)
Inventor
杨正彪
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202310138951.4A
Publication of CN116975428A
Pending (current legal status)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
                  • G06F16/9535 Search customisation based on user profiles and personalisation
              • G06F16/903 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of natural language processing, and in particular to an associated word recommendation method and device, an electronic device and a storage medium, which are used to improve information review efficiency. The method comprises: receiving an input keyword and obtaining, from the original words, a target word that matches the keyword; obtaining, from a preset corpus, at least one type of original associated word corresponding to the target word and the correlation degree between each original associated word and the target word; obtaining a recommendation evaluation value for each original associated word based on its association relationship and correlation degree with the target word; screening, from the original associated words and based on the recommendation evaluation values, the candidate associated words whose recommendation evaluation values meet a preset recommendation condition; and presenting the screened candidate associated words in the display interface. Because only the screened candidate associated words need to be presented in the display interface, the application meets the object's need to query associated words while reducing the proportion of the display interface they occupy, thereby improving information review efficiency.

Description

Associated word recommendation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a related word recommendation method, a related word recommendation device, an electronic device, and a storage medium.
Background
Currently, when a target object queries a keyword with a search engine, it often also needs to consult words related to that keyword. To meet this need, the related art usually displays different types of related words in the display interface for the keyword.
However, when related words are displayed in this way, each type of related word occupies its own dedicated display area, and as the number of related words grows, the proportion of the display interface taken up by these areas increases, which seriously hinders the target object from reviewing other information in the display interface.
For example, referring to FIG. 1, assume that the keyword input by the target object is "good" and that the related words are of two types, synonyms and antonyms. In the display interface for "good", display area 1 presents the synonyms, and display area 2 presents the antonyms, including: bad, inferior, gangster, bad, bad. As FIG. 1 shows, display areas 1 and 2 squeeze the display space available for other information in the interface; if the types and number of related words increase further, that space is squeezed even more, making it difficult for the target object to quickly find the required content.
Disclosure of Invention
The embodiments of the application provide an associated word recommendation method and device, an electronic device and a storage medium, which are used to improve information review efficiency.
The associated word recommendation method provided by the embodiments of the application comprises the following steps:
receiving an input keyword, and obtaining a target word matching the keyword from the original words;
obtaining, from a preset corpus, at least one type of original associated word corresponding to the target word and the correlation degree between each original associated word and the target word, where different types of original associated words have different association relationships with the target word;
obtaining a recommendation evaluation value for each original associated word based on its association relationship with the target word and its correlation degree;
screening, from the original associated words and based on the obtained recommendation evaluation values, candidate associated words whose recommendation evaluation values meet a preset recommendation condition;
and presenting the screened candidate associated words in the display interface.
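The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how the association type and the correlation degree combine into a recommendation evaluation value, so the per-type weights (`TYPE_WEIGHTS`), the product formula, and the recommendation condition (a minimum score plus a top-k cut) are all hypothetical, as are the names `Candidate`, `match_target`, `evaluation_value` and `recommend`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical per-type weights: the source does not specify how the association
# relationship (type) enters the recommendation evaluation value.
TYPE_WEIGHTS = {
    "co-occurrence": 0.8,
    "near-form": 0.6,
    "co-translation": 1.0,
    "vector": 0.7,
    "resolution": 0.9,
}


@dataclass
class Candidate:
    word: str           # original associated word
    assoc_type: str     # its association relationship (type) with the target word
    correlation: float  # correlation degree with the target word, from the corpus


def match_target(keyword: str, original_words: Set[str]) -> Optional[str]:
    """Step 1: the target word is the original word identical to the keyword."""
    return keyword if keyword in original_words else None


def evaluation_value(c: Candidate) -> float:
    """Step 3 (assumed form): type weight multiplied by correlation degree."""
    return TYPE_WEIGHTS[c.assoc_type] * c.correlation


def recommend(keyword: str, original_words: Set[str],
              corpus_index: Dict[str, List[Candidate]],
              top_k: int = 3, min_score: float = 0.3) -> List[str]:
    target = match_target(keyword, original_words)
    if target is None:
        return []
    candidates = corpus_index.get(target, [])          # Step 2: look up the corpus
    scored = [(evaluation_value(c), c.word) for c in candidates]
    # Step 4 (assumed condition): keep scores >= min_score, at most top_k words.
    kept = sorted((s for s in scored if s[0] >= min_score), reverse=True)[:top_k]
    return [word for _, word in kept]                  # Step 5: words to present


index = {"good": [Candidate("well", "co-translation", 0.9),
                  Candidate("for", "co-occurrence", 0.5),
                  Candidate("gold", "near-form", 0.4)]}
print(recommend("good", {"good", "well", "for", "gold"}, index))
```

Capping the output with `top_k` mirrors the stated goal of presenting only the screened candidates rather than every type's full list; any real threshold or weighting would have to come from the embodiment itself.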
The associated word recommendation device provided by the embodiments of the application comprises:
a receiving unit, configured to receive an input keyword and obtain a target word matching the keyword from the original words;
an acquisition unit, configured to obtain, from a preset corpus, at least one type of original associated word corresponding to the target word and the correlation degree between each original associated word and the target word, where different types of original associated words have different association relationships with the target word;
a determining unit, configured to obtain a recommendation evaluation value for each original associated word based on its association relationship with the target word and its correlation degree;
a screening unit, configured to screen out, from the original associated words and based on the obtained recommendation evaluation values, candidate associated words whose recommendation evaluation values meet a preset recommendation condition;
and a presentation unit, configured to present the screened candidate associated words in the display interface.
Optionally, the at least one type includes a co-occurrence type, which indicates that the corresponding original associated word is used in scenarios similar to those of the target word; the acquisition unit is specifically configured to:
obtain a plurality of sentences to be used based on the corpus, and split each sentence into a plurality of original phrases, where each original phrase contains at least two original words that are adjacent in the sentence it comes from;
screen out, from the obtained original phrases, each target phrase containing the target word;
and obtain the co-occurrence-type original associated words of the target word based on the original words other than the target word in each target phrase.
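A minimal sketch of this co-occurrence step, assuming whitespace tokenization and original phrases made of exactly two adjacent words (bigrams); the source only requires at least two adjacent words per phrase, and the function names are hypothetical. Stop-word filtering is not shown.

```python
from typing import Iterable, List, Set, Tuple


def split_into_phrases(sentence: str, n: int = 2) -> List[Tuple[str, ...]]:
    """Split a sentence into original phrases of n adjacent words (n-grams)."""
    words = sentence.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def co_occurrence_words(sentences: Iterable[str], target: str, n: int = 2) -> Set[str]:
    """Collect the other original words from every target phrase."""
    result: Set[str] = set()
    for sentence in sentences:
        for phrase in split_into_phrases(sentence, n):
            if target in phrase:  # this phrase is a target phrase
                result.update(w for w in phrase if w != target)
    return result


print(sorted(co_occurrence_words(["She is good at math", "This is good for you"], "good")))
```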
Optionally, the acquisition unit is specifically configured to:
search the original phrases for related phrases of each target phrase based on the other original words, where each related phrase contains the other original words of the corresponding target phrase;
count second frequency information for each related original word in the related phrases other than those other original words, where the second frequency information indicates how many times the related original word occurs in the related phrases;
and, based on the second frequency information, screen out the related original words whose second frequency information meets a second frequency screening condition, and take them as the co-occurrence-type original associated words of the target word.
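The second-order screening can be sketched as below. Phrases are tuples of words, as produced by a bigram split; excluding the target word itself from the counts and using a fixed minimum count (`min_count`) as the second frequency screening condition are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def second_frequency_screen(phrases: Iterable[Tuple[str, ...]], other_words: Set[str],
                            target: str, min_count: int = 2) -> Set[str]:
    counts: Counter = Counter()  # second frequency information per related original word
    for phrase in phrases:
        for other in other_words:
            if other in phrase:  # phrase is a related phrase of `other`
                counts.update(w for w in phrase if w != other and w != target)
    return {w for w, c in counts.items() if c >= min_count}


phrases = [("good", "at"), ("at", "math"), ("at", "school"), ("at", "math"),
           ("good", "for"), ("for", "you"), ("for", "math")]
print(second_frequency_screen(phrases, {"at", "for"}, "good"))
```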
Optionally, the at least one type includes a near-form type, which indicates that the corresponding original associated word has a composition structure similar to that of the target word; the acquisition unit is specifically configured to:
obtain a first similarity between the target word and each original word in the corpus;
and, based on the obtained first similarities, screen out the original words whose first similarity meets a first similarity condition, and take them as the near-form original associated words of the target word.
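One way to realize this screening is shown below. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in first similarity (the terms section names the Levenshtein distance for this purpose), with a threshold of 0.75 as a hypothetical first similarity condition.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def near_form_words(target: str, original_words: Iterable[str],
                    threshold: float = 0.75) -> List[Tuple[str, float]]:
    """Return (word, similarity) pairs whose spelling similarity meets the threshold."""
    result = []
    for word in original_words:
        if word == target:
            continue
        similarity = SequenceMatcher(None, target, word).ratio()  # first similarity
        if similarity >= threshold:
            result.append((word, similarity))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


print(near_form_words("good", ["goods", "gold", "bad", "good"]))
```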
Optionally, the at least one type includes a co-translation type, which indicates that the corresponding original associated word has paraphrases similar to those of the target word; the acquisition unit is specifically configured to:
obtain, from the corpus, a plurality of paraphrases of the target word and a plurality of paraphrases of each original word;
determine a paraphrase set for each original word based on the paraphrases of the target word and the paraphrases of that original word, where each paraphrase set contains the paraphrases that the original word shares with the target word;
and, based on the number of shared paraphrases in each paraphrase set, screen out the original words whose number of shared paraphrases meets a preset paraphrase condition, and take them as the co-translation original associated words of the target word.
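A minimal sketch using set intersection, assuming each word's paraphrases are available as a set of strings; the toy glosses, the function name, and `min_shared` as the preset paraphrase condition are illustrative assumptions.

```python
from typing import Dict, Set


def co_translation_words(target: str, paraphrases: Dict[str, Set[str]],
                         min_shared: int = 2) -> Dict[str, Set[str]]:
    """Map each qualifying original word to the paraphrases it shares with the target."""
    target_defs = paraphrases.get(target, set())
    result = {}
    for word, defs in paraphrases.items():
        if word == target:
            continue
        shared = target_defs & defs  # this word's paraphrase set
        if len(shared) >= min_shared:
            result[word] = shared
    return result


glosses = {
    "good": {"of high quality", "kind", "skilled"},
    "well": {"skilled", "kind", "in a good way"},
    "fine": {"of high quality"},
    "bad": {"of low quality"},
}
print(co_translation_words("good", glosses))
```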
Optionally, the at least one type includes a vector type, which indicates that the original word vector of the corresponding original associated word is similar to the target word vector of the target word; the acquisition unit is specifically configured to:
obtain the target word vector and each original word vector from the corpus;
determine a second similarity between the target word vector and each original word vector;
and, based on the determined second similarities, screen out the original words whose second similarity meets a second similarity condition, and take them as the vector-type original associated words of the target word.
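The sketch below assumes cosine similarity as the second similarity and uses made-up three-dimensional vectors; real word vectors from a trained model are much longer. The threshold and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_words(target: str, vectors: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> List[str]:
    """Original words whose vector is close to the target word vector, best first."""
    target_vec = vectors[target]
    scores = {w: cosine(target_vec, v) for w, v in vectors.items() if w != target}
    kept = [w for w, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda w: scores[w], reverse=True)


toy = {"good": [1.0, 0.9, 0.1], "benefit": [0.9, 1.0, 0.2],
       "bad": [-1.0, -0.8, 0.1], "table": [0.0, 0.1, 1.0]}
print(vector_words("good", toy))
```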
Optionally, the at least one type includes a resolution type, which indicates that the number of times the corresponding original associated word is compared with the target word meets a preset count condition; the acquisition unit is specifically configured to:
obtain a plurality of query log data items from the corpus, each containing an input query string;
screen out, from the query strings and based on a pre-constructed regular expression, each target string that conforms to a preset syntax rule and contains the target word;
and obtain the resolution-type original associated words of the target word based on the original words other than the target word in each target string.
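A sketch of the regular-expression step follows. The two syntax rules ("X vs Y" and "difference between X and Y") are hypothetical examples of comparison queries, not rules taken from the source, and the function (also hypothetical) returns the word compared with the target in each matching target string.

```python
import re
from typing import Iterable, List

# Hypothetical syntax rules for queries that compare two words.
COMPARE_PATTERNS = [
    re.compile(r"^(\w+)\s+(?:vs\.?|versus)\s+(\w+)$"),
    re.compile(r"^difference between\s+(\w+)\s+and\s+(\w+)$"),
]


def compared_words(query_strings: Iterable[str], target: str) -> List[str]:
    """For each target string, return the word the target is compared with."""
    found = []
    for query in query_strings:
        q = query.strip().rstrip("?").strip().lower()
        for pattern in COMPARE_PATTERNS:
            m = pattern.match(q)
            if m and target in m.groups():  # q is a target string
                first, second = m.groups()
                found.append(second if first == target else first)
                break
    return found


logs = ["good vs well", "Difference between good and nice?", "well vs fine",
        "good or bad", "good versus well"]
print(compared_words(logs, "good"))
```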
Optionally, the acquisition unit is specifically configured to:
count third frequency information for each original word other than the target word in the target strings, where the third frequency information indicates how many times the original word occurs in the target strings;
and, based on the third frequency information, screen out the original words whose third frequency information meets a third frequency screening condition, and take them as the resolution-type original associated words of the target word.
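Counting the third frequency information can be sketched with `collections.Counter`. Here the target strings are plain text, only words found in the original-word vocabulary are counted (so connectives such as "vs" are ignored), and `min_count` stands in for the third frequency screening condition; these choices and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def resolution_words(target_strings: Iterable[str], target: str,
                     original_words: Set[str], min_count: int = 2) -> List[str]:
    counts: Counter = Counter()  # third frequency information
    for s in target_strings:
        for w in s.lower().split():
            if w != target and w in original_words:
                counts[w] += 1
    return [w for w, c in counts.most_common() if c >= min_count]


strings = ["good vs well", "difference between good and well",
           "good vs nice", "good versus well"]
print(resolution_words(strings, "good", {"good", "well", "nice", "fine"}))
```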
The electronic equipment provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the steps of any related word recommending method.
An embodiment of the present application provides a computer readable storage medium including a computer program for causing an electronic device to execute the steps of any one of the related word recommendation methods described above when the computer program is run on the electronic device.
Embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the processor of the electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program, so that the electronic device performs the steps of any one of the related word recommendation methods described above.
The application has the following beneficial effects:
With the associated word recommendation method and device, electronic device and storage medium of the application, an input keyword is first received and a target word matching the keyword is obtained from the original words. At least one type of original associated word corresponding to the target word, together with the correlation degree between each original associated word and the target word, is obtained from a preset corpus, which meets the target object's need to query words associated with the keyword. Next, a recommendation evaluation value is obtained for each original associated word based on its association relationship with the target word and its correlation degree, and the candidate associated words whose recommendation evaluation values meet the preset recommendation condition are screened out. Finally, only the screened candidate associated words need to be displayed in the display interface, which reduces the proportion of the interface they occupy and improves information review efficiency.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a display interface in the related art;
FIG. 2 is an alternative schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a flowchart illustrating an implementation of a related word recommendation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for determining a recommendation score according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a related word screening method according to an embodiment of the present application;
FIG. 6 is a logic diagram of a related word recommendation method according to an embodiment of the present application;
FIG. 7A is a schematic diagram of a display interface according to an embodiment of the present application;
FIG. 7B is a schematic diagram of another display interface according to an embodiment of the application;
FIG. 8A is a diagram illustrating a method for splitting an original phrase according to an embodiment of the present application;
FIG. 8B is a schematic diagram of a co-occurrence word screening method according to an embodiment of the present application;
FIG. 9A is a schematic structural view of an image processing apparatus in an embodiment of the present application;
FIG. 9B is a diagram of a dictionary construction method according to an embodiment of the present application;
FIG. 9C is a diagram illustrating another dictionary construction method according to an embodiment of the present application;
FIG. 9D is a schematic diagram of another co-occurrence screening method according to an embodiment of the present application;
FIG. 10 is a diagram illustrating a method for screening near-form words according to an embodiment of the present application;
FIG. 11 is a diagram illustrating a method for screening co-translation words according to an embodiment of the application;
FIG. 12 is a diagram of a method for screening vector words according to an embodiment of the present application;
FIG. 13A is a schematic diagram of yet another dictionary construction method in an embodiment of the present application;
FIG. 13B is a diagram illustrating a method for screening resolution words according to an embodiment of the present application;
FIG. 13C is a schematic overall flow chart of a related word recommendation method according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a related word recommendation device according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a hardware composition structure of an electronic device to which the embodiment of the present application is applied;
FIG. 16 is a schematic diagram of a hardware configuration of another electronic device to which the embodiment of the present application is applied.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Original word: the smallest word-forming unit that makes up sentences and articles. It may be a Chinese character (for example, the character meaning "good"), an English word (for example, good), or a word in another language; these are not listed one by one here.
Target word: the original word that matches the keyword; that is, the target word is itself one of the original words. For example, if the original words include good, beautiful, ugly and bad, and the input keyword is good, then the target word is good.
Original associated word: a word that has an association relationship with an original word. It may be a single word, for example, fine, or a phrase, for example, well done. Original associated words of different types have different association relationships with the target word.
Corpus: a large-scale electronic text library that has been scientifically sampled and processed and that stores language material actually occurring in real language use. The original words and original associated words in the application are all taken from the corpus, and the associations between them can be established from the corpus data. In the embodiments of the application, the corpus may include an English dictionary, an English training corpus, query log data of objects, and word vector data from a trained model.
Levenshtein distance: the minimum number of editing operations needed to transform one string into another, where the allowed operations are replacing one character with another, inserting one character, and deleting one character. The Levenshtein distance can be used to measure the similarity between two strings; for example, in an embodiment of the application, the Levenshtein distance between the target word and an original word may be calculated to determine the first similarity between them.
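The distance can be computed with the standard dynamic-programming recurrence, keeping only one previous row. Converting it into a first similarity by dividing by the length of the longer string is an assumption; the source does not give the normalization, and both function names are hypothetical.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character substitutions, insertions and deletions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def first_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("good", "gold"), levenshtein("kitten", "sitting"))  # 1 3
print(first_similarity("good", "goods"))
```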
Dictionary: an abstract data structure containing a series of key-value pairs. Once the dictionary is built, a value can be found quickly from its key. In the embodiments of the application, the correspondence between original words and original associated words may be stored in a dictionary as key-value pairs, which makes finding the original associated words of the target word more efficient.
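A minimal sketch of such a dictionary in Python, keyed first by original word and then by association type; the nested layout, the `(associated word, correlation degree)` tuple format and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, float]  # (original associated word, correlation degree)


def build_association_dictionary(
        records: Iterable[Tuple[str, str, str, float]]) -> Dict[str, Dict[str, List[Entry]]]:
    """records: (original word, association type, associated word, correlation)."""
    nested: Dict[str, Dict[str, List[Entry]]] = defaultdict(lambda: defaultdict(list))
    for word, assoc_type, assoc_word, correlation in records:
        nested[word][assoc_type].append((assoc_word, correlation))
    # Convert to plain dicts so that lookups of missing keys do not insert entries.
    return {w: dict(by_type) for w, by_type in nested.items()}


dictionary = build_association_dictionary([
    ("good", "co-occurrence", "for", 0.5),
    ("good", "co-occurrence", "at", 0.4),
    ("good", "vector", "benefit", 0.9),
])
print(dictionary["good"]["co-occurrence"])  # [('for', 0.5), ('at', 0.4)]
```

Building the dictionary once offline means that, at query time, retrieving all original associated words of a target word is a single key lookup.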
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. NLP integrates linguistics, computer science and mathematics; because research in this field involves natural language, i.e., the language people use every day, it is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
It will be appreciated that in the specific embodiments of the present application, related data such as user information, query log data, etc. are involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with relevant laws and regulations and standards of relevant countries and regions.
The following briefly describes the design concept of the embodiment of the present application:
Currently, when a target object queries a keyword with a search engine, it often also needs to consult words related to that keyword. To meet this need, the related art usually displays different types of related words in the display interface for the keyword.
However, when related words are displayed in this way, each type of related word occupies its own dedicated display area, and as the number of related words grows, the proportion of the display interface taken up by these areas increases, which seriously hinders the target object from reviewing other information in the display interface.
In view of this, the embodiments of the application provide an associated word recommendation method and device, an electronic device and a storage medium. First, an input keyword is received and a target word matching the keyword is obtained from the original words; at least one type of original associated word corresponding to the target word, together with the correlation degree between each original associated word and the target word, is obtained from a preset corpus, which meets the target object's need to query words associated with the keyword. Then, a recommendation evaluation value is obtained for each original associated word based on its association relationship with the target word and its correlation degree, and the candidate associated words whose recommendation evaluation values meet the preset recommendation condition are screened out. Finally, only the screened candidate associated words are displayed in the display interface. In this way, even if the types and number of original associated words increase, the proportion of the display interface occupied by the candidate associated words stays small, their effect on other information in the interface is reduced, and the target object can quickly find the required content in the display interface.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application. The application scenario diagram includes two terminal devices 210 and a server 220.
In the embodiments of the application, the terminal device includes, but is not limited to, a mobile phone, tablet computer, notebook computer, desktop computer, e-book reader, intelligent voice interaction device, smart home appliance, vehicle-mounted terminal and the like. A client related to associated word recommendation may be installed on the terminal device; the client may be software (such as a browser or translation software), a web page, an applet, etc. The server may be a background server corresponding to that software, web page or applet, or a server dedicated to associated word recommendation; the application does not specifically limit this. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data and artificial intelligence platforms.
It should be noted that the associated word recommendation method in the embodiments of the application may be executed by an electronic device, which may be a server or a terminal device; that is, the method may be executed by the server or the terminal device alone, or by both together. For example, when the method is executed jointly, the terminal device receives an input keyword and sends it to the server. The server obtains a target word matching the keyword from the original words, and obtains, from a preset corpus, at least one type of original associated word corresponding to the target word and the correlation degree between each original associated word and the target word. It then obtains a recommendation evaluation value for each original associated word based on its association relationship with the target word and its correlation degree, screens out the candidate associated words whose recommendation evaluation values meet the preset recommendation condition, and sends the screened candidate associated words to the terminal device.
In an alternative embodiment, the communication between the terminal device and the server may be via a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network.
It should be noted that, the number of terminal devices and servers shown in fig. 2 is merely illustrative, and the number of terminal devices and servers is not limited in practice, and is not particularly limited in the embodiment of the present application.
In the embodiments of the application, when there are multiple servers, they may form a blockchain, with each server acting as a node on it; for example, the corpus used by the associated word recommendation method may be stored on the blockchain.
In addition, the embodiments of the application may be applied to various scenarios, including but not limited to natural language processing, cloud technology, artificial intelligence, intelligent transportation and assisted driving.
The related word recommendation method provided by the exemplary embodiment of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenario described above, and it should be noted that the application scenario described above is only shown for the convenience of understanding the spirit and principle of the present application, and the embodiment of the present application is not limited in any way in this respect.
Referring to fig. 3, a flowchart of an implementation of a related word recommendation method according to an embodiment of the present application is shown, taking an execution subject as a terminal device as an example, where the implementation process of the method includes steps S31 to S35 as follows:
S31: The terminal device receives an input keyword and obtains, from the original words, a target word matching the keyword;
Specifically, the keyword and the target word are the same word: for example, if the input keyword is good, the matched target word is good, and a Chinese keyword is likewise matched to the same Chinese word. It should be noted that the embodiments of the present application are mainly illustrated with Chinese and English; in fact, the associated word recommendation method of the present application may also be applied to other languages, which is not specifically limited herein.
The examples below mainly use the input keyword good.
S32: The terminal device obtains, from a preset corpus, at least one type of original associated words corresponding to the target word, and the relevance between each original associated word and the target word;
The types of original associated words include the co-occurrence type, the near-form type, the co-translation type, the vector type and the resolution type. Original associated words of different types have different association relationships with the target word:
Co-occurrence type: the original associated word is used in scenarios similar to those of the target word. For example, some words, such as for and at, are often used together with good, so for and at are both original associated words of the co-occurrence type of good (co-occurrence words for short);
Near-form type: the original associated word has a composition similar to that of the target word. For the target word good, gold, god and food are all original associated words of the near-form type of good (near-form words for short);
Vector type: the original word vector of the original associated word has a composition similar to the target word vector of the target word. This is similar to the near-form type, except that the similarity is measured on word vectors; for the target word good, benefit and food are original associated words of the vector type of good (vector words for short);
Co-translation type: the original associated word has paraphrases similar to those of the target word. For the target word good, well and best are both original associated words of the co-translation type of good (co-translation words for short);
Resolution type: the number of times the original associated word is compared with the target word meets a preset count condition, i.e., users often ask how the original associated word differs from the target word. For the target word good, well and best are also original associated words of the resolution type of good (resolution words for short).
It should be noted that the five types above are only examples. The types of original associated words are not limited to them: the original associated words obtained for a target word in the present application may be of one or more of the above types, or of other types, which is not specifically limited herein.
Furthermore, within the same type, different original associated words have different relevance to the target word. For example, among the co-occurrence words of good, for co-occurs with good with a higher probability, so the relevance of for is also higher.
S33: The terminal device obtains a recommendation evaluation value for each original associated word based on the association relationship between that word and the target word and on its relevance;
Specifically, the recommendation evaluation value of an original associated word is obtained by combining its association relationship with the target word and its relevance, so that the original associated words can then be screened by this value. The association relationship between an original associated word and the target word is determined by the word's type; for example, if the type is the co-occurrence type, the association relationship is a co-occurrence relationship. The recommendation evaluation value of an original associated word can be calculated by the following formula:
$$s_i = \sum_{j} \alpha^{d_{ij}-1} \, f(j)$$
where s_i is the recommendation evaluation value of the i-th original associated word t_i of the target word w, and the sum runs over every type j under which t_i appears; f is a type-score mapping function whose input is a type name and whose output is the type score of that type (the concrete score of each type can be set according to the actual requirements of the product); α is a discount parameter, which may be 0.8; and d_ij is the ranking position of t_i among the original associated words of type j: if t_i has the highest relevance to the target word among the original associated words of that type, its ranking position is 1; if the second highest, 2; and so on.
In this way, the original associated words of at least one type corresponding to the target word are fused. Fig. 4 is a schematic diagram of a method for determining recommendation evaluation values in an embodiment of the present application: for original associated words 1 to 5 of target word 1, recommendation evaluation values 1 to 5 are calculated respectively, and the original associated words can then be screened by the obtained values.
S34: Based on the obtained recommendation evaluation values, the terminal device screens out, from the original associated words, the candidate associated words whose recommendation evaluation values meet a preset recommendation condition;
Specifically, the preset recommendation condition may be: rank the recommendation evaluation values from high to low and keep the top three. Referring to fig. 5, which is a schematic diagram of an associated word screening method in an embodiment of the present application, and still taking target word 1 as an example, the recommendation evaluation values of original associated words 1 to 5 are 0.9, 0.4, 0.5, 0.8 and 0.7 respectively, so the screened candidate associated words are original associated words 1, 4 and 5.
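The top-three recommendation condition can be illustrated with a minimal Python sketch. The function name `select_candidates` and the short labels `word1` to `word5` are hypothetical; the scores are the fig. 5 values from the text, and keeping exactly three words is only the example condition, not the only possible one.

```python
def select_candidates(scores, k=3):
    """Return the k words with the highest recommendation evaluation values.

    scores: mapping from original associated word to its evaluation value.
    sorted() is stable (also with reverse=True), so ties keep input order.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Recommendation evaluation values of original associated words 1 to 5 (fig. 5)
scores = {"word1": 0.9, "word2": 0.4, "word3": 0.5, "word4": 0.8, "word5": 0.7}
print(select_candidates(scores))  # ['word1', 'word4', 'word5']
```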
Referring to fig. 6, which is a logic diagram of an associated word recommendation method in an embodiment of the present application, and taking the target word good as an example, the ranked original associated words of each type for good are first:
Near-form words: gold, god, food;
Vector words (word vectors): benefit, food;
Then the original associated words of all types are merged into a single list of recommended words:
The type score of near-form words is 10000, the type score of vector words is 9000, and the discount coefficient is 0.8. The ranking score (recommendation evaluation value) of each word is calculated as:
gold=1*10000=10000
god=0.8*10000=8000
food=0.8*0.8*10000+0.8*9000=13600
benefit=1*9000=9000
Sorting by ranking score gives the final recommended word sequence: food, gold, benefit, god.
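The scoring formula for s_i and the good example can be checked with a short Python sketch. It assumes each per-type list is already sorted by relevance (position 1 first); the type labels `near_form` and `vector` and the function name `fuse` are hypothetical, while the type scores 10000 and 9000 and α = 0.8 are the values given in the text.

```python
from collections import defaultdict

TYPE_SCORES = {"near_form": 10000, "vector": 9000}  # f(type) from the example
ALPHA = 0.8  # discount parameter alpha


def fuse(ranked_lists, type_scores, alpha=ALPHA):
    """Compute s_i = sum over types j of alpha**(d_ij - 1) * f(j).

    ranked_lists maps a type name to its original associated words, already
    sorted by relevance, so the list position (from 1) is d_ij.
    Returns (word, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for type_name, words in ranked_lists.items():
        f = type_scores[type_name]
        for position, word in enumerate(words, start=1):
            scores[word] += alpha ** (position - 1) * f
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = {
    "near_form": ["gold", "god", "food"],
    "vector": ["benefit", "food"],
}
for word, score in fuse(ranked, TYPE_SCORES):
    print(word, round(score))
```

A word that appears under several types (food here) accumulates a discounted score from each, which is why it ends up first even though it is ranked last among the near-form words.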
S35: The terminal device presents the screened candidate associated words in a display interface.
Specifically, referring to fig. 7A, which is a schematic diagram of a display interface in an embodiment of the present application, the candidate associated words screened for the keyword good are food, gold, benefit and god, and they can be displayed in display area 1. Compared with fig. 1, the associated word recommendation method in the embodiments of the present application fuses multiple types of recommended word data into a single ordered list, so displaying the candidate associated words occupies only one display slot and does not squeeze the display space of other information. This makes it possible to display more types of recommended word data online.
Referring to fig. 7B, which is a schematic diagram of another display interface in an embodiment of the present application, the screened candidate associated words for the keyword good may also be displayed in display area 2.
It should be noted that the display interfaces in fig. 7A and fig. 7B are only two examples of interfaces that can display candidate associated words; in practice, the candidate associated words may be displayed in any interface that needs associated word recommendation, which is not specifically limited herein.
In the embodiments of the present application, an input keyword is first received and a target word matching the keyword is obtained from the original words. At least one type of original associated words corresponding to the target word, and the relevance of each to the target word, are then obtained from a preset corpus, which meets the user's need to query associated words of the keyword. Next, the recommendation evaluation value of each original associated word is obtained based on its association relationship with the target word and its relevance, and the candidate associated words whose recommendation evaluation values meet a preset recommendation condition are screened out of the original associated words. Finally, only the screened candidate associated words need to be displayed in the display interface.
In an alternative embodiment, after the original associated words corresponding to the target word are obtained in step S32, the candidate associated words may also be screened in either of the following two ways:
Mode 1: first, a batch of ordered original associated word sequences is labeled to construct training samples; a machine learning model such as a neural network is then trained on these samples; the trained model scores the original associated words, and the words are ranked by the model's output scores to screen out the candidate associated words.
Mode 2: after the original associated words of the various types are merged, they are displayed to the user directly as candidate associated words without ranking, and the order of the candidate associated words is then adjusted according to the number of times the user actually clicks each of them.
In an alternative embodiment, in step S32, the original associated words of the co-occurrence type corresponding to the target word are obtained through the following steps S321 to S323:
S321: Obtain a plurality of sentences to be used from the corpus, and split each sentence into a plurality of corresponding original phrases;
S322: Screen out, from the obtained original phrases, the target phrases that contain the target word;
S323: Obtain the original associated words of the co-occurrence type corresponding to the target word based on the original words other than the target word in each target phrase.
Specifically, the (at least two) original words in each original phrase are adjacent in the sentence they come from. For example, if the sentence to be used is "there is an apple on the tree", splitting it yields the original phrases "the tree has", "has one" and "one apple". If the target word is "apple", the target phrase is "one apple", the other original word is "one", and the co-occurrence words of "apple" are obtained based on such other original words.
Referring to fig. 8A, which is a schematic diagram of a method for splitting original phrases in an embodiment of the present application, the corpus can be regarded as a set of sentences s1, s2, …, sn (n is the number of sentences). For each sentence si (1 ≤ i ≤ n), its word sequence is denoted wi1, wi2, …, wik (k is the number of words in si); the sentence contains k-1 bigrams, each with a frequency of 1. Traversing all sentences s1, s2, …, sn yields all bigrams in the corpus, g1, g2, …, gm (m is the number of distinct bigrams), and merging the frequencies of identical bigrams gives their frequency data c1, c2, …, cm. The co-occurrence words of the target word can then be obtained from this frequency data.
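The bigram counting step can be sketched in Python as follows, assuming the sentences have already been segmented into word lists (the segmentation itself is not shown). The function name `count_bigrams` and the three toy sentences are illustrative only.

```python
from collections import Counter


def count_bigrams(sentences):
    """Count all bigrams (pairs of adjacent words) in a segmented corpus.

    A sentence with k words contributes k - 1 bigrams, each counted once;
    the Counter merges counts of identical bigrams across sentences,
    giving the frequency data c1, ..., cm.
    """
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts


corpus = [
    ["very", "good", "food"],
    ["good", "for", "you"],
    ["very", "good", "day"],
]
bigrams = count_bigrams(corpus)
print(bigrams[("very", "good")])  # 2
```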
In an alternative embodiment, the original associated words of the co-occurrence type may be obtained in either of two ways. Mode 1: step S323 may be implemented as follows:
First, the first frequency information of each other original word is counted, where the first frequency information indicates the number of times the corresponding other original word occurs in the target phrases. Then, based on the first frequency information, the other original words whose first frequency information meets a first frequency screening condition are screened out and taken as the original associated words of the co-occurrence type corresponding to the target word.
Specifically, mode 1 targets bigram original phrases, i.e., each original phrase contains only two original words, so the frequency of a target phrase is counted as the frequency of the other original word it contains. The first frequency screening condition may be to take the three highest values of first frequency information. For example, if the other original words "there is", "one", "apple" and "fruit" have first frequency information of 15, 20, 30 and 25 respectively, the screened co-occurrence words are "one", "apple" and "fruit".
Referring to fig. 8B, which is a schematic diagram of a co-occurrence word screening method in an embodiment of the present application, for each target word w, all bigrams containing w are found: g1, g2, …, gp (p is the number of bigrams containing w). This gives the words co-occurring with w and their frequencies: (w1, c1), (w2, c2), …, (wt, ct), where t is the number of co-occurrence words of w. These are sorted from high to low frequency and truncated to the top 10 to obtain the co-occurrence words of the target word w: v1, v2, …, v10.
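The fig. 8B screening can be sketched in Python as below, assuming the bigram frequencies are available as a mapping from word pairs to counts. The function name `cooccurrence_words` and the sample counts are hypothetical; the top-10 cut-off is the value from the text.

```python
from collections import Counter


def cooccurrence_words(bigram_counts, target, top_k=10):
    """Return up to top_k words that share a bigram with `target`.

    bigram_counts maps (w1, w2) -> frequency. For every bigram containing
    the target word, the other word inherits that bigram's frequency.
    """
    freq = Counter()
    for (w1, w2), c in bigram_counts.items():
        if w1 == target and w2 != target:
            freq[w2] += c
        elif w2 == target and w1 != target:
            freq[w1] += c
    return [word for word, _ in freq.most_common(top_k)]


counts = {("very", "good"): 5, ("good", "for"): 8, ("good", "at"): 3, ("bad", "for"): 4}
print(cooccurrence_words(counts, "good", top_k=2))  # ['for', 'very']
```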
It should be noted that the above screening of co-occurrence words uses the target word w only as an example; co-occurrence words of other original words can be screened in the same way, which is not repeated here.
In this way, the original associated words of the co-occurrence type corresponding to the target word can be obtained without manual classification, which improves the efficiency of determining co-occurrence words, enriches the types of associated words, and ensures that the user can find the information needed.
Mode 2: step S323 may be implemented as follows:
First, based on each other original word, the related phrases of each target phrase are searched for among the original phrases; then, the second frequency information of each related original word (each word in the related phrases other than the other original words) is counted; finally, based on the second frequency information, the related original words whose second frequency information meets a second frequency screening condition are screened out and taken as original associated words of the co-occurrence type corresponding to the target word.
A related phrase contains the other original words of the corresponding target phrase, and the second frequency information indicates the number of times the corresponding related original word occurs in the related phrases. Mode 2 mainly targets original phrases containing more than two original words, and the original associated words of the co-occurrence type determined by mode 2 may also be called same-background words of the target word. For example, if the target word is "really" and the target phrases include "you are really good" and "sugar is really sweet", the other original words are "you", "good", "sugar" and "sweet". The related phrases of "you are really good" are "you are very good" and "you are not good", and those of "sugar is really sweet" are "sugar is not sweet" and "sugar is very sweet". The related original words in these phrases, other than the other original words, are "very" and "not"; if their second frequency information is 100 and 200 respectively, they can be screened based on this information to obtain the original associated words of the co-occurrence type.
Referring to fig. 9A, which is a schematic diagram of another method for splitting original phrases in an embodiment of the present application, take the case where each original phrase is a trigram as an example. First, the frequencies of all trigrams are counted using the English training corpus in the corpus. The corpus can be regarded as a set of sentences s1, s2, …, sn (n is the number of sentences). For each sentence si (1 ≤ i ≤ n), its word sequence is denoted wi1, wi2, …, wik (k is the number of words in si); the sentence contains k-2 trigrams, each with a frequency of 1. Traversing all sentences s1, s2, …, sn yields all trigrams in the corpus, g1, g2, …, gm (m is the number of distinct trigrams), and merging the frequencies of identical trigrams gives their frequencies c1, c2, …, cm.
Referring to fig. 9B, which is a schematic diagram of a dictionary construction method in an embodiment of the present application, all obtained trigrams are traversed. Denote the current trigram as g = (w1, w2, w3) with frequency c; (w1, w3) is used as the key, the value is a list, and (w2, c) is appended to that list. After all trigrams are traversed, a dictionary dict1 is obtained whose keys are (first word, last word) and whose values are lists of (middle word, frequency).
Referring to fig. 9C, which is a schematic diagram of another dictionary construction method in an embodiment of the present application, each key of dict1 is traversed, its list value is taken out, and a two-layer traversal is performed on the value. Specifically, the middle word in the first-layer traversal is denoted A1 with frequency a1, and the middle word in the second-layer traversal is denoted B1 (B1 ≠ A1) with frequency b1. In a new dictionary, (B1, a1 + b1) is added under key A1 and (A1, a1 + b1) is added under key B1, with frequencies for the same word pair accumulated; each unordered pair of middle words is processed once, as in the worked example below. After all data are traversed, dict2 is constructed, which stores the same-background words of each original word and their frequencies.
Referring to fig. 9D, which is a schematic diagram of another co-occurrence word screening method in an embodiment of the present application, dict2 is traversed: each target word w has a series of original associated words w1, w2, …, wk (k is the number of same-background words of w) with frequencies c1, c2, …, ck. All of them are sorted by frequency from high to low and truncated to the top 10 to obtain the same-background words of the target word w: v1, v2, …, v10.
In this way, another kind of co-occurrence-type original associated word of the target word, the same-background word, can be obtained without manual classification, which improves the efficiency of determining same-background words, enriches the types of associated words, and ensures that the user can find the information needed.
The following describes in detail how same-background words are determined, using trigrams such as "very good food" and "very bad food" as examples. Here the other original words are the first and last words of a trigram; good and bad share the same other original words, so good and bad are same-background words of each other. The trigrams and their frequencies are: very good food 100, very bad food 20, very good dog 90, very bad dog 80, very nice dog 70.
dict1 uses "(first word, last word)" as the key and a list of "(middle word, frequency)" as the value. The constructed dict1 is:
k1:(very,food)
v1:[(good,100),(bad,20)]
k2:(very,dog)
v2:[(good,90),(bad,80),(nice,70)]
dict2 uses the "middle word" as the key and a list of "(same-background word, score)" as the value, where the score is the sum of the frequencies. The constructed dict2 is:
k1:good
v1:[(bad,290),(nice,160)]
k2:bad
v2:[(good,290),(nice,150)]
k3:nice
v3:[(good,160),(bad,150)]
Taking good as an example, first traverse the first entry of dict1:
k1:(very,food)
v1:[(good,100),(bad,20)]
Perform a two-layer traversal on the value:
First layer: good, bad
Second layer: good, bad
At this point, the data for good in dict2 are:
[(bad,120)]
Then traverse the second entry of dict1:
k2:(very,dog)
v2:[(good,90),(bad,80),(nice,70)]
Perform a two-layer traversal on the value:
First layer: good, bad, nice
Second layer: good, bad, nice
At this point, the data for good in dict2 are [(bad, 290), (nice, 160)], i.e., bad and nice are both same-background words of good.
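The dict1/dict2 construction and the top-10 truncation can be reproduced with a Python sketch that uses the trigram frequencies of the worked example. It assumes each unordered pair of middle words under one (first, last) key is visited once and credited in both directions, which is what yields the example's totals; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_dict1(trigram_counts):
    """dict1: (first word, last word) -> [(middle word, frequency), ...]."""
    dict1 = defaultdict(list)
    for (w1, w2, w3), c in trigram_counts.items():
        dict1[(w1, w3)].append((w2, c))
    return dict1


def build_dict2(dict1):
    """dict2: middle word -> {same-background word: summed frequency}."""
    dict2 = defaultdict(lambda: defaultdict(int))
    for middles in dict1.values():
        # Each unordered pair (A1, a1), (B1, b1) is visited exactly once
        for (a, fa), (b, fb) in combinations(middles, 2):
            if a == b:
                continue
            dict2[a][b] += fa + fb
            dict2[b][a] += fa + fb
    return dict2


def same_background_words(dict2, target, top_k=10):
    ranked = sorted(dict2[target].items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


trigrams = {
    ("very", "good", "food"): 100,
    ("very", "bad", "food"): 20,
    ("very", "good", "dog"): 90,
    ("very", "bad", "dog"): 80,
    ("very", "nice", "dog"): 70,
}
dict2 = build_dict2(build_dict1(trigrams))
print(same_background_words(dict2, "good"))  # [('bad', 290), ('nice', 160)]
```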
In an alternative embodiment, the original associated words of the near-form type may be obtained through steps S41 to S42:
S41: Obtain the first similarity between the target word and each original word in the corpus;
S42: Based on the obtained first similarities, screen out the original words whose first similarity meets a first similarity condition, and take them as the original associated words of the near-form type corresponding to the target word.
Specifically, the first similarity may be obtained from the Levenshtein distance between the target word and each original word, where a smaller distance means a higher similarity. For example, for the target word good, the first similarity with the original word god is 0.7, with gold is 0.5, and with food is 0.8; if the first similarity condition is to take the two highest first similarities, then god and food are the original associated words of the near-form type of good.
Referring to fig. 10, which is a schematic diagram of a method for screening near-form words in an embodiment of the present application, for each target word w, the original words wj of the dictionary (1 ≤ j ≤ n, wj ≠ w) are traversed and the Levenshtein distance dj between w and each wj is calculated. All original words are sorted by the distances d1, …, dn in ascending order, and the top 10 are taken as the near-form words v1, v2, …, v10 of the target word w.
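A minimal Python sketch of the fig. 10 screening follows, using the standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1). The function names and the small vocabulary are illustrative; ties in distance keep vocabulary order because Python's sort is stable.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def near_form_words(target, vocabulary, top_k=10):
    """Rank vocabulary words by ascending edit distance to the target."""
    candidates = [w for w in vocabulary if w != target]
    candidates.sort(key=lambda w: levenshtein(target, w))
    return candidates[:top_k]


vocab = ["good", "god", "gold", "food", "apple", "goods"]
print(near_form_words("good", vocab, top_k=3))  # ['god', 'gold', 'food']
```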
In this way, the original associated words of the near-form type corresponding to the target word can be obtained without manual classification, which improves the efficiency of determining near-form words, enriches the types of associated words, and ensures that the user can find the information needed.
In an alternative embodiment, the original associated words of the co-translation type may be obtained through steps S51 to S53:
S51: Obtain a plurality of paraphrases of the target word and a plurality of paraphrases of each original word from the corpus;
S52: Determine the paraphrase set corresponding to each original word based on the paraphrases of the target word and the paraphrases of that original word;
S53: Based on the number of identical paraphrases contained in each paraphrase set, screen out the original words whose number of identical paraphrases meets a preset paraphrase condition, and take them as the original associated words of the co-translation type corresponding to the target word.
Specifically, the dictionary in the corpus contains terms and paraphrases, and one term may have several paraphrases. After the paraphrases of the target word and of each original word are obtained, the paraphrase set of an original word contains the paraphrases it shares with the target word. The more identical paraphrases an original word's paraphrase set contains, the closer the meaning of that original word is to the target word, so the n original words closest in meaning to the target word can be screened out through the preset paraphrase condition and used as the original associated words of the co-translation type of the target word.
For example, the paraphrases of target word 2 are paraphrases 1, 2, 3 and 4; those of original word 2 are paraphrases 1, 2, 5 and 6; and those of original word 3 are paraphrases 1, 2, 3 and 7. The paraphrase set of original word 2 is then {paraphrase 1, paraphrase 2}, containing 2 paraphrases, and that of original word 3 is {paraphrase 1, paraphrase 2, paraphrase 3}, containing 3. If the preset paraphrase condition is to take the original word with the most identical paraphrases, original word 3 is the original associated word of the co-translation type of target word 2.
Referring to fig. 11, which is a schematic diagram of a method for screening co-translation words in an embodiment of the present application, let s be the paraphrase set of the target word w and n the number of terms in the dictionary. The terms wi of the dictionary (1 ≤ i ≤ n, wi ≠ w) are traversed, the paraphrase set si of each term is taken out, and the intersection ti of s and si is computed. All original words are sorted by the size of ti in descending order, and the top 10 are taken as the co-translation words v1, v2, …, v10 of the target word w.
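The fig. 11 procedure amounts to ranking terms by the size of a set intersection. The Python sketch below assumes paraphrases are stored as a mapping from each term to a set of paraphrase identifiers; the function name and the labels `p1` to `p7` are hypothetical stand-ins for the example's paraphrases 1 to 7.

```python
def co_translation_words(target, paraphrases, top_k=10):
    """Rank terms by how many paraphrases they share with the target.

    paraphrases maps each dictionary term to the set of its paraphrases.
    Returns (term, shared count) pairs, largest intersection first.
    """
    s = paraphrases[target]
    overlaps = [
        (term, len(s & si))  # |t_i| where t_i = s ∩ s_i
        for term, si in paraphrases.items()
        if term != target
    ]
    overlaps.sort(key=lambda item: item[1], reverse=True)
    return overlaps[:top_k]


paraphrases = {
    "target word 2": {"p1", "p2", "p3", "p4"},
    "original word 2": {"p1", "p2", "p5", "p6"},
    "original word 3": {"p1", "p2", "p3", "p7"},
}
print(co_translation_words("target word 2", paraphrases, top_k=1))
# [('original word 3', 3)]
```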
In this way, the original associated words of the co-translation type corresponding to the target word can be obtained without manual classification, which improves the efficiency of determining co-translation words, enriches the types of associated words, and ensures that the user can find the information needed.
In an alternative embodiment, the original associated words of the vector type may be obtained through steps S61 to S63:
S61: Obtain the target word vector and each original word vector from the corpus;
S62: Determine the second similarity between the target word vector and each original word vector;
S63: Based on the determined second similarities, screen out the original words whose second similarity meets a second similarity condition, and take them as the original associated words of the vector type corresponding to the target word.
Specifically, the second similarity between the target word vector and an original word vector may be determined by the Euclidean distance, by cosine similarity, or by another similarity measure, which is not specifically limited herein.
Referring to fig. 12, which is a schematic diagram of a vector word screening method in an embodiment of the present application, for each target word w, the word vector x of w is first extracted from the trained word vector data in the corpus. Each original word wi in the dictionary (1 ≤ i ≤ n, wi ≠ w, where n is the number of words in the dictionary) is then traversed, and its word vector xi is likewise extracted from the word vector data. The Euclidean distance di between x and xi is calculated, all original words are sorted by Euclidean distance in ascending order, and the top 10 are taken as the original associated words of the vector type of the target word w: v1, v2, …, v10.
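The fig. 12 nearest-neighbour ranking can be sketched in Python with `math.dist` (Python 3.8+), which returns the Euclidean distance between two points. The three-dimensional vectors below are made up purely for illustration and are not trained embeddings; in practice the vectors would come from the corpus's word vector data.

```python
import math


def vector_words(target, vectors, top_k=10):
    """Rank words by ascending Euclidean distance to the target's vector."""
    x = vectors[target]
    others = [w for w in vectors if w != target]
    others.sort(key=lambda w: math.dist(x, vectors[w]))
    return others[:top_k]


# Toy vectors, invented only to show the ranking mechanics
toy_vectors = {
    "good": [0.9, 0.1, 0.3],
    "benefit": [0.8, 0.2, 0.3],
    "food": [0.7, 0.1, 0.5],
    "table": [0.0, 0.9, 0.1],
}
print(vector_words("good", toy_vectors, top_k=2))  # ['benefit', 'food']
```

Swapping in cosine similarity, which the text also allows, would mean sorting by descending similarity instead of ascending distance.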
In this way, comparing the word vector of the target word with that of each original word compares their composition in the vector dimension, which completes the search for words similar to the target word without manual classification, improves the efficiency of determining vector words, enriches the types of associated words, and ensures that the user can find the information needed.
In an alternative embodiment, the original associated words of the resolution type may be obtained through steps S71 to S73:
S71: Obtain a plurality of query log records from the corpus, each containing an input query string;
S72: Based on a pre-constructed regular expression, screen out, from the query strings, the target strings that conform to a preset syntax rule;
S73: Obtain the original associated words of the resolution type corresponding to the target word based on the original words other than the target word in each target string.
Specifically, a target string contains the target word and is screened out by the regular expression. Its structure is "the difference between A and B", where one of A and B is the target word and the other is an original word other than the target word. The original words other than the target word are then screened to obtain the original associated words of the resolution type corresponding to the target word.
For example, taking "good" as the target word, the screened target strings include "difference between good and well" and "difference between good and best", so "well" and "best" are determined as original associated words of the resolution type corresponding to "good".
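Steps S71-S73 can be illustrated with a short Python sketch. It assumes an English rendering of the syntax rule ("difference between A and B", optionally prefixed by "what is the"); the actual regular expression, the helper name `resolution_candidates`, and the sample queries are hypothetical.

```python
import re

# Hypothetical English rendering of the preset syntax rule
# ("difference between A and B"); the real pattern is not disclosed.
PATTERN = re.compile(
    r"^(?:what is the )?difference between (\w+) and (\w+)\??$", re.IGNORECASE
)


def resolution_candidates(target, query_strings):
    """Return the word compared with `target` in every conforming query string."""
    others = []
    for q in query_strings:
        m = PATTERN.match(q.strip())
        if not m:
            continue  # the query string does not conform to the syntax rule
        a, b = m.group(1).lower(), m.group(2).lower()
        if a == target and b != target:
            others.append(b)
        elif b == target and a != target:
            others.append(a)
    return others


queries = [
    "difference between good and well",
    "difference between best and good",
    "how to cook rice",
    "What is the difference between good and well?",
]
print(resolution_candidates("good", queries))  # ['well', 'best', 'well']
```

The target word may appear in either slot, so both A and B are checked; repeated words are kept because S731 counts them.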
In an alternative embodiment, step S73 may be implemented as the following steps S731-S732:
S731: counting respective third frequency information of original words except the target word in each target character string;
S732: based on the third frequency information, screening out, from the original words other than the target word in the target strings, the original words whose third frequency information meets a third frequency screening condition, and taking them as the original associated words of the resolution type corresponding to the target word.
The third frequency information indicates the number of occurrences of the corresponding original word across the target strings. For example, if the third frequency information of "well" is 20, that of "best" is 15, and the third frequency screening condition is to take the word with the highest third frequency information, then "well" is the resolution word of "good".
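A minimal Python sketch of S731-S732 for a single target word, assuming the non-target words have already been extracted from the target strings. The function name is hypothetical, and `top_n=1` merely mirrors the "take the highest" condition in the example above.

```python
from collections import Counter


def screen_by_third_frequency(other_words, top_n=1):
    """Count third frequency information and keep the top_n most frequent words.

    other_words lists every non-target word found in the target strings;
    top_n=1 mirrors the "take the highest" screening condition in the example.
    """
    third_frequency = Counter(other_words)
    return [w for w, _ in third_frequency.most_common(top_n)]


# The document's example: "well" occurs 20 times and "best" 15 times
other_words = ["well"] * 20 + ["best"] * 15
print(screen_by_third_frequency(other_words))  # ['well']
```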
Referring to fig. 13A, which is a schematic diagram of another dictionary construction method in an embodiment of the present application, query strings matching the pattern "difference between xxx and yyy" are screened out of the query logs by means of a regular expression: q1, q2, …, qn (n is the number of query strings matching the pattern), and the query frequencies c1, c2, …, cn of these query strings are obtained by counting. By parsing the two words xi and yi in each qi (1 ≤ i ≤ n), a dictionary is constructed in which each key is an original word and each value is a series of original associated words together with their frequency information.
Referring to fig. 13B, which is a schematic diagram of a resolution word screening method according to an embodiment of the present application, the target word w corresponds in the dictionary to a series of original associated words w1, w2, …, wk (k is the number of original associated words of w) with frequency information c1, c2, …, ck. The words are sorted by frequency information in descending order and the top 10 are taken, giving the resolution words corresponding to the target word w: v1, v2, …, v10.
In this way, the original associated words of the resolution type corresponding to the target word can be obtained without manual classification, which increases the efficiency of determining resolution words, enriches the types of associated words, and helps ensure that the user can find the information needed.
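The dictionary construction of fig. 13A and the top-10 screening of fig. 13B might look like the following Python sketch. Several parts are assumptions, not taken from the document: the English pattern, the helper names, the sample query log, and recording each comparison in both directions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical English form of the "difference between xxx and yyy" pattern.
PATTERN = re.compile(r"difference between (\w+) and (\w+)", re.IGNORECASE)


def build_resolution_dictionary(query_log):
    """query_log maps each query string q_i to its query frequency c_i.

    Returns a dict: key = original word, value = Counter of the words it was
    compared with, weighted by query frequency.
    """
    dictionary = defaultdict(Counter)
    for q, c in query_log.items():
        m = PATTERN.fullmatch(q.strip())
        if not m:
            continue  # not of the form "difference between xxx and yyy"
        x, y = m.group(1).lower(), m.group(2).lower()
        if x == y:
            continue
        # Assumption: the comparison relation is recorded in both directions.
        dictionary[x][y] += c
        dictionary[y][x] += c
    return dictionary


def resolution_words(dictionary, target, top_k=10):
    """Sort the target word's entries by frequency and keep the top_k."""
    return [w for w, _ in dictionary.get(target, Counter()).most_common(top_k)]


log = {
    "difference between good and well": 20,
    "difference between good and best": 15,
    "difference between well and best": 3,
    "weather today": 100,
}
d = build_resolution_dictionary(log)
print(resolution_words(d, "good"))  # ['well', 'best']
```

Building the dictionary once offline means each target word's resolution words are a single lookup at query time.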
In the embodiments of the present application, six types of recommended words can be mined, and a fusion ranking algorithm merges all types of recommended words into ordered recommended-word sequence data. The final data therefore contains comprehensive recommended words of all types while occupying only one display slot. It can be provided to products on both mobile terminals and personal computer (PC) terminals, and a specific product can take the top 5 or top 10 recommended words according to its actual space constraints.
Referring to fig. 13C, an overall flowchart of the related word recommendation method in an embodiment of the present application is shown. It consists of two parts: data mining and fusion ranking. Data mining extracts recommended words (original associated words) of all types from the original corpus. Fusion ranking merges the recommended words of all types, calculates a ranking score for each word based on factors such as its type and its rank position within its original type, and finally sorts all recommended words by ranking score to form the required recommended-word sequence.
This addresses the problem in the prior art that the six classes of recommended words could not all be updated and put online. The fusion ranking method of the present application measures the importance of each recommended word through a ranking score, obtains the required ordered recommended-word data from the scoring results, and thereby fuses multiple types of recommended words. Because the fused and ranked data occupies only one product slot, it meets the practical need to update and put online the existing multi-class recommended-word data.
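The document names the factors in the ranking score (word type and rank position within the type) but not the formula. The Python sketch below therefore uses an assumed scheme: a hypothetical per-type weight divided by (position + 1), summed over every type list a word appears in. The type keys, weights, and sample words are illustrative only.

```python
from collections import defaultdict

# Hypothetical per-type weights; the document names the factors (type and
# in-type rank position) but not the actual scoring formula or values.
TYPE_WEIGHTS = {
    "co_occurrence": 1.0,
    "co_translation": 0.9,
    "vector": 0.8,
    "resolution": 0.7,
    "near_shape": 0.6,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for any type not listed above


def fuse_and_rank(candidates_by_type, top_n=10):
    """Fuse per-type ranked lists into one ordered recommended-word sequence.

    candidates_by_type maps a type name to that type's words, best first.
    A word's ranking score is the sum, over every list it appears in, of
    type_weight / (position + 1).
    """
    scores = defaultdict(float)
    for word_type, words in candidates_by_type.items():
        weight = TYPE_WEIGHTS.get(word_type, DEFAULT_WEIGHT)
        for position, word in enumerate(words):
            scores[word] += weight / (position + 1)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


candidates = {
    "co_occurrence": ["morning", "luck"],
    "resolution": ["well", "best"],
    "vector": ["great", "well"],
}
print(fuse_and_rank(candidates, top_n=3))  # ['well', 'morning', 'great']
```

Summing across lists rewards words that several mining methods agree on ("well" scores 0.7 + 0.4), and `top_n` plays the role of a product taking the top 5 or top 10.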
Based on the same inventive concept, an embodiment of the present application further provides a related-word recommendation device. Fig. 14 is a schematic structural diagram of the related-word recommendation device 1400, which may include:
a receiving unit 1401, configured to receive an input keyword and obtain, from the original words, a target word matching the keyword;
an obtaining unit 1402, configured to obtain, from a preset corpus, at least one type of each original associated word corresponding to a target word, and a degree of correlation between each original associated word and the target word, where the original associated words of different types have different association relationships with the target word;
a determining unit 1403, configured to obtain a recommendation evaluation value for each original associated word based on the association relationship between that original associated word and the target word and on its degree of correlation;
a screening unit 1404, configured to screen out from the original associated words, based on the obtained recommendation evaluation values, candidate related words whose recommendation evaluation values meet a preset recommendation condition;
and a presenting unit 1405, configured to present the screened candidate related words in the presentation interface.
Optionally, the at least one type includes a co-occurrence type, which characterizes that the corresponding original associated word is used in scenarios similar to those of the target word; the obtaining unit 1402 is specifically configured to:
based on the corpus, obtaining a plurality of sentences to be used, and splitting each sentence to be used into a plurality of corresponding original phrases, where each original phrase contains at least two original words that are adjacent in the sentence to which they belong;
screening out, from the obtained original phrases, each target phrase containing the target word;
and obtaining each original associated word of the co-occurrence type corresponding to the target word based on the original words other than the target word in each target phrase.
Optionally, the obtaining unit 1402 is specifically configured to:
counting first frequency information of each of the other original words, where the first frequency information represents the number of occurrences of the corresponding other original word in the target phrases;
and screening out, based on the first frequency information, the other original words whose first frequency information meets a first frequency screening condition, and taking them as the original associated words of the co-occurrence type.
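A minimal Python sketch of the co-occurrence mining above, assuming whitespace-tokenized sentences and two-word original phrases (bigrams); Chinese text would first need a word segmenter. The helper names, the `min_count` threshold standing in for the first frequency screening condition, and the sample sentences are hypothetical.

```python
from collections import Counter


def split_into_phrases(sentence):
    """Split a sentence into original phrases of two adjacent words (bigrams).

    Whitespace tokenization is an assumption; Chinese text would first need
    a word segmenter.
    """
    words = sentence.split()
    return [tuple(words[i:i + 2]) for i in range(len(words) - 1)]


def co_occurrence_words(target, sentences, min_count=2):
    """Return other words whose first frequency is at least min_count."""
    first_frequency = Counter()
    for sentence in sentences:
        for phrase in split_into_phrases(sentence):
            if target in phrase:  # a target phrase
                first_frequency.update(w for w in phrase if w != target)
    return [w for w, c in first_frequency.most_common() if c >= min_count]


sentences = ["good morning everyone", "good morning to you", "good luck today"]
print(co_occurrence_words("good", sentences))  # ['morning']
```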
Optionally, the obtaining unit 1402 is specifically configured to:
searching the original phrases, based on each of the other original words, for the related phrases of each target phrase, where a related phrase contains an other original word of the corresponding target phrase;
counting second frequency information of each related original word, other than the other original words, in the related phrases, where the second frequency information represents the number of occurrences of the corresponding related original word in the related phrases;
and screening out, based on the second frequency information, the related original words whose second frequency information meets a second frequency screening condition, and taking them as the original associated words of the co-occurrence type.
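The second-order variant above can be sketched in Python under the same bigram assumption. Excluding the target word itself from the second frequency count is an additional assumption (the related phrases include the target phrases), and `min_count` is a hypothetical stand-in for the second frequency screening condition.

```python
from collections import Counter


def bigrams(sentence):
    """Original phrases as pairs of adjacent words (whitespace tokens assumed)."""
    words = sentence.split()
    return [tuple(words[i:i + 2]) for i in range(len(words) - 1)]


def second_order_words(target, sentences, min_count=2):
    """Second-order co-occurrence: words that co-occur with the target's neighbours."""
    phrases = [p for s in sentences for p in bigrams(s)]
    # "Other original words": non-target words in the target phrases.
    other_words = {w for p in phrases if target in p for w in p if w != target}
    second_frequency = Counter()
    for p in phrases:
        if not other_words.intersection(p):
            continue  # not a related phrase
        for w in p:
            # Count related original words; excluding the target word is an assumption.
            if w not in other_words and w != target:
                second_frequency[w] += 1
    return [w for w, c in second_frequency.most_common() if c >= min_count]


sentences = [
    "good morning everyone",
    "good morning to you",
    "morning coffee please",
    "morning coffee now",
]
print(second_order_words("good", sentences))  # ['coffee']
```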
Optionally, the at least one type includes a near-shape type, which characterizes that the corresponding original associated word is similar to the target word in composition structure; the obtaining unit 1402 is specifically configured to:
obtaining a first similarity between the target word and each original word in the corpus;
and screening out, based on the obtained first similarities, the original words whose first similarity meets a first similarity condition, and taking them as the original associated words of the near-shape type corresponding to the target word.
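The document does not specify how the first similarity is computed. The Python sketch below substitutes character-level matching via the standard library's `difflib.SequenceMatcher.ratio()` as an assumed stand-in. For Chinese, a radical- or stroke-level comparison would likely be more appropriate. The threshold of 0.75 and the function name are hypothetical.

```python
from difflib import SequenceMatcher


def near_shape_words(target, vocabulary, threshold=0.75):
    """Return words whose first similarity to `target` is at least threshold.

    SequenceMatcher.ratio() (character-level matching) stands in for the
    unspecified composition-structure similarity; it is an assumption.
    """
    scored = []
    for w in vocabulary:
        if w == target:
            continue
        first_similarity = SequenceMatcher(None, target, w).ratio()
        if first_similarity >= threshold:
            scored.append((first_similarity, w))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
    return [w for _, w in scored]


print(near_shape_words("form", ["forms", "farm", "bread", "form"]))  # ['forms', 'farm']
```

Here "forms" scores 8/9 and "farm" exactly 0.75, while "bread" shares only one character and is filtered out.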
Optionally, the at least one type includes a co-translation type, which characterizes that the corresponding original associated word has paraphrases similar to those of the target word; the obtaining unit 1402 is specifically configured to:
obtaining a plurality of paraphrases of the target word and a plurality of paraphrases of each original word from the corpus;
determining a paraphrase set for each original word based on the paraphrases of the target word and the paraphrases of that original word, where each paraphrase set contains the paraphrases that are identical between the corresponding original word and the target word;
and screening out, based on the number of identical paraphrases contained in each paraphrase set, the original words whose number of identical paraphrases meets a preset paraphrase condition, and taking them as the original associated words of the co-translation type corresponding to the target word.
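The co-translation screening reduces to a set intersection per word, as in this Python sketch. The `paraphrases` dictionary, its sample entries, and `min_shared` (standing in for the preset paraphrase condition) are hypothetical.

```python
def co_translation_words(target, paraphrases, min_shared=2):
    """Return words sharing at least min_shared identical paraphrases with target.

    paraphrases is a hypothetical dict mapping each word to its list of
    paraphrases (definitions) taken from the corpus.
    """
    target_defs = set(paraphrases[target])
    scored = []
    for word, defs in paraphrases.items():
        if word == target:
            continue
        paraphrase_set = target_defs & set(defs)  # identical paraphrases only
        if len(paraphrase_set) >= min_shared:  # preset paraphrase condition
            scored.append((len(paraphrase_set), word))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most shared paraphrases first
    return [w for _, w in scored]


paraphrases = {
    "big": ["large", "of great size", "important"],
    "large": ["of great size", "large", "extensive"],
    "huge": ["of great size", "enormous"],
    "major": ["important", "main"],
}
print(co_translation_words("big", paraphrases))  # ['large']
```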
Optionally, the at least one type includes a vector type, which characterizes that the original word vector of the corresponding original associated word is similar to the target word vector of the target word; the obtaining unit 1402 is specifically configured to:
obtaining the target word vector and each original word vector from the corpus;
determining a second similarity between the target word vector and each original word vector;
and screening out, based on the determined second similarities, the original words whose second similarity meets a second similarity condition, and taking them as the original associated words of the vector type corresponding to the target word.
Optionally, the at least one type includes a resolution type, which characterizes that the number of times the corresponding original associated word has been compared with the target word meets a preset count; the obtaining unit 1402 is specifically configured to:
obtaining a plurality of pieces of query log data from the corpus, where each piece of query log data includes an input query string;
screening out, from the query strings and based on a pre-constructed regular expression, each target string that conforms to a preset syntax rule, where the target string contains the target word;
and obtaining each original associated word of the resolution type corresponding to the target word based on the original words other than the target word in each target string.
Optionally, the obtaining unit 1402 is specifically configured to:
counting third frequency information of each original word other than the target word in each target string, where the third frequency information represents the number of occurrences of the corresponding original word in the target strings;
and screening out, based on the third frequency information, the original words whose third frequency information meets a third frequency screening condition from the original words other than the target word in the target strings, and taking them as the original associated words of the resolution type corresponding to the target word.
For convenience of description, the above parts are described as modules (or units) divided by function. Of course, when the present application is implemented, the functions of the modules (or units) may be implemented in one or more pieces of software or hardware.
Those skilled in the art will appreciate that aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to herein as a "circuit", "module", or "system".
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. In one embodiment, the electronic device may be a server, such as the server shown in FIG. 2. In this embodiment, the structure of the electronic device may include a memory 1501, a communication module 1503, and one or more processors 1502 as shown in fig. 15.
A memory 1501 for storing computer programs executed by the processor 1502. The memory 1501 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant communication function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1501 may be a volatile memory, such as a random-access memory (RAM); the memory 1501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1501 may be any other medium that can carry or store a desired computer program in the form of instructions or data structures and can be accessed by a computer, without being limited thereto. The memory 1501 may also be a combination of the above memories.
The processor 1502 may include one or more central processing units (CPUs), digital processing units, or the like. The processor 1502 is configured to implement the above related-word recommendation method when calling the computer program stored in the memory 1501.
The communication module 1503 is used for communicating with the terminal device and other servers.
The specific connection medium between the memory 1501, the communication module 1503, and the processor 1502 is not limited in the embodiments of the present application. In fig. 15, the memory 1501 and the processor 1502 are connected by a bus 1504, drawn as a bold line; the connections between the other components are shown only by way of illustration and not limitation. The bus 1504 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is drawn in fig. 15, but this does not mean that there is only one bus or only one type of bus.
The memory 1501 is a computer storage medium in which computer-executable instructions for implementing the related-word recommendation method of the embodiments of the present application are stored. The processor 1502 is configured to perform the related-word recommendation method described above, as shown in fig. 3.
In another embodiment, the electronic device may also be other electronic devices, such as the terminal device shown in fig. 1. In this embodiment, the structure of the electronic device may include, as shown in fig. 16: communication component 1610, memory 1620, display unit 1630, camera 1640, sensor 1650, audio circuitry 1660, bluetooth module 1670, processor 1680, and the like.
The communication component 1610 is used for communicating with a server. In some embodiments, it may include a Wireless Fidelity (WiFi) module; WiFi is a short-range wireless transmission technology, and through the WiFi module the electronic device can help the user send and receive information.
The memory 1620 may be used to store software programs and data. The processor 1680 performs the various functions and data processing of the terminal device by running the software programs or data stored in the memory 1620. The memory 1620 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The memory 1620 stores an operating system that enables the terminal device to run, and may also store various application programs as well as a computer program for executing the related-word recommendation method of the embodiments of the present application.
The display unit 1630 may be used to display information input by the user or information provided to the user, as well as the graphical user interface (GUI) of the various menus of the terminal device. Specifically, the display unit 1630 may include a display screen 1632 disposed on the front of the terminal device. The display screen 1632 may be configured as a liquid crystal display, a light-emitting diode display, or the like. The display unit 1630 may be used to display the presentation interface and the like in the embodiments of the present application.
The display unit 1630 may also be used to receive input numeric or character information and to generate signal inputs related to user settings and function control of the terminal device. Specifically, the display unit 1630 may include a touch screen 1631 disposed on the front of the terminal device, which can collect the user's touch operations on or near it, such as tapping buttons or dragging scroll boxes.
The touch screen 1631 may cover the display screen 1632, or the touch screen 1631 and the display screen 1632 may be integrated to implement the input and output functions of the terminal device; after integration, they may be referred to simply as a touch display screen. The display unit 1630 may display the application programs and corresponding operation steps in the present application.
The camera 1640 may be used to capture still images, and the user may post comments on images captured by the camera 1640 through an application. There may be one or more cameras 1640. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the processor 1680 to be converted into a digital image signal.
The terminal device may further include at least one sensor 1650, such as an acceleration sensor 1651, a distance sensor 1652, a fingerprint sensor 1653, a temperature sensor 1654. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
The audio circuit 1660, the speaker 1661, and the microphone 1662 may provide an audio interface between the user and the terminal device. The audio circuit 1660 may convert received audio data into an electrical signal and transmit it to the speaker 1661, which converts it into a sound signal for output. The terminal device may also be configured with a volume button for adjusting the volume of the sound signal. Conversely, the microphone 1662 converts collected sound signals into electrical signals, which the audio circuit 1660 receives and converts into audio data; the audio data is then output to the communication component 1610 to be sent to, for example, another terminal device, or output to the memory 1620 for further processing.
The bluetooth module 1670 is used to exchange information with other bluetooth devices having bluetooth modules through bluetooth protocols. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) that also has a bluetooth module through bluetooth module 1670, thereby performing data interaction.
The processor 1680 is the control center of the terminal device. It connects the various parts of the entire terminal through various interfaces and lines, and performs the various functions and data processing of the terminal device by running or executing the software programs stored in the memory 1620 and calling the data stored in the memory 1620. In some embodiments, the processor 1680 may include one or more processing units; the processor 1680 may also integrate an application processor, which mainly handles the operating system, user interface, applications, and the like, and a baseband processor, which mainly handles wireless communication. It will be appreciated that the baseband processor may alternatively not be integrated into the processor 1680. The processor 1680 of the present application may run the operating system, applications, user interface display, and touch response, as well as the related-word recommendation method of the embodiments of the present application. In addition, the processor 1680 is coupled to the display unit 1630.
In some possible embodiments, aspects of the related-word recommendation method provided by the present application may also be implemented in the form of a program product, which includes a computer program for causing an electronic device to perform the steps of the related-word recommendation method according to the various exemplary embodiments of the present application described above when the program product is run on the electronic device, for example, the electronic device may perform the steps as shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a portable compact disc read only memory (CD-ROM) and comprise a computer program and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer program may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. Where a remote electronic device is involved, it may be connected to the user's electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external electronic device (for example, through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program commands may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the commands executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program commands may also be stored in a computer readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the commands stored in the computer readable memory produce an article of manufacture including command means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A related-word recommendation method, the method comprising:
receiving input keywords, and obtaining target words matched with the keywords from the original words;
acquiring, from a preset corpus, each original associated word of at least one type corresponding to the target word and the degree of correlation between each original associated word and the target word, wherein different types of original associated words have different association relationships with the target word;
acquiring respective recommendation evaluation values of the original associated words based on the association relation between the original associated words and the target words and the respective correlation degree;
screening candidate associated words with corresponding recommended evaluation values meeting preset recommendation conditions from the original associated words based on the obtained recommended evaluation values;
and presenting the screened candidate associated words in the display interface.
2. The method of claim 1, wherein the at least one type comprises a co-occurrence type, the co-occurrence type characterizing that the corresponding original associated word is used in scenarios similar to those of the target word;
the obtaining each original associated word of at least one type corresponding to the target word from a preset corpus comprises:
based on the corpus, obtaining a plurality of sentences to be used, and splitting each sentence to be used into a plurality of corresponding original phrases, wherein each original phrase contains at least two original words that are adjacent in the sentence to which they belong;
screening out each target phrase containing the target word from each obtained original phrase;
and obtaining each original associated word of the co-occurrence type corresponding to the target word based on the other original words, except the target word, in each target phrase.
3. The method of claim 2, wherein the obtaining each original associated word of the co-occurrence type corresponding to the target word based on the other original words, except the target word, in each target phrase comprises:
counting first frequency information of each other original word, wherein the first frequency information represents: the number of occurrences of the corresponding other original words in the target phrases;
And screening out other original words corresponding to the first frequency information according with a first frequency screening condition from the other original words based on the first frequency information, and taking the other original words corresponding to the target word as original associated words with co-occurrence types.
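The first frequency information of claim 3 is essentially a count of each other original word across the target phrases. A sketch, assuming the first frequency screening condition is a hypothetical minimum count `min_count`:

```python
from collections import Counter


def first_frequency(target_phrases, target):
    """First frequency information: occurrences of each other word in the target phrases."""
    return Counter(w for phrase in target_phrases for w in phrase if w != target)


def co_occurrence_words(target_phrases, target, min_count=2):
    """Assumed first frequency screening condition: count >= min_count."""
    freq = first_frequency(target_phrases, target)
    # most_common() orders words from most to least frequent.
    return [w for w, c in freq.most_common() if c >= min_count]
```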
4. The method of claim 2, wherein the obtaining of the original associated words of the co-occurrence type corresponding to the target word based on the other original words in the target phrases comprises:
searching the original phrases, based on the other original words, for the related phrases of each target phrase, wherein each related phrase contains an other original word of the corresponding target phrase;
counting second frequency information of each related original word, that is, each original word in the related phrases other than the other original words, the second frequency information representing the number of occurrences of that related original word in the related phrases;
and, based on the second frequency information, selecting from the related original words those whose second frequency information satisfies a second frequency screening condition, and taking them as original associated words of the co-occurrence type corresponding to the target word.
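Claim 4 extends co-occurrence one hop further: it collects words that co-occur with the target word's co-occurring words. A sketch under these assumptions, none of which is fixed by the claim: related phrases are pooled over all target phrases, the target word itself is also excluded from the count, and the second frequency screening condition is a minimum count.

```python
from collections import Counter


def second_order_words(phrases, target, min_count=2):
    """phrases: all original phrases, each a tuple of adjacent words."""
    target_phrases = [p for p in phrases if target in p]
    other_words = {w for p in target_phrases for w in p if w != target}
    # Related phrases: original phrases containing at least one other original word.
    related = [p for p in phrases if other_words.intersection(p)]
    # Second frequency: occurrences of words in related phrases, excluding the other
    # original words themselves and (an assumption) the target word.
    excluded = other_words | {target}
    freq = Counter(w for p in related for w in p if w not in excluded)
    return [w for w, c in freq.most_common() if c >= min_count]
```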
5. The method of claim 1, wherein the at least one type comprises a near-shape type, which characterizes that the corresponding original associated word is similar in composition structure to the target word;
the acquiring, from the preset corpus, of the original associated words of at least one type corresponding to the target word comprises:
obtaining a first similarity between the target word and each original word in the corpus;
and, based on the obtained first similarities, selecting from the original words those whose first similarity satisfies a first similarity condition, and taking them as original associated words of the near-shape type corresponding to the target word.
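The claims do not define the first similarity. For Chinese, composition structure may refer to shared characters or character components, which this sketch does not model. As a stand-in it uses the character-level ratio from Python's `difflib.SequenceMatcher`, with an assumed threshold `min_sim` as the first similarity condition:

```python
from difflib import SequenceMatcher


def first_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for structural similarity."""
    return SequenceMatcher(None, a, b).ratio()


def near_shape_words(target, vocabulary, min_sim=0.7):
    """Assumed first similarity condition: similarity >= min_sim."""
    scored = [(w, first_similarity(target, w)) for w in vocabulary if w != target]
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return [w for w, s in scored if s >= min_sim]
```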
6. The method of claim 1, wherein the at least one type comprises a co-translation type, which characterizes that the corresponding original associated word has paraphrases similar to those of the target word;
the acquiring, from the preset corpus, of the original associated words of at least one type corresponding to the target word comprises:
obtaining a plurality of paraphrases of the target word and a plurality of paraphrases of each original word from the corpus;
determining a paraphrase set for each original word based on the paraphrases of the target word and the paraphrases of that original word, wherein each paraphrase set comprises the paraphrases that the corresponding original word shares with the target word;
and, based on the number of shared paraphrases contained in each paraphrase set, selecting from the original words those whose number of shared paraphrases satisfies a preset paraphrase condition, and taking them as original associated words of the co-translation type corresponding to the target word.
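The paraphrase set of claim 6 amounts to a set intersection of definitions. A sketch, assuming a hypothetical in-memory `dictionary` that maps each original word to a list of paraphrase strings, and a preset paraphrase condition of at least `min_shared` shared paraphrases:

```python
def shared_paraphrases(target_defs, word_defs):
    """Paraphrase set: definitions shared by the target word and an original word."""
    return set(target_defs) & set(word_defs)


def co_translation_words(target, dictionary, min_shared=1):
    """dictionary maps each original word to its paraphrases (hypothetical format)."""
    target_defs = dictionary.get(target, [])
    result = []
    for word, defs in dictionary.items():
        if word == target:
            continue
        # Assumed paraphrase condition: enough shared definitions.
        if len(shared_paraphrases(target_defs, defs)) >= min_shared:
            result.append(word)
    return result
```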
7. The method of claim 1, wherein the at least one type comprises a vector type, which characterizes that the original word vector of the corresponding original associated word is similar to the target word vector of the target word;
the acquiring, from the preset corpus, of the original associated words of at least one type corresponding to the target word comprises:
obtaining the target word vector and each original word vector from the corpus;
determining a second similarity between the target word vector and each original word vector;
and, based on the determined second similarities, selecting from the original words those whose second similarity satisfies a second similarity condition, and taking them as original associated words of the vector type corresponding to the target word.
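Claim 7 leaves the second similarity unspecified. Cosine similarity is a common choice for word vectors and is assumed here, together with a threshold `min_sim` and a toy vector table; real vectors would come from a trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vector_words(target, vectors, min_sim=0.8):
    """vectors maps each word to its vector; assumed condition: cosine >= min_sim."""
    tv = vectors[target]
    scored = [(w, cosine(tv, v)) for w, v in vectors.items() if w != target]
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return [w for w, s in scored if s >= min_sim]
```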
8. The method of claim 1, wherein the at least one type comprises a resolution type, which characterizes that the number of times the corresponding original associated word is compared with the target word satisfies a preset count condition;
the acquiring, from the preset corpus, of the original associated words of at least one type corresponding to the target word comprises:
obtaining a plurality of query log records from the corpus, each query log record comprising an input query string;
selecting from the query strings, based on a pre-constructed regular expression, each target character string that conforms to a preset syntax rule, wherein each target character string contains the target word;
and obtaining the original associated words of the resolution type corresponding to the target word based on the original words, other than the target word, in the target character strings.
9. The method of claim 8, wherein the obtaining of the original associated words of the resolution type corresponding to the target word based on the original words other than the target word in the target character strings comprises:
counting third frequency information of each original word other than the target word in the target character strings, the third frequency information representing the number of occurrences of that original word in the target character strings;
and, based on the third frequency information, selecting from those original words the ones whose third frequency information satisfies a third frequency screening condition, and taking them as original associated words of the resolution type corresponding to the target word.
10. An associated word recommendation device, comprising:
a receiving unit, configured to receive an input keyword and obtain, from the original words, a target word that matches the keyword;
an acquisition unit, configured to acquire, from a preset corpus, original associated words of at least one type corresponding to the target word, together with the degree of relevance between each original associated word and the target word, wherein different types of original associated words have different association relationships with the target word;
a determining unit, configured to obtain a recommendation evaluation value for each original associated word based on the association relationship between that original associated word and the target word and on its degree of relevance;
a screening unit, configured to select, based on the obtained recommendation evaluation values, the candidate associated words whose recommendation evaluation values satisfy a preset recommendation condition from the original associated words;
and a presentation unit, configured to present the selected candidate associated words in the display interface.
11. The apparatus of claim 10, wherein the at least one type comprises a co-occurrence type, which characterizes that the corresponding original associated word is used in scenarios similar to those of the target word; and the acquisition unit is specifically configured to:
obtain a plurality of sentences to be processed from the corpus, and split each of them into a plurality of original phrases, wherein each original phrase contains at least two original words that are adjacent in the sentence from which it was split;
select, from the obtained original phrases, each target phrase that contains the target word;
and obtain the original associated words of the co-occurrence type corresponding to the target word based on the other original words, that is, the original words other than the target word, in the target phrases.
12. The apparatus of claim 11, wherein the acquisition unit is specifically configured to:
count first frequency information of each other original word, the first frequency information representing the number of occurrences of that other original word in the target phrases;
and, based on the first frequency information, select from the other original words those whose first frequency information satisfies a first frequency screening condition, and take them as original associated words of the co-occurrence type corresponding to the target word.
13. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 9.
14. A computer-readable storage medium, comprising a computer program which, when run on an electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1 to 9.
15. A computer program product comprising a computer program stored on a computer-readable storage medium, wherein, when a processor of an electronic device reads the computer program from the computer-readable storage medium and executes it, the electronic device performs the steps of the method according to any one of claims 1 to 9.
CN202310138951.4A 2023-02-20 2023-02-20 Associated word recommendation method and device, electronic equipment and storage medium Pending CN116975428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310138951.4A CN116975428A (en) 2023-02-20 2023-02-20 Associated word recommendation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310138951.4A CN116975428A (en) 2023-02-20 2023-02-20 Associated word recommendation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116975428A true CN116975428A (en) 2023-10-31

Family

ID=88470140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310138951.4A Pending CN116975428A (en) 2023-02-20 2023-02-20 Associated word recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116975428A (en)

Similar Documents

Publication Publication Date Title
AlDayel et al. Stance detection on social media: State of the art and trends
US20210232763A1 (en) Graphical systems and methods for human-in-the-loop machine intelligence
CN110543574B (en) Knowledge graph construction method, device, equipment and medium
CN112507715B (en) Method, device, equipment and storage medium for determining association relation between entities
US20210397980A1 (en) Information recommendation method and apparatus, electronic device, and readable storage medium
KR102354716B1 (en) Context-sensitive search using a deep learning model
JP6440732B2 (en) Automatic task classification based on machine learning
CN110019732B (en) Intelligent question answering method and related device
WO2023065211A1 (en) Information acquisition method and apparatus
KR20210037619A (en) Multimodal content processing method, apparatus, device and storage medium
US10963318B2 (en) Natural language interface to web API
US10678786B2 (en) Translating search queries on online social networks
CN105069103B (en) Method and system for APP search engine to utilize user comments
KR101982081B1 (en) Recommendation System for Corresponding Message
US20190361966A1 (en) Graphical systems and methods for human-in-the-loop machine intelligence
CN116720004B (en) Recommendation reason generation method, device, equipment and storage medium
CN110162771A (en) The recognition methods of event trigger word, device, electronic equipment
US20120166428A1 (en) Method and system for improving quality of web content
CN115114395B (en) Content retrieval and model training method and device, electronic equipment and storage medium
CN113806588A (en) Method and device for searching video
JP2020135135A (en) Dialog content creation assisting method and system
Ahmed et al. Sentiment analysis for smart cities: state of the art and opportunities
KR20200083159A (en) Method and system for searching picture on user terminal
CN115033739A (en) Search method, model training method, device, electronic equipment and medium
JP5286298B2 (en) Reputation analysis apparatus, reputation analysis method, and reputation analysis program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication