CN110162770B - Word expansion method, device, equipment and medium


Info

Publication number
CN110162770B
Authority
CN
China
Prior art keywords
word
seed
context
words
vector
Prior art date
Legal status
Active
Application number
CN201811231345.2A
Other languages
Chinese (zh)
Other versions
CN110162770A (en)
Inventor
韩家龙
宋彦
史树明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811231345.2A
Publication of CN110162770A
Application granted
Publication of CN110162770B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the application discloses a word expansion method, apparatus, device and medium. The method comprises the following steps: acquiring a seed word to be expanded and the context of the seed word; obtaining an output vector through a word expansion model according to the seed word and the context of the seed word, wherein the output vector is used for representing the semantic similarity between each candidate word in a candidate word library and the seed word; and determining the expansion words of the seed word according to the output vector. The word expansion model adopted in the method is a neural network trained by a machine learning algorithm. Because the model considers both the semantics of the seed word itself and the semantics of the seed word's context during prediction, the expansion words determined for the seed word are guaranteed to conform to the context of the seed word, thereby providing information that can meet business requirements for various natural language processing applications and improving their application performance.

Description

Word expansion method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a word expansion method, apparatus, device, and computer readable storage medium.
Background
In many current natural language processing applications, such as search engines, computer-aided writing and automatic dialogue systems, similar word expansion generally needs to be performed for a specified word in natural language, so that further operations can be performed based on the expanded similar words and application performance can be improved. Expanding similar words means expanding, for a specified word in a sentence, other words whose meaning is the same as or similar to that of the specified word; for example, for the sentence "barley grass is rich in nutrients such as amino acids", the word "amino acids" can be expanded with similar words having the same or a similar meaning, such as "vitamins" and "chlorophyll".
The existing similar word expansion technology is the set expansion technology, which mainly expands, for a specified word in a sentence, words belonging to the same implicit semantic class as the specified word and uses them as similar words. The set expansion technology mines similar words belonging to the same semantic class from a large corpus according to certain mining rules; for example, several words appearing in the same line of a sentence and separated by semicolons are mined as similar words. However, the set expansion technology does not consider the context of the specified word, so the expanded similar words may fail to meet the current application requirements. For example, for the sentence "barley grass is rich in nutrients such as amino acids", the set expansion technology may expand the word "amino acids" with the word "fat", which belongs to the nutrient class but does not conform to the context of the specified word (barley grass does not contain fat).
It can be seen that in the environment of natural language processing applications, a scheme capable of implementing similar word expansion based on context needs to be researched, so as to improve the performance of various applications and promote the development of natural language processing application technology.
Disclosure of Invention
The embodiment of the application provides a word expansion method, related equipment and a system, which can provide information meeting service requirements for various natural language processing applications and improve the application performance of the natural language processing applications.
In view of this, a first aspect of the present application provides a word expansion method, the method comprising:
acquiring a seed word to be expanded and acquiring a context of the seed word;
according to the seed word and the context of the seed word, obtaining an output vector through a word expansion model, wherein the output vector is used for representing the semantic similarity between each candidate word in a candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting semantic similarity between each candidate word and the seed word in the candidate word bank according to the word vector and the context vector corresponding to the seed word;
and determining the expansion word of the seed word according to the output vector.
A second aspect of the present application provides a method for training a word expansion model, the method comprising:
obtaining a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
and constructing an initial neural network model, and training parameters of the initial neural network model according to the training sample set to obtain a neural network model meeting training ending conditions, wherein the neural network model is used as a word expansion model.
A third aspect of the present application provides a word expansion apparatus, the apparatus comprising:
the first acquisition module is used for acquiring seed words to be expanded and acquiring the contexts of the seed words;
the second acquisition module is used for acquiring an output vector through a word expansion model according to the seed word and the context of the seed word, wherein the output vector is used for representing the semantic similarity between each candidate word in the candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting semantic similarity between each candidate word and the seed word in the candidate word bank according to the word vector and the context vector corresponding to the seed word;
and the determining module is used for determining the expansion word of the seed word according to the output vector.
A fourth aspect of the present application provides an apparatus for training a word expansion model, including:
an acquisition module for acquiring a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
the construction module is used for constructing an initial neural network model, training parameters of the initial neural network model according to the training sample set to obtain a neural network model meeting training ending conditions, and the neural network model is used as a word expansion model.
A fifth aspect of the present application provides an apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to the instructions in the program code, the steps of the word expansion method according to the first aspect or the steps of the method for training a word expansion model according to the second aspect.
A sixth aspect of the present application provides a computer readable storage medium for storing program code for performing the steps of the word expansion method of the first aspect described above, or the steps of training a word expansion model of the second aspect described above.
A seventh aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the word expansion method of the first aspect or the steps of the method for training a word expansion model of the second aspect.
From the above technical solutions, the embodiments of the present application have the following advantages:
in the embodiment of the application, a word expansion method is provided in which a word expansion model predicts corresponding expansion words for a seed word to be expanded. The word expansion model is a neural network trained by a machine learning algorithm: the seed word to be expanded and the information in its context are encoded by the neural network into two vectors, namely a word vector and a context vector, and the two vectors are used to predict possible expansion words of the seed word in a candidate word library. The semantics of the seed word itself are considered in the prediction process, and the semantics of the seed word's context are introduced as well, so that the context can influence the finally generated expansion words. The expansion words predicted by the word expansion model therefore conform to the context of the seed word, and information meeting business requirements can be provided for each natural language processing application, improving its application performance.
Drawings
Fig. 1 is an application scenario schematic diagram of a word expansion method in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a word expansion method in an embodiment of the present application;
FIG. 3 is a schematic diagram of a word expansion model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for training a word expansion model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a word expansion model training process according to an embodiment of the present application;
fig. 6 is a schematic diagram of an application scenario of another word expansion method in the embodiment of the present application;
FIG. 7 is a schematic structural diagram of a first word expansion device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a second word expansion device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a third word expansion device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a fourth word expansion device in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a device for training a word expansion model according to a first embodiment of the present application;
FIG. 12 is a schematic structural diagram of a device for training a word expansion model according to a second embodiment of the present application;
FIG. 13 is a schematic structural diagram of a word expansion device according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of another word expansion device in an embodiment of the present application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of the present disclosure.
The terms "first," "second," "third," "fourth" and the like in the description, the claims and the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can, for example, be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
When the existing set expansion technology is adopted to expand similar words, the technical problems that the expanded similar words do not accord with the context of the appointed word, the expanded similar words cannot meet the actual application requirements and the like often occur. In order to solve the problems of the existing similar word expansion technology, the embodiment of the application provides a word expansion method.
The following describes the core technical ideas of the word expansion method provided in the embodiment of the present application:
the word expansion method provided by the embodiment of the application provides a new word expansion model, and the word expansion model can determine the semantic similarity between each candidate word in the candidate word bank and the seed word to be expanded according to the word vector of the seed word to be expanded and the context vector of the seed word, namely, in the process of determining the expansion word of the seed word, the seed word and the context of the seed word are taken as reference factors. When word expansion is specifically carried out, a seed word to be expanded and the context of the seed word are acquired; then, inputting the seed word and the context of the seed word into the word expansion model, wherein the word expansion model predicts the semantic similarity between each candidate word in the candidate word bank and the seed word according to the word vector and the context vector corresponding to the seed word, and takes a vector capable of expressing the semantic similarity between each candidate word in the candidate word bank and the seed word as an output vector; and finally, determining the expansion words of the seed words according to the output vectors output by the word expansion model.
Compared with the existing set expansion technology, the word expansion method provided by the embodiment of the application considers the semantics of the seed word itself and the context semantics of the seed word in the process of predicting the expanded word of the seed word, ensures that the expanded word of the seed word can be determined based on the influence of the context of the seed word, namely, the expanded word determined based on the word expansion model can accord with the context of the seed word, thereby providing information capable of meeting business requirements for various natural language processing applications and improving the application performance of the natural language processing applications.
It should be understood that the word expansion method provided in the embodiment of the present application may be applied to a device having a natural language processing function, such as a terminal device or a server. The terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), a tablet computer, or the like. The server may be an application server or a Web server; in actual deployment it may be an independent server or a cluster server, and it may provide word expansion services for a plurality of terminal devices at the same time.
In order to facilitate understanding of the technical solution provided in the embodiments of the present application, a server is taken as an execution body, and the word expansion method provided in the embodiments of the present application is introduced in conjunction with an actual application scenario.
Referring to fig. 1, fig. 1 is an application scenario schematic diagram of a word expansion method provided in an embodiment of the present application. The application scenario includes a terminal device 101 and a server 102, where the terminal device 101 is configured to send a text to be processed acquired by itself to the server 102, and the server 102 is configured to execute a word expansion method provided in the embodiment of the present application, and perform word expansion on a seed word in the text to be processed, so as to obtain an expanded word of the seed word.
When a user needs to search for content related to a certain text through a search engine, the user may input the text in a text input field provided on the terminal device 101, and accordingly, after the terminal device 101 acquires the text input by the user in the text input field, the acquired text is transmitted to the server 102 as a text to be processed.
After receiving a text to be processed sent by the terminal equipment 101, the server 102 acquires a seed word to be expanded in the text to be processed and a context of the seed word; then, the server 102 inputs the obtained seed word and the context of the seed word into a word expansion model operated by the server, the word expansion model is a neural network model trained by a machine learning algorithm, the word expansion model obtains word vectors and context vectors corresponding to the seed word by encoding the seed word and the context, and then predicts semantic similarity between each candidate word and the seed word in the candidate word library based on the word vectors corresponding to the seed word and the context vectors of the seed word, and further outputs an output vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word library; finally, the server 102 selects the expanded word of the seed word from the candidate words according to the output vector which is output by the word expansion model and can represent the semantic similarity between each candidate word and the seed word.
It should be noted that, when the word expansion model running in the server 102 predicts the semantic similarity between each candidate word and the seed word in the candidate word library, the semantic of the seed word itself is considered, and the context semantic of the seed word is considered, so that the expansion word of the seed word can be determined based on the influence of the context of the seed word, that is, the expansion word determined based on the word expansion model can conform to the context of the seed word, thereby providing information capable of meeting the service requirement for the natural language processing application, and improving the application performance of the natural language processing application.
It should be noted that, the scenario shown in fig. 1 is only an example, and in practical application, the word expansion method provided in the embodiment of the present application may also be applied to a terminal device, and no specific limitation is made to the application scenario of the word expansion method.
The word expansion method provided in the present application is described below by way of examples.
Referring to fig. 2, fig. 2 is a schematic flow chart of a word expansion method according to an embodiment of the present application. For convenience of description, the following embodiments describe a terminal device as an execution subject, and it should be understood that the execution subject of the word expansion method is not limited to the terminal device, but may be applied to a device having a natural language processing function such as a server. As shown in fig. 2, the word expansion method includes the steps of:
Step 201: and acquiring a seed word to be expanded and acquiring the context of the seed word.
The seed word is a keyword which can be subjected to word expansion in natural sentences, the keyword is usually a word with a substantial meaning in the natural sentences, such as nouns, verbs, adjectives and the like, and the word expansion is performed on the seed word, so that a plurality of expansion words which are the same as or similar to the implicit semantic meaning of the seed word and accord with the context of the seed word can be expanded; for example, in a search engine application, word segmentation is generally required to be performed on a query sentence input by a user, so as to determine words with substantial meaning, such as nouns (e.g., person names, place names, other named entity names), verbs, and the like, in the query sentence, where the nouns and the verbs can be used as keywords, i.e., seed words, in the query sentence, and further, word expansion is performed on the determined seed words, so that information search can be performed on the basis of the expanded words of the seed words, and search quality is improved.
It should be understood that in a natural sentence there may be one seed word or there may be a plurality of seed words, and no limitation is made herein on the number of seed words present in a natural sentence.
In order to more visually explain the seed words and their context, a specific example will be described below. For example, for the natural sentence "barley grass is rich in nutrients such as amino acids", the keyword that can be expanded, i.e., the seed word, is "amino acids".
Correspondingly, the context of the seed word is the part of the natural sentence that remains after the seed word is removed. Still taking the natural sentence "barley grass is rich in nutrients such as amino acids" as an example, after the seed word "amino acids" is removed, the remaining part "barley grass is rich in nutrients such as [?]" is the context of the seed word "amino acids", where [?] represents the placeholder corresponding to the seed word.
After the terminal equipment acquires the natural sentence input by the user, the keyword which can be expanded in the natural sentence can be determined by carrying out related processing such as semantic analysis and the like on the natural sentence, and the keyword is used as a seed word to be expanded in the natural sentence; and after determining the seed word to be expanded in the natural sentence, the terminal device can directly use the rest part of the natural sentence after removing the seed word as the context of the seed word.
It should be understood that, in addition to determining the seed word and the context of the seed word by means of semantic analysis, the terminal device may also determine the seed word and the context of the seed word in the natural sentence by other methods, and the manner of determining the seed word and the context of the seed word is not specifically limited herein.
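As a minimal illustration of this step, the following sketch forms the context of a seed word from a tokenized sentence, assuming the seed word has already been identified (e.g., by semantic analysis); the token "[?]" stands for the placeholder mentioned above.

```python
def make_context(tokens, seed):
    """Replace the seed word in a tokenized sentence with a placeholder."""
    return ["[?]" if tok == seed else tok for tok in tokens]

sentence = ["barley grass", "is", "rich", "in", "nutrients",
            "such", "as", "amino acids"]
print(make_context(sentence, "amino acids"))
# -> ['barley grass', 'is', 'rich', 'in', 'nutrients', 'such', 'as', '[?]']
```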
Step 202: and obtaining an output vector through a word expansion model according to the seed word and the context of the seed word.
After the terminal equipment determines the seed word and the context of the seed word aiming at the natural sentence, the determined seed word and the context of the seed word are input into a word expansion model operated by the terminal equipment, and the word expansion model correspondingly processes the seed word and the context of the seed word to output an output vector capable of representing the semantic similarity between each candidate word in the candidate word library and the seed word.
The word expansion model is a neural network model trained by a machine learning algorithm, and the neural network model can respectively convert an input seed word and the context of the seed word into a word vector corresponding to the seed word and a context vector corresponding to the seed word; furthermore, the neural network model can predict the semantic similarity between each candidate word and the seed word in the candidate word bank according to the word vector corresponding to the seed word and the context vector corresponding to the seed word.
It should be noted that the candidate word library is formed from all the words extracted from the natural language corpus; that is, the candidate words included in the candidate word library are in fact all the words extracted from the corpus. In practical applications, as the corpus is updated, the candidate words included in the candidate word library may be updated correspondingly.
It should be noted that, the output vector of the word expansion model may be a similarity vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word bank, where the similarity vector includes the feature similarity between each candidate word and the seed word in the candidate word bank; in addition, the output vector of the word expansion model may be a probability vector capable of representing the semantic similarity between each candidate word in the candidate word bank and the seed word, where the probability vector is obtained by further calculating the similarity vector, and the probability vector includes the probability of each candidate word in the candidate word bank as the expansion word of the seed word.
Step 203: and determining the expansion word of the seed word according to the output vector.
After the word expansion model processes the input seed word and the context of the seed word correspondingly, an output vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word bank is output, and further, the terminal equipment can determine the expansion word of the seed word according to the output vector.
In particular, when the output vector output by the word expansion model is a similarity vector, the terminal device may determine the expansion word of the seed word according to the feature similarity between each candidate word and the seed word included in the similarity vector. It should be understood that, if the feature similarity between a certain candidate word and a seed word in the candidate word library is higher, the likelihood that the candidate word is used as an expansion word of the seed word is higher; otherwise, if the feature similarity between a certain candidate word and a seed word in the candidate word library is lower, the likelihood that the candidate word is used as an expansion word of the seed word is lower.
If the output vector output by the word expansion model is a probability vector, the terminal device can determine the expansion words of the seed word according to the probability, included in the probability vector, of each candidate word being an expansion word of the seed word. It should be understood that the higher the probability of a certain candidate word in the candidate word library being an expansion word of the seed word, the more likely that candidate word is to serve as an expansion word of the seed word; conversely, the lower that probability, the less likely the candidate word is to serve as an expansion word of the seed word.
In one possible implementation manner, in order to avoid determining a large number of expansion words for a seed word and then performing a large amount of computation based on them in subsequent processing, the word expansion method provided in this embodiment may also limit the number of expansion words during word expansion. Specifically, the terminal device may rank the element values in the output vector in descending order and select the candidate words corresponding to the top M elements from the candidate word library as the expansion words of the seed word, where M is a number threshold for the expansion words.
In specific implementation, the terminal device may sort the elements included in the output vector from large to small according to their element values, and then select the candidate words corresponding to the top M elements as the expansion words of the seed word. Alternatively, the terminal device may sort the elements from small to large according to their element values, and then select the candidate words corresponding to the last M elements in the ranking as the expansion words of the seed word.
It should be understood that the specific value corresponding to M may be set according to actual requirements, and no limitation is made to the specific value corresponding to M herein.
When the output vector is a similarity vector containing the feature similarity between each candidate word in the candidate word library and the seed word, the terminal device can rank the feature similarities corresponding to the candidate words in descending order and select the candidate words corresponding to the top M feature similarities as the expansion words of the seed word.
When the output vector is a probability vector containing the probability of each candidate word in the candidate word library being an expansion word of the seed word, the terminal device can rank the probabilities corresponding to the candidate words in descending order and select the candidate words corresponding to the top M probabilities as the expansion words of the seed word.
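A minimal sketch of this top-M selection, applicable whether the output vector is a similarity vector or a probability vector (names and values are illustrative):

```python
import numpy as np

def top_m_expansions(output_vector, vocab, m):
    """Return the candidate words for the M largest element values."""
    order = np.argsort(output_vector)[::-1]  # indices in descending order of value
    return [vocab[t] for t in order[:m]]

scores = np.array([0.05, 0.62, 0.11, 0.83, 0.02])
vocab = ["fat", "vitamins", "protein", "chlorophyll", "water"]
print(top_m_expansions(scores, vocab, m=2))  # -> ['chlorophyll', 'vitamins']
```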
It should be noted that in many practical application scenarios there is a requirement for word expansion of the seed word, so that in addition to the related processing of the seed word itself, related processing can be performed on the expansion words of the seed word, thereby providing richer information that can meet service requirements. Three common application scenarios requiring word expansion and the specific implementation of word expansion in each scenario are described below.
In the first application scenario, when a user uses a search engine to inquire certain content, a terminal device acquires a natural sentence which is input by the user and is used as an inquiry condition, and performs word expansion on seed words in the natural sentence, so that related content can be searched for the seed words in the natural sentence and the expansion words of the seed words later, and the information searching range is reasonably enlarged.
The terminal equipment acquires a query statement, wherein the query statement refers to a search condition for querying information, which is input in a search engine; the terminal equipment extracts keywords from the query sentence as seed words to be expanded, and extracts the context of the seed words from the query sentence. Correspondingly, after the terminal equipment determines the expansion words of the seed words by using the word expansion model, the query sentences and the expansion words of the seed words input by the user are sent to the server, so that the server can return relevant search results to the terminal equipment according to the query sentences and the expansion words of the seed words.
In specific implementation, the terminal device may obtain a query sentence input by a user from an input field provided by the search engine, extract an expandable keyword from the query sentence as a seed word to be expanded by performing semantic analysis on the query sentence, and use a portion of the query sentence from which the seed word is removed as a context of the seed word. Then, the terminal equipment inputs the determined seed word and the context of the seed word into a word expansion model, and the word expansion model correspondingly processes the seed word and the context of the seed word to output an output vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word library; further, the terminal device determines the expansion word of the seed word according to the size of each element value in the output vector. After the expanded words of the seed words are determined, the terminal equipment sends the query sentences input by the user and the determined expanded words of the seed words to the server, so that the server can search information according to the expanded words of the seed words on the basis of searching related information according to the query sentences, the range of information searching is reasonably expanded, and the server returns the searched search results to the terminal equipment after searching to obtain the related search results of the query sentences and the expanded words of the seed words.
To facilitate further understanding of the implementation details, the implementation details are illustrated below:
assume that a user inputs the query sentence "Messi belongs to the world cup champion team" in an input field provided by a search engine. After the terminal device acquires this query sentence, it determines through semantic analysis that the expandable keyword in the query sentence is "Messi", takes "Messi" as the seed word, and takes "[?] belongs to the world cup champion team" as the context of the seed word. The terminal device inputs the seed word "Messi" and the context "[?] belongs to the world cup champion team" into the word expansion model, which processes the seed word and its context correspondingly and outputs an output vector capable of representing the semantic similarity between each candidate word in the candidate word library and the seed word. According to the output vector, the terminal device determines other expansion words conforming to the context of the seed word, that is, other football stars related to world cup champion teams, such as "Cristiano Ronaldo", "Zidane" and "Ronaldo", and takes these stars as the expansion words of "Messi". After determining the expansion words of the seed word, the terminal device correspondingly sends the query sentence input by the user and the expansion words of "Messi" to the server; the server searches for related information according to the query sentence and the expansion words of "Messi", and returns the search results to the terminal device.
In the second application scenario, the terminal equipment acquires a text sentence input by a user in the writing process, performs word expansion aiming at a seed word in the text sentence, and recommends an expansion word of the seed word obtained by word expansion to the user so as to provide richer writing resources for the user.
The terminal device may acquire a text sentence, which refers to a sentence in text information input in the text editor; the terminal equipment extracts keywords from the text sentence as seed words to be expanded, and extracts the context of the seed words from the text sentence; correspondingly, after the terminal equipment determines the expansion word of the seed word, information prompt is carried out according to the expansion word of the seed word.
In specific implementation, the terminal device can acquire a text sentence input by a user through a text editor, extract an expandable keyword from the text sentence as a seed word through semantic analysis of the text sentence, and correspondingly take the rest of the text sentence after the seed word is removed as the context of the seed word; the terminal equipment inputs the determined seed words and the contexts of the seed words into a word expansion model running by the terminal equipment, and the word expansion model correspondingly processes the input seed words and the contexts of the seed words to output vectors capable of representing semantic similarity between each candidate word and each seed word in the candidate word library; and the terminal equipment determines the expansion words of the seed words according to the output vector, and further displays the expansion words of the seed words to a user in an information prompt mode.
To facilitate further understanding of the implementation details, the implementation details are illustrated below:
when a user uses office software such as Word to perform daily writing or academic and news writing, the terminal device can acquire a text sentence input by the user through a text editor, and the terminal device correspondingly acquires the text sentence of ' nutrition such as barley grass which is rich in amino acid ' provided that the text sentence input by the user is ' nutrition such as barley grass which is rich in amino acid ', extracts a keyword ' amino acid ' from the text sentence through semantic analysis as a seed Word to be expanded, and enriches ' barley grass [? Nutrition of ] etc. "as context of seed words; the terminal device then enriches the determined seed word "amino acid" and the context of the seed word "barley grass [? Inputting a word expansion model running per se, and outputting an output vector capable of representing semantic similarity between each candidate word and the seed word in the candidate word library by correspondingly processing the seed word and the context of the seed word; furthermore, the terminal equipment determines the expansion words of the 'amino acid' according to the output vector, and displays the expansion words of the 'amino acid' in a form of information prompt so as to push the expansion words of the 'amino acid' to the user, thereby being convenient for the user to enrich the writing content based on the pushed expansion words in the writing process.
In the third application scenario, when the user uses the automatic question-answering system, the terminal device can perform word expansion on the seed words in the question-answering sentences input by the user, so that related answer contents can be searched for the seed words in the question-answering sentences and the expanded words of the seed words later, the search range of the answer contents is enlarged, and reasonable answer contents are returned to the user.
The terminal equipment can acquire a question-answer sentence, wherein the question-answer sentence is a sentence input on an input interface of a question-answer system; extracting keywords from the question-answer sentence as seed words to be expanded, and extracting the context of the seed words from the question-answer sentence; correspondingly, after the terminal equipment determines the expansion word of the seed word, searching the response content according to the expansion word of the seed word, and returning the response content.
In specific implementation, the terminal device may acquire the sentence input by the user on the question-answering system interface, and it should be understood that the sentence input by the user on the question-answering system interface may be a natural sentence in text form or a natural sentence in speech form, and when the sentence input by the user is a natural sentence in speech form, the terminal device may convert the natural sentence in speech form into a corresponding natural sentence in text form through techniques such as speech conversion, and then execute the subsequent steps. After the terminal equipment acquires the natural sentence in the text form, extracting the expandable keyword from the natural sentence in the text form through semantic analysis, taking the keyword as a seed word, and taking the rest part after the seed word is removed from the natural sentence as the context of the seed word. The terminal equipment inputs the determined seed words and the contexts of the seed words into a word expansion model running on the terminal equipment, and the word expansion model outputs output vectors capable of representing semantic similarity between each candidate word and the seed word in the candidate word library by correspondingly processing the seed words and the contexts of the seed words. And the terminal equipment determines the expansion word of the seed word according to the output vector, and then sends the sentence input by the user on the interface of the question-answering system and the expansion word of the seed word obtained by word expansion to the server, so that the server can search relevant answer content according to the content and return the answer content to the terminal equipment.
To facilitate further understanding of the implementation details, the implementation details are illustrated below:
assume that the user inputs the question-answer sentence "Apple's stock price rose this week" by voice on the automatic question-answering system interface provided by the terminal device. After acquiring the question-answer sentence, the terminal device performs speech conversion on it to obtain the corresponding text form "Apple's stock price rose this week". Through semantic analysis, the terminal device extracts the expandable keyword, i.e. the seed word, "Apple" from this sentence, and correspondingly takes "[?]'s stock price rose this week" as the context of the seed word. The terminal device inputs the seed word "Apple" and the context "[?]'s stock price rose this week" into the word expansion model it runs; accordingly, after performing correlation processing on the seed word "Apple" and the context "[?]'s stock price rose this week", the word expansion model outputs an output vector capable of representing the semantic similarity between each candidate word in the candidate word library and the seed word "Apple". According to the output vector, the terminal device can determine other expansion words conforming to the context of the seed word, that is, the names of other well-known electronic device manufacturers, such as "Samsung", "Xiaomi" and "Huawei", and take these company names as the expansion words of "Apple". Further, the terminal device transmits the question-answer sentence input by the user and the determined expansion words to the server, so that the server searches for relevant answer content accordingly and returns the found answer content to the terminal device.
It should be understood that the above three application scenarios are merely examples of the application scenario of the word expansion technique provided in the present embodiment, and in practical application, the word expansion technique provided in the present embodiment may also be applied to other application scenarios, and no specific limitation is made to the application scenario of the word expansion method provided in the present embodiment.
In the word expansion method provided by the embodiment of the application, in the process of predicting the expansion word of the seed word, the semantics of the seed word is considered, and the context semantics of the seed word is considered, so that the expansion word of the seed word can be determined based on the influence of the context of the seed word, namely, the expansion word determined based on the word expansion model can accord with the context of the seed word, thereby providing information capable of meeting business requirements for each natural language processing application, and improving the application performance of the natural language processing application.
As described above, the implementation of the word expansion method provided in the embodiments of the present application mainly depends on a word expansion model, which can output an output vector capable of characterizing semantic similarity between each candidate word and a seed word in a candidate word library by correspondingly processing the input seed word and the context of the seed word. In order to facilitate further understanding of the word expansion method provided in the embodiments of the present application, the word expansion model is specifically described below with reference to the accompanying drawings.
Referring to fig. 3, fig. 3 is a schematic architecture diagram of a word expansion model 300 according to an embodiment of the present application. As shown in fig. 3, the word expansion model 300 includes an input layer 310 and a prediction layer 320.
The input layer 310 specifically includes: a seed word encoder 311 and a context encoder 312.
The seed word encoder 311 takes a seed word as an input and takes a word vector corresponding to the seed word as an output; the context encoder 312 takes as input the context of the seed word and as output the context vector corresponding to the seed word.
The input layer 310 is mainly used for encoding the seed words and the contexts corresponding to the seed words input into the word expansion model, and respectively obtaining word vectors corresponding to the seed words and context vectors corresponding to the seed words; and then, the word vector corresponding to the obtained seed word and the context vector corresponding to the seed word are spliced to generate a semantic vector of the seed word, namely, input data of a prediction layer is generated.
The seed word encoder 311 included in the input layer 310 is mainly used for encoding a seed word input into the word expansion model, and generating a word vector corresponding to the seed word. It should be understood that, for a natural sentence, the number of seed words included therein is generally smaller, so that a vector average encoder with a simple structure may be directly adopted as the seed word encoder 311, and the vector average encoder may be used to perform an average process on the vector corresponding to the input seed word, so as to obtain the word vector corresponding to the seed word.
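A minimal PyTorch sketch of such a vector average encoder; the vocabulary size and vector dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorAverageEncoder(nn.Module):
    """Looks up an embedding for each input token and averages the vectors."""

    def __init__(self, vocab_size=50000, dim=300):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        # token_ids: LongTensor of shape (num_tokens,)
        vectors = self.embedding(token_ids)  # (num_tokens, dim)
        return vectors.mean(dim=0)           # averaged vector of shape (dim,)

encoder = VectorAverageEncoder()
word_vector = encoder(torch.tensor([3, 17]))  # average of two token embeddings
```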
The context encoder 312 included in the input layer 310 is mainly configured to encode a context corresponding to a seed word input into the word expansion model, and generate a context vector corresponding to the seed word.
For the context encoder 312, an encoder having a different structure may be selected as the context encoder 312 from both the viewpoint of considering the placeholder and the viewpoint of not considering the placeholder. Here, the placeholder refers to a position that the seed word originally occupies in the natural sentence.
When considering placeholders, a convolutional neural network (Convolutional Neural Network, CNN) encoder that introduces location features may be used as a context encoder; specifically, a position vector may be spliced on each word vector of the input CNN layer, where the position vector is determined according to a relative positional relationship between a vocabulary corresponding to the word vector and a placeholder, and the position vector may be continuously updated during the training process. In addition, context2vec encoders may also be used as context encoders that encode content to the left and right of a placeholder using two Long Short-Term Memory (LSTM) networks, respectively.
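As one illustration, a context2vec-style encoder of the kind just mentioned can be sketched in PyTorch as follows; the sizes, the reading direction of the right-hand LSTM, and the way the two states are combined are simplifying assumptions rather than the application's fixed design.

```python
import torch
import torch.nn as nn

class TwoLSTMContextEncoder(nn.Module):
    """Encodes the words left and right of the placeholder with two LSTMs."""

    def __init__(self, vocab_size=50000, dim=300, hidden=300):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.left_lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.right_lstm = nn.LSTM(dim, hidden, batch_first=True)

    def forward(self, left_ids, right_ids):
        # left_ids / right_ids: LongTensors of shape (1, left_len) / (1, right_len),
        # the token ids to the left and right of the placeholder.
        _, (h_left, _) = self.left_lstm(self.embedding(left_ids))
        _, (h_right, _) = self.right_lstm(self.embedding(right_ids))
        # Concatenate the two final hidden states into the context vector.
        return torch.cat([h_left[-1], h_right[-1]], dim=-1)  # shape (1, 2*hidden)

encoder = TwoLSTMContextEncoder()
context_vector = encoder(torch.tensor([[5, 9, 2]]), torch.tensor([[7, 4]]))
```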
When placeholders are not considered, a vector average encoder may be directly selected as the context encoder. When generating a context vector corresponding to the seed word for the context of the input seed word, the placeholder can be directly removed or regarded as a rare word, so that the context of the input seed word is regarded as a common sentence. And then, carrying out average processing on vectors corresponding to the vocabularies in the context of the seed words to generate context vectors corresponding to the seed words.
Optionally, an attention mechanism-based encoder can be used for the context encoder, so that focusing on more important parts in sentences in the encoding process can be ensured, and more optimized context vectors can be obtained.
After the seed word encoder 311 generates a word vector corresponding to the seed word and the context encoder 312 generates a context vector corresponding to the context of the seed word, the input layer 310 concatenates the word vector corresponding to the seed word generated by the seed word encoder 311 and the context vector generated by the context encoder 312 to generate a semantic feature vector of the seed word, and the semantic feature vector of the seed word is output to the prediction layer 320 as an output of the input layer.
The formula with which the input layer 310 calculates the semantic feature vector of the seed word is shown in formula (1):

x = [s; C]    (1)

where x is the semantic feature vector of the seed word, s is the word vector corresponding to the seed word, C is the context vector corresponding to the seed word, and [·; ·] denotes vector concatenation.
It should be noted that, the word expansion model in this embodiment may be applicable to various languages, that is, seed words in different languages and contexts of the seed words may be encoded to generate word vectors corresponding to the seed words and context vectors corresponding to the seed words.
The prediction layer 320 specifically includes: a full connectivity layer 321 and a classification layer 322.
The full-connection layer 321 takes the semantic feature vector of the seed word as input and takes a similarity vector comprising feature similarity between each candidate word and the seed word in the candidate word library as output; the semantic feature vector of the seed word is a vector generated by splicing a word vector corresponding to the seed word and a context vector.
The classification layer 322 takes as input the similarity vector and takes as output vector of the word expansion model the probability vector normalized to the similarity vector.
The prediction layer 320 mainly plays a role of determining a probability vector capable of representing semantic similarity between each candidate word and a seed word in the candidate word bank as an output vector of the word expansion model according to the semantic feature vector of the seed word output by the input layer 310. The semantic feature vector of the seed word is a vector formed by the input layer 310 according to the word vector corresponding to the seed word and the context vector corresponding to the seed word.
The full-connection layer 321 included in the prediction layer 320 is mainly used for determining feature similarity between each candidate word and the seed word in the candidate word bank according to the semantic feature vector of the seed word output by the input layer 310, and further generating a similarity vector according to the feature similarity between each candidate word and the seed word in the candidate word bank.
In specific implementation, the full-connection layer 321 processes semantic feature vectors of the seed words output by the input layer 310 by using its own parameters to obtain feature similarity between each candidate word and the seed word in the candidate word library, and for convenience in description, the feature similarity between each candidate word and the seed word is abbreviated as the feature similarity corresponding to each candidate word; and further, generating a similarity vector according to the feature similarity corresponding to each candidate word in the candidate word library. It should be noted that, the parameters of the full-connection layer 321 are obtained during the training process of the training word expansion model, and are substantially semantic feature vectors corresponding to each candidate word in the candidate word library, and optionally, the parameters of the full-connection layer may include, in addition to the semantic feature vectors corresponding to each candidate word, bias parameters corresponding to each semantic feature vector.
For example, assume the candidate word library is L, where L includes a large number of candidate words extracted from the natural language corpus; each candidate word has a corresponding semantic feature vector, and each semantic feature vector has a corresponding bias parameter. The semantic feature vectors and bias parameters corresponding to the candidate words in the candidate word library are in fact the parameters of the full-connection layer. Assume that for the t-th candidate word in the candidate word library the corresponding semantic feature vector is w_t and the bias parameter is b_t; then {w_t, b_t}_{t∈L} are the parameters of the full-connection layer corresponding to the candidate words. Using {w_t, b_t}_{t∈L} and the semantic feature vector x of the seed word input to the full-connection layer, the feature similarity between the t-th candidate word and the seed word can be calculated as w_t^T x + b_t. According to this method, the full-connection layer can calculate the feature similarity between each candidate word in the candidate word library and the seed word, and the similarity vector can then be determined from the feature similarities corresponding to the candidate words.
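A small PyTorch sketch of this computation, with an illustrative library size and feature dimension:

```python
import torch
import torch.nn as nn

# The full-connection layer is a single linear map: its weight rows are
# the semantic feature vectors w_t of the candidate words and its bias
# entries are b_t, so one matrix multiplication yields the feature
# similarity w_t^T x + b_t for every candidate word at once.

num_candidates = 100000  # |L|, size of the candidate word library (assumed)
feature_dim = 600        # dimension of the semantic feature vector x (assumed)

full_connection = nn.Linear(feature_dim, num_candidates)  # weights {w_t}, biases {b_t}

x = torch.randn(feature_dim)            # semantic feature vector x = [s; C]
similarity_vector = full_connection(x)  # shape (num_candidates,): w_t^T x + b_t
```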
The classification layer 322 included in the prediction layer 320 is mainly used for further normalizing the similarity vector output by the full-connection layer 321, and generating a probability vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word library, as an output vector of the word expansion model.
In specific implementation, the classification layer 322 may calculate, from the feature similarity corresponding to each candidate word in the candidate word library included in the similarity vector, a probability corresponding to each candidate word in the candidate word library, where the probability represents the likelihood that the candidate word becomes an expansion word of the seed word; it then generates, from the probabilities corresponding to the candidate words in the candidate word library, a probability vector that represents the semantic similarity between each candidate word in the candidate word library and the seed word.
Alternatively, in practical application, a softmax classifier may be used as the classification layer 322. When the softmax classifier calculates the probability corresponding to each candidate word from the feature similarity corresponding to each candidate word in the similarity vector, it may use formula (2):

P(t|s, C) = exp(f_t) / Σ_{t'∈L} exp(f_{t'})        (2)

where P(t|s, C) denotes the probability that the t-th candidate word in the candidate word library becomes an expansion word of the seed word s given the context C, f_t = w_t^T x + b_t is the feature similarity corresponding to the t-th candidate word in the similarity vector, t' ranges over the candidate words in the candidate word library, and f_{t'} is the feature similarity corresponding to candidate word t'.
In this way, the probabilities of the respective candidate words in the candidate word library becoming expansion words can be assembled into a probability vector, which serves as the output vector of the word expansion model.
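For illustration only, the computation just described can be sketched in a few lines of Python (numpy); the names prediction_layer, W, b, and x are hypothetical, with W stacking the candidate words' semantic feature vectors row by row and b holding their bias parameters:

    import numpy as np

    def prediction_layer(x, W, b):
        # x: semantic feature vector of the seed word
        sims = W @ x + b                  # similarity vector: one entry w_t^T x + b_t per candidate word
        e = np.exp(sims - sims.max())     # shifted for numerical stability
        return e / e.sum()                # probability vector, formula (2): the model's output vector

    # toy usage: 5 candidate words with 8-dimensional semantic feature vectors
    rng = np.random.default_rng(0)
    probs = prediction_layer(rng.normal(size=8), rng.normal(size=(5, 8)), np.zeros(5))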
In summary, the word expansion model uses the seed word encoder in the input layer to generate the word vector corresponding to the input seed word, and uses the context encoder in the input layer to generate the context vector corresponding to the seed word from the context of the seed word. The input layer then splices the word vector and the context vector to produce the semantic feature vector of the seed word and inputs it to the prediction layer. The full-connection layer in the prediction layer processes the semantic feature vector with its own parameters to generate the similarity vector, and the classification layer in the prediction layer generates from the similarity vector a probability vector representing the semantic similarity between each candidate word in the candidate word library and the seed word, which is the output vector of the word expansion model. Because the word expansion model considers both the semantics of the seed word itself and the semantics of its context when determining the output vector, the output vector is determined based on the context of the seed word, and the expansion words determined from the output vector therefore conform to the context of the seed word.
It should be understood that whether the word expansion model can accurately produce expansion words conforming to the context of the seed word depends on its model performance, and the model performance in turn depends on how the word expansion model is trained.
The training method of the word expansion model is described below. Referring to fig. 4, fig. 4 is a flowchart of a method for training a word expansion model according to an embodiment of the present application. For convenience of description, the following embodiment takes a server as the execution body; it should be understood that the execution body of the method for training the word expansion model is not limited to a server, and the method may also be applied to other devices with a model training function, such as a terminal device.
As shown in fig. 4, the method for training the word expansion model includes the following steps:
step 401: a training sample set is obtained.
When the server trains the word expansion model, a training sample set for training the word expansion model needs to be obtained in advance. The training sample set comprises a large number of samples, and each sample comprises: a seed word, the context of the seed word, and the real expansion words corresponding to the seed word.
It should be understood that, before training the word expansion model, the server needs to collect a large amount of natural corpus from the network in advance, generate samples based on the natural corpus, and construct the training sample set for training the word expansion model from those samples. When constructing the training sample set, the server can first extract sentences meeting the training conditions from the natural corpus, and extract a context and a hyponym set from each sentence meeting the training conditions, where the hyponym set comprises at least two hyponyms.
In specific implementation, before constructing the training sample set, the server first collects a large amount of natural corpus from the network; specifically, the server may collect natural corpus from resources such as web pages, news, and encyclopedias. After collecting the natural corpus, the server further extracts sentences meeting the training conditions from it, where a sentence meeting the training conditions generally refers to a natural sentence that includes at least two words of the same class. By performing semantic analysis on a sentence meeting the training conditions, the server extracts its context and hyponym set: all the words of the same class in the sentence are taken as hyponyms and form the hyponym set, and the remainder of the sentence with these words removed is taken as the context.
It should be understood that the cost of screening sentences meeting training conditions from natural corpus is far lower than the cost required by manually labeling data, so that the word expansion model training method provided by the embodiment of the application acquires training data from natural corpus, and the cost required by model training can be greatly reduced.
In one possible implementation, the server may extract, from the natural corpus, sentences satisfying the Hearst pattern according to the regular expression of the Hearst pattern, and extract corresponding context and hyponym sets for the sentences satisfying the Hearst pattern.
It should be noted that a sentence satisfying the Hearst pattern generally has the form "t1, t2, ..., etc. h" or "t1, t2, ..., and other h", where t1, t2, ... denote hyponyms and h denotes the hypernym. Taking "barley grass is rich in vitamins, antioxidants, enzymes, minerals, amino acids, chlorophyll and other nutrients" as an example, "vitamins", "antioxidants", "enzymes", "minerals", "amino acids" and "chlorophyll" are all hyponyms, and "nutrients" is the hypernym.
Accordingly, after the server obtains a sentence satisfying the above Hearst pattern from the natural corpus, it can extract all hyponyms in the sentence to form the hyponym set, and take the remainder of the sentence with the hyponyms removed as the context. Still taking "barley grass is rich in vitamins, antioxidants, enzymes, minerals, amino acids, chlorophyll and other nutrients" as an example, the hyponym set of this sentence contains "vitamins", "antioxidants", "enzymes", "minerals", "amino acids" and "chlorophyll", and the context of this sentence is "barley grass is rich in [?] and other nutrients", where [?] is a placeholder marking the position the hyponyms occupied.
It should be understood that the server may also extract sentences satisfying the training conditions from the natural corpus by using a machine learning method, and no limitation is made on a specific implementation manner of extracting sentences satisfying the training conditions.
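For illustration, the following sketch applies a regular expression for one English rendering of a Hearst pattern ("h such as t1, t2, ... and tn"); the regular expression, the function name extract, and the exact context format are assumptions for this example rather than the patent's implementation:

    import re

    # hypothetical regex for the English Hearst pattern "h such as t1, t2, ... and tn"
    HEARST = re.compile(r"(?P<hyper>\w+) such as (?P<items>[\w ,]+ and [\w ]+)")

    def extract(sentence):
        m = HEARST.search(sentence)
        if m is None:
            return None
        hyponyms = [t.strip() for t in re.split(r",| and ", m.group("items"))]
        # replace the hyponym list with a placeholder to form the context
        context = sentence.replace(m.group("items"), "[?]")
        return hyponyms, context

    print(extract("barley grass is rich in nutrients such as vitamins, antioxidants, "
                  "enzymes, minerals, amino acids and chlorophyll"))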
Then, the server takes one hyponym in the hyponym set as the seed word, and takes the other hyponyms in the hyponym set as the real expansion words corresponding to that seed word; the seed word, its context, and the real expansion words corresponding to the seed word then form one sample.
Specifically, after extracting a context and a hyponym set from sentences meeting training conditions, the server selects one hyponym from the hyponym set as a seed word, uses other hyponyms in the hyponym set as real expansion words of the seed word, and further, utilizes the seed word, the context and the real expansion words of the seed word to form a sample. It should be appreciated that for one set of hyponyms, the server may generate a number of different samples in the manner described above, the seed words included in the different samples being different, the number of samples generated for one set of hyponyms being equal to the number of hyponyms in the set of hyponyms.
For example, for the sentence "barley grass is rich in vitamins, antioxidants, enzymes, minerals, amino acids, chlorophyll and other nutrients", which satisfies the training conditions, the corresponding hyponym set includes "vitamins", "antioxidants", "enzymes", "minerals", "amino acids" and "chlorophyll", and the context of the sentence is "barley grass is rich in [?] and other nutrients". The sample with "vitamins" as the seed word contains the context "barley grass is rich in [?] and other nutrients" and the real expansion words "antioxidants", "enzymes", "minerals", "amino acids" and "chlorophyll"; the sample with "antioxidants" as the seed word contains the same context and the real expansion words "vitamins", "enzymes", "minerals", "amino acids" and "chlorophyll"; the remaining samples for this sentence are not listed one by one here.
In this way, several samples can be generated for each sentence in the natural corpus that meets the training conditions, and the training sample set for training the word expansion model is then generated from the samples corresponding to all such sentences, as sketched below.
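Continuing the sketch above (all names hypothetical), one sample can be generated per hyponym, with the chosen hyponym as the seed word and the remaining hyponyms as its real expansion words:

    def make_samples(hyponyms, context):
        # one sample (seed word, context, real expansion words) per hyponym
        return [(seed, context, [t for t in hyponyms if t != seed]) for seed in hyponyms]

    hyponyms = ["vitamins", "antioxidants", "enzymes", "minerals", "amino acids", "chlorophyll"]
    samples = make_samples(hyponyms, "barley grass is rich in nutrients such as [?]")
    # six samples, e.g. ("vitamins", ..., ["antioxidants", "enzymes", "minerals", ...])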
Step 402: and constructing an initial neural network model, and training parameters of the initial neural network model according to the training sample set to obtain a neural network model meeting training ending conditions, wherein the neural network model is used as a word expansion model.
When the server trains the word expansion model, an initial neural network model needs to be constructed as the initial model for word expansion training. The structure of the initial neural network model is the same as that of the word expansion model: it likewise comprises an input layer and a prediction layer, the input layer comprises a seed word encoder and a context encoder, and the prediction layer comprises a full-connection layer and a classification layer, where the parameters of the full-connection layer consist of the initial word vectors and initial word frequencies corresponding to the candidate words in the tag library. The model parameters of each part of the initial neural network model are continually updated as the model is optimized during training.
After the initial neural network model is built, the server trains model parameters of the initial neural network model by using the training sample set obtained in the step 401, and after the trained initial neural network model meets the training ending condition, a word expansion model which can be put into practical application is built according to the model structure and the model parameters of the neural network model meeting the training ending condition.
It should be noted that the function the initial neural network model can realize is the same as that of the word expansion model: given an input seed word and the context of the seed word, it outputs an output vector reflecting the semantic feature similarity between each tag in the tag library and the seed word. A tag here is a hyponym the server obtains from the large amount of natural corpus; the tag library includes all hyponyms extracted from the natural corpus, so its content is in fact consistent with the content of the candidate word library.
When the initial neural network model is trained, a number of training samples can be taken from the training sample set, and the seed word and the context of the seed word in each sample are input into the initial neural network model. By processing the input seed word and context, the initial neural network model outputs a probability vector representing the semantic feature similarity between the seed word and each tag in the tag library, i.e. the probability of each tag in the tag library becoming an expansion word. The server generates a true probability vector from the real expansion words of the seed word in the sample: the probabilities corresponding to the real expansion words in the tag library are 1, and the probabilities corresponding to the other tags are all 0. The server then constructs a loss function from the error between the probability vector output by the initial neural network model and the true probability vector (as sketched below), and adjusts the model parameters of the initial neural network according to the loss function, thereby optimizing the initial neural network model. When the initial neural network model meets the training ending condition, the word expansion model can be determined from the model parameters and model structure of the current neural network model.
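For illustration only, the following PyTorch sketch shows one way such a loss could be computed, assuming the true probability vector spreads its mass evenly over the real expansion words (one possible reading of the description above); the names expansion_loss, logits, and true_mask are hypothetical:

    import torch
    import torch.nn.functional as F

    def expansion_loss(logits, true_mask):
        # logits: similarity vector from the full-connection layer, shape (num_tags,)
        # true_mask: 1.0 for each real expansion word in the tag library, 0.0 elsewhere
        target = true_mask / true_mask.sum()       # spread probability mass over the real expansions
        return -(target * F.log_softmax(logits, dim=-1)).sum()   # cross-entropy against soft targets

    logits = torch.randn(10, requires_grad=True)
    loss = expansion_loss(logits, torch.tensor([0., 1., 0., 1., 0., 0., 0., 0., 0., 0.]))
    loss.backward()                                # gradients drive the parameter updates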
When judging whether the initial neural network model meets the training ending condition, a first model may be verified with test samples, the first model being the model obtained after a first round of training optimization of the initial neural network model with the samples in the training sample set. Specifically, the seed word and the context of the seed word in a test sample are input into the first model, which processes them to produce a prediction probability vector; the word expansion accuracy is then calculated from this prediction probability vector and the true probability vector generated from the real expansion words of the test seed word. When the word expansion accuracy exceeds a preset threshold, the model performance of the first model can be considered good enough to meet the requirement, and the word expansion model can be determined from the model parameters and network structure of the first model.
The preset threshold may be set according to actual situations, and is not specifically limited herein.
In addition, when judging whether the neural network model meets the training ending condition, whether to continue training can be decided based on several models obtained over multiple rounds of training, so as to obtain the word expansion model with the best performance. Specifically, the models obtained in the successive rounds can each be verified with test samples. If the word expansion accuracies of these models differ little, the model performance can be considered to have no further room for improvement; a model can then be selected, and the word expansion model determined from its model parameters and network structure. If the word expansion accuracies differ considerably across rounds, the model performance is considered to still have room for improvement, and training can continue until a word expansion model with stable and optimal performance is obtained.
It should be noted that the number of tags in the tag library is very large; training the initial neural network model against all tags in the tag library requires a very large amount of computation and much training time. To reduce the computation and time consumed in training the word expansion model, the training method provided in this embodiment may instead extract a tag subset from the tag library and train the initial neural network model on that subset.
Specifically, a server inputs samples in a training sample set into an initial neural network model, and obtains a predictive probability vector output by the initial neural network model, wherein the predictive probability vector is obtained by predicting the samples by using a label subset corresponding to the samples; the tag subset corresponding to the sample is a subset extracted from the tag library for the sample, and the tag subset comprises real expansion words related to the sample and candidate words not related to the sample.
Before training the initial neural network model with a given sample in the training sample set, the server can extract the tag subset corresponding to that sample from the tag library in advance, where the tag subset comprises the real expansion words related to the sample and candidate words not related to the sample. For example, for the sample containing the seed word "vitamins" and the context "barley grass is rich in [?] and other nutrients", the server can extract in advance a tag subset comprising all the real expansion words of the seed word and a number of tags randomly selected from the tag library that are not related to the sample.
When extracting the tag subset, the server may sample a smaller tag subset from the tag library with a Sampled Softmax function. To prevent the sampled tag subset from containing a large number of words that are very easy to judge as negative tags, words that frequently co-occur with the seed word may additionally be drawn into the tag subset on top of the Sampled Softmax sampling; specifically, the distribution of each word in the tag library with respect to the seed word may be calculated with formula (3):
P_N(t|s) = c(s, t) / c(s)        (3)

where c(s, t) is the number of times the seed word s and the vocabulary word t appear together in the sentences meeting the training conditions, c(s) is the number of times the seed word s appears in the sentences meeting the training conditions, and P_N(t|s) is the correlation distribution between the vocabulary word t and the seed word s.
In this way, vocabulary that is difficult to identify as a negative sample can also be drawn when extracting the tag subset, which speeds up the convergence of the model and improves the prediction performance of the final model; a sketch of this sampling follows.
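As an illustration, the following sketch draws such hard negative tags from the co-occurrence distribution of formula (3); the dictionary-based co-occurrence store and the uniform fallback are assumptions for this example:

    import numpy as np

    def sample_negatives(seed, cooc, labels, k, rng=np.random.default_rng(0)):
        # P_N(t|s) = c(s, t) / c(s); c(s) is constant for a fixed seed word,
        # so it suffices to normalize the co-occurrence counts c(s, t)
        weights = np.array([cooc.get((seed, t), 0.0) for t in labels])
        if weights.sum() == 0:
            weights[:] = 1.0        # no co-occurrence statistics: fall back to uniform sampling
        return list(rng.choice(labels, size=k, replace=False, p=weights / weights.sum()))

    cooc = {("vitamins", "minerals"): 7, ("vitamins", "enzymes"): 4, ("vitamins", "granite"): 1}
    print(sample_negatives("vitamins", cooc, ["minerals", "enzymes", "granite"], k=2))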
Further, the server may also use an adversarial network to learn more complex distributions of the vocabulary words and seed words, so as to generate higher-quality negative tags.
When the initial neural network model is built, the initial word vectors and initial word frequencies corresponding to the tags in the tag subset serve as the initial model parameters of the full-connection layer in the initial neural network model. When the initial neural network model is trained, the samples in the training sample set are input into the initial neural network, i.e., the seed words and contexts in the samples are input, and the initial neural network model outputs a prediction probability vector by processing the seed words and contexts.
Then, the server calculates a loss function from the prediction probability vector and the true probability vector of the tag subset, takes minimizing the loss function as the training target, and updates the parameters of the initial neural network model until the neural network converges, thereby obtaining the word expansion model.
Specifically, the server may determine, in advance, a true probability vector of the tag subset, where the true probability vector includes probabilities of becoming the expanded words corresponding to the respective tags in the tag subset, and it should be understood that the probability corresponding to the true expanded word of the seed word in the tag subset is 1, and the probability corresponding to the other candidate words in the tag subset is 0.
After the initial neural network model outputs the prediction probability vector corresponding to a sample, the server calculates the loss function from the prediction probability vector and the true probability vector of the tag subset, and optimizes the loss function by continually updating and adjusting the model parameters of the initial neural network model so as to minimize its output value. Once the neural network converges, i.e., the neural network model meets the training ending condition, the word expansion model is generated from the model structure and model parameters of the current neural network model.
Training the initial neural network with the above method for training a word expansion model proceeds as follows: the initial neural network model processes the seed word and context in each sample and outputs the corresponding prediction probability vector; a loss function is then calculated from the prediction probability vector and the true probability vector corresponding to the sample, and the initial neural network model is optimized according to the loss function; when the initial neural network model meets the training ending condition, the word expansion model is generated from the model structure and model parameters of the neural network model. Because the initial neural network model is trained on both the seed words and their contexts, the trained word expansion model can consider the semantics of the seed word and the semantics of its context simultaneously, so the output vector is determined based on the context of the seed word and the expansion words determined from the output vector conform to that context.
As described above, by adopting the method for training the word expansion model provided by the embodiment of the application, a word expansion model that can be put into practical application can be obtained by training and optimizing the constructed initial neural network model. To further explain the method for training the word expansion model, the training architecture of the word expansion model is described below with reference to the accompanying drawings.
Referring to fig. 5, fig. 5 is a schematic architecture diagram of an initial neural network model training process according to an embodiment of the present application. As shown in fig. 5, the pre-built initial neural network model 5100 includes: an input layer 5110 and a prediction layer 5120.
Wherein the input layer 5110 includes a seed word encoder 5111 and a context encoder 5112.
The seed word encoder 5111 takes a seed word as an input and takes a word vector corresponding to the seed word as an output; the context encoder 5112 takes as input the context of the seed word and as output the context vector corresponding to the seed word.
The input layer 5110 is mainly used for encoding the seed words and the contexts corresponding to the seed words input into the initial neural network model, and respectively obtaining word vectors corresponding to the seed words and context vectors corresponding to the seed words; and then, the word vector corresponding to the obtained seed word and the context vector corresponding to the seed word are spliced to generate a prediction semantic feature vector of the seed word, namely, input data of a prediction layer is generated.
The seed word encoder 5111 included in the input layer 5110 is mainly used for encoding a seed word input into the initial neural network model, and generating a word vector corresponding to the seed word.
Alternatively, the seed word encoder 5111 may be a vector average encoder, which averages the vectors corresponding to all the words in the seed word to obtain the word vector of the seed word. It should be understood that the number of words contained in a seed word is usually small, generally one or two, so a vector average encoder with a simple structure can be directly adopted as the seed word encoder 5111; the vector average encoder performs an averaging computation on the vectors corresponding to the words of the input seed word to obtain the word vector corresponding to the seed word.
The context encoder 5112 included in the input layer 5110 is mainly configured to encode a context corresponding to a seed word input into the word expansion model, and generate a context vector corresponding to the seed word.
Alternatively, the context encoder 5112 may be a text vector encoder for encoding the content on the left and right sides of the placeholder in the context to obtain the context vector. Specifically, the text vector encoder uses two LSTM networks to encode the content on the left of the placeholder and the content on the right of the placeholder in the context respectively, so as to generate a vector of the context, wherein the placeholder refers to the position that the seed word originally occupies in the natural sentence.
Alternatively, the context encoder 5112 may be a CNN encoder that considers position features; this encoder concatenates a position vector onto each word vector input to the CNN layer, where each position vector is determined by the relative position between the word corresponding to the word vector and the placeholder.
Alternatively, the context encoder 5112 may be a vector average encoder, which may directly remove the placeholder or treat it as a rare word, treat the input context as an ordinary sentence, and generate the context vector corresponding to the seed word by averaging the vectors corresponding to the words in the context.
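A minimal PyTorch sketch of two of the encoder choices above, a vector average seed word encoder combined with a two-LSTM text vector context encoder; the class name, layer sizes, and input conventions are illustrative assumptions:

    import torch
    import torch.nn as nn

    class InputLayer(nn.Module):
        def __init__(self, vocab_size, emb_dim, hid_dim):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.left = nn.LSTM(emb_dim, hid_dim, batch_first=True)    # left of the placeholder
            self.right = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # right of the placeholder

        def forward(self, seed_ids, left_ids, right_ids):
            seed_vec = self.emb(seed_ids).mean(dim=1)        # vector average seed word encoder
            _, (h_l, _) = self.left(self.emb(left_ids))      # encode words left of the placeholder
            _, (h_r, _) = self.right(self.emb(right_ids.flip(1)))  # right side, read toward it
            # concatenation yields the (predicted) semantic feature vector of the seed word
            return torch.cat([seed_vec, h_l[-1], h_r[-1]], dim=-1)

    layer = InputLayer(vocab_size=1000, emb_dim=32, hid_dim=16)
    feat = layer(torch.randint(0, 1000, (1, 2)),
                 torch.randint(0, 1000, (1, 4)),
                 torch.randint(0, 1000, (1, 3)))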
After the seed word encoder 5111 generates the word vector corresponding to the seed word and the context encoder 5112 generates the context vector corresponding to the context of the seed word, the input layer 5110 concatenates the word vector generated by the seed word encoder 5111 and the context vector generated by the context encoder 5112 to generate the predicted semantic feature vector of the seed word, which the input layer outputs to the prediction layer 5120.
The prediction layer 5120 specifically includes: a full connectivity layer 5121 and a classification layer 5122.
The full-connection layer 5121 takes the predicted semantic feature vector of the seed word as input and takes a predicted similarity vector comprising feature similarity between each candidate word and the seed word in the tag library as output; the prediction semantic feature vector of the seed word is a vector generated by splicing a word vector corresponding to the seed word and a context vector.
The classification layer 5122 takes the predicted similarity vector as an input, and takes the predicted probability vector normalized to the predicted similarity vector as an output vector of the initial neural network model.
The prediction layer 5120 mainly serves to determine, from the predicted semantic feature vector of the seed word output by the input layer 5110, a prediction probability vector that represents the semantic similarity between each candidate word in the tag library and the seed word, which serves as the output vector of the initial neural network model. The predicted semantic feature vector of the seed word is the vector the input layer 5110 forms by splicing the word vector corresponding to the seed word and the context vector corresponding to the seed word.
The full-connection layer 5121 included in the prediction layer 5120 is mainly used for determining the predicted feature similarity between each candidate word and the seed word in the tag library according to the predicted semantic feature vector of the seed word output by the input layer 5110, and further generating a similarity vector according to the feature similarity between each candidate word and the seed word in the tag library.
In specific implementation, the full-connection layer 5121 processes the predicted semantic feature vector of the seed word output by the input layer 5110 by utilizing the parameters of the full-connection layer 5121 to obtain the predicted feature similarity between each candidate word and the seed word in the tag library; and further, generating a prediction similarity vector according to the prediction feature similarity corresponding to each candidate word in the tag library. It should be noted that, the parameters of the full-connection layer 5121 actually consist of semantic feature vectors corresponding to each candidate word in the tag library, and optionally, the parameters of the full-connection layer 5121 may include, in addition to the semantic feature vectors corresponding to each candidate word, bias parameters corresponding to each semantic feature vector. The parameters of the full connection layer 5121 can be continuously optimally updated during the training process.
The classification layer 5122 included in the prediction layer 5120 is mainly used for further normalizing the prediction similarity vector output by the full-connection layer 5121, and generating a prediction probability vector capable of representing the prediction semantic similarity between each candidate word and the seed word in the tag library, as an output vector of the initial neural network model.
In specific implementation, the classification layer 5122 may calculate, from the predicted feature similarity corresponding to each candidate word in the tag library included in the prediction similarity vector, a probability corresponding to each candidate word in the tag library, where the probability represents the likelihood that the candidate word becomes an expansion word of the seed word; it then generates, from the probabilities corresponding to the candidate words in the tag library, a prediction probability vector that represents the semantic similarity between each candidate word in the tag library and the seed word.
As shown in fig. 5, the initial neural network model 5100 is trained with the method for training the word expansion model shown in fig. 4. When the initial neural network model 5100 meets the training ending condition, a word expansion model 5200 that can be put into practical use is constructed from the model structure and model parameters of the current neural network model. The word expansion model 5200 includes: an input layer 5210 obtained by optimally training the input layer 5110, and a prediction layer 5220 obtained by optimally training the prediction layer 5120. The input layer 5210 includes a seed word encoder 5211 obtained by optimally training the seed word encoder 5111 and a context encoder 5212 obtained by optimally training the context encoder 5112; the prediction layer 5220 includes a full-connection layer 5221 obtained by optimally training the full-connection layer 5121 and a classification layer 5222 obtained by optimally training the classification layer 5122.
In order to further understand the word expansion method provided by the embodiment of the present application, the word expansion method provided by the embodiment of the present application is described below in conjunction with an actual application scenario.
Referring to fig. 6, fig. 6 is an application scenario schematic diagram of the word expansion method provided in the embodiment of the present application. The application scenario includes a terminal device 6100 and a word expansion server 6200.
When a user needs to search for certain content through a search engine, the user may input a related query sentence in the search field the search engine displays on the terminal device 6100. Assume the query sentence the user inputs in the search field is "barley grass is rich in amino acids and other nutrients"; accordingly, the terminal device 6100 acquires this query sentence and sends it to the word expansion server 6200.
After the word expansion server 6200 receives the query sentence "barley grass is rich in amino acids and other nutrients", it uses the seed word extraction module 6210 to perform semantic analysis on the query sentence, extracts an expandable keyword from it as the seed word, and takes the rest of the query sentence as the context of the seed word; that is, it extracts the expandable keyword "amino acids" as the seed word and takes "barley grass is rich in [?] and other nutrients" as the context of the seed word.
The word expansion server 6200 inputs the seed word extracted by the seed word extraction module 6210 and the context of the seed word into the word expansion model 6220, and the word expansion model 6220 outputs an output vector capable of being used for representing semantic similarity between each candidate word in the candidate word library and the seed word by correspondingly processing the input seed word and the context of the seed word.
When the word expansion model 6220 specifically processes the input seed word and the context of the seed word accordingly, an output vector capable of representing the semantic similarity between each candidate word and the seed word in the candidate word bank needs to be determined based on the input layer 6221 and the prediction layer 6222.
Specifically, the word expansion model 6220 processes the input seed word "amino acids" with the seed word encoder in the input layer 6221 to obtain the word vector corresponding to the seed word "amino acids", and processes the input context "barley grass is rich in [?] and other nutrients" with the context encoder in the input layer 6221 to obtain the context vector corresponding to the seed word "amino acids". Then, the input layer 6221 concatenates the word vector output by the seed word encoder with the context vector output by the context encoder to generate the semantic feature vector of the seed word, and inputs the semantic feature vector of the seed word to the prediction layer 6222.
The full-connection layer in the prediction layer 6222 processes the semantic feature vector of the seed word with its own parameters to generate the feature similarity between each candidate word in the candidate word library and the seed word, and then generates the similarity vector from the feature similarities corresponding to the candidate words; the parameters of the full-connection layer are in fact the word vectors and word frequencies corresponding to the candidate words in the candidate word library. The full-connection layer sends the generated similarity vector to the classification layer, which normalizes it to generate a probability vector representing the feature similarity between each candidate word in the candidate word library and the seed word; the word expansion model 6220 takes this probability vector as the output vector and outputs it to the expansion word determining module 6230.
The expansion word determining module 6230 determines, based on the received output vector, which candidate words in the candidate word library can serve as expansion words of the seed word. In specific implementation, the expansion word determining module 6230 may sort the candidate words in the candidate word library by their corresponding probabilities in the probability vector, and select the candidate words corresponding to the M highest probability values as the expansion words of the seed word, as sketched below; it should be understood that the specific value of M can be set according to actual requirements.
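For illustration only, a small sketch of this top-M selection (names and values are arbitrary):

    import numpy as np

    def top_m_expansions(probs, candidates, m):
        # indices of the M largest probabilities, in descending order
        order = np.argsort(probs)[::-1][:m]
        return [candidates[i] for i in order]

    print(top_m_expansions(np.array([0.05, 0.4, 0.1, 0.3, 0.15]),
                           ["enzymes", "vitamins", "granite", "chlorophyll", "minerals"], m=2))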
Thus, after the word expansion server 6200 determines the expansion words of the seed word, if the expansion words determined for "amino acids" in the query sentence "barley grass is rich in amino acids and other nutrients" include "vitamins" and "chlorophyll", the information search operation can be performed according to the determined expansion words, thereby reasonably expanding the information search range and ensuring that information meeting the user's requirements is provided.
Aiming at the word expansion method, the application also provides a corresponding word expansion device, so that the word expansion method is convenient to apply and realize in practice.
Referring to fig. 7, fig. 7 is a schematic structural view of a word expansion apparatus 700 corresponding to the word expansion method shown in fig. 2 above, the word expansion apparatus 700 comprising:
A first obtaining module 701, configured to obtain a seed word to be expanded and obtain a context of the seed word;
the second obtaining module 702 is configured to obtain, according to the seed word and the context of the seed word, an output vector through a word expansion model, where the output vector is used to characterize semantic similarity between each candidate word in the candidate word bank and the seed word; the word expansion model is a neural network model and is used for predicting semantic similarity between each candidate word and the seed word in the candidate word bank according to the word vector and the context vector corresponding to the seed word;
a determining module 703, configured to determine the expansion word of the seed word according to the output vector.
Optionally, on the basis of the word expansion device shown in fig. 7, the word expansion model includes: an input layer and a prediction layer; wherein:
the input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a full connection layer and a classification layer;
The full-connection layer takes semantic feature vectors of the seed words as input and takes similarity vectors comprising feature similarity between each candidate word and the seed word in the candidate word library as output; the semantic feature vector of the seed word is a vector generated by splicing the word vector and the context vector;
the classification layer takes the similarity vector as input, and takes a probability vector normalized by the similarity vector as an output vector of the word expansion model.
Optionally, on the basis of the word expansion device shown in fig. 7, the determining module 703 is specifically configured to:
selecting, from the candidate word library, the candidate words corresponding to the first M elements when the element values in the output vector are sorted in descending order, and taking them as the expansion words of the seed word; wherein M is the threshold value of the number of the expansion words.
Optionally, on the basis of the word expansion device shown in fig. 7, the first obtaining module 701 is specifically configured to:
acquiring a query statement, wherein the query statement refers to a search condition for querying information, which is input in a search engine;
extracting keywords from the query statement as seed words to be expanded, and extracting the context of the seed words from the query statement;
Correspondingly, as shown in fig. 8, on the basis of the word expansion device shown in fig. 7, the word expansion device further includes:
and the searching module 801 is used for searching information according to the expansion words of the seed words so as to return search results.
Optionally, on the basis of the word expansion device shown in fig. 7, the first obtaining module 701 is specifically configured to:
acquiring a text sentence, wherein the text sentence refers to a sentence in text information input in a text editor;
extracting keywords from the text sentence as seed words to be expanded, and extracting the context of the seed words from the text sentence;
correspondingly, as shown in fig. 9, on the basis of the word expansion device shown in fig. 7, the word expansion device further includes:
and the prompting module 901 is used for prompting information according to the expansion words of the seed words.
Optionally, on the basis of the word expansion device shown in fig. 7, the first obtaining module 701 is specifically configured to:
acquiring a question-answer sentence, wherein the question-answer sentence is a sentence input on an input interface of a question-answer system;
extracting keywords from the question-answer sentence as seed words to be expanded, and extracting the context of the seed words from the question-answer sentence;
Correspondingly, as shown in fig. 10, on the basis of the word expansion device shown in fig. 7, the word expansion device further includes:
and the searching module 1001 is configured to search for response content according to the expanded word of the seed word, and return the response content.
In the word expansion device provided by the embodiment of the application, in the process of predicting the expansion word of the seed word, the semantics of the seed word is considered, and the context semantics of the seed word is considered, so that the expansion word of the seed word can be determined based on the influence of the context of the seed word, namely, the expansion word determined based on the word expansion model can accord with the context of the seed word, thereby providing information capable of meeting business requirements for each natural language processing application, and improving the application performance of the natural language processing application.
Aiming at the method for training the word expansion model, the application also provides a corresponding device for training the word expansion model, so that the method for training the word expansion model can be applied to practice and realized.
Referring to fig. 11, fig. 11 is a schematic structural view of an apparatus 1100 for training a word expansion model corresponding to the method for training a word expansion model shown in fig. 4 above, the apparatus 1100 for training a word expansion model including:
An obtaining module 1101, configured to obtain a training sample set, where each sample in the training sample set includes: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
and a construction module 1102, configured to construct an initial neural network model, train parameters of the initial neural network model according to the training sample set, so as to obtain a neural network model that meets the training ending condition, and use the neural network model as a word expansion model.
Optionally, referring to fig. 12, on the basis of the apparatus for training a word expansion model shown in fig. 11, this embodiment of the present application further provides another apparatus 1200 for training a word expansion model, where the apparatus 1200 further includes:
the extraction module 1201 is configured to extract sentences satisfying training conditions from natural corpus, and extract context and hyponym sets for the sentences satisfying training conditions, where the hyponym sets at least include two hyponyms;
a seed word determining module 1202, configured to use one hyponym in the hyponym set as a seed word, and use other hyponyms except the one hyponym in the hyponym set as real expansion words corresponding to the one seed word;
The sample determining module 1203 is configured to use the one seed word and the context thereof, and the real expansion word corresponding to the one seed word as a sample;
and the generating module 1204 is configured to generate a sample training set according to samples corresponding to each sentence in the natural corpus.
Optionally, on the basis of the apparatus for training a word expansion model shown in fig. 12, the extracting module 1201 is specifically configured to:
according to the regular expression of the Hearst mode, extracting sentences meeting the Hearst mode from the natural corpus, and extracting corresponding context and hyponym sets aiming at the sentences meeting the Hearst mode.
Optionally, on the basis of the apparatus for training a word expansion model shown in fig. 11, the initial neural network model includes: an input layer and a prediction layer; wherein:
the input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a full connection layer and a classification layer;
The full-connection layer takes the predicted semantic feature vector of the seed word as input and takes a predicted similarity vector comprising the feature similarity between each candidate word and the seed word in the candidate word library as output; the predicted semantic feature vector of the seed word is a vector generated by splicing the word vector and the context vector;
the classification layer takes the prediction similarity vector as input, and takes the prediction probability vector normalized by the prediction similarity vector as the output vector of the initial neural network model.
Optionally, on the basis of the apparatus for training a word expansion model shown in fig. 11, the building module 1102 is specifically configured to:
inputting samples in the training sample set into an initial neural network model, and obtaining a predictive probability vector output by the initial neural network model, wherein the predictive probability vector is obtained by predicting the samples by using a label subset corresponding to the samples; the method comprises the steps that a tag subset corresponding to a sample is extracted from a tag library aiming at the sample, wherein the tag subset comprises real expansion words related to the sample and candidate words not related to the sample;
And calculating a loss function according to the predicted probability vector and the true probability vector of the tag subset, taking the minimized loss function as a training target, and updating parameters in the initial neural network model until the neural network reaches convergence, so as to obtain a word expansion model.
Optionally, on the basis of the apparatus for training a word expansion model shown in fig. 11, the seed word encoder includes: a vector average encoder, which is used for averaging the vectors corresponding to all the words in the seed word to obtain the word vector of the seed word;
the context encoder includes: and the text vector encoder is used for respectively encoding the content on the left side and the right side of the placeholder in the context to obtain the vector of the context.
The apparatus for training the word expansion model described above trains the initial neural network as follows: the initial neural network model processes the seed word and context in each sample and outputs the corresponding prediction probability vector; a loss function is then calculated from the prediction probability vector and the true probability vector corresponding to the sample, and the initial neural network model is optimized according to the loss function; when the initial neural network model meets the training ending condition, the word expansion model is generated from the model structure and model parameters of the neural network model. Because the initial neural network model is trained on both the seed words and their contexts, the trained word expansion model can consider the semantics of the seed word and the semantics of its context simultaneously, so the output vector is determined based on the context of the seed word and the expansion words determined from the output vector conform to that context.
The present application also provides a word expansion device, which may specifically be a server, referring to fig. 13, and fig. 13 is a schematic structural diagram of a word expansion server provided in an embodiment of the present application, where the server 1300 may generate relatively large differences due to configuration or performance, and may include one or more central processing units (central processing units, CPU) 1322 (e.g. one or more processors) and a memory 1332, and one or more storage media 1330 (e.g. one or more mass storage devices) storing application programs 1342 or data 1344. Wherein the memory 1332 and storage medium 1330 may be transitory or persistent. The program stored on the storage medium 1330 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Further, the central processor 1322 may be configured to communicate with the storage medium 1330, and execute a series of instruction operations in the storage medium 1330 on the server 1300.
The server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 13.
Wherein CPU1322 is configured to perform the following steps:
acquiring a seed word to be expanded and acquiring a context of the seed word;
according to the seed word and the context of the seed word, obtaining an output vector through a word expansion model, wherein the output vector is used for representing the semantic similarity between each candidate word in a candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting semantic similarity between each candidate word and the seed word in the candidate word bank according to the word vector and the context vector corresponding to the seed word;
and determining the expansion word of the seed word according to the output vector.
Optionally, CPU1322 may also perform method steps for any particular implementation of the word expansion method in embodiments of the present application.
In addition, the application further provides a device for training the word expansion model, which can be specifically a server, the structure of the server is similar to that of the word expansion device shown in fig. 13, and the CPU is configured to execute the following steps:
obtaining a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
And constructing an initial neural network model, and training parameters of the initial neural network model according to the training sample set to obtain a neural network model meeting training ending conditions, wherein the neural network model is used as a word expansion model.
Optionally, the CPU may further perform method steps of any specific implementation manner of the method for training a word expansion model in the embodiments of the present application.
The embodiment of the present application further provides another word expansion device, which may be a terminal device. As shown in fig. 14, for convenience of explanation only the portion related to the embodiment of the present application is shown; for specific technical details not disclosed, please refer to the method portion of the embodiment of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sales (POS) terminal, a vehicle-mounted computer, and the like. Taking a mobile phone as an example of the terminal:
fig. 14 is a block diagram showing a part of the structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 14, the mobile phone includes: a radio frequency (RF) circuit 1410, a memory 1420, an input unit 1430, a display unit 1440, a sensor 1450, an audio circuit 1460, a wireless fidelity (WiFi) module 1470, a processor 1480, and a power supply 1490. It will be appreciated by those skilled in the art that the handset structure shown in fig. 14 does not limit the handset, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The following describes the components of the mobile phone in detail with reference to fig. 14:
the RF circuit 1410 may be used for receiving and transmitting signals during a message or a call, and particularly, after receiving downlink information of a base station, the downlink information is processed by the processor 1480; in addition, the data of the design uplink is sent to the base station. Generally, RF circuitry 1410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (English full name: low Noise Amplifier, english abbreviation: LNA), a duplexer, and the like. In addition, the RF circuitry 1410 may also communicate with networks and other devices through wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (english: global System of Mobile communication, english: GSM), general packet radio service (english: general Packet Radio Service, GPRS), code division multiple access (english: code Division Multiple Access, english: CDMA), wideband code division multiple access (english: wideband Code Division Multiple Access, english: WCDMA), long term evolution (english: long Term Evolution, english: LTE), email, short message service (english: short Messaging Service, SMS), and the like.
The memory 1420 may be used to store software programs and modules, and the processor 1480 performs various functional applications and data processing of the cellular phone by executing the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and a storage data area; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432. The touch panel 1431, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed on or near the touch panel 1431 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a predetermined program. Optionally, the touch panel 1431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 1480; it can also receive commands from the processor 1480 and execute them. The touch panel 1431 may be implemented as a resistive, capacitive, infrared, or surface-acoustic-wave type, among others. In addition to the touch panel 1431, the input unit 1430 may include other input devices 1432, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 1440 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 1440 may include a display panel 1441, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1431 may overlay the display panel 1441: when the touch panel 1431 detects a touch operation on or near it, it passes the operation to the processor 1480 to determine the type of the touch event, and the processor 1480 then provides a corresponding visual output on the display panel 1441 according to that type. Although in fig. 14 the touch panel 1431 and the display panel 1441 are two separate components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1431 may be integrated with the display panel 1441 to implement both functions.
The mobile phone may also include at least one sensor 1450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 1441 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 1441 and/or the backlight when the phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described in detail here.
The audio circuit 1460, speaker 1461, and microphone 1462 may provide an audio interface between the user and the mobile phone. The audio circuit 1460 may transmit the electrical signal converted from received audio data to the speaker 1461, which converts it into a sound signal for output; conversely, the microphone 1462 converts collected sound signals into electrical signals, which the audio circuit 1460 receives and converts into audio data. The audio data is then output to the processor 1480 for processing and sent via the RF circuit 1410 to, for example, another mobile phone, or output to the memory 1420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1470, the mobile phone can help the user send and receive emails, browse webpages, access streaming media, and so on, providing wireless broadband Internet access. Although fig. 14 shows the WiFi module 1470, it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 1480 is the control center of the mobile phone; it connects the various parts of the entire phone using various interfaces and lines, and performs the phone's functions and processes data by running or executing the software programs and/or modules stored in the memory 1420 and invoking the data stored in the memory 1420. Optionally, the processor 1480 may include one or more processing units; preferably, the processor 1480 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, with a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 1480.
The mobile phone further includes a power supply 1490 (e.g., a battery) for powering the various components, which may be logically connected to the processor 1480 through a power management system, thereby managing charging, discharging, and power consumption through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 1480 included in the terminal further has the following functions:
acquiring a seed word to be expanded and acquiring a context of the seed word;
inputting the seed word and the context of the seed word into a word expansion model to obtain an output vector of the word expansion model, wherein the output vector is used for representing the semantic similarity between each candidate word in a candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting the semantic similarity between each candidate word in the candidate word library and the seed word according to the word vector and the context vector corresponding to the seed word;
and determining the expansion word of the seed word according to the output vector, as sketched below.
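For illustration only, the following Python sketch wires these three functions together. The names here (expand_word, model.predict, the "<placeholder>" token) are assumptions introduced for explanation and are not prescribed by the embodiment:

```python
from typing import List

def expand_word(sentence: str, seed_word: str, model,
                candidate_words: List[str], m: int = 5) -> List[str]:
    # 1. Acquire the seed word and its context: the context is the sentence
    #    with the seed word replaced by a placeholder.
    context = sentence.replace(seed_word, "<placeholder>", 1)

    # 2. Input the seed word and context into the word expansion model; the
    #    output vector scores each candidate word's semantic similarity to
    #    the seed word (model.predict is a hypothetical interface).
    output_vector = model.predict(seed_word, context)

    # 3. Determine the expansion words from the output vector: keep the
    #    m highest-scoring candidates.
    ranked = sorted(range(len(candidate_words)),
                    key=lambda i: output_vector[i], reverse=True)
    return [candidate_words[i] for i in ranked[:m]]
```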
Optionally, the processor 1480 may also perform the method steps of any specific implementation of the word expansion method in the embodiments of the present application.
In addition, the application further provides a device for training the word expansion model, which may specifically be a terminal device whose structure is similar to that of the word expansion device shown in fig. 14. Its processor is configured to execute the following steps:
obtaining a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
and constructing an initial neural network model, and training the parameters of the initial neural network model according to the training sample set to obtain a neural network model satisfying the training termination condition, which serves as the word expansion model. A sketch of assembling such a training sample set follows.
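A minimal Python sketch, assuming a hypothetical similar_word_groups() helper that returns the sets of similar words (hyponym sets) found in a sentence:

```python
from typing import Callable, Dict, List

def build_samples(corpus: List[str],
                  similar_word_groups: Callable[[str], List[List[str]]]) -> List[Dict]:
    samples = []
    for sentence in corpus:
        for hyponym_set in similar_word_groups(sentence):
            if len(hyponym_set) < 2:
                continue  # training condition: at least two similar words
            # Each hyponym takes a turn as the seed word; the others become
            # its real expansion words.
            for seed in hyponym_set:
                context = sentence.replace(seed, "<placeholder>", 1)
                samples.append({
                    "seed": seed,
                    "context": context,
                    "expansions": [w for w in hyponym_set if w != seed],
                })
    return samples
```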
Optionally, the processor may further perform method steps of any specific implementation of the method for training a word expansion model in the embodiments of the present application.
The present application also provides a computer readable storage medium storing program code for executing any one of the word expansion methods described in the foregoing embodiments, or any one of the methods for training a word expansion model.
The embodiments also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any one of the word expansion methods described in the foregoing embodiments, or any one of the methods of training a word expansion model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (13)

1. A method of word expansion, comprising:
acquiring a seed word to be expanded and acquiring a context of the seed word, wherein the context of the seed word is the part remaining after the seed word is removed from a natural sentence, and the context of the seed word comprises a placeholder corresponding to the seed word;
according to the seed word and the context of the seed word, obtaining an output vector through a word expansion model, wherein the output vector is used for representing the semantic similarity between each candidate word in a candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting the semantic similarity between each candidate word in the candidate word library and the seed word according to the word vector and the context vector corresponding to the seed word; the word vector corresponding to the seed word is concatenated with a position vector, the position vector being determined according to the relative positional relationship between the seed word and the placeholder, or the context vector is obtained by separately encoding the content to the left and right of the placeholder in the context of the seed word; the word expansion model is obtained by training based on the seed word, the context of the seed word, and the real expansion word corresponding to the seed word;
determining the expansion word of the seed word from the candidate word library according to the output vector, so that the expansion word of the seed word conforms to the context of the seed word;
the training based on the seed word, the context of the seed word and the real expansion word corresponding to the seed word comprises the following steps:
selecting sentences meeting the training conditions from a natural corpus, wherein the sentences meeting the training conditions are natural sentences comprising at least two similar words;
extracting a context and a hyponym set from the sentences meeting the training conditions, wherein the hyponym set comprises at least two hyponyms, and the hyponyms in the hyponym set are similar words;
taking one hyponym in the hyponym set as a seed word, and taking the other hyponyms in the hyponym set as the real expansion words corresponding to that seed word;
taking the seed word, the context thereof and the real expansion word corresponding to the seed word as a sample;
generating a sample training set according to samples corresponding to each sentence in the natural corpus;
inputting the seed words and the contexts of the seed words in the training sample set into an initial neural network model, and obtaining a predicted probability vector output by the initial neural network model, wherein the predicted probability vector is obtained by predicting each sample using the tag subset corresponding to that sample; the tag subset corresponding to a sample is extracted from a tag library for that sample, and comprises the real expansion words related to the sample and candidate words not related to the sample;
and calculating a loss function according to the predicted probability vector and the true probability vector of the tag subset, taking minimization of the loss function as the training target, and updating the parameters of the initial neural network model until the neural network converges, thereby obtaining the word expansion model.
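The tag-subset scheme in claim 1 resembles a sampled (negative-sampling) softmax: each sample is scored only against its real expansion words plus a handful of unrelated candidates, rather than against the whole tag library. A hedged PyTorch sketch, in which the model signature, the subset ordering, and the uniform true distribution over the real expansion words are assumptions:

```python
import random
import torch
import torch.nn.functional as F

def tag_subset_loss(model, seed_ids, context_ids,
                    true_tag_ids, tag_library_size, num_negatives=20):
    # Tag subset: real expansion words first, then sampled unrelated
    # candidates (naive sampling, for illustration only).
    negatives = random.sample(
        [t for t in range(tag_library_size) if t not in true_tag_ids],
        num_negatives)
    subset = list(true_tag_ids) + negatives

    # Predicted (log-)probability vector over the subset only.
    logits = model(seed_ids, context_ids, candidate_ids=subset)
    pred_log_prob = F.log_softmax(logits, dim=-1)

    # True probability vector: uniform mass on the real expansion words,
    # zero on the sampled negatives.
    true_prob = torch.zeros_like(pred_log_prob)
    true_prob[:len(true_tag_ids)] = 1.0 / len(true_tag_ids)

    # Cross-entropy between the true and predicted distributions.
    return -(true_prob * pred_log_prob).sum()
```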
2. The method of claim 1, wherein the word expansion model comprises: an input layer and a prediction layer; wherein:
the input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a fully connected layer and a classification layer;
the fully connected layer takes the semantic feature vector of the seed word as input and outputs a similarity vector comprising the feature similarity between each candidate word in the candidate word library and the seed word; the semantic feature vector of the seed word is a vector generated by concatenating the word vector and the context vector;
the classification layer takes the similarity vector as input, and takes the probability vector obtained by normalizing the similarity vector as the output vector of the word expansion model.
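The layer arrangement of claim 2 can be illustrated with a short PyTorch sketch. The concrete encoders (an embedding lookup for the seed word, a bidirectional GRU for the context) and the dimensions are assumptions; the claim fixes only the encoder / fully connected / classification structure:

```python
import torch
import torch.nn as nn

class WordExpansionModel(nn.Module):
    def __init__(self, vocab_size: int, num_candidates: int, dim: int = 128):
        super().__init__()
        # Input layer: seed word encoder and context encoder.
        self.seed_encoder = nn.Embedding(vocab_size, dim)
        self.token_embedding = nn.Embedding(vocab_size, dim)
        self.context_encoder = nn.GRU(dim, dim, batch_first=True,
                                      bidirectional=True)
        # Prediction layer: fully connected layer (the classification layer
        # is the softmax applied in forward()).
        self.fc = nn.Linear(dim + 2 * dim, num_candidates)

    def forward(self, seed_ids, context_ids):
        word_vec = self.seed_encoder(seed_ids)            # [B, dim]
        ctx_emb = self.token_embedding(context_ids)       # [B, T, dim]
        _, h = self.context_encoder(ctx_emb)              # h: [2, B, dim]
        context_vec = torch.cat([h[0], h[1]], dim=-1)     # [B, 2*dim]
        # Semantic feature vector: word vector concatenated with context vector.
        feature = torch.cat([word_vec, context_vec], dim=-1)
        similarity = self.fc(feature)                     # [B, num_candidates]
        return torch.softmax(similarity, dim=-1)          # normalized output
```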
3. The method of claim 1, wherein determining the expansion word of the seed word from the candidate word library according to the output vector comprises:
sorting the element values in the output vector in descending order, and selecting from the candidate word library the candidate words corresponding to the top M elements as the expansion words of the seed word, wherein M is a threshold on the number of expansion words.
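A one-function sketch of this top-M rule, assuming the output vector is available as a NumPy array:

```python
import numpy as np

def top_m_expansions(output_vector: np.ndarray,
                     candidate_words: list, m: int) -> list:
    # Indices of the M largest element values, in descending order.
    top_idx = np.argsort(output_vector)[::-1][:m]
    return [candidate_words[i] for i in top_idx]
```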
4. The method of claim 1, wherein the obtaining the seed word to be expanded and obtaining the context of the seed word comprises:
acquiring a query statement, wherein the query statement refers to a search condition, input into a search engine, for querying information;
extracting keywords from the query statement as seed words to be expanded, and extracting the context of the seed words from the query statement;
the method further comprises:
and searching information according to the expansion words of the seed words to return search results.
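An illustrative sketch of this search application, reusing the expand_word function sketched earlier; extract_keyword and search are assumed external helpers that the claim does not define:

```python
def expanded_search(query: str, model, candidate_words,
                    extract_keyword, search):
    # The keyword extracted from the query is the seed word; the remainder
    # of the query serves as its context.
    seed = extract_keyword(query)
    expansions = expand_word(query, seed, model, candidate_words, m=3)
    # Search with the original query and with each expansion substituted
    # for the seed word, then merge the results.
    results = []
    for word in [seed] + expansions:
        results.extend(search(query.replace(seed, word, 1)))
    return results
```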
5. The method of claim 1, wherein the obtaining the seed word to be expanded and obtaining the context of the seed word comprises:
acquiring a text sentence, wherein the text sentence refers to a sentence in text information input in a text editor;
extracting keywords from the text sentence as seed words to be expanded, and extracting the context of the seed words from the text sentence;
the method further comprises:
and carrying out information prompt according to the expansion words of the seed words.
6. The method of claim 1, wherein the obtaining the seed word to be expanded and obtaining the context of the seed word comprises:
acquiring a question-answer sentence, wherein the question-answer sentence is a sentence input on an input interface of a question-answer system;
extracting keywords from the question-answer sentence as seed words to be expanded, and extracting the context of the seed words from the question-answer sentence;
the method further comprises:
searching response content according to the expansion words of the seed words, and returning the response content.
7. A method of training a word expansion model, comprising:
extracting sentences meeting the training conditions from a natural corpus, wherein the sentences meeting the training conditions are natural sentences comprising at least two similar words;
extracting a context and a hyponym set from the sentences meeting the training conditions, wherein the hyponym set comprises at least two hyponyms, and the hyponyms in the hyponym set are similar words;
taking one hyponym in the hyponym set as a seed word, and taking the other hyponyms in the hyponym set as the real expansion words corresponding to that seed word;
taking the seed word, the context thereof and the real expansion word corresponding to the seed word as a sample;
generating a sample training set according to samples corresponding to each sentence in the natural corpus;
obtaining a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
constructing an initial neural network model, inputting the seed words in the training sample set and the contexts of the seed words into the initial neural network model, and obtaining a predicted probability vector output by the initial neural network model, wherein the predicted probability vector is obtained by predicting each sample using the tag subset corresponding to that sample; the tag subset corresponding to a sample is extracted from a tag library for that sample, and comprises the real expansion words related to the sample and candidate words not related to the sample;
calculating a loss function according to the predicted probability vector and the true probability vector of the tag subset, taking minimization of the loss function as the training target, and updating the parameters of the initial neural network model until the neural network converges to obtain a word expansion model, wherein the word expansion model is used for predicting the semantic similarity between each candidate word in a candidate word library and the seed word according to the word vector and the context vector corresponding to the seed word, and the semantic similarity is used for determining the expansion words of the seed word from the candidate word library so that the expansion words of the seed word conform to the context of the seed word; the context of the seed word is the part remaining after the seed word is removed from a natural sentence, and the context of the seed word comprises a placeholder corresponding to the seed word;
and the word vector corresponding to the seed word is concatenated with a position vector, the position vector being determined according to the relative positional relationship between the seed word and the placeholder, or the context vector is obtained by separately encoding the content to the left and right of the placeholder in the context of the seed word.
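One plausible reading of the position-vector branch (the translated claim text is ambiguous about exactly which word vectors are concatenated with position vectors) is that each context token's embedding is concatenated with an embedding of its offset from the placeholder. A PyTorch sketch under that assumption, with the offset clamping and dimensions also assumed:

```python
import torch
import torch.nn as nn

class PositionAwareEmbedding(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128,
                 max_offset: int = 10, pos_dim: int = 16):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, dim)
        # One learned position vector per relative offset in
        # [-max_offset, max_offset].
        self.pos_embedding = nn.Embedding(2 * max_offset + 1, pos_dim)
        self.max_offset = max_offset

    def forward(self, token_ids: torch.Tensor, placeholder_index: int):
        # token_ids: [T]; placeholder_index: position of the placeholder.
        offsets = torch.arange(token_ids.size(0)) - placeholder_index
        offsets = offsets.clamp(-self.max_offset, self.max_offset) + self.max_offset
        word_vecs = self.word_embedding(token_ids)       # [T, dim]
        pos_vecs = self.pos_embedding(offsets)           # [T, pos_dim]
        # Word vector concatenated with its position vector.
        return torch.cat([word_vecs, pos_vecs], dim=-1)  # [T, dim + pos_dim]
```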
8. The method of claim 7, wherein the initial neural network model comprises: an input layer and a prediction layer; wherein:
The input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a full connection layer and a classification layer;
the full-connection layer takes the predicted semantic feature vector of the seed word as input and takes a predicted similarity vector comprising the feature similarity between each candidate word and the seed word in the candidate word library as output; the predicted semantic feature vector of the seed word is a vector generated by splicing the word vector and the context vector;
the classification layer takes the prediction similarity vector as input, and takes the prediction probability vector normalized by the prediction similarity vector as the output vector of the initial neural network model.
9. A word expansion device, comprising:
the first acquisition module is used for acquiring a seed word to be expanded and acquiring the context of the seed word, wherein the context of the seed word is the part remaining after the seed word is removed from a natural sentence, and the context of the seed word comprises a placeholder corresponding to the seed word;
the second acquisition module is used for obtaining an output vector through a word expansion model according to the seed word and the context of the seed word, wherein the output vector is used for representing the semantic similarity between each candidate word in the candidate word library and the seed word; the word expansion model is a neural network model and is used for predicting the semantic similarity between each candidate word in the candidate word library and the seed word according to the word vector and the context vector corresponding to the seed word; the word vector corresponding to the seed word is concatenated with a position vector, the position vector being determined according to the relative positional relationship between the seed word and the placeholder, or the context vector is obtained by separately encoding the content to the left and right of the placeholder in the context of the seed word; the word expansion model is obtained by training based on the seed word, the context of the seed word, and the real expansion word corresponding to the seed word;
the determining module is used for determining the expansion word of the seed word from the candidate word library according to the output vector, so that the expansion word of the seed word conforms to the context of the seed word;
the training based on the seed word, the context of the seed word and the real expansion word corresponding to the seed word comprises the following steps:
selecting sentences meeting the training conditions from a natural corpus, wherein the sentences meeting the training conditions are natural sentences comprising at least two similar words;
extracting a context and a hyponym set from the sentences meeting the training conditions, wherein the hyponym set comprises at least two hyponyms, and the hyponyms in the hyponym set are similar words;
taking one hyponym in the hyponym set as a seed word, and taking the other hyponyms in the hyponym set as the real expansion words corresponding to that seed word;
taking the seed word, the context thereof and the real expansion word corresponding to the seed word as a sample;
generating a sample training set according to samples corresponding to each sentence in the natural corpus;
inputting the seed words and the contexts of the seed words in the training sample set into an initial neural network model, and obtaining a predicted probability vector output by the initial neural network model, wherein the predicted probability vector is obtained by predicting each sample using the tag subset corresponding to that sample; the tag subset corresponding to a sample is extracted from a tag library for that sample, and comprises the real expansion words related to the sample and candidate words not related to the sample;
and calculating a loss function according to the predicted probability vector and the true probability vector of the tag subset, taking minimization of the loss function as the training target, and updating the parameters of the initial neural network model until the neural network converges, thereby obtaining the word expansion model.
10. The word expansion device of claim 9, wherein the word expansion model comprises: an input layer and a prediction layer; wherein:
the input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a fully connected layer and a classification layer;
the fully connected layer takes the semantic feature vector of the seed word as input and outputs a similarity vector comprising the feature similarity between each candidate word in the candidate word library and the seed word; the semantic feature vector of the seed word is a vector generated by concatenating the word vector and the context vector;
the classification layer takes the similarity vector as input, and takes the probability vector obtained by normalizing the similarity vector as the output vector of the word expansion model.
11. An apparatus for training a word expansion model, comprising:
the extraction module is used for extracting sentences meeting the training conditions from a natural corpus, wherein the sentences meeting the training conditions are natural sentences comprising at least two similar words, and for extracting a context and a hyponym set from the sentences meeting the training conditions, wherein the hyponym set comprises at least two hyponyms, and the hyponyms in the hyponym set are similar words;
the seed word determining module is used for taking one hyponym in the hyponym set as a seed word, and taking the other hyponyms in the hyponym set as the real expansion words corresponding to that seed word;
the sample determining module is used for taking the seed word, the context thereof and the real expansion word corresponding to the seed word as a sample;
the generation module is used for generating a sample training set according to samples corresponding to each sentence in the natural corpus;
an acquisition module for acquiring a training sample set, each sample in the training sample set comprising: seed words, contexts of the seed words and real expansion words corresponding to the seed words;
the construction module is used for constructing an initial neural network model, inputting the seed words in the training sample set and the contexts of the seed words into the initial neural network model, and obtaining a predicted probability vector output by the initial neural network model, wherein the predicted probability vector is obtained by predicting each sample using the tag subset corresponding to that sample; the tag subset corresponding to a sample is extracted from a tag library for that sample, and comprises the real expansion words related to the sample and candidate words not related to the sample; and for calculating a loss function according to the predicted probability vector and the true probability vector of the tag subset, taking minimization of the loss function as the training target, and updating the parameters of the initial neural network model until the neural network converges to obtain a word expansion model, wherein the word expansion model is used for predicting the semantic similarity between each candidate word in a candidate word library and the seed word according to the word vector and the context vector corresponding to the seed word, and the semantic similarity is used for determining the expansion words of the seed word from the candidate word library so that the expansion words of the seed word conform to the context of the seed word; the context of the seed word is the part remaining after the seed word is removed from a natural sentence, and the context of the seed word comprises a placeholder corresponding to the seed word;
and the word vector corresponding to the seed word is concatenated with a position vector, the position vector being determined according to the relative positional relationship between the seed word and the placeholder, or the context vector is obtained by separately encoding the content to the left and right of the placeholder in the context of the seed word.
12. The apparatus for training a word expansion model of claim 11, wherein the initial neural network model comprises: an input layer and a prediction layer; wherein:
the input layer includes: a seed word encoder and a context encoder;
the seed word encoder takes a seed word as input and takes a word vector corresponding to the seed word as output; the context encoder takes the context of the seed word as input and takes the context vector corresponding to the seed word as output;
the prediction layer includes: a fully connected layer and a classification layer;
the fully connected layer takes the predicted semantic feature vector of the seed word as input and outputs a predicted similarity vector comprising the feature similarity between each candidate word in the candidate word library and the seed word; the predicted semantic feature vector of the seed word is a vector generated by concatenating the word vector and the context vector;
the classification layer takes the predicted similarity vector as input, and takes the predicted probability vector obtained by normalizing the predicted similarity vector as the output vector of the initial neural network model.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the word expansion method of any of claims 1-6, or the method of training a word expansion model of any of claims 7-8, according to instructions in the program code.
CN201811231345.2A 2018-10-22 2018-10-22 Word expansion method, device, equipment and medium Active CN110162770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811231345.2A CN110162770B (en) 2018-10-22 2018-10-22 Word expansion method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110162770A CN110162770A (en) 2019-08-23
CN110162770B true CN110162770B (en) 2023-07-21

Family

ID=67645100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811231345.2A Active CN110162770B (en) 2018-10-22 2018-10-22 Word expansion method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110162770B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503956B (en) * 2019-09-17 2023-05-12 平安科技(深圳)有限公司 Voice recognition method, device, medium and electronic equipment
CN111753495A (en) * 2019-11-07 2020-10-09 北京沃东天骏信息技术有限公司 Method, device, equipment and storage medium for constructing prediction model of intention statement
CN110909116B (en) * 2019-11-28 2022-12-23 中国人民解放军军事科学院军事科学信息研究中心 Entity set expansion method and system for social media
CN112287672A (en) * 2019-11-28 2021-01-29 北京京东尚科信息技术有限公司 Text intention recognition method and device, electronic equipment and storage medium
CN111324745A (en) * 2020-02-18 2020-06-23 深圳市一面网络技术有限公司 Word stock generation method and device
CN112749318B (en) * 2020-02-24 2024-03-26 云天弈(广州)智能科技有限公司 Intelligent writing system and method
CN111401066B (en) * 2020-03-12 2022-04-12 腾讯科技(深圳)有限公司 Artificial intelligence-based word classification model training method, word processing method and device
CN111597779B (en) * 2020-04-23 2022-05-27 腾讯科技(深圳)有限公司 Text generation method, device, equipment and storage medium
CN111831821B (en) * 2020-06-03 2024-01-09 北京百度网讯科技有限公司 Training sample generation method and device of text classification model and electronic equipment
CN111737487A (en) * 2020-06-10 2020-10-02 深圳数联天下智能科技有限公司 Method for assisting body construction, electronic equipment and storage medium
CN112528039A (en) * 2020-12-16 2021-03-19 中国联合网络通信集团有限公司 Word processing method, device, equipment and storage medium
CN112818688B (en) * 2021-04-16 2021-06-25 腾讯科技(深圳)有限公司 Text processing method, device, equipment and storage medium
CN113486670B (en) * 2021-07-23 2023-08-29 平安科技(深圳)有限公司 Text classification method, device, equipment and storage medium based on target semantics
CN113486671B (en) * 2021-07-27 2023-06-30 平安科技(深圳)有限公司 Regular expression coding-based data expansion method, device, equipment and medium
CN114238573B (en) * 2021-12-15 2023-09-22 平安科技(深圳)有限公司 Text countercheck sample-based information pushing method and device
CN114118068B (en) * 2022-01-26 2022-04-29 北京淇瑀信息科技有限公司 Method and device for amplifying training text data and electronic equipment
CN115269844B (en) * 2022-08-01 2024-03-29 腾讯科技(深圳)有限公司 Model processing method, device, electronic equipment and storage medium
US20240070210A1 (en) * 2022-08-30 2024-02-29 Maplebear Inc. (Dba Instacart) Suggesting keywords to define an audience for a recommendation about a content item
CN115879468B (en) * 2022-12-30 2023-11-14 北京百度网讯科技有限公司 Text element extraction method, device and equipment based on natural language understanding

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933183A (en) * 2015-07-03 2015-09-23 重庆邮电大学 Inquiring term rewriting method merging term vector model and naive Bayes
WO2018033030A1 (en) * 2016-08-19 2018-02-22 中兴通讯股份有限公司 Natural language library generation method and device
CN108170667A (en) * 2017-11-30 2018-06-15 阿里巴巴集团控股有限公司 Term vector processing method, device and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150095017A1 (en) * 2013-09-27 2015-04-02 Google Inc. System and method for learning word embeddings using neural language models
CN107220231A (en) * 2016-03-22 2017-09-29 索尼公司 Electronic equipment and method and training method for natural language processing
CN107291914A (en) * 2017-06-27 2017-10-24 达而观信息科技(上海)有限公司 A kind of method and system for generating search engine inquiry expansion word

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933183A (en) * 2015-07-03 2015-09-23 重庆邮电大学 Inquiring term rewriting method merging term vector model and naive Bayes
WO2018033030A1 (en) * 2016-08-19 2018-02-22 中兴通讯股份有限公司 Natural language library generation method and device
CN108170667A (en) * 2017-11-30 2018-06-15 阿里巴巴集团控股有限公司 Term vector processing method, device and equipment

Also Published As

Publication number Publication date
CN110162770A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110162770B (en) Word expansion method, device, equipment and medium
CN109145303B (en) Named entity recognition method, device, medium and equipment
KR102360659B1 (en) Machine translation method, apparatus, computer device and storage medium
CN108304846B (en) Image recognition method, device and storage medium
CN111553162B (en) Intention recognition method and related device
CN106792003B (en) Intelligent advertisement insertion method and device and server
EP3113035B1 (en) Method and apparatus for grouping contacts
CN113378556B (en) Method and device for extracting text keywords
CN110634474B (en) Speech recognition method and device based on artificial intelligence
US20140324426A1 (en) Reminder setting method and apparatus
CN104123937A (en) Method, device and system for reminding setting
CN111597804B (en) Method and related device for training entity recognition model
CN110334334B (en) Digest generation method and device and computer equipment
CN113821589B (en) Text label determining method and device, computer equipment and storage medium
CN111931501A (en) Text mining method based on artificial intelligence, related device and equipment
CN110276010B (en) Weight model training method and related device
CN112214605A (en) Text classification method and related device
CN111159338A (en) Malicious text detection method and device, electronic equipment and storage medium
CN113822038B (en) Abstract generation method and related device
CN110196833A (en) Searching method, device, terminal and the storage medium of application program
CN116975295B (en) Text classification method and device and related products
CN112862021A (en) Content labeling method and related device
CN112328783A (en) Abstract determining method and related device
CN116955610A (en) Text data processing method and device and storage medium
WO2024036616A1 (en) Terminal-based question and answer method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant