WO2019149076A1 - Word vector generation method, apparatus, and device - Google Patents

Word vector generation method, apparatus, and device

Info

Publication number
WO2019149076A1
WO2019149076A1 (PCT/CN2019/072081)
Authority
WO
WIPO (PCT)
Prior art keywords
word
vector
words
neural network
corpus
Prior art date
Application number
PCT/CN2019/072081
Other languages
English (en)
French (fr)
Inventor
曹绍升
周俊
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority to SG11202004446PA priority Critical patent/SG11202004446PA/en
Publication of WO2019149076A1 publication Critical patent/WO2019149076A1/zh
Priority to US16/879,316 priority patent/US10824819B2/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Semantic analysis
    • G06F40/40 Processing or translation of natural language
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods

Definitions

  • the present specification relates to the field of computer software technologies, and in particular, to a word vector generation method, apparatus and device.
  • a word vector is a vector of fixed dimension to which a word is mapped and which represents the semantic information of the word.
  • common algorithms for generating word vectors include, for example, Google's word vector algorithm, Microsoft's deep neural network algorithm, and the like.
  • the embodiments of the present specification provide a word vector generation method, apparatus, and device for solving the following technical problem: a more accurate word vector generation scheme is needed.
  • a recurrent neural network is trained according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus;
  • a word vector of each word is generated according to the feature vector of each word and the trained recurrent neural network.
  • the training module trains the recurrent neural network according to the feature vector of each word and the feature vectors of the context words of each word in the corpus;
  • Step 1: establish a vocabulary composed of the words obtained by segmenting the corpus, where the words do not include words that appear in the corpus fewer than a set number of times; go to step 2;
  • Step 2: determine the total number of n-ary characters corresponding to the words, where identical n-ary characters are counted only once and an n-ary character represents n consecutive characters of its corresponding word; go to step 3;
  • Step 3: according to the n-ary characters corresponding to each word, establish for each word a feature vector whose dimension is the total number, where each dimension of the feature vector corresponds to a different n-ary character and the value of each dimension indicates whether its corresponding n-ary character corresponds to the word of the feature vector; go to step 4;
  • Step 4: traverse the segmented corpus, perform step 5 on the traversed current word, and perform step 6 if the traversal is complete, otherwise continue the traversal;
  • Step 5: centering on the current word, slide by up to k words to each side to establish a window, take the words in the window other than the current word as context words, and input the sequence composed of the feature vectors of all the context words into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector; input the feature vectors of the current word and of the negative sample words selected from the corpus into the fully connected layer of the recurrent neural network for calculation to obtain a second vector and third vectors, respectively; and update the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function;
  • the loop calculation is performed according to the following formulas:
  • s_t = σ(U x_t + W s_{t-1})
  • o_t = softmax(V s_t)
  • the loss function includes:
  • where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by the loop calculation over the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; σ represents the excitation function; c represents the first vector; w represents the second vector; w'_m represents the third vector corresponding to the m-th negative sample word; τ represents the offset parameter of the fully connected layer; γ represents the hyperparameter; s represents the similarity calculation function; and λ represents the number of negative sample words;
  • Step 6: input the feature vector of each word into the fully connected layer of the trained recurrent neural network for calculation to obtain the corresponding word vector.
  • at least one processor; and
  • a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
  • train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
  • generate a word vector of each word according to the feature vector of each word and the trained recurrent neural network.
  • the recurrent neural network can describe the overall semantic information of the context of a word through loop calculation and extract more contextual semantic information, and n-ary characters can express words at a finer granularity, thus helping to generate word vectors more accurately.
  • FIG. 1 is a schematic diagram of an overall architecture involved in an implementation scenario of the present specification;
  • FIG. 2 is a schematic flowchart of a word vector generation method according to an embodiment of the present specification;
  • FIG. 3 is a schematic diagram of the feature vector of an English word in an actual application scenario according to an embodiment of the present specification;
  • FIG. 4 is a schematic diagram of the principle of a recurrent-neural-network-based word vector generation method in an actual application scenario according to an embodiment of the present specification;
  • FIG. 5 is a schematic structural diagram of the sequence representation layer of a recurrent neural network according to an embodiment of the present specification;
  • FIG. 6 is a schematic flowchart of another word vector generation method according to an embodiment of the present specification;
  • FIG. 7 is a schematic structural diagram of a word vector generation apparatus corresponding to FIG. 2 according to an embodiment of the present specification.
  • Embodiments of the present specification provide a word vector generation method, apparatus, and device.
  • FIG. 1 is a schematic diagram of an overall architecture involved in an implementation scenario of the present specification.
  • the overall architecture mainly involves a server that trains the recurrent neural network to generate word vectors.
  • the feature vector of each word can be established based on its n-ary characters, and the recurrent neural network can be trained using the feature vectors and the context relationships of the words in the corpus.
  • the feature vector can be established by the server or another device.
  • the scheme of this specification applies to languages composed of letters, such as English, French, German, and Spanish, and also applies to languages that are composed of non-letter elements but can be conveniently mapped to letters, such as Chinese (which can be mapped to pinyin letters) and Japanese (which can be mapped to Roman letters).
  • FIG. 2 is a schematic flowchart diagram of a method for generating a word vector according to an embodiment of the present disclosure.
  • the execution body of the process includes, for example, at least one of the following devices: a personal computer, a large or medium-sized computer, a computer cluster, a mobile phone, a tablet computer, a smart wearable device, an in-vehicle device, and the like.
  • the process in Figure 2 can include the following steps:
  • the words may specifically be: at least part of the words that appear at least once in the corpus.
  • the words can be saved in the vocabulary, and the words can be read from the vocabulary when needed.
  • considering that if a word appears too few times in the corpus, the number of corresponding iterations in subsequent processing is also small and the credibility of the training result is relatively low, such a word can be screened out so that it is not included in the words.
  • the words are specifically: part of the words that appear at least once in the corpus.
  • a character of a word may include a character constituting the word, or other characters mapped by characters constituting the word. Taking the word “boy” as an example, “b", “o", and “y” are characters constituting the word "boy”.
  • to reflect word order, marker characters can be added to the original word according to certain rules, and these marker characters can also be regarded as characters of the word; for example, an annotation character can be added at the start position and/or the end position of the original word. After annotation, the word "boy" is denoted as "#boy#", and the two "#" characters can also be regarded as characters of the word "boy".
  • n is an integer not less than 1.
  • taking "#boy#" as an example, its 3-ary characters include "#bo" (the 1st to 3rd characters), "boy" (the 2nd to 4th characters), and "oy#" (the 3rd to 5th characters);
  • its 4-ary characters include "#boy" (the 1st to 4th characters) and "boy#" (the 2nd to 5th characters).
  • the value of n may be dynamically adjustable.
  • for the same word, the value of n can take only one value (for example, only the 3-ary characters corresponding to the word are determined) or multiple values (for example, the 3-ary characters and 4-ary characters corresponding to the word are determined).
  • n-ary characters can be represented based on a specified code (eg, numbers, etc.). For example, different characters or different n-ary characters can be represented by a different code or code string.
  • the feature vector of a word may indicate, through the values of its dimensions, the n-ary characters corresponding to the word. More precisely, the feature vector of a word can also indicate the order of the n-ary characters corresponding to the word.
  • the sequence representation layer of the recurrent neural network is used to process the sequence to obtain global information of the sequence, and the content of the global information is also affected by the order of the elements in the sequence.
  • the sequence is composed of the feature vector of each context word of the current word (each word may be the current word respectively), and the global information may refer to the overall semantics of all context words of the current word.
  • S208 Generate a word vector of each word according to the feature vector of each word and the cyclic neural network after training.
  • the parameters include, for example, weight parameters and offset parameters.
  • the word vector can be obtained.
  • the recurrent neural network can describe the overall semantic information of the context of a word through loop calculation and extract more contextual semantic information, and the n-ary characters can express words more finely, thus contributing to generating word vectors more accurately.
  • the embodiments of the present specification further provide some specific implementations of the method, and an extended solution, which will be described below.
  • the feature vectors of the words are established according to the n-ary characters corresponding to the words, which may specifically include:
  • the embodiment of the present specification provides a schematic diagram of a feature vector of an English word in a practical application scenario, as shown in FIG. 3 .
  • the English word is "boy", with an annotation character "#" added at its start position and at its end position; f represents the process of constructing a feature vector from a word; the feature vector is, for example, a column vector established from the 3-ary characters of "boy"; it can be seen that three elements in the feature vector have the value 1, indicating the 3-ary characters "#bo", "boy", and "oy#", and the other elements have the value 0, indicating that "boy" does not correspond to any other 3-ary character.
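  • As an illustration of the two steps above, the following is a minimal Python sketch of extracting the n-ary characters of a word and building its feature vector; the function names, the plain 0/1 list representation, and the default n = 3 are illustrative assumptions rather than part of the original filing.

```python
def n_ary_characters(word, n=3, pad="#"):
    """Return the n-ary characters (n consecutive characters) of a word,
    after adding a marker character at its start and end positions."""
    padded = pad + word + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_feature_vectors(vocabulary, n=3):
    """Build one feature vector per word: each dimension corresponds to a
    distinct n-ary character, and identical n-ary characters are counted once."""
    all_ngrams = sorted({g for w in vocabulary for g in n_ary_characters(w, n)})
    index = {g: i for i, g in enumerate(all_ngrams)}       # n-ary character -> dimension
    vectors = {}
    for w in vocabulary:
        vec = [0.0] * len(all_ngrams)
        for g in n_ary_characters(w, n):
            vec[index[g]] = 1.0                            # this dimension's n-ary character occurs in w
        vectors[w] = vec
    return vectors, index

print(n_ary_characters("boy"))   # ['#bo', 'boy', 'oy#']
```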
  • the goal is to make the similarity between the current word and its context words relatively high after their feature vectors are inferred by the trained recurrent neural network.
  • the context word is regarded as a positive example word.
  • one or more negative sample words of the current word can also be selected according to a certain rule to participate in the training, which helps the training converge quickly and obtain more accurate training results.
  • the goal may further include making the similarity between the current word and its negative sample words relatively low after their feature vectors are inferred by the trained recurrent neural network.
  • Negative sample words can be randomly selected in the corpus, or selected in non-context words, and so on. This specification does not limit the specific way of calculating the similarity. For example, the similarity can be calculated based on the angle cosine operation of the vector, the similarity can be calculated based on the square sum operation of the vector, and the like.
  • the recurrent neural network is trained according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus, which may specifically include:
  • training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
  • the training process of the recurrent neural network may be iterative.
  • a relatively simple way is to traverse the segmented corpus and perform one iteration each time one of the above words is traversed; after the traversal is completed, it can be considered that the recurrent neural network has been trained using the corpus.
  • the training of the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus may include:
  • FIG. 4 is a schematic diagram of a principle of a word vector generation method based on a cyclic neural network in an actual application scenario provided by an embodiment of the present disclosure.
  • the recurrent neural network of FIG. 4 mainly includes a sequence representation layer, a fully connected layer, and a Softmax layer.
  • the feature vectors of the context words are processed by the sequence representation layer to extract the word-meaning information of the context words as a whole, while the feature vectors of the current word and its negative sample words can be processed by the fully connected layer.
  • the details are explained below.
  • a sliding window is used to determine a context word.
  • the center of the sliding window is the current word traversed, and the words other than the current word in the sliding window are context words.
  • the feature vectors of all the context words are sequentially input into the sequence representation layer (the feature vector of each context word is one element of the above sequence), and then the loop calculation can be performed according to the following formulas:
  • s_t = σ(U x_t + W s_{t-1})
  • o_t = softmax(V s_t)
  • where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by the loop calculation over the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function, such as a tanh function or a ReLU function.
  • the subscript of the parameter in the formula can start from 0.
  • the embodiment of the present specification also provides a schematic diagram of the structure of the sequence representation layer of the recurrent neural network, as shown in FIG. 5.
  • x represents the input unit of the sequence representation layer, s represents the hidden unit of the sequence representation layer, and o represents the output unit of the sequence representation layer;
  • the hidden unit processes the input data using the activation function, and the output unit processes the input data using the softmax function.
  • the data calculated by the hidden unit at the previous time is fed back, with certain weights, to the input of the hidden unit at the next time, so that the content of the entire sequence, input in order, is reflected in the data finally output by the sequence representation layer;
  • U represents the weight parameter from the input unit to the hidden unit;
  • W represents the weight parameter fed back from the hidden unit to the hidden unit at the next time;
  • V represents the weight parameter from the hidden unit to the output unit.
  • the structure on the left side is unrolled, showing the structure of the sequence representation layer at three consecutive times and the principle of processing three elements input in sequence; it can be seen that at time t-1 the data output by the hidden unit is expressed as s_{t-1}; at time t, the input is the feature vector x_t of the t-th context word of the current word, the hidden unit outputs s_t = σ(U x_t + W s_{t-1}), and the output unit outputs o_t = softmax(V s_t); at time t+1, the input is the feature vector x_{t+1} of the (t+1)-th context word, the hidden unit outputs s_{t+1} = σ(U x_{t+1} + W s_t), and the output unit outputs o_{t+1} = softmax(V s_{t+1}); by analogy, after the feature vector of the last context word of the current word has been input, the corresponding first vector can be output.
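  • A minimal numpy sketch of the recurrence just described is given below: it applies s_t = σ(U x_t + W s_{t-1}) and o_t = softmax(V s_t) over the context-word feature vectors and returns the output after the last context word as the first vector. The tanh excitation, the dimensions, and the random initialization are illustrative assumptions only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sequence_representation(context_vectors, U, W, V):
    """Loop calculation of the sequence representation layer over the feature
    vectors of the context words; returns the output for the last context word."""
    s = np.zeros(W.shape[0])                 # hidden unit before the first input
    o = None
    for x_t in context_vectors:              # x_t: feature vector of one context word
        s = np.tanh(U @ x_t + W @ s)         # s_t = sigma(U x_t + W s_{t-1})
        o = softmax(V @ s)                   # o_t = softmax(V s_t)
    return o                                 # the "first vector" c for the current word

# illustrative dimensions and data only
rng = np.random.default_rng(0)
d_in, d_hidden = 1000, 64
U = rng.normal(scale=0.1, size=(d_hidden, d_in))
W = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
V = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
context = [rng.integers(0, 2, d_in).astype(float) for _ in range(6)]
c = sequence_representation(context, U, W, V)
```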
  • FIG. 4 also exemplarily shows a current word "liquid" in a corpus, the six context words "as", "the", "vegan", "gelatin", "substitute", and "absorbs" of the current word in the corpus, and the two negative sample words "year" and "make" of the current word in the corpus.
  • An output unit of the last context word “absorbs” corresponding to the current word “liquid” is shown in FIG. 4, and the output unit outputs a first vector corresponding to the current word "liquid”.
  • for the current word, its feature vector can be input into the fully connected layer and calculated, for example, according to the following formula:
  • w represents the second vector output by the fully connected layer after processing the feature vector of the current word, q represents the feature vector of the current word, and τ represents the offset parameter of the fully connected layer.
  • similarly, for each negative sample word, its feature vector may be input into the fully connected layer and processed in the same manner as the current word to obtain the third vector; the third vector corresponding to the m-th negative sample word is represented as w'_m.
  • updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and the specified loss function may include, for example: calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
  • the loss function can be, for example:
  • where c denotes the first vector, w denotes the second vector, w'_m denotes the third vector corresponding to the m-th negative sample word, U, W, and V denote the weight parameters of the sequence representation layer, τ denotes the offset parameter of the fully connected layer, γ denotes the hyperparameter, s denotes the similarity calculation function, and λ denotes the number of negative sample words.
  • in practical applications, if no negative sample words are used, the term for calculating the similarity between the first vector and the third vector may be removed correspondingly from the loss function employed.
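  • The exact loss formula is reproduced in the original filing only as an image, so the sketch below should be read as one plausible instantiation rather than the patent's own formula: it uses cosine similarity for s and a margin-based ranking loss over the first similarity s(c, w) and the second similarities s(c, w'_m), with γ as the margin hyperparameter.

```python
import numpy as np

def cosine_similarity(a, b):
    """One possible similarity function s; the text also mentions a
    sum-of-squares based similarity as an alternative."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def assumed_margin_loss(c, w, w_negs, gamma=0.5):
    """Hypothetical loss: sum over the lambda negative sample words of
    max(0, gamma - s(c, w) + s(c, w'_m)). This is an assumption, not the
    formula shown as an image in the original filing."""
    first_similarity = cosine_similarity(c, w)
    return sum(max(0.0, gamma - first_similarity + cosine_similarity(c, w_m))
               for w_m in w_negs)
```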
  • after the recurrent neural network is trained, inference can be performed on the feature vectors to generate the word vectors.
  • the generating of the word vector of each word according to the feature vector of each word and the trained recurrent neural network may specifically include:
  • the feature vectors of the words are respectively input into the fully connected layer of the trained recurrent neural network for calculation, and the vector output after the calculation is obtained as the corresponding word vector.
  • FIG. 6 is a schematic flowchart of this other word vector generation method.
  • the process in Figure 6 can include the following steps:
  • Step 1: establish a vocabulary composed of the words obtained by segmenting the corpus, where the words do not include words that appear in the corpus fewer than a set number of times; go to step 2;
  • Step 2: determine the total number of n-ary characters corresponding to the words, where identical n-ary characters are counted only once and an n-ary character represents n consecutive characters of its corresponding word; go to step 3;
  • Step 3: according to the n-ary characters corresponding to each word, establish for each word a feature vector whose dimension is the total number, where each dimension of the feature vector corresponds to a different n-ary character and the value of each dimension indicates whether its corresponding n-ary character corresponds to the word of the feature vector; go to step 4;
  • Step 4: traverse the segmented corpus, perform step 5 on the traversed current word, and perform step 6 if the traversal is complete, otherwise continue the traversal;
  • Step 5: centering on the current word, slide by up to k words to each side to establish a window, take the words in the window other than the current word as context words, and input the sequence composed of the feature vectors of all the context words into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector; input the feature vectors of the current word and of the negative sample words selected from the corpus into the fully connected layer of the recurrent neural network for calculation to obtain a second vector and third vectors, respectively; and update the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function;
  • the loop calculation is performed according to the following formulas:
  • s_t = σ(U x_t + W s_{t-1})
  • o_t = softmax(V s_t)
  • the loss function includes:
  • where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by the loop calculation over the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; σ represents the excitation function; c represents the first vector; w represents the second vector; w'_m represents the third vector corresponding to the m-th negative sample word; τ represents the offset parameter of the fully connected layer; γ represents the hyperparameter; s represents the similarity calculation function; and λ represents the number of negative sample words;
  • Step 6: input the feature vector of each word into the fully connected layer of the trained recurrent neural network for calculation to obtain the corresponding word vector.
  • the steps in the other word vector generation method may be performed by the same or different modules, which is not specifically limited in this specification.
  • FIG. 7 is a schematic structural diagram of a word vector generating apparatus corresponding to FIG. 2 according to an embodiment of the present disclosure.
  • the apparatus may be located in an execution body of the process in FIG. 2, and includes:
  • the acquisition module 701 is configured to obtain each word obtained by segmenting the corpus;
  • the establishment module 702 is configured to establish the feature vector of each word according to the n-ary characters corresponding to each word, where an n-ary character represents n consecutive characters of its corresponding word;
  • the training module 703 is configured to train the recurrent neural network according to the feature vector of each word and the feature vectors of the context words of each word in the corpus;
  • the generation module 704 is configured to generate the word vector of each word according to the feature vector of each word and the trained recurrent neural network.
  • the characters of the word include each character constituting the word, and an annotated character added to a start position and/or an end position of the word.
  • the establishment module 702 establishing the feature vector of each word according to the n-ary characters corresponding to each word specifically includes:
  • the establishment module 702 determining the total number of distinct n-ary characters among the n-ary characters corresponding to each word; and
  • establishing, for each word, a feature vector whose dimension is determined according to the total number, where the feature vector indicates, through the values of its dimensions, the n-ary characters corresponding to its word.
  • the training module 703 training the recurrent neural network according to the feature vector of each word and the feature vectors of the context words of each word in the corpus specifically includes:
  • the training module 703 training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
  • the training module 703 training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus specifically includes:
  • the training module 703 traversing the segmented corpus, and performing the following on the traversed current word:
  • inputting the sequence composed of the feature vectors of the context words of the current word into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector;
  • inputting the feature vector of the current word into the fully connected layer of the recurrent neural network for calculation to obtain a second vector, and inputting the feature vectors of the negative sample words of the current word into the fully connected layer of the recurrent neural network for calculation to obtain third vectors; and
  • updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
  • the training module 703 performs a loop calculation, and specifically includes:
  • the training module 703 performing the loop calculation according to the following formulas:
  • s_t = σ(U x_t + W s_{t-1})
  • o_t = softmax(V s_t)
  • where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by the loop calculation over the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function.
  • the training module 703 updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and the specified loss function specifically includes:
  • the training module 703 calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and
  • updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
  • the loss function specifically includes:
  • where c denotes the first vector, w denotes the second vector, w'_m denotes the third vector corresponding to the m-th negative sample word, U, W, and V denote the weight parameters of the sequence representation layer, τ denotes the offset parameter of the fully connected layer, γ denotes the hyperparameter, s denotes the similarity calculation function, and λ denotes the number of negative sample words.
  • the generation module 704 generating the word vector of each word according to the feature vector of each word and the trained recurrent neural network specifically includes:
  • the generation module 704 inputting the feature vector of each word into the fully connected layer of the trained recurrent neural network for calculation, and obtaining the vector output after the calculation as the corresponding word vector.
  • the embodiment of the present specification further provides a corresponding word vector generating device, including:
  • at least one processor; and
  • a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
  • train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
  • generate a word vector of each word according to the feature vector of each word and the trained recurrent neural network.
  • the embodiment of the present specification further provides a corresponding non-volatile computer storage medium, where computer executable instructions are stored, and the computer executable instructions are set as:
  • train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
  • generate a word vector of each word according to the feature vector of each word and the trained recurrent neural network.
  • the apparatus, the device, and the non-volatile computer storage medium provided in the embodiments of the present specification correspond to the method; therefore, they also have beneficial technical effects similar to those of the corresponding method, and since the beneficial technical effects of the method have been described in detail, the beneficial technical effects of the corresponding apparatus, device, and non-volatile computer storage medium are not described here again.
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • HDL Hardware Description Language
  • the controller can be implemented in any suitable manner, for example, the controller can take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (eg, software or firmware) executable by the (micro)processor.
  • examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320; a memory controller can also be implemented as part of the memory's control logic.
  • in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like.
  • Such a controller can therefore be considered a hardware component, and the means for implementing various functions included therein can also be considered as a structure within the hardware component.
  • the means for implementing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or A combination of any of these devices.
  • embodiments of the specification can be provided as a method, system, or computer program product.
  • embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware.
  • embodiments of the present specification can take the form of a computer program product embodied on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction apparatus implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transportable media can be used to store information that can be accessed by a computing device.
  • computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present description can be provided as a method, system, or computer program product. Accordingly, the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware. Moreover, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Abstract

A word vector generation method, apparatus, and device. The method includes: obtaining the words resulting from segmenting a corpus (S202); establishing a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word (S204); training a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus (S206); and generating a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network (S208).

Description

Word vector generation method, apparatus, and device
Cross-reference to related applications
This patent application claims priority to Chinese Patent Application No. 201810113710.3, filed on February 5, 2018 and entitled "Word vector generation method, apparatus, and device", which is incorporated herein by reference in its entirety.
Technical field
This specification relates to the field of computer software technologies, and in particular, to word vector generation methods, apparatuses, and devices.
Background
Most of today's natural language processing solutions adopt neural-network-based architectures, and an important foundational technique under such architectures is the word vector. A word vector is a vector of fixed dimension to which a word is mapped and which represents the semantic information of the word.
In the prior art, common algorithms for generating word vectors include, for example, Google's word vector algorithm and Microsoft's deep neural network algorithm.
Based on the prior art, a more accurate word vector generation scheme is needed.
Summary
Embodiments of this specification provide word vector generation methods, apparatuses, and devices to solve the following technical problem: a more accurate word vector generation scheme is needed.
To solve the above technical problem, the embodiments of this specification are implemented as follows.
An embodiment of this specification provides a word vector generation method, including:
obtaining the words resulting from segmenting a corpus;
establishing a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
training a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
generating a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
An embodiment of this specification provides a word vector generation apparatus, including:
an acquisition module, configured to obtain the words resulting from segmenting a corpus;
an establishment module, configured to establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
a training module, configured to train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
a generation module, configured to generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
An embodiment of this specification provides another word vector generation method, including:
step 1: establishing a vocabulary composed of the words obtained by segmenting a corpus, where the words do not include words that appear in the corpus fewer than a set number of times; going to step 2;
step 2: determining the total number of n-ary characters corresponding to the words, where identical n-ary characters are counted only once and an n-ary character represents n consecutive characters of its corresponding word; going to step 3;
step 3: establishing, for each of the words and according to the n-ary characters corresponding to that word, a feature vector whose dimension is the total number, where each dimension of the feature vector corresponds to a different n-ary character and the value of each dimension indicates whether its corresponding n-ary character corresponds to the word of the feature vector; going to step 4;
step 4: traversing the segmented corpus, performing step 5 on the traversed current word, and performing step 6 if the traversal is complete, otherwise continuing the traversal;
step 5: centering on the current word, sliding by up to k words to each side to establish a window, taking the words in the window other than the current word as context words, and inputting the sequence composed of the feature vectors of all the context words into the sequence representation layer of a recurrent neural network for a loop calculation to obtain a first vector; inputting the feature vectors of the current word and of the negative sample words selected from the corpus into the fully connected layer of the recurrent neural network for calculation to obtain a second vector and third vectors, respectively; and updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function;
the loop calculation being performed according to the following formulas:
s_t = σ(U x_t + W s_{t-1})
o_t = softmax(V s_t)
the loss function including:
Figure PCTCN2019072081-appb-000001
where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; σ represents the excitation function; c represents the first vector; w represents the second vector; w'_m represents the third vector corresponding to the m-th negative sample word;
Figure PCTCN2019072081-appb-000002
represents the weight parameter of the fully connected layer; τ represents the offset parameter of the fully connected layer; γ represents a hyperparameter; s represents the similarity calculation function; and λ represents the number of negative sample words;
step 6: inputting the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation to obtain the corresponding word vector.
An embodiment of this specification provides a word vector generation device, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain the words resulting from segmenting a corpus;
establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effect: through loop calculation, the recurrent neural network can characterize the overall semantic information of a word's context and extract more contextual semantic information, and n-ary characters can express words at a finer granularity, which therefore helps generate word vectors more accurately.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Clearly, the accompanying drawings in the following description show merely some embodiments recorded in this specification, and a person of ordinary skill in the art can still derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in an actual application scenario;
FIG. 2 is a schematic flowchart of a word vector generation method according to an embodiment of this specification;
FIG. 3 is a schematic diagram of the feature vector of an English word in an actual application scenario according to an embodiment of this specification;
FIG. 4 is a schematic diagram of the principle of a recurrent-neural-network-based word vector generation method in an actual application scenario according to an embodiment of this specification;
FIG. 5 is a schematic structural diagram of the sequence representation layer of a recurrent neural network according to an embodiment of this specification;
FIG. 6 is a schematic flowchart of another word vector generation method according to an embodiment of this specification;
FIG. 7 is a schematic structural diagram of a word vector generation apparatus corresponding to FIG. 2 according to an embodiment of this specification.
Detailed description
Embodiments of this specification provide word vector generation methods, apparatuses, and devices.
To enable a person skilled in the art to better understand the technical solutions in this specification, the following clearly and completely describes the technical solutions in the embodiments of this specification with reference to the accompanying drawings. Clearly, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the protection scope of this application.
FIG. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in an actual application scenario. The overall architecture mainly involves a server that trains a recurrent neural network to generate word vectors. The feature vector of each word can be established based on n-ary characters, and the recurrent neural network can be trained using the feature vectors and the context relationships of the words. The feature vectors may be established by this server or by another device.
The solution of this specification applies to languages composed of letters, such as English, French, German, and Spanish, and also applies to languages that are composed of non-letter elements but can be conveniently mapped to letters, such as Chinese (which can be mapped to pinyin letters) and Japanese (which can be mapped to Roman letters). For ease of description, the following embodiments describe the solution of this specification mainly for English scenarios.
FIG. 2 is a schematic flowchart of a word vector generation method according to an embodiment of this specification. From the device perspective, the execution body of the process includes, for example, at least one of the following devices: a personal computer, a large or medium-sized computer, a computer cluster, a mobile phone, a tablet computer, a smart wearable device, or an in-vehicle device.
The process in FIG. 2 may include the following steps.
S202: Obtain the words resulting from segmenting a corpus.
In this embodiment of this specification, the words may specifically be at least some of the words that appear at least once in the corpus. To facilitate subsequent processing, the words can be saved in a vocabulary and read from the vocabulary when needed.
It should be noted that, if a word appears too few times in the corpus, the number of corresponding iterations in subsequent processing is also small and the credibility of the training result is relatively low, so such a word can be screened out and not included in the words. In this case, the words are specifically some of the words that appear at least once in the corpus.
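As a minimal illustration of this vocabulary-building step, the following Python sketch counts word frequencies over the segmented corpus and screens out words that appear fewer than a set number of times; the function name and the min_count threshold are illustrative assumptions.

```python
from collections import Counter

def build_vocabulary(segmented_corpus, min_count=5):
    """segmented_corpus: iterable of sentences, each a list of words obtained
    by word segmentation; words appearing fewer than min_count times are
    screened out and not included in the vocabulary."""
    counts = Counter(word for sentence in segmented_corpus for word in sentence)
    return {word for word, freq in counts.items() if freq >= min_count}
```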
S204: Establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word.
In this embodiment of this specification, the characters of a word may include the characters constituting the word, or other characters to which the characters constituting the word are mapped. Taking the word "boy" as an example, "b", "o", and "y" are all characters constituting the word "boy".
To reflect word order, some marker characters can be added to the original word according to certain rules, and these marker characters can also be regarded as characters of the word. For example, annotation characters can be added at positions such as the start position and/or the end position of the original word; after annotation, the word "boy" is written, for example, as "#boy#", and the two "#" characters can also be regarded as characters of the word "boy".
Further, n is an integer not less than 1. Taking "#boy#" as an example, its 3-ary characters include "#bo" (the 1st to 3rd characters), "boy" (the 2nd to 4th characters), and "oy#" (the 3rd to 5th characters), and its 4-ary characters include "#boy" (the 1st to 4th characters) and "boy#" (the 2nd to 5th characters).
In this embodiment of this specification, the value of n may be dynamically adjustable. For the same word, when determining the n-ary characters corresponding to the word, n may take only one value (for example, only the 3-ary characters corresponding to the word are determined) or multiple values (for example, the 3-ary characters and the 4-ary characters corresponding to the word are determined).
To facilitate computer processing, n-ary characters can be represented based on specified codes (for example, numbers). For example, different characters or different n-ary characters may each be represented by a different code or code string.
In this embodiment of this specification, the feature vector of a word may indicate, through the values of its dimensions, the n-ary characters corresponding to the word. More precisely, the feature vector of a word may also indicate the order of the n-ary characters corresponding to the word.
S206: Train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus.
In this embodiment of this specification, the sequence representation layer of the recurrent neural network is used to process a sequence to obtain global information of the sequence, and the content of the global information is also affected by the order of the elements in the sequence. Specifically for the scenario of this specification, the sequence is composed of the feature vectors of the context words of a current word (each of the words can serve as the current word in turn) as its elements, and the global information may refer to the overall semantics of all the context words of the current word.
S208: Generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
By training the recurrent neural network, reasonable parameters can be determined for the recurrent neural network, so that the recurrent neural network can relatively accurately characterize the overall semantics of the context words and the semantics of the corresponding current word. The parameters include, for example, weight parameters and offset parameters.
By using the trained recurrent neural network to perform inference on the feature vectors, the word vectors can be obtained.
With the method in FIG. 2, through loop calculation, the recurrent neural network can characterize the overall semantic information of a word's context and extract more contextual semantic information, and n-ary characters can express words at a finer granularity, which therefore helps generate word vectors more accurately.
Based on the method in FIG. 2, the embodiments of this specification further provide some specific implementations of the method as well as extended solutions, which are described below.
In this embodiment of this specification, for step S204, establishing the feature vector of each of the words according to the n-ary characters corresponding to that word may specifically include:
determining the total number of distinct n-ary characters among the n-ary characters corresponding to the words; and establishing, for each of the words, a feature vector whose dimension is determined according to the total number, where the feature vector indicates, through the values of its dimensions, the n-ary characters corresponding to its word.
For example, all the n-ary characters corresponding to the words are numbered one by one, starting from 0 and increasing by 1, with identical n-ary characters receiving the same number. Assuming the total number is N_c, the number of the last n-ary character is N_c - 1. A feature vector of dimension N_c is then established for each word. Specifically, assuming n = 3 and that the numbers of all the 3-ary characters corresponding to a certain word are 2, 34, and 127, the 2nd, 34th, and 127th elements of the feature vector established for that word can be 1, and the remaining elements are 0.
More intuitively, based on the above example, this embodiment of this specification provides a schematic diagram of the feature vector of an English word in an actual application scenario, as shown in FIG. 3. The English word is "boy", with an annotation character "#" added at its start position and at its end position. f denotes the process of constructing a feature vector from a word; the feature vector is, for example, a column vector established from the 3-ary characters of "boy". It can be seen that three elements of the feature vector take the value 1, indicating the 3-ary characters "#bo", "boy", and "oy#" respectively, while the other elements take the value 0, indicating that "boy" does not correspond to any other 3-ary character.
In this embodiment of this specification, when the recurrent neural network is trained, the goal is that, after the feature vectors of the current word and of a context word are inferred by the trained recurrent neural network, their similarity can become relatively high.
Further, regarding the context words as positive sample words, one or more negative sample words of the current word can also be selected according to certain rules to participate in the training as a contrast, which helps the training converge quickly and obtain more accurate training results. In this case, the goal may further include that, after the feature vectors of the current word and of a negative sample word are inferred by the trained recurrent neural network, their similarity can become relatively low. The negative sample words can, for example, be selected randomly from the corpus, or selected from the non-context words, and so on. This specification does not limit the specific manner of calculating the similarity; for example, the similarity can be calculated based on a vector angle-cosine operation, or based on a vector sum-of-squares operation, and so on.
According to the analysis in the preceding paragraph, for step S206, training the recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus may specifically include:
training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
In this embodiment of this specification, the training process of the recurrent neural network may be iterative. A relatively simple way is to traverse the segmented corpus and perform one iteration each time one of the words is traversed, until the traversal is complete, after which the recurrent neural network can be regarded as having been trained with the corpus.
Specifically, training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus may include:
traversing the segmented corpus, and performing the following on the traversed current word (the performed content constitutes one iteration):
determining one or more context words and negative sample words of the current word in the segmented corpus; inputting the sequence composed of the feature vectors of the context words of the current word into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector; inputting the feature vector of the current word into the fully connected layer of the recurrent neural network for calculation to obtain a second vector, and inputting the feature vectors of the negative sample words of the current word into the fully connected layer of the recurrent neural network for calculation to obtain third vectors; and updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
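As a sketch of the traversal just described, the following Python fragment slides a window around each traversed current word, takes the other words in the window as context words, draws negative sample words at random from the corpus, and runs one iteration per current word. The rnn object and its sequence_representation, fully_connected, and update_parameters methods are placeholders for the layers and the gradient update described above; all names and the window and sample sizes are illustrative assumptions.

```python
import random

def train_one_pass(segmented_corpus, vocabulary, feature_vectors, rnn,
                   window=5, num_negative=2):
    """One traversal of the segmented corpus; each traversed current word
    triggers one iteration (one parameter update)."""
    words = [w for sentence in segmented_corpus for w in sentence if w in vocabulary]
    for i, current in enumerate(words):
        # sliding window: up to `window` words on each side, excluding the current word
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if not context:
            continue
        negatives = random.sample(words, k=num_negative)   # e.g. random negative sample words

        c = rnn.sequence_representation([feature_vectors[w] for w in context])  # first vector
        w_vec = rnn.fully_connected(feature_vectors[current])                   # second vector
        w_negs = [rnn.fully_connected(feature_vectors[n]) for n in negatives]   # third vectors
        rnn.update_parameters(c, w_vec, w_negs)  # update per the specified loss function
```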
This is explained more intuitively with reference to FIG. 4. FIG. 4 is a schematic diagram of the principle of a recurrent-neural-network-based word vector generation method in an actual application scenario according to an embodiment of this specification.
The recurrent neural network in FIG. 4 mainly includes a sequence representation layer, a fully connected layer, and a Softmax layer. In the process of training the recurrent neural network, the feature vectors of the context words are processed by the sequence representation layer to extract the word-meaning information of the context words as a whole, while the feature vectors of the current word and of its negative sample words can be processed by the fully connected layer. These are described in detail below.
In this embodiment of this specification, assume that a sliding window is used to determine the context words; the center of the sliding window is the traversed current word, and the words in the sliding window other than the current word are the context words. The feature vectors of all the context words are input into the sequence representation layer in order (the feature vector of each context word is one element of the above sequence), and then the loop calculation can be performed according to the following formulas:
s_t = σ(U x_t + W s_{t-1})
o_t = softmax(V s_t)
where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function, for example, a tanh function or a ReLU function. The subscripts of the parameters in the formulas may start from 0.
To facilitate understanding of the formulas used in the loop calculation, this embodiment of this specification further provides a schematic structural diagram of the sequence representation layer of the recurrent neural network, as shown in FIG. 5.
On the left side of FIG. 5, x represents the input unit of the sequence representation layer, s represents the hidden unit of the sequence representation layer, and o represents the output unit of the sequence representation layer; the hidden unit processes the input data using the activation function, and the output unit processes the input data using the softmax function. The data computed and output by the hidden unit at the previous time is fed back, with certain weights, to the input of the hidden unit at the next time, so that the content of the entire sequence input in order is reflected in the data finally output by the sequence representation layer. U represents the weight parameter from the input unit to the hidden unit, W represents the weight parameter fed back from the hidden unit to the hidden unit at the next time, and V represents the weight parameter from the hidden unit to the output unit.
On the right side of FIG. 5, the structure on the left side is unrolled, showing the structure of the sequence representation layer at three consecutive times and the principle of processing three elements input in sequence. It can be seen that, at time t-1, the data computed and output by the hidden unit is denoted as s_{t-1}; at time t, the input is the feature vector x_t of the t-th context word of the current word, the data computed and output by the hidden unit is denoted as s_t = σ(U x_t + W s_{t-1}), and the data computed and output by the output unit is denoted as o_t = softmax(V s_t); at time t+1, the input is the feature vector x_{t+1} of the (t+1)-th context word of the current word, the data computed and output by the hidden unit is denoted as s_{t+1} = σ(U x_{t+1} + W s_t), and the data computed and output by the output unit is denoted as o_{t+1} = softmax(V s_{t+1}). By analogy, after the feature vector of the last context word of the current word has been input, the corresponding first vector can be output.
FIG. 4 also shows, by way of example, a current word "liquid" in a corpus, the six context words "as", "the", "vegan", "gelatin", "substitute", and "absorbs" of the current word in the corpus, and the two negative sample words "year" and "make" of the current word in the corpus. FIG. 4 shows the output unit of the sequence representation layer corresponding to "absorbs", the last context word of the current word "liquid"; this output unit outputs the first vector corresponding to the current word "liquid".
For the current word, its feature vector can be input into the fully connected layer and calculated, for example, according to the following formula:
Figure PCTCN2019072081-appb-000003
where w represents the second vector output by the fully connected layer after processing the feature vector of the current word,
Figure PCTCN2019072081-appb-000004
represents the weight parameter of the fully connected layer, q represents the feature vector of the current word, and τ represents the offset parameter of the fully connected layer.
Similarly, for each negative sample word, its feature vector can be input into the fully connected layer and processed in the same manner as the current word to obtain the third vector; the third vector corresponding to the m-th negative sample word is denoted as w'_m.
Further, updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function may include, for example: calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
One loss function is listed here as an example. The loss function may, for example, be:
Figure PCTCN2019072081-appb-000005
where c represents the first vector, w represents the second vector, w'_m represents the third vector corresponding to the m-th negative sample word, U, W, and V represent the weight parameters of the sequence representation layer,
Figure PCTCN2019072081-appb-000006
represents the weight parameter of the fully connected layer, τ represents the offset parameter of the fully connected layer, γ represents a hyperparameter, s represents the similarity calculation function, and λ represents the number of negative sample words.
In practical applications, if no negative sample words are used, the term for calculating the similarity between the first vector and the third vector can be correspondingly removed from the loss function used.
In this embodiment of this specification, after the recurrent neural network is trained, the word vectors can be generated by performing inference on the feature vectors. Specifically, for step S208, generating the word vector of each of the words according to the feature vector of that word and the trained recurrent neural network may specifically include:
inputting the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation, and obtaining the vector output after the calculation as the corresponding word vector.
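A minimal sketch of this final step is given below, assuming the fully connected layer is a plain affine map with trained weight parameter omega and offset parameter tau; the weight symbol itself appears only as an image in the original filing, so omega is an illustrative name.

```python
import numpy as np

def generate_word_vectors(feature_vectors, omega, tau):
    """Pass each word's feature vector through the trained fully connected
    layer once; the output vector is taken as that word's word vector."""
    return {word: omega @ np.asarray(vec) + tau
            for word, vec in feature_vectors.items()}
```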
Based on the same idea, an embodiment of this specification further provides another word vector generation method, which is an exemplary specific implementation of the word vector generation method in FIG. 2. FIG. 6 is a schematic flowchart of this other word vector generation method.
The process in FIG. 6 may include the following steps.
Step 1: establish a vocabulary composed of the words obtained by segmenting a corpus, where the words do not include words that appear in the corpus fewer than a set number of times; go to step 2.
Step 2: determine the total number of n-ary characters corresponding to the words, where identical n-ary characters are counted only once and an n-ary character represents n consecutive characters of its corresponding word; go to step 3.
Step 3: according to the n-ary characters corresponding to each of the words, establish for each word a feature vector whose dimension is the total number, where each dimension of the feature vector corresponds to a different n-ary character and the value of each dimension indicates whether its corresponding n-ary character corresponds to the word of the feature vector; go to step 4.
Step 4: traverse the segmented corpus, perform step 5 on the traversed current word, and perform step 6 if the traversal is complete, otherwise continue the traversal.
Step 5: centering on the current word, slide by up to k words to each side to establish a window, take the words in the window other than the current word as context words, and input the sequence composed of the feature vectors of all the context words into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector; input the feature vectors of the current word and of the negative sample words selected from the corpus into the fully connected layer of the recurrent neural network for calculation to obtain a second vector and third vectors, respectively; and update the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
The loop calculation is performed according to the following formulas:
s_t = σ(U x_t + W s_{t-1})
o_t = softmax(V s_t)
The loss function includes:
Figure PCTCN2019072081-appb-000007
where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; σ represents the excitation function; c represents the first vector; w represents the second vector; w'_m represents the third vector corresponding to the m-th negative sample word;
Figure PCTCN2019072081-appb-000008
represents the weight parameter of the fully connected layer; τ represents the offset parameter of the fully connected layer; γ represents a hyperparameter; s represents the similarity calculation function; and λ represents the number of negative sample words.
Step 6: input the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation to obtain the corresponding word vector.
The steps in this other word vector generation method may be performed by the same module or by different modules, which is not specifically limited in this specification.
The word vector generation methods provided in the embodiments of this specification have been described above. Based on the same idea, the embodiments of this specification further provide a corresponding apparatus, as shown in FIG. 7.
FIG. 7 is a schematic structural diagram of a word vector generation apparatus corresponding to FIG. 2 according to an embodiment of this specification. The apparatus may be located in the execution body of the process in FIG. 2 and includes:
an acquisition module 701, configured to obtain the words resulting from segmenting a corpus;
an establishment module 702, configured to establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
a training module 703, configured to train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
a generation module 704, configured to generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
Optionally, the characters of a word include the characters constituting the word and an annotation character added at the start position and/or the end position of the word.
Optionally, the establishment module 702 establishing the feature vector of each of the words according to the n-ary characters corresponding to that word specifically includes:
the establishment module 702 determining the total number of distinct n-ary characters among the n-ary characters corresponding to the words; and
establishing, for each of the words, a feature vector whose dimension is determined according to the total number, where the feature vector indicates, through the values of its dimensions, the n-ary characters corresponding to its word.
Optionally, the training module 703 training the recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus specifically includes:
the training module 703 training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
Optionally, the training module 703 training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus specifically includes:
the training module 703 traversing the segmented corpus, and performing the following on the traversed current word:
determining one or more context words and negative sample words of the current word in the segmented corpus;
inputting the sequence composed of the feature vectors of the context words of the current word into the sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector;
inputting the feature vector of the current word into the fully connected layer of the recurrent neural network for calculation to obtain a second vector, and inputting the feature vectors of the negative sample words of the current word into the fully connected layer of the recurrent neural network for calculation to obtain third vectors; and
updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
Optionally, the training module 703 performing the loop calculation specifically includes:
the training module 703 performing the loop calculation according to the following formulas:
s_t = σ(U x_t + W s_{t-1})
o_t = softmax(V s_t)
where x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function.
Optionally, the training module 703 updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function specifically includes:
the training module 703 calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and
updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
Optionally, the loss function specifically includes:
Figure PCTCN2019072081-appb-000009
where c represents the first vector, w represents the second vector, w'_m represents the third vector corresponding to the m-th negative sample word, U, W, and V represent the weight parameters of the sequence representation layer,
Figure PCTCN2019072081-appb-000010
represents the weight parameter of the fully connected layer, τ represents the offset parameter of the fully connected layer, γ represents a hyperparameter, s represents the similarity calculation function, and λ represents the number of negative sample words.
Optionally, the generation module 704 generating the word vector of each of the words according to the feature vector of that word and the trained recurrent neural network specifically includes:
the generation module 704 inputting the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation, and obtaining the vector output after the calculation as the corresponding word vector.
Based on the same idea, an embodiment of this specification further provides a corresponding word vector generation device, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain the words resulting from segmenting a corpus;
establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
Based on the same idea, an embodiment of this specification further provides a corresponding non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are set to:
obtain the words resulting from segmenting a corpus;
establish a feature vector for each of the words according to the n-ary characters corresponding to that word, where an n-ary character represents n consecutive characters of its corresponding word;
train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and the desired result can still be achieved. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The embodiments of this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for related parts, refer to the description of the method embodiments.
The apparatuses, devices, and non-volatile computer storage media provided in the embodiments of this specification correspond to the methods; therefore, they also have beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the corresponding apparatuses, devices, and non-volatile computer storage media are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method process). However, with the development of technology, improvements to many of today's method processes can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method process into a hardware circuit. Therefore, it cannot be said that an improvement to a method process cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD) (for example, a field programmable gate array (FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program by themselves to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used for program development; the original code to be compiled must also be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. A person skilled in the art should also be clear that a hardware circuit implementing a logical method process can easily be obtained simply by slightly logically programming the method process in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320; a memory controller can also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, in addition to implementing a controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be considered a hardware component, and the means included in it for implementing various functions can also be considered structures within the hardware component. Or even, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units illustrated in the above embodiments can specifically be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For ease of description, the above apparatus is described by dividing it into various units by function. Of course, when this specification is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
A person skilled in the art should understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, the embodiments of this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
A person skilled in the art should understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This specification can be described in the general context of computer-executable instructions executed by a computer, for example, program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments of this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is basically similar to the method embodiments and is therefore described relatively briefly; for related parts, refer to the description of the method embodiments.
The above descriptions are merely embodiments of this specification and are not intended to limit this application. For a person skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (20)

  1. A word vector generation method, comprising:
    obtaining the words resulting from segmenting a corpus;
    establishing a feature vector for each of the words according to the n-ary characters corresponding to that word, wherein an n-ary character represents n consecutive characters of its corresponding word;
    training a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
    generating a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
  2. The method according to claim 1, wherein the characters of a word comprise the characters constituting the word and an annotation character added at a start position and/or an end position of the word.
  3. The method according to claim 1, wherein establishing the feature vector of each of the words according to the n-ary characters corresponding to that word specifically comprises:
    determining the total number of distinct n-ary characters among the n-ary characters corresponding to the words; and
    establishing, for each of the words, a feature vector whose dimension is determined according to the total number, wherein the feature vector indicates, through the values of its dimensions, the n-ary characters corresponding to its word.
  4. The method according to claim 1, wherein training the recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus specifically comprises:
    training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
  5. The method according to claim 4, wherein training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus specifically comprises:
    traversing the segmented corpus, and performing the following on the traversed current word:
    determining one or more context words and negative sample words of the current word in the segmented corpus;
    inputting the sequence composed of the feature vectors of the context words of the current word into a sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector;
    inputting the feature vector of the current word into a fully connected layer of the recurrent neural network for calculation to obtain a second vector, and inputting the feature vectors of the negative sample words of the current word into the fully connected layer of the recurrent neural network for calculation to obtain third vectors; and
    updating parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
  6. The method according to claim 5, wherein performing the loop calculation specifically comprises:
    performing the loop calculation according to the following formulas:
    s_t = σ(U x_t + W s_{t-1})
    o_t = softmax(V s_t)
    wherein x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function.
  7. The method according to claim 5, wherein updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and the specified loss function specifically comprises:
    calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and
    updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
  8. The method according to claim 5, wherein the loss function specifically comprises:
    Figure PCTCN2019072081-appb-100001
    wherein c represents the first vector, w represents the second vector, w'_m represents the third vector corresponding to the m-th negative sample word, U, W, and V represent the weight parameters of the sequence representation layer,
    Figure PCTCN2019072081-appb-100002
    represents the weight parameter of the fully connected layer, τ represents the offset parameter of the fully connected layer, γ represents a hyperparameter, s represents the similarity calculation function, and λ represents the number of negative sample words.
  9. The method according to claim 1, wherein generating the word vector of each of the words according to the feature vector of that word and the trained recurrent neural network specifically comprises:
    inputting the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation, and obtaining the vector output after the calculation as the corresponding word vector.
  10. A word vector generation apparatus, comprising:
    an acquisition module, configured to obtain the words resulting from segmenting a corpus;
    an establishment module, configured to establish a feature vector for each of the words according to the n-ary characters corresponding to that word, wherein an n-ary character represents n consecutive characters of its corresponding word;
    a training module, configured to train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
    a generation module, configured to generate a word vector for each of the words according to the feature vector of that word and the trained recurrent neural network.
  11. The apparatus according to claim 10, wherein the characters of a word comprise the characters constituting the word and an annotation character added at a start position and/or an end position of the word.
  12. The apparatus according to claim 10, wherein the establishment module establishing the feature vector of each of the words according to the n-ary characters corresponding to that word specifically comprises:
    the establishment module determining the total number of distinct n-ary characters among the n-ary characters corresponding to the words; and
    establishing, for each of the words, a feature vector whose dimension is determined according to the total number, wherein the feature vector indicates, through the values of its dimensions, the n-ary characters corresponding to its word.
  13. The apparatus according to claim 10, wherein the training module training the recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus specifically comprises:
    the training module training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus.
  14. The apparatus according to claim 13, wherein the training module training the recurrent neural network according to the feature vectors of the words, and the feature vectors of the context words and of the negative sample words of the words in the corpus specifically comprises:
    the training module traversing the segmented corpus, and performing the following on the traversed current word:
    determining one or more context words and negative sample words of the current word in the segmented corpus;
    inputting the sequence composed of the feature vectors of the context words of the current word into a sequence representation layer of the recurrent neural network for a loop calculation to obtain a first vector;
    inputting the feature vector of the current word into a fully connected layer of the recurrent neural network for calculation to obtain a second vector, and inputting the feature vectors of the negative sample words of the current word into the fully connected layer of the recurrent neural network for calculation to obtain third vectors; and
    updating parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function.
  15. The apparatus according to claim 14, wherein the training module performing the loop calculation specifically comprises:
    the training module performing the loop calculation according to the following formulas:
    s_t = σ(U x_t + W s_{t-1})
    o_t = softmax(V s_t)
    wherein x_t represents the input unit of the sequence representation layer at time t, i.e., the feature vector of the (t+1)-th context word of the current word; s_t represents the hidden unit of the sequence representation layer at time t; o_t represents the vector obtained by performing the loop calculation on the feature vectors of the first t+1 context words of the current word; U, W, and V represent the weight parameters of the sequence representation layer; and σ represents the excitation function.
  16. The apparatus according to claim 14, wherein the training module updating the parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and the specified loss function specifically comprises:
    the training module calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and
    updating the parameters of the recurrent neural network according to the first similarity, the second similarity, and the specified loss function.
  17. The apparatus according to claim 14, wherein the loss function specifically comprises:
    Figure PCTCN2019072081-appb-100003
    wherein c represents the first vector, w represents the second vector, w'_m represents the third vector corresponding to the m-th negative sample word, U, W, and V represent the weight parameters of the sequence representation layer,
    Figure PCTCN2019072081-appb-100004
    represents the weight parameter of the fully connected layer, τ represents the offset parameter of the fully connected layer, γ represents a hyperparameter, s represents the similarity calculation function, and λ represents the number of negative sample words.
  18. The apparatus according to claim 10, wherein the generation module generating the word vector of each of the words according to the feature vector of that word and the trained recurrent neural network specifically comprises:
    the generation module inputting the feature vector of each of the words into the fully connected layer of the trained recurrent neural network for calculation, and obtaining the vector output after the calculation as the corresponding word vector.
  19. A word vector generation method, comprising:
    step 1: establishing a vocabulary composed of the words obtained by segmenting a corpus, wherein the words do not include words that appear in the corpus fewer than a set number of times; going to step 2;
    step 2: determining the total number of n-ary characters corresponding to the words, wherein identical n-ary characters are counted only once and an n-ary character represents n consecutive characters of its corresponding word; going to step 3;
    step 3: establishing, for each of the words and according to the n-ary characters corresponding to that word, a feature vector whose dimension is the total number, wherein each dimension of the feature vector corresponds to a different n-ary character and the value of each dimension indicates whether its corresponding n-ary character corresponds to the word of the feature vector; going to step 4;
    step 4: traversing the segmented corpus, performing step 5 on the traversed current word, and performing step 6 if the traversal is complete, otherwise continuing the traversal;
    step 5: centering on the current word, sliding by up to k words to each side to establish a window, taking the words in the window other than the current word as context words, and inputting the sequence composed of the feature vectors of all the context words into the sequence representation layer of a recurrent neural network for a loop calculation to obtain a first vector; inputting the feature vectors of the current word and of the negative sample words selected from the corpus into the fully connected layer of the recurrent neural network for calculation to obtain a second vector and third vectors, respectively; and updating parameters of the recurrent neural network according to the first vector, the second vector, the third vector, and a specified loss function;
    the loop calculation being performed according to the following formulas:
    s_t = σ(U x_t + W s_{t-1})
    o_t = softmax(V s_t)
    the loss function including:
    Figure PCTCN2019072081-appb-100005
    其中,x t表示序列表示层在t时刻的输入单元,即当前词的第t+1个上下文词的特征向量,s t表示序列表示层在t时刻的隐藏单元,o t表示对当前词的前t+1个上下文词的特征向量进行循环计算得到的向量,U、W、V表示序列表示层的权重参数,σ表示激励函数,c表示所述第一向量,w表示所述第二向量,w' m表示第m个负样例词对应的所述 第三向量,
    Figure PCTCN2019072081-appb-100006
    表示全连接层的权重参数,τ表示全连接层的偏置参数,γ表示超参数,s表示相似度计算函数,λ表示负样例词的数量;
    步骤6,将所述各词的特征向量分别输入训练后的所述循环神经网络的全连接层进行计算,得到对应的词向量。
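Step 1 of claim 19 builds the vocabulary from the segmented corpus while dropping rare words. A minimal sketch, assuming a simple count threshold (min_count, an illustrative name) as the "set number of times":

```python
# Sketch of claim 19, Step 1: a vocabulary without words that appear fewer than
# a set number of times in the corpus. min_count stands in for that set number.
from collections import Counter

def build_vocabulary(corpus_words, min_count=5):
    """corpus_words: the corpus after word segmentation, as a list of words."""
    counts = Counter(corpus_words)
    return [word for word, count in counts.items() if count >= min_count]
```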
  20. A word vector generation device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
    acquire the words obtained by segmenting a corpus;
    establish a feature vector of each word according to the n-ary characters corresponding to the word, wherein an n-ary character represents n consecutive characters of its corresponding word;
    train a recurrent neural network according to the feature vectors of the words and the feature vectors of the context words of the words in the corpus; and
    generate a word vector of each word according to the feature vector of the word and the trained recurrent neural network.
PCT/CN2019/072081 2018-02-05 2019-01-17 Word vector generation method, apparatus and device WO2019149076A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11202004446PA SG11202004446PA (en) 2018-02-05 2019-01-17 Methods, apparatuses, and devices for generating word vectors
US16/879,316 US10824819B2 (en) 2018-02-05 2020-05-20 Generating word vectors by recurrent neural networks based on n-ary characters

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810113710.3 2018-02-05
CN201810113710.3A CN110119507A (zh) 2018-02-05 Word vector generation method, apparatus and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/879,316 Continuation US10824819B2 (en) 2018-02-05 2020-05-20 Generating word vectors by recurrent neural networks based on n-ary characters

Publications (1)

Publication Number Publication Date
WO2019149076A1 true WO2019149076A1 (zh) 2019-08-08

Family

ID=67479139

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072081 WO2019149076A1 (zh) 2018-02-05 2019-01-17 Word vector generation method, apparatus and device

Country Status (5)

Country Link
US (1) US10824819B2 (zh)
CN (1) CN110119507A (zh)
SG (1) SG11202004446PA (zh)
TW (1) TWI686713B (zh)
WO (1) WO2019149076A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347776A * 2019-07-17 2019-10-18 北京百度网讯科技有限公司 Point-of-interest name matching method, apparatus, device and storage medium
CN110750987B * 2019-10-28 2021-02-05 腾讯科技(深圳)有限公司 Text processing method, apparatus and storage medium
CN110852063B * 2019-10-30 2023-05-05 语联网(武汉)信息技术有限公司 Word vector generation method and apparatus based on a bidirectional LSTM neural network
CN111190576B * 2019-12-17 2022-09-23 深圳平安医疗健康科技服务有限公司 Component set display method and apparatus based on character recognition, and computer device

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1557840B1 (en) * 2002-10-15 2012-12-05 Sony Corporation Memory device, motion vector detection device, and detection method
US7630945B2 (en) * 2005-05-12 2009-12-08 Yahoo! Inc. Building support vector machines with reduced classifier complexity
US7949622B2 (en) * 2007-12-13 2011-05-24 Yahoo! Inc. System and method for generating a classifier model for classifying web content
US9037464B1 (en) * 2013-01-15 2015-05-19 Google Inc. Computing numeric representations of words in a high-dimensional space
US20140363082A1 (en) * 2013-06-09 2014-12-11 Apple Inc. Integrating stroke-distribution information into spatial feature extraction for automatic handwriting recognition
US9898187B2 (en) * 2013-06-09 2018-02-20 Apple Inc. Managing real-time handwriting recognition
US9495620B2 (en) * 2013-06-09 2016-11-15 Apple Inc. Multi-script handwriting recognition using a universal recognizer
US10867597B2 (en) * 2013-09-02 2020-12-15 Microsoft Technology Licensing, Llc Assignment of semantic labels to a sequence of words using neural network architectures
US20150095017A1 (en) * 2013-09-27 2015-04-02 Google Inc. System and method for learning word embeddings using neural language models
US9846836B2 (en) * 2014-06-13 2017-12-19 Microsoft Technology Licensing, Llc Modeling interestingness with deep neural networks
US10380609B2 (en) * 2015-02-10 2019-08-13 EverString Innovation Technology Web crawling for use in providing leads generation and engagement recommendations
US10373054B2 (en) * 2015-04-19 2019-08-06 International Business Machines Corporation Annealed dropout training of neural networks
US9607616B2 (en) * 2015-08-17 2017-03-28 Mitsubishi Electric Research Laboratories, Inc. Method for using a multi-scale recurrent neural network with pretraining for spoken language understanding tasks
US11010550B2 (en) * 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US9792534B2 (en) * 2016-01-13 2017-10-17 Adobe Systems Incorporated Semantic natural language vector space
EP3433795A4 (en) * 2016-03-24 2019-11-13 Ramot at Tel-Aviv University Ltd. METHOD AND SYSTEM FOR CONVERTING A TEXT IMAGE
CN107526720A * 2016-06-17 2017-12-29 松下知识产权经营株式会社 Meaning generation method, meaning generation apparatus, and program
JP6235082B1 * 2016-07-13 2017-11-22 ヤフー株式会社 Data classification device, data classification method, and program
CN111611798B * 2017-01-22 2023-05-16 创新先进技术有限公司 Word vector processing method and apparatus
US10755174B2 (en) * 2017-04-11 2020-08-25 Sap Se Unsupervised neural attention model for aspect extraction
EP3625677A4 (en) * 2017-05-14 2021-04-21 Digital Reasoning Systems, Inc. SYSTEMS AND METHODS FOR QUICKLY CREATING, MANAGING AND SHARING LEARNING MODELS
CN107423269B * 2017-05-26 2020-12-18 创新先进技术有限公司 Word vector processing method and apparatus
CN107577658B * 2017-07-18 2021-01-29 创新先进技术有限公司 Word vector processing method, apparatus and electronic device
CN107608953B * 2017-07-25 2020-08-14 同济大学 Word vector generation method based on variable-length context
US10642846B2 (en) * 2017-10-13 2020-05-05 Microsoft Technology Licensing, Llc Using a generative adversarial network for query-keyword matching
US10410350B2 (en) * 2017-10-30 2019-09-10 Rakuten, Inc. Skip architecture neural network machine and method for improved semantic segmentation
CN109165385B * 2018-08-29 2022-08-09 中国人民解放军国防科技大学 Multi-triple extraction method based on an entity-relation joint extraction model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995805A * 2014-06-05 2014-08-20 神华集团有限责任公司 Word processing method for text big data
WO2017057921A1 * 2015-10-02 2017-04-06 네이버 주식회사 Method and system for automatically classifying, by means of deep learning, data represented by a plurality of factors whose values are text word and symbol sequences
CN105871619A * 2016-04-18 2016-08-17 中国科学院信息工程研究所 Traffic payload type detection method based on n-gram multiple features
CN106547735A * 2016-10-25 2017-03-29 复旦大学 Construction and usage of context-aware dynamic word or character vectors based on deep learning
CN107153642A * 2017-05-16 2017-09-12 华北电力大学 Analysis method for recognizing the sentiment orientation of text comments based on a neural network
CN107273503A * 2017-06-19 2017-10-20 北京百度网讯科技有限公司 Method and apparatus for generating parallel text in the same language

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968725A * 2019-12-03 2020-04-07 咪咕动漫有限公司 Image content description information generation method, electronic device and storage medium
CN110968725B * 2019-12-03 2023-04-28 咪咕动漫有限公司 Image content description information generation method, electronic device and storage medium

Also Published As

Publication number Publication date
US20200279080A1 (en) 2020-09-03
TWI686713B (zh) 2020-03-01
CN110119507A (zh) 2019-08-13
US10824819B2 (en) 2020-11-03
TW201939318A (zh) 2019-10-01
SG11202004446PA (en) 2020-06-29

Similar Documents

Publication Publication Date Title
TWI701588B (zh) Word vector processing method, apparatus and device
WO2019149135A1 (zh) Word vector generation method, apparatus and device
WO2019149076A1 (zh) Word vector generation method, apparatus and device
TWI685761B (zh) Word vector processing method and apparatus
TWI721310B (zh) Cluster-based word vector processing method, apparatus and device
CN108874765B (zh) Word vector processing method and apparatus
US10846483B2 (en) Method, device, and apparatus for word vector processing based on clusters
CN107423269B (zh) Word vector processing method and apparatus
CN113051910B (zh) Method and apparatus for predicting character emotions
CN116028613B (zh) Commonsense question answering method, system, computer device and storage medium
CN107562715B (zh) Word vector processing method, apparatus and electronic device
CN107577658B (zh) Word vector processing method, apparatus and electronic device
WO2019174392A1 (zh) Vector processing for RPC information
CN107844472B (zh) Word vector processing method, apparatus and electronic device
CN107577659A (zh) Word vector processing method, apparatus and electronic device
CN114648701A (zh) Object detection method, system and computer device
CN115309873A (zh) Semantic matching method, apparatus, computer device and storage medium
Nene Caption Generation for Images Using Deep Multimodal Neural Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19746952

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19746952

Country of ref document: EP

Kind code of ref document: A1