CN111694950A - Word embedding method and device and electronic equipment - Google Patents

Word embedding method and device and electronic equipment

Info

Publication number
CN111694950A
CN111694950A
Authority
CN
China
Prior art keywords
generator
word embedding
decider
neural network
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910194723.2A
Other languages
Chinese (zh)
Inventor
吕乐
程建波
彭南博
史英迪
范敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN201910194723.2A
Publication of CN111694950A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present disclosure provides a word embedding method, comprising: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider; extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network. The word embedding method provided by the disclosure can effectively reduce the number of adjustable parameters of the model, increase the depth of the model, and improve the generalization capability of the model.

Description

Word embedding method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of machine learning, and in particular to a word embedding method based on a generator-decider model.
Background
With the rapid development of internet technology, e-commerce, which uses the internet as its carrier, has grown explosively and generates massive amounts of real-time data. These data contain valuable user information, such as the user behavior recorded in user logs, and are helpful for analyzing and acquiring user behavior preferences. In order to better analyze user behavior preferences and further predict user behavior, the industry uses word embedding (Word2Vec, W2V) technology to construct the interrelations between words in text data, extract general features of the data, learn the low-order and high-order interactions of those general features, and complete the analysis or prediction of user behavior.
The FNN (Factorization Machine Neural Network) model is a commonly used word embedding model. The FNN model divides heterogeneous data into different fields, maps the data in each field into a common low-dimensional real vector space using an FM (Factorization Machine) algorithm, and then uses a multilayer perceptron model for analysis and prediction. However, the first-layer parameters of the multilayer perceptron have little influence on the final analysis and prediction capability of the model, which greatly reduces the benefit of pre-training on the performance of the FNN model; in addition, the multilayer perceptron is a fully-connected neural network, so the number of adjustable parameters is large, the training process is complex, and overfitting easily occurs during training.
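By way of illustration only, the following PyTorch sketch shows the general shape of such an FNN-style baseline (per-field embeddings feeding a fully-connected multilayer perceptron) and makes the parameter-count drawback concrete; the field sizes, embedding dimension and hidden widths are invented for the example and are not taken from the FNN model or from this disclosure.

```python
import torch
import torch.nn as nn

class FNNBaseline(nn.Module):
    """Illustrative FNN-style model: per-field embeddings -> fully-connected MLP.
    All sizes below are made-up example values."""
    def __init__(self, field_sizes, embed_dim=10, hidden=(200, 200)):
        super().__init__()
        # One embedding table per field (in FNN these would be initialised from FM pre-training).
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_sizes)
        layers, prev = [], embed_dim * len(field_sizes)
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.mlp = nn.Sequential(*layers)

    def forward(self, field_ids):  # field_ids: (batch, n_fields) tensor of category indices
        embs = [emb(field_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embs, dim=1))

# The fully-connected layers dominate the adjustable-parameter count,
# which is the drawback of the FNN model discussed above.
model = FNNBaseline(field_sizes=[1000, 500, 200])
print(sum(p.numel() for p in model.parameters()))
```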
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a word embedding method based on a generator-decider model, so as to overcome, at least to some extent, problems such as the large number of parameters of word embedding models and their tendency to overfit, which are due to limitations and drawbacks of the related art.
According to a first aspect of embodiments of the present disclosure, there is provided a word embedding method, including: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider; extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
In an exemplary embodiment of the present disclosure, the classification result is a prediction result of a preset target variable.
In an exemplary embodiment of the disclosure, the generator includes a plurality of generator sub-modules connected in series, each generator sub-module includes N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
In an exemplary embodiment of the present disclosure, each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In an exemplary embodiment of the present disclosure, the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each decider sub-module includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
In an exemplary embodiment of the present disclosure, each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
In an exemplary embodiment of the present disclosure, the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network is a binary cross-entropy loss function.
In an exemplary embodiment of the present disclosure, the preset neural network is trained by an error back propagation algorithm.
According to a second aspect of embodiments of the present disclosure, there is provided a word embedding device including:
a data receiving module configured to receive word vector data through a preset neural network, the preset neural network comprising a generator and a decider;
a feature extraction module configured to extract a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
a classification output module configured to classify the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any one of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the word embedding method as recited in any one of the above.
By using a generator-decider model based on a deconvolution network and a convolutional network to perform feature extraction and classification prediction on word vectors, the word embedding method provided by the embodiments of the present disclosure can effectively reduce the number of adjustable parameters of the word embedding model and thereby reduce overfitting of the model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a flowchart of a word embedding method in an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a neural network in an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a generator in an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a decider in an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of a word embedding device in an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram of an electronic device in an exemplary embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Further, the drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
Fig. 1 is a flowchart of a word embedding method in an exemplary embodiment of the present disclosure. Referring to fig. 1, a word embedding method 100 may include:
step S102, receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider;
step S104, extracting a feature tensor from the word vector data through the generator, wherein the generator is formed based on a deconvolution network; and
step S106, classifying the feature tensor through the decider to output a classification result, wherein the decider is formed based on a convolutional network.
To address the shortcomings of the FNN model in processing high-dimensional sparse heterogeneous data, the present disclosure provides a word embedding method based on a generator-decider network structure: high-dimensional sparse features are converted into dense real-valued tensors by a generator formed on the basis of a deconvolution neural network, and the tensors are classified by a decider formed on the basis of a convolutional neural network, so as to achieve data analysis and prediction. Compared with a fully-connected network, a neural network built from deconvolution layers, convolution layers and nonlinear activation functions has the advantage that the number of adjustable parameters grows slowly as the depth increases, so the complexity of the model is low and the generalization capability of the word embedding model when fitting a nonlinear function is enhanced (i.e., the overfitting problem is reduced). On the one hand, the generator formed on the basis of the deconvolution neural network can learn the low-order and high-order interactions of features, extract richer and more effective information, and allow the classification model to be trained on more abstract features; on the other hand, the decider formed on the basis of the convolutional neural network has few adjustable parameters, a deep network structure and strong generalization capability, which can effectively overcome the defects of the FNN model.
Next, each step of the word embedding method 100 will be described in detail.
Before inputting user data into a preset neural network, i.e., a word embedding model, the user data needs to be preprocessed first.
First, user data from a preset time period can be selected and added to the data set to be preprocessed. Taking application scoring as an example, for each user the log behavior within the year before the activation time point has the most research value and best reflects the user's behavioral habits and preferences in the current state, so valid log behavior data from this period can be selected as feature data. To prevent some user features from being empty, statistics can be computed separately on the user's order data and browsing data within 30 days, 60 days, 90 days and 180 days before activation; according to the coverage of these features over users, the statistics within 90 days before activation are finally selected as the feature data added to the data set to be processed.
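A minimal sketch of the window selection and coverage check described above is given below, assuming a simple log table with user_id, event_time and activation_time columns; the schema and column names are assumptions made for the example and do not come from the disclosure.

```python
import pandas as pd

def select_window(logs: pd.DataFrame, activations: pd.DataFrame, days: int = 90) -> pd.DataFrame:
    """Keep only log records falling within `days` days before each user's activation time."""
    df = logs.merge(activations[["user_id", "activation_time"]], on="user_id")
    in_window = (df["event_time"] < df["activation_time"]) & \
                (df["event_time"] >= df["activation_time"] - pd.Timedelta(days=days))
    return df[in_window]

def coverage(window: pd.DataFrame, all_users: pd.Series) -> float:
    """Fraction of users with at least one record in the window; comparing this value
    for 30/60/90/180-day windows motivates the choice of the 90-day window above."""
    return window["user_id"].nunique() / all_users.nunique()
```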
Then, the feature data may be processed into word vectors by a general method, and the word vectors are input to a preset neural network.
The word embedding model (preset neural network) provided by the disclosure is an end-to-end learning model emphasizing low-order and high-order feature interaction, and combines the contents of two aspects of feature learning and deep learning prediction.
Fig. 2 is a schematic structural diagram of a neural network provided in the present disclosure.
Referring to fig. 2, the neural network 200 is a deep neural network model and mainly includes a generator 21 and a decider 22.
The generator 21 is formed based on a deconvolution network and includes several deconvolution modules 211, which are configured to extract a feature tensor from the word vector data; the decider 22 is formed based on a convolutional network and includes several convolution modules 221 and fully-connected layers 222, which are configured to classify the feature tensor and output a classification result.
Fig. 3 is a schematic structural diagram of the generator 21 in one embodiment of the present disclosure.
Unlike the FNN model, which uses a word embedding matrix to apply a linear transformation to the high-dimensional sparse feature vector in order to generate low-dimensional dense features, the generator 21 uses a deconvolution network to apply a nonlinear transformation to the high-dimensional sparse features and convert them into a dense feature tensor representation. The deconvolution network can adaptively construct local correlations among different feature components.
Referring to fig. 3, the generator 21 is mainly composed of deconvolution layers and upsampling layers. In the embodiment shown in fig. 3, the generator 21 includes a plurality of generator sub-modules 211 (3 in fig. 3) connected in series; the number of feature channels of each generator sub-module is the same, and each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In addition, each generator sub-module 211 may include N1 deconvolution layers and N2 upsampling layers. In the embodiment shown in fig. 3, N1 is 2 and N2 is 1, the deconvolution kernel of each deconvolution layer is set to 3 × 3 with a stride of 1 × 1, and the sampling kernel of each upsampling layer is set to 2 × 2. In other embodiments, the number of generator sub-modules, the specific values of N1 and N2, the kernel sizes of the deconvolution/upsampling layers, and the strides may be set by those skilled in the art as needed, and the disclosure is not limited thereto.
Finally, all activation functions in the generator 21 are set to rectified linear units (ReLU).
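For concreteness, the following PyTorch sketch assembles one possible generator with the parameters quoted above (each sub-module: two 3 × 3 deconvolution layers with stride 1, one 2 × 2 upsampling layer, ReLU activations, channel halving and spatial doubling). The initial linear projection that reshapes the word vector into a small feature map, and all channel and size constants, are assumptions added to make the sketch runnable; it is an illustration, not the implementation of the disclosure.

```python
import torch
import torch.nn as nn

class GeneratorSubModule(nn.Module):
    """One generator sub-module: N1 = 2 deconvolution layers (3x3 kernel, stride 1)
    and N2 = 1 upsampling layer (2x2), with ReLU activations. It halves the channel
    count of the incoming feature tensor and doubles its spatial size."""
    def __init__(self, in_channels: int):
        super().__init__()
        out_channels = in_channels // 2
        self.block = nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2),  # 2x2 spatial enlargement
        )

    def forward(self, x):
        return self.block(x)

class Generator(nn.Module):
    """Three generator sub-modules in series (the count shown in fig. 3). The linear
    projection that turns the sparse word vector into a small feature map is an
    assumption for the example."""
    def __init__(self, in_dim: int, base_channels: int = 64, start_size: int = 4):
        super().__init__()
        self.base_channels, self.start_size = base_channels, start_size
        self.project = nn.Linear(in_dim, base_channels * start_size * start_size)
        self.blocks = nn.Sequential(
            GeneratorSubModule(base_channels),       # 64 -> 32 channels, 4x4 -> 8x8
            GeneratorSubModule(base_channels // 2),  # 32 -> 16 channels, 8x8 -> 16x16
            GeneratorSubModule(base_channels // 4),  # 16 -> 8 channels, 16x16 -> 32x32
        )

    def forward(self, word_vectors):  # word_vectors: (batch, in_dim)
        x = self.project(word_vectors)
        x = x.view(-1, self.base_channels, self.start_size, self.start_size)
        return self.blocks(x)  # dense feature tensor
```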
In the embodiment of the disclosure, since the generator 21 is composed of deconvolution layers, upsampling layers and nonlinear activation functions and contains multiple layers of parameters, it is strongly nonlinear and can extract more effective feature representations, which markedly improves the prediction capability of the model; this effectively overcomes the defects that the first-layer parameters of a multilayer perceptron have little influence on the final analysis and prediction capability of the model and that pre-training contributes little to the performance improvement of the FNN model.
Fig. 4 is a schematic structural diagram of the decider 22 in one embodiment of the present disclosure.
Referring to fig. 4, the decider 22 is mainly composed of convolution layers, downsampling layers and fully-connected layers. In the embodiment shown in fig. 4, the decider 22 comprises a plurality of decider sub-modules 221 (3 in fig. 4) and a plurality of fully-connected layers 222 (2 in fig. 4) connected in series. The number of feature channels of each decider sub-module is the same; each sub-module doubles the number of channels of the feature tensor output by the previous module and reduces it spatially, and the resulting feature tensor is finally flattened into a vector and input to the fully-connected layers 222.
In addition, each decider sub-module 221 may include N3 convolutional layers and N4 downsampling layers. In the embodiment shown in fig. 4, N3 is 2 and N4 is 1, the convolution kernel of each convolutional layer is set to 3 × 3 with a stride of 1 × 1, and the sampling kernel of each downsampling layer is set to 2 × 2. All activation functions in the decider 22 may likewise be set to rectified linear units. In other embodiments, the number of decider sub-modules, the specific values of N3 and N4, the kernel size and stride of each convolutional/downsampling layer, and the activation functions may be set by those skilled in the art, and the disclosure is not limited thereto.
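A matching PyTorch sketch of the decider under the same assumptions follows (each sub-module: two 3 × 3 convolution layers with stride 1, one 2 × 2 downsampling layer, ReLU activations, channel doubling and spatial halving, followed by two fully-connected layers); the input shape matches the generator sketch above, and the final sigmoid anticipates the last-layer activation described later. Again, the constants are illustrative assumptions rather than values from the disclosure.

```python
import torch
import torch.nn as nn

class DeciderSubModule(nn.Module):
    """One decider sub-module: N3 = 2 convolution layers (3x3 kernel, stride 1) and
    N4 = 1 downsampling layer (2x2), with ReLU activations. It doubles the channel
    count of the incoming feature tensor and halves its spatial size."""
    def __init__(self, in_channels: int):
        super().__init__()
        out_channels = in_channels * 2
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),  # 2x2 spatial reduction
        )

    def forward(self, x):
        return self.block(x)

class Decider(nn.Module):
    """Three decider sub-modules followed by two fully-connected layers (the counts
    shown in fig. 4); the last sub-module's output is flattened into a vector before
    the fully-connected layers."""
    def __init__(self, in_channels: int = 8, in_size: int = 32, hidden: int = 128):
        super().__init__()
        self.blocks = nn.Sequential(
            DeciderSubModule(in_channels),      # 8 -> 16 channels, 32x32 -> 16x16
            DeciderSubModule(in_channels * 2),  # 16 -> 32 channels, 16x16 -> 8x8
            DeciderSubModule(in_channels * 4),  # 32 -> 64 channels, 8x8 -> 4x4
        )
        flat_dim = (in_channels * 8) * (in_size // 8) ** 2
        self.fc = nn.Sequential(
            nn.Linear(flat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # probability of the target class
        )

    def forward(self, feature_tensor):
        x = self.blocks(feature_tensor)
        return self.fc(x.flatten(start_dim=1))
```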
It should be noted that, depending on the output target, different parameters may be set for the decider 22 to control the output scalar of the decider 22.
In the embodiment of the present disclosure, the output scalar of the decider 22 is set as the prediction result of the target variable; for example, a prediction of user behavior is output according to the input word vector data from user logs. In one embodiment, the risk of extending credit to the user may be predicted based on the user's browsing and purchasing data at an e-commerce website, on the items the user may purchase within the next 7 days, or on the user's behavioral data over the preceding 90 days.
The structures shown in fig. 3 and fig. 4 are merely examples; drawing on research related to generative neural networks, any generator and decider structure may be used to implement the preset neural network 200 provided by the embodiments of the present disclosure, so the structure of the preset neural network 200 is highly flexible.
In the preset neural network 200, the activation function of the last layer of the fully-connected network (i.e., the output layer) may be set to a sigmoid function.
To process word vectors using the preset neural network 200, the network needs to be trained first. For the preset neural network 200 based on the generator-decider model proposed by the present disclosure, the loss function can be set to a binary cross-entropy loss function, given by the following formula:
l = -(1/n) Σ_x [ y·ln a + (1 - y)·ln(1 - a) ]
where l is the loss function, n is the number of samples in the data set, x is a word vector, y is the classification label of the user's real behavior, and a is the probability predicted by the generator-decider model for the user's behavior class. For a binary classification problem, the cross-entropy loss function is generally used to guide model training; it is the loss function obtained from maximum likelihood estimation for the binary classification problem and has a clear theoretical basis. Those skilled in the art may also set the loss function according to actual requirements, and the disclosure is not limited thereto.
In the disclosed embodiment, after the loss function is set, the network is trained with an error back-propagation algorithm in an end-to-end manner (the expected output is obtained directly from the input, i.e., the learning algorithm connects the input end of the system directly to the output end), which simplifies the training process.
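A minimal sketch of such end-to-end training is shown below, reusing the Generator and Decider sketches above; the binary cross-entropy loss and back-propagation follow the description, while the optimizer, learning rate and epoch count are assumptions added for the example.

```python
import torch
import torch.nn as nn

def train(generator, decider, loader, epochs: int = 10, lr: float = 1e-3):
    """End-to-end training: word vectors go in, classification probabilities come out,
    and the whole generator-decider network is updated by back-propagation."""
    model = nn.Sequential(generator, decider)
    criterion = nn.BCELoss()  # binary cross-entropy, matching the loss above
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for word_vectors, labels in loader:  # labels in {0, 1}
            optimizer.zero_grad()
            probs = model(word_vectors).squeeze(1)
            loss = criterion(probs, labels.float())
            loss.backward()  # error back-propagation
            optimizer.step()
    return model
```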
After training, the preset neural network 200 may be applied to perform steps S102 to S106 of the word embedding method 100, and a classification result may be output according to the word vector data.
In order to verify the effect of the neural network 200 provided by the present disclosure, the prediction capability and stability of the model need to be verified. Common test criteria for such a model include the AUC and KS indexes. The AUC index is the area under the ROC curve: in a classification problem, different classification thresholds produce different recall rates and false-alarm rates, and the ROC curve is drawn with the false-alarm rate on the horizontal axis and the recall rate on the vertical axis. The KS index is calculated in a similar way and evaluates the model's ability to distinguish ordinary users from overdue users. For example, the KS index can be calculated as follows: the evaluation samples are sorted by score in ascending order and cut into N segments (generally 10 segments); for each segment i, the cumulative proportion of ordinary users g_i and the cumulative proportion of overdue users b_i are computed, and KS is calculated as:

KS = max_i | g_i - b_i |
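The sketch below computes the KS index exactly as just described (ascending sort, cutting into N segments, cumulative proportions of ordinary and overdue users) and checks the AUC with scikit-learn; the sample data are random numbers generated purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ks_statistic(scores: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """KS index: sort by ascending score, cut into n_bins segments and take the
    largest gap between the cumulative shares of ordinary (0) and overdue (1) users."""
    labels = labels[np.argsort(scores)]
    segments = np.array_split(labels, n_bins)
    good_total, bad_total = (labels == 0).sum(), (labels == 1).sum()
    ks, good_cum, bad_cum = 0.0, 0, 0
    for seg in segments:
        good_cum += (seg == 0).sum()
        bad_cum += (seg == 1).sum()
        ks = max(ks, abs(good_cum / good_total - bad_cum / bad_total))
    return ks

# Example on made-up scores and labels:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = np.clip(0.2 * labels + 0.8 * rng.random(1000), 0.0, 1.0)
print("AUC:", roc_auc_score(labels, scores))
print("KS :", ks_statistic(scores, labels))
```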
the evaluation indexes of the FNN model and the preset neural network 200 are compared on the same data set, and the data is shown in table 1.
(Table 1: comparison of the evaluation indexes of the FNN model and the preset neural network 200 on the same data set.)
As can be seen from table 1, compared with the FNN model, the AUC index of the preset neural network 200 is improved by one point, and the KS index is improved by two points, which indicates that the preset neural network 200 has more accurate classification capability.
In the embodiment of the disclosure, the generator formed by a deconvolution network converts the high-dimensional sparse feature vector describing user behavior into a low-dimensional feature tensor and adaptively organizes related feature components into spatially adjacent points of the tensor, and the decider formed by a convolutional network performs prediction and identification on that feature tensor. Because the deconvolution network (generator) and the convolutional network (decider) have few adjustable parameters, a deep network structure, strong generalization capability and little tendency to overfit, they effectively overcome the defects of the fully-connected network in the FNN model, namely difficult training and easy overfitting. In addition, because both the generator and the decider are composed of multilayer networks, transfer learning is easier to perform. For example, the generator part consists of deconvolution layers, upsampling layers and nonlinear activation functions and has a relatively large number of layers; after model training is finished, the deeper part of the network can be retained and used directly for abstract feature extraction in other models, overcoming the defect that FM has only one layer. The decider part can receive the feature tensor output by the generator and, after fine-tuning with a logistic regression or factorization algorithm, output classification results for a variety of classification tasks.
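As a hedged illustration of the transfer-learning use described above, the following sketch freezes a trained generator and reuses its dense feature tensors as input to a simple new classifier head (a single linear layer with a sigmoid, i.e. logistic regression); the function and dimension names are assumptions built on the earlier sketches.

```python
import torch.nn as nn

def build_transfer_model(trained_generator: nn.Module, feature_dim: int) -> nn.Module:
    """Reuse the trained generator as a fixed feature extractor for another task."""
    for p in trained_generator.parameters():
        p.requires_grad = False  # keep the pre-trained generator fixed
    head = nn.Sequential(nn.Flatten(), nn.Linear(feature_dim, 1), nn.Sigmoid())
    return nn.Sequential(trained_generator, head)
```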
In conclusion, the word embedding method provided by the disclosure can effectively reduce the number of adjustable parameters of the neural network model for realizing word embedding, increase the depth of the model, and improve the generalization capability of the model.
Corresponding to the above method embodiments, the present disclosure also provides a word embedding apparatus, which may be used to execute the above method embodiments.
Fig. 5 is a block diagram of a word embedding device in an exemplary embodiment of the present disclosure.
Referring to fig. 5, the word embedding device 500 may include:
a data receiving module 502 configured to receive word vector data through a preset neural network, where the preset neural network includes a generator and a decider;
a feature extraction module 504 configured to extract feature tensors from the word vector data by the generator, the generator being formed based on a deconvolution network;
a classification output module 506 configured to classify the feature tensor by the decider to output a classification result, wherein the decider is formed based on a convolutional network.
In an exemplary embodiment of the present disclosure, the classification result is a prediction result of a preset target variable.
In an exemplary embodiment of the disclosure, the generator includes a plurality of generator sub-modules connected in series, each generator sub-module includes N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
In an exemplary embodiment of the present disclosure, each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In an exemplary embodiment of the present disclosure, the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each decider sub-module includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
In an exemplary embodiment of the present disclosure, each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
In an exemplary embodiment of the present disclosure, the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network is a binary cross-entropy loss function.
In an exemplary embodiment of the present disclosure, the preset neural network is trained by an error back propagation algorithm.
Since the functions of the apparatus 500 have been described in detail in the corresponding method embodiments, the disclosure is not repeated herein.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: the at least one processing unit 610, the at least one memory unit 620, and a bus 630 that couples the various system components including the memory unit 620 and the processing unit 610.
Wherein the storage unit stores program code that is executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 610 may execute step S102 as shown in fig. 1: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a judger; step S104: extracting feature tensors from the word vector data by the generator, the generator formed based on a deconvolution network; step S106: classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A word embedding method, comprising:
receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider;
extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
2. The word embedding method according to claim 1, wherein the classification result is a prediction result for a preset target variable.
3. The word embedding method of claim 1, wherein the generator includes a plurality of generator sub-modules connected in series, each of the generator sub-modules including N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
4. The word embedding method of claim 3, wherein each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
5. The word embedding method as claimed in claim 1, wherein the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each of the decider sub-modules includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
6. The word embedding method according to claim 5, wherein each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
7. The word embedding method according to claim 1, wherein the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
8. The word embedding method of claim 1, wherein the loss function of the preset neural network is a binary cross-entropy loss function.
9. The word embedding method according to claim 1, wherein the pre-set neural network is trained by an error back-propagation algorithm.
10. A word embedding device, comprising:
a data receiving module configured to receive word vector data through a preset neural network, the preset neural network comprising a generator and a decider;
a feature extraction module configured to extract a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
a classification output module configured to classify the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
11. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the word embedding method of any of claims 1-9 based on instructions stored in the memory.
12. A computer-readable storage medium on which a program is stored, which program, when executed by a processor, implements the word embedding method according to any one of claims 1 to 9.
CN201910194723.2A 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment Pending CN111694950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910194723.2A CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910194723.2A CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111694950A true CN111694950A (en) 2020-09-22

Family

ID=72475231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910194723.2A Pending CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111694950A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270100A1 (en) * 2016-03-18 2017-09-21 International Business Machines Corporation External Word Embedding Neural Network Language Models
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
US20180189559A1 (en) * 2016-12-29 2018-07-05 Ncsoft Corporation Apparatus and method for detecting debatable document
CN109284506A (en) * 2018-11-29 2019-01-29 重庆邮电大学 A kind of user comment sentiment analysis system and method based on attention convolutional neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270100A1 (en) * 2016-03-18 2017-09-21 International Business Machines Corporation External Word Embedding Neural Network Language Models
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
US20180189559A1 (en) * 2016-12-29 2018-07-05 Ncsoft Corporation Apparatus and method for detecting debatable document
CN109284506A (en) * 2018-11-29 2019-01-29 重庆邮电大学 A kind of user comment sentiment analysis system and method based on attention convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
子非鱼LEO: "DCGAN deep convolutional generative adversarial network & automatic Python plotting" (DCGAN深度卷积生成对抗网络&python自动绘图), Retrieved from the Internet <URL:https://blog.csdn.net/Leo1120178518/article/details/86509680> *

Similar Documents

Publication Publication Date Title
Sharma et al. Era of deep neural networks: A review
Nanni et al. Ensemble of convolutional neural networks to improve animal audio classification
Ratul et al. Skin lesions classification using deep learning based on dilated convolution
US20200104729A1 (en) Method and system for extracting information from graphs
US11321363B2 (en) Method and system for extracting information from graphs
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN116194912A (en) Method and system for aspect-level emotion classification using graph diffusion transducers
CN108959246A (en) Answer selection method, device and electronic equipment based on improved attention mechanism
JP2020520492A (en) Document abstract automatic extraction method, device, computer device and storage medium
WO2020238783A1 (en) Information processing method and device, and storage medium
CN110598620B (en) Deep neural network model-based recommendation method and device
CN106663425A (en) Frame skipping with extrapolation and outputs on demand neural network for automatic speech recognition
CN110929774A (en) Method for classifying target objects in image, method and device for training model
Chen et al. Recursive context routing for object detection
CN113946681B (en) Text data event extraction method and device, electronic equipment and readable medium
CN109034206A (en) Image classification recognition methods, device, electronic equipment and computer-readable medium
CN110796171A (en) Unclassified sample processing method and device of machine learning model and electronic equipment
CN113434683A (en) Text classification method, device, medium and electronic equipment
WO2023159756A1 (en) Price data processing method and apparatus, electronic device, and storage medium
EP4318322A1 (en) Data processing method and related device
CN115775350A (en) Image enhancement method and device and computing equipment
CN112884118A (en) Neural network searching method, device and equipment
US20190236419A1 (en) Method and apparatus for recognizing video fine granularity, computer device and storage medium
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
CN114065771A (en) Pre-training language processing method and device

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Digital Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

SE01 Entry into force of request for substantive examination