CN111694950A - Word embedding method and device and electronic equipment - Google Patents

Word embedding method and device and electronic equipment

Info

Publication number
CN111694950A
CN111694950A
Authority
CN
China
Prior art keywords
generator
word embedding
decider
neural network
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910194723.2A
Other languages
Chinese (zh)
Inventor
吕乐
程建波
彭南博
史英迪
范敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN201910194723.2A
Publication of CN111694950A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present disclosure provides a word embedding method, comprising: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider; extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network. The word embedding method provided by the disclosure can effectively reduce the number of adjustable parameters of the model, increase the depth of the model, and improve the generalization capability of the model.

Description

Word embedding method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of machine learning, and in particular to a word embedding method based on a generator-decider model.
Background
With the rapid development of internet technology, e-commerce, which uses the internet as its carrier, has grown explosively and generates massive amounts of real-time data. These data contain valuable user information, such as the user behavior recorded in user logs, and are helpful for analyzing and acquiring user behavior preferences. In order to better analyze user behavior preferences and further predict user behavior, the industry uses word embedding (Word2Vec, W2V) technology to construct the interrelations between words in text data, extract general features of the data, learn the low-order and high-order interactions of those general features, and complete the analysis or prediction of user behavior.
The FNN (Factorization Machine Neural Network) model is a commonly used word embedding model. The FNN model divides heterogeneous data into different fields, maps the data in each field into a common low-dimensional real vector space using an FM (Factorization Machine) algorithm, and then uses a multilayer perceptron model for analysis and prediction. However, the first-layer parameters of the multilayer perceptron have little influence on the final analysis and prediction capability of the model, which greatly reduces the benefit of pre-training on the performance of the FNN model; in addition, the multilayer perceptron is a fully-connected neural network, so the number of adjustable parameters is large, the training process is complex, and overfitting easily occurs during training.
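By way of illustration only, the following PyTorch sketch shows the general shape of such an FNN-style baseline (per-field embeddings feeding a fully-connected multilayer perceptron) and makes the parameter-count drawback concrete; the field sizes, embedding dimension and hidden widths are invented for the example and are not taken from the FNN model or from this disclosure.

```python
import torch
import torch.nn as nn

class FNNBaseline(nn.Module):
    """Illustrative FNN-style model: per-field embeddings -> fully-connected MLP.
    All sizes below are made-up example values."""
    def __init__(self, field_sizes, embed_dim=10, hidden=(200, 200)):
        super().__init__()
        # One embedding table per field (in FNN these would be initialised from FM pre-training).
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_sizes)
        layers, prev = [], embed_dim * len(field_sizes)
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.mlp = nn.Sequential(*layers)

    def forward(self, field_ids):  # field_ids: (batch, n_fields) tensor of category indices
        embs = [emb(field_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embs, dim=1))

# The fully-connected layers dominate the adjustable-parameter count,
# which is the drawback of the FNN model discussed above.
model = FNNBaseline(field_sizes=[1000, 500, 200])
print(sum(p.numel() for p in model.parameters()))
```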
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a word embedding method based on a generator-decider model, so as to overcome, at least to some extent, problems such as the large number of parameters of word embedding models and their tendency to overfit, which are due to limitations and drawbacks of the related art.
According to a first aspect of embodiments of the present disclosure, there is provided a word embedding method, including: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider; extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
In an exemplary embodiment of the present disclosure, the classification result is a prediction result of a preset target variable.
In an exemplary embodiment of the disclosure, the generator includes a plurality of generator sub-modules connected in series, each generator sub-module includes N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
In an exemplary embodiment of the present disclosure, each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In an exemplary embodiment of the present disclosure, the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each decider sub-module includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
In an exemplary embodiment of the present disclosure, each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
In an exemplary embodiment of the present disclosure, the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network is a binary cross-entropy loss function.
In an exemplary embodiment of the present disclosure, the preset neural network is trained by an error back propagation algorithm.
According to a second aspect of embodiments of the present disclosure, there is provided a word embedding device including:
a data receiving module configured to receive word vector data through a preset neural network, the preset neural network comprising a generator and a decider;
a feature extraction module configured to extract a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
a classification output module configured to classify the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any one of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the word embedding method as recited in any one of the above.
By using a generator-decider model based on a deconvolution network and a convolutional network to perform feature extraction and classification prediction on word vectors, the word embedding method provided by the embodiments of the present disclosure can effectively reduce the number of adjustable parameters of the word embedding model and thereby reduce overfitting of the model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a flowchart of a word embedding method in an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a neural network in an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a generator in an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a decider in an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of a word embedding device in an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram of an electronic device in an exemplary embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Further, the drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
Fig. 1 is a flowchart of a word embedding method in an exemplary embodiment of the present disclosure. Referring to fig. 1, a word embedding method 100 may include:
step S102, receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider;
step S104, extracting a feature tensor from the word vector data through the generator, wherein the generator is formed based on a deconvolution network; and
step S106, classifying the feature tensor through the decider to output a classification result, wherein the decider is formed based on a convolutional network.
To address the shortcomings of the FNN model in processing high-dimensional sparse heterogeneous data, the present disclosure provides a word embedding method based on a generator-decider network structure: high-dimensional sparse features are converted into dense real-valued tensors by a generator formed on the basis of a deconvolution neural network, and the tensors are classified by a decider formed on the basis of a convolutional neural network, so as to achieve data analysis and prediction. Compared with a fully-connected network, a neural network built from deconvolution layers, convolution layers and nonlinear activation functions has the advantage that the number of adjustable parameters grows slowly as the depth increases, so the complexity of the model is low and the generalization capability of the word embedding model when fitting a nonlinear function is enhanced (i.e., the overfitting problem is reduced). On the one hand, the generator formed on the basis of the deconvolution neural network can learn the low-order and high-order interactions of features, extract richer and more effective information, and allow the classification model to be trained on more abstract features; on the other hand, the decider formed on the basis of the convolutional neural network has few adjustable parameters, a deep network structure and strong generalization capability, which can effectively overcome the defects of the FNN model.
Next, each step of the word embedding method 100 will be described in detail.
Before inputting user data into a preset neural network, i.e., a word embedding model, the user data needs to be preprocessed first.
First, user data from a preset time period can be selected and added to the data set to be preprocessed. Taking application scoring as an example, for each user the log behavior within the year before the activation time point has the most research value and best reflects the user's behavioral habits and preferences in the current state, so valid log behavior data from this period can be selected as feature data. To prevent some user features from being empty, statistics can be computed separately on the user's order data and browsing data within 30 days, 60 days, 90 days and 180 days before activation; according to the coverage of these features over users, the statistics within 90 days before activation are finally selected as the feature data added to the data set to be processed.
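A minimal sketch of the window selection and coverage check described above is given below, assuming a simple log table with user_id, event_time and activation_time columns; the schema and column names are assumptions made for the example and do not come from the disclosure.

```python
import pandas as pd

def select_window(logs: pd.DataFrame, activations: pd.DataFrame, days: int = 90) -> pd.DataFrame:
    """Keep only log records falling within `days` days before each user's activation time."""
    df = logs.merge(activations[["user_id", "activation_time"]], on="user_id")
    in_window = (df["event_time"] < df["activation_time"]) & \
                (df["event_time"] >= df["activation_time"] - pd.Timedelta(days=days))
    return df[in_window]

def coverage(window: pd.DataFrame, all_users: pd.Series) -> float:
    """Fraction of users with at least one record in the window; comparing this value
    for 30/60/90/180-day windows motivates the choice of the 90-day window above."""
    return window["user_id"].nunique() / all_users.nunique()
```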
Then, the feature data may be processed into word vectors by a general method, and the word vectors are input to a preset neural network.
The word embedding model (preset neural network) provided by the disclosure is an end-to-end learning model emphasizing low-order and high-order feature interaction, and combines the contents of two aspects of feature learning and deep learning prediction.
Fig. 2 is a schematic structural diagram of a neural network provided in the present disclosure.
Referring to fig. 2, the neural network 200 is a deep neural network model and mainly includes a generator 21 and a decider 22.
The generator 21 is formed based on a deconvolution network and includes several deconvolution modules 211, which are configured to extract a feature tensor from the word vector data; the decider 22 is formed based on a convolutional network and includes several convolution modules 221 and fully-connected layers 222, which are configured to classify the feature tensor and output a classification result.
Fig. 3 is a schematic structural diagram of the generator 21 in one embodiment of the present disclosure.
Unlike the FNN model, which uses a word embedding matrix to apply a linear transformation to the high-dimensional sparse feature vector in order to generate low-dimensional dense features, the generator 21 uses a deconvolution network to apply a nonlinear transformation to the high-dimensional sparse features and convert them into a dense feature tensor representation. The deconvolution network can adaptively construct local correlations among different feature components.
Referring to fig. 3, the generator 21 is mainly composed of deconvolution layers and upsampling layers. In the embodiment shown in fig. 3, the generator 21 includes a plurality of generator sub-modules 211 (3 in fig. 3) connected in series; the number of feature channels of each generator sub-module is the same, and each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In addition, each generator sub-module 211 may include N1 deconvolution layers and N2 upsampling layers. In the embodiment shown in fig. 3, N1 is 2 and N2 is 1, the deconvolution kernel of each deconvolution layer is set to 3 × 3 with a stride of 1 × 1, and the sampling kernel of each upsampling layer is set to 2 × 2. In other embodiments, the number of generator sub-modules, the specific values of N1 and N2, the kernel sizes of the deconvolution/upsampling layers, and the strides may be set by those skilled in the art as needed, and the disclosure is not limited thereto.
Finally, all activation functions in the generator 21 are set to rectified linear units (ReLU).
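For concreteness, the following PyTorch sketch assembles one possible generator with the parameters quoted above (each sub-module: two 3 × 3 deconvolution layers with stride 1, one 2 × 2 upsampling layer, ReLU activations, channel halving and spatial doubling). The initial linear projection that reshapes the word vector into a small feature map, and all channel and size constants, are assumptions added to make the sketch runnable; it is an illustration, not the implementation of the disclosure.

```python
import torch
import torch.nn as nn

class GeneratorSubModule(nn.Module):
    """One generator sub-module: N1 = 2 deconvolution layers (3x3 kernel, stride 1)
    and N2 = 1 upsampling layer (2x2), with ReLU activations. It halves the channel
    count of the incoming feature tensor and doubles its spatial size."""
    def __init__(self, in_channels: int):
        super().__init__()
        out_channels = in_channels // 2
        self.block = nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2),  # 2x2 spatial enlargement
        )

    def forward(self, x):
        return self.block(x)

class Generator(nn.Module):
    """Three generator sub-modules in series (the count shown in fig. 3). The linear
    projection that turns the sparse word vector into a small feature map is an
    assumption for the example."""
    def __init__(self, in_dim: int, base_channels: int = 64, start_size: int = 4):
        super().__init__()
        self.base_channels, self.start_size = base_channels, start_size
        self.project = nn.Linear(in_dim, base_channels * start_size * start_size)
        self.blocks = nn.Sequential(
            GeneratorSubModule(base_channels),       # 64 -> 32 channels, 4x4 -> 8x8
            GeneratorSubModule(base_channels // 2),  # 32 -> 16 channels, 8x8 -> 16x16
            GeneratorSubModule(base_channels // 4),  # 16 -> 8 channels, 16x16 -> 32x32
        )

    def forward(self, word_vectors):  # word_vectors: (batch, in_dim)
        x = self.project(word_vectors)
        x = x.view(-1, self.base_channels, self.start_size, self.start_size)
        return self.blocks(x)  # dense feature tensor
```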
In the embodiment of the disclosure, since the generator 21 is composed of deconvolution layers, upsampling layers and nonlinear activation functions and contains multiple layers of parameters, it is strongly nonlinear and can extract more effective feature representations, which markedly improves the prediction capability of the model; this effectively overcomes the defects that the first-layer parameters of a multilayer perceptron have little influence on the final analysis and prediction capability of the model and that pre-training contributes little to the performance improvement of the FNN model.
Fig. 4 is a schematic structural diagram of the decider 22 in one embodiment of the present disclosure.
Referring to fig. 4, the decider 22 is mainly composed of convolution layers, downsampling layers and fully-connected layers. In the embodiment shown in fig. 4, the decider 22 comprises a plurality of decider sub-modules 221 (3 in fig. 4) and a plurality of fully-connected layers 222 (2 in fig. 4) connected in series. The number of feature channels of each decider sub-module is the same; each sub-module doubles the number of channels of the feature tensor output by the previous module and reduces it spatially, and the resulting feature tensor is finally flattened into a vector and input to the fully-connected layers 222.
In addition, each decider sub-module 221 may include N3 convolutional layers and N4 downsampling layers. In the embodiment shown in fig. 4, N3 is 2 and N4 is 1, the convolution kernel of each convolutional layer is set to 3 × 3 with a stride of 1 × 1, and the sampling kernel of each downsampling layer is set to 2 × 2. All activation functions in the decider 22 may likewise be set to rectified linear units. In other embodiments, the number of decider sub-modules, the specific values of N3 and N4, the kernel size and stride of each convolutional/downsampling layer, and the activation functions may be set by those skilled in the art, and the disclosure is not limited thereto.
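A matching PyTorch sketch of the decider under the same assumptions follows (each sub-module: two 3 × 3 convolution layers with stride 1, one 2 × 2 downsampling layer, ReLU activations, channel doubling and spatial halving, followed by two fully-connected layers); the input shape matches the generator sketch above, and the final sigmoid anticipates the last-layer activation described later. Again, the constants are illustrative assumptions rather than values from the disclosure.

```python
import torch
import torch.nn as nn

class DeciderSubModule(nn.Module):
    """One decider sub-module: N3 = 2 convolution layers (3x3 kernel, stride 1) and
    N4 = 1 downsampling layer (2x2), with ReLU activations. It doubles the channel
    count of the incoming feature tensor and halves its spatial size."""
    def __init__(self, in_channels: int):
        super().__init__()
        out_channels = in_channels * 2
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),  # 2x2 spatial reduction
        )

    def forward(self, x):
        return self.block(x)

class Decider(nn.Module):
    """Three decider sub-modules followed by two fully-connected layers (the counts
    shown in fig. 4); the last sub-module's output is flattened into a vector before
    the fully-connected layers."""
    def __init__(self, in_channels: int = 8, in_size: int = 32, hidden: int = 128):
        super().__init__()
        self.blocks = nn.Sequential(
            DeciderSubModule(in_channels),      # 8 -> 16 channels, 32x32 -> 16x16
            DeciderSubModule(in_channels * 2),  # 16 -> 32 channels, 16x16 -> 8x8
            DeciderSubModule(in_channels * 4),  # 32 -> 64 channels, 8x8 -> 4x4
        )
        flat_dim = (in_channels * 8) * (in_size // 8) ** 2
        self.fc = nn.Sequential(
            nn.Linear(flat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # probability of the target class
        )

    def forward(self, feature_tensor):
        x = self.blocks(feature_tensor)
        return self.fc(x.flatten(start_dim=1))
```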
It should be noted that, depending on the output target, different parameters may be set for the decider 22 to control the output scalar of the decider 22.
In the embodiment of the present disclosure, the output scalar of the decider 22 is set as the prediction result of the target variable; for example, a prediction of user behavior is output according to the input word vector data from user logs. In one embodiment, the risk of extending credit to the user may be predicted based on the user's browsing and purchasing data at an e-commerce website, on the items the user may purchase within the next 7 days, or on the user's behavioral data over the preceding 90 days.
The structures shown in fig. 3 and fig. 4 are merely examples; drawing on research related to generative neural networks, any generator and decider structure may be used to implement the preset neural network 200 provided by the embodiments of the present disclosure, so the structure of the preset neural network 200 is highly flexible.
In the preset neural network 200, the activation function of the last layer of the fully-connected network (i.e., the output layer) may be set to a sigmoid function.
To process word vectors using the preset neural network 200, the network needs to be trained first. For the preset neural network 200 based on the generator-decider model proposed by the present disclosure, the loss function can be set to a binary cross-entropy loss function, given by the following formula:
l = -(1/n) Σ_x [ y·ln a + (1 - y)·ln(1 - a) ]
where l is the loss function, n is the number of samples in the data set, x is a word vector, y is the classification label of the user's real behavior, and a is the probability predicted by the generator-decider model for the user's behavior class. For a binary classification problem, the cross-entropy loss function is generally used to guide model training; it is the loss function obtained from maximum likelihood estimation for the binary classification problem and has a clear theoretical basis. Those skilled in the art may also set the loss function according to actual requirements, and the disclosure is not limited thereto.
In the disclosed embodiment, after the loss function is set, the network is trained with an error back-propagation algorithm in an end-to-end manner (the expected output is obtained directly from the input, i.e., the learning algorithm connects the input end of the system directly to the output end), which simplifies the training process.
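A minimal sketch of such end-to-end training is shown below, reusing the Generator and Decider sketches above; the binary cross-entropy loss and back-propagation follow the description, while the optimizer, learning rate and epoch count are assumptions added for the example.

```python
import torch
import torch.nn as nn

def train(generator, decider, loader, epochs: int = 10, lr: float = 1e-3):
    """End-to-end training: word vectors go in, classification probabilities come out,
    and the whole generator-decider network is updated by back-propagation."""
    model = nn.Sequential(generator, decider)
    criterion = nn.BCELoss()  # binary cross-entropy, matching the loss above
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for word_vectors, labels in loader:  # labels in {0, 1}
            optimizer.zero_grad()
            probs = model(word_vectors).squeeze(1)
            loss = criterion(probs, labels.float())
            loss.backward()  # error back-propagation
            optimizer.step()
    return model
```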
After training, the preset neural network 200 may be applied to perform steps S102 to S106 of the word embedding method 100, and a classification result may be output according to the word vector data.
In order to verify the effect of the neural network 200 provided by the present disclosure, the prediction capability and stability of the model need to be verified. Common test criteria for such a model include the AUC and KS indexes. The AUC index is the area under the ROC curve: in a classification problem, different classification thresholds produce different recall rates and false-alarm rates, and the ROC curve is drawn with the false-alarm rate on the horizontal axis and the recall rate on the vertical axis. The KS index is calculated in a similar way and evaluates the model's ability to distinguish ordinary users from overdue users. For example, the KS index can be calculated as follows: the evaluation samples are sorted by score in ascending order and cut into N segments (generally 10 segments); for each segment i, the cumulative proportion of ordinary users g_i and the cumulative proportion of overdue users b_i are computed, and KS is calculated as:

KS = max_i | g_i - b_i |
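The sketch below computes the KS index exactly as just described (ascending sort, cutting into N segments, cumulative proportions of ordinary and overdue users) and checks the AUC with scikit-learn; the sample data are random numbers generated purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ks_statistic(scores: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """KS index: sort by ascending score, cut into n_bins segments and take the
    largest gap between the cumulative shares of ordinary (0) and overdue (1) users."""
    labels = labels[np.argsort(scores)]
    segments = np.array_split(labels, n_bins)
    good_total, bad_total = (labels == 0).sum(), (labels == 1).sum()
    ks, good_cum, bad_cum = 0.0, 0, 0
    for seg in segments:
        good_cum += (seg == 0).sum()
        bad_cum += (seg == 1).sum()
        ks = max(ks, abs(good_cum / good_total - bad_cum / bad_total))
    return ks

# Example on made-up scores and labels:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = np.clip(0.2 * labels + 0.8 * rng.random(1000), 0.0, 1.0)
print("AUC:", roc_auc_score(labels, scores))
print("KS :", ks_statistic(scores, labels))
```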
the evaluation indexes of the FNN model and the preset neural network 200 are compared on the same data set, and the data is shown in table 1.
(Table 1: comparison of the evaluation indexes of the FNN model and the preset neural network 200 on the same data set.)
As can be seen from table 1, compared with the FNN model, the AUC index of the preset neural network 200 is improved by one point, and the KS index is improved by two points, which indicates that the preset neural network 200 has more accurate classification capability.
In the embodiment of the disclosure, the generator formed by a deconvolution network converts the high-dimensional sparse feature vector describing user behavior into a low-dimensional feature tensor and adaptively organizes related feature components into spatially adjacent points of the tensor, and the decider formed by a convolutional network performs prediction and identification on that feature tensor. Because the deconvolution network (generator) and the convolutional network (decider) have few adjustable parameters, a deep network structure, strong generalization capability and little tendency to overfit, they effectively overcome the defects of the fully-connected network in the FNN model, namely difficult training and easy overfitting. In addition, because both the generator and the decider are composed of multilayer networks, transfer learning is easier to perform. For example, the generator part consists of deconvolution layers, upsampling layers and nonlinear activation functions and has a relatively large number of layers; after model training is finished, the deeper part of the network can be retained and used directly for abstract feature extraction in other models, overcoming the defect that FM has only one layer. The decider part can receive the feature tensor output by the generator and, after fine-tuning with a logistic regression or factorization algorithm, output classification results for a variety of classification tasks.
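As a hedged illustration of the transfer-learning use described above, the following sketch freezes a trained generator and reuses its dense feature tensors as input to a simple new classifier head (a single linear layer with a sigmoid, i.e. logistic regression); the function and dimension names are assumptions built on the earlier sketches.

```python
import torch.nn as nn

def build_transfer_model(trained_generator: nn.Module, feature_dim: int) -> nn.Module:
    """Reuse the trained generator as a fixed feature extractor for another task."""
    for p in trained_generator.parameters():
        p.requires_grad = False  # keep the pre-trained generator fixed
    head = nn.Sequential(nn.Flatten(), nn.Linear(feature_dim, 1), nn.Sigmoid())
    return nn.Sequential(trained_generator, head)
```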
In conclusion, the word embedding method provided by the disclosure can effectively reduce the number of adjustable parameters of the neural network model for realizing word embedding, increase the depth of the model, and improve the generalization capability of the model.
Corresponding to the above method embodiments, the present disclosure also provides a word embedding apparatus, which may be used to execute the above method embodiments.
Fig. 5 is a block diagram of a word embedding device in an exemplary embodiment of the present disclosure.
Referring to fig. 5, the word embedding device 500 may include:
a data receiving module 502 configured to receive word vector data through a preset neural network, where the preset neural network includes a generator and a decider;
a feature extraction module 504 configured to extract feature tensors from the word vector data by the generator, the generator being formed based on a deconvolution network;
a classification output module 506 configured to classify the feature tensor by the decider to output a classification result, wherein the decider is formed based on a convolutional network.
In an exemplary embodiment of the present disclosure, the classification result is a prediction result of a preset target variable.
In an exemplary embodiment of the disclosure, the generator includes a plurality of generator sub-modules connected in series, each generator sub-module includes N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
In an exemplary embodiment of the present disclosure, each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
In an exemplary embodiment of the present disclosure, the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each decider sub-module includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
In an exemplary embodiment of the present disclosure, each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
In an exemplary embodiment of the present disclosure, the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
In an exemplary embodiment of the present disclosure, the loss function of the preset neural network is a binary cross-entropy loss function.
In an exemplary embodiment of the present disclosure, the preset neural network is trained by an error back propagation algorithm.
Since the functions of the apparatus 500 have been described in detail in the corresponding method embodiments, the disclosure is not repeated herein.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: the at least one processing unit 610, the at least one memory unit 620, and a bus 630 that couples the various system components including the memory unit 620 and the processing unit 610.
Wherein the storage unit stores program code that is executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 610 may execute step S102 as shown in fig. 1: receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a judger; step S104: extracting feature tensors from the word vector data by the generator, the generator formed based on a deconvolution network; step S106: classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A word embedding method, comprising:
receiving word vector data through a preset neural network, wherein the preset neural network comprises a generator and a decider;
extracting a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
classifying the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
2. The word embedding method according to claim 1, wherein the classification result is a prediction result for a preset target variable.
3. The word embedding method of claim 1, wherein the generator includes a plurality of generator sub-modules connected in series, each of the generator sub-modules including N1 deconvolution layers and N2 upsampling layers, and the number of feature channels of the plurality of generator sub-modules is the same.
4. The word embedding method of claim 3, wherein each generator sub-module halves the number of channels of the feature tensor output by the previous generator sub-module and enlarges it spatially.
5. The word embedding method as claimed in claim 1, wherein the decider includes a plurality of decider sub-modules and a plurality of fully-connected layers connected in series, each of the decider sub-modules includes N3 convolutional layers and N4 downsampling layers, and the number of feature channels of the plurality of decider sub-modules is the same.
6. The word embedding method according to claim 5, wherein each decider sub-module doubles the number of channels of the feature tensor output by the previous decider sub-module and reduces it spatially, and the fully-connected layer receives the feature tensor output by the last decider sub-module after it has been flattened into a vector.
7. The word embedding method according to claim 1, wherein the activation functions of the generator and the decider are both rectified linear units, and the activation function of the last layer of the preset neural network is a sigmoid function.
8. The word embedding method of claim 1, wherein the loss function of the preset neural network is a binary cross-entropy loss function.
9. The word embedding method according to claim 1, wherein the pre-set neural network is trained by an error back-propagation algorithm.
10. A word embedding device, comprising:
a data receiving module configured to receive word vector data through a preset neural network, the preset neural network comprising a generator and a decider;
a feature extraction module configured to extract a feature tensor from the word vector data by the generator, the generator being formed based on a deconvolution network; and
a classification output module configured to classify the feature tensor by the decider to output a classification result, the decider being formed based on a convolutional network.
11. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the word embedding method of any of claims 1-9 based on instructions stored in the memory.
12. A computer-readable storage medium on which a program is stored, which program, when executed by a processor, implements the word embedding method according to any one of claims 1 to 9.
CN201910194723.2A 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment Pending CN111694950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910194723.2A CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910194723.2A CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111694950A true CN111694950A (en) 2020-09-22

Family

ID=72475231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910194723.2A Pending CN111694950A (en) 2019-03-14 2019-03-14 Word embedding method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111694950A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270100A1 (en) * 2016-03-18 2017-09-21 International Business Machines Corporation External Word Embedding Neural Network Language Models
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
US20180189559A1 (en) * 2016-12-29 2018-07-05 Ncsoft Corporation Apparatus and method for detecting debatable document
CN109284506A (en) * 2018-11-29 2019-01-29 重庆邮电大学 A kind of user comment sentiment analysis system and method based on attention convolutional neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270100A1 (en) * 2016-03-18 2017-09-21 International Business Machines Corporation External Word Embedding Neural Network Language Models
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
US20180189559A1 (en) * 2016-12-29 2018-07-05 Ncsoft Corporation Apparatus and method for detecting debatable document
CN109284506A (en) * 2018-11-29 2019-01-29 重庆邮电大学 A kind of user comment sentiment analysis system and method based on attention convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
子非鱼LEO: "DCGAN deep convolutional generative adversarial network & automatic Python plotting" (DCGAN深度卷积生成对抗网络&python自动绘图), Retrieved from the Internet <URL:https://blog.csdn.net/Leo1120178518/article/details/86509680> *

Similar Documents

Publication Publication Date Title
Sharma et al. Era of deep neural networks: A review
Nanni et al. Ensemble of convolutional neural networks to improve animal audio classification
Ratul et al. Skin lesions classification using deep learning based on dilated convolution
US20200104729A1 (en) Method and system for extracting information from graphs
US11321363B2 (en) Method and system for extracting information from graphs
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN116194912A (en) Method and system for aspect-level emotion classification using graph diffusion transducers
CN108959246A (en) Answer selection method, device and electronic equipment based on improved attention mechanism
JP2020520492A (en) Document abstract automatic extraction method, device, computer device and storage medium
WO2020238783A1 (en) Information processing method and device, and storage medium
CN110598620B (en) Deep neural network model-based recommendation method and device
CN106663425A (en) Frame skipping with extrapolation and outputs on demand neural network for automatic speech recognition
CN110929774A (en) Method for classifying target objects in image, method and device for training model
Chen et al. Recursive context routing for object detection
CN113946681B (en) Text data event extraction method and device, electronic equipment and readable medium
CN109034206A (en) Image classification recognition methods, device, electronic equipment and computer-readable medium
CN110796171A (en) Unclassified sample processing method and device of machine learning model and electronic equipment
CN113434683A (en) Text classification method, device, medium and electronic equipment
WO2023159756A1 (en) Price data processing method and apparatus, electronic device, and storage medium
EP4318322A1 (en) Data processing method and related device
CN115775350A (en) Image enhancement method and device and computing equipment
CN112884118A (en) Neural network searching method, device and equipment
US20190236419A1 (en) Method and apparatus for recognizing video fine granularity, computer device and storage medium
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
CN114065771A (en) Pre-training language processing method and device

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Digital Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

SE01 Entry into force of request for substantive examination