CN110889290A - Text encoding method and apparatus, text encoding validity checking method and apparatus


Info

Publication number
CN110889290A
CN110889290A
Authority
CN
China
Prior art keywords
text
convolution kernel
encoding
calculating
coding
Prior art date
Legal status
Granted
Application number
CN201911107117.9A
Other languages
Chinese (zh)
Other versions
CN110889290B (en)
Inventor
双锴
张智轩
顾梦宇
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911107117.9A
Publication of CN110889290A
Application granted
Publication of CN110889290B
Active legal status
Anticipated expiration legal status

Abstract

The invention provides a text encoding method and apparatus, a text encoding validity checking method and apparatus, an electronic device, and a non-transitory computer-readable storage medium. The text encoding method includes: receiving a text input and feeding it into a global encoder to obtain a first encoding result, where the first encoding result contains the global topic information of the text; feeding the first encoding result into a two-dimensional deconvolution layer to generate a convolution kernel, whose parameters perceive contextual semantic information and are adjusted according to it; and performing a word-by-word one-dimensional convolution operation on the text with the convolution kernel to obtain a second encoding result. Under this scheme, generating the convolution filter by deconvolution is simple and effective, incorporates the global topic information of the sentence or document, and the generated filter produces encodings that recognize polysemous words.

Description

Text encoding method and apparatus, text encoding validity checking method and apparatus
Technical Field
The present invention relates to the field of natural language processing in artificial-intelligence deep learning, in particular to a text encoding technique based on a deconvolutional neural network, and more particularly to a text encoding method and apparatus, a text encoding validity checking method and apparatus, an electronic device, and a non-transitory computer-readable storage medium.
Background
Natural Language Understanding (NLU) is the basic task of enabling a computer to recognize and understand textual information at the semantic level; as one of the AI-complete problems, it has attracted wide attention in Natural Language Processing (NLP). Natural language enters a computer in the form of text, generally with words as the basic unit; sentences and documents are formed by combining words, so the key to a computer understanding words, sentences, and documents is to encode the words reasonably. The vector generated by the encoder needs to contain the semantic information a word carries in its context, and is then applied in the subtasks of natural language understanding: language modeling, sentiment analysis, text classification, machine translation, and so on.
Word encoding methods can generally be divided into high-dimensional sparse (one-hot) representations and low-dimensional dense (continuous) vector representations. To convert words into mathematical features that can be fed to a computer, early researchers encoded each word as a high-dimensional sparse vector; however, such overly sparse representations limit many applications. Researchers now focus mainly on low-dimensional dense encodings, i.e., encoding words as low-dimensional continuous vectors. At this stage, word encoders based on the distributional semantic hypothesis have been successful; the Word2vec model proposed by Google in 2013 and the GloVe model proposed by the Stanford NLP group in 2014 are typical representatives of text encoding. However, these encoders share a common problem: they are polysemy-insensitive, i.e., existing word-vector methods cannot handle word ambiguity. Ambiguous words are nearly ubiquitous in natural language; for example, the word "eagle" may refer to an animal or to the name of a basketball team. Because each word currently corresponds to one fixed vector, that vector captures only the "average" semantics of the word, the semantics of ambiguous words are recognized inaccurately, and the performance in NLP of models that take such vectors as input can be greatly degraded.
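As a minimal illustration of the two representations (a sketch, not from the patent; the toy vocabulary, dimensions, and random values are invented for the example):

import numpy as np

vocab = ["eagle", "lakers", "bird", "wing"]          # toy 4-word vocabulary

# High-dimensional sparse (one-hot): one dimension per word, no shared semantics
one_hot = np.eye(len(vocab))
print(one_hot[vocab.index("eagle")])                 # [1. 0. 0. 0.]

# Low-dimensional dense (continuous): each word is a short real-valued vector;
# the single fixed vector for "eagle" must average its animal/team senses
dense = {w: np.random.randn(8) for w in vocab}       # 8-dim instead of |V|-dim
print(dense["eagle"].round(2))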
With the development of deep learning, networks that encode further on top of the original word encodings keep appearing, among them Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and the multi-head self-attention pre-training networks represented by Google's recent BERT (Bidirectional Encoder Representations from Transformers).
CNN can be traced back to the Back-Propagation (BP) algorithm proposed in 1987; Yann LeCun applied it to multi-layer neural networks in 1989, and with the LeNet-5 model proposed by LeCun in 1998 the CNN gradually took shape. A CNN can be divided into three parts: an input layer, hidden layers, and an output layer. The hidden layers mainly comprise convolutional layers, pooling layers, and fully connected layers: the convolutional layers extract the semantic features of the input sequence, the pooling layers select and filter those features, and the fully connected layers combine the features and output a prediction score for each class. The output layer then outputs the prediction probability for each class using a logistic function or a normalized exponential function, such as softmax. CNNs have the advantages of extracting local semantic information well and training quickly. However, precisely because they suit local semantic information, CNNs are less effective than RNNs at extracting the semantics of long sequences.
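A minimal sketch of the three-part structure just described, assuming PyTorch (the layer sizes and class count are illustrative):

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, emb_dim=32, n_classes=4, n_filters=16, k=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, k, padding=k // 2)  # extracts local semantic features
        self.pool = nn.AdaptiveMaxPool1d(1)                           # selects and filters the features
        self.fc = nn.Linear(n_filters, n_classes)                     # combines features into class scores

    def forward(self, x):                          # x: (batch, seq_len, emb_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        h = self.pool(h).squeeze(-1)
        return torch.softmax(self.fc(h), dim=-1)   # output layer: per-class probabilities

print(TextCNN()(torch.randn(2, 10, 32)).shape)     # torch.Size([2, 4])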
In 1989, Ronald Williams and David Zipser proposed Real-Time Recurrent Learning (RTRL) for RNNs. Subsequently, in 1990, Paul Werbos proposed Back-Propagation Through Time (BPTT) for RNNs. RNNs have many variants, among which the widely used Long Short-Term Memory (LSTM) was proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997, and the Gated Recurrent Unit (GRU) by Kyunghyun Cho in 2014. An RNN can be divided into three parts: an input layer, a hidden layer, and an output layer. The input and output layers are the same as those of a CNN, while the hidden layer is a recurrent layer responsible for extracting the temporal features of the input sequence; the features extracted at the current time step depend on those extracted at the previous time step. RNNs have the advantage of suiting the analysis of global or contextual semantic information, especially the global semantics of long sequences. Because the hidden layer is recurrent, a recurrent neural network trains slowly. Moreover, in a long sequence the features extracted by the hidden layer are biased toward the inputs that arrive later in time, which hurts the RNN's extraction of global semantic information.
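For contrast, a correspondingly minimal recurrent hidden layer (again a PyTorch sketch with illustrative sizes):

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=32, hidden_size=16, batch_first=True)  # recurrent hidden layer

x = torch.randn(2, 10, 32)        # (batch, seq_len, emb_dim)
outputs, h_n = rnn(x)             # each step's features depend on the previous step's
print(outputs.shape, h_n.shape)   # torch.Size([2, 10, 16]) torch.Size([1, 2, 16])
# h_n summarizes the whole sequence, but it is computed step by step (hard to
# parallelize) and, on long sequences, is biased toward the most recent inputs.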
To overcome the drawbacks of RNN and CNN, Google proposed deep encoders based on multi-head self-attention, BERT and XLNet, whose pre-trained encoders achieved the best results on a dozen natural language processing tasks. However, because these models contain a huge number of parameters and require massive amounts of data for pre-training, they are costly to adjust: such a model can be used as a module, but its internal structure is difficult to modify.
In summary, although CNNs, RNNs, and pre-training models extract contextual semantic information in different ways during encoding, none of them models and encodes the features of ambiguous words, and so none achieves polysemy-sensitive word recognition. Meanwhile, industry verifies the coding results of an encoder only against public datasets, and a quantitative, targeted mode of experimental verification is lacking.
Disclosure of Invention
In view of the above, the present invention is made to solve the technical problem that polysemous-word recognition cannot be realized.
According to a first aspect of the present invention, there is provided a text encoding method comprising: receiving a text input and inputting it into a global encoder to obtain a first encoding result, wherein the first encoding result contains the global topic information of the text; inputting the first encoding result into a two-dimensional deconvolution layer to generate a convolution kernel, wherein the parameters of the convolution kernel perceive contextual semantic information and are adjusted according to it; and performing a word-by-word one-dimensional convolution operation on the text according to the convolution kernel, so as to obtain a second encoding result.
Optionally, after receiving the text input, the method further comprises: encoding the text in terms of character information, position information, and segmentation information, so as to obtain a preliminary encoding result; and integrating the preliminary encoding result to obtain a preliminary encoding matrix. The step of performing the word-by-word one-dimensional convolution operation on the text according to the convolution kernel to obtain the second encoding result then comprises: performing the word-by-word one-dimensional convolution operation on the preliminary encoding matrix to obtain the second encoding result.
Optionally, the size of the parameter matrix of the convolution kernel is one of: 1×n, 3×n, 5×n, where n denotes the dimension of the encoding vectors of the first encoding result.
Optionally, the global encoder is one of: a convolutional neural network encoder, a recurrent neural network encoder, and a pre-trained encoder.
Optionally, the text encoding method further includes judging whether the second encoding result is valid, the judging step comprising: performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer; taking the two largest singular values as the coordinates of a point plotted on a two-dimensional plane; clustering all samples according to their sample categories; calculating the coordinates of the geometric center of all samples belonging to the same category; calculating the average Euclidean distance d from the samples of a category to its geometric center; calculating the average pairwise distance D between the geometric centers of different categories; calculating the ratio D/d; judging that the second encoding result is valid if the ratio is greater than a threshold; and, if the ratio is smaller than the threshold, judging that the second encoding result is invalid, adjusting the global encoder and the size of the convolution-kernel parameter matrix, and re-encoding.
According to another embodiment of the present invention, there is provided a text encoding validity checking method for use with the above text encoding method, comprising: performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer; taking the two largest singular values as the coordinates of a point plotted on a two-dimensional plane; clustering all samples according to sample categories; calculating the coordinates of the geometric center of all samples belonging to the same category; calculating the average Euclidean distance d from the samples of a category to its geometric center; calculating the average pairwise distance D between the geometric centers of different categories; and calculating the ratio D/d, judging that the text encoding is valid if the ratio is greater than a threshold, and invalid otherwise.
According to another embodiment of the present invention, there is provided a text encoding apparatus comprising: a global encoding module for receiving a text input and computing a first encoding result, the first encoding result containing the global topic information of the text; a deconvolution layer module for receiving the first encoding result and generating a convolution kernel, the parameters of which perceive contextual semantic information and are adjusted according to it; and a convolution operation module for performing a word-by-word one-dimensional convolution operation on the text according to the convolution kernel, so as to obtain a second encoding result.
According to another embodiment of the present invention, there is provided a text encoding validity checking apparatus for the above text encoding method, comprising: a singular value decomposition module for performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer, taking the two largest singular values, and plotting them as a point on a two-dimensional plane; a sample clustering module for clustering all samples according to sample categories; a center coordinate calculation module for calculating the coordinates of the geometric center of all samples belonging to the same category; a Euclidean distance calculation module for calculating the average Euclidean distance d from the samples of a category to its geometric center; a center distance calculation module for calculating the average pairwise distance D between the geometric centers of different categories; a ratio calculation module for calculating the ratio D/d; and a judging module for judging that the text encoding is valid if the ratio is greater than a threshold, and invalid otherwise.
According to still another embodiment of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method when executing the computer program.
According to yet another embodiment of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described method.
At present, few papers on deconvolution address natural language processing tasks, particularly under supervised learning. As can be seen from the foregoing, embodiments of the present invention provide a technique for implementing an adaptive CNN convolution kernel; in particular, deconvolution is extended to generating a parameter-adaptive convolution kernel. Unlike the traditional CNN model, the parameters of the convolution operation in the present scheme adapt to different contexts and are no longer fixed. The purpose is to obtain different encoding results for the same ambiguous word, according to its different contexts, during encoding.
In the present scheme, generating the convolution filter, i.e., the convolution kernel, through deconvolution is simple and effective, involves the global topic information of the sentence or document, and the generated filter can produce encodings that recognize polysemous words.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a text encoding method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a text encoding method according to another embodiment of the present invention;
FIG. 3 is a schematic flow chart of a text encoding validity checking method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a text encoding method according to another embodiment of the present invention;
FIG. 5 is a simplified flow diagram of a method for text encoding and verification in accordance with an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a text encoding apparatus of an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a text encoding validity checking apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of an electronic device for implementing a text encoding method and a text encoding validity checking method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As described in the Background section, existing research on natural language processing tasks does not achieve ambiguous-word recognition, whereas an implementation technique for an adaptive CNN convolution kernel is provided according to embodiments of the present invention. Unlike the conventional CNN model, in the model of this embodiment the parameters of the convolution operation are generated automatically in response to different contexts. A method for text encoding using an adaptive CNN convolution kernel according to an embodiment of the present invention is described below with reference to FIG. 1.
As shown in FIG. 1, the method comprises the following steps:
s110, receiving text input, and inputting the text input into a global encoder to obtain a first encoding result, wherein the first encoding result comprises global subject information of the text.
The first encoding result may be a preliminary characterization of the entire sentence. The global encoder may be a CNN encoder, an RNN encoder, or a pre-trained encoder (e.g., BERT, XLNet); the choice of encoder is a hyper-parameter, which those skilled in the art can select according to performance on different datasets.
After receiving the text input and before inputting the text into the global encoder, the text may be further encoded and integrated, specifically as follows:
and encoding the text from the character (token) information, the position (position) information and the segment (segment) information to obtain a preliminary encoding result. The character information can be coded by using word2vec, glove, word and the like; the position information can be fixedly coded by a sine wave vector mode, or can be coded by a trainable position vector, or can be coded by a combination mode of the sine wave vector and the trainable position vector; the segmentation information coding mode uses trainable segment marking vectors for coding, and the segmentation basis can be punctuation, paragraph and other information.
Then, the three encodings are fused, by concatenation, position-wise summation, weighted summation, or similar means, so as to obtain a preliminary encoding matrix E.
The integrated preliminary encoding matrix E is then input into the global encoder for encoding, yielding the first encoding result.
Those skilled in the art can choose the specific encoding modes and the integration means of the preliminary encoding according to actual needs; the invention is not limited in this respect.
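A minimal sketch of the preliminary encoding just described, assuming PyTorch; the class name, vocabulary size, and fusion by position-wise summation are illustrative choices, not specified by the patent:

import torch
import torch.nn as nn

class PreliminaryEncoder(nn.Module):
    """Token + position + segment encodings fused by position-wise summation."""
    def __init__(self, vocab_size=30000, max_len=512, n_segments=2, dim=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)   # character (token) information
        self.pos = nn.Embedding(max_len, dim)      # trainable position vectors
        self.seg = nn.Embedding(n_segments, dim)   # trainable segment markers

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # fuse the three encodings by aligned summation -> preliminary matrix E
        return self.tok(token_ids) + self.pos(positions) + self.seg(segment_ids)

E = PreliminaryEncoder()(torch.tensor([[5, 9, 2]]), torch.tensor([[0, 0, 0]]))
print(E.shape)  # torch.Size([1, 3, 768])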
S120, the first encoding result is input into the two-dimensional deconvolution layer to generate a convolution kernel, the parameters of which perceive contextual semantic information and are adjusted according to it.
As described in step S110, the first encoding result, i.e., the input of the two-dimensional deconvolution layer, carries global information, so the convolution kernel generated by the two-dimensional deconvolution layer also carries global information and is adaptive to polysemous words: its parameters adjust adaptively as they perceive contextual semantic information. The parameter matrix of the convolution kernel may take various preset sizes, e.g., 1×n, 3×n, or 5×n (where n denotes the dimension of the first encoding's vectors), or the size may be selected among several candidates according to the dimension of the network's input word vectors (e.g., 768).
S130, a word-by-word one-dimensional convolution operation is performed on the text according to the convolution kernel, thereby obtaining the second encoding result.
A convolution kernel of size 1×n, 3×n, or 5×n, as illustrated above, perceives the semantic information of 1, 3, or 5 words, i.e., of the word itself and its surrounding context, and thus combines global semantic information when generating the encoding vector of each word.
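A sketch of steps S120 and S130 under one possible reading, assuming PyTorch; the channel layout, the class name AdaptiveConvEncoder, and the toy sizes (real word vectors would be, e.g., 768-dimensional) are assumptions of this illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConvEncoder(nn.Module):
    """A 2-D deconvolution turns the sentence-level first encoding into the
    parameters of a k x n convolution kernel, which is then slid word by word
    over the preliminary encoding matrix E."""
    def __init__(self, global_dim=64, emb_dim=32, out_dim=32, k=3):
        super().__init__()
        self.k = k
        # one (k x emb_dim) parameter matrix per output channel
        self.deconv = nn.ConvTranspose2d(global_dim, out_dim, kernel_size=(k, emb_dim))

    def forward(self, global_vec, E):
        # global_vec: (1, global_dim) first encoding; E: (1, seq_len, emb_dim)
        kernel = self.deconv(global_vec.view(1, -1, 1, 1))  # (1, out_dim, k, emb_dim)
        weight = kernel.squeeze(0).transpose(1, 2)          # (out_dim, emb_dim, k)
        x = E.transpose(1, 2)                               # (1, emb_dim, seq_len)
        # word-by-word 1-D convolution with the context-dependent kernel
        return F.conv1d(x, weight, padding=self.k // 2).transpose(1, 2)

enc = AdaptiveConvEncoder()
second = enc(torch.randn(1, 64), torch.randn(1, 12, 32))
print(second.shape)  # torch.Size([1, 12, 32]) -- one vector per word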
Traditionally, a neural network goes through two processes, training and testing; in a conventional CNN, suitable parameters are learned from the training samples during training and are then fixed during testing. In other words, the parameters of a conventional CNN are independent of the test input and always remain fixed.
Unlike conventional convolution with fixed parameters, according to this embodiment a sample (which may be a sentence or a document) is first input into the global encoder to generate an overall representation vector, which contains the topic information of the entire sample, such as education, finance, sports, science and technology, and so on. When the adaptive convolution kernel processes a polysemous word, the parameters of the kernel are adjusted according to the semantics of the word's context, the word is then encoded adaptively during the convolution, and a polysemy-aware encoding vector is output.
For example, the same word "eagle" has different meanings in the following two sentences because of the different contexts: 1) the Eagles performed well today, overwhelming the Lakers; 2) the eagle is a species of bird.
According to this embodiment, the same word "eagle" in the two sentences may generate two different vectors (for example, 768-dimensional vectors; the dimension is a hyper-parameter that those skilled in the art can adjust according to actual needs) because of the different contexts, and the cosine distance between the generated vectors is also larger because the semantic difference between the two scenarios is larger.
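A quick way to inspect this behavior (a sketch; the random vectors stand in for the two context-dependent encodings of "eagle"):

import torch
import torch.nn.functional as F

v_sports = torch.randn(768)   # encoding of "eagle" in the basketball sentence
v_animal = torch.randn(768)   # encoding of "eagle" in the bird sentence

cos = F.cosine_similarity(v_sports, v_animal, dim=0)
print(f"cosine distance: {1 - cos.item():.3f}")  # larger => more distinct senses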
FIG. 2 shows the overall flow of a text encoding method according to an embodiment of the invention, including the encoding of character, position, and segmentation information and the integration of those encodings.
As described in the Background section, industry verifies the coding results of an encoder only against public datasets and lacks a quantitative, targeted mode of experimental verification. According to an embodiment of the invention, a text encoding checking method is therefore also provided for checking the coding results of the above text encoding method.
FIG. 3 shows a schematic flow diagram of a text encoding validity checking method according to an embodiment of the present invention. As shown in FIG. 3, the method includes:
S310, singular value decomposition is performed on the parameters of the convolution kernel output by the two-dimensional deconvolution layer;
S320, the two largest singular values are taken as the coordinates of a point plotted on a two-dimensional plane;
after each set of convolution-kernel parameters undergoes singular value decomposition, its two largest singular values, denoted x1 and x2, are used as the x and y coordinates, i.e., the sample is plotted at the point (x1, x2) on the two-dimensional coordinate plane.
S330, all samples are clustered according to their sample categories;
here, each input sample (a sentence or paragraph) generates one coordinate point on the two-dimensional plane, and all samples are clustered by category.
S340, the coordinates of the geometric center of all samples belonging to the same category are calculated;
the samples of a category may also be samples obtained by negative sampling.
S350, the average Euclidean distance d from the samples of a category to its geometric center is calculated;
S360, the average pairwise distance D between the geometric centers of different categories is calculated;
S370, the ratio D/d is calculated;
and S380, the text encoding is judged valid if the ratio is greater than the threshold, and invalid otherwise.
The ratio D/d is calculated as a measure of the effectiveness of the encoder: the larger the ratio D/d, the better the encoder performs and the higher its validity. A lower-bound threshold can be set in application; the specific threshold must be formulated for the given sample dataset. For example, with a lower bound of 0.5, D/d > 0.5 generally indicates that the encoder performs excellently.
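A compact sketch of this check under the reading above, assuming NumPy; the per-sample kernels, labels, and the 0.5 threshold are illustrative:

import numpy as np

def coding_validity(kernels, labels, threshold=0.5):
    """kernels: one (k x n) convolution-kernel parameter matrix per sample;
    labels: the category of each sample. Returns (D/d ratio, is_valid)."""
    # S310/S320: the two largest singular values of each kernel give a 2-D point
    points = np.array([np.linalg.svd(K, compute_uv=False)[:2] for K in kernels])
    cats = np.unique(labels)
    centers = np.array([points[labels == c].mean(axis=0) for c in cats])   # S340
    # S350: average Euclidean distance d from samples to their own center
    d = np.mean([np.linalg.norm(points[labels == c] - centers[i], axis=1).mean()
                 for i, c in enumerate(cats)])
    # S360: average pairwise distance D between centers of different categories
    D = np.mean([np.linalg.norm(centers[i] - centers[j])
                 for i in range(len(cats)) for j in range(i + 1, len(cats))])
    ratio = D / d                            # S370
    return ratio, ratio > threshold          # S380

kernels = [np.random.randn(3, 32) for _ in range(40)]
labels = np.repeat(np.arange(4), 10)
print(coding_validity(kernels, labels))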
The above describes the flow of the validity check on the semantic-cognition information of a text encoder. The encoder designed in this embodiment aims at recognizing task semantic information, so besides good performance on datasets, visualizing and quantitatively verifying that the semantic information is reasonable is a key step in proving the effectiveness of the text encoder. The validity checking method shown in FIG. 3 can therefore be combined with the text encoding method shown in FIG. 1: singular value decomposition is used to check the validity of the text encoder's coding results, and the clustering-related indices judge whether the text encoding result is reasonable.
FIG. 4 shows a flow chart of a text encoding method incorporating the method shown in FIG. 3. As shown in FIG. 4, after step S130 of FIG. 1, the following steps may further be included:
s140, judging whether the second coding result is valid.
As previously mentioned, the judging step includes: performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer; taking the two largest singular values as the coordinates of a point plotted on a two-dimensional plane; clustering all samples according to sample categories; calculating the coordinates of the geometric center of all samples belonging to the same category; calculating the average Euclidean distance d from the samples of a category to its geometric center; calculating the average pairwise distance D between the geometric centers of different categories; calculating the ratio D/d; judging that the second encoding result is valid if the ratio is greater than a threshold; and judging that the second encoding result is invalid if the ratio is smaller than the threshold.
S150, if the second encoding result is judged invalid, the global encoder and the size of the convolution-kernel parameter matrix are adjusted, and the text is re-encoded.
If the encoding result passes the validity check, it can be applied to tasks such as text classification, sentiment analysis, machine translation, and language modeling.
FIG. 5 shows a simplified flow chart of the overall process. As shown in FIG. 5, the overall process includes: using the global encoder to obtain an overall encoding of the sentence and inputting it into the deconvolution model to generate a parameter-adaptive convolution kernel; performing the convolution with the adaptive kernel, encoding local information adaptively and with polysemy awareness under the guidance of global information, and outputting the final encoding vectors; and finally checking the validity of the encoding result by singular value decomposition to judge whether the goal of polysemy awareness is met. If not, the hyper-parameters are adjusted and the encoding process is repeated.
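Hypothetical glue for this loop, reusing the sketches above; AdaptiveConvEncoder and coding_validity are this description's illustrative names, not the patent's, and the search over kernel sizes is just one example of "adjusting the hyper-parameters":

def encode_until_valid(samples, labels, kernel_sizes=(1, 3, 5), threshold=0.5):
    # samples: list of (global_vec, E) pairs, i.e., (first encoding, matrix E)
    for k in kernel_sizes:                      # hyper-parameter adjustment loop
        enc = AdaptiveConvEncoder(k=k)
        kernels = []
        for global_vec, E in samples:
            enc(global_vec, E)                  # second encoding (used downstream)
            params = enc.deconv(global_vec.view(1, -1, 1, 1)).squeeze(0)
            kernels.append(params[0].detach().numpy())  # one (k x n) matrix per sample
        ratio, ok = coding_validity(kernels, labels, threshold)
        if ok:
            return enc, ratio                   # encoding judged polysemy-aware
    raise RuntimeError("no configuration passed the validity check")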
According to another embodiment of the invention, a text encoding apparatus is also provided, configured to perform the text encoding method described above. FIG. 6 shows a schematic block diagram of the text encoding apparatus. As shown in FIG. 6, the text encoding apparatus includes: a global encoding module 610, configured to receive a text input and compute a first encoding result, where the first encoding result contains the global topic information of the text; a deconvolution layer module 620, configured to receive the first encoding result and generate a convolution kernel, the parameters of which perceive contextual semantic information and are adjusted according to it; and a convolution operation module 630, configured to perform a word-by-word one-dimensional convolution operation on the text according to the convolution kernel, so as to obtain a second encoding result.
According to another embodiment of the present invention, there is also provided a text encoding validity checking apparatus for checking whether the encoding result of the above text encoding method, or of the above text encoding apparatus, is valid. FIG. 7 shows a schematic block diagram of the text encoding validity checking apparatus. As shown in FIG. 7, the apparatus includes: a singular value decomposition module 710, configured to perform singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer, take the two largest singular values, and plot them as a point on the two-dimensional plane; a sample clustering module 720, configured to cluster all samples according to sample categories; a center coordinate calculation module 730, configured to calculate the coordinates of the geometric center of all samples belonging to the same category; a Euclidean distance calculation module 740, configured to calculate the average Euclidean distance d from the samples of a category to its geometric center; a center distance calculation module 750, configured to calculate the average pairwise distance D between the geometric centers of different categories; a ratio calculation module 760, configured to calculate the ratio D/d; and a judging module 770, configured to judge that the text encoding is valid if the ratio is greater than the threshold, and invalid otherwise.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one of the multiple devices may only perform one or more steps of the method according to the embodiment of the present invention, and the multiple devices interact with each other to complete the method.
FIG. 8 is a schematic diagram of a more specific hardware structure of an electronic device according to this embodiment. The device may include: a processor 810, a memory 820, an input/output interface 830, a communication interface 840, and a bus 850, where the processor 810, the memory 820, the input/output interface 830, and the communication interface 840 are communicatively coupled to one another within the device via the bus 850.
The processor 810 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the technical solutions provided in the embodiments of this specification.
The memory 820 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 820 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 820 and is called and executed by the processor 810.
The input/output interface 830 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 840 is used for connecting a communication module (not shown in the figure) to realize communication interaction between the device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 850 includes a pathway for communicating information between various components of the device, such as processor 810, memory 820, input/output interface 830, and communication interface 840.
It should be noted that although the above-mentioned device only shows the processor 810, the memory 820, the input/output interface 830, the communication interface 840 and the bus 850, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The embodiment of the invention also provides a program product storing machine-readable instruction codes which, when read and executed by a machine, perform the text encoding method and the text encoding validity checking method according to the embodiments of the invention. Accordingly, the various storage media for carrying such a program product are also included in the present disclosure.
Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, implemented in any method or technology for the storage of information. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that are made without departing from the spirit and principles of the invention are intended to be included within its scope.

Claims (10)

1. A method of text encoding, comprising:
receiving a text input, and inputting the text input into a global encoder to obtain a first encoding result, wherein the first encoding result contains global topic information of the text;
inputting the first encoding result into a two-dimensional deconvolution layer to generate a convolution kernel, wherein parameters of the convolution kernel perceive contextual semantic information and are adjusted according to the contextual semantic information;
and performing a word-by-word one-dimensional convolution operation on the text according to the convolution kernel, so as to obtain a second encoding result.
2. The text encoding method of claim 1, further comprising, after receiving the text input:
encoding the text in terms of character information, position information, and segmentation information, so as to obtain a preliminary encoding result;
integrating the preliminary encoding result to obtain a preliminary encoding matrix,
wherein performing the word-by-word one-dimensional convolution operation on the text according to the convolution kernel to obtain the second encoding result comprises: performing the word-by-word one-dimensional convolution operation on the preliminary encoding matrix to obtain the second encoding result.
3. The text encoding method of claim 1 or 2, wherein the size of the parameter matrix of the convolution kernel is one of: 1×n, 3×n, 5×n, where n denotes the dimension of the encoding vectors of the first encoding result.
4. The text encoding method of claim 3, wherein the global encoder is one of: a convolutional neural network encoder, a recurrent neural network encoder, and a pre-trained encoder.
5. The text encoding method of claim 1 or 2, further comprising the steps of:
judging whether the second encoding result is valid, wherein the judging comprises:
performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer;
taking the two largest singular values as the coordinates of a point plotted on a two-dimensional plane;
clustering all samples according to their sample categories;
calculating the coordinates of the geometric center of all samples belonging to the same category;
calculating the average Euclidean distance d from the samples of a category to its geometric center;
calculating the average pairwise distance D between the geometric centers of different categories;
calculating the ratio D/d;
judging that the second encoding result is valid if the ratio is greater than a threshold;
and, if the ratio is smaller than the threshold, judging that the second encoding result is invalid, adjusting the global encoder and the size of the convolution-kernel parameter matrix, and re-encoding.
6. A text encoding validity checking method for the text encoding method of any one of claims 1 to 5, comprising:
performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer;
taking the two largest singular values as the coordinates of a point plotted on a two-dimensional plane;
clustering all samples according to sample categories;
calculating the coordinates of the geometric center of all samples belonging to the same category;
calculating the average Euclidean distance d from the samples of a category to its geometric center;
calculating the average pairwise distance D between the geometric centers of different categories;
and calculating the ratio D/d, judging that the text encoding is valid if the ratio is greater than a threshold, and invalid otherwise.
7. A text encoding apparatus, comprising:
a global encoding module for receiving a text input and computing a first encoding result, wherein the first encoding result contains global topic information of the text;
a deconvolution layer module for receiving the first encoding result and generating a convolution kernel, the parameters of which perceive contextual semantic information and are adjusted according to it;
and a convolution operation module for performing a word-by-word one-dimensional convolution operation on the text according to the convolution kernel, so as to obtain a second encoding result.
8. A text encoding validity checking apparatus for use with the text encoding method of any one of claims 1 to 5, comprising:
a singular value decomposition module for performing singular value decomposition on the parameters of the convolution kernel output by the two-dimensional deconvolution layer, taking the two largest singular values, and plotting them as a point on a two-dimensional plane;
a sample clustering module for clustering all samples according to sample categories;
a center coordinate calculation module for calculating the coordinates of the geometric center of all samples belonging to the same category;
a Euclidean distance calculation module for calculating the average Euclidean distance d from the samples of a category to its geometric center;
a center distance calculation module for calculating the average pairwise distance D between the geometric centers of different categories;
a ratio calculation module for calculating the ratio D/d;
and a judging module for judging that the text encoding is valid if the ratio is greater than a threshold, and invalid otherwise.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
CN201911107117.9A 2019-11-13 2019-11-13 Text encoding method and apparatus, text encoding validity checking method and apparatus Active CN110889290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911107117.9A CN110889290B (en) 2019-11-13 2019-11-13 Text encoding method and apparatus, text encoding validity checking method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911107117.9A CN110889290B (en) 2019-11-13 2019-11-13 Text encoding method and apparatus, text encoding validity checking method and apparatus

Publications (2)

Publication Number Publication Date
CN110889290A true CN110889290A (en) 2020-03-17
CN110889290B CN110889290B (en) 2021-11-16

Family

ID=69747388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911107117.9A Active CN110889290B (en) 2019-11-13 2019-11-13 Text encoding method and apparatus, text encoding validity checking method and apparatus

Country Status (1)

Country Link
CN (1) CN110889290B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649853A (en) * 2016-12-30 2017-05-10 儒安科技有限公司 Short text clustering method based on deep learning
CN110019793A (en) * 2017-10-27 2019-07-16 阿里巴巴集团控股有限公司 A kind of text semantic coding method and device
US20190171765A1 (en) * 2017-12-01 2019-06-06 At&T Intellectual Property I, L.P. Adaptive clustering of media content from multiple different domains
CN109213975A (en) * 2018-08-23 2019-01-15 重庆邮电大学 It is a kind of that special document representation method is pushed away from coding based on character level convolution variation
CN110020431A (en) * 2019-03-06 2019-07-16 平安科技(深圳)有限公司 Feature extracting method, device, computer equipment and the storage medium of text information
CN110059188A (en) * 2019-04-11 2019-07-26 四川黑马数码科技有限公司 A kind of Chinese sentiment analysis method based on two-way time convolutional network
CN110362723A (en) * 2019-05-31 2019-10-22 平安国际智慧城市科技股份有限公司 A kind of topic character representation method, apparatus and storage medium
CN110246166A (en) * 2019-06-14 2019-09-17 北京百度网讯科技有限公司 Method and apparatus for handling point cloud data
CN110377740A (en) * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Feeling polarities analysis method, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI SHUANG et al., "Convolution–deconvolution word embedding: An end-to-end multi-prototype fusion embedding method for natural language processing", Information Fusion *
ZHOU Feiyan et al., "卷积神经网络研究综述 (A Review of Convolutional Neural Networks)", 《计算机学报》 (Chinese Journal of Computers) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581377A (en) * 2020-04-23 2020-08-25 广东博智林机器人有限公司 Text classification method and device, storage medium and computer equipment
CN111581377B (en) * 2020-04-23 2023-04-07 广东博智林机器人有限公司 Text classification method and device, storage medium and computer equipment
CN114417856A (en) * 2021-12-29 2022-04-29 北京百度网讯科技有限公司 Text sparse coding method and device and electronic equipment
CN114417856B (en) * 2021-12-29 2022-11-04 北京百度网讯科技有限公司 Text sparse coding method and device and electronic equipment

Also Published As

Publication number Publication date
CN110889290B (en) 2021-11-16

Similar Documents

Publication Publication Date Title
JP7193252B2 (en) Captioning image regions
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
Bose et al. Efficient inception V2 based deep convolutional neural network for real‐time hand action recognition
US20210089845A1 (en) Teaching gan (generative adversarial networks) to generate per-pixel annotation
CN111105017B (en) Neural network quantization method and device and electronic equipment
CN112487217A (en) Cross-modal retrieval method, device, equipment and computer-readable storage medium
CN113469088A (en) SAR image ship target detection method and system in passive interference scene
US20200364216A1 (en) Method, apparatus and storage medium for updating model parameter
CN116308754B (en) Bank credit risk early warning system and method thereof
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
EP4318313A1 (en) Data processing method, training method for neural network model, and apparatus
CN110889290B (en) Text encoding method and apparatus, text encoding validity checking method and apparatus
WO2023042045A1 (en) Convolution attention network for multi-label clinical document classification
CN113255328A (en) Language model training method and application method
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN110019952B (en) Video description method, system and device
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN117349402A (en) Emotion cause pair identification method and system based on machine reading understanding
CN116821339A (en) Misuse language detection method, device and storage medium
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN110852102B (en) Chinese part-of-speech tagging method and device, storage medium and electronic equipment
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium
CN114358011A (en) Named entity extraction method and device and electronic equipment
CN112529093A (en) Method for testing mold cleaning effect based on sample dimension weighting of pre-detection weight
CN113139382A (en) Named entity identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant