CN114676696A - Method, device, server, electronic equipment and storage medium for generating word feature vector


Info

Publication number
CN114676696A
CN114676696A
Authority
CN
China
Prior art keywords
word
target
training
target word
feature vector
Prior art date
Legal status
Pending
Application number
CN202011557976.0A
Other languages
Chinese (zh)
Inventor
刘凡平
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202011557976.0A priority Critical patent/CN114676696A/en
Publication of CN114676696A publication Critical patent/CN114676696A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/10 Text processing > G06F40/12 Use of codes for handling textual entities > G06F40/126 Character encoding
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method for generating word feature vectors, which comprises the following steps: acquiring a minimum training unit and performing word segmentation processing on it to obtain a target word set; obtaining the initial feature vector of each target word in the target word set; training the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the set and the number of target words in the set, and determining the word feature vector of each target word from the training result. The invention also discloses a device and a server for generating word feature vectors. With the provided device and method, word feature vectors can be derived rapidly, which addresses the problems that mainstream word vector generation methods in the prior art have low training and prediction efficiency and consume large amounts of memory and computing resources.

Description

Method, device, server, electronic equipment and storage medium for generating word feature vector
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method for generating word feature vectors, an apparatus for generating word feature vectors, a server, an electronic device, and a storage medium.
Background
Word feature vectors (word vectors for short) are the preferred technology for text vectorization in various natural language processing tasks, such as part-of-speech tagging, named entity recognition, text classification, document clustering, sentiment analysis, document generation, question answering systems, and the like. A word vector maps a word to a space of fixed dimension; mathematically, it may be expressed as f: x → y, where x is a word or a text segment and y is the fixed-dimension vector after mapping.
The existing mainstream word vector generation approaches include the Word2vec, GloVe, and BERT models. However, Word2vec cannot effectively handle polysemous words, and while the GloVe and BERT models cover the semantic and grammatical information of words as far as possible, they suffer from low training and prediction efficiency and heavy consumption of memory and computing resources.
Disclosure of Invention
The embodiments of the invention provide a technical concept of training word feature vectors based on errors: word feature vectors are derived rapidly by way of error calculation, which addresses the problems that mainstream word vector generation methods in the prior art have low training and prediction efficiency and consume large amounts of memory and computing resources.
In a first aspect, an embodiment of the present invention provides a method for generating a word feature vector, where the method includes:
acquiring a minimum training unit and performing word segmentation processing on it to obtain a target word set;
obtaining the initial feature vector of each target word in the target word set;
training the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the set and the number of target words in the set, and determining the word feature vector of each target word from the training result.
In a second aspect, an embodiment of the present invention provides an apparatus for generating word feature vectors, which includes
at least one processor,
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the above method.
In a third aspect, an embodiment of the present invention provides a server, which includes
at least one processor,
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the above method.
In a fourth aspect, an embodiment of the present invention provides an electronic device, which includes
at least one processor,
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the above method.
In a fifth aspect, the invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The embodiments of the invention have the following beneficial effects: the technical scheme provided by the embodiments uses the error between the sum of the initial feature vectors of all words contained in the minimum training unit and the number of those words to train the feature vectors of the words contained in the minimum training unit, so that stable word feature vectors are trained from the error between the initial feature vectors and the word count. Because training is driven by the error between the sum of the word feature vectors and the number of words contained in the minimum training unit, the training process can be completed with simple addition and subtraction, which greatly improves training and prediction efficiency. In addition, because the calculation is carried out over the minimum training unit, the context semantics and word senses within the minimum training unit are taken into account, which safeguards the training and prediction quality of the word feature vectors.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for generating word feature vectors according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a training process in which a plurality of minimum training units all include the same word according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for implementing step S13 in FIG. 1 according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an implementation method for forming the training matrix in step S131 according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of the training matrix formed according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method for implementing step S132 according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating a method for implementing step S602 according to an embodiment of the present invention;
FIG. 8 is a process diagram of training matrix column summation according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating the encoding effect of the initial feature vectors formed by the random initialization of words according to an embodiment of the present invention;
FIG. 10 is a functional block diagram of an apparatus for generating word feature vectors in accordance with an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this application, the terms "module," "apparatus," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or the server itself, may be an element. One or more elements may reside within a process and/or thread of execution, an element may be localized on one computer and/or distributed between two or more computers, and elements may operate through various computer-readable media. Elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, for example a signal from data interacting with another element in a local system or distributed system, and/or transmitted across a network such as the Internet to interact with other systems.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprising a ..." does not, without further limitation, preclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The method for generating word feature vectors in the embodiments of the present invention may be applied to any server having a natural language processing function, and also to any terminal device configured with a natural language processing module, for example a smart phone, a tablet computer, or a smart home device, which is not limited here. By applying this method, a server or terminal device that provides natural language processing functions such as text classification, sentiment analysis, dialogue question answering, and document clustering can quickly and accurately train and generate the word feature vectors used for semantic analysis and related functions. On a server or terminal device of the same configuration, the computing time is halved while the word vector quality is preserved, so the training and prediction efficiency of the server and terminal device is greatly improved.
FIG. 1 schematically shows the flow of a method for generating word feature vectors according to an embodiment of the present invention. The method is suitable for training word feature vectors until they reach a stable state, and its execution subject may be a processor in a server or a processor in a terminal device, where the terminal device may be, for example, a personal computer, a smart phone, a smart robot, and the like, which is not limited by the embodiments of the present invention. As shown in FIG. 1, the method of the embodiment of the present invention includes:
Step S11: acquire a minimum training unit and perform word segmentation processing on it to obtain a target word set.
Step S12: obtain the initial feature vector of each target word in the target word set.
Step S13: train the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the set and the number of target words in the set, and determine the word feature vector of each target word from the training result.
In step S11, acquiring the minimum training unit may be implemented by pre-configuring a content database storing the minimum training units to be trained and reading them sequentially, or by receiving training content entered by a user on a user interface. The acquired minimum training unit may be a sentence. Where the minimum training unit is a sentence, it may be obtained by segmentation preprocessing of the collected corpus with sentence delimiters as segmentation marks, by receiving user input, or by reading a pre-stored training sentence from the corpus. Corpus collection may, for example, capture corpus content from the internet, such as openly downloadable historical news data and Wikipedia data. Segmentation preprocessing refers to splitting sentences that express complete semantics out of the captured corpus content as far as possible. In the embodiment of the present invention, splitting on sentence delimiters is preferred: delimiters such as the period and the exclamation mark serve as segmentation marks, and the collected corpus content is segmented on them to obtain the minimum training units. Word segmentation may follow mature word segmentation techniques in the prior art; for example, a hidden Markov model, a conditional random field model, or a deep learning model may be selected to segment the acquired minimum training unit into a target word set. Illustratively, taking the acquired minimum training unit to be the sentence "the computer is a modern electronic computing machine for high-speed computation", word segmentation of the sentence yields a target word set containing eleven target words: "computer / is / a / for / high-speed / computing / of / modern / electronic / computing / machine".
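As an illustration of the preprocessing and segmentation just described, the following Python sketch splits collected corpus text on sentence delimiters and segments each resulting minimum training unit into target words. The patent does not name a segmentation tool; jieba stands in here for the hidden Markov, conditional random field, or deep learning segmenters mentioned above, and the sample corpus is a plausible Chinese rendering of the example sentences, so the exact token list produced is an assumption.

```python
import re

import jieba  # stand-in segmenter; the text only requires *some* word segmenter


def split_into_units(corpus: str) -> list:
    """Split collected corpus content into minimum training units (sentences),
    using sentence delimiters such as the period and exclamation mark."""
    parts = re.split(r"[。！？.!?]", corpus)
    return [p.strip() for p in parts if p.strip()]


def segment(sentence: str) -> list:
    """Segment one minimum training unit into its target word set."""
    return [w for w in jieba.cut(sentence) if w.strip()]


# Sample corpus: a plausible Chinese rendering of the two example sentences
# in the text (illustrative only, not quoted from the original).
corpus = "计算机是一种用于高速计算的现代化电子计算机器。计算器让计算更加方便。"
for unit in split_into_units(corpus):
    print(segment(unit))
# The first sentence should yield the eleven target words listed above
# (the exact token boundaries depend on the segmenter used).
```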
In the embodiment of the present invention, the initial feature vector in step S12 refers to the current feature vector of each target word in the minimum training unit at the time training is performed on that currently acquired minimum training unit. Different minimum training units may repeat the same target word: for example, the word "computing" may occur both in the sentence "the computer is a modern electronic computing machine for high-speed computing" and in the sentence "the calculator makes computing more convenient". When the two sentences are acquired as minimum training units in turn, the training of each sentence involves obtaining an initial feature vector for the target word "computing". Therefore, in the embodiment of the present invention, the obtained initial feature vectors of a target word are each specific to the current training of the current minimum training unit; that is, for the current minimum training unit, the current feature vector of the target word is its initial feature vector. It follows that, for different minimum training units that share the same target word, such as "computing" in the two examples above, the initial feature vectors obtained for that word may be the same or different, depending on how the initial feature vector is obtained. Because the word feature vector obtained by the final training should be a stable vector for a given target word, as a preferred implementation example, the embodiment of the present invention obtains the initial feature vector in the following manner:
First, before performing the above steps S11 to S13, corpus initialization processing is performed, which includes: acquiring word corpora, and generating a random N-bit code for each word corpus by random coding, stored as the initial feature vector of that word corpus, where N is a positive integer not less than 1.
Next, after the word feature vector of each target word is determined from the training result of step S13, corpus update processing is performed, which includes: updating, in the corpus, the initial feature vector of the word corpus matching each target word to the determined word feature vector of the corresponding target word.
In step S12, when the initial feature vector of each target word in the target word set is obtained, the most recently updated initial feature vector of the word corpus matching each target word is obtained from the corpus.
Thus, according to this method, when the initial feature vector of each target word is acquired in step S12, the result depends on whether the word has appeared before: on its first appearance in training, the random N-bit code initially pre-configured for the word is read from the corpus as its initial feature vector; on later appearances, the updated initial feature vector is read from the corpus, i.e. the initial feature vector acquired for the word is the result of the previous round of training. Taking the training process shown in FIG. 2 as an example, the target word "computer" may be contained in N different sentences. Because a random N-bit code is generated and stored in advance as the initial feature vector of the word corpus "computer" during initial configuration, when sentence_1 is trained and the target word "computer" is obtained through word segmentation for the first time, step S12 acquires that random N-bit code, matched from the corpus against the same word corpus "computer", as its initial feature vector for subsequent training. After sentence_1 is trained, the word feature vector of "computer" trained in sentence_1 is written back to the corpus as the latest initial feature vector of that word corpus. When a subsequent sentence such as sentence_2 is then trained and the target word "computer" is obtained through word segmentation again, matching the corpus for the same word corpus now returns the feature vector that has already been trained once on sentence_1; that is, when sentence_2 is trained, the initial feature vector acquired for "computer" is the word feature vector obtained after the training of sentence_1. Similarly, after sentence_2 is trained, the stored initial feature vector of the word corpus "computer" is updated again, so that the adjusted word feature vector is used when later sentences are trained. In this way, for each word corpus, repeated training in different minimum training units such as sentences iteratively updates the corresponding initial feature vector. After a large amount of training, a feature vector that tends to be stable across different contexts is obtained for the word; this stabilized vector is taken as the final word feature vector. Because it combines the semantics of the word in a large number of sentences, the quality of the word vector is effectively safeguarded.
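A minimal sketch of this initialize-train-update cycle follows. The dictionary-based corpus store and the function names are editorial assumptions; only the flow (random N-bit code on first appearance, reuse of the latest trained vector afterwards, write-back after each minimum training unit) comes from the text above.

```python
import random

N = 128  # number of coding bits per initial feature vector (preferred value in the text)

# Corpus store: word corpus -> latest feature vector (a dict is an editorial choice).
corpus_store = {}


def random_initial_vector() -> list:
    """Random N-bit code, each bit drawn from [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(N)]


def get_initial_feature_vector(word: str) -> list:
    """First appearance: the pre-configured random code; later appearances:
    the most recently updated vector, i.e. the previous training result."""
    if word not in corpus_store:
        corpus_store[word] = random_initial_vector()
    return corpus_store[word]


def update_corpus(word: str, trained_vector: list) -> None:
    """Write the trained word feature vector back, so the next minimum
    training unit containing this word starts from it."""
    corpus_store[word] = trained_vector
```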
In step S13, the target words in the target word set are trained according to the difference between the sum of the initial feature vectors of all target words in the set and the number of target words in the set; preferably, the training target is that the sum of the feature vectors of all target words in the set equals the number of target words in the set. By comparing that sum against the number of target words and driving the sum towards the word count, this implementation assumes that each word in the minimum training unit has the same influence on the semantic result of the sentence. When training aims at making the sum of the feature vectors of all target words equal to the number of target words, every word in the minimum training unit has an equal opportunity to be adjusted while each word still contributes to the semantic result as far as possible; the semantics and context of the words are thereby taken into account, and the validity of the trained word feature vectors can be guaranteed. Moreover, training based on the difference effectively improves computational efficiency, and hence training and prediction efficiency. In addition, under this training model, the more target words are obtained through word segmentation, the larger the target value towards which the feature vectors must converge, and the fewer the target words, the smaller that target value, which benefits the adjustment of the target words' feature vectors.
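Written out (with editorial symbols not used in the original: m for the number of target words, n for the number of coding bits, and $v_{j,k}$ for the k-th component of the j-th target word's feature vector), the training target and the error that the training drives to zero are:

$$\sum_{j=1}^{m}\sum_{k=1}^{n} v_{j,k} \;=\; m, \qquad e \;=\; \sum_{j=1}^{m}\sum_{k=1}^{n} v_{j,k} \;-\; m \;\to\; 0.$$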
FIG. 3 schematically shows the flow of an implementation of step S13 according to an embodiment of the present invention. As shown in FIG. 3, in this implementation example, step S13 of FIG. 1 may be implemented to include:
Step S131: form the training matrix of the minimum training unit from the obtained target word set and the initial feature vector of each target word in the set.
Step S132: iteratively adjust the matrix elements of the training matrix according to preset reference rules, and determine the word feature vector of each target word from the iterative adjustment result.
In step S131, the training matrix is preferably generated according to the training target of step S13. Taking as the target that the sum of the feature vectors of all target words in the target word set equals the number of target words in the set, the training matrix in step S131 is preferably generated from the number of target words in the set and the initial feature vector of each target word. FIG. 4 schematically shows one embodiment of forming the training matrix in step S131. As shown in FIG. 4, this process fits a training target in which the sum of the feature vectors of all target words in the target word set equals the number of target words in the set, and the training matrix of the embodiment of the present invention is generated as follows:
Step S401: take the number of target words in the target word set as the number of rows of the training matrix.
Step S402: take the number of coding bits of the initial feature vector of a target word as the number of columns of the training matrix.
Step S403: bind each target word in the target word set to a corresponding row of the training matrix, and set the matrix elements of that row to the initial feature vector of the bound target word.
Therefore, if the number of target words in the target word set of the minimum training unit is m and the number of coding bits of a target word's initial feature vector is n, a training matrix with m rows and n columns is generated. Continuing with "the computer is a modern electronic computing machine for high-speed computation" as the acquired minimum training unit, and assuming that in the initialization stage the initial feature vector pre-configured for each word corpus by random coding is a 128-bit random code, the word segmentation yields a target word set of eleven target words, so the method steps of FIG. 4 produce the 11 × 128 training matrix for the sentence shown in FIG. 5, where m in FIG. 5 is 11, n is 128, and each row corresponds to one target word in the target word set. If the target words obtained by word segmentation are ordered by their sequence in the sentence and the training matrix is formed in that order, the first row of the training matrix in FIG. 5 corresponds to the initial feature vector of the first target word "computer", the second row corresponds to the initial feature vector of the second target word "is", and so on. Each target word in the target word set is thus bound to its corresponding row of the training matrix, and each target word is trained and adjusted through the training matrix.
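A sketch of steps S401 to S403, under the assumption that vectors are kept in a NumPy array (an implementation convenience the patent does not prescribe):

```python
import numpy as np


def build_training_matrix(target_words, corpus_store):
    """Steps S401-S403: m rows, one per target word; n columns, one per
    coding bit; row j holds the initial feature vector of target_words[j],
    so each row stays bound to its target word."""
    return np.array([corpus_store[w] for w in target_words], dtype=float)


# For the eleven-word example sentence with 128-bit codes, the result
# has shape (11, 128), matching the matrix of FIG. 5.
```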
In step S132, the preset reference rules include a first reference rule for determining the adjustment object of each iteration from the training matrix, a second reference rule for controlling the adjustment amplitude of the adjustment object in each iteration, and a third reference rule for controlling the number of iterative adjustments. FIG. 6 schematically shows an implementation of step S132 according to an embodiment. As shown in FIG. 6, iteratively adjusting the matrix elements of the training matrix according to the preset reference rules and determining the word feature vector of each target word from the iterative adjustment result may be implemented as:
Step S601: determine, according to the third reference rule, whether iterative adjustment currently continues; perform step S602 when it is determined to continue, and perform step S603 when it is determined not to continue.
Step S602: determine the adjustment object that currently needs adjusting according to the first reference rule, and adjust the adjustment object by the corresponding adjustment amplitude according to the second reference rule.
Step S603: determine the matrix elements of each row of the current training matrix as the word feature vector of the target word bound to that row.
In step S601, the determination is preferably based on the difference between the sum of the feature vectors of all target words in the target word set and the number of target words; under this approach, the third reference rule may be set in advance to include a preset condition as the criterion. Specifically, calculating that difference may be implemented as: summing all matrix elements of the training matrix to obtain a first operation result, and subtracting the number of rows of the training matrix from the first operation result to obtain the difference between the feature vectors of all target words in the set and the number of target words. After the difference is obtained, whether to continue iterative adjustment may be decided against the preset condition of the third reference rule: the difference is compared with the preset condition; when the condition is met, iterative adjustment does not continue, and when it is not met, iterative adjustment continues. The preset condition may be that the difference equals zero, or that the absolute value of the difference is close to zero, such as 0.02. In other embodiments, the preset condition may instead be that iteration can no longer reduce the difference, for example that the three difference values obtained in three consecutive iterations are the same or nearly the same (to distinguish it from the difference between the feature vectors and the number of target words, the pairwise absolute difference among these three values may be called a second absolute difference, and "nearly the same" means this second absolute difference is close to zero, such as 0.001). Thus, when the difference meets the preset condition, the feature vector of each target word has reached a stable state through iterative adjustment in the training of this minimum training unit, and no further adjustment is needed; when the difference does not meet the preset condition, the feature vectors are still unstable and adjustment must continue.
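The third reference rule sketched in code, using the example thresholds from the text (0.02 for the difference, 0.001 for the second absolute difference); treating them as fixed constants is an assumption:

```python
import numpy as np


def difference(matrix: np.ndarray) -> float:
    """First operation result (sum of all matrix elements) minus the number
    of rows: the quantity the iterative adjustment drives towards zero."""
    return float(matrix.sum()) - matrix.shape[0]


def should_stop(matrix: np.ndarray, recent_diffs: list) -> bool:
    """Third reference rule with both example preset conditions from the
    text: |difference| close to zero (0.02), or three consecutive
    differences pairwise nearly equal (second absolute difference 0.001)."""
    d = difference(matrix)
    if abs(d) <= 0.02:
        return True
    if len(recent_diffs) >= 3:
        a, b, c = recent_diffs[-3:]
        if max(abs(a - b), abs(b - c), abs(a - c)) <= 0.001:
            return True
    return False
```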
As a preferred implementation, the embodiment of the present invention preferably performs the difference calculation by summing columns. That is, as shown in FIG. 8, summing all matrix elements of the training matrix to obtain the first operation result is implemented as follows: first, each of the n columns of the training matrix is summed to obtain a second operation result for that column; then, the second operation results of all columns are added to obtain the first operation result. Correspondingly, in step S602, the first reference rule is set such that, when the first operation result is greater than the number of rows of the training matrix, the matrix element with the largest value in the column with the largest sum of column elements is selected as the adjustment object, and when the first operation result is smaller than the number of rows of the training matrix, the matrix element with the smallest value in the column with the smallest sum of column elements is selected as the adjustment object. For example, as shown in FIG. 7, determining the adjustment object that currently needs adjusting according to the first reference rule may be implemented to include the following steps:
Step S701: judge the magnitude relation between the first operation result and the number of rows of the training matrix, and, when the first operation result is judged to be greater than the number of rows, select the matrix element with the largest value in the column with the largest sum of column elements in the training matrix as the adjustment object;
Step S702: when the first operation result is judged to be smaller than the number of rows of the training matrix, select the matrix element with the smallest value in the column with the smallest sum of column elements in the training matrix as the adjustment object.
In step S701, whether the difference obtained by subtracting the number of rows of the training matrix from the first operation result is greater than zero may be judged, so as to determine the magnitude relation between the first operation result and the number of rows. The column with the largest (or smallest) sum of column elements, i.e. the maximum (or minimum) among the second operation results, may be determined by comparing the second operation results; likewise, the largest or smallest matrix element within the selected column may be determined by comparing the matrix elements of that column.
Preferably, the adjustment object determined according to the first reference rule includes at least one matrix element that needs to be adjusted.
As a preferred implementation example, the second reference rule set in step S602 may include an adjustment direction and a preset amplitude, where the adjustment direction is downward when the first operation result is greater than the number of rows of the training matrix, and upward when the first operation result is smaller than the number of rows of the training matrix. The preset amplitude includes a first preset amplitude used for downward adjustment and a second preset amplitude used for upward adjustment. Thus, adjusting the adjustment object by the corresponding adjustment amplitude according to the second reference rule includes:
When a matrix element with the largest value in a column with the largest sum of column elements in the training matrix is selected as an adjustment object, adjusting the adjustment object downwards by a first preset amplitude;
and when the matrix element with the minimum value in the column with the minimum sum of the column elements in the training matrix is selected as an adjustment object, adjusting the adjustment object upwards by a second preset amplitude.
The first preset amplitude and the second preset amplitude can be set to be the same value or different values, and in specific practice, specific values can be set according to requirements.
Preferably, the random N-bit code set as the random initial feature vector of a word corpus has 128 bits, and the value of each coded bit lies in the range [-1, 1]. Illustratively, as shown in FIG. 9, the code set for the corpus of the word "computer" is a random code each bit of which takes a value in [-1, 1]. Accordingly, the first preset amplitude and the second preset amplitude are both set to

$$\frac{\lvert S_i - 1 \rvert}{m}$$

where $S_i$ is the sum of the column elements of the i-th column of the training matrix and m is the number of rows of the training matrix. Setting the value range of each word corpus to [-1, 1] lets the sum of each column reach the adjustment target more easily by converging towards 1, i.e. makes the sum over all columns exactly m. Accordingly, calculating the distance $\lvert S_i - 1 \rvert$ of the currently selected column sum from 1 and spreading that error evenly over each word, i.e. setting both the first and the second preset amplitude to the value above, ensures that each adjustment is as close as possible to the true adjustment value and avoids the excessive number of adjustments caused by an adjustment step that is too large or too small, so that the number of iterative calculations is minimized and computational efficiency is optimized.
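Combining the first reference rule (which element to adjust), the second reference rule (direction and amplitude $\lvert S_i - 1 \rvert / m$), and a simple form of the third reference rule, one possible rendering of the iterative adjustment is the sketch below; the NumPy bookkeeping and the max_iters guard are editorial choices:

```python
import numpy as np


def adjust_once(matrix: np.ndarray) -> None:
    """One iterative adjustment, combining the first reference rule (which
    element to adjust) and the second reference rule (direction and the
    amplitude |S_i - 1| / m)."""
    m = matrix.shape[0]
    col_sums = matrix.sum(axis=0)          # second operation results (FIG. 8)
    first_result = float(col_sums.sum())   # first operation result

    if first_result > m:
        i = int(col_sums.argmax())                    # largest column sum
        r = int(matrix[:, i].argmax())                # its largest element...
        matrix[r, i] -= abs(col_sums[i] - 1.0) / m    # ...adjusted downward
    elif first_result < m:
        i = int(col_sums.argmin())                    # smallest column sum
        r = int(matrix[:, i].argmin())                # its smallest element...
        matrix[r, i] += abs(col_sums[i] - 1.0) / m    # ...adjusted upward


def train_unit(matrix: np.ndarray, max_iters: int = 1_000_000) -> np.ndarray:
    """Iterate until |sum of all elements - m| meets the example preset
    condition (0.02 here); each row of the result is the word feature
    vector of the target word bound to that row."""
    for _ in range(max_iters):
        if abs(float(matrix.sum()) - matrix.shape[0]) <= 0.02:
            break
        adjust_once(matrix)
    return matrix
```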
According to the scheme provided by the embodiment of the present invention, on the one hand, word vectors are generated by random initialization in the initialization stage; on the other hand, the word vectors are iteratively adjusted by fast error calculation, so the generation efficiency of the word feature vectors is extremely high, and under the same server configuration the training efficiency of the word vector model of the embodiment improves on the prior art by at least 50%. In addition, because the word vector is adjusted with the sentence as the unit, the meaning of the word within the sentence is taken into account during adjustment, and the adjustment context is not limited to the adjacent words immediately before and after the word but covers the complete sentence, so the generated word vectors are closer to the semantics.
FIG. 10 schematically shows an apparatus for generating word feature vectors according to an embodiment of the present invention. As shown in FIG. 10, the apparatus includes at least one processor 10 and a memory 11 communicatively connected to the at least one processor 10, wherein the memory 11 stores instructions executable by the at least one processor 10, such that the processor 10 is capable of performing the following operations:
Acquiring a minimum training unit to perform word segmentation processing to obtain a target word set;
respectively obtaining initial feature vectors of all target words in the target word set;
training the target words in the target word set according to the difference between the sum of the initial characteristic vectors of all the target words in the target word set and the number of the target words in the target word set, and determining the word characteristic vector of each target word according to the training result.
In some embodiments, the processor is further capable of:
and performing corpus initialization processing, wherein the corpus initialization processing comprises the steps of obtaining word corpuses, and generating random N-bit codes for each word corpus in a random coding mode to serve as initial feature vector storage of the word corpuses, wherein N is a positive integer not less than 1.
In other embodiments, the processor is further capable of:
after determining the word feature vector of each target word according to the training result, performing corpus updating processing, wherein the process comprises updating the initial feature vector of the word corpus matched with each target word in the corpus into the determined word feature vector of the corresponding target word.
Preferably, the processor performs the operation of obtaining the initial feature vector of each target word in the target word set by obtaining a random N-bit encoded initial feature vector of the word corpus matching each target word from the corpus, or by obtaining a latest updated initial feature vector of the word corpus matching each target word from the corpus.
In a preferred implementation, the processor performs an operation of training the target words in the target word set according to a difference between a sum of initial feature vectors of all target words in the target word set and a number of target words in the target word set, with a goal that a sum of current feature vectors of all target words in the target word set is equal to the number of target words in the target word set.
In some embodiments, the processor performs training on the target words in the target word set according to the difference between the sum of the initial feature vectors of all the target words in the target word set and the number of the target words in the target word set, and the operation of determining the word feature vector of each target word according to the training result is implemented by:
Forming a training matrix of the minimum training unit according to the obtained target word set and the initial characteristic vector of each target word in the set;
and performing iterative adjustment on matrix elements of the training matrix according to a preset reference rule, and determining word characteristic vectors of each target word according to an iterative adjustment result.
Preferably, the preset reference rules include a first reference rule for determining the adjustment object in each iteration from the training matrix, a second reference rule for controlling the adjustment amplitude of the adjustment object in each iteration, and a third reference rule for controlling the number of times of iterative adjustment.
In some embodiments, the processor performs the operation of forming the training matrix of the minimum training unit according to the obtained target word set and the initial feature vectors of the target words in the set by:
taking the number of the target words in the target word set as the number of rows of the training matrix;
taking the encoding digit number of the initial feature vector of the target word as the column number of the training matrix;
and respectively and correspondingly binding each target word in the target word set to a corresponding row of the training matrix, and setting matrix elements of the corresponding row of the training matrix as initial characteristic vectors of the correspondingly bound target words.
In some embodiments, the processor performs iterative adjustment on matrix elements of the training matrix according to a preset reference rule, and the operation of determining the word feature vector of each target word according to an iterative adjustment result is implemented by:
determining, according to the third reference rule, whether iterative adjustment currently continues; when it is determined to continue, determining the adjustment object that currently needs adjusting according to the first reference rule, and adjusting the adjustment object by the corresponding adjustment amplitude according to the second reference rule;
when it is determined not to continue iterative adjustment, determining the matrix elements of each row of the current training matrix as the word feature vector of the target word bound to that row;
wherein the adjustment object comprises at least one matrix element which needs to be adjusted.
In some embodiments, the processor performing the operation of determining whether to continue iterative adjustment currently according to the third reference rule is performed by:
summing all matrix elements of the training matrix to obtain a first operation result;
and judging the difference between the first operation result and the number of rows of the training matrix, outputting a result of determining not to continue iterative adjustment when the difference is judged to meet the preset condition, and otherwise outputting a result of determining to continue iterative adjustment.
In some embodiments, the processor determines the adjustment object that currently requires adjustment according to the first reference rule by:
judging the magnitude relation between the first operation result and the number of rows of the training matrix, and selecting, when the first operation result is judged to be larger than the number of rows of the training matrix, the matrix element with the largest value in the column with the largest sum of column elements in the training matrix as the adjustment object;
and selecting, when the first operation result is judged to be smaller than the number of rows of the training matrix, the matrix element with the smallest value in the column with the smallest sum of column elements in the training matrix as the adjustment object.
In some embodiments, the processor performs the adjustment of the corresponding adjustment amplitude of the adjustment object according to the second reference rule by:
when a matrix element with the largest value in a column with the largest sum of column elements in the training matrix is selected as an adjustment object, adjusting the adjustment object downwards by a first preset amplitude;
and when the matrix element with the minimum value in the column with the minimum sum of the column elements in the training matrix is selected as an adjusting object, adjusting the adjusting object upwards by a second preset amplitude.
Wherein, as a preferred implementation example, the first preset amplitude and the second preset amplitude are both set to

$$\frac{\lvert S_i - 1 \rvert}{m}$$

where $S_i$ is the sum of the column elements of the i-th column of the training matrix and m is the number of rows of the training matrix.
It should be noted that, a specific implementation process and an implementation principle of the apparatus for generating a word feature vector according to the embodiment of the present invention are similar to an implementation process and an implementation principle of corresponding operation steps of the foregoing method embodiment, and reference may be specifically made to the description of the foregoing embodiment, which is not described herein again.
In some embodiments, embodiments of the present invention provide a server, which includes at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for generating word feature vectors of any of the above embodiments.
In some embodiments, the present invention provides a non-transitory computer-readable storage medium, in which one or more programs including executable instructions are stored, where the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform the method for generating the word feature vector according to any one of the above embodiments of the present invention.
In some embodiments, the present invention further provides a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions, which, when executed by a computer, cause the computer to perform the method for generating word feature vectors of any one of the above embodiments.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for generating word feature vectors of any of the above embodiments.
In some embodiments, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the program is executed by a processor to implement the method for generating a word feature vector of any one of the above embodiments.
FIG. 11 is a schematic diagram of the hardware structure of an electronic device for executing a method for generating word feature vectors according to another embodiment of the present application. As shown in FIG. 11, the device includes:
one or more processors 610 and a memory 620; one processor 610 is taken as an example in FIG. 11.
The apparatus performing the method for generating a word feature vector may further include: an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 11.
The memory 620, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for generating word feature vectors in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing, i.e., implements the method for generating the word feature vector of the above-described method embodiments, by executing the nonvolatile software program, instructions, and modules stored in the memory 620.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a method for generating a word feature vector, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 620 optionally includes memory located remotely from processor 610, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive input numeric or character information and generate signals related to user settings and function control of a method for generating a word feature vector. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform a method for generating word feature vectors in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capability and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such equipment belongs to the category of personal computers, has computing and processing functions, and generally also has mobile internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPads.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) A server: the device for providing the computing service comprises a processor, a hard disk, a memory, a system bus and the like, and the server is similar to a general computer architecture, but has higher requirements on processing capacity, stability, reliability, safety, expandability, manageability and the like because of the need of providing high-reliability service.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a general hardware platform, or by hardware. Based on this understanding, the technical solutions above, in essence or in the part contributing to the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method of the embodiments or parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for generating a word feature vector, the method comprising:
acquiring a minimum training unit and performing word segmentation processing on it to obtain a target word set;
respectively acquiring the initial feature vector of each target word in the target word set; and
training the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the target word set and the number of target words in the target word set, and determining the word feature vector of each target word according to the training result.
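To make the claimed pipeline concrete, the following is a minimal sketch, assuming the vectors are real-valued NumPy arrays and that training simply spreads the difference between the element sum and the word count across all elements; the function and variable names are illustrative, not taken from the patent.

```python
import numpy as np

def train_word_vectors(words, init_vectors, lr=0.01, max_iter=10_000, tol=1e-6):
    """Adjust the vectors of one minimum training unit until the sum of all
    vector elements approaches the number of target words."""
    m = len(words)                                    # number of target words
    matrix = np.array([init_vectors[w] for w in words], dtype=float)
    for _ in range(max_iter):
        diff = matrix.sum() - m                       # training signal: element sum vs. word count
        if abs(diff) < tol:
            break
        matrix -= lr * diff / matrix.size             # spread a small correction over all elements
    return {w: matrix[i] for i, w in enumerate(words)}

# Illustrative usage with random 8-bit initial feature vectors
rng = np.random.default_rng(0)
words = ["machine", "learning", "method"]
init = {w: rng.random(8) for w in words}
vectors = train_word_vectors(words, init)
```

Because the total change per iteration is proportional to the remaining gap, the gap shrinks geometrically, which is one way to obtain the fast convergence the abstract claims.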
2. The method of claim 1, further comprising:
performing corpus initialization processing, wherein the corpus initialization processing comprises acquiring word corpora and generating, by random coding, a random N-bit code for each word corpus to be stored as the initial feature vector of that word corpus, where N is a positive integer not less than 1.
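As a sketch of the initialization step: the claim fixes only that each word receives a random N-bit code, not the value range of each bit, so random reals in [0, 1) are assumed here for illustration, and the in-memory dict standing in for the corpus is likewise an assumption.

```python
import numpy as np

def init_corpus(word_corpora, n_bits=8, seed=0):
    """Assign each word corpus a random N-bit code as its initial
    feature vector, stored in a dict keyed by the word."""
    rng = np.random.default_rng(seed)
    return {word: rng.random(n_bits) for word in word_corpora}

corpus = init_corpus(["apple", "banana", "cherry"], n_bits=8)
```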
3. The method of claim 2, further comprising:
performing corpus updating processing after determining the word feature vector of each target word according to the training result, wherein the corpus updating processing comprises updating, in the corpus, the initial feature vector of the word corpus matching each target word to the determined word feature vector of the corresponding target word.
4. The method according to claim 3, wherein acquiring the initial feature vector of each target word in the target word set comprises acquiring, from the corpus, the random N-bit-coded initial feature vector of the word corpus matching each target word, or acquiring, from the corpus, the most recently updated initial feature vector of the word corpus matching each target word.
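Continuing with the dict-backed corpus assumed above, the update and lookup of claims 3 and 4 reduce to a write-back followed by an ordinary read; the helper names are again illustrative.

```python
def update_corpus(corpus, trained_vectors):
    """Overwrite the stored vector of each matching word corpus with the
    newly determined word feature vector (the claim-3 update step)."""
    for word, vector in trained_vectors.items():
        if word in corpus:
            corpus[word] = vector
    return corpus

def get_initial_vector(corpus, word):
    """Return whichever vector is currently stored: the original random
    code or the most recently updated one (the claim-4 lookup)."""
    return corpus[word]
```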
5. The method according to any one of claims 1 to 4, wherein the training of the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the target word set and the number of target words in the target word set takes as its training goal that the sum of the current feature vectors of all target words in the target word set equals the number of target words in the target word set.
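Read together with claim 10, which sums all matrix elements and compares the result with the row count, this goal can be written as the following objective over the training matrix; the symbols m, N, and M are notation introduced here, not the patent's.

```latex
% m target words, N coding bits; M is the m-by-N training matrix
\min_{M \in \mathbb{R}^{m \times N}} \left| \sum_{i=1}^{m} \sum_{j=1}^{N} M_{ij} - m \right|
```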
6. The method of claim 5, wherein training the target words in the target word set according to the difference between the sum of the initial feature vectors of all target words in the target word set and the number of target words in the target word set, and determining the word feature vector of each target word according to the training result, comprises:
forming a training matrix of the minimum training unit according to the acquired target word set and the initial feature vector of each target word in the set; and
iteratively adjusting matrix elements of the training matrix according to preset reference rules, and determining the word feature vector of each target word according to the iterative adjustment result.
7. The method according to claim 6, wherein the preset reference rules comprise a first reference rule for determining, from the training matrix, the adjustment object of each iteration; a second reference rule for controlling the adjustment amplitude of the adjustment object in each iteration; and a third reference rule for controlling the number of iterations.
8. The method of claim 7, wherein forming the training matrix of the minimum training unit according to the acquired target word set and the initial feature vector of each target word in the set comprises:
taking the number of target words in the target word set as the number of rows of the training matrix;
taking the number of coding bits of the initial feature vector of a target word as the number of columns of the training matrix; and
binding each target word in the target word set to a corresponding row of the training matrix, and setting the matrix elements of that row to the initial feature vector of the bound target word.
9. The method of claim 8, wherein iteratively adjusting matrix elements of the training matrix according to the preset reference rules, and determining the word feature vector of each target word according to the iterative adjustment result, comprises:
determining, according to the third reference rule, whether to continue the iterative adjustment; when it is determined to continue, determining the adjustment object currently to be adjusted according to the first reference rule, and adjusting the adjustment object by the corresponding amplitude according to the second reference rule; and
when it is determined not to continue the iterative adjustment, determining the matrix elements of each row of the current training matrix as the word feature vector of the correspondingly bound target word;
wherein the adjustment object comprises at least one matrix element to be adjusted.
10. The method of claim 9, wherein determining, according to the third reference rule, whether to continue the iterative adjustment comprises:
summing all matrix elements of the training matrix to obtain a first operation result; and
judging the difference between the first operation result and the number of rows of the training matrix; when the difference is judged to satisfy a preset condition, outputting a result of not continuing the iterative adjustment, and otherwise outputting a result of continuing the iterative adjustment.
CN202011557976.0A 2020-12-24 2020-12-24 Method, device, server, electronic equipment and storage medium for generating word feature vector Pending CN114676696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011557976.0A CN114676696A (en) 2020-12-24 2020-12-24 Method, device, server, electronic equipment and storage medium for generating word feature vector

Publications (1)

Publication Number Publication Date
CN114676696A 2022-06-28

Family

ID=82069627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011557976.0A Pending CN114676696A (en) 2020-12-24 2020-12-24 Method, device, server, electronic equipment and storage medium for generating word feature vector

Country Status (1)

Country Link
CN (1) CN114676696A (en)

Similar Documents

Publication Publication Date Title
CN110349572B (en) Voice keyword recognition method and device, terminal and server
KR102170199B1 (en) Classify input examples using comparison sets
CN110516253B (en) Chinese spoken language semantic understanding method and system
CN109977207A (en) Talk with generation method, dialogue generating means, electronic equipment and storage medium
CN110286778B (en) Chinese deep learning input method, device and electronic equipment
CN110502976B (en) Training method of text recognition model and related product
JP6677419B2 (en) Voice interaction method and apparatus
CN109271493A (en) A kind of language text processing method, device and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111428520A (en) Text translation method and device
CN111428010A (en) Man-machine intelligent question and answer method and device
US11947920B2 (en) Man-machine dialogue method and system, computer device and medium
CN111382231B (en) Intention recognition system and method
CN111274412A (en) Information extraction method, information extraction model training device and storage medium
CN110942774A (en) Man-machine interaction system, and dialogue method, medium and equipment thereof
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN109829040B (en) Intelligent conversation method and device
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN113468857A (en) Method and device for training style conversion model, electronic equipment and storage medium
CN109299231B (en) Dialog state tracking method, system, electronic device and storage medium
WO2023040545A1 (en) Data processing method and apparatus, device, storage medium, and program product
CN113535930B (en) Model training method, device and storage medium
CN109902273A (en) The modeling method and device of keyword generation model
CN114676696A (en) Method, device, server, electronic equipment and storage medium for generating word feature vector
CN113392640B (en) Title determination method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination