CN115098647B - Feature vector generation method and device for text representation and electronic equipment


Info

Publication number
CN115098647B
Authority
CN
China
Prior art keywords
vector
feature vector
feature
target
generate
Prior art date
Legal status
Active
Application number
CN202211015737.1A
Other languages
Chinese (zh)
Other versions
CN115098647A (en)
Inventor
赵祥
葛标
张聪聪
柳进军
王辉
郭宝松
Current Assignee
Zhongguancun Smart City Co Ltd
Original Assignee
Zhongguancun Smart City Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhongguancun Smart City Co Ltd
Priority to CN202211015737.1A
Publication of CN115098647A
Application granted
Publication of CN115098647B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the disclosure disclose a feature vector generation method and device for text representation, and electronic equipment. One embodiment of the method comprises: splicing the knowledge features in a knowledge feature sequence with the text to be represented; inputting the resulting splicing feature into a pre-training model; multiplying each candidate feature vector in a candidate feature vector subsequence by an embedding matrix in the pre-training model; performing vector fusion on each target feature vector and the word vector corresponding to that target feature vector in a word list; inputting the fusion vectors in the obtained fusion vector sequence into a first feature dimension reduction network; inputting the target candidate feature vector into a second feature dimension reduction network; and performing vector splicing on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented. This embodiment improves the accuracy of the generated feature vectors.

Description

Feature vector generation method and device for text representation and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for generating a feature vector for text representation and electronic equipment.
Background
In many application scenarios in the field of natural language processing, text characterization, that is, the conversion of a text into a corresponding feature vector, is often required. At present, when feature vectors are generated, the method generally adopted is as follows: feature extraction is performed through a plurality of feature extraction layers to generate the feature vectors.
However, when the above-described manner is adopted, there are often technical problems as follows:
firstly, when the text contains many features, the vector correlation among the plurality of feature vectors obtained through multiple rounds of feature extraction is low, so that the generated feature vectors are not accurate enough;
secondly, different features have different degrees of importance, and adaptive feature extraction is often not performed for different features, so that the generated feature vector is not accurate enough.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a feature vector generation method, apparatus and electronic device for text characterization to solve one or more of the technical problems mentioned in the above background section.
In a first aspect, some embodiments of the present disclosure provide a feature vector generation method for text characterization, the method including: acquiring a text to be represented; splicing the knowledge features in a knowledge feature sequence with the text to be characterized to generate a splicing feature; inputting the splicing feature into a pre-training model to generate a candidate feature vector sequence; multiplying each candidate feature vector in a candidate feature vector subsequence included in the candidate feature vector sequence by an embedding matrix in the pre-training model to generate a target feature vector, so as to obtain a target feature vector sequence; for each target feature vector in the target feature vector sequence, performing vector fusion on the target feature vector and the word vector corresponding to the target feature vector in a word list to generate a fusion vector; inputting the fusion vectors in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension-reduction feature vector sequence; inputting a target candidate feature vector into a second feature dimension reduction network to generate a second dimension-reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence; and performing vector splicing on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented.
In a second aspect, some embodiments of the present disclosure provide a feature vector generation apparatus for text characterization, the apparatus comprising: an acquisition unit configured to acquire a text to be represented; a splicing unit configured to splice the knowledge features in a knowledge feature sequence with the text to be represented to generate a splicing feature; a first input unit configured to input the splicing feature into a pre-training model to generate a candidate feature vector sequence; a multiplication processing unit configured to multiply each candidate feature vector in a candidate feature vector subsequence included in the candidate feature vector sequence by an embedding matrix in the pre-training model to generate a target feature vector, so as to obtain a target feature vector sequence; a vector fusion unit configured to, for each target feature vector in the target feature vector sequence, perform vector fusion on the target feature vector and the word vector corresponding to the target feature vector in a word list to generate a fusion vector; a second input unit configured to input the fusion vectors in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension-reduction feature vector sequence; a third input unit configured to input a target candidate feature vector into a second feature dimension reduction network to generate a second dimension-reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence; and a vector splicing unit configured to perform vector splicing on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium on which a computer program is stored, wherein the program when executed by a processor implements the method described in any implementation of the first aspect.
The above embodiments of the present disclosure have the following advantages: with the feature vector generation method for text representation of some embodiments of the present disclosure, the accuracy of the generated feature vector is improved. Specifically, the reason why the generated feature vector is not accurate enough is that, when the text contains many features, the vector correlation among the plurality of feature vectors obtained through multiple rounds of feature extraction is low. Based on this, in the feature vector generation method for text representation of some embodiments of the present disclosure, first, a text to be represented is obtained. Then, the knowledge features in the knowledge feature sequence are spliced with the text to be characterized to generate a splicing feature. Knowledge features in the present disclosure are features that are associated with the text to be characterized; adding them enhances and/or supplements the text content of the text to be characterized. Further, the splicing feature is input into a pre-training model to generate a candidate feature vector sequence, so that the feature vectors are preliminarily determined by the pre-training model. In addition, each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence is multiplied by the embedding matrix in the pre-training model to generate a target feature vector, so that a target feature vector sequence is obtained. Next, for each target feature vector in the target feature vector sequence, vector fusion is performed between the target feature vector and the word vector corresponding to the target feature vector in a word list to generate a fusion vector; adding the word vector corresponding to the feature vector enriches the feature expression capability of the vector. In addition, the fusion vectors in the obtained fusion vector sequence are input into a first feature dimension reduction network to generate a first dimension-reduction feature vector sequence. Then, a target candidate feature vector is input into a second feature dimension reduction network to generate a second dimension-reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence. The feature dimension is reduced through feature dimension reduction, so that the complexity of subsequent feature calculation is reduced. Finally, vector splicing is performed on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented. The feature vector generated in this way can well highlight and express the features corresponding to the text, thereby achieving the purpose of improving the accuracy of the generated feature vector.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
FIG. 1 is a flow diagram of some embodiments of a feature vector generation method for text characterization according to the present disclosure;
FIG. 2 is a schematic structural diagram of some embodiments of a feature vector generation apparatus for text characterization according to the present disclosure;
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the references to "a" or "an" in this disclosure are illustrative rather than limiting, and those skilled in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a feature vector generation method for text characterization according to the present disclosure is shown. The feature vector generation method for text representation comprises the following steps:
step 101, a text to be characterized is obtained.
In some embodiments, an executing subject (e.g., a computing device) of the feature vector generation method for text characterization may obtain the text to be characterized through a wired connection or a wireless connection. The text to be characterized is the text for which a corresponding text representation vector is to be generated. For example, the text to be characterized may be news text. For another example, the text to be characterized may also be a work order text.
The computing device may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple pieces of software or software modules to provide distributed services, or as a single piece of software or software module, which is not specifically limited herein. It should be understood that there may be any number of computing devices, as desired for an implementation.
And 102, splicing the knowledge characteristics in the knowledge characteristic sequence with the text to be represented to generate spliced characteristics.
In some embodiments, the execution body may concatenate the knowledge features in the knowledge feature sequence with the text to be characterized to generate a concatenated feature. And the knowledge features in the knowledge feature sequence are features associated with the existing content of the text to be characterized. For example, if the text to be characterized is a work order text, the knowledge feature sequence may be [2021-07-10, haisha XX street ].
As an example, the executing body may concatenate the knowledge features in the knowledge feature sequence and the text to be characterized one after another to generate the splicing feature.
In some optional implementations of some embodiments, the executing body splicing the knowledge features in the knowledge feature sequence with the text to be characterized to generate the splicing feature may include the following steps:
Firstly, performing feature splicing on each knowledge feature in the knowledge feature sequence to generate a spliced knowledge feature.
As an example, the knowledge feature sequence may be [ knowledge feature A, knowledge feature B ]. The resulting spliced knowledge feature may be "knowledge feature A knowledge feature B".
And secondly, splicing the spliced knowledge feature with the text to be characterized to generate a candidate splicing feature.
As an example, the candidate splicing feature may be "knowledge feature A knowledge feature B text to be characterized".
And thirdly, inserting a covering character at the tail of each knowledge feature included in the candidate splicing feature, and inserting a covering character at the tail of the text to be characterized included in the candidate splicing feature, so as to generate the splicing feature.
As an example, the covering character may be [MASK]. The splicing feature may then be "knowledge feature A [MASK] knowledge feature B [MASK] text to be characterized [MASK]".
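As an illustration of the splicing described in this step, the following Python sketch builds a splicing feature from a knowledge feature sequence and a text to be characterized. It is a minimal sketch only; the function name build_splicing_feature and the use of the literal token [MASK] as the covering character are assumptions for the example and are not mandated by the embodiment.

    def build_splicing_feature(knowledge_features, text_to_characterize, mask_token="[MASK]"):
        """Concatenate knowledge features and the text, inserting a covering
        character at the tail of each knowledge feature and of the text."""
        parts = [feature + mask_token for feature in knowledge_features]
        parts.append(text_to_characterize + mask_token)
        return "".join(parts)

    # Example with two knowledge features:
    splicing_feature = build_splicing_feature(
        ["knowledge feature A", "knowledge feature B"], "text to be characterized"
    )
    # -> "knowledge feature A[MASK]knowledge feature B[MASK]text to be characterized[MASK]"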
And 103, inputting the splicing features into a pre-training model to generate a candidate feature vector sequence.
In some embodiments, the execution agent may input the splicing feature into a pre-training model to generate a candidate feature vector sequence. A candidate feature vector in the candidate feature vector sequence is the feature vector corresponding to the text at the position of the corresponding covering character. The pre-training model is a model that has been trained in advance; for example, the pre-training model may be a BERT model. The vector dimension of each candidate feature vector in the candidate feature vector sequence is 768 dimensions.
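A minimal sketch of this step is given below, assuming a BERT-style pre-training model loaded through the Hugging Face transformers library; the checkpoint name bert-base-chinese is an assumption (its 768-dimensional hidden states and 21128-token vocabulary happen to match the dimensions quoted in this disclosure), not a requirement of the embodiment. The candidate feature vectors are taken as the hidden states at the covering-character positions.

    import torch
    from transformers import BertModel, BertTokenizer  # assumed tooling

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
    model = BertModel.from_pretrained("bert-base-chinese")

    splicing_feature = "knowledge feature A[MASK]knowledge feature B[MASK]text to be characterized[MASK]"
    inputs = tokenizer(splicing_feature, return_tensors="pt")

    with torch.no_grad():
        hidden_states = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)

    # Candidate feature vector sequence: one 768-dimensional vector per covering character.
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    candidate_feature_vectors = hidden_states[0, mask_positions]  # (num_masks, 768)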
And 104, multiplying each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence by an embedded matrix in the pre-training model to generate a target feature vector to obtain a target feature vector sequence.
In some embodiments, the executing entity may multiply each candidate feature vector in a candidate feature vector subsequence included in the candidate feature vector sequence by an embedding matrix in the pre-training model to generate a target feature vector, so as to obtain a target feature vector sequence. The embedding matrix is the embedding matrix corresponding to an embedding layer included in the pre-training model. The vector dimension of each target feature vector in the target feature vector sequence is 21128 dimensions.
In some optional implementations of some embodiments, the performing a multiplication process on each candidate feature vector in a candidate feature vector subsequence included in the candidate feature vector sequence by an embedding matrix in the pre-training model to generate a target feature vector includes:
and multiplying the candidate feature vector by an embedding matrix corresponding to an input layer included in the pre-training model to generate the target feature vector. Wherein, the input layer may be the first embedded layer included in the pre-training model.
And 105, for each target feature vector in the target feature vector sequence, performing vector fusion on the target feature vector and a word vector corresponding to the target feature vector in the word list to generate a fusion vector.
In some embodiments, for each target feature vector in the sequence of target feature vectors, the execution entity may vector-fuse the target feature vector with a corresponding word vector of the target feature vector in the vocabulary to generate a fused vector. The word list is a preset data list in which words and corresponding word vectors are stored.
As an example, the execution agent may vector-splice the target feature vector and the corresponding word vector to generate a fused vector.
By fusing the target feature vector and its corresponding word vector, different vector representations of the same word are enriched, and the representation capability of the obtained fusion vector is improved.
Optionally, the vector length of a target feature vector in the target feature vector sequence is the same as the vector length of the word vector corresponding to that target feature vector in the word list, where the word vectors in the word list are one-hot encoded word vectors.
As an example, the length of the word vector in the above vocabulary may be 21128 dimensions.
In some optional implementations of some embodiments, the vector fusing, by the execution main body, the target feature vector and a corresponding word vector of the target feature vector in a word list to generate a fused vector may include:
and performing vector addition fusion on the target feature vector and the word vector corresponding to the target feature vector in the word list to generate a fusion vector corresponding to the target feature vector.
As an example, the target feature vector may be "00000…010". The word vector corresponding to the target feature vector may be "00000…001". The resulting fusion vector may be "00000…011".
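The vector addition fusion can be sketched as follows, assuming the corresponding word vector is the one-hot vector of the word's index in the word list; how the corresponding word is looked up in the word list is not specified here, so word_id is an assumed lookup result.

    import torch

    VOCAB_SIZE = 21128  # matches the word-vector length stated above

    def fuse_with_word_vector(target_feature_vector: torch.Tensor, word_id: int) -> torch.Tensor:
        """Add the one-hot word vector of the corresponding word to the
        21128-dimensional target feature vector (vector addition fusion)."""
        word_vector = torch.zeros(VOCAB_SIZE)
        word_vector[word_id] = 1.0
        return target_feature_vector + word_vector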
And 106, inputting the fusion vector in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence.
In some embodiments, the executing entity may input the fusion vectors in the obtained fusion vector sequence into the first feature dimension reduction network to generate a first dimension-reduction feature vector sequence. The first feature dimension reduction network may be a model for reducing the feature dimension of features. The vector dimension of each first dimension-reduction feature vector output by the first feature dimension reduction network is 128 dimensions.
As an example, the first dimension-reduction feature network may be a linear convolutional network including a plurality of convolutional layers.
Optionally, the first feature dimension reduction network includes: a first linear layer and a second linear layer. Wherein the vector dimension of the vector output by the second linear layer is smaller than the vector dimension of the vector output by the first linear layer.
In an optional implementation manner of some embodiments, the inputting, by the executing entity, a fused vector in the obtained fused vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence may include the following steps:
in a first step, a first target-embedded matrix is initialized.
The first target embedded matrix is an embedded matrix corresponding to the first characteristic dimension reduction network. The matrix dimension of the first target embedding matrix is as follows: 21128 × 768 dimensions.
And secondly, multiplying the fusion vector in the fusion vector sequence by the first target embedding matrix to generate a first multiplied feature vector sequence.
And the vector dimensions of the first multiplied feature vectors in the first multiplied feature vector sequence are the same. The vector dimension of each first multiplied feature vector is 768 dimensions.
And thirdly, inputting the first multiplied feature vectors in the first multiplied feature vector sequence into the first linear layer to obtain first linearly processed feature vectors, so as to obtain a first linearly processed feature vector sequence.
The vector length of a first linearly processed feature vector is the same as the vector length of the corresponding first multiplied feature vector. For example, the vector dimension of the vector input into the first linear layer is 768 dimensions, and the vector dimension of the vector output by the first linear layer is 768 dimensions.
And fourthly, inputting the first linearly processed feature vectors in the first linearly processed feature vector sequence into the second linear layer to generate first dimension-reduction feature vectors, so as to obtain the first dimension-reduction feature vector sequence.
And the vector dimensions of the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence are the same. The vector length of a first dimension-reduction feature vector is smaller than the vector length of the corresponding first linearly processed feature vector. The vector dimension of the first dimension-reduction feature vector is 128 dimensions. The vector dimension of the vector input into the second linear layer is 768 dimensions, and the vector dimension of the vector output by the second linear layer is 128 dimensions.
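A sketch of the first feature dimension reduction network is given below. The layer sizes follow the dimensions stated in this step (a 21128 × 768 first target embedding matrix, a 768 to 768 first linear layer and a 768 to 128 second linear layer); the initialization scheme and the absence of activation functions are assumptions.

    import torch
    from torch import nn

    class FirstFeatureDimReductionNetwork(nn.Module):
        """Fusion vector (21128) -> first multiplied vector (768) -> 768 -> 128."""

        def __init__(self, vocab_size: int = 21128, hidden: int = 768, out_dim: int = 128):
            super().__init__()
            # First target embedding matrix, initialized when the network is built.
            self.first_target_embedding = nn.Parameter(torch.empty(vocab_size, hidden))
            nn.init.xavier_uniform_(self.first_target_embedding)
            self.first_linear = nn.Linear(hidden, hidden)
            self.second_linear = nn.Linear(hidden, out_dim)

        def forward(self, fusion_vectors: torch.Tensor) -> torch.Tensor:
            first_multiplied = fusion_vectors @ self.first_target_embedding
            first_linearly_processed = self.first_linear(first_multiplied)
            return self.second_linear(first_linearly_processed)  # first dimension-reduction feature vectors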
And 107, inputting the target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector.
In some embodiments, the executing agent may input the target candidate feature vector into a second feature dimension reduction network to generate a second dimension-reduction feature vector. The second feature dimension reduction network may be a model for reducing the feature dimension of features. The vector dimension of the second dimension-reduction feature vector output by the second feature dimension reduction network is 256 dimensions. The target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence.
As an example, the second dimension-reduced feature network may be a linear convolutional network including a plurality of convolutional layers.
As an example, the candidate feature vector sequence may be [ candidate feature vector A, candidate feature vector B, candidate feature vector C, candidate feature vector D ]. The candidate feature vector subsequence may then be [ candidate feature vector A, candidate feature vector B, candidate feature vector C ], from which the target feature vector sequence is generated, and the target candidate feature vector may be candidate feature vector D.
Optionally, the second feature dimension reduction network comprises: a third linear layer and a fourth linear layer. The vector dimension of the vector output by the third linear layer is larger than the vector dimension of the vector output by the fourth linear layer.
In some optional implementations of some embodiments, the executing subject inputting the target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector may include the following steps:
in a first step, a second target-embedded matrix is initialized.
And the second target embedded matrix is an embedded matrix corresponding to the second characteristic dimension reduction network. The matrix dimension of the second target embedding matrix is as follows: 768X 768 dimensions.
And secondly, multiplying the target candidate feature vector by the second target embedding matrix to generate a second multiplied feature vector.
The vector dimension of the second multiplied feature vector is 768 dimensions.
And inputting the second multiplied feature vector into the third linear layer to generate a second linearly processed feature vector.
The vector dimension of the vector output by the third linear layer is as follows: 768 dimensions.
And a fourth step of inputting the second linearly processed feature vector to the fourth linear layer to generate the second reduced-dimension feature vector.
Wherein the vector dimension of the second dimension-reduction feature vector is 256 dimensions.
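Analogously, the second feature dimension reduction network can be sketched as below, with a 768 × 768 second target embedding matrix, a 768 to 768 third linear layer and a fourth linear layer whose 256-dimensional output follows the dimension stated for the second dimension-reduction feature vector; as before, the initialization and the lack of activations are assumptions.

    import torch
    from torch import nn

    class SecondFeatureDimReductionNetwork(nn.Module):
        """Target candidate feature vector (768) -> 768 -> 768 -> 256."""

        def __init__(self, hidden: int = 768, out_dim: int = 256):
            super().__init__()
            self.second_target_embedding = nn.Parameter(torch.empty(hidden, hidden))
            nn.init.xavier_uniform_(self.second_target_embedding)
            self.third_linear = nn.Linear(hidden, hidden)
            self.fourth_linear = nn.Linear(hidden, out_dim)

        def forward(self, target_candidate_vector: torch.Tensor) -> torch.Tensor:
            second_multiplied = target_candidate_vector @ self.second_target_embedding
            second_linearly_processed = self.third_linear(second_multiplied)
            return self.fourth_linear(second_linearly_processed)  # second dimension-reduction feature vector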
The content of the above step 106 to step 107 serves as an invention point of the present disclosure, and solves the second technical problem mentioned in the background art, namely that "different features have different degrees of importance, and adaptive feature extraction is often not performed for different features, so that the generated feature vector is not accurate enough". In practice, the degrees of importance of different features are often different, and adopting the same feature extraction manner for all of them results in sub-feature vectors of the same vector length, that is, the proportion of the sub-feature vectors corresponding to the more important features in the overall feature vector is reduced. Meanwhile, when the number of features is large, the length of the feature vector increases, which affects subsequent calculation and use. Based on this, the degree of importance of the target candidate feature vector in the present disclosure is considered to be higher than that of the target feature vectors in the target feature vector sequence. Therefore, the present disclosure performs feature dimension reduction on the target feature vectors through the first feature dimension reduction network, and performs dimension reduction on the target candidate feature vector through the second feature dimension reduction network, so as to obtain the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector, where the vector dimension of the second dimension-reduction feature vector is larger than that of the first dimension-reduction feature vectors. In addition, the vector dimension of the obtained text representation vector is shortened through the first feature dimension reduction network and the second feature dimension reduction network, which reduces the computational complexity in the subsequent use of the text representation vector.
And 108, carrying out vector splicing on the first dimension-reducing feature vector and the second dimension-reducing feature vector in the first dimension-reducing feature vector sequence to generate a text characterization vector corresponding to the text to be characterized.
In some embodiments, the executing entity may perform vector splicing on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented. For example, the execution body may concatenate the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence, one after another, followed by the second dimension-reduction feature vector, to generate the text representation vector.
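The final splicing then amounts to concatenating the first dimension-reduction feature vectors, in sequence order, with the second dimension-reduction feature vector; the sketch below uses placeholder tensors and assumes the first-then-second ordering described above.

    import torch

    # Placeholders: a sequence of three 128-dimensional first dimension-reduction
    # feature vectors and one 256-dimensional second dimension-reduction feature vector.
    first_reduced_sequence = torch.randn(3, 128)
    second_reduced_vector = torch.randn(256)

    text_representation_vector = torch.cat(
        [first_reduced_sequence.reshape(-1), second_reduced_vector]
    )
    # Length for this placeholder example: 3 * 128 + 256 = 640.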
Optionally, the executing body may further execute the following processing steps:
and in response to the fact that the characteristic application scene corresponding to the text to be characterized is determined to be a text classification scene, inputting the text characterization vector to a pre-trained text classification model to determine the text category corresponding to the text to be characterized. Wherein the feature application scenario characterizes an application scenario of the feature. For example, the feature application scenario may be a text annotation scenario. As another example, the feature application scenario may also be a text classification scenario. The text classification scene is an application scene for classifying the texts to be represented. Such as classifying the emotion expressed by the text to be characterized.
As an example, in response to determining that the feature application scenario corresponding to the text to be characterized is a text classification scenario, inputting the text characterization vector to a pre-trained text classification model to determine the text category corresponding to the text to be characterized may include the following steps:
firstly, determining a classification task corresponding to the text classification scene.
For example, the classification task may be, but is not limited to, any of the following: the method comprises a text emotion classification task, a text labeling task and a text category classification task. For example, when the text to be characterized is a work order text, the corresponding classification task may be a text category classification task, and in practice, the text category classification task may be a classification task for classifying the work order category. Further, the text category classification task may also be a classification task for rating a work order.
And secondly, inputting the text representation vector into a text classification model corresponding to the classification task to generate the text category.
As an example, when the classification task is a text emotion classification task, the text classification model may be an LSTM (Long short-term memory) model.
As yet another example, when the classification task is a text labeling task, the text classification model may be a BiLSTM (Bi-directional Long Short-Term Memory) model + CNN (Convolutional Neural Networks) model + CRF (Conditional Random Field) model.
As yet another example, when the classification task is a text category classification task, the above text classification model may be a CNN model with a classification layer.
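For the text classification scenario, the text representation vector is fed into the task-specific model. The sketch below is a deliberately simplified stand-in (a single linear classification layer) rather than the LSTM, BiLSTM + CNN + CRF, or CNN models named in the examples; it only illustrates how the generated vector would be consumed, and all dimensions are placeholders.

    import torch
    from torch import nn

    class SimpleTextClassifierHead(nn.Module):
        """Simplified classification head consuming the text representation vector."""

        def __init__(self, vector_dim: int, num_classes: int):
            super().__init__()
            self.classifier = nn.Linear(vector_dim, num_classes)

        def forward(self, text_representation_vector: torch.Tensor) -> torch.Tensor:
            logits = self.classifier(text_representation_vector)
            return logits.softmax(dim=-1)  # probabilities over text categories

    head = SimpleTextClassifierHead(vector_dim=640, num_classes=5)
    probabilities = head(torch.randn(640))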
The above embodiments of the present disclosure have the following beneficial effects: with the feature vector generation method for text representation of some embodiments of the present disclosure, the accuracy of the generated feature vector is improved. Specifically, the reason why the generated feature vector is not accurate enough is that, when the text contains many features, the vector correlation among the plurality of feature vectors obtained through multiple rounds of feature extraction is low. Based on this, in the feature vector generation method for text representation according to some embodiments of the present disclosure, first, a text to be represented is obtained. Then, the knowledge features in the knowledge feature sequence are spliced with the text to be characterized to generate a splicing feature. Knowledge features in this disclosure are features that are associated with the text to be characterized; adding them enhances and/or supplements the text content of the text to be characterized. Further, the splicing feature is input into a pre-training model to generate a candidate feature vector sequence, so that the feature vectors are preliminarily determined by the pre-training model. In addition, each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence is multiplied by the embedding matrix in the pre-training model to generate a target feature vector, so that a target feature vector sequence is obtained. Next, for each target feature vector in the target feature vector sequence, vector fusion is performed between the target feature vector and the word vector corresponding to the target feature vector in a word list to generate a fusion vector; adding the word vector corresponding to the feature vector enriches the feature expression capability of the vector. In addition, the fusion vectors in the obtained fusion vector sequence are input into a first feature dimension reduction network to generate a first dimension-reduction feature vector sequence. Then, a target candidate feature vector is input into a second feature dimension reduction network to generate a second dimension-reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence. Through feature dimension reduction, the feature dimension is reduced, and therefore the complexity of subsequent feature calculation is reduced. Finally, vector splicing is performed on the first dimension-reduction feature vectors in the first dimension-reduction feature vector sequence and the second dimension-reduction feature vector to generate a text representation vector corresponding to the text to be represented. The feature vector generated in this way can well highlight and express the features corresponding to the text, thereby achieving the purpose of improving the accuracy of the generated feature vector.
With further reference to fig. 2, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a feature vector generation apparatus for text characterization, which correspond to those of the method embodiments shown in fig. 1, and which may be applied in various electronic devices in particular.
As shown in fig. 2, the feature vector generation apparatus 200 for text characterization of some embodiments includes: the device comprises an acquisition unit, a splicing unit, a first input unit, a multiplication processing unit, a vector fusion unit, a second input unit, a third input unit and a vector splicing unit. The obtaining unit is configured to obtain a text to be represented; the splicing unit is configured to splice the knowledge features in the knowledge feature sequence with the text to be characterized to generate splicing features; a first input unit configured to input the stitched features into a pre-training model to generate a candidate feature vector sequence; the multiplication processing unit is configured to multiply each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence with the embedded matrix in the pre-training model to generate a target feature vector to obtain a target feature vector sequence; the vector fusion unit is configured to perform vector fusion on each target feature vector in the target feature vector sequence and a corresponding word vector of the target feature vector in a word list to generate a fusion vector; a second input unit configured to input the fusion vector in the obtained fusion vector sequence into the first feature dimension reduction network to generate a first dimension reduction feature vector sequence; a third input unit configured to input a target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence except for the candidate feature vector subsequence; and the vector splicing unit is configured to perform vector splicing on the first dimension-reduced feature vector and the second dimension-reduced feature vector in the first dimension-reduced feature vector sequence to generate a text characterization vector corresponding to the text to be characterized.
It will be understood that the units described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 200 and the units included therein, and are not described herein again.
Referring now to FIG. 3, shown is a block diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients, servers may communicate using any currently known or future developed network Protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a text to be represented; splicing the knowledge features in the knowledge feature sequence with the text to be characterized to generate splicing features; inputting the splicing features into a pre-training model to generate a candidate feature vector sequence; multiplying each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence by the embedded matrix in the pre-training model to generate a target feature vector to obtain a target feature vector sequence; for each target feature vector in the target feature vector sequence, performing vector fusion on the target feature vector and a word vector corresponding to the target feature vector in a word list to generate a fusion vector; inputting the fusion vector in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence;
inputting a target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vector subsequence; and performing vector splicing on the first dimension reduction feature vector in the first dimension reduction feature vector sequence and the second dimension reduction feature vector to generate a text representation vector corresponding to the text to be represented.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor comprises an acquisition unit, a splicing unit, a first input unit, a multiplication processing unit, a vector fusion unit, a second input unit, a third input unit and a vector splicing unit. The names of the units do not form a limitation to the units themselves in some cases, for example, a concatenation unit may also be described as a "unit that concatenates a knowledge feature in a knowledge feature sequence with the text to be characterized to generate a concatenation feature".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned technical features, and also covers other technical solutions formed by any combination of the above-mentioned technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method of feature vector generation for text characterization, comprising:
acquiring a text to be represented;
splicing the knowledge features in the knowledge feature sequence with the text to be represented to generate splicing features;
inputting the splicing features into a pre-training model to generate a candidate feature vector sequence;
multiplying each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence by an embedded matrix in the pre-training model to generate a target feature vector to obtain a target feature vector sequence;
for each target feature vector in the target feature vector sequence, performing vector fusion on the target feature vector and a word vector corresponding to the target feature vector in a word list to generate a fusion vector;
inputting the fusion vector in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence;
inputting a target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence except for the candidate feature vector subsequence;
and carrying out vector splicing on the first dimension-reducing feature vector and the second dimension-reducing feature vector in the first dimension-reducing feature vector sequence to generate a text representation vector corresponding to the text to be represented.
2. The method of claim 1, wherein the splicing the knowledge features in the knowledge feature sequence with the text to be characterized to generate a splicing feature comprises:
performing feature splicing on each knowledge feature in the knowledge feature sequence to generate spliced knowledge features;
splicing the splicing knowledge characteristic with the text to be characterized to generate a candidate splicing characteristic;
and inserting a covering character at the tail part of the knowledge characteristic included in the candidate splicing characteristic, and inserting a covering character at the tail part of the text to be characterized included in the candidate splicing characteristic to generate the splicing characteristic.
3. The method of claim 2, wherein the multiplying each candidate feature vector in the candidate feature vector subsequence included in the candidate feature vector sequence by the embedding matrix in the pre-training model to generate the target feature vector comprises:
and multiplying the candidate feature vector by an embedding matrix corresponding to an input layer included by the pre-training model to generate the target feature vector.
4. The method according to claim 3, wherein the vector length of the target feature vector in the target feature vector sequence is the same as the vector length of the corresponding word vector of the target feature vector in the word list, and the word vectors in the word list are one-hot encoded word vectors; and
the vector fusion of the target feature vector and the corresponding word vector of the target feature vector in the word list to generate a fusion vector includes:
and carrying out vector addition and fusion on the target characteristic vector and the word vector corresponding to the target characteristic vector in the word list to generate a fusion vector corresponding to the target characteristic vector.
5. The method of claim 4, wherein the first feature dimension reduction network comprises: a first linear layer and a second linear layer; and
inputting the fusion vector in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence, including:
initializing a first target embedding matrix, wherein the first target embedding matrix is an embedding matrix corresponding to the first feature dimension reduction network;
multiplying the fusion vector in the fusion vector sequence by the first target embedding matrix to generate a first multiplied feature vector sequence;
inputting a first multiplied feature vector in the first multiplied feature vector sequence into the first linear layer to generate a first linearly processed feature vector, so as to obtain a first linearly processed feature vector sequence, wherein the vector length of the first linearly processed feature vector is the same as the vector length of the first multiplied feature vector;
and inputting the first linearly processed feature vector in the first linearly processed feature vector sequence into the second linear layer to generate a first dimension reduction feature vector, so as to obtain the first dimension reduction feature vector sequence, wherein the vector length of the first dimension reduction feature vector is smaller than the vector length of the first linearly processed feature vector.
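A hedged sketch of the first feature dimension reduction network of claim 5: an initialized first target embedding matrix, a first linear layer that preserves the vector length, and a second linear layer that shrinks it. The class name, the Xavier initialization, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FirstFeatureDimReduction(nn.Module):
    def __init__(self, fusion_dim, hidden_dim, reduced_dim):
        super().__init__()
        # first target embedding matrix, initialized when the network is created
        self.first_target_embedding = nn.Parameter(torch.empty(fusion_dim, hidden_dim))
        nn.init.xavier_uniform_(self.first_target_embedding)
        self.first_linear = nn.Linear(hidden_dim, hidden_dim)    # keeps the vector length
        self.second_linear = nn.Linear(hidden_dim, reduced_dim)  # shrinks the vector length

    def forward(self, fusion_vectors):                               # (n, fusion_dim)
        multiplied = fusion_vectors @ self.first_target_embedding    # first multiplied feature vectors
        processed = self.first_linear(multiplied)                    # first linearly processed feature vectors
        return self.second_linear(processed)                         # first dimension reduction feature vectors

reducer = FirstFeatureDimReduction(fusion_dim=6, hidden_dim=4, reduced_dim=2)
print(reducer(torch.randn(3, 6)).shape)   # torch.Size([3, 2]) for a fusion vector sequence of length 3
```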
6. The method of claim 5, wherein the second feature dimension reduction network comprises: a third linear layer and a fourth linear layer; and
inputting the target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector, comprising:
initializing a second target embedding matrix, wherein the second target embedding matrix is an embedding matrix corresponding to the second feature dimension reduction network;
multiplying the target candidate feature vector by the second target embedding matrix to generate a second multiplied feature vector;
inputting the second multiplied feature vector into the third linear layer to generate a second linearly processed feature vector;
and inputting the second linearly processed feature vector into the fourth linear layer to generate the second dimension reduction feature vector.
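The second feature dimension reduction network of claim 6 mirrors the first one but operates on the single target candidate feature vector and uses its own second target embedding matrix. One plausible reading, again with assumed names and toy sizes:

```python
import torch
import torch.nn as nn

class SecondFeatureDimReduction(nn.Module):
    def __init__(self, in_dim, hidden_dim, reduced_dim):
        super().__init__()
        # second target embedding matrix, initialized for this network
        self.second_target_embedding = nn.Parameter(torch.empty(in_dim, hidden_dim))
        nn.init.xavier_uniform_(self.second_target_embedding)
        self.third_linear = nn.Linear(hidden_dim, hidden_dim)
        self.fourth_linear = nn.Linear(hidden_dim, reduced_dim)

    def forward(self, target_candidate):                               # (in_dim,)
        multiplied = target_candidate @ self.second_target_embedding   # second multiplied feature vector
        processed = self.third_linear(multiplied)                      # second linearly processed feature vector
        return self.fourth_linear(processed)                           # second dimension reduction feature vector

second_reducer = SecondFeatureDimReduction(in_dim=8, hidden_dim=8, reduced_dim=2)
print(second_reducer(torch.randn(8)).shape)   # torch.Size([2])
```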
7. The method of claim 6, wherein the method further comprises:
and in response to determining that the feature application scenario corresponding to the text to be represented is a text classification scenario, inputting the text representation vector into a pre-trained text classification model to determine a text category corresponding to the text to be represented.
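Claim 7 only says that, in a text classification scenario, the text representation vector is handed to a pre-trained text classification model. A minimal sketch of that hand-off, with an assumed linear classifier, an assumed class count, and an assumed representation length:

```python
import torch
import torch.nn as nn

NUM_CLASSES, REPR_DIM = 5, 20          # assumed values; the patent fixes neither

# stand-in for the pre-trained text classification model mentioned in claim 7
classifier = nn.Linear(REPR_DIM, NUM_CLASSES)

def classify(text_representation_vector, application_scene):
    if application_scene == "text_classification":    # the scenario check in claim 7
        logits = classifier(text_representation_vector)
        return int(torch.argmax(logits))               # index of the text category
    return None

print(classify(torch.randn(REPR_DIM), "text_classification"))
```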
8. A feature vector generation apparatus for text characterization, comprising:
an acquisition unit configured to acquire a text to be represented;
a splicing unit configured to splice the knowledge features in the knowledge feature sequence with the text to be represented to generate a splicing feature;
a first input unit configured to input the splicing feature into a pre-training model to generate a candidate feature vector sequence;
a multiplication processing unit configured to multiply each candidate feature vector in a candidate feature vector subsequence included in the candidate feature vector sequence by an embedding matrix in the pre-training model to generate a target feature vector, so as to obtain a target feature vector sequence;
a vector fusion unit configured to perform, for each target feature vector in the target feature vector sequence, vector fusion on the target feature vector and a word vector corresponding to the target feature vector in a word list to generate a fusion vector;
a second input unit configured to input the fusion vector in the obtained fusion vector sequence into a first feature dimension reduction network to generate a first dimension reduction feature vector sequence;
a third input unit configured to input a target candidate feature vector into a second feature dimension reduction network to generate a second dimension reduction feature vector, wherein the target candidate feature vector is a candidate feature vector in the candidate feature vector sequence other than the candidate feature vectors in the candidate feature vector subsequence;
and a vector splicing unit configured to perform vector splicing on each first dimension reduction feature vector in the first dimension reduction feature vector sequence and the second dimension reduction feature vector to generate a text representation vector corresponding to the text to be represented.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202211015737.1A 2022-08-24 2022-08-24 Feature vector generation method and device for text representation and electronic equipment Active CN115098647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211015737.1A CN115098647B (en) 2022-08-24 2022-08-24 Feature vector generation method and device for text representation and electronic equipment


Publications (2)

Publication Number Publication Date
CN115098647A (en) 2022-09-23
CN115098647B (en) 2022-11-01

Family

ID=83301500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211015737.1A Active CN115098647B (en) 2022-08-24 2022-08-24 Feature vector generation method and device for text representation and electronic equipment

Country Status (1)

Country Link
CN (1) CN115098647B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2505400B (en) * 2012-07-18 2015-01-07 Toshiba Res Europ Ltd A speech processing system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046670A (en) * 2019-04-24 2019-07-23 北京京东尚科信息技术有限公司 Feature vector dimension reduction method and device
CN111709240A (en) * 2020-05-14 2020-09-25 腾讯科技(武汉)有限公司 Entity relationship extraction method, device, equipment and storage medium thereof
CN111859967A (en) * 2020-06-12 2020-10-30 北京三快在线科技有限公司 Entity identification method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
New method of text representation model based on neural network; 曾谁飞 et al.; 《通信学报》 (Journal on Communications); 2017-04-25 (No. 04); full text *

Also Published As

Publication number Publication date
CN115098647A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
JP7112536B2 (en) Method and apparatus for mining entity attention points in text, electronic device, computer-readable storage medium and computer program
CN110688528B (en) Method, apparatus, electronic device, and medium for generating classification information of video
CN109829164B (en) Method and device for generating text
CN109933217B (en) Method and device for pushing sentences
CN112417902A (en) Text translation method, device, equipment and storage medium
CN111368560A (en) Text translation method and device, electronic equipment and storage medium
CN112380876A (en) Translation method, device, equipment and medium based on multi-language machine translation model
CN114385780A (en) Program interface information recommendation method and device, electronic equipment and readable medium
CN115578570A (en) Image processing method, device, readable medium and electronic equipment
CN110852057A (en) Method and device for calculating text similarity
CN113051933A (en) Model training method, text semantic similarity determination method, device and equipment
CN111898338A (en) Text generation method and device and electronic equipment
CN115098647B (en) Feature vector generation method and device for text representation and electronic equipment
CN112651231B (en) Spoken language information processing method and device and electronic equipment
CN114792086A (en) Information extraction method, device, equipment and medium supporting text cross coverage
CN115129845A (en) Text information processing method and device and electronic equipment
CN114564606A (en) Data processing method and device, electronic equipment and storage medium
CN114429629A (en) Image processing method and device, readable storage medium and electronic equipment
CN115967833A Video generation method, device, equipment and computer storage medium
CN111797263A (en) Image label generation method, device, equipment and computer readable medium
CN112530416A (en) Speech recognition method, device, equipment and computer readable medium
CN114792388A (en) Image description character generation method and device and computer readable storage medium
CN111581455A (en) Text generation model generation method and device and electronic equipment
CN113656573B (en) Text information generation method, device and terminal equipment
CN116974684B (en) Map page layout method, map page layout device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant