CN114611511A - Text vector generation method, model training method and related device - Google Patents

Text vector generation method, model training method and related device Download PDF

Info

Publication number
CN114611511A
CN114611511A
Authority
CN
China
Prior art keywords
vector
text
sequence
word
prior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210290851.9A
Other languages
Chinese (zh)
Inventor
罗欢
张炫
姚晓远
未波波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Himalaya Technology Co ltd
Original Assignee
Shanghai Himalaya Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Himalaya Technology Co ltd filed Critical Shanghai Himalaya Technology Co ltd
Priority to CN202210290851.9A priority Critical patent/CN114611511A/en
Publication of CN114611511A publication Critical patent/CN114611511A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

In the text vector generation method, the model training method, and the related device provided by the present application, for an obtained text sequence, the text processing device inputs the prior vector of the text sequence, together with the word vector, the position vector and the segment vector of the text sequence, into the Bert layer of the text vector model, so that the text vector model, taking the prior vector as a reference, obtains possible vocabulary knowledge of the text sequence from the prior vector and uses it when converting the text sequence into a text vector. Because the prior vector carries prior information about the words in the text sequence, the prior information assists the text vector model in converting the text sequence without relying on a dictionary for word segmentation, yielding a more accurate text vector for the text sequence.

Description

Text vector generation method, model training method and related device
Technical Field
The present application relates to the field of natural language processing, and in particular, to a text vector generation method, a model training method, and a related apparatus.
Background
In some languages (e.g., Chinese), words have no formal delimiters, unlike English, where the spaces between words serve as natural delimiters. Therefore, text in these languages needs to be segmented into words before it can be processed.
Existing word segmentation methods are either character-based or word-based. Character-based methods lack word-level information, while word-based methods depend on a dictionary, so the choice of dictionary directly determines the segmentation effect.
Disclosure of Invention
In order to overcome at least one of the above deficiencies in the prior art, the present application provides a text vector generation method, a model training method and a related device, which enable a text vector model to automatically learn to segment a text by referring to a word matrix. Specifically:
in a first aspect, the present application provides a text vector generation method applied to a text processing device, where the text processing device is configured with a text vector model, where the text vector model includes a Bert layer, and the method includes:
acquiring a prior vector of a text sequence, wherein the prior vector carries prior information of a vocabulary, and the vocabulary is formed by texts in the text sequence;
generating a word vector, a position vector and a segment vector of the text sequence according to the convention of the Bert layer on an input vector;
and inputting the word vector, the position vector, the segment vector and the prior vector of the text sequence into the Bert layer to obtain the text vector of the text sequence.
In a second aspect, the present application further provides a model training method, which is applied to a training device, where the training device is configured with a model to be trained, the model to be trained includes a feature extraction layer and a Bert layer, which are sequentially connected, and the method includes:
obtaining a sample word matrix of a sample sequence, wherein the sample word matrix represents words which can be formed by texts of the sample sequence;
inputting the sample word matrix into the feature extraction layer to obtain a prior vector of the sample sequence;
generating a word vector, a position vector and a segment vector of the sample sequence according to the convention of the Bert layer on an input vector;
inputting the word vector, the position vector, the segment vector of the sample sequence and the prior vector of the sample sequence into the Bert layer to obtain a calculation result of the Bert layer on the sample sequence;
if the calculation result does not meet the preset convergence condition, returning to the step of inputting the sample word matrix to the feature extraction layer and obtaining the prior vector of the sample sequence after adjusting the model parameters of the Bert layer and the feature extraction layer according to the calculation result;
and if the calculation result meets the preset convergence condition, taking the trained model to be trained as the text vector model.
In a third aspect, the present application further provides a text vector generation device applied to a text processing device, where the text processing device is configured with a text vector model, the text vector model includes a Bert layer, and the text vector generation device includes:
a prior information module, configured to acquire a prior vector of a text sequence, where the prior vector carries prior information of a vocabulary, and the vocabulary is composed of texts in the text sequence;
the vector generation module is used for generating a word vector, a position vector and a segment vector of the text sequence according to the convention of the Bert layer on an input vector;
the vector generation module is further configured to input the word vector, the position vector, the segment vector, and the prior vector of the text sequence to the Bert layer, so as to obtain a text vector of the text sequence.
In a fourth aspect, the present application further provides a text processing device, where the text processing device includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the text vector generation method.
In a fifth aspect, the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the text vector generation method.
Compared with the prior art, the method has the following beneficial effects:
in the text vector generation method, the model training method, and the related apparatus provided in this embodiment, for an obtained text sequence, the text processing device inputs the prior vector of the text sequence, together with the word vector, the position vector and the segment vector of the text sequence, into the Bert layer of the text vector model, so that the text vector model, taking the prior vector as a reference, obtains possible vocabulary knowledge of the text sequence from the prior vector and uses it when converting the text sequence into a text vector. Because the prior vector carries prior information about the words in the text sequence, the prior information assists the text vector model in converting the text sequence without relying on a dictionary for word segmentation, yielding a more accurate text vector for the text sequence.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a text vector generation method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a Transformer model provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a Bert input vector provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a word matrix provided in an embodiment of the present application;
FIG. 6 is a second schematic diagram of a word matrix provided in an embodiment of the present application;
FIG. 7 is a schematic flowchart of a model training method according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a text vector generation apparatus according to an embodiment of the present application.
Reference numerals: 120 - memory; 130 - processor; 140 - communication unit; 201 - prior information module; 202 - vector generation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It has been found that words in some languages have no formal delimiters, so text in these languages needs to be segmented into words before it is processed. At present, however, word segmentation of such text mainly relies on a dictionary, so the choice of dictionary directly determines the segmentation effect.
Take a Chinese text sequence that translates roughly as "the weather is really nice today". If the text sequence is segmented with a first dictionary, assume the segmentation result is "today / weather / really good". If the same text sequence is segmented with a second dictionary, assume the result is "today / day / vigor / good". The first dictionary and the second dictionary focus on different domains, but it is easy to see that the segmentation produced by the first dictionary is better than that produced by the second. The choice of dictionary therefore directly determines the segmentation effect; that is, if a text sequence is segmented directly through a dictionary, the method always carries a certain local limitation. What is needed is a method that does not rely on a dictionary to segment the text sequence directly, yet can still exploit the prior information about the words in a dictionary, so as to improve the effect of converting the text sequence.
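To make the dictionary dependence concrete, the following is a minimal sketch of dictionary-based forward maximum matching in Python; the sentence and the two dictionaries are hypothetical illustrations and are not taken from this application.

    # Minimal sketch of dictionary-based forward maximum matching (illustrative only;
    # the dictionaries and the sentence below are hypothetical examples).
    def forward_max_match(text, dictionary, max_word_len=4):
        words, i = [], 0
        while i < len(text):
            # Try the longest candidate first, fall back to a single character.
            for j in range(min(len(text), i + max_word_len), i, -1):
                if text[i:j] in dictionary or j == i + 1:
                    words.append(text[i:j])
                    i = j
                    break
        return words

    dict_a = {"今天", "天气", "真好"}            # first, general-domain dictionary
    dict_b = {"今天", "天天", "气", "真好"}      # second, differently focused dictionary
    sentence = "今天天气真好"                    # "the weather is really nice today"
    print(forward_max_match(sentence, dict_a))   # ['今天', '天气', '真好']
    print(forward_max_match(sentence, dict_b))   # ['今天', '天', '气', '真好']

Two dictionaries produce two different segmentations of the same sentence, which is exactly the locality problem described above.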
In view of this, the present embodiment provides a text vector generation method applied to a text processing device. In the method, for an input text sequence, a text vector model in text processing equipment takes a prior vector of the text sequence as a reference, and possible vocabulary knowledge in the text sequence is obtained from the prior vector for converting the text sequence into the text vector. Because the prior vector carries the prior information of the vocabulary, and the vocabulary is formed by the texts in the text sequence, the text sequence is converted into the text vector by the text vector model under the assistance of the prior vector under the condition of not depending on a dictionary for word segmentation.
The text processing device may be a user terminal or a server. The user terminal may include a mobile terminal, a tablet computer, a laptop computer, or any combination thereof. In some embodiments, the mobile terminal may include a smart home device (e.g., a smart speaker), a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include smart lighting devices, control devices for smart electrical appliances, smart monitoring devices, smart televisions, smart cameras, walkie-talkies, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart shoelaces, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, a Point of Sale (POS) device, or the like, or any combination thereof.
The server may be a single server or a group of servers. The set of servers can be centralized or distributed (e.g., the servers can be a distributed system). In some embodiments, the server may be local or remote to the user terminal. In some embodiments, the server may be implemented on a cloud platform; by way of example only, the Cloud platform may include a private Cloud, a public Cloud, a hybrid Cloud, a Community Cloud, a distributed Cloud, a cross-Cloud (Inter-Cloud), a Multi-Cloud (Multi-Cloud), and the like, or any combination thereof. In some embodiments, the server may be implemented on an electronic device having one or more components.
The text sequence may be a text sequence input by a user through a text box, a text sequence parsed from a file, or a text sequence parsed from a dialog with the user in a voice interaction process. For example, when the text processing device is a smart speaker, the smart speaker needs to record audio information of a user during speaking, convert the audio information into a text sequence, and then perform natural language processing on the text sequence to identify a control instruction of the text sequence.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the present embodiment further provides a hardware structure of the text processing apparatus. As shown in fig. 1, the text processing apparatus includes a memory 120, a processor 130, and a communication unit 140. The memory 120, the processor 130 and the communication unit 140 are electrically connected to each other directly or indirectly, so as to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The Memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The communication unit 140 is used for transmitting and receiving data through a network. The network may include a wired network, a wireless network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network may include one or more network access points. For example, the network may include wired or wireless network access points, such as base stations and/or network switching nodes, through which one or more components of the text processing device may connect to the network to exchange data and/or information.
The processor 130 may be an integrated circuit chip having signal processing capabilities, and may include one or more processing cores (e.g., a single-core processor or a multi-core processor). Merely by way of example, the processor may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction set Processor (ASIP), a Graphics Processing Unit (GPU), a Physical Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
Since this embodiment involves a Bert layer, in order to make the embodiment clearer, the working principle of the Bert layer referred to in this embodiment is explained first, before the text vector generation method provided in this embodiment is described in detail.
To explain the working principle of the Bert layer, this embodiment introduces the word vector tool Word2Vec for comparison. Word2Vec must be trained on a large amount of text before it can be used. After the model is trained, when a word needs to be converted into its word vector, the model parameters corresponding to that word in Word2Vec are simply looked up, and those parameters are used as the word vector of the word. Thus, once training is complete, the same word has one and only one word vector.
Although Bert likewise requires a large amount of text for training before it can be used, unlike Word2Vec, the trained Bert layer is a pre-trained model: when a word needs to be converted into a word vector, the Bert layer computes the word vector of the word within the context of its sentence and returns it. Thus, even for the same word, the Bert layer outputs different word vectors when the word appears in different contexts.
For example, "i like to eat an apple" and "i like to use an apple" in the two sentences, it can be seen that "apple" in "i like to eat an apple" refers to a fruit according to the context information of the two sentences; the apple in the apple favorite is a mobile phone.
When the apple in the two sentences is input into Word2Vec, the same Word vector can be obtained; and inputting the apple in the two sentences into the Bert, different word vectors can be respectively output according to the context of the two sentences.
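The contrast can be sketched with open-source tools. The snippet below is an illustrative sketch only; it assumes the Hugging Face transformers library with the publicly available bert-base-chinese checkpoint, neither of which is prescribed by this application.

    # Illustrative sketch: the same character receives different contextual vectors
    # from Bert depending on the sentence, whereas a static tool such as Word2Vec
    # would always return the single vector stored for that token.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")

    def char_vector(sentence, char):
        # Return the contextual vector of the first occurrence of `char`.
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        return hidden[tokens.index(char)]

    v_fruit = char_vector("我喜欢吃苹果", "苹")   # "I like to eat apples"
    v_phone = char_vector("我喜欢用苹果", "苹")   # "I like to use Apple (phones)"
    print(torch.cosine_similarity(v_fruit, v_phone, dim=0))     # below 1.0: contexts differ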
Based on the above description, each step of the text vector generation method provided in this embodiment is described in detail below with reference to fig. 2. It should be understood that the operations of the flowchart may be implemented out of the order shown, and steps without a logical dependency may be performed in reverse order or simultaneously. Under the guidance of this application, one skilled in the art may add one or more other operations to the flowchart, or remove one or more operations from it.
In this embodiment, the method is applied to a text processing device, and a text vector model is configured in the text processing device, where the text vector model includes a Bert layer. As shown in fig. 2, the method includes:
S101A, obtaining the prior vector of the text sequence.
The prior vector carries prior information of a vocabulary, and the vocabulary is formed by texts in a text sequence.
S102A, generating a word vector, a position vector and a segment vector of the text sequence according to the convention of the Bert layer to the input vector.
The Bert layer is derived from the Transformer model. As shown in fig. 3, the Transformer model is a neural network with an Encoder-Decoder structure, comprising an Encoder and a Decoder. The Encoder includes a plurality of encoding layers (six Encoders in fig. 3), and the Decoder likewise includes a plurality of decoding layers (six Decoders in fig. 3); together they convert an input first sequence into an output second sequence. For example, as shown in fig. 3, when a Chinese sequence meaning "I have a cat" is input to the Transformer model and processed by it, the English sequence "I have a cat" is output.
The Bert layer in this embodiment uses only the encoding portion of the Transformer model. According to the Bert layer's convention for input vectors, the text sequence needs to be converted into a word vector, a position vector, and a segment vector.
Illustratively, take the text sequence "my dog is cute, he likes playing" as an example. As shown in fig. 4, the text sequence needs to be converted into:
"[CLS] my dog is cute [SEP] he likes play ##ing [SEP]"
where [CLS] is placed at the head of the first sentence and serves as a marker for subsequent classification tasks, and [SEP] is used to separate sentences. The text processing apparatus converts each word (or character) in the text sequence into a vector of fixed length, thereby obtaining the word vector of the text sequence in this embodiment. As shown in fig. 4, "[CLS]" is represented as E_[CLS], the vector of "my" is denoted as E_my, and the other words follow by analogy, which is not repeated in this embodiment.
The position vector encodes the position of each word into a feature vector, thereby introducing the positional relationship between words in the text sequence. As shown in fig. 4, the position information of the first token "[CLS]" is represented by E_0, and the position information of the last token "[SEP]" is represented by E_10.
The segment vector encodes the sentence to which each word belongs into a feature vector. As shown in fig. 4, "[CLS] my dog is cute [SEP]" is the first sentence, so the segment information of each of its words is represented by E_A; "he likes play ##ing [SEP]" is the second sentence, so the segment information of each of its words is represented by E_B.
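As a minimal sketch of how the three vectors agreed by the Bert layer can be produced (an assumed PyTorch implementation with illustrative sizes, not the application's own code):

    # Illustrative sketch: standard embedding tables for the word, position and
    # segment vectors of a tokenized text sequence. Vocabulary size, maximum
    # length and hidden size are illustrative assumptions.
    import torch
    import torch.nn as nn

    class BertInputVectors(nn.Module):
        def __init__(self, vocab_size=21128, max_len=512, hidden=768):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, hidden)   # word vector
            self.pos_emb = nn.Embedding(max_len, hidden)       # position vector
            self.seg_emb = nn.Embedding(2, hidden)             # segment vector

        def forward(self, token_ids, segment_ids):
            # token_ids, segment_ids: (batch, seq_len)
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            word = self.word_emb(token_ids)                    # (batch, seq_len, hidden)
            pos = self.pos_emb(positions)[None, :, :]          # (1, seq_len, hidden)
            seg = self.seg_emb(segment_ids)                    # (batch, seq_len, hidden)
            return word, pos, seg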
S103A, inputting the word vector, the position vector, the segment vector and the prior vector of the text sequence into the Bert layer to obtain the text vector of the text sequence.
Because the prior vector carries the prior information of the vocabulary in the text sequence, the Bert layer can exploit the prior information of a dictionary without performing word segmentation, which improves the effect of the Bert layer's conversion of the text sequence. When the text sequence is Chinese, the text vector output by the Bert layer may include a word vector for each text (character) in the text sequence.
Thus, with the above embodiment, for an obtained text sequence, the text processing apparatus inputs the prior vector of the text sequence, together with the word vector, the position vector and the segment vector of the text sequence, into the Bert layer of the text vector model, so that the text vector model, taking the prior vector as a reference, obtains possible vocabulary knowledge of the text sequence from the prior vector and uses it when converting the text sequence into a text vector. Because the prior vector carries prior information about the words in the text sequence, the prior information assists the text vector model in converting the text sequence without relying on a dictionary for word segmentation, yielding a more accurate text vector for the text sequence.
In this embodiment, the prior vector of the text sequence may be derived from deep semantic information obtained by processing a word matrix of the text sequence; therefore, step S101A may obtain the prior vector of the text sequence in the following way:
S101A-1, obtaining a word matrix of the text sequence.
The word matrix represents words which can be formed by texts in the text sequence. In an alternative embodiment, the word matrix of the text sequence may be constructed by the following embodiments.
S101A-1-1, acquiring a text sequence.
S101A-1-2, constructing a plurality of matrixes to be initialized according to the text sequence.
The rows and columns of each square matrix correspond one-to-one to the texts in the text sequence.
For example, assume that the text sequence is the Chinese sentence rendered as "I was born in the People's Republic of China", which contains 11 characters. Then 11-by-11 square matrices may be constructed for the text sequence. The square matrix shown in fig. 5 includes 11 rows and 11 columns, where the 11 rows correspond one-to-one to the 11 characters of the sentence, and the 11 columns likewise correspond one-to-one to the 11 characters.
S101A-1-3, respectively selecting mutually different texts for a plurality of matrixes from the text sequence as initial texts of the matrixes.
And the initial text of each square matrix and the text corresponding to each line of the square matrix form a plurality of text pairs.
Illustratively, based on the square matrix shown in fig. 5, assume that the starting text of the square matrix is the character "中" ("Zhong") of "the People's Republic of China" in the sentence. Pairing this starting character with each of the 11 characters of the sentence in turn gives the 11 text pairs shown in fig. 6.
Similarly, each of the other characters in "I was born in the People's Republic of China" may be used as the starting text of another square matrix.
S101A-1-4, for each square matrix, initializing each line of the square matrix according to text segments intercepted from the text sequence by the text pairs of each line of the square matrix.
Wherein, if the text segment intercepted from the text sequence by a line's text pair can form a vocabulary word, the positions in that line corresponding to the text segment are initialized to a first preset numerical value, and the positions in that line not corresponding to the text segment are initialized to a second preset numerical value.
Illustratively, continuing with the text pair "中-民" ("Zhong-min") in line 6 of fig. 6: the text segment this pair intercepts from the text sequence "I was born in the People's Republic of China" is "中华人民" ("the Chinese people"), which forms a vocabulary word in the text, and this segment occupies columns 3-6 of line 6 in the square matrix; therefore, columns 3-6 of line 6 are initialized to the first preset value 1, and the rest of the line to the second preset value 0.
In the same manner, columns 3-9 of row 9 in the square matrix may be initialized to a first predetermined value of 1 and the remainder to a second predetermined value of 0.
Of course, the first preset value 1 and the second preset value 0 in the above examples are only examples provided for easy understanding, and those skilled in the art can appropriately adjust the first preset value and the second preset value as needed.
In addition, when determining whether a text segment can form a vocabulary, in some embodiments, a semantic analysis model may be used to determine the text segment, that is, the semantic analysis model may score the semantics of the input text segment, and when the text segment can form a vocabulary, a higher score is given, and otherwise, a lower score is given.
In some embodiments, the text processing device may further match the text segment with a pre-configured vocabulary library, and if the matching is successful, it means that the text segment may constitute a vocabulary, otherwise it is not a vocabulary. Wherein the vocabulary library collects the vocabulary that the individual texts may compose.
S101A-1-5, all initialized matrixes are used as word matrixes of the text sequence.
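Steps S101A-1-1 to S101A-1-5 can be sketched as follows; this is an illustrative Python/numpy implementation that assumes vocabulary membership is checked against a pre-configured vocabulary library (one of the two options described above), and the example sentence and vocabulary are hypothetical stand-ins for the translated example in the text.

    # Illustrative sketch of steps S101A-1-1 to S101A-1-5 (assumptions: Python/numpy,
    # and vocabulary membership checked against a pre-configured vocabulary library).
    import numpy as np

    def build_word_matrices(text, vocab):
        n = len(text)
        matrices = []
        for start in range(n):                       # one square matrix per starting text
            m = np.zeros((n, n), dtype=np.float32)   # second preset value: 0
            for row in range(n):
                lo, hi = min(start, row), max(start, row)
                segment = text[lo:hi + 1]            # segment intercepted by the text pair
                if segment in vocab:                 # the pair forms a vocabulary word
                    m[row, lo:hi + 1] = 1.0          # first preset value: 1
            matrices.append(m)
        return matrices

    # Hypothetical stand-ins for the translated example above.
    vocab = {"出生", "中华人民", "共和国", "中华人民共和国"}
    mats = build_word_matrices("我出生在中华人民共和国", vocab)
    print(len(mats), mats[0].shape)                  # 11 matrices, each of shape (11, 11)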
After the word matrix of the text sequence is obtained by the implementation provided in step S101A-1, it can be converted into a prior vector by the implementation provided in step S101A-2:
S101A-2, converting the word matrix into a prior vector, wherein the prior vector has the same vector dimension as the word vector, the position vector and the segment vector of the text sequence.
In some embodiments, the text vector model further comprises a feature extraction layer constructed from artificial neurons. Based on this, the text processing device inputs the word matrix of the text sequence into the feature extraction layer for computation, thereby obtaining the prior vector of the text sequence. Therefore, compared with the shallow semantic information carried by the word matrix of the text sequence, the prior vector produced by the feature extraction layer carries the deep semantic information of the text sequence.
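The application does not fix the internal architecture of the feature extraction layer beyond it being built from artificial neurons. As one hedged possibility (assumptions: PyTorch, a fixed sequence length, and the per-sentence square matrices merged into a single matrix beforehand), the layer can project each row of the word matrix into the same hidden space as the other input vectors:

    # Illustrative sketch only: one possible feature extraction layer mapping a word
    # matrix of shape (batch, seq_len, seq_len) to prior vectors of shape
    # (batch, seq_len, hidden). The architecture and the merging of the per-start
    # square matrices into one matrix are assumptions, not specified by the application.
    import torch.nn as nn

    class PriorFeatureExtractor(nn.Module):
        def __init__(self, seq_len, hidden=768):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(seq_len, hidden),   # mix the 0/1 flags along each row
                nn.ReLU(),
                nn.Linear(hidden, hidden),    # deeper semantic features
            )

        def forward(self, word_matrix):
            # word_matrix: (batch, seq_len, seq_len)
            return self.proj(word_matrix)     # prior vectors: (batch, seq_len, hidden)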
With continued reference to step S103A in fig. 2: in this step, the vectors input to the Bert layer include the prior vector of the text sequence and the word vector, position vector and segment vector of the text sequence. This embodiment fuses these vectors through the following implementation; that is, step S103A includes:
S103A-1, fusing the word vector, the position vector, the segment vector and the prior vector of the text sequence to obtain a fusion vector.
In some embodiments, the text processing device fuses the word vector, the position vector, the segment vector, and the prior vector of the text sequence in a summation manner to obtain a fused vector.
S103A-2, inputting the fusion vector into the Bert layer to obtain a text vector of the text sequence.
Because the fusion vector carries richer semantic information, the effect of the Bert layer's conversion of the text sequence can be improved.
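Putting the pieces together, a hedged sketch of steps S103A-1 and S103A-2 follows, reusing the hypothetical BertInputVectors and PriorFeatureExtractor sketches above, with a standard transformer encoder standing in for the Bert layer:

    # Illustrative sketch: fuse the word, position, segment and prior vectors by
    # summation and feed the fused vector to an encoder standing in for the Bert layer.
    # (BertInputVectors and PriorFeatureExtractor are the sketches given earlier.)
    import torch.nn as nn

    class TextVectorModel(nn.Module):
        def __init__(self, seq_len, hidden=768):
            super().__init__()
            self.inputs = BertInputVectors(hidden=hidden)
            self.prior = PriorFeatureExtractor(seq_len, hidden)
            layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=12)

        def forward(self, token_ids, segment_ids, word_matrix):
            word, pos, seg = self.inputs(token_ids, segment_ids)
            prior = self.prior(word_matrix)
            fused = word + pos + seg + prior     # S103A-1: fusion by summation
            return self.encoder(fused)           # S103A-2: text vector of the sequence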
It should also be understood that the text vector model is obtained by training a model to be trained using sample text, wherein the model to be trained includes a feature extraction layer and a Bert layer. Therefore, the present embodiment further provides a model training method applied to a model training device, as shown in fig. 7, the method includes:
S101B, obtaining a sample word matrix of the sample sequence and a word vector, a position vector and a segment vector of the sample sequence.
The sample word matrix represents the words that the texts of the sample sequence can form, and the word vector, position vector and segment vector of the sample sequence are generated according to the Bert layer's convention for input vectors. In addition, the sample word matrix is constructed in the same way as the word matrix of the text sequence during use of the text vector model, so the construction is not repeated here.
S102B, inputting the sample word matrix into the feature extraction layer, and obtaining the prior vector of the sample sequence.
S103B, inputting the word vector, the position vector, the segment vector and the prior vector of the sample sequence into the Bert layer to obtain the calculation result of the Bert layer on the sample sequence.
S104B, judging whether the calculation result meets a preset convergence condition; if so, step S106B is executed, otherwise, step S105B is executed, and the process returns to step S102B.
And S105B, adjusting the model parameters of the Bert layer and the feature extraction layer according to the calculation result.
In this embodiment, the parameters of the model to be trained are adjusted during training using an existing training mode of the Bert model and an existing loss function, so the related implementation details are not repeated here.
And S106B, taking the trained model to be trained as the text vector model.
Therefore, after the model to be trained is trained through the above embodiment, the trained text vector model can convert the text sequence by using the prior vector of the text sequence as auxiliary information.
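A hedged sketch of the training loop S101B-S106B follows, using the hypothetical TextVectorModel sketched above as the model to be trained; the optimizer, the stand-in prediction head and the loss-threshold convergence test are assumptions, since the application only states that an existing Bert training mode and loss function are reused.

    # Illustrative training-loop sketch for S101B-S106B (assumptions: PyTorch, a
    # stand-in token-prediction head, loss-threshold convergence test).
    import torch

    def train(model, dataloader, vocab_size=21128, max_steps=100000, target_loss=0.1):
        lm_head = torch.nn.Linear(768, vocab_size)                 # stand-in prediction head
        params = list(model.parameters()) + list(lm_head.parameters())
        optimizer = torch.optim.AdamW(params, lr=1e-5)
        criterion = torch.nn.CrossEntropyLoss()
        for step, (token_ids, segment_ids, word_matrix, labels) in enumerate(dataloader):
            hidden = model(token_ids, segment_ids, word_matrix)    # S102B + S103B
            logits = lm_head(hidden)                               # (batch, seq_len, vocab)
            loss = criterion(logits.flatten(0, 1), labels.flatten())
            if loss.item() < target_loss or step >= max_steps:     # S104B: convergence check
                break                                              # S106B: keep trained model
            optimizer.zero_grad()
            loss.backward()                                        # S105B: adjust parameters
            optimizer.step()
        return model    # the trained model is used as the text vector model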
Based on the same inventive concept as the text vector generation method, this embodiment also provides a related apparatus, as follows:
This embodiment provides a text vector generation apparatus applied to a text processing device, where the text processing device is configured with a text vector model and the text vector model includes a Bert layer. The text vector generation apparatus includes at least one functional module that can be stored in a memory in the form of software. As shown in fig. 8, divided by function, the text vector generation apparatus may include:
the prior information module 201 is configured to obtain a prior vector of the text sequence, where the prior vector carries prior information of a vocabulary, and the vocabulary is composed of texts in the text sequence.
In this embodiment, the prior information module 201 is configured to implement step S101A in fig. 2, and as for the prior information module 201, the detailed description may refer to the detailed description of step S101A in fig. 2.
The vector generating module 202 is configured to generate a word vector, a position vector, and a segment vector of the text sequence according to the convention of the Bert layer on the input vector.
The vector generation module 202 is further configured to input the word vector, the position vector, the segment vector, and the prior vector of the text sequence into the Bert layer, so as to obtain a text vector of the text sequence.
In this embodiment, the vector generation module 202 is configured to implement steps S102A-S103A in fig. 2; for details of the vector generation module 202, reference may be made to the detailed description of steps S102A-S103A in fig. 2.
Furthermore, it is noted that in some embodiments, the text vector generation apparatus may further include other software functional modules for implementing other steps or sub-steps of the text vector generation method. In other embodiments, the above prior information module 201 and the vector generation module 202 can also be used to implement other steps or sub-steps of the text vector generation method, and this embodiment is not particularly limited thereto.
The embodiment further provides a text processing device, which includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the text vector generation method is implemented.
The present embodiment also provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for generating a text vector is implemented.
It should be noted that the terms "first," "second," "third," and the like are used merely to distinguish one description from another, and are not intended to indicate or imply relative importance. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the embodiments provided in the present application, it should also be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A text vector generation method is applied to a text processing device, the text processing device is provided with a text vector model, the text vector model comprises a Bert layer, and the method comprises the following steps:
acquiring a prior vector of a text sequence, wherein the prior vector carries prior information of a vocabulary, and the vocabulary is formed by texts in the text sequence;
generating a word vector, a position vector and a segment vector of the text sequence according to the convention of the Bert layer on an input vector;
and inputting the word vector, the position vector, the segment vector and the prior vector of the text sequence into the Bert layer to obtain the text vector of the text sequence.
2. The method of claim 1, wherein the obtaining a prior vector of a text sequence comprises:
acquiring a word matrix of the text sequence, wherein the word matrix represents words which can be formed by texts in the text sequence;
converting the word matrix into the prior vector, wherein the prior vector has the same vector dimensions as a word vector, a position vector, and a segment vector of the text sequence.
3. The text vector generation method of claim 2, wherein the text vector model further comprises a feature extraction layer, and wherein converting the word matrix into the prior vector comprises:
and inputting the word matrix into the feature extraction layer to obtain the prior vector.
4. The method of claim 2, wherein the obtaining a word matrix of the text sequence comprises:
acquiring the text sequence;
constructing a plurality of square matrixes to be initialized according to the text sequences, wherein rows and columns of each square matrix respectively correspond to the text sequences one by one;
respectively selecting different texts for the square matrixes from the text sequence as initial texts of the square matrixes, wherein the initial texts of each square matrix and the texts corresponding to each line of the square matrixes form a plurality of text pairs;
initializing each line of the square matrix according to text segments intercepted from the text sequence by the text pairs of each line of the square matrix, wherein if the text segments intercepted from the text sequence by the text pairs of each line can form words, the position corresponding to the text segments in the line is initialized to a first preset numerical value, and the position not corresponding to the text segments in the line is initialized to a second preset numerical value;
and taking all initialized matrixes as word matrixes of the text sequences.
5. The method of generating text vectors according to claim 1, wherein the inputting the word vector, the position vector, the segment vector, and the prior vector of the text sequence into the Bert layer to obtain the text vector of the text sequence comprises:
fusing the word vector, the position vector, the segment vector and the prior vector of the text sequence to obtain a fused vector;
and inputting the fusion vector into the Bert layer to obtain a text vector of the text sequence.
6. The method of claim 5, wherein the fusing the word vector, the position vector, the segment vector, and the prior vector of the text sequence to obtain a fused vector comprises:
and fusing the word vector, the position vector, the segment vector and the prior vector of the text sequence in a summing mode to obtain the fusion vector.
7. A model training method is applied to training equipment, the training equipment is provided with a model to be trained, the model to be trained comprises a feature extraction layer and a Bert layer which are sequentially connected, and the method comprises the following steps:
acquiring a sample word matrix of a sample sequence and a word vector, a position vector and a segment vector of the sample sequence, wherein the sample word matrix represents words which can be formed by texts of the sample sequence, and the word vector, the position vector and the segment vector of the sample sequence are generated according to the convention of the Bert layer on an input vector;
inputting the sample word matrix into the feature extraction layer to obtain a prior vector of the sample sequence;
inputting the word vector, the position vector, the segment vector of the sample sequence and the prior vector of the sample sequence into the Bert layer to obtain a calculation result of the Bert layer on the sample sequence;
if the calculation result does not meet the preset convergence condition, returning to the step of inputting the sample word matrix to the feature extraction layer and obtaining the prior vector of the sample sequence after adjusting the model parameters of the Bert layer and the feature extraction layer according to the calculation result;
and if the calculation result meets the preset convergence condition, taking the trained model to be trained as the text vector model.
8. A text vector generation apparatus applied to a text processing device configured with a text vector model including a Bert layer, the text vector generation apparatus comprising:
a prior information module, configured to acquire a prior vector of a text sequence, where the prior vector carries prior information of a vocabulary, and the vocabulary is composed of texts in the text sequence;
the vector generation module is used for generating a word vector, a position vector and a segment vector of the text sequence according to the convention of the Bert layer on an input vector;
the vector generation module is further configured to input the word vector, the position vector, the segment vector, and the prior vector of the text sequence to the Bert layer, so as to obtain a text vector of the text sequence.
9. A text processing apparatus comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the text vector generation method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the text vector generation method of any one of claims 1 to 6.
CN202210290851.9A 2022-03-23 2022-03-23 Text vector generation method, model training method and related device Pending CN114611511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210290851.9A CN114611511A (en) 2022-03-23 2022-03-23 Text vector generation method, model training method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210290851.9A CN114611511A (en) 2022-03-23 2022-03-23 Text vector generation method, model training method and related device

Publications (1)

Publication Number Publication Date
CN114611511A true CN114611511A (en) 2022-06-10

Family

ID=81865138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210290851.9A Pending CN114611511A (en) 2022-03-23 2022-03-23 Text vector generation method, model training method and related device

Country Status (1)

Country Link
CN (1) CN114611511A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392192A (en) * 2022-10-27 2022-11-25 北京中科汇联科技股份有限公司 Text coding method and system for hybrid neural network and character information
CN115392192B (en) * 2022-10-27 2023-01-17 北京中科汇联科技股份有限公司 Text coding method and system for hybrid neural network and character information

Similar Documents

Publication Publication Date Title
WO2020238985A1 (en) Model training method, dialogue generation method, apparatus and device, and storage medium
CN111090736B (en) Question-answering model training method, question-answering method, device and computer storage medium
CN113011189A (en) Method, device and equipment for extracting open entity relationship and storage medium
CN110263353B (en) Machine translation method and device
CN110147435B (en) Dialogue generation method, device, equipment and storage medium
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
US20230127787A1 (en) Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium
CN111813923A (en) Text summarization method, electronic device and storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN110942774A (en) Man-machine interaction system, and dialogue method, medium and equipment thereof
CN116467417A (en) Method, device, equipment and storage medium for generating answers to questions
CN115312034A (en) Method, device and equipment for processing voice signal based on automaton and dictionary tree
CN114611511A (en) Text vector generation method, model training method and related device
CN113689868B (en) Training method and device of voice conversion model, electronic equipment and medium
CN112800339B (en) Information stream searching method, device and equipment
CN113793599A (en) Training method of voice recognition model and voice recognition method and device
US20230317058A1 (en) Spoken language processing method and apparatus, and storage medium
CN113486160B (en) Dialogue method and system based on cross-language knowledge
CN112686059B (en) Text translation method, device, electronic equipment and storage medium
CN113569585A (en) Translation method and device, storage medium and electronic equipment
CN112560466A (en) Link entity association method and device, electronic equipment and storage medium
CN111091011A (en) Domain prediction method, domain prediction device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination