CN111581335B - Text representation method and device - Google Patents

Text representation method and device

Info

Publication number
CN111581335B
CN111581335B (application CN202010406112.2A)
Authority
CN
China
Prior art keywords
word
text
vector
vector representation
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010406112.2A
Other languages
Chinese (zh)
Other versions
CN111581335A (en)
Inventor
李伟康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010406112.2A
Publication of CN111581335A
Application granted
Publication of CN111581335B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the field of computer technology, and in particular to a text representation method and device. The method obtains the character vector representation of each character in the text to be processed; obtains the original word vector representation of each segmented word in the text to be processed; fuses the character vector representations of the characters with the original word vector representation of the corresponding segmented word to obtain a fused vector representation of each word; and obtains the text vector representation of the text to be processed according to the fused vector representations of the words. By fusing characters and words in this way, the information carried by the text representation is enriched and the accuracy of the text representation is improved.

Description

Text representation method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text representation method and apparatus.
Background
A text representation method is a way of vectorizing text: the text must be represented as a vector containing its semantic information so that it can be used in applications such as classification, retrieval, and recommendation. How to represent text accurately is therefore a key problem.
In the related art, text representation methods mainly take a character or a word as the smallest unit, namely the meta unit, convert that unit into a vector representation, and then use a network to obtain the overall vector representation of a sentence.
Disclosure of Invention
The embodiment of the application provides a text representation method and a text representation device, which are used for improving the accuracy of text representation.
The specific technical scheme provided by the embodiment of the application is as follows:
one embodiment of the present application provides a text representing method, including:
obtaining a character vector representation of each character in the text to be processed;
obtaining the original word vector representation of each segmented word in the text to be processed;
fusing the character vector representations of the characters with the original word vector representation of the corresponding segmented word to obtain a fused vector representation of each segmented word;
and obtaining the text vector representation of the text to be processed according to the fused vector representations of the segmented words.
Another embodiment of the present application provides a text representing apparatus including:
a first obtaining module, configured to obtain a character vector representation of each character in the text to be processed;
a second obtaining module, configured to obtain the original word vector representation of each segmented word in the text to be processed;
a fusion module, configured to fuse the character vector representations of the characters with the original word vector representation of the corresponding segmented word to obtain a fused vector representation of each segmented word;
and a third obtaining module, configured to obtain the text vector representation of the text to be processed according to the fused vector representations of the segmented words.
Another embodiment of the application provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the text representation methods described above when the program is executed.
Another embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the text representation methods described above.
In the embodiments of the application, the character vector representation of each character and the original word vector representation of each segmented word in the text to be processed are obtained; the character vector representations are fused with the corresponding original word vector representations to obtain a fused vector representation of each segmented word; and the text vector representation of the text to be processed is obtained from these fused vector representations. By fusing characters and words, the character information in the text is fully mined and the relationships among the characters within a word are taken into account, so a more accurate and information-rich meta-unit representation is obtained, the information representation of the text is enriched, and the accuracy of the text vector representation is improved.
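The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the two-dimensional embeddings are made up, element-wise addition stands in for the fusion operations, and mean pooling stands in for the neural network modeling.

```python
# Hypothetical pretrained embeddings (2-dimensional for illustration only).
CHAR_VECS = {"你": [0.2, 0.4], "好": [0.1, 0.3], "吗": [0.5, 0.1]}
WORD_VECS = {"你好": [0.3, 0.5], "吗": [0.4, 0.2]}

def fuse(char_vecs, word_vec):
    # Step 1 of the fusion: combine the character vectors of one word
    # (vector addition is one of the operations listed in the embodiments).
    inter = [sum(dims) for dims in zip(*char_vecs)]
    # Step 2: fuse the intermediate vector with the original word vector
    # (again element-wise addition, as one possible choice).
    return [a + b for a, b in zip(inter, word_vec)]

def text_vector(words):
    # Obtain the fused vector of each segmented word, then pool them into a
    # single text vector (mean pooling stands in for the neural network).
    fused = [fuse([CHAR_VECS[c] for c in w], WORD_VECS[w]) for w in words]
    n = len(fused)
    return [sum(dims) / n for dims in zip(*fused)]

print(text_vector(["你好", "吗"]))  # one vector for the whole text
```

With the toy embeddings above, "你好" fuses to roughly [0.6, 1.2] and "吗" to [0.9, 0.3], giving a text vector of [0.75, 0.75].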
Drawings
FIG. 1 is a schematic diagram of an application architecture of a text representation method according to an embodiment of the present application;
FIG. 2 is a flow chart of a text representation method in an embodiment of the application;
FIG. 3 is a diagram illustrating a vector subtraction operation according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a vector multiplication operation according to an embodiment of the present application;
FIG. 5 is a diagram illustrating a vector addition operation according to an embodiment of the present application;
FIG. 6 is a schematic diagram of vector concatenation operation according to an embodiment of the present application;
FIG. 7 is a schematic diagram of fusion operations by RNN model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a fusion operation by a CNN model in an embodiment of the present application;
FIG. 9 is a schematic diagram of a fusion operation by a feedforward neural network model in an embodiment of the present application;
FIG. 10 is a diagram illustrating the tensor inner product calculation operation according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a text representation device according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the embodiments of the present application with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
To facilitate an understanding of embodiments of the present application, several concepts will be briefly described as follows:
short text: the text with too short length in the video field can be a text with shorter length in other application fields, such as question and answer sentences in intelligent equipment, and the text to be processed in the embodiment of the application mainly aims at short text, can solve the problem of insufficient information of the short text, enriches the information representation of the short text, and is certainly not limited to the short text.
Word fusion: information representing the full mining and exploitation of words and phrases.
The element: the smallest unit constituting a sentence, e.g., a word.
Text representation: the representation is vectorization of the text, which is represented as a vector containing semantic information.
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, and is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use daily, so it is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques. The embodiments of the application mainly involve NLP: segmenting the text to be processed, encoding its characters and segmented words, fusing characters and words (using a neural network model where applicable) to obtain fused meta-unit representations of the text, and modeling the whole text to obtain a text vector representation. The resulting text vector representation can then be used with techniques such as sentence classification and machine translation to recognize, classify, or translate the text to be processed.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart healthcare, and smart customer service. As the technology develops, it will be applied in more fields and become increasingly important.
The scheme provided by the embodiment of the application mainly relates to the natural language processing technology of artificial intelligence, and is specifically described by the following embodiments:
The vector representation of text is very important in business applications such as classification, retrieval, and recommendation. In the related art, a text representation method mainly takes a character or a word as the meta unit, converts it into a one-hot encoding through a dictionary or into a dense vector by means such as word2vec, and then uses a neural network to model the sentence as a whole to obtain the final text vector representation. However, directly taking a single character or word as the meta unit ignores the information between characters and words, and a single character is highly ambiguous, so the text information cannot be represented accurately. For example, a word may consist of several characters; if a single character is used as the meta unit, the current context information cannot be represented accurately. This is especially true for short text: because its content is limited, fully mining the fusion information between characters and words is very important.
In view of the above problems, the embodiments of the present application provide a new text representation method. It obtains the character vector representation of each character and the original word vector representation of each segmented word in the text to be processed, fuses the character vector representations with the original word vector representation of the corresponding segmented word to obtain a fused vector representation of each word, and then obtains the text vector representation of the text to be processed from the fused vector representations. By combining character information, word information, and their fusion, the vector representation of each meta unit of the text (i.e., the fused vector representation of each segmented word) carries richer information, and the final overall text vector representation is more accurate and of higher quality.
Referring to fig. 1, an application architecture diagram of a text representation method in an embodiment of the present application includes a terminal 100 and a server 200.
The terminal 100 may be any intelligent device, such as a smart phone, tablet computer, portable personal computer, desktop computer, smart television, or smart robot, and various applications (APPs) may be installed on it. For example, when a user wants to search for a video through a video APP on the terminal 100, the user inputs a search text and the terminal 100 sends it to the server 200. The server 200 fuses character and word information based on the text representation method in the embodiments of the application, obtains the fused vector representation of each word of the search text and then its text vector representation, searches for videos matching that representation, and returns the results to the terminal 100, which displays them.
The server 200 can provide various network services for the terminal 100; for different applications, the server 200 can be regarded as the corresponding background server. The server 200 may be a single server, a server cluster composed of several servers, or a cloud computing center.
The terminal 100 and the server 200 may be connected through the Internet to communicate with each other. Optionally, the network uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), Metropolitan Area Network (MAN), Wide Area Network (WAN), mobile, wired, or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over the network is represented using techniques and/or formats such as HyperText Markup Language (HTML) and Extensible Markup Language (XML). All or some of the links may also be encrypted using conventional techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may be used in place of, or in addition to, the above.
It should be noted that the text representation method in the embodiments of the present application may be executed by the server 200, by the terminal 100, or by both, which is not limited here. Taking the server 200 as an example: it obtains the text to be processed from the terminal 100, determines the character vector representation of each character and the original word vector representation of each segmented word, performs character-word fusion to obtain the fused vector representation of each word, obtains the text vector representation of the text to be processed from those fused representations, and then performs the related service processing based on that text vector representation.
It should also be noted that the application architecture diagram in the embodiments of the present application is intended to illustrate the technical solution more clearly and does not limit it. The solution is not limited to the representation of short text, nor to business fields such as video, intelligent customer service, and translation; for other application architectures and business applications with similar problems, it is equally applicable.
In various embodiments of the present application, a text representation method is schematically illustrated as applied to the application architecture shown in fig. 1.
Based on the above embodiments, referring to fig. 2, a flowchart of a text representation method according to an embodiment of the present application is shown, where the method is described by taking a server as an example, and specifically the method includes:
step 200: a word vector representation of each word in the text to be processed is obtained.
Specifically, the text to be processed is word-divided, and each word is encoded based on a trained machine learning model, and mapped into a word vector representation containing context information.
The machine learning model may be a word2vec model, a GloVe model, etc., which is not limited in the embodiments of the application. The model is pre-trained, and each character is mapped to a character vector representation using the vector table obtained by pre-training. The word2vec model uses context information to map a character or word into a K-dimensional vector space to obtain its vector representation. The GloVe model instead analyzes co-occurrence relations in the context and derives vectors from them: co-occurrence means that one word appears near another, where "near" is defined by a moving window. After the window radius (the distance from the center word to the edge) is fixed, the words falling within that window around the center word are counted as co-occurring with it.
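The moving-window co-occurrence counting described above can be sketched as follows. This is a toy illustration: the `cooccurrence` function and the example tokens are invented for this sketch and are not part of any GloVe implementation.

```python
from collections import Counter

def cooccurrence(tokens, radius):
    # Count how often each ordered pair of tokens appears within `radius`
    # positions of each other -- the moving-window notion described above.
    counts = Counter()
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, tokens[j])] += 1
    return counts

counts = cooccurrence(["I", "love", "NLP", "I", "love", "code"], radius=1)
print(counts[("I", "love")])  # "love" appears next to "I" twice
```

GloVe then factorizes statistics derived from such a co-occurrence matrix to obtain the word vectors; the counting step above is only the first stage.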
Step 210: the original word vector representation of each segmented word in the text to be processed is obtained.
When executing step 210, the method specifically includes: segmenting the text to be processed with a word segmentation tool to obtain its segmented words, and encoding each segmented word based on a trained machine learning model to obtain its original word vector representation.
The word segmentation tool can be the jieba tool. jieba segmentation is mainly based on a statistical dictionary: it constructs a prefix dictionary, uses it to enumerate all possible splits of the text, builds a directed acyclic graph (DAG) from the split positions, and computes the maximum-probability path with a dynamic programming algorithm to obtain the final segmentation.
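The dictionary-plus-dynamic-programming idea can be illustrated with a toy maximum-probability segmenter. The dictionary and frequencies below are hypothetical, not jieba's own data, and real jieba additionally builds an explicit DAG and handles out-of-vocabulary words with an HMM.

```python
import math

# Hypothetical word-frequency dictionary (stand-in for a statistical dictionary).
FREQ = {"研究": 40, "研究生": 15, "生命": 30, "命": 5,
        "研": 3, "生": 8, "的": 50, "起源": 20}
TOTAL = sum(FREQ.values())

def segment(text):
    n = len(text)
    # best[i] = (log-probability, segmentation) of the best split of text[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - 4), i):          # candidate words up to 4 chars
            word = text[j:i]
            if word in FREQ and best[j][1] is not None:
                score = best[j][0] + math.log(FREQ[word] / TOTAL)
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best[n][1]

# The classic ambiguous example: "研究/生命" beats "研究生/命" on probability.
print(segment("研究生命的起源"))
```

Under these toy frequencies the dynamic program prefers the split ["研究", "生命", "的", "起源"], since 研究 and 生命 are far more frequent than 研究生 and 命.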
The machine learning model used for encoding can likewise be a word2vec model, a GloVe model, etc., which is not limited in the embodiments of the application. The model is pre-trained, and each segmented word is mapped to a word vector containing context information using the vector table obtained by pre-training.
Of course, for obtaining the character vector representation of each character and the original word vector representation of each segmented word in the text to be processed, the specific implementation is not limited, and related techniques may be used.
Step 220: the character vector representations of the characters are fused with the original word vector representation of the corresponding segmented word to obtain the fused vector representation of each segmented word.
When executing step 220, the method specifically includes:
S1: for each segmented word, fuse the character vector representations of the characters that make up the word to obtain an intermediate vector representation of the word.
S2: for each segmented word, fuse its original word vector representation with its intermediate vector representation to obtain its fused vector representation.
That is, the main purpose in the embodiments of the application is to fuse character and word information. First, the character vector representations of a word's characters are fused into a new, character-based intermediate vector representation of the word; then the original word vector representation and the intermediate vector representation are fused into the final fused vector representation of the word. In this way, the character and word information in the text is fully mined, the meta-unit information representation is enriched, and the accuracy of the meta-unit representation is improved.
Step 230: the text vector representation of the text to be processed is obtained according to the fused vector representations of the segmented words.
In the embodiments of the application, after the fused vector representation of each word of the text to be processed is obtained, each fused vector representation is used as a meta-unit representation, and a neural network model is used to model the text and obtain its text vector representation.
The neural network model may be a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), a recursive neural network, an attention network, a graph neural network, etc., which is not limited in the embodiments of the application; the text to be processed is modeled with the neural network model to obtain its overall text vector representation.
In the embodiments of the application, the character vector representation of each character and the original word vector representation of each segmented word in the text to be processed are obtained; the character vector representations are fused with the corresponding original word vector representations to obtain the fused vector representation of each word; and the text vector representation of the text to be processed is then obtained from these fused representations. By fusing characters and words, the character and word information in the text is fully mined, more accurate and richer meta-unit representations are obtained, and the information representation of the text is enriched; in particular, for short text with insufficient information, the text can be represented more accurately.
Based on the above embodiments, the implementations of the intermediate vector representation and the fused vector representation in step 220 are described below in two parts:
First part: for each segmented word, fuse the character vector representations of its characters to obtain the intermediate vector representation of the word.
Specifically, S1, fusing the character vector representations of the word's characters means performing one or more of the following operations on them: vector subtraction, vector multiplication, vector addition, vector concatenation, and vector fusion by inputting the vectors into a neural network model.
For example, referring to fig. 3, a schematic diagram of the vector subtraction operation: the corresponding elements of the character vector representations of the word's characters are subtracted one by one to obtain a new vector. For instance, the word "hello" consists of the characters "you" and "good" (a literal rendering of the original Chinese characters); denoting the character vector of "you" as a1 and that of "good" as a2, subtracting the corresponding elements of a1 and a2 yields a new vector.
For example, referring to fig. 4, a schematic diagram of the vector multiplication operation: the corresponding elements of the character vectors are multiplied one by one to obtain a new vector.
For example, referring to fig. 5, a schematic diagram of the vector addition operation: the corresponding elements of the character vectors are added one by one to obtain a new vector.
For example, referring to fig. 6, a schematic diagram of the vector concatenation operation: the character vectors are joined end to end to obtain a new vector. For the word "hello" with characters "you" (vector a1) and "good" (vector a2), concatenating a1 and a2 yields the new vector a1a2.
Inputting into a trained neural network model specifically means: for each segmented word, the character vector representations of its characters are input into the trained neural network to obtain a new vector for the word.
In the embodiments of the application, the neural network model may be an RNN, a CNN, a feedforward neural network, etc., without limitation; by modeling with the neural network model, the character vector representations of a word can be fused into a new, character-based vector for the word.
For example, referring to fig. 7, a schematic diagram of the fusion operation by an RNN model: the character vectors of a word's characters are input into the RNN model, which outputs the fused vector, i.e., the new vector of the word.
For example, referring to fig. 8, a schematic diagram of the fusion operation by a CNN model: the character vectors are input into the CNN model, which outputs the fused vector, i.e., the new vector of the word.
For example, referring to fig. 9, a schematic diagram of the fusion operation by a feedforward neural network model: the character vectors are input into the feedforward neural network model, which outputs the fused vector, i.e., the new vector of the word.
It should be noted that a segmented word may contain more than the two characters in the above examples; in general it contains two or more. Neural network models work well when the input consists of several vectors, so they can be used to fuse the character vector representations of a word. Vector subtraction, vector addition, vector multiplication, and vector concatenation likewise apply to several vectors and can fuse them into one, thereby fusing the information of the characters that make up the word.
Of course, the embodiments of the present application are not limited to the above fusion operations; other manners may also be used without limitation.
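The subtraction, multiplication, addition, and concatenation operations of figs. 3 to 6 reduce to simple element-wise arithmetic. A minimal sketch with toy two-dimensional vectors (real character vectors would be K-dimensional):

```python
def vec_sub(a, b):     # fig. 3: subtract corresponding elements one by one
    return [x - y for x, y in zip(a, b)]

def vec_mul(a, b):     # fig. 4: multiply corresponding elements one by one
    return [x * y for x, y in zip(a, b)]

def vec_add(a, b):     # fig. 5: add corresponding elements one by one
    return [x + y for x, y in zip(a, b)]

def vec_concat(a, b):  # fig. 6: join the vectors end to end
    return a + b

a1, a2 = [1.0, 2.0], [3.0, 5.0]  # e.g. vectors of the two characters of a word
print(vec_sub(a1, a2), vec_mul(a1, a2), vec_add(a1, a2), vec_concat(a1, a2))
```

Note that subtraction, multiplication, and addition preserve the vector dimension, while concatenation doubles it; any of the four yields a single fused vector per word.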
S2, obtaining intermediate vector representations corresponding to the segmentation words.
Specifically, two cases can be distinguished:
First case: if only one operation is performed, the character vector representation obtained after that operation for each word segment is used as the intermediate vector representation of the corresponding segment.
That is, in the embodiment of the present application, a single operation may be used to fuse the character information. For example, if only vector addition is used, the character vector representations of the characters corresponding to a word segment are added, and the resulting vector is used as the intermediate vector representation of that segment.
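A minimal sketch of this single-operation case, assuming vector addition is the chosen operation (the values are toy numbers, not real embeddings):

```python
import numpy as np

# Character vectors of the two characters that make up one word segment
# (toy 4-dimensional vectors; real embeddings would come from a trained table).
c1 = np.array([0.1, 0.2, 0.3, 0.4])
c2 = np.array([0.4, 0.3, 0.2, 0.1])

# Single-operation fusion: the element-wise sum directly serves as the
# intermediate vector representation of the word segment.
intermediate = c1 + c2
```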
Second case: if at least two operations are performed, the character vector representations obtained after each operation for a word segment are respectively obtained, and these results are concatenated to obtain the intermediate vector representation of the corresponding segment.
In the embodiment of the present application, multiple operations may be applied at the same time during character vector fusion, and the vectors obtained from the respective operations are finally concatenated. For example, if vector addition and vector subtraction are both used, one new vector is obtained from the addition operation and another from the subtraction operation; the two new vectors are then concatenated to obtain the intermediate vector representation of the corresponding word segment.
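The multi-operation case can be sketched as follows, assuming vector addition and vector subtraction are the two chosen operations; the concluding "parallel operation" is modeled here as concatenation:

```python
import numpy as np

c1 = np.array([1.0, 2.0, 3.0])   # character vector of the first character
c2 = np.array([0.5, 1.0, 1.5])   # character vector of the second character

added = c1 + c2        # result of the vector-addition operation
subtracted = c1 - c2   # result of the vector-subtraction operation

# The per-operation results are concatenated into one intermediate
# vector representation of the word segment.
intermediate = np.concatenate([added, subtracted])
```

Note the dimensionality grows with the number of operations: two operations over 3-dimensional vectors yield a 6-dimensional intermediate vector.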
In addition, if a word segment corresponds to only one character, the character vector can be used directly. For example, after segmenting the text to be processed "I love the sky", the segments obtained are "I", "love", and "the sky"; the segments "I" and "love" each correspond to a single character, so the character vectors of "I" and "love" can directly serve as the intermediate vector representations of the segments "I" and "love".
A second part: and respectively carrying out fusion processing on the original word vector representation and the intermediate vector representation of each word segment to obtain the fusion vector representation of each word segment.
Specifically:
S1, carrying out fusion processing on the original word vector representation and the intermediate vector representation of a word segment, which specifically comprises: performing one or more of the following operations on the two representations: vector subtraction operation, vector multiplication operation, vector addition operation, vector concatenation operation, tensor inner product calculation operation.
When the original word vector representation and the intermediate vector representation of a word segment are fused, the vector subtraction, multiplication, addition, and concatenation operations follow the same principles as in figs. 3-6, except that the fusion here is performed on exactly two vectors.
For the tensor inner product calculation operation, see fig. 10, a schematic diagram of the tensor inner product calculation in an embodiment of the present application: the tensor inner product of the original word vector representation and the intermediate vector representation of a word segment is calculated, and the result serves as the fusion vector representation of the segment.
It should be noted that, since fusing the original word vector representation with the intermediate vector representation involves exactly two vectors, and the tensor inner product is generally simple and efficient to compute for two vectors, it is well suited to computing the fusion vector representation of a word segment; vector subtraction, multiplication, addition, and concatenation are likewise applicable to two-vector computation.
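The patent does not spell out the exact tensor computation, so the sketch below shows one plausible reading: the tensor (outer) product of the two vectors, flattened into a fusion vector, with the plain inner product shown for contrast (it would collapse the pair to a single scalar):

```python
import numpy as np

word_vec = np.array([1.0, 2.0])        # original word vector representation (toy values)
intermediate = np.array([0.5, -1.0])   # intermediate vector from character fusion

# One reading of "tensor inner product": take the tensor (outer) product of
# the two vectors and flatten it, so every pairwise interaction between the
# two representations appears in the fused vector.
fusion = np.outer(word_vec, intermediate).ravel()

# The plain inner product, by contrast, collapses the pair to a scalar score.
score = float(np.dot(word_vec, intermediate))
```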
In addition, the embodiment of the present application is not limited to the above fusion operations; the original word vector representation and the intermediate vector representation may also be fused in other manners, which is not limited here.
S2, obtaining the fusion vector representation of each word segment.
Specifically, two cases can be distinguished:
First case: if only one operation is performed, the result of that operation on the original word vector representation and the intermediate vector representation of each word segment is used as the fusion vector representation of the corresponding segment.
Second case: if at least two operations are performed, the results of each operation on the original word vector representation and the intermediate vector representation of a word segment are respectively obtained, and these results are concatenated to obtain the fusion vector representation of the corresponding segment.
That is, in the embodiment of the present application, one or more operation modes may be adopted to fuse the original word vector representation with the intermediate vector representation, which is not limited here.
In this way, the embodiment of the present application uses multiple mechanisms for information fusion: the characters of each word segment are fused first, and the resulting intermediate vector representation is then fused with the original word vector representation to obtain the final fusion vector representation of the segment. Character information in the text can thus be fully mined, a richer meta-unit representation obtained, and the accuracy of the meta-unit representation improved.
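The two-stage pipeline summarized above can be sketched end to end, assuming addition for the character-level stage and concatenation for the word-level stage (both choices are ours for illustration; any of the operations listed earlier could be substituted):

```python
import numpy as np

def fuse_segment(char_vecs, word_vec):
    """Two-stage fusion sketch: (1) sum the character vectors of the segment
    to get the intermediate vector (a single-character segment keeps its
    character vector directly, as in the "I love the sky" example); then
    (2) concatenate the original word vector with the intermediate vector
    to get the fusion vector representation of the segment."""
    intermediate = char_vecs[0] if len(char_vecs) == 1 else np.sum(char_vecs, axis=0)
    return np.concatenate([word_vec, intermediate])

# One segment of two characters, with toy 2-dimensional embeddings.
chars = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
word = np.array([0.5, 0.5])
fused = fuse_segment(chars, word)
```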
Further, for different application scenarios, other information can be fused into the text vector representation of the text to be processed to enrich the information the representation carries. The embodiment of the present application provides several possible implementations:
First embodiment: according to the fusion vector representation of each word, obtaining a text vector representation of the text to be processed, specifically comprising:
obtaining user portrait characteristic information of a user corresponding to the text to be processed; and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the user portrait characteristic information.
For example, short texts in the video field are often related to the corresponding video content, and different users hold different attitudes, such as like or dislike, toward the same video. Therefore, when a short text is represented as a vector, user portrait information, such as age, occupation, hobbies, and gender, can be fused into the representation of the short text; the embodiment of the present application is not limited in this respect.
Specifically, for example, the user portrait characteristic information is obtained and modeled by a neural network model, so that the various user portrait features are fused into one vector; this vector and the fusion vector representations of the word segments are then input into the neural network model together to obtain the text vector representation of the final text to be processed.
In this way, user portrait characteristic information is fused into the text vector representation of the text to be processed, so that a personalized text representation method can be constructed, more accurate content can be provided when users search for videos or when the platform recommends to users, and the user experience is improved.
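A hedged sketch of the portrait-fusion idea, where `text_vector_with_profile` and the single linear layer are hypothetical stand-ins for the neural network model mentioned above:

```python
import numpy as np

def text_vector_with_profile(segment_fusion_vecs, profile_vec, W):
    """Hypothetical sketch: average the per-segment fusion vectors, append
    the user-profile feature vector, and project the result through one
    linear layer standing in for the patent's neural network model."""
    pooled = np.mean(segment_fusion_vecs, axis=0)
    combined = np.concatenate([pooled, profile_vec])
    return np.tanh(W @ combined)

segments = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]  # fusion vectors of two segments
profile = np.array([0.0, 1.0])   # e.g. encoded age/gender/interest features
W = np.eye(2, 4)                 # toy projection weights, not trained parameters
text_vec = text_vector_with_profile(segments, profile, W)
```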
Second embodiment: according to the fusion vector representation of each word, obtaining a text vector representation of the text to be processed, specifically comprising:
obtaining context multi-modal information of a text to be processed; and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the context multi-modal information.
The context multi-modal information may be a picture, a video, an audio, etc., which is not limited in the embodiment of the present application.
In the embodiment of the present application, the multi-modal information present in the context environment of the text to be processed can also be considered. For example, the context multi-modal information and the fusion vector representations of the word segments are input into a neural network model for modeling, and the text vector representation of the final text to be processed is output. Because context multi-modal information is fused into the text vector representation, the vectorized representation of the text can be further enriched.
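Similarly, the context-fusion idea might be sketched as follows; the mean-pooling and the assumption that modality embeddings are already projected to the text dimension are ours, not the patent's:

```python
import numpy as np

def text_vector_with_context(segment_fusion_vecs, modality_vecs):
    """Hypothetical sketch: pool the segment fusion vectors and the encoded
    context features (image/video/audio embeddings from upstream encoders,
    assumed pre-projected to the same dimension), then concatenate."""
    text_part = np.mean(segment_fusion_vecs, axis=0)
    context_part = np.mean(modality_vecs, axis=0)
    return np.concatenate([text_part, context_part])

segments = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]  # fusion vectors of two segments
context = [np.array([0.5, 0.5])]  # e.g. one frame embedding from the surrounding video
text_vec = text_vector_with_context(segments, context)
```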
Further, based on the above embodiments, specific application scenarios are described below. The text representation method in the embodiment of the present application may be applied, without limitation, to different service fields, such as video recommendation, classification, generation, identification, and retrieval. Therefore, after the text vector representation of the text to be processed is obtained, the embodiment of the present application further provides a possible implementation: performing corresponding service processing on the text to be processed according to its text vector representation and a specified service target.
Thus, the text representation method in the embodiment of the present application can enrich the text representation information and improve the accuracy and quality of the text representation. Further, when related service processing is performed based on the final text vector representation, the recognition and generation capability for the text of the related service can be optimized, and the accuracy and effect of text classification, search, recommendation, and the like can be improved.
Based on the same inventive concept, the embodiment of the application also provides a text representing device, which can be a hardware structure, a software module or a hardware structure plus a software module. Based on the above embodiments, referring to fig. 11, the text representing apparatus in the embodiment of the present application specifically includes:
a first obtaining module 1100, configured to obtain a word vector representation of each word in the text to be processed;
a second obtaining module 1110, configured to obtain an original word vector representation of each word segment in the text to be processed;
The fusion module 1120 is configured to fuse the word vector representation of each word with the corresponding primitive word vector representation of each word to obtain a fused vector representation of each word;
and a third obtaining module 1130, configured to obtain a text vector representation of the text to be processed according to the fused vector representation of each word segment.
Optionally, when the word vector representation of each word and the corresponding primitive word vector representation of each word are fused to obtain the fused vector representation of each word, the fusion module 1120 is specifically configured to:
for each word segment, respectively fusing the character vector representations of the characters corresponding to that segment to obtain the intermediate vector representation of the segment;
and respectively carrying out fusion processing on the original word vector representation and the intermediate vector representation of each word segment to obtain the fusion vector representation of each word segment.
Optionally, when the word vector representation of the word corresponding to the segmentation is fused, the fusion module 1120 is specifically configured to:
and representing the word vector of the word corresponding to the segmentation, and performing one or more of the following operations: vector subtraction operation, vector multiplication operation, vector addition operation, vector concatenation operation, and vector fusion operation by inputting the vector into a neural network model.
Optionally, when obtaining the intermediate vector representation corresponding to each word segment, the fusion module 1120 is specifically configured to:
if only one operation is performed, use the character vector representation after that operation for each word segment as the intermediate vector representation of the corresponding segment;
and if at least two operations are performed, respectively obtain the character vector representations after each operation for each word segment, and concatenate them to obtain the intermediate vector representation of the corresponding segment.
Optionally, when the word segmentation primitive word vector representation and the intermediate vector representation are fused, the fusion module 1120 is specifically configured to:
the original word vector representation and the intermediate vector representation of the word segmentation are subjected to one or more of the following operations: vector subtraction operation, vector multiplication operation, vector addition operation, vector concatenation operation, tensor inner product calculation operation.
Optionally, when obtaining the fusion vector representation of each word segment, the fusion module 1120 is specifically configured to:
if only one operation is performed, use the result of that operation on the original word vector representation and the intermediate vector representation of each word segment as the fusion vector representation of the corresponding segment;
and if at least two operations are performed, respectively obtain the results of each operation on the original word vector representation and the intermediate vector representation of each word segment, and concatenate them to obtain the fusion vector representation of the corresponding segment.
Optionally, when obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word, the third obtaining module 1130 is specifically configured to:
obtaining user portrait characteristic information of a user corresponding to the text to be processed;
and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the user portrait characteristic information.
Optionally, when obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word, the third obtaining module 1130 is specifically configured to:
obtaining context multi-modal information of the text to be processed;
and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the context multi-modal information.
In the embodiment of the present application, a richer meta-unit representation of the text to be processed can be obtained by fusing the information of each character and each word segment in the text, so that a more accurate text vector representation is obtained, the text representation is enriched, its quality is improved, and service applications such as recognition and generation of the text to be processed are facilitated.
Based on the above embodiments, referring to fig. 12, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown.
The embodiment of the present application provides an electronic device, which may be a terminal or a server; the embodiment is described taking a server as an example. The electronic device may include a processor 1210 (Central Processing Unit, CPU), a memory 1220, an input device 1230, an output device 1240, and so on.
The memory 1220 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor 1210 with program instructions and data stored in the memory 1220. In an embodiment of the present application, the memory 1220 may be used to store a program of any text representation method in the embodiment of the present application.
Processor 1210 is operative to perform any one of the text presentation methods of the embodiments of the present application in accordance with the obtained program instructions by invoking the program instructions stored in memory 1220.
Based on the above embodiments, in the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the text representation method in any of the above method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the spirit or scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is also intended to include such modifications and variations.

Claims (6)

1. A method of text representation, comprising:
obtaining word vector representation of each word in a text to be processed, wherein the text to be processed is a short text;
obtaining the original word vector representation of each word segmentation in the text to be processed;
for each word segment, respectively performing the following operations to obtain the fusion vector representation of each word segment: performing a first fusion operation on the respective character vectors of the characters in one word segment to obtain an intermediate vector representation of the one word segment, and performing a second fusion operation on the original word vector representation of the one word segment and the corresponding intermediate vector representation to obtain the fusion vector representation of the one word segment; wherein the first fusion operation comprises at least one of: a vector subtraction operation, and a vector fusion operation performed by inputting the vectors into a neural network model; and the second fusion operation comprises at least one of: a vector subtraction operation and a tensor inner product calculation operation; if one first fusion operation is performed, the character vector representation after that operation for the one word segment is used as the intermediate vector representation of the word segment; if at least two first fusion operations are performed, the character vector representations after each operation for the one word segment are respectively obtained and concatenated to obtain the intermediate vector representation of the word segment; if one second fusion operation is performed, the result of that operation on the original word vector representation and the intermediate vector representation of the one word segment is used as the fusion vector representation of the word segment; if at least two second fusion operations are performed, the results of each operation on the original word vector representation and the intermediate vector representation of the one word segment are respectively obtained and concatenated to obtain the fusion vector representation of the word segment;
and extracting context multi-modal information from the context environment of the text to be processed, and obtaining a text vector representation of the text to be processed based on the fusion vector of each word segment and the context multi-modal information.
2. The method of claim 1, wherein after obtaining the respective fusion vector representations of the respective tokens, the method further comprises:
obtaining user portrait characteristic information of a user corresponding to the text to be processed;
and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the user portrait characteristic information.
3. A text presentation device, comprising:
the first obtaining module is used for obtaining word vector representations of words in a text to be processed, wherein the text to be processed is a short text;
the second obtaining module is used for obtaining the primitive word vector representation of each word in the text to be processed;
the fusion module is used for respectively performing the following operations for each word segment to obtain the fusion vector representation of each word segment: performing a first fusion operation on the respective character vectors of the characters in one word segment to obtain an intermediate vector representation of the one word segment, and performing a second fusion operation on the original word vector representation of the one word segment and the corresponding intermediate vector representation to obtain the fusion vector representation of the one word segment; wherein the first fusion operation comprises at least one of: a vector subtraction operation, and a vector fusion operation performed by inputting the vectors into a neural network model; and the second fusion operation comprises at least one of: a vector subtraction operation and a tensor inner product calculation operation; if one first fusion operation is performed, the character vector representation after that operation for the one word segment is used as the intermediate vector representation of the word segment; if at least two first fusion operations are performed, the character vector representations after each operation for the one word segment are respectively obtained and concatenated to obtain the intermediate vector representation of the word segment; if one second fusion operation is performed, the result of that operation on the original word vector representation and the intermediate vector representation of the one word segment is used as the fusion vector representation of the word segment; if at least two second fusion operations are performed, the results of each operation on the original word vector representation and the intermediate vector representation of the one word segment are respectively obtained and concatenated to obtain the fusion vector representation of the word segment;
and the third obtaining module is used for extracting context multi-modal information from the context environment of the text to be processed and obtaining a text vector representation of the text to be processed based on the fusion vector of each word segment and the context multi-modal information.
4. The apparatus of claim 3, wherein after obtaining the respective fusion vector for each of the respective tokens, the third obtaining module is further configured to:
obtaining user portrait characteristic information of a user corresponding to the text to be processed;
and obtaining the text vector representation of the text to be processed according to the fusion vector representation of each word and the user portrait characteristic information.
5. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method of any one of claims 1-2.
6. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program implementing the steps of the method of any of claims 1-2 when executed by a processor.
CN202010406112.2A 2020-05-14 2020-05-14 Text representation method and device Active CN111581335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010406112.2A CN111581335B (en) 2020-05-14 2020-05-14 Text representation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010406112.2A CN111581335B (en) 2020-05-14 2020-05-14 Text representation method and device

Publications (2)

Publication Number Publication Date
CN111581335A CN111581335A (en) 2020-08-25
CN111581335B true CN111581335B (en) 2023-11-24

Family

ID=72112186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010406112.2A Active CN111581335B (en) 2020-05-14 2020-05-14 Text representation method and device

Country Status (1)

Country Link
CN (1) CN111581335B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330357B (en) * 2021-08-04 2024-05-10 腾讯科技(深圳)有限公司 Text processing method, device, computer equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423284A (en) * 2017-06-14 2017-12-01 中国科学院自动化研究所 Merge the construction method and system of the sentence expression of Chinese language words internal structural information
CN108287820A (en) * 2018-01-12 2018-07-17 北京神州泰岳软件股份有限公司 A kind of generation method and device of text representation
CN108595590A (en) * 2018-04-19 2018-09-28 中国科学院电子学研究所苏州研究院 A kind of Chinese Text Categorization based on fusion attention model
CN108763325A (en) * 2018-05-04 2018-11-06 北京达佳互联信息技术有限公司 A kind of network object processing method and processing device
CN109359196A (en) * 2018-10-22 2019-02-19 北京百度网讯科技有限公司 Text Multimodal presentation method and device
CN109918500A (en) * 2019-01-17 2019-06-21 平安科技(深圳)有限公司 File classification method and relevant device based on convolutional neural networks
CN109992788A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 Depth text matching technique and device based on unregistered word processing
CN110020438A (en) * 2019-04-15 2019-07-16 上海冰鉴信息科技有限公司 Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN110096575A (en) * 2019-03-25 2019-08-06 国家计算机网络与信息安全管理中心 Psychological profiling method towards microblog users
CN110245719A (en) * 2019-03-27 2019-09-17 中国海洋大学 A kind of Feature fusion of entity-oriented and user's portrait
CN110569508A (en) * 2019-09-10 2019-12-13 重庆邮电大学 Method and system for classifying emotional tendencies by fusing part-of-speech and self-attention mechanism
CN110598206A (en) * 2019-08-13 2019-12-20 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN110717325A (en) * 2019-09-04 2020-01-21 北京三快在线科技有限公司 Text emotion analysis method and device, electronic equipment and storage medium
CN111008293A (en) * 2018-10-06 2020-04-14 上海交通大学 Visual question-answering method based on structured semantic representation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019045441A1 (en) * 2017-08-29 2019-03-07 Samsung Electronics Co., Ltd. Method for providing cognitive semiotics based multimodal predictions and electronic device thereof

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423284A (en) * 2017-06-14 2017-12-01 Institute of Automation, Chinese Academy of Sciences Construction method and system for sentence representations fusing internal structural information of Chinese words
CN108287820A (en) * 2018-01-12 2018-07-17 Beijing Shenzhou Taiyue Software Co., Ltd. Text representation generation method and device
CN108595590A (en) * 2018-04-19 2018-09-28 Suzhou Institute, Institute of Electronics, Chinese Academy of Sciences Chinese text classification method based on a fused attention model
CN108763325A (en) * 2018-05-04 2018-11-06 Beijing Dajia Internet Information Technology Co., Ltd. Network object processing method and device
CN111008293A (en) * 2018-10-06 2020-04-14 Shanghai Jiao Tong University Visual question answering method based on structured semantic representation
CN109359196A (en) * 2018-10-22 2019-02-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Text multimodal representation method and device
CN109918500A (en) * 2019-01-17 2019-06-21 Ping An Technology (Shenzhen) Co., Ltd. Text classification method based on convolutional neural network and related device
CN110096575A (en) * 2019-03-25 2019-08-06 National Computer Network and Information Security Management Center Psychological profiling method for microblog users
CN110245719A (en) * 2019-03-27 2019-09-17 Ocean University of China Entity-oriented feature fusion and user profiling method
CN109992788A (en) * 2019-04-10 2019-07-09 Beijing Shenzhou Taiyue Software Co., Ltd. Deep text matching method and device based on out-of-vocabulary word processing
CN110020438A (en) * 2019-04-15 2019-07-16 Shanghai Bingjian Information Technology Co., Ltd. Sequence-recognition-based Chinese entity disambiguation method and device for enterprises or organizations
CN110598206A (en) * 2019-08-13 2019-12-20 Ping An International Smart City Technology Co., Ltd. Text semantic recognition method and device, computer equipment and storage medium
CN110717325A (en) * 2019-09-04 2020-01-21 Beijing Sankuai Online Technology Co., Ltd. Text sentiment analysis method and device, electronic device and storage medium
CN110569508A (en) * 2019-09-10 2019-12-13 Chongqing University of Posts and Telecommunications Method and system for sentiment tendency classification fusing part-of-speech features and a self-attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Learning Semantic Text Features for Web Text-Aided Image Classification; Wang Dongzhe et al.; IEEE Transactions on Multimedia; Vol. 21, No. 12; 2985-2996 *
Chinese text sentiment classification based on parallel bidirectional gated recurrent units and a self-attention mechanism; Cui Xinyang et al.; Journal of Beijing University of Chemical Technology (Natural Science Edition); Vol. 47, No. 02; 115-123 *
Science and technology video text classification based on an improved Labeled LDA model; Ma Jianhong et al.; Computer Engineering; Vol. 44, No. 09; 274-279 *
Exploring ways of combining Chinese character vectors and word vectors in deep learning; Li Weikang et al.; Journal of Chinese Information Processing; Vol. 31, No. 06; 140-146 *

Also Published As

Publication number Publication date
CN111581335A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN112182166B (en) Text matching method and device, electronic equipment and storage medium
CN111444340B (en) Text classification method, device, equipment and storage medium
CN112131366B (en) Method, device and storage medium for training text classification model and text classification
CN111581361B (en) Intention recognition method and device
US11861886B2 (en) Method and apparatus for generating video description information, and method and apparatus for video processing
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN111930894A (en) Long text matching method and device, storage medium and electronic equipment
WO2023159767A1 (en) Target word detection method and apparatus, electronic device and storage medium
CN111767714B (en) Text smoothness determination method, device, equipment and medium
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN111324773A (en) Background music construction method and device, electronic equipment and storage medium
CN117271724A (en) Intelligent question-answering implementation method and system based on large model and semantic graph
CN113919360A (en) Semantic understanding method, voice interaction method, device, equipment and storage medium
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN113239184B (en) Knowledge base acquisition method and device, computer equipment and storage medium
CN112989024B (en) Method, device and equipment for extracting relation of text content and storage medium
CN111581335B (en) Text representation method and device
CN116821307A (en) Content interaction method, device, electronic equipment and storage medium
CN113741759B (en) Comment information display method and device, computer equipment and storage medium
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN113535946A (en) Text identification method, device and equipment based on deep learning and storage medium
CN117711001B (en) Image processing method, device, equipment and medium
CN117540024B (en) Classification model training method and device, electronic equipment and storage medium
CN116662605A (en) Video classification method, video classification device, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40027923; Country of ref document: HK

GR01 Patent grant