CN111104799A - Text information representation method and system, computer equipment and storage medium - Google Patents

Text information representation method and system, computer equipment and storage medium

Info

Publication number
CN111104799A
Authority
CN
China
Prior art keywords
sentence
training
sentence vector
word
vector
Prior art date
Legal status
Granted
Application number
CN201910981528.4A
Other languages
Chinese (zh)
Other versions
CN111104799B (en)
Inventor
侯晓龙
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201910981528.4A
Publication of CN111104799A
Application granted
Publication of CN111104799B
Legal status: Active


Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of artificial intelligence and relates to a text information representation method, system, computer device and storage medium. The method comprises the following steps: obtaining a corpus to be analyzed, performing word segmentation preprocessing on it, and generating corresponding word vectors from the resulting participles, wherein the corpus to be analyzed is text information comprising at least one sentence; obtaining the word vectors of the participles contained in each sentence of the corpus to form a word vector group for each sentence, and inputting the word vectors of each group into an initial sentence vector algorithm model in order to generate the initial sentence vector of the corresponding sentence; and inputting each initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector for each sentence, wherein the final sentence vectors are used for representing the text information and the pre-trained sentence vector model is generated based on the contextual relationships of sentences. Because the contextual relationships of sentences are taken into account, the scheme avoids the influence of words carrying different semantics in different sentences and represents the text information more accurately.

Description

Text information representation method and system, computer equipment and storage medium
Technical Field
The embodiment of the invention belongs to the technical field of artificial intelligence, and particularly relates to a text information representation method and system, computer equipment and a storage medium.
Background
In the field of natural language processing, text information representation is the basis for solving text processing problems. In the prior art, text is typically represented by summing or averaging Word2Vec word vectors; however, because the same word carries different semantics in different sentences and contexts, such word-vector-based representation is inaccurate and ill-suited to representing text information such as article information in the field of information stream recommendation.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for characterizing text information, a computer device, and a storage medium, so as to solve the problem in the prior art that the representation of text information based on word vectors is not accurate enough and is not suitable for the representation of text information such as article information in the field of information stream recommendation.
In a first aspect, an embodiment of the present invention provides a text information characterization method, including:
obtaining a corpus to be analyzed, performing word segmentation preprocessing on the corpus to be analyzed, and generating corresponding word vectors from the obtained participles, wherein the corpus to be analyzed is text information comprising at least one sentence;
obtaining the word vectors of the participles contained in each sentence of the corpus to be analyzed to form a word vector group for each sentence, and inputting the word vectors of each group into an initial sentence vector algorithm model in order to generate the initial sentence vector of the corresponding sentence;
and inputting the initial sentence vectors into a pre-trained sentence vector model to obtain a final sentence vector for each sentence, wherein the final sentence vectors are used for representing the text information and the pre-trained sentence vector model is generated based on the contextual relationships of sentences.
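The three steps above can be sketched end to end on toy data (a minimal illustration only: the word vectors, the averaging stand-in for the initial sentence vector model, and the pretrained parameter matrix below are hypothetical placeholders, not the trained models the method assumes):

```python
import numpy as np

# Toy word vectors; in the method these come from a trained word2vec model.
word_vecs = {"somewhere": np.array([0.1, 0.2]),
             "traffic":   np.array([0.3, 0.1]),
             "accident":  np.array([0.5, 0.4])}

def initial_sentence_vector(tokens):
    # Stand-in for the initial sentence vector algorithm model:
    # average the word vectors of the sentence's participles.
    return np.mean([word_vecs[t] for t in tokens], axis=0)

# Stand-in for the pretrained sentence vector model's parameter matrix.
param_matrix = np.eye(2)

tokens = ["somewhere", "traffic", "accident"]   # step S1: segmented sentence
v0 = initial_sentence_vector(tokens)            # step S2: initial sentence vector
final_vec = v0 @ param_matrix                   # step S3: final sentence vector
```

The final vector is what downstream tasks (e.g. information stream recommendation) would consume as the sentence's representation.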
As an implementable manner of the present invention, before the obtaining of the corpus to be analyzed, the method further includes a step of performing model training on the pre-trained sentence vector model, wherein a training process of the pre-trained sentence vector model includes:
acquiring a training corpus set, performing word segmentation preprocessing on the corpora in the training corpus set, and generating corresponding word vectors from the obtained participles, wherein the training corpus set is a set of training text information, and the training text information comprises at least one training sentence;
acquiring the word vectors of the participles contained in each training sentence to form a word vector group for each training sentence, and inputting the word vectors of each group into the initial sentence vector algorithm model in order to generate the initial sentence vector of the corresponding training sentence;
and inputting the initial sentence vector corresponding to each training sentence into an initial sentence vector model for training based on the context corresponding to each training sentence in the training corpus set to obtain the pre-trained sentence vector model.
As an implementable aspect of the present invention, the obtaining of the pre-trained sentence vector model by inputting the initial sentence vector corresponding to each training sentence into the initial sentence vector model based on the context corresponding to each training sentence in the training corpus set includes:
configuring a parameter matrix of the initial sentence vector model, wherein the parameter matrix is connected with an input layer and an output layer of the initial sentence vector model;
generating training samples and test samples according to the context corresponding to each training sentence, wherein the training samples and the test samples respectively comprise K1 sentence groups and K2 sentence groups, each sentence group comprises at least one training sentence used for generating an input sentence vector and at least one training sentence used for generating an output sentence vector, and K1 and K2 are positive integers;
sequentially inputting the input sentence vector in each sentence group in the training sample into the initial sentence vector model for training, gradually adjusting the parameters in the parameter matrix until the sentence group in the training sample is trained, and gradually matching the output of the initial sentence vector model with the corresponding output sentence vector in the sentence group;
and testing the initial sentence vector model after training through the test sample, and finishing the training of the initial sentence vector model if the test is passed, so as to obtain the trained sentence vector model.
As an implementation manner of the present invention, the inputting the initial sentence vector to a pre-trained sentence vector model to obtain a final sentence vector of each sentence includes: and inputting the initial sentence vector into the pre-trained sentence vector model, and multiplying the initial sentence vector of the corpus to be analyzed by the parameter matrix to obtain a final sentence vector for representing the text information of the corpus to be analyzed.
As a way in which the present invention may be implemented, the initial sentence vector model may be a skip-gram model or a cbow model.
As an implementable manner of the present invention, the acquiring of the corpus, performing word segmentation preprocessing on the corpora in the corpus to obtain a group of participles, and generating corresponding word vectors for the obtained participles respectively, includes:
performing word segmentation on the corpora in the corpus with a preset word segmentation algorithm, and performing stop-word removal on the segmentation results to obtain a word bank containing N participles, wherein N is a positive integer;
and inputting the N participles in the word stock into a preset word vector model to obtain word vectors of the N participles.
As an implementable manner of the present invention, the initial sentence vector algorithm model is a GRU algorithm model.
In a second aspect, an embodiment of the present invention provides a text information characterization system, including:
the word vector generation module is used for acquiring linguistic data to be analyzed, performing word segmentation pretreatment on the linguistic data to be analyzed, and respectively generating corresponding word vectors based on the obtained word segments, wherein the linguistic data to be analyzed is text information; the text information comprises at least one sentence;
the initial sentence vector generation module is used for acquiring word vectors of participles contained in each sentence in the corpus to be analyzed to obtain a word vector group of each sentence, and sequentially inputting the word vectors in the word vector group into the initial sentence vector algorithm model to generate an initial sentence vector of the corresponding sentence;
the text information representation module is used for inputting the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector of each sentence, and the final sentence vector is used for representing text information; wherein the pre-trained sentence vector model is generated based on context relationships of sentences.
In a third aspect, an embodiment of the present invention provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer-readable instructions executable by the at least one processor, which, when executed by the at least one processor, cause the at least one processor to perform the steps of the textual information characterization method as described above.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which computer-readable instructions are stored, and the computer-readable instructions, when executed by at least one processor, implement the steps of the text information characterization method as described above.
According to the text information representation method, the text information representation system, the computer equipment and the storage medium provided by the embodiment of the invention, the pre-trained sentence vector model is established based on the context relationship of the sentences to carry out text information representation at sentence level.
Drawings
In order to illustrate the solution of the invention more clearly, the drawings needed in the description of the embodiments are briefly described below. The drawings in the following description are some embodiments of the invention; other drawings may be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a text information characterization method according to an embodiment of the present invention;
FIG. 2 is a flow chart of generating word vectors according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a network node of a GRU algorithm model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a training process for a pre-trained sentence vector model according to an embodiment of the present invention;
FIG. 5 is a flowchart of training an initial sentence vector model based on context of a training sentence according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a text information representation system according to an embodiment of the present invention;
FIG. 7 is another schematic diagram of a textual information representation system according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a model training module according to an embodiment of the present invention;
fig. 9 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the invention provides a text information representation method, as shown in fig. 1, the text information representation method comprises the following steps:
s1, obtaining a corpus to be analyzed, performing word segmentation pretreatment on the corpus to be analyzed, and respectively generating corresponding word vectors based on the obtained words, wherein the corpus to be analyzed is text information, and the text information comprises at least one sentence;
s2, obtaining word vectors of participles contained in each statement in the corpus to be analyzed to obtain a word vector group of each statement, sequentially inputting the word vectors in the word vector group into an initial sentence vector algorithm model in sequence to generate an initial sentence vector of a corresponding statement;
s3, inputting the initial sentence vector to a pre-trained sentence vector model to obtain a final sentence vector of each sentence, wherein the final sentence vector is used for representing text information, and the pre-trained sentence vector model is generated based on the context of the sentence.
Specifically, in this embodiment of the present invention, the corpus to be analyzed in step S1 may be various text information obtained from the internet or stored locally on the terminal device. Regarding the obtaining of the word vectors, in some embodiments of the present invention, as shown in fig. 2, obtaining the corpus to be analyzed, performing word segmentation preprocessing on it, and generating corresponding word vectors from the obtained participles may specifically include:
S11, performing word segmentation on the corpus to be analyzed with a preset word segmentation algorithm, and performing stop-word removal on the segmentation results to obtain a word bank containing N participles, wherein N is a positive integer;
s12, inputting the N participles in the word stock into a preset word vector model to obtain word vectors of the N participles.
Specifically, in step S11 different word segmentation algorithms may be selected for different languages. For Chinese corpora, a segmentation method based on string matching (mechanical word segmentation), a segmentation method based on understanding, or a segmentation method based on statistics may be used, such as the shortest-path segmentation algorithm or the jieba segmentation algorithm; this scheme imposes no limitation here.
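A minimal sketch of string-matching ("mechanical") segmentation with stop-word removal, one of the approaches named above. The dictionary, stop-word list, and whitespace-delimited toy input are illustrative assumptions; for Chinese the matching would run over characters rather than space-separated words:

```python
# Illustrative dictionary and stop-word list (assumptions, not from the patent).
DICT = {"machine", "learning", "machine learning", "is", "fun"}
STOP_WORDS = {"is"}

def forward_max_match(text, dictionary, max_len=2):
    # Forward maximum matching: greedily take the longest dictionary entry
    # starting at the current position (here counted in whitespace tokens).
    tokens, words = [], text.split()
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            cand = " ".join(words[i:i + n])
            if cand in dictionary or n == 1:
                tokens.append(cand)
                i += n
                break
    # Stop-word removal: the surviving tokens form the word bank of N participles.
    return [t for t in tokens if t not in STOP_WORDS]

participles = forward_max_match("machine learning is fun", DICT)
```

The resulting word bank is what step S12 feeds into the preset word vector model.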
In this embodiment, step S12 may be implemented with a word2vec model. Specifically, the N participles are sorted and each is represented by a one-hot vector. For example, after the word segmentation preprocessing of step S11, the text "A traffic accident occurred somewhere; Ping An Life quickly started a special pre-claim service" yields the participles "somewhere", "traffic", "accident", "Ping An Life", "quickly", "start", "special case", "pre-claim" and "service", forming a word bank of 9 participles. After sorting, the 9 participles are expressed as one-hot vectors as follows:
somewhere → [1, 0, 0, 0, 0, 0, 0, 0, 0];
traffic → [0, 1, 0, 0, 0, 0, 0, 0, 0];
accident → [0, 0, 1, 0, 0, 0, 0, 0, 0];
Ping An Life → [0, 0, 0, 1, 0, 0, 0, 0, 0];
quickly → [0, 0, 0, 0, 1, 0, 0, 0, 0];
start → [0, 0, 0, 0, 0, 1, 0, 0, 0];
special case → [0, 0, 0, 0, 0, 0, 1, 0, 0];
pre-claim → [0, 0, 0, 0, 0, 0, 0, 1, 0];
service → [0, 0, 0, 0, 0, 0, 0, 0, 1];
The dimension of each one-hot vector equals the number N of participles in the word bank. The one-hot vectors serve as the input of the word2vec model: combining the contextual relationships of the participles in the word bank, the one-hot vectors of one or more participles are fed into the word2vec model, and the initially set weight matrix in the model is trained and optimized. The word vector of each participle is then obtained from the trained weight matrix; specifically, multiplying the one-hot vector of a participle by the trained weight matrix yields the corresponding word vector.
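The final multiplication step can be shown in a few lines (a sketch only: the weight matrix here is randomly initialized rather than word2vec-trained, and N = 9 with embedding dimension 4 are illustrative choices):

```python
import numpy as np

N, dim = 9, 4
rng = np.random.default_rng(0)
# Stand-in for the weight matrix; in the method it is trained by word2vec.
weight = rng.normal(size=(N, dim))

one_hot = np.zeros(N)
one_hot[3] = 1.0                 # e.g. the 4th participle in the sorted word bank
word_vec = one_hot @ weight      # one-hot x weight matrix = that participle's vector
```

Because the one-hot vector has a single 1, the product simply selects one row of the weight matrix, which is why the trained matrix's rows are the word vectors.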
In the embodiment of the present invention, the participles of each sentence in step S2 are determined with the same word segmentation preprocessing as in step S1 to ensure consistent segmentation results, and the number of sentences in the corpus to be analyzed equals the number of initial sentence vectors obtained in step S2.
Regarding the obtaining of the initial sentence vector of each sentence, in an embodiment of the present invention, generating the initial sentence vector of the corresponding sentence from the word vector group may include: averaging, or weighted averaging, the word vectors in the group to obtain the initial sentence vector of the corresponding sentence. For the plain average, the 9 word vectors of the participles obtained through the word segmentation preprocessing of step S11 from "A traffic accident occurred somewhere; Ping An Life quickly started a special pre-claim service" are averaged element-wise, producing a new vector of the same dimension, namely the initial sentence vector. For the weighted average, each participle carries a weight in the word bank according to its occurrence frequency or importance. For example, among the 9 word vectors for "somewhere", "traffic", "accident", "Ping An Life", "quickly", "start", "special case", "pre-claim" and "service", the participles "accident", "Ping An Life" and "pre-claim" should be more prominent in the text representation, so their weights are higher than those of the other participles. The weight of a participle can be computed from its occurrence frequency in historical corpora, and the word vectors of each sentence are weighted-averaged element-wise with these weights to produce a new vector of the same dimension, giving the corresponding initial sentence vector.
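The weighted-average variant can be sketched as follows (toy numbers throughout: the three word vectors and the occurrence counts standing in for historical-corpus frequencies are illustrative assumptions):

```python
import numpy as np

# Word vectors of one sentence's participles (3 participles, dimension 2).
word_vectors = np.array([[0.1, 0.2],
                         [0.4, 0.0],
                         [0.3, 0.6]])

# Hypothetical occurrence counts from a historical corpus.
freqs = np.array([10.0, 30.0, 60.0])
weights = freqs / freqs.sum()          # normalize so the weights sum to 1

# Frequency-weighted element-wise average: a vector of the same dimension
# as a word vector, i.e. the sentence's initial sentence vector.
init_sent_vec = weights @ word_vectors
```

The plain average is the special case where every weight equals 1/N.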
As a way in which the present invention can be implemented, the initial sentence vector algorithm model may be a GRU algorithm model; the following description takes the GRU algorithm model as an example. The GRU is a variant of the recurrent neural network (RNN); the network layer of the GRU model consists of a plurality of cascaded network nodes of identical structure, as shown in fig. 3. All sentences are stored in a fixed order. If the corpus to be analyzed contains M sentences, S_i denotes the i-th sentence, with i ranging from 1 to M; each sentence contains t participles, with t a positive integer; w_i^1, w_i^2, ..., w_i^t denote, in order, the participles of sentence S_i, and x_i^1, x_i^2, ..., x_i^t denote their word vectors. For example, "A traffic accident occurred somewhere. Ping An Life quickly started a special pre-claim service" comprises two sentences. The first sentence, "A traffic accident occurred somewhere", yields the participles "somewhere", "traffic" and "accident" through the word segmentation preprocessing of step S11; these are denoted w_1^1, w_1^2, w_1^3, and their word vectors x_1^1, x_1^2, x_1^3. The second sentence, "Ping An Life quickly started a special pre-claim service", yields the participles "Ping An Life", "quickly", "start", "special case", "pre-claim" and "service", denoted w_2^1 through w_2^6, with word vectors x_2^1 through x_2^6; the same applies to any further sentences in the corpus to be analyzed. When the word vectors are input in order into the network nodes of the GRU model for processing, the following formulas are satisfied:
r_t = σ(W_r · x_i^t + U_r · h_{t-1})
z_t = σ(W_z · x_i^t + U_z · h_{t-1})
h̃_t = tanh(W · x_i^t + U · (r_t ⊙ h_{t-1}))
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
Each network node of the GRU comprises a reset gate and an update gate: the output of the reset gate is r_t and the output of the update gate is z_t. For the t-th participle, the reset gate r_t and the update gate z_t are computed from the word vector x_i^t of the t-th participle and the output h_{t-1} of step t-1. h̃_t denotes the currently required information (the candidate state), and h_t denotes all information stored so far. σ and tanh are activation functions: σ compresses its argument into the interval (0, 1), and tanh compresses its argument into (-1, 1) for subsequent processing by the network node. ⊙ denotes the Hadamard product, i.e. the element-wise product. In the formulas, W_r and U_r are the connection matrices from the input x_i^t and from the previous network node to the reset gate; W_z and U_z are the connection matrices from the input x_i^t and from the previous network node to the update gate; and W and U are the connection matrices from the input x_i^t and from the previous network node to the candidate state h̃_t. The update gate controls how much state information of the previous network node is carried into the current node: the larger z_t, the more previous state is brought in. The reset gate controls how much state information of the previous node is ignored: the smaller r_t, the more is ignored. Together, the reset and update gates effectively accumulate the information contained in all the word vectors into the last network node, whose output, containing the information of all participles, is the initial sentence vector.
In this embodiment of the present invention, as for step S3, before the obtaining the corpus to be analyzed, the method further includes a step of performing model training on the pre-trained sentence vector model, where as shown in fig. 4, a training process of the pre-trained sentence vector model includes:
s31, acquiring a training corpus set, performing word segmentation preprocessing on corpora in the training corpus set, and respectively generating corresponding word vectors based on the obtained word segments, wherein the training corpus set is a training text information set, and the training text information comprises at least one training sentence;
s32, obtaining word vectors of participles contained in each training sentence to obtain a word vector group of each training sentence, sequentially inputting the word vectors in the word vector group of each training sentence into the initial sentence vector algorithm model according to the sequence, and generating the initial sentence vector of the corresponding training sentence;
and S33, inputting the initial sentence vector corresponding to each training sentence into the initial sentence vector model for training based on the context corresponding to each training sentence in the training corpus set to obtain the pre-trained sentence vector model.
The training corpora can be internet corpora such as Baidu Baike and Wikipedia, or other network corpora such as various information websites. By exploiting large-scale internet corpora, the unsupervised training of the algorithm model can be converted into supervised training, which effectively improves the performance of the algorithm model adopted in this scheme.
In this embodiment, the participles of the training corpora and their word vectors are obtained in step S31 by the same process as in step S1 described above, ensuring consistent segmentation results. Likewise, the initial sentence vectors of the training sentences are obtained in step S32 by the same process as in step S2, and the number of sentences in the training corpus set equals the number of initial sentence vectors obtained in step S32.
For step S33, as shown in fig. 5, the inputting an initial sentence vector corresponding to each training sentence into an initial sentence vector model for training based on the context corresponding to each training sentence in the training corpus set, and obtaining the pre-trained sentence vector model may specifically include:
s331, configuring a parameter matrix of the initial sentence vector model, wherein the parameter matrix is connected with an input layer and an output layer of the initial sentence vector model;
s332, generating training samples and inspection samples according to the context corresponding to each training sentence, wherein the training samples and the inspection samples respectively comprise K1 sentence groups and K2 sentence groups, each sentence group comprises at least one training sentence used for generating an input sentence vector and at least one training sentence used for generating an output sentence vector, K1 and K2 are positive integers, K1 and K2 can be equal or unequal, and K1 can be not less than K2, namely the number of the training samples is not less than the number of the inspection samples; the training sentences used for generating the input sentence vectors and the training sentences used for generating the output sentence vectors have a contextual relationship, such as the text "i call xx, i comes from xxx", wherein the sentences "i call xx" and the sentences "i come from xxx" have a precedence relationship (belong to the contextual relationship) in the language order, and at the moment, "i call xx" can be used as the sentences used for generating the input sentence vectors and "i come from xxx" can be used as the sentences used for generating the output sentence vectors.
S333, sequentially inputting the input sentence vectors in each sentence group in the training sample into the initial sentence vector model for training, gradually adjusting the parameters in the parameter matrix until the sentence groups in the training sample are trained, and gradually matching the output of the initial sentence vector model with the corresponding output sentence vectors in the sentence groups;
s334, the trained initial sentence vector model is tested through the test sample, and the training of the initial sentence vector model is completed after the test is passed, so that the trained sentence vector model is obtained.
Further, in this embodiment of the present invention, the inputting the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector of each sentence includes: and inputting the initial sentence vector into the pre-trained sentence vector model, and multiplying the initial sentence vector of the corpus to be analyzed by the parameter matrix to obtain a final sentence vector for representing the text information of the corpus to be analyzed.
In this embodiment, the initial sentence vector model described above may be a skip-gram model or a cbow model. Specifically, the skip-gram model predicts, from one input sentence, the sentences that stand in a contextual relationship with it, so each sentence group in the training and test samples contains only one sentence used as input. The cbow model predicts, from several input sentences, the sentence located in their middle, which stands in a contextual relationship with them, so each sentence group contains only one sentence used as output. In this embodiment, the initial sentence vector is corrected by the trained sentence vector model; because the contextual relationships of sentences are taken into account, the representation of the text is more accurate. When applied to information stream pushing, text information such as the titles of news items is therefore represented more accurately, which helps improve the reading conversion rate of the information.
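Building the sentence groups for the two configurations can be sketched as follows (a sketch under the assumption of a context window of one neighbouring sentence on each side; the window size and the toy sentences are illustrative):

```python
def skipgram_groups(sentences, window=1):
    # Skip-gram style: one input sentence predicts its neighbouring sentences.
    groups = []
    for i, s in enumerate(sentences):
        ctx = sentences[max(0, i - window):i] + sentences[i + 1:i + 1 + window]
        if ctx:
            groups.append({"input": [s], "output": ctx})
    return groups

def cbow_groups(sentences, window=1):
    # cbow style: the neighbouring sentences predict the sentence in the middle.
    groups = []
    for i in range(window, len(sentences) - window):
        ctx = sentences[i - window:i] + sentences[i + 1:i + 1 + window]
        groups.append({"input": ctx, "output": [sentences[i]]})
    return groups

sents = ["I am called xx", "I come from xxx", "I work at yyy"]
sg = skipgram_groups(sents)
cb = cbow_groups(sents)
```

Each group's sentences would then be run through the initial sentence vector algorithm model to produce the input and output sentence vectors used in training.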
According to the text information representation method provided by the embodiment of the invention, a sentence vector model is established based on the context of sentences, and text information is represented at the sentence level. Because the context of each sentence is taken into account, the representation process avoids the influence of words having different semantics in different sentences, so the representation of the text information is more accurate. In addition, large-scale internet corpora can be used when training the pre-trained sentence vector model, which effectively converts unsupervised training into supervised training and improves the model training effect, thereby further improving the accuracy of the text information representation.
The embodiment of the present invention provides a text information representation system, which can execute the text information representation method provided in the above embodiment. As shown in fig. 6, the text information representation system includes a word vector generation module 10, an initial sentence vector generation module 20, and a text information representation module 30. The word vector generation module 10 is configured to obtain a corpus to be analyzed, perform word segmentation preprocessing on the corpus to be analyzed, and generate corresponding word vectors based on the obtained participles, where the corpus to be analyzed is text information and the text information includes at least one sentence. The initial sentence vector generation module 20 is configured to obtain the word vectors of the participles contained in each sentence in the corpus to be analyzed to obtain a word vector group for each sentence, and to sequentially input the word vectors in each word vector group, in order, into the initial sentence vector algorithm model to generate the initial sentence vector of the corresponding sentence. The text information representation module 30 is configured to input the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector for each sentence, where the final sentence vector is used to represent the text information, and the pre-trained sentence vector model is generated based on the context relationships of sentences.
Specifically, in the embodiment of the present invention, the corpus to be analyzed processed by the word vector generation module 10 may be various text information from the internet or stored locally on the terminal device. Regarding the acquisition of word vectors, in some embodiments of the present invention, when the word vector generation module 10 acquires the corpus to be analyzed, performs word segmentation preprocessing on it, and generates corresponding word vectors based on the obtained participles, it is specifically configured to: segment the corpus to be analyzed with a preset word segmentation algorithm, perform stop-word removal on the segmentation result to obtain a word bank containing N participles, where N is a positive integer, and input the N participles in the word bank into a preset word vector model to obtain the word vectors of the N participles. The word vector generation module 10 may select different word segmentation algorithms for different languages. For a Chinese corpus, a word segmentation method based on string matching (mechanical word segmentation), a word segmentation method based on understanding, or a word segmentation method based on statistics may be used, such as a shortest-path word segmentation algorithm or the jieba word segmentation algorithm, which is not limited in this scheme.
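A minimal sketch of this preprocessing path, assuming a whitespace split as a stand-in for a real segmenter such as jieba (which the actual module would use for Chinese text); the stop-word list and the function name `build_word_bank` are illustrative assumptions.

```python
# Sketch: tokenize each sentence, drop stop words, and collect a word bank
# of N distinct participles. A whitespace split stands in for real word
# segmentation; STOP_WORDS is a toy list, not a real stop-word lexicon.

STOP_WORDS = {"the", "a", "of", "is"}

def build_word_bank(corpus_sentences):
    word_bank = []
    for sentence in corpus_sentences:
        for token in sentence.lower().split():   # stand-in segmentation
            if token not in STOP_WORDS and token not in word_bank:
                word_bank.append(token)          # keep first-seen order
    return word_bank

bank = build_word_bank(["The cat sat", "a cat is grey"])
# The word bank would then be fed to a word vector model (e.g. word2vec)
# to obtain one vector per participle; N = len(bank).
```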
In this embodiment, the initial sentence vector generation module 20 may use a word2vec-style model when generating sentence vectors; for the specific implementation process, refer to the relevant contents of the foregoing method embodiment, which are not expanded here. In addition, the initial sentence vector generation module 20 uses the same word segmentation preprocessing method as the word vector generation module 10 when determining the participles of each sentence, which ensures consistent segmentation results, and the number of sentences in the corpus to be analyzed is consistent with the number of initial sentence vectors obtained by the initial sentence vector generation module 20.
Regarding the obtaining of the initial sentence vector of each sentence, in an embodiment of the present invention, when generating the initial sentence vector of the corresponding sentence from the word vector group, the initial sentence vector generation module 20 is specifically configured to: average or weighted-average the word vectors in the word vector group to obtain the initial sentence vector of the corresponding sentence. In the weighted-average mode, each participle is assigned a weight within the word bank according to its frequency of occurrence or its importance, and this weight is used to compute a weighted average of the word vectors in each sentence to obtain the corresponding initial sentence vector. For the processing of word vector averaging and word vector weighted averaging, refer to the related technical contents in the method embodiment, which are not expanded here.
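The two pooling options described above can be sketched as follows; the vectors and weights are toy values, and in practice the weights would come from participle frequency or importance as stated in the text.

```python
# Sketch of the two pooling modes: a plain average of the word vectors in a
# sentence, and a weighted average using per-word weights (e.g. derived
# from frequency or importance). All values are illustrative.

def average(word_vectors):
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(v[k] for v in word_vectors) / n for k in range(dim)]

def weighted_average(word_vectors, weights):
    total = sum(weights)
    dim = len(word_vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, word_vectors)) / total
            for k in range(dim)]

# Two 2-dim word vectors standing in for the participles of one sentence.
vecs = [[1.0, 0.0], [3.0, 2.0]]
plain = average(vecs)                          # equal contribution
weighted = weighted_average(vecs, [3.0, 1.0])  # first word weighted 3:1
```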
As one possible implementation of the present invention, the initial sentence vector algorithm model adopted by the initial sentence vector generation module 20 may be a GRU algorithm model; for a description of the GRU algorithm model, refer to the relevant contents in the above method embodiment, which are not expanded here.
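To make the GRU option concrete, the following is a hand-worked sketch of a GRU-style recurrence that feeds word values in sentence order and keeps the last hidden state as the initial sentence vector. Every weight is a hand-picked scalar and the hidden state is one-dimensional, which is an assumption made purely for readability; a real GRU layer has learned weight matrices and a higher-dimensional state.

```python
import math

# Minimal GRU sketch: update gate z, reset gate r, candidate state h_tilde,
# with scalar weights held in the dict `w` (illustrative values, not
# trained parameters).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, w):
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)                 # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)                 # reset gate
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))   # candidate
    return (1.0 - z) * h_prev + z * h_tilde                     # new state

def gru_sentence_vector(word_values, w):
    h = 0.0
    for x in word_values:        # feed word vectors in sentence order
        h = gru_step(h, x, w)
    return h                     # last hidden state = initial sentence vector

w = {"wz": 1.0, "uz": 0.0, "wr": 1.0, "ur": 0.0, "wh": 1.0, "uh": 1.0}
vec = gru_sentence_vector([0.5, -0.2, 1.0], w)
```

Because each step mixes the previous state into the new one, the result depends on word order, which is what distinguishes this option from the plain averaging modes above.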
In this embodiment of the present invention, as shown in fig. 7, the text information characterization system further includes a model training module 40, configured to train the sentence vector model before the corpus to be analyzed is obtained. As shown in fig. 4, the process by which the model training module 40 trains the pre-trained sentence vector model includes:
acquiring a training corpus set through the word vector generation module 10, performing word segmentation preprocessing on the corpora in the training corpus set, and generating corresponding word vectors based on the obtained participles, where the training corpus set is a set of training text information and each piece of training text information includes at least one training sentence; obtaining, through the initial sentence vector generation module 20, the word vectors of the participles contained in each training sentence to obtain a word vector group for each training sentence, and sequentially inputting the word vectors in the word vector group of each training sentence, in order, into the initial sentence vector algorithm model to generate the initial sentence vector of the corresponding training sentence; and finally, based on the context corresponding to each training sentence in the training corpus set, inputting the initial sentence vector corresponding to each training sentence into the initial sentence vector model for training, to obtain the pre-trained sentence vector model.
The training corpus acquired by the word vector generation module 10 may be an internet corpus such as Baidu Encyclopedia or Wikipedia, or another network corpus such as various information websites. By using large-scale internet corpora, the unsupervised training of the algorithm model is converted into supervised training, which effectively improves the effect of the algorithm model adopted in this scheme.
In the embodiment of the present invention, as shown in fig. 8, the model training module 40 may include a parameter matrix configuration unit 41, a sample generation unit 42, a model training unit 43, and a model verification unit 44. The parameter matrix configuration unit 41 is configured to configure the parameter matrix of the initial sentence vector model, where the parameter matrix connects the input layer and the output layer of the initial sentence vector model. The sample generation unit 42 is connected to the word vector generation module 10 and the initial sentence vector generation module 20 and is configured to generate training samples and verification samples according to the context relationships of the training sentences, where the training samples and the verification samples contain K1 and K2 sentence groups respectively, each sentence group includes at least one training sentence used to generate an input sentence vector and at least one training sentence used to generate an output sentence vector, K1 and K2 are positive integers that may be equal or unequal, and K1 may be no less than K2, that is, the number of training samples is no less than the number of verification samples. The model training unit 43 is configured to sequentially input the input sentence vector of each sentence group in the training samples into the initial sentence vector model for training, gradually adjusting the parameters in the parameter matrix until all sentence groups in the training samples have been used, so that the output of the initial sentence vector model gradually matches the corresponding output sentence vector of each sentence group. The model verification unit 44 is configured to verify the trained initial sentence vector model with the verification samples; once the verification passes, the training of the initial sentence vector model is complete and the trained sentence vector model is obtained.
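A toy numerical sketch of how these four units could interact: a single scalar stands in for the parameter matrix, training pairs stand in for (input sentence vector, output sentence vector) groups, gradient steps adjust the parameter, and a held-out set plays the role of the verification samples. The learning rate, epoch count, and data are assumptions made for illustration, not values from the patent.

```python
# Sketch of units 41-44 on a 1-dimensional "parameter matrix":
#   unit 41: configure the parameter,
#   unit 42: the (x, y) pairs play the role of sentence groups,
#   unit 43: SGD on squared error gradually adjusts the parameter,
#   unit 44: a held-out set verifies the trained model.

def train_and_verify(train_pairs, check_pairs, lr=0.1, epochs=200):
    w = 0.0                                    # unit 41: initial parameter
    for _ in range(epochs):                    # unit 43: fit training groups
        for x, y in train_pairs:
            pred = w * x
            w -= lr * 2.0 * (pred - y) * x     # gradient of (pred - y)^2
    # unit 44: mean squared error on the held-out verification groups
    check_err = sum((w * x - y) ** 2 for x, y in check_pairs) / len(check_pairs)
    return w, check_err

# All pairs are consistent with a true parameter of 2.0, so training
# should recover it and the verification error should be near zero.
train = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]
check = [(3.0, 6.0)]
w, err = train_and_verify(train, check)
```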
Further, the initial sentence vector is input into the text information representation module 30, so that the initial sentence vector of the corpus to be analyzed is multiplied by the parameter matrix to obtain a final sentence vector for representing the text information of the corpus to be analyzed.
As one possible implementation of the present invention, the initial sentence vector model may be a skip-gram model or a cbow model. Specifically, the skip-gram model takes a single sentence as input and predicts the sentences that have a context relationship with it, so each sentence group in the training sample and the test sample contains only one sentence serving as the input sentence. The cbow model takes a plurality of sentences as input and predicts the sentence located in the middle of them, which has a context relationship with the input sentences, so each sentence group in the training sample and the test sample contains only one sentence serving as the output sentence. In this embodiment, the initial sentence vector is corrected by the trained sentence vector model; because the context relationship between sentences is taken into account, the representation of the text is more accurate. When the system is applied to information-stream pushing, text information such as the titles of news items is represented more accurately, which helps improve the reading conversion rate of the pushed information.
According to the text information representation system provided by the embodiment of the invention, a sentence vector model is established based on the context of sentences, and text information is represented at the sentence level. Because the context of each sentence is taken into account, the representation process avoids the influence of words having different semantics in different sentences, so the representation of the text information is more accurate. In addition, large-scale internet corpora can be used when training the pre-trained sentence vector model, which effectively converts unsupervised training into supervised training and improves the model training effect, thereby further improving the accuracy of the text information representation.
Embodiments of the present invention further provide a computer device. As shown in fig. 9, the computer device includes at least one processor 71 and a memory 72 communicatively connected to the at least one processor 71 (one processor 71 is shown in fig. 9 as an example). The memory 72 stores computer-readable instructions executable by the at least one processor 71, and the computer-readable instructions, when executed by the at least one processor 71, enable the at least one processor 71 to perform the steps of the text information characterization method described above.
Specifically, the memory 72 in the embodiment of the present invention is a nonvolatile computer-readable storage medium, and can be used to store computer-readable instructions, nonvolatile software programs, nonvolatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the text information characterization method in the foregoing embodiment of the present application. By executing the nonvolatile software programs, computer-readable instructions, and modules stored in the memory 72, the processor 71 executes various functional applications and performs data processing, that is, implements the text information characterization method described in the above method embodiment.
In some embodiments, the memory 72 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created during the processing of the text information representation method, and the like. Further, the memory 72 may include high-speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid-state storage device.
In some embodiments, the memory 72 may optionally include remote memory located remotely from the processor 71 and connected to the computer device via a network; examples of such networks include, but are not limited to, the internet, an intranet, a local area network, a mobile communications network, and combinations thereof.
In an embodiment of the present invention, the computer device executing the text information representation method may further include an input system 73 and an output system 74; the input system 73 may obtain a user's operation information on the computer device, and the output system 74 may include a display device such as a display screen. In the embodiment of the present invention, the processor 71, the memory 72, the input system 73, and the output system 74 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 9.
According to the computer device provided by the embodiment of the present invention, when the processor 71 executes the codes in the memory 72, the steps of the text information characterization method in the above embodiment can be executed, and the technical effects of the above embodiment of the method are achieved, and the technical details not described in detail in the embodiment of the present invention can be referred to the technical contents provided in the embodiment of the method of the present application.
Embodiments of the present invention further provide a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by at least one processor, the steps of the text information characterization method can be implemented, and when the steps of the method are executed, the technical effects of the above-mentioned method embodiments are achieved, and the technical details that are not described in detail in this embodiment may be referred to in the technical contents provided in the method embodiments of the present application.
The embodiment of the invention also provides a computer program product which can execute the text information representation method provided in the embodiment of the method of the application and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the technical contents provided in the method embodiments of the present application.
It should be noted that, in the above embodiments of the present invention, each functional module may be integrated into one processing unit, or each functional module may exist alone physically, or two or more functional modules may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several computer readable instructions for enabling a computer system (which may be a personal computer, a server, or a network system, etc.) or an intelligent terminal device or a Processor (Processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the above embodiments provided by the present invention, it should be understood that the disclosed system and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, at least two modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on at least two network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the invention, and that the appended drawings illustrate preferred embodiments without limiting its scope. The invention may be embodied in many different forms; rather, these embodiments are provided so that this disclosure will be thorough and complete. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the embodiments can still be modified and that equivalents may be substituted for some of their elements. All equivalent structures made using the contents of the specification and the accompanying drawings of the invention, applied directly or indirectly in other related technical fields, likewise fall within the protection scope of this patent.

Claims (10)

1. A method for characterizing textual information, comprising:
obtaining a corpus to be analyzed, performing word segmentation preprocessing on the corpus to be analyzed, and respectively generating corresponding word vectors based on the obtained participles, wherein the corpus to be analyzed is text information which comprises at least one sentence;
obtaining word vectors of participles contained in each statement in the corpus to be analyzed to obtain a word vector group of each statement, sequentially inputting the word vectors in the word vector group into an initial sentence vector algorithm model according to a sequence, and generating an initial sentence vector of a corresponding statement;
and inputting the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector of each sentence, wherein the final sentence vector is used for representing text information, and the pre-trained sentence vector model is generated based on the context of the sentence.
2. The method according to claim 1, wherein before the obtaining the corpus to be analyzed, the method further comprises a step of performing model training on the pre-trained sentence vector model, wherein the training process of the pre-trained sentence vector model comprises:
acquiring a training corpus set, performing word segmentation preprocessing on corpora in the training corpus set, and respectively generating corresponding word vectors based on the obtained participles, wherein the training corpus set is a training text information set, and the training text information comprises at least one training sentence;
acquiring word vectors of participles contained in each training sentence to obtain a word vector group of each training sentence, and sequentially inputting the word vectors in the word vector group of the training sentences into the initial sentence vector algorithm model in sequence to generate corresponding initial sentence vectors of the training sentences;
and inputting the initial sentence vector corresponding to each training sentence into an initial sentence vector model for training based on the context corresponding to each training sentence in the training corpus set to obtain the pre-trained sentence vector model.
3. The method of claim 2, wherein the inputting an initial sentence vector corresponding to each training sentence into an initial sentence vector model for training based on the context corresponding to each training sentence in the corpus to obtain the pre-trained sentence vector model comprises:
configuring a parameter matrix of the initial sentence vector model, wherein the parameter matrix is connected with an input layer and an output layer of the initial sentence vector model;
generating training samples and test samples according to the context corresponding to each training sentence, wherein the training samples and the test samples respectively comprise K1 sentence groups and K2 sentence groups, each sentence group comprises at least one training sentence used for generating an input sentence vector and at least one training sentence used for generating an output sentence vector, and K1 and K2 are positive integers;
sequentially inputting the input sentence vector in each sentence group in the training sample into the initial sentence vector model for training, gradually adjusting the parameters in the parameter matrix until the sentence group in the training sample is trained, and gradually matching the output of the initial sentence vector model with the corresponding output sentence vector in the sentence group;
and testing the initial sentence vector model after training through the test sample, and finishing the training of the initial sentence vector model if the test is passed, so as to obtain the trained sentence vector model.
4. The method of claim 3, wherein the inputting the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector for each sentence comprises: and inputting the initial sentence vector into the pre-trained sentence vector model, and multiplying the initial sentence vector of the corpus to be analyzed by the parameter matrix to obtain a final sentence vector for representing the text information of the corpus to be analyzed.
5. The method of claim 2, wherein the initial sentence vector model is a skip-gram model or a cbow model.
6. The method for characterizing text information according to claim 1, wherein the obtaining the corpus to be analyzed, performing word segmentation preprocessing on the corpus to be analyzed, and generating corresponding word vectors based on the obtained word segments respectively comprises:
performing word segmentation on the corpus to be analyzed by adopting a preset word segmentation algorithm, and performing stop-word removal on the word segmentation result to obtain a word bank with the number of participles being N, wherein N is a positive integer;
and inputting the N participles in the word stock into a preset word vector model to obtain word vectors of the N participles.
7. The method of characterizing textual information according to claim 1, wherein the initial sentence vector algorithm model is a GRU algorithm model.
8. A textual information characterization system, comprising:
the word vector generation module is used for acquiring the corpus to be analyzed, performing word segmentation preprocessing on the corpus to be analyzed, and respectively generating corresponding word vectors based on the obtained participles, wherein the corpus to be analyzed is text information; the text information comprises at least one sentence;
the initial sentence vector generation module is used for acquiring word vectors of participles contained in each sentence in the corpus to be analyzed to obtain a word vector group of each sentence, and sequentially inputting the word vectors in the word vector group into the initial sentence vector algorithm model to generate an initial sentence vector of the corresponding sentence;
the text information representation module is used for inputting the initial sentence vector into a pre-trained sentence vector model to obtain a final sentence vector of each sentence, and the final sentence vector is used for representing text information; wherein the pre-trained sentence vector model is generated based on context relationships of sentences.
9. A computer device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer-readable instructions executable by the at least one processor, which, when executed by the at least one processor, cause the at least one processor to perform the steps of the textual information characterization method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has computer-readable instructions stored thereon, which, when executed by at least one processor, implement the steps of the textual information characterization method according to any of claims 1 to 7.
CN201910981528.4A 2019-10-16 2019-10-16 Text information characterization method, system, computer equipment and storage medium Active CN111104799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910981528.4A CN111104799B (en) 2019-10-16 2019-10-16 Text information characterization method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111104799A true CN111104799A (en) 2020-05-05
CN111104799B CN111104799B (en) 2023-07-21

Family

ID=70421422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910981528.4A Active CN111104799B (en) 2019-10-16 2019-10-16 Text information characterization method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111104799B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639194A (en) * 2020-05-29 2020-09-08 天健厚德网络科技(大连)有限公司 Knowledge graph query method and system based on sentence vectors
CN111694941A (en) * 2020-05-22 2020-09-22 腾讯科技(深圳)有限公司 Reply information determining method and device, storage medium and electronic equipment
CN112926329A (en) * 2021-03-10 2021-06-08 招商银行股份有限公司 Text generation method, device, equipment and computer readable storage medium
CN113157853A (en) * 2021-05-27 2021-07-23 中国平安人寿保险股份有限公司 Problem mining method and device, electronic equipment and storage medium
WO2021151328A1 (en) * 2020-09-04 2021-08-05 平安科技(深圳)有限公司 Symptom data processing method and apparatus, and computer device and storage medium
CN113435582A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Text processing method based on sentence vector pre-training model and related equipment
CN114036272A (en) * 2021-10-29 2022-02-11 厦门快商通科技股份有限公司 Semantic analysis method and system for dialog system, electronic device and storage medium
CN114118085A (en) * 2022-01-26 2022-03-01 云智慧(北京)科技有限公司 Text information processing method, device and equipment
CN114358004A (en) * 2021-12-27 2022-04-15 有米科技股份有限公司 Marketing text generation method and device
CN114943220A (en) * 2022-04-12 2022-08-26 中国科学院计算机网络信息中心 Sentence vector generation method and duplicate checking method for scientific research establishment duplicate checking
WO2023024422A1 (en) * 2021-08-27 2023-03-02 平安科技(深圳)有限公司 Consultation session-based auxiliary diagnosis method and apparatus, and computer device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280058A (en) * 2018-01-02 2018-07-13 中国科学院自动化研究所 Relation extraction method and apparatus based on intensified learning
WO2019056692A1 (en) * 2017-09-25 2019-03-28 平安科技(深圳)有限公司 News sentence clustering method based on semantic similarity, device, and storage medium
WO2019072166A1 (en) * 2017-10-10 2019-04-18 腾讯科技(深圳)有限公司 Semantic analysis method, device, and storage medium
CN110287312A (en) * 2019-05-10 2019-09-27 平安科技(深圳)有限公司 Calculation method, device, computer equipment and the computer storage medium of text similarity

Also Published As

Publication number Publication date
CN111104799B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111104799A (en) Text information representation method and system, computer equipment and storage medium
Pane et al. A multi-lable classification on topics of quranic verses in english translation using multinomial naive bayes
US20230244704A1 (en) Sequenced data processing method and device, and text processing method and device
US11232358B1 (en) Task specific processing of regulatory content
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN111506732A (en) Text multi-level label classification method
CN107180084A (en) Word library updating method and device
CN109614611B (en) Sentiment analysis method fusing a generative adversarial network and a convolutional neural network
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN113255331B (en) Text error correction method, device and storage medium
CN112613555A (en) Object classification method, device, equipment and storage medium based on meta learning
CN110991515B (en) Image description method fusing visual context
CN110598869B (en) Classification method and device based on sequence model and electronic equipment
JP2021174503A (en) Limited attack method, device, and storage medium against naive Bayes classifier
CN111859988A (en) Semantic similarity evaluation method and device and computer-readable storage medium
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN111241843A (en) Semantic relation inference system and method based on composite neural network
CN112446217A (en) Emotion analysis method and device and electronic equipment
CN113761874A (en) Event reality prediction method and device, electronic equipment and storage medium
CN113609287A (en) Text abstract generation method and device, computer equipment and storage medium
CN112434143A (en) Dialog method, storage medium and system based on hidden state constraint of GRU (gated recurrent unit)
CN114692610A (en) Keyword determination method and device
CN110543569A (en) Network layer structure for short text intention recognition and short text intention recognition method
CN110569331A (en) Context-based relevance prediction method and device and storage equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant