CN114298037A - Text abstract acquisition method based on deep learning - Google Patents

Text abstract acquisition method based on deep learning

Info

Publication number
CN114298037A
CN114298037A
Authority
CN
China
Prior art keywords
text
keywords
semantic information
information
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111662780.2A
Other languages
Chinese (zh)
Inventor
张丽
遆敬苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111662780.2A priority Critical patent/CN114298037A/en
Publication of CN114298037A publication Critical patent/CN114298037A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text abstract acquisition method based on deep learning, which comprises: first extracting the keywords of the original document; constructing an Encoder module to extract global semantic information; constructing a graph convolution module to extract local semantic information; and constructing a Decoder module to generate the text summary. The text summarization task is to refine and condense massive text data; by compressing it into a concise, intuitive summary, the time a user spends browsing the text is saved. The method takes the keywords as local features and the original text as the global feature to obtain a rich semantic representation of the original text, since understanding the semantics of the original text is the prerequisite for generating a high-quality summary. The weights among the features are updated by graph convolution, which further promotes the transmission of meaningful semantic information and suppresses meaningless message passing, so that the obtained semantic information better reflects the central idea of the original text; the generated summary thus reflects the core of the original text, and summaries without a central idea are avoided.

Description

Text abstract acquisition method based on deep learning
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a text abstract acquisition method based on deep learning.
Background
With the rapid development of the internet industry, more and more people rely on internet platforms to publish and obtain information, and the amount of text that people encounter daily has grown explosively. A large amount of information can be accessed quickly through internet platforms, but because online information is vast and disorganized, people must spend ever more time screening out the key information in a text. Extracting the important content from massive text has therefore become an urgent need. Traditional text summarization relies mainly on manual summarization, which requires enormous time and labor costs; given the explosive growth of text, relying on human labor alone to write summaries is impractical. Automatic text summarization, the technology of summarizing text automatically by machine, has therefore become a popular and actively researched field.
By output type, automatic text summarization can be divided into two categories: extractive summarization and abstractive (generative) summarization. Extractive summarization selects important segments from the original text and combines them to form the summary; it condenses the content effectively, is easy for readers to understand, and is simple to implement, making it currently the most mainstream, most widely applied, and easiest approach. However, it has a non-negligible drawback: adjacent segments in the summary are not necessarily adjacent in content, which can make the summary semantically incoherent. In contrast, abstractive summarization does not merely extract existing segments from the original text; it produces a condensed restatement of the main content, possibly using vocabulary that does not appear in the original. It is more flexible than extractive summarization and closer to the way humans write summaries, but it requires understanding the original document and generating a concise, highly readable summary, which makes the task difficult and challenging. By document type, automatic text summarization can be divided into single-document and multi-document summarization: single-document summarization generates a summary from one given document, while multi-document summarization generates a summary from a given set of topically related documents. With the rapid development of artificial intelligence, natural language processing based on neural networks and deep learning has advanced remarkably, and automatic text summarization, as an important field of natural language processing, has received wide attention. More and more researchers are working on automatic summarization with deep neural networks; abstractive summarization has made substantial progress, and extractive summarization has also improved greatly. Despite these advances, current technology is still far from sufficient for generating high-quality summaries. Abstractive summarization requires a model with strong abilities to represent, understand, and generate text, and existing abstractive summaries still suffer from problems of readability, redundancy, limited information content, and factual errors.
Disclosure of Invention
The text summarization task is to refine and condense massive text data; by compressing it into a concise, intuitive summary, the time a user spends browsing the text is saved. As the text information people encounter daily keeps growing, text summarization has become an urgent need. Automatic text summarization is an important field of natural language processing that aims to produce, automatically by machine, a summary that is concise, coherent, informative, and accurate. With the development of deep learning, text summarization technology has made some progress, but it is still far from meeting practical needs. For a computer, summarization is a very challenging task: to generate a summary, the computer must read and understand the original text, then prune, reorganize, and splice the content according to its importance, finally producing a fluent short text.
To address the quality problem of generated summaries, the invention fuses local information with global features to strengthen the model's semantic representation of the input text, thereby improving the quality of the generated summary. The method comprises the following steps:
Step 1, extracting keywords of the original text.
For a text, the topic of the whole document can often be glimpsed through a few keywords, so the method extracts several keywords that represent the semantic content of the article as the local information of the text. Because the data sets used for text summarization do not provide keywords, the invention must first extract the keywords of the original document; the extraction method is essentially unsupervised.
The steps for extracting the keywords of the original text are as follows (a code sketch follows step 1.4):
Step 1.1, take the position information of words into account: words appearing in the first and last sentences are more likely to be keywords, so the first and last sentences of the document are each repeated 3 times, which increases the term frequency of the keywords they contain.
Step 1.2, segment the text into words and select 20 words as candidate keywords according to each word's tf-idf statistics.
Step 1.3, the desired keywords should represent the central idea of the original text as far as possible, which keywords obtained purely from statistical information cannot guarantee, so the keywords obtained in step 1.2 are further screened: a vector representation d of the document is obtained with Doc2Vec, and a vector representation w of each candidate keyword is obtained with Word2Vec. The candidate keywords are ranked by the cosine distance between w and d, and the candidates closest to the document are selected from the initial set; the closer a keyword is to the document, the more of the document's information it describes, which ensures that the retained keywords are more relevant to the document.
Step 1.4, to avoid redundancy among the final keywords, i.e. extracted keywords that have the same meaning despite different wording, the keywords obtained in step 1.3 are screened a second time: the candidates are likewise ranked by the cosine distances among themselves, and only one keyword is kept for each distinct meaning.
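The pipeline of steps 1.1 to 1.4 can be sketched as follows. This is a minimal illustration, not the patent's exact implementation: the pre-trained Doc2Vec and Word2Vec models are represented by the hypothetical callables embed_doc and embed_word, and the number of final keywords and the redundancy threshold are assumed values.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def extract_keywords(doc, corpus, embed_word, embed_doc,
                     n_candidates=20, n_keywords=5, sim_thresh=0.8):
    sents = doc.split(". ")
    # Step 1.1: repeat the first and last sentence 3 times to boost their term frequency.
    boosted = " ".join([sents[0]] * 3 + sents + [sents[-1]] * 3)
    # Step 1.2: select the top candidates by tf-idf (idf statistics from the corpus).
    vectorizer = TfidfVectorizer().fit(corpus)
    scores = vectorizer.transform([boosted]).toarray()[0]
    vocab = np.array(vectorizer.get_feature_names_out())
    candidates = vocab[np.argsort(scores)[::-1][:n_candidates]]
    # Step 1.3: rank candidates by cosine similarity to the document vector.
    d = embed_doc(doc)
    ranked = sorted(candidates, key=lambda w: cosine(embed_word(w), d), reverse=True)
    # Step 1.4: redundancy filter -- keep only one keyword per distinct meaning.
    kept = []
    for w in ranked:
        if all(cosine(embed_word(w), embed_word(u)) < sim_thresh for u in kept):
            kept.append(w)
        if len(kept) == n_keywords:
            break
    return kept
```

The sentence splitting and threshold here are deliberately crude; any tokenizer and similarity cutoff consistent with steps 1.1 to 1.4 would serve the same role.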
Step 2, constructing the Encoder module.
The purpose of this module is to encode, i.e. vectorize, the input text. The Encoder module of the invention uses the Encoder of the Transformer, finally obtaining a semantic representation of the original text that carries both semantic and contextual features; this representation serves as the global semantic information.
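A minimal sketch of such an encoder, using PyTorch's built-in Transformer encoder; the model dimensions and vocabulary size are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

vocab_size, d_model, nhead, num_layers = 30000, 512, 8, 6
embedding = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

token_ids = torch.randint(0, vocab_size, (2, 128))   # (batch, seq_len) toy input
global_semantics = encoder(embedding(token_ids))     # (2, 128, 512) global semantic info
```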
Step 3, constructing the graph convolution module.
The purpose of this module is to integrate relational information into the local semantic information. Step 1 yields the semantic information of the different keywords; to mine more effective local semantic features, a graph convolution is used to enrich the local features with relational features, producing local semantic information that carries relation information. The input to the graph convolution consists of nodes and an adjacency matrix: the nodes are the local semantic information extracted in step 1, the nodes are interrelated, and the adjacency matrix represents the degree of relation between the nodes. The graph convolution then adaptively learns the relation weight between each pair of keywords; once the adjacency matrix among the keywords is obtained, it is multiplied with the initial semantic information to obtain the relation features, which are then fused with the initial features to obtain a new round of features.
The method comprises the following steps (a code sketch follows step 3.7):
Step 3.1, take the K pieces of local semantic information and the 1 piece of global semantic information obtained in steps 1 and 2 as the nodes of the graph.
Step 3.2, construct the adjacency matrix of the graph and initialize all of its entries to 1.
Step 3.3, the larger the difference between a local feature and the global feature, the more of an outlier that local feature is; therefore the differences between the local semantic information and the global semantic information are computed to build a difference matrix, and these differences are used to dynamically update the edge weights of the nodes in the graph. Concretely, the global semantic information is repeated K times, the degree of difference between each of the K pieces of local semantic information and the global semantic information is computed, and the difference matrix is obtained.
Step 3.4, convert the difference matrix obtained in step 3.3 into a matrix of dimension (K, K) using operations such as a linear transformation; this matrix is called the update matrix.
Step 3.5, multiply the update matrix obtained in step 3.4 element-wise with the adjacency matrix; the purpose of this operation is to learn the adjacency matrix adaptively through the update matrix.
Step 3.6, multiply the adjacency matrix obtained in step 3.5 with the node information to obtain the relation features of the semantic information.
Step 3.7, concatenate the local relation features obtained in step 3.6 with the local semantic information of the nodes to obtain the local semantic information carrying relation information.
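The following module sketches steps 3.1 to 3.7. It is an illustration under stated assumptions: the hidden size d, the use of the absolute difference as the degree of difference in step 3.3, and the layer names are choices made here, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class LocalRelationGraphConv(nn.Module):
    def __init__(self, d, k):
        super().__init__()
        self.to_update = nn.Linear(d, k)   # step 3.4: maps the difference matrix to (K, K)

    def forward(self, local, global_sem):
        # local: (K, d) keyword features (step 3.1); global_sem: (d,) text feature (step 2).
        k, d = local.shape
        adj = torch.ones(k, k)                          # step 3.2: adjacency initialized to 1
        diff = (local - global_sem.expand(k, d)).abs()  # step 3.3: (K, d) difference matrix
        update = self.to_update(diff)                   # step 3.4: (K, K) update matrix
        adj = adj * update                              # step 3.5: element-wise product
        relation = adj @ local                          # step 3.6: relation features
        return torch.cat([relation, local], dim=-1)     # step 3.7: fuse with local info
```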
Step 4, constructing the Decoder module.
The purpose of this module is to generate the summary of the original text. A pointer-generator network is a seq2seq model with a copy mechanism that predicts words from the probability distributions of a generator and a pointer. The generator predicts the word of the current step mainly from the background vector output by the encoder module, the decoder's hidden state at the current step, and the decoder's prediction from the previous step; the words it predicts come from the vocabulary, so it can predict words that do not appear in the original document. The pointer's probability distribution, by contrast, predicts text from the original document that the pointer points to. The summary generated by a pointer-generator network can therefore both produce new words and copy text from the original document. The pointer-generator network can be viewed as a balance between extractive and abstractive methods: copying words improves accuracy and the handling of unknown words, while the ability to generate new words is preserved. The invention uses an RNN with an attention mechanism as the decoder to output the summary. The specific steps are as follows (a code sketch follows step 4.3):
and 4.1, fusing the global semantic information obtained in the step 2 and the step 3 and the local semantic information with the relation information in a summing mode to serve as an initialized hidden vector of the decoder module.
And 4.2, calculating the importance degree of each word in the original text to the Decoder word according to an attention mechanism, and obtaining semantic representation of the original text with different attention information according to different importance degrees for each step of the Decoder, wherein the semantic representation is called as a background vector.
And 4.3, predicting the output of the current time step according to the semantic representation of the original text, the output of the previous time step and the hidden vector of the current time step, and finally obtaining the predicted output of each time step so as to obtain the text abstract of the original text.
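One decoding step of steps 4.1 to 4.3, in pointer-generator form, can be sketched as below. The dimensions, the GRU cell, and the layer names are assumptions for illustration; the patent specifies only an attention RNN decoder with a copy mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerGeneratorStep(nn.Module):
    def __init__(self, d, vocab_size):
        super().__init__()
        self.cell = nn.GRUCell(2 * d, d)              # decoder RNN cell
        self.vocab_out = nn.Linear(2 * d, vocab_size) # generator distribution
        self.p_gen = nn.Linear(3 * d, 1)              # soft switch: generate vs. copy

    def forward(self, y_prev, h, enc, src_ids):
        # y_prev: (B, d) embedding of the previous prediction; h: (B, d) hidden state
        # (initialized per step 4.1 as global semantics + relation-aware local semantics);
        # enc: (B, T, d) encoder states; src_ids: (B, T) source token ids (LongTensor).
        # Step 4.2: attention over the source -> background (context) vector.
        attn = F.softmax(torch.bmm(enc, h.unsqueeze(-1)).squeeze(-1), dim=-1)   # (B, T)
        ctx = torch.bmm(attn.unsqueeze(1), enc).squeeze(1)                      # (B, d)
        # Step 4.3: update the hidden state and predict the current word.
        h = self.cell(torch.cat([y_prev, ctx], dim=-1), h)
        vocab_dist = F.softmax(self.vocab_out(torch.cat([h, ctx], dim=-1)), dim=-1)
        # Copy mechanism: mix the generator and pointer distributions with p_gen.
        p = torch.sigmoid(self.p_gen(torch.cat([ctx, h, y_prev], dim=-1)))      # (B, 1)
        final_dist = (p * vocab_dist).scatter_add(1, src_ids, (1 - p) * attn)
        return final_dist, h
```

At inference time this step would be applied repeatedly, feeding each predicted word's embedding back in as y_prev until an end-of-summary token is produced.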
Compared with the prior art, the invention has the following advantages:
(1) The method takes the keywords as local features and the original text as the global feature, obtaining a richer semantic representation of the original text; understanding the semantics of the original text is the prerequisite for generating a high-quality summary.
(2) Meaningful features lie closer to the global feature than meaningless ones. The method updates the weights among the features with graph convolution, which further promotes the transmission of meaningful semantic information and suppresses meaningless message passing. The semantic information obtained for the original text therefore better reflects its central idea, the generated summary reflects the core of the original text, and summaries lacking a central idea are avoided.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The flow chart of an embodiment is shown in fig. 1, and comprises the following steps:
step S10, extracting the key words of the original document;
step S20, constructing an Encoder module to extract global semantic information;
step S30, constructing a graph convolution module to extract local semantic information;
and step S40, the Decoder module is constructed to generate a text abstract.
Steps S10 to S40 are carried out exactly as steps 1 to 4 described in the Disclosure of Invention above: step 1 extracts the keywords of the original text, step 2 constructs the Encoder module to obtain the global semantic information, step 3 constructs the graph convolution module to obtain the relation-aware local semantic information, and step 4 constructs the Decoder module to generate the text summary.

Claims (3)

1. A text abstract acquisition method based on deep learning, characterized by comprising the following steps: step 1, extracting keywords of an original text;
extracting several keywords that represent the semantic content of the article as the local information of the text; the keywords of the original document being extracted on an unsupervised basis by the following steps:
step 1.1, taking the position information of words into account, words appearing in the first and last sentences being more likely to be keywords, and repeating the first and last sentences of the document 3 times each, thereby increasing the term frequency of the keywords they contain;
step 1.2, segmenting the text, and selecting 20 words as candidate keywords by utilizing tf-idf statistical information of each word;
step 1.3, further screening the keywords obtained in step 1.2: obtaining a vector representation d of the document with Doc2Vec and a vector representation w of each candidate keyword with Word2Vec; ranking the candidate keywords by the cosine distance between w and d, and selecting the candidates closest to the document from the initial set, wherein the closer a keyword is to the document, the more of the document's information it describes, which ensures that the retained keywords are more relevant to the document;
step 1.4, to avoid redundancy among the final keywords, i.e. extracted keywords that have the same meaning despite different wording, screening the keywords obtained in step 1.3 a second time: ranking by the cosine distances among the candidate keywords and keeping only one keyword for each distinct meaning;
step 2, constructing an Encoder module;
the purpose of the Encoder module being to encode, i.e. vectorize, the input text; the Encoder module uses a Transformer Encoder, finally obtaining a semantic representation of the original text that carries semantic and contextual features, which serves as the global semantic information;
step 3, constructing a graph convolution module;
the semantic information of the different keywords being obtained in step 1, and, in order to mine more effective local semantic features, a graph convolution being used to enrich the local features with relational features, obtaining local semantic information carrying relation information; in the graph convolution, the input comprises nodes and an adjacency matrix, the nodes being the local semantic information extracted in step 1, the nodes being interrelated, and the adjacency matrix representing the degree of relation between the nodes; the graph convolution then adaptively learns the relation weight between each pair of keywords; after the adjacency matrix among the keywords is obtained, it is multiplied with the initial semantic information to obtain relation features, which are fused with the initial features to obtain a new round of features;
step 4, constructing a Decoder module;
the Decoder module being used to generate the summary of the original text; a pointer-generator network is a seq2seq model with a copy mechanism that predicts words from the probability distributions of a generator and a pointer, wherein the generator predicts the word of the current step mainly from the background vector output by the encoder module, the hidden state of the decoder at the current step, and the decoder's previous prediction; the summary words predicted by the generator come from the vocabulary, so words outside the original document can be predicted, while the words predicted by the pointer distribution are text from the original document pointed to by the pointer, so that the summary generated by the pointer-generator network can both produce new words and copy text from the original document; the pointer-generator network is regarded as a balance between extractive and abstractive methods, improving accuracy and the handling of unknown words by copying words while preserving the ability to generate new words; the summary is output using an RNN with attention as the decoder.
2. The text abstract acquisition method based on deep learning according to claim 1, wherein step 3 comprises the following steps:
step 3.1, K pieces of local semantic information and 1 piece of global semantic information are obtained in step 1 and step 2 and are used as nodes of the graph;
step 3.2, constructing an adjacency matrix of the graph, and initializing to 1;
step 3.3, the larger the difference between a local feature and the global feature, the more of an outlier the local feature is; therefore computing the differences between the local semantic information and the global semantic information to build a difference matrix, and using these differences to dynamically update the edge weights of the nodes in the graph; first repeating the global semantic information K times to obtain the degree of difference between each of the K pieces of local semantic information and the global semantic information, finally obtaining the difference matrix;
step 3.4, converting the difference matrix obtained in step 3.3 into a matrix of dimension (K, K) by a linear transformation, the result being called the update matrix;
step 3.5, multiplying the update matrix obtained in step 3.4 element-wise with the adjacency matrix, the purpose of this operation being to learn the adjacency matrix adaptively through the update matrix;
step 3.6, multiplying the adjacency matrix obtained in step 3.5 with the node information to obtain the relation features of the semantic information;
and step 3.7, concatenating the local relation features obtained in step 3.6 with the local semantic information of the nodes to obtain the local semantic information carrying relation information.
3. The text abstract acquisition method based on deep learning according to claim 1, wherein step 4 comprises the following steps:
step 4.1, fusing the global semantic information obtained in step 2 with the relation-aware local semantic information obtained in step 3 by summation, the result serving as the initial hidden vector of the decoder module;
step 4.2, computing, via the attention mechanism, the importance of each word in the original text to the word being decoded, and obtaining at each decoder step a semantic representation of the original text weighted by the different degrees of importance, called the background vector;
and step 4.3, predicting the output of the current time step from the semantic representation of the original text, the output of the previous time step, and the hidden vector of the current time step, finally obtaining the predicted output of each time step and thereby the text summary of the original text.
CN202111662780.2A 2021-12-31 2021-12-31 Text abstract acquisition method based on deep learning Pending CN114298037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111662780.2A CN114298037A (en) 2021-12-31 2021-12-31 Text abstract acquisition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111662780.2A CN114298037A (en) 2021-12-31 2021-12-31 Text abstract acquisition method based on deep learning

Publications (1)

Publication Number Publication Date
CN114298037A (en) 2022-04-08

Family

ID=80972900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111662780.2A Pending CN114298037A (en) 2021-12-31 2021-12-31 Text abstract acquisition method based on deep learning

Country Status (1)

Country Link
CN (1) CN114298037A (en)

Similar Documents

Publication Title
JP5128629B2 (en) Part-of-speech tagging system, part-of-speech tagging model training apparatus and method
KR20210116379A (en) Method, apparatus for text generation, device and storage medium
CN111190997B (en) Question-answering system implementation method using neural network and machine learning ordering algorithm
CN111324728A (en) Text event abstract generation method and device, electronic equipment and storage medium
CN113127624B (en) Question-answer model training method and device
CN112749326B (en) Information processing method, information processing device, computer equipment and storage medium
CN111178053B (en) Text generation method for generating abstract extraction by combining semantics and text structure
CN111814477B (en) Dispute focus discovery method and device based on dispute focus entity and terminal
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN111723295A (en) Content distribution method, device and storage medium
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
CN114298055B (en) Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium
CN111859950A (en) Method for automatically generating lecture notes
CN112765977B (en) Word segmentation method and device based on cross-language data enhancement
CN113407711A (en) Gibbs limited text abstract generation method by using pre-training model
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN110717316B (en) Topic segmentation method and device for subtitle dialog flow
CN111008277B (en) Automatic text summarization method
CN117235250A (en) Dialogue abstract generation method, device and equipment
CN116842934A (en) Multi-document fusion deep learning title generation method based on continuous learning
CN115455152A (en) Writing material recommendation method and device, electronic equipment and storage medium
CN114298037A (en) Text abstract acquisition method based on deep learning
CN114118087A (en) Entity determination method, entity determination device, electronic equipment and storage medium
CN114330296A (en) New word discovery method, device, equipment and storage medium
CN113887244A (en) Text processing method and device

Legal Events

Code Description
PB01: Publication
SE01: Entry into force of request for substantive examination