CN115952279A - Text outline extraction method and device, electronic device and storage medium - Google Patents

Text outline extraction method and device, electronic device and storage medium

Info

Publication number
CN115952279A
Authority
CN
China
Prior art keywords
text
sentence
extracted
paragraph
extracted based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211533215.0A
Other languages
Chinese (zh)
Other versions
CN115952279B (en)
Inventor
金征雷
周创
张俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ruicheng Information Technology Co ltd
Original Assignee
Hangzhou Ruicheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ruicheng Information Technology Co ltd
Priority to CN202211533215.0A
Publication of CN115952279A
Application granted
Publication of CN115952279B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method, a device, an electronic device and a storage medium for extracting a text outline. The method comprises: acquiring sentence content features of each sentence text in a text to be extracted based on readable characters of the text to be extracted, and acquiring sentence format features of each sentence text based on the format of the text to be extracted, wherein the sentence content features comprise character features of the corresponding sentence text; acquiring sentence fusion features of each sentence text based on the sentence content features and the sentence format features; acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in that paragraph text; and obtaining outline information corresponding to the text to be extracted based on the sentence fusion features and the paragraph features. The method addresses the low accuracy of text outline extraction in the related art: it enriches the levels of text features and fuses the correlations among text features of different levels, thereby improving the accuracy of text outline extraction.

Description

Text outline extraction method and device, electronic device and storage medium
Technical Field
The present application relates to the field of semantic recognition, and in particular, to a method and an apparatus for extracting a text outline, an electronic apparatus, and a storage medium.
Background
With the continuous development of information technology, the application of semantic recognition technology becomes more and more extensive. The text outline extraction technology is used as an important branch of the semantic recognition field and has important application in the scenes of government affairs, medicine and the like. For example, outline contents of texts such as government official documents and medical documents can be automatically extracted by an outline extraction technique.
In the existing outline extraction technology, characters, words and sentences are generally taken as dimensions to extract text features, then the text features are input into a preset sequence feature extraction model, and the text features are analyzed through the sequence feature extraction model to finally obtain outline contents. However, when analyzing a text in the related art, each feature of the same dimension is often analyzed in isolation, and the correlation between different features of the same dimension and the correlation between features of different dimensions are not considered, and when analyzing the features, the context of the features is often ignored, which results in low accuracy of extracting the outline of the text in the related art.
No effective solution has yet been proposed for the low accuracy of text outline extraction in the related art.
Disclosure of Invention
The embodiment provides a method, a device, an electronic device and a storage medium for extracting a text outline, so as to solve the problem that the accuracy of extracting the text outline is not high in the related art.
In a first aspect, in this embodiment, a method for extracting a text outline is provided, including:
acquiring sentence content characteristics of each sentence of text in the text to be extracted based on readable characters of the text to be extracted, and acquiring sentence format characteristics of each sentence of text in the text to be extracted based on the format of the text to be extracted;
acquiring sentence fusion characteristics of each sentence in the text to be extracted based on the sentence content characteristics and the sentence format characteristics;
acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in each paragraph text;
and acquiring outline information corresponding to the text to be extracted based on the sentence fusion characteristics and the paragraph characteristics.
In some embodiments, the obtaining sentence content characteristics of each sentence of text in the text to be extracted based on the readable characters of the text to be extracted includes:
acquiring character features of the text to be extracted based on the readable characters of the text to be extracted;
and acquiring sentence content characteristics of each sentence text in the text to be extracted based on the character characteristics and corresponding weights of a plurality of readable characters in each sentence text.
In some of these embodiments, the sentence format features include a sentence position feature, a sentence length feature, and a sentence placeholder feature.
In some embodiments, the sentence placeholder feature is obtained by:
acquiring the sentence placeholder feature of each sentence text in the text to be extracted based on the format placeholders in the text to be extracted.
In some embodiments, the obtaining sentence fusion characteristics of each sentence of text in the text to be extracted based on the sentence content characteristics and the sentence format characteristics includes:
performing fusion processing on the sentence length characteristic, the sentence placeholder characteristic and the sentence content characteristic to obtain a sentence initial fusion characteristic;
and performing fusion processing on the sentence initial fusion characteristic and the sentence position characteristic to obtain the sentence fusion characteristic.
In some embodiments, the obtaining paragraph features of each paragraph text in the text to be extracted based on the sentence content features and the corresponding weights of each sentence text in each paragraph text includes:
constructing a weight matrix and a bias matrix corresponding to the sentence content characteristics of all the sentence texts;
obtaining paragraph initial features based on the sentence content features, the weight matrix and the bias matrix;
and carrying out normalization processing and aggregation processing on the paragraph initial features to obtain the paragraph features.
In some embodiments, the obtaining outline information corresponding to the text to be extracted based on the sentence fusion feature and the paragraph feature includes:
weighting the sentence fusion characteristics and the paragraph characteristics, and normalizing the processing result;
and determining outline information of the text to be extracted based on the result of the normalization processing.
In a second aspect, in this embodiment, there is provided an apparatus for extracting a text outline, including:
the first acquisition module is used for acquiring sentence content characteristics of each sentence of text in the text to be extracted based on readable characters of the text to be extracted and acquiring sentence format characteristics of each sentence of text in the text to be extracted based on the format of the text to be extracted, wherein the sentence content characteristics comprise character characteristics of the corresponding sentence of text;
the second acquisition module is used for acquiring sentence fusion characteristics of each sentence of text in the text to be extracted based on the sentence content characteristics and the sentence format characteristics;
the third acquisition module is used for acquiring paragraph characteristics of each paragraph text in the text to be extracted based on the sentence content characteristics and corresponding weights of each sentence text in each paragraph text;
and the fourth acquisition module is used for acquiring the outline information corresponding to the text to be extracted based on the sentence fusion characteristics and the paragraph characteristics.
In a third aspect, in this embodiment, there is provided an electronic apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for extracting the text outline according to the first aspect.
In a fourth aspect, in the present embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the method for extracting a text outline according to the first aspect.
Compared with the related art, the application provides a method, a device, an electronic device and a storage medium for extracting a text outline. The method comprises: acquiring sentence content features of each sentence text in the text to be extracted based on readable characters of the text to be extracted, and acquiring sentence format features of each sentence text based on the format of the text to be extracted, wherein the sentence content features comprise character features of the corresponding sentence text; acquiring sentence fusion features of each sentence text based on the sentence content features and the sentence format features; acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in that paragraph text; and obtaining outline information corresponding to the text to be extracted based on the sentence fusion features and the paragraph features. Fusing the sentence content features and the sentence format features of each sentence text captures the correlation between the content and the format of that sentence text; further fusing the sentence fusion features with the paragraph features captures the implicit relationship between the sentence text and the paragraph text. Because the outline information is obtained from features fused across multiple levels, text features are no longer analyzed in isolation with their context ignored. This addresses the low accuracy of text outline extraction in the related art, enriches the levels of text features, and fuses the correlations among text features of different levels, thereby improving the accuracy of text outline extraction.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so that other features, objects, and advantages of the application can be understood more clearly.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a block diagram of a terminal hardware structure of a method for extracting a text outline according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for extracting a text outline according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for extracting a text outline according to another embodiment of the present application;
Fig. 4 is a block diagram of a configuration of an apparatus for extracting a text outline according to an embodiment of the present application.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the" and similar referents used in this application (including in the specification and claims) are to be construed to cover both the singular and the plural. The terms "comprises," "comprising," "has," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or modules, but may include other steps or modules (elements) not listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In general, the character "/" indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like are used to distinguish similar items and do not necessarily describe a particular sequential or chronological order.
The method embodiments provided herein may be executed in a terminal, a computer, or a similar computing device. Taking execution on a terminal as an example, Fig. 1 is a block diagram of the hardware structure of a terminal for the text outline extraction method of this embodiment. As shown in Fig. 1, the terminal may include one or more processors 102 (only one is shown in Fig. 1) and a memory 104 for storing data, wherein the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA). Specifically, the processor 102 may be configured as a central processing unit (CPU) that includes an arithmetic unit and a controller. The arithmetic unit performs the arithmetic and logic operations of the terminal; its basic operations include addition, subtraction, multiplication and division, logical operations such as AND, OR, NOT and XOR, as well as tensor operations, matrix operations, shifting, comparison and data transfer. The controller mainly decodes instructions and issues the corresponding control signals. The terminal may also include an input-output device 106. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the terminal. For example, the terminal may include more or fewer components than shown in Fig. 1, or have a different configuration from that shown in Fig. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the text outline extraction method in the embodiment, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In the related art, features are usually extracted with characters, words and sentences as dimensions and then input into a preset sequence feature extraction model, which analyzes them to obtain the outline content. However, each feature of a given dimension is often analyzed in isolation: neither the correlations between different features of the same dimension nor those between features of different dimensions are considered, and the context of the features is often ignored.
Specifically, the related art mainly has the following drawbacks: 1) it does not consider the proportion between the length of the outline and the length of the text content, or the relative positions of the parts within the text content; 2) it does not consider the inherent formatting regularities of outlines in texts from different fields: although text content differs between fields, outline text, as key summarizing and prompting information, is usually highlighted in an article by a particular format; 3) it does not consider that an outline is a summary of text content whose semantics are correlated with the text of other sentences, and that within the range of content covered by an outline, the correlation between the outline sentence and the other sentences is often high.
Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a method for extracting a text outline according to an embodiment of the present application.
In one embodiment, the method for extracting the text outline comprises the following steps:
S202: acquiring sentence content features of each sentence text in the text to be extracted based on the readable characters of the text to be extracted, and acquiring sentence format features of each sentence text in the text to be extracted based on the format of the text to be extracted.
Illustratively, the content of the text to be extracted is processed to obtain its readable characters. The text to be extracted is the text from which outline information is to be extracted, including but not limited to documents such as government official documents, academic documents and news reports. The readable characters are the characters that can be displayed in the text to be extracted, including but not limited to Chinese characters, English letters, digits and punctuation marks.
Illustratively, after the readable characters in the text to be extracted are acquired, the sentence content features of each sentence text are acquired based on the readable characters of that sentence text; the sentence content features represent the content information of the corresponding sentence text. In one option, a character feature is extracted for each readable character, and the character features of all characters of each sentence text are then fused to obtain the sentence content features of that sentence text; for example, a character feature is extracted from the encoding of each readable character, and all character features in each sentence text are then fused by weighting. Alternatively, the sentence content features of each sentence text are constructed directly from all characters of that sentence text; for example, the encodings of all readable characters of each sentence text are concatenated into a sentence encoding, from which the sentence content features are then extracted.
Illustratively, the format of the text to be extracted is identified to obtain its format information, from which the sentence format features of each sentence text are obtained. The sentence format features of a sentence text represent its format information, which includes, but is not limited to, the position, the length and the format control characters of the sentence text.
S204: acquiring sentence fusion features of each sentence text in the text to be extracted based on the sentence content features and the sentence format features.
Illustratively, after the sentence content characteristic and the sentence format characteristic of each sentence text are obtained, the sentence content characteristic and the sentence format characteristic are fused, so as to obtain the sentence fusion characteristic of the sentence text. It can be understood that the sentence fusion feature contains both the content information and the format information of the text of the corresponding sentence.
S206: acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and the corresponding weights of each sentence text in each paragraph text.
Illustratively, a weight is determined for each sentence text in each paragraph text according to its sentence content features; for example, a sentence text containing summarizing words may be assigned a higher weight. The weights corresponding to the sentence content features can be stored in the form of a sentence weight matrix. After the weight of each sentence content feature is determined, all sentence content features are weighted by their weights to obtain paragraph features representing the content information of all sentence texts in the paragraph. It will be appreciated that the paragraph features reflect the context of the corresponding paragraph text.
S208: obtaining outline information corresponding to the text to be extracted based on the sentence fusion features and the paragraph features.
Illustratively, each sentence text is analyzed by combining its sentence fusion features with the paragraph features, and the sentence texts that meet the condition are taken as the outline information of the corresponding paragraph text. Specifically, for each sentence text, the format information in the sentence fusion features indicates whether the sentence text is prominent in format, and the relevance between the content information in the sentence fusion features and the paragraph features indicates whether the sentence text is closely related to the overall context of the paragraph text; together these determine whether the sentence text can be used as outline information.
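The decision in S208 can be sketched as a weighted combination followed by normalization. The following is a minimal illustration only: the function names, the weight vectors `w_s` and `w_p`, the sigmoid normalization and the 0.5 threshold are all assumptions made for this example and are not specified in the text.

```python
import math

def outline_probabilities(sentence_fusion, paragraph_feature, w_s, w_p, bias=0.0):
    """Weight each sentence fusion feature together with the paragraph feature
    and normalize the result to a probability (sigmoid is an assumption)."""
    paragraph_term = sum(w * p for w, p in zip(w_p, paragraph_feature))
    probabilities = []
    for feature in sentence_fusion:
        logit = sum(w * f for w, f in zip(w_s, feature)) + paragraph_term + bias
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
    return probabilities

def select_outline(sentences, probabilities, threshold=0.5):
    """Keep the sentences whose normalized score reaches the (hypothetical) threshold."""
    return [s for s, p in zip(sentences, probabilities) if p >= threshold]

# Toy data: two sentences with hand-made 2-dimensional fusion features.
sentences = ["1. Scope", "This rule applies to all staff."]
fusion = [[2.0, 1.0], [-1.0, 0.0]]
paragraph = [0.5]
probs = outline_probabilities(fusion, paragraph, w_s=[1.0, 1.0], w_p=[0.2])
outline = select_outline(sentences, probs)
```

In practice the weights would be learned rather than set by hand; the toy values here only show the data flow from features to a selected outline sentence.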
In this embodiment, sentence content features of each sentence text in the text to be extracted are acquired based on readable characters of the text to be extracted, and sentence format features of each sentence text are acquired based on the format of the text to be extracted, wherein the sentence content features comprise character features of the corresponding sentence text; sentence fusion features of each sentence text are acquired based on the sentence content features and the sentence format features; paragraph features of each paragraph text in the text to be extracted are acquired based on the sentence content features and corresponding weights of each sentence text in that paragraph text; and outline information corresponding to the text to be extracted is obtained based on the sentence fusion features and the paragraph features. Fusing the sentence content features and the sentence format features of each sentence text captures the correlation between the content and the format of that sentence text; further fusing the sentence fusion features with the paragraph features captures the implicit relationship between the sentence text and the paragraph text. Because the outline information is obtained from features fused across multiple levels, text features are no longer analyzed in isolation with their context ignored. This addresses the low accuracy of text outline extraction in the related art, enriches the levels of text features, and fuses the correlations among text features of different levels, thereby improving the accuracy of text outline extraction.
In another embodiment, the obtaining sentence content characteristics of each sentence of text in the text to be extracted based on readable characters of the text to be extracted includes:
Step 1: acquiring character features of the text to be extracted based on the readable characters of the text to be extracted;
Step 2: acquiring sentence content features of each sentence text in the text to be extracted based on the character features and corresponding weights of a plurality of readable characters in each sentence text.
Illustratively, the characters in the text to be extracted are divided into readable characters and format placeholders, and the character features of the text to be extracted are extracted based on the readable characters. The readable characters are characters that can be displayed in the text to be extracted, including but not limited to Chinese characters, English letters, digits and punctuation marks; a format placeholder is a character that cannot be displayed in the text to be extracted but occupies a text position and controls the text format, including but not limited to "\t", "\r", "\n", "\s", etc.
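The division into readable characters and format placeholders can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `split_characters` and the placeholder set are assumptions, and "\s" is interpreted here as an ordinary space.

```python
# Assumption: an illustrative placeholder set; "\s" is taken to mean a space.
FORMAT_PLACEHOLDERS = {"\t", "\r", "\n", " "}

def split_characters(text):
    """Split text into readable characters and format placeholders.

    Each result is a list of (index, character) pairs so that the original
    position of every character is preserved.
    """
    readable, placeholders = [], []
    for index, char in enumerate(text):
        if char in FORMAT_PLACEHOLDERS:
            placeholders.append((index, char))
        else:
            readable.append((index, char))
    return readable, placeholders

readable, placeholders = split_characters("1. Scope\n\tThis rule applies.")
```

Keeping the positions makes it possible to derive both the character features (from `readable`) and the sentence placeholder feature (from `placeholders`) from the same pass over the text.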
Specifically, after the readable characters of the text to be extracted are obtained, they are processed by a pre-trained network model to obtain character features at the character level. The pre-trained network model performs feature extraction on the encoding of each input readable character to generate a feature vector; such models include, but are not limited to, GPT (Generative Pre-Training) and BERT (Bidirectional Encoder Representations from Transformers).
Illustratively, after the character features of the readable characters are obtained, the weights corresponding to the different readable characters are determined, and the character features of all readable characters in each sentence text are weighted by their corresponding weights and fused to generate the sentence content features of that sentence text.
Specifically, the weight of each readable character is determined, and the corresponding character weight matrices W_w and u_w and the character bias matrix b_w are constructed. After the character features of all readable characters in each sentence text are obtained, the weights of those characters are taken from W_w, u_w and b_w, and the character feature of each readable character is weighted accordingly to obtain a weighted result for each character feature. The calculation is as follows:
[Equation image BDA0003976628680000081; the formula is not reproduced in this text]
where j is the index of the sentence text in the paragraph text, t is the index of the readable character in the sentence text, h_jt is the character feature of the t-th readable character of the j-th sentence text in the paragraph text, and α_jt is the weighted character feature of that character.
After the weighted character features are obtained, they are normalized to obtain a normalization result for each character feature:
[Equation image BDA0003976628680000082; the formula is not reproduced in this text]
where a_jt is the normalization result of the character feature of the t-th readable character of the j-th sentence text in the paragraph text.
After the normalization results are obtained, the normalized character features of all readable characters of each sentence text in the paragraph text are aggregated to obtain the sentence content feature S_j of that sentence text:
[Equation image BDA0003976628680000083; the formula is not reproduced in this text]
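The three formulas are available only as images. Under the assumption that they follow the common attention-pooling pattern (a tanh projection, a softmax normalization and a weighted sum), one form consistent with the symbols defined in the surrounding text would be the following; the choice of tanh and of the exponential normalization is an assumption, not a statement of the original equations.

```latex
\begin{aligned}
\alpha_{jt} &= \tanh\left(W_w h_{jt} + b_w\right) \\
a_{jt} &= \frac{\exp\left(\alpha_{jt}^{\top} u_w\right)}{\sum_{t'} \exp\left(\alpha_{jt'}^{\top} u_w\right)} \\
S_j &= \sum_{t} a_{jt}\, h_{jt}
\end{aligned}
```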
This embodiment acquires character features of the text to be extracted based on its readable characters, and acquires the sentence content features of each sentence text based on the character features and corresponding weights of the readable characters in that sentence text. This fully combines the feature information of the readable characters with the associations between them, improving the accuracy of the sentence content features and, in turn, the accuracy of text outline extraction.
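The weighting, normalization and aggregation steps of this embodiment can be sketched in plain Python. This is a minimal sketch under stated assumptions: the tanh projection and softmax normalization are assumed (the original formulas are not available in this text), the function name `attention_pool` is hypothetical, and the toy matrices stand in for parameters that would normally be learned.

```python
import math

def attention_pool(char_features, W_w, b_w, u_w):
    """Pool per-character feature vectors h_jt into one sentence feature S_j."""
    scores = []
    for h in char_features:
        # Weighted character feature: tanh(W_w h + b_w) (tanh is an assumption).
        alpha = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                 for row, b in zip(W_w, b_w)]
        # Scalar score: projection of the weighted feature onto u_w.
        scores.append(sum(a * u for a, u in zip(alpha, u_w)))
    # Softmax normalization over the characters of the sentence.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Aggregation: weighted sum of the original character features.
    dim = len(char_features[0])
    sentence = [sum(wt * h[i] for wt, h in zip(weights, char_features))
                for i in range(dim)]
    return sentence, weights

# Toy example: three characters with 2-dimensional features.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_w = [[1.0, 0.0], [0.0, 1.0]]
b_w = [0.0, 0.0]
u_w = [1.0, 1.0]
sentence_feature, weights = attention_pool(h, W_w, b_w, u_w)
```

The same pattern could be applied one level up, pooling sentence content features into a paragraph feature, which matches the weight matrix, bias matrix, normalization and aggregation steps described for paragraph features.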
In another embodiment, the sentence format features include a sentence position feature, a sentence length feature, and a sentence placeholder feature.
Illustratively, the sentence format features in this embodiment include at least a sentence position feature, a sentence length feature and a sentence placeholder feature. The sentence position characteristic is used for representing position information of the sentence text in the paragraph text, the sentence length characteristic is used for representing length information of the sentence text in the paragraph text, and the length ratio of the sentence text in the paragraph text is generally used as the sentence length characteristic; the sentence placeholder feature is used to characterize the format placeholder in the sentence text.
Specifically, the sentence position feature includes a paragraph head feature, a paragraph middle feature and a paragraph end feature, which respectively indicate that the sentence text is located at the head, in the middle or at the end of the paragraph text. In one embodiment, when the sentence position feature of a sentence text is obtained, the character "<PAS>" is added at the beginning of the sentence if the sentence text is located at the head of a paragraph; the character "<PAB>" is added at the beginning of the sentence if the sentence text is in the middle of a paragraph; and the character "<PAE>" is added at the beginning of the sentence if the sentence text is at the end of a paragraph. The position feature of the sentence text is then determined from the character added at its beginning.
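As a minimal sketch of this marking step, the following Python function prefixes each sentence of one paragraph with its position marker. The marker strings are written without internal spaces, which is an assumption about the intended spelling. The handling of a one-sentence paragraph (treated as the paragraph head) is also an assumption, since the text does not cover that case; the function name `mark_positions` is hypothetical.

```python
def mark_positions(sentences):
    """Prefix each sentence of one paragraph with its position marker."""
    marked = []
    last = len(sentences) - 1
    for idx, sentence in enumerate(sentences):
        if idx == 0:
            tag = "<PAS>"   # paragraph head (also used for a one-sentence paragraph: assumption)
        elif idx == last:
            tag = "<PAE>"   # paragraph end
        else:
            tag = "<PAB>"   # paragraph middle
        marked.append(tag + sentence)
    return marked

paragraph = ["First sentence.", "Second sentence.", "Third sentence."]
marked = mark_positions(paragraph)
```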
In particular, the sentence length feature may be determined based on the proportion of the paragraph text's length taken up by the sentence text. In one embodiment, the sentence length feature is set to S1 if this length ratio is lower than 0.15, to F1 if it is higher than 0.98, and to L1 if it is between 0.15 and 0.98.
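The thresholds above translate into a small bucketing function. This is a sketch under two assumptions that the text does not settle: length is measured in characters, and ratios of exactly 0.15 or 0.98 fall into the L1 bucket. The function name `length_feature` is hypothetical.

```python
def length_feature(sentence, paragraph):
    """Return the length bucket (S1, L1 or F1) of a sentence within its paragraph."""
    if not paragraph:
        raise ValueError("paragraph must not be empty")
    # Length ratio measured in characters (assumption)
    ratio = len(sentence) / len(paragraph)
    if ratio < 0.15:
        return "S1"
    if ratio > 0.98:
        return "F1"
    # Boundary values 0.15 and 0.98 are treated as L1 (assumption)
    return "L1"

short_bucket = length_feature("ab", "ab" + "x" * 98)    # ratio 0.02
middle_bucket = length_feature("x" * 50, "x" * 100)     # ratio 0.50
full_bucket = length_feature("x" * 100, "x" * 100)      # ratio 1.00
```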
In particular, sentence placeholder features can be determined based on format placeholders in the text of the sentence. In one specific embodiment, feature extraction is performed on the codes of the format placeholders, so that corresponding feature vectors are obtained, and the feature vectors are used as sentence placeholder features.
In another embodiment, the method for acquiring sentence placeholder features comprises the following steps:
and acquiring sentence placeholder characteristics of each sentence of text in the text to be extracted based on the format placeholder in the text to be extracted.
Illustratively, the characters in the text to be extracted are divided into readable characters and format placeholders, and the sentence placeholder feature of each sentence text is determined from the format placeholders in that sentence text.
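The division step can be illustrated with a short Python sketch. This passage does not list which characters count as format placeholders, so the set `FORMAT_PLACEHOLDERS` below (newline, tab, carriage return, space and full-width space) is purely a hypothetical stand-in, as is the function name `split_characters`.

```python
# Hypothetical set of format placeholders; the actual set is not given in this passage.
FORMAT_PLACEHOLDERS = {"\n", "\t", "\r", " ", "\u3000"}

def split_characters(sentence):
    """Split a sentence into readable characters and format placeholders, keeping order."""
    readable = [ch for ch in sentence if ch not in FORMAT_PLACEHOLDERS]
    placeholders = [ch for ch in sentence if ch in FORMAT_PLACEHOLDERS]
    return readable, placeholders

readable, placeholders = split_characters("1.\tOverview\n")
```

The two lists would then feed the character-feature model and the placeholder-feature model respectively.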
Specifically, after the format placeholders of each sentence text are obtained, they are processed by a trained network model to obtain the sentence placeholder feature corresponding to the sentence text. The trained network model performs feature extraction on the encodings of the input format placeholders to generate feature vectors; such models include, but are not limited to, GPT (Generative Pre-Training) and BERT (Bidirectional Encoder Representations from Transformers).
In the embodiment, the sentence placeholder characteristics of each sentence of text in the text to be extracted are acquired based on the format placeholders in the text to be extracted, so that the sentence placeholder characteristics of each sentence of text are associated with each format placeholder, the accuracy of the sentence placeholder characteristics is improved, and the accuracy of text outline extraction is improved.
In another embodiment, the obtaining of the sentence fusion feature of each sentence in the text to be extracted based on the sentence content feature and the sentence format feature includes:
step 1: performing fusion processing on the sentence length characteristic, the sentence placeholder characteristic and the sentence content characteristic to obtain an initial sentence fusion characteristic;
step 2: and carrying out fusion processing on the sentence initial fusion characteristics and the sentence position characteristics to obtain sentence fusion characteristics.
Illustratively, the sentence format feature in this embodiment includes the sentence position feature, the sentence length feature and the sentence placeholder feature at the same time. After the sentence format features are obtained, the sentence length feature F_l, the sentence placeholder feature F_b and the sentence content feature S_j are first added together in a weighted fusion to obtain the sentence initial fusion feature S_r:
S_r = (w_l F_l + w_b F_b + w_r S_j) + b_rr
where w_l, w_b, w_r and b_rr are learnable parameters. Further, the position information of the sentence text in the paragraph text is added to the sentence initial fusion feature; that is, the sentence initial fusion feature S_r and the sentence position feature F_p are fused by concatenation to obtain the final sentence fusion feature S_rr:
Figure BDA0003976628680000101
Optionally, the method in this embodiment is only an example; in this application, the sentence length feature F_l, the sentence placeholder feature F_b, the sentence content feature S_j and the sentence position feature F_p may also be concatenated directly to obtain the sentence fusion feature S_rr.
This embodiment combines the sentence length feature, the sentence placeholder feature, the sentence position feature and the sentence content feature to generate the sentence fusion feature. It thereby fully combines text features of different dimensions in the text to be extracted, such as the content information of characters, sentences, paragraphs and punctuation, the length information of the sentence text, the expression space of the outline and the body text, and the implicit relationship between their positions, which improves the richness of the sentence fusion feature and, in turn, its accuracy.
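The two-step fusion can be sketched as follows. The sketch assumes that the learnable parameters w_l, w_b and w_r are scalars, that F_l, F_b, S_j and b_rr are vectors of equal length, and that the second step is a plain concatenation with F_p; none of these shapes is fixed by the text, and the function name `fuse_sentence_features` and the toy values are hypothetical.

```python
def fuse_sentence_features(F_l, F_b, S_j, F_p, w_l, w_b, w_r, b_rr):
    """Weighted sum of length, placeholder and content features, then concatenation
    with the position feature. Returns (S_r, S_rr)."""
    if not (len(F_l) == len(F_b) == len(S_j) == len(b_rr)):
        raise ValueError("F_l, F_b, S_j and b_rr must have the same length")
    # Step 1: S_r = (w_l F_l + w_b F_b + w_r S_j) + b_rr, computed element-wise
    S_r = [w_l * fl + w_b * fb + w_r * sj + b
           for fl, fb, sj, b in zip(F_l, F_b, S_j, b_rr)]
    # Step 2: concatenate the position feature F_p (assumed form of the fusion)
    S_rr = S_r + list(F_p)
    return S_r, S_rr

S_r, S_rr = fuse_sentence_features(
    F_l=[1.0, 0.0], F_b=[0.0, 1.0], S_j=[2.0, 2.0],
    F_p=[1.0, 0.0, 0.0],            # e.g. a one-hot head/middle/end vector (assumption)
    w_l=0.5, w_b=0.5, w_r=1.0, b_rr=[0.0, 0.0],
)
```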
In another embodiment, the obtaining paragraph features of each text segment in the text to be extracted based on the sentence content features and the corresponding weights of each sentence in each text segment includes:
step 1: constructing a weight matrix and a bias matrix corresponding to sentence content characteristics of all sentence texts;
step 2: obtaining initial characteristics of the paragraph based on the sentence content characteristics, the weight matrix and the bias matrix;
step 3: and carrying out normalization processing and aggregation processing on the initial features of the paragraphs to obtain the characteristics of the paragraphs.
Illustratively, based on sentence content characteristics of each sentence text in the paragraph text, corresponding weights are determined, and then a weight matrix and a bias matrix are constructed. And performing weighting processing on the sentence content characteristics based on the weight matrix and the bias matrix to obtain corresponding paragraph initial characteristics. Further, all the paragraph initial features are subjected to normalization processing and aggregation processing, so that the final paragraph features are obtained.
Optionally, before the weighting calculation, the sentence content feature s_ij may first be sent to a sequence feature extraction model for feature extraction, and the result is then weighted based on the constructed weight matrices W_w2 and u_w2 and the bias matrix b_w2 to obtain the paragraph initial feature β_ij. The specific calculation is as follows:
Figure BDA0003976628680000102
where i is the index of the paragraph text and j is the index of the sentence text in the paragraph text. After the paragraph initial features are calculated, the paragraph initial features of all sentence texts in each paragraph text are normalized to obtain the normalization result e_ij:
Figure BDA0003976628680000103
Further, the normalization results and the features extracted by the sequence feature extraction model are subjected to aggregation training to obtain the paragraph feature PS_i:
Figure BDA0003976628680000111
Specifically, the sequence feature extraction model in this embodiment includes, but is not limited to, a Transformer (a self-attention model) and a BiLSTM (bidirectional long short-term memory model). Extracting the sentence content features again with the sequence feature extraction model improves their expressive power.
In the embodiment, weight matrixes and bias matrixes corresponding to sentence content characteristics of all sentence texts are constructed; obtaining paragraph initial characteristics based on the sentence content characteristics, the weight matrix and the bias matrix; the paragraph initial features are subjected to normalization processing and aggregation processing to obtain the paragraph features, so that the paragraph features can fully reflect content information of the paragraph text, the accuracy of the paragraph features is improved, and the accuracy of text outline extraction is further improved.
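For orientation, one plausible form of the three paragraph-level calculations is written out below. The equation images are not reproduced in this text, so this is an assumed reconstruction based on the usual additive-attention pattern and on the symbols named above (W_w2, u_w2, b_w2, β_ij, e_ij, PS_i); it should not be read as the exact formulas of the application. The symbol h_ij, denoting the output of the sequence feature extraction model for the sentence content feature s_ij, is introduced here for illustration only.

```latex
\begin{aligned}
\beta_{ij} &= u_{w2}^{\top} \tanh\!\left(W_{w2}\, h_{ij} + b_{w2}\right) \\
e_{ij} &= \frac{\exp(\beta_{ij})}{\sum_{j'} \exp(\beta_{ij'})} \\
PS_i &= \sum_{j} e_{ij}\, h_{ij}
\end{aligned}
```

Under this reading, e_ij acts as the weight of the j-th sentence within the i-th paragraph, and the weights of one paragraph sum to 1.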
In another embodiment, obtaining outline information corresponding to a text to be extracted based on the sentence fusion characteristics and the paragraph characteristics includes:
step 1: weighting the sentence fusion characteristics and the paragraph characteristics, and normalizing the processing result;
step 2: and determining outline information of the text to be extracted based on the result of the normalization processing.
Illustratively, after sentence fusion characteristics and paragraph characteristics are obtained, weighted fusion and normalization processing are performed on the sentence fusion characteristics and the paragraph characteristics to obtain corresponding processing results. And further, analyzing and predicting the processing result to obtain a corresponding prediction result, and determining whether the sentence text is an outline sentence or not based on the prediction result corresponding to each sentence text.
Specifically, in the training stage, after the sentence fusion feature S_rr and the paragraph feature PS_i are obtained, they are stacked into a single column of features. The stacked features are weighted by a weight matrix w_i and a bias matrix b_i, and the weighted result is further processed by a normalization function to calculate the probability value P that each sentence text belongs to an outline sentence:
Figure BDA0003976628680000112
Further, the cross-entropy loss is calculated from the probability values of each sentence text at all levels, and the model is adjusted according to this loss. The cross-entropy loss L(y, p) is calculated as follows:
Figure BDA0003976628680000113
where N is the total number of samples, K is the total number of label values, i is the sample index, k is the label index, P_{i,k} is the probability of the k-th label value for the i-th sample, and y_{i,k} is the corresponding predicted value.
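The classification and loss steps can be sketched in Python as follows. Since the equation images are not reproduced, the sketch assumes that the normalization function is a softmax, that the loss is the usual mean cross-entropy over N samples, and that y_{i,k} is a one-hot label indicator; the function names, the two-class setup and the toy weights are hypothetical.

```python
import math

def softmax(z):
    """Normalize a list of scores into probabilities."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(S_rr, PS_i, w, b):
    """Stack the two features, apply a linear layer (w, b), then softmax."""
    x = list(S_rr) + list(PS_i)                      # stacked feature column
    logits = [sum(wk * xk for wk, xk in zip(row, x)) + bk
              for row, bk in zip(w, b)]
    return softmax(logits)

def cross_entropy(y, p, eps=1e-12):
    """Mean cross-entropy: -(1/N) * sum_i sum_k y[i][k] * log(p[i][k])."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        for y_ik, p_ik in zip(y_i, p_i):
            total += y_ik * math.log(p_ik + eps)     # eps guards against log(0)
    return -total / len(y)

# Toy example with two label values (outline sentence / not an outline sentence)
probs = predict_proba(
    S_rr=[1.0, 0.0], PS_i=[0.0, 1.0],
    w=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    b=[0.0, 0.0],
)
loss = cross_entropy([[1, 0]], [probs])
```

With these toy weights both logits are equal, so each label receives probability 0.5 and the loss is approximately ln 2.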
Specifically, during training, after each training round (or after a certain number of training rounds), a test result is obtained on the validation set, and the best validation-set accuracy so far is recorded. Training is stopped if the test error of the network model on the validation set rises as the number of training rounds increases. After training is finished, the outline information of the text to be extracted is extracted by the trained network model.
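The stopping rule described above can be sketched as a small function over a sequence of validation errors. The `patience` parameter (how many non-improving rounds are tolerated before stopping) is an assumption added for illustration, as is the function name `early_stopping_epoch`; the text only states that training stops when the validation error rises.

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return (stop_epoch, best_epoch, best_error) for per-round validation errors."""
    best_error = float("inf")
    best_epoch = -1
    bad_rounds = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New best result on the validation set: record it and reset the counter
            best_error, best_epoch, bad_rounds = err, epoch, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                return epoch, best_epoch, best_error
    # No stop was triggered: training ran through every round
    return len(val_errors) - 1, best_epoch, best_error

stop_epoch, best_epoch, best_error = early_stopping_epoch([0.40, 0.31, 0.27, 0.29, 0.35])
```

In this example the error first rises at round 3 (zero-based), so training stops there and the model from round 2 is the best recorded one.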
In this embodiment, weighting processing is performed on the sentence fusion characteristics and the paragraph characteristics, and normalization processing is performed on the processing result; and determining outline information of the text to be extracted based on the result of the normalization processing, thereby fully combining the correlation weight relationship between each sentence of text and other sentences of text, and considering the context of the paragraph of the text and the format information of the sentence when determining whether each sentence of text is an outline sentence, thereby improving the accuracy of extracting the outline information.
In another embodiment, with reference to the above embodiments, the present application further discloses a flow diagram of a specific text outline extraction method. Referring to fig. 3, fig. 3 is a schematic flow chart of a method for extracting a text outline according to another embodiment of the present application. Specifically, as shown in fig. 3, the method for extracting the text outline includes:
S1: dividing the text to be extracted into readable characters and format placeholders, where C_jt represents the t-th readable character in the j-th sentence and B_t represents the t-th format placeholder;
S2: processing the readable characters C_jt and the format placeholders B_t with a trained model to obtain the character features h_jt and the format placeholder features F_b;
S3: constructing a character weight matrix, and obtaining the sentence content feature S_j through aggregation training;
S4: obtaining the sentence format features: extracting the sentence position feature F_p of the sentence text in the paragraph, which covers three kinds of information, namely paragraph head, paragraph middle and paragraph end; extracting the length ratio feature F_l of the sentence, classified according to the length ratio of the sentence in the paragraph; and extracting the sentence placeholder feature F_b contained in the sentence;
S5: performing feature fusion on the sentence content feature S_j and the sentence format features to obtain the sentence fusion feature S_rr. Specifically, the sentence content feature S_j, the sentence position feature F_p, the sentence length feature F_l and the sentence placeholder feature F_b are fused to obtain the sentence fusion feature S_rr;
S6: performing feature extraction on the sentence content feature S_j again, constructing a sentence weight matrix, performing a weighted calculation on the extracted features, and obtaining, through aggregation training, the paragraph feature PS_i in which the sentence weights are fused;
S7: performing fusion training on the sentence fusion feature S_rr and the paragraph feature PS_i to obtain a trained outline extraction model.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
In this embodiment, a device for extracting a text outline is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and details already described are not repeated. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that implements predetermined functions. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a configuration of an apparatus for extracting a text outline according to the present embodiment, and as shown in fig. 4, the apparatus includes:
the first obtaining module 10 is configured to obtain, based on readable characters of a text to be extracted, sentence content characteristics of each sentence of the text in the text to be extracted, and obtain, based on a format of the text to be extracted, sentence format characteristics of each sentence of the text in the text to be extracted, where the sentence content characteristics include character characteristics of a corresponding sentence of the text;
the first obtaining module 10 is further configured to obtain word features of the text to be extracted based on the readable characters of the text to be extracted;
acquiring sentence content characteristics of each sentence of text in the text to be extracted based on character characteristics and corresponding weights of a plurality of readable characters in each sentence of text;
the first obtaining module 10 is further configured to obtain a sentence placeholder feature of each sentence of text in the text to be extracted based on the format placeholder in the text to be extracted;
the second obtaining module 20 is configured to obtain a sentence fusion feature of each sentence of text in the text to be extracted based on the sentence content feature and the sentence format feature;
the second obtaining module 20 is further configured to perform fusion processing on the sentence length characteristic, the sentence placeholder characteristic, and the sentence content characteristic to obtain an initial sentence fusion characteristic;
performing fusion processing on the sentence initial fusion characteristics and the sentence position characteristics to obtain sentence fusion characteristics;
a third obtaining module 30, configured to obtain paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in each paragraph text;
the third obtaining module 30 is further configured to construct a weight matrix and a bias matrix corresponding to sentence content characteristics of all sentence texts;
obtaining paragraph initial characteristics based on the sentence content characteristics, the weight matrix and the bias matrix;
carrying out normalization processing and aggregation processing on the paragraph initial features to obtain paragraph features;
a fourth obtaining module 40, configured to obtain outline information corresponding to the text to be extracted based on the sentence fusion feature and the paragraph feature;
the fourth obtaining module 40 is further configured to perform weighting processing on the sentence fusion characteristics and the paragraph characteristics, and perform normalization processing on the processing result;
and determining outline information of the text to be extracted based on the result of the normalization processing.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
There is also provided in this embodiment an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include an input/output device, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
step 1: acquiring sentence content characteristics of each sentence of text in the text to be extracted based on readable characters of the text to be extracted, and acquiring sentence format characteristics of each sentence of text in the text to be extracted based on the format of the text to be extracted;
step 2: acquiring sentence fusion characteristics of each sentence of text in the text to be extracted based on the sentence content characteristics and the sentence format characteristics;
step 3: acquiring paragraph characteristics of each paragraph text in the text to be extracted based on the sentence content characteristics and corresponding weights of each sentence text in each paragraph text;
step 4: and acquiring outline information corresponding to the text to be extracted based on the sentence fusion characteristics and the paragraph characteristics.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
In addition, in combination with the method for extracting the outline of the text provided in the above embodiment, a storage medium may also be provided in this embodiment. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements the method for extracting a text outline in any of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be derived by a person skilled in the art from the examples provided herein without any inventive step, shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference throughout this application to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by one of ordinary skill in the art that the embodiments described in this application may be combined with other embodiments without conflict.
The above-mentioned embodiments only express several implementation modes of the present application, and the description thereof is specific and detailed, but not construed as limiting the scope of the patent protection. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A method for extracting a text outline is characterized by comprising the following steps:
acquiring sentence content characteristics of each sentence of text in the text to be extracted based on readable characters of the text to be extracted, and acquiring sentence format characteristics of each sentence of text in the text to be extracted based on a format of the text to be extracted;
acquiring sentence fusion characteristics of each sentence in the text to be extracted based on the sentence content characteristics and the sentence format characteristics;
acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in each paragraph text;
and obtaining outline information corresponding to the text to be extracted based on the sentence fusion characteristics and the paragraph characteristics.
2. The method for extracting the outline of the text according to claim 1, wherein the obtaining sentence content features of each sentence of text in the text to be extracted based on the readable characters of the text to be extracted comprises:
acquiring character features of the text to be extracted based on the readable characters of the text to be extracted;
and acquiring sentence content characteristics of each sentence of text in the text to be extracted based on the character characteristics and corresponding weights of a plurality of readable characters in each sentence of text.
3. The method of extracting a textual outline according to claim 1, wherein the sentence format features include a sentence position feature, a sentence length feature and a sentence placeholder feature.
4. The method for extracting textual outline according to claim 3, wherein the method for obtaining sentence placeholder features comprises:
and acquiring sentence placeholder characteristics of each sentence of text in the text to be extracted based on the format placeholder in the text to be extracted.
5. The method for extracting the text outline according to claim 3, wherein the obtaining sentence fusion characteristics of each sentence of text in the text to be extracted based on the sentence content characteristics and the sentence format characteristics comprises:
performing fusion processing on the sentence length characteristic, the sentence placeholder characteristic and the sentence content characteristic to obtain a sentence initial fusion characteristic;
and performing fusion processing on the sentence initial fusion characteristic and the sentence position characteristic to obtain the sentence fusion characteristic.
6. The method for extracting a text outline according to claim 1, wherein the acquiring paragraph features of each paragraph text in the text to be extracted based on the sentence content features and corresponding weights of each sentence text in each paragraph text comprises:
constructing a weight matrix and a bias matrix corresponding to the sentence content characteristics of all sentence texts;
obtaining paragraph initial features based on the sentence content features, the weight matrix and the bias matrix;
and carrying out normalization processing and aggregation processing on the paragraph initial features to obtain the paragraph features.
7. The method for extracting outline of text according to claim 1, wherein the obtaining of outline information corresponding to the text to be extracted based on the sentence fusion feature and the paragraph feature comprises:
weighting the sentence fusion characteristics and the paragraph characteristics, and normalizing the processing result;
and determining outline information of the text to be extracted based on the result of the normalization processing.
8. An apparatus for extracting a text outline, comprising:
the first acquisition module is used for acquiring sentence content characteristics of each sentence in the text to be extracted based on readable characters of the text to be extracted and acquiring sentence format characteristics of each sentence in the text to be extracted based on the format of the text to be extracted, wherein the sentence content characteristics comprise character characteristics of a corresponding sentence text;
the second obtaining module is used for obtaining sentence fusion characteristics of each sentence of text in the text to be extracted based on the sentence content characteristics and the sentence format characteristics;
the third acquisition module is used for acquiring paragraph characteristics of each paragraph text in the text to be extracted based on the sentence content characteristics and corresponding weights of each sentence text in each paragraph text;
and the fourth obtaining module is used for obtaining outline information corresponding to the text to be extracted based on the sentence fusion characteristics and the paragraph characteristics.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method of extracting a text outline according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for extracting a textual outline according to any one of claims 1 to 7.
CN202211533215.0A 2022-12-02 2022-12-02 Text outline extraction method and device, electronic device and storage medium Active CN115952279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211533215.0A CN115952279B (en) 2022-12-02 2022-12-02 Text outline extraction method and device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211533215.0A CN115952279B (en) 2022-12-02 2022-12-02 Text outline extraction method and device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN115952279A true CN115952279A (en) 2023-04-11
CN115952279B CN115952279B (en) 2023-09-12

Family

ID=87295872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211533215.0A Active CN115952279B (en) 2022-12-02 2022-12-02 Text outline extraction method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115952279B (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANPENG CHENG: "Neural Summarization by Extracting Sentences and Words", 《2016 ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》, pages 484 *
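XIE YAN (解艳): "Research on an Automatic Summarization System Based on LSA and Paragraph Clustering" (基于LSA和段落聚类的自动文摘系统的研究), 《China Master's Theses Full-text Database, Information Science and Technology》, pages 138 - 2352 *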

Also Published As

Publication number Publication date
CN115952279B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN109918560B (en) Question and answer method and device based on search engine
US20190287142A1 (en) Method, apparatus for evaluating review, device and storage medium
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN110309511B (en) Shared representation-based multitask language analysis system and method
CN110866098B (en) Machine reading method and device based on transformer and lstm and readable storage medium
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
CN112183085A (en) Machine reading understanding method and device, electronic equipment and computer storage medium
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN110162624A (en) A kind of text handling method, device and relevant device
CN111209297B (en) Data query method, device, electronic equipment and storage medium
CN114492669B (en) Keyword recommendation model training method, recommendation device, equipment and medium
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
US20220083745A1 (en) Method, apparatus and electronic device for determining word representation vector
CN109033082B (en) Learning training method and device of semantic model and computer readable storage medium
CN111325033A (en) Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN114297449A (en) Content searching method and device, electronic equipment, computer readable medium and product
CN114281983B (en) Hierarchical text classification method, hierarchical text classification system, electronic device and storage medium
CN111898363B (en) Compression method, device, computer equipment and storage medium for long and difficult text sentence
CN113935312A (en) Long text matching method and device, electronic equipment and computer readable storage medium
CN117076946A (en) Short text similarity determination method, device and terminal
CN116244442A (en) Text classification method and device, storage medium and electronic equipment
CN115952279B (en) Text outline extraction method and device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant