WO2020164204A1

WO2020164204A1 - Text template recognition method and apparatus, and computer readable storage medium

Info

Publication number: WO2020164204A1
Application number: PCT/CN2019/088628
Authority: WO
Inventors: 刘轲
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-02-11
Filing date: 2019-05-27
Publication date: 2020-08-20
Also published as: CN109977995A

Abstract

Disclosed in the present application is a text template recognition method, the method comprising: acquiring a preset text template and a matching text; on the basis of a word frequency-based text similarity algorithm, calculating a first degree of similarity between the matching text and the preset text template; and/or, on the basis of a semantics-based text similarity algorithm, calculating a second degree of similarity between the matching text and the preset text template; and, when the first degree of similarity and/or the second degree of similarity meets a preset similarity condition, determining that the matching text is a text template similar to the preset text template. Also provided in the present application are a text template recognition apparatus and a computer readable storage medium. The present application can improve the efficiency and accuracy of text template recognition.

Description

Text template recognition method, device and computer readable storage medium

This application claims priority to Chinese patent application No. 201910109887.0, filed with the Chinese Patent Office on February 11, 2019 and entitled "Text template recognition method, device and computer readable storage medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of natural language processing technology, and in particular to a text template recognition method, device, and computer-readable storage medium.

Background technique

With the development of Internet technology, people in all walks of life can freely publish and download information through online platforms, so the amount of information on the Internet keeps growing. Big data analysis analyzes this large volume of data and extracts the information that is needed. Such analysis sometimes requires a text template, that is, text information that contains certain specific words. Generally, identical or similar pieces of text information correspond to one text template. In the prior art, text templates are usually obtained by staff who extract the information manually, but this is time-consuming and labor-intensive: a worker needs a long time to identify and then obtain a text template.

Summary of the invention

This application provides a text template recognition method, device and computer readable storage medium, the main purpose of which is to improve the efficiency and accuracy of text template recognition.

To achieve the above objective, this application provides a text template recognition method, which includes:

Obtaining a preset text template and a matching text;

Calculating a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm; and/or

Calculating a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm;

When the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matching text is a text template similar to the preset text template.

In addition, in order to achieve the above object, the present application also provides a text template recognition device, which includes a memory and a processor. The memory stores a text template recognition program that can run on the processor. The following steps are implemented when the recognition program is executed by the processor:

Obtain the preset text template and matching text;

When the first degree of similarity and/or the second degree of similarity satisfy a preset similarity condition, it is determined that the matching text is a text template similar to the preset text template.

In addition, in order to achieve the above object, the present application also provides a computer-readable storage medium on which a text template recognition program is stored, and the text template recognition program can be executed by one or more processors to implement the steps of the text template recognition method described above.

The text template recognition method, text template recognition device, and computer-readable storage medium proposed in this application obtain a preset text template and a matching text; calculate a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm, and/or calculate a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determine that the matching text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly without staff judging texts one by one, which improves the efficiency of text template recognition. In addition, computing text similarity with the word frequency-based text similarity algorithm and/or the semantic-based text similarity algorithm can improve the accuracy of text template recognition.

Description of the drawings

FIG. 1 is a schematic flowchart of a text template recognition method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of the internal structure of a text template recognition device provided by an embodiment of this application;

FIG. 3 is a schematic diagram of modules of a text template recognition program in a text template recognition device provided by an embodiment of the application.

The realization, functional characteristics and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described here are only used to explain the application, and are not used to limit the application.

This application provides a text template recognition method. Referring to FIG. 1, it is a schematic flowchart of a text template recognition method provided by the first embodiment of this application. The method can be executed by an electronic device.

In this embodiment, the text template recognition method includes:

Step S10: Obtain a preset text template and matching text.

The preset text template may be a text template pre-stored in a preset storage area (for example, in an electronic device). The preset text template can be obtained by the user and stored in the preset storage area, or it can be obtained by analyzing several texts with similar wording and extracting the keywords they share.

In a possible embodiment, the preset text template is any text template in a text template collection. The text template collection contains only text templates of the same type, or it includes text templates of various types. Obtaining the preset text template includes: obtaining a text template collection; and obtaining a text template from the text template collection.

The matching text is the text to be judged as to whether it is a text template similar to the preset text template. The matching text may consist of one or more sentences.

Step S20: Calculate a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm, and/or calculate a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm.

The word frequency-based text similarity algorithm calculates the similarity between two texts from the frequency with which words appear; the semantic-based text similarity algorithm calculates the similarity between two texts from their semantics.

The specific word frequency-based text similarity algorithm and the semantic-based text similarity algorithm can be obtained from the prior art, and will not be repeated here.

Optionally, in another embodiment of this application, calculating the first similarity between the matching text and the preset text template according to the word frequency-based text similarity algorithm and/or calculating the second similarity between the matching text and the preset text template according to the semantic-based text similarity algorithm includes:

Calculating the first similarity between the matched text and the preset text template by using a vector space model;

Calculating the second similarity between the matching text and the preset text template by using an LDA document topic generation model.

In this embodiment, the vector space model is used to calculate the first similarity between the matching text and the preset text template. Using the Vector Space Model (VSM) to calculate the first similarity between the matching text and the preset text template includes:

Performing preprocessing operations on the matching text and the preset text template to obtain a preprocessed matching text and a preprocessed preset text template. The preprocessing operations include, but are not limited to, word segmentation and stop-word removal, where stop words are words, symbols, punctuation marks and garbled characters that contribute little to the meaning of the text, such as "this", "的" and "呀";

Determining a first keyword from the frequency of words in the preprocessed matching text, and a second keyword from the frequency of words in the preprocessed preset text template, where the first keyword and the second keyword may each contain multiple words;

For example, a word whose frequency in the preprocessed matching text is greater than a preset frequency is determined to be a first keyword.

After the first keyword and the second keyword are determined, calculating the inverse document frequency of the first keyword and the inverse document frequency of the second keyword, and generating a first vector representing the matching text and a second vector representing the preset text template.

Here, the inverse document frequency (IDF) is an index used to measure the weight of a keyword.

The inverse document frequency of a keyword can be calculated by the formula IDF = log(D / D_w), where D is the total number of texts in the sample database and D_w is the number of texts in which the keyword appears.
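As an illustration of the formula IDF = log(D / D_w), the following Python sketch computes the IDF of a keyword over a small sample database. The function name `inverse_document_frequency`, the toy sample database, the use of the natural logarithm, and the error raised when a keyword never occurs are all assumptions made for this example; the application specifies none of them.

```python
import math

def inverse_document_frequency(keyword, sample_texts):
    """Return IDF = log(D / D_w) for one keyword.

    sample_texts is a list of token lists (texts that are already segmented).
    The natural logarithm is an assumption; the source gives no base.
    """
    total = len(sample_texts)  # D: total number of texts
    containing = sum(1 for tokens in sample_texts if keyword in tokens)  # D_w
    if containing == 0:
        # Assumption: the formula is undefined for D_w = 0, so fail loudly.
        raise ValueError(f"keyword {keyword!r} does not occur in the sample database")
    return math.log(total / containing)

# Hypothetical sample database of four segmented texts.
sample_texts = [
    ["loan", "approved", "amount"],
    ["loan", "rejected"],
    ["account", "balance", "amount"],
    ["loan", "repayment", "due"],
]

idf_loan = inverse_document_frequency("loan", sample_texts)  # log(4 / 3)
idf_due = inverse_document_frequency("due", sample_texts)    # log(4 / 1)
```

A keyword that appears in few texts ("due") receives a larger weight than one that appears in most texts ("loan"), which is the behaviour the weighting is meant to capture.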

In this embodiment, the first vector and the second vector are obtained according to the following formula:

D=D(T1, W1; T2, W2;..., Tn, Wn)

Here, T1 is a keyword and W1 is the inverse document frequency of that keyword; T2 is another keyword and W2 is its inverse document frequency; and so on, up to Tn, the nth keyword, and Wn, its inverse document frequency.
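The construction of a vector D = D(T1, W1; ...; Tn, Wn) can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the whitespace tokenizer, the stop-word list, the helper names `preprocess` and `keyword_vector`, the natural logarithm, and the choice to skip keywords that never occur in the sample database are all assumptions.

```python
import math
from collections import Counter

STOP_WORDS = {"this", "的", "呀"}  # hypothetical stop-word list

def preprocess(text):
    """Assumed preprocessing: whitespace segmentation plus stop-word removal."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

def keyword_vector(tokens, sample_texts, preset_frequency=0):
    """Map each keyword T_i to its weight W_i (its IDF).

    A token counts as a keyword when it occurs more than `preset_frequency`
    times in `tokens`. `sample_texts` is a list of token lists.
    """
    total = len(sample_texts)  # D
    vector = {}
    for word, count in Counter(tokens).items():
        if count <= preset_frequency:
            continue  # not frequent enough to be a keyword
        containing = sum(1 for text in sample_texts if word in text)  # D_w
        if containing == 0:
            continue  # assumption: ignore keywords unseen in the sample database
        vector[word] = math.log(total / containing)
    return vector

# Hypothetical sample database of four segmented texts.
sample_texts = [
    ["loan", "approved", "amount"],
    ["loan", "rejected"],
    ["account", "balance", "amount"],
    ["loan", "repayment", "due"],
]

tokens = preprocess("this loan loan amount")
all_keywords = keyword_vector(tokens, sample_texts)
frequent_only = keyword_vector(tokens, sample_texts, preset_frequency=1)
```

Representing the vector as a dictionary keeps it sparse: only keywords that actually occur in the text take up space, and two vectors can be compared over their shared keys.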

In the vector space model, the content correlation Sim(D1, D2) between two texts is usually expressed by the cosine of the angle between their vectors. Therefore, after the first vector of the matching text and the second vector of the preset text template are obtained, the cosine of the angle between the first vector and the second vector is calculated to obtain the first similarity between the matching text and the preset text template. The formula for calculating the cosine can be obtained from the prior art and will not be repeated here.
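For reference, the standard cosine measure can be sketched in Python as follows. The sparse dictionary representation (keyword mapped to weight), the function name `cosine_similarity`, the example weights, and the convention of returning 0.0 for an empty vector are assumptions made for this illustration.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse keyword-weight vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[key] * vec_b[key] for key in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical keyword weights for a matching text and a preset text template.
matching_vector = {"loan": 1.0, "amount": 2.0}
template_vector = {"loan": 1.0, "amount": 2.0, "due": 2.0}

first_similarity = cosine_similarity(matching_vector, template_vector)
```

Here the dot product is 5, the norms are √5 and 3, so the first similarity is √5/3, roughly 0.745. Keywords present in only one of the two texts lower the score through the norms without contributing to the dot product.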

In this embodiment, each text is reduced to an N-dimensional vector whose components are the weights of its feature items (keywords). This simplifies the complex relationships between keywords in the text, makes the model computable, and allows the first similarity between the matching text and the preset text template to be obtained quickly.

In this embodiment, the basic idea of the LDA (Latent Dirichlet Allocation) model is to describe a document as a probability distribution over topics and, further, to describe each topic as a probability distribution over terms. How to calculate the second similarity between the matching text and the preset text template according to the LDA document topic generation model can be obtained from the prior art and will not be repeated here.
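Because the application leaves the LDA-based calculation to the prior art, the following Python sketch shows only one possible final step: comparing the topic distributions of two documents once an LDA model has already produced them. The topic distributions are hypothetical inputs (training the LDA model is not shown), and the choice of a Jensen-Shannon-based score, like the function name `topic_similarity`, is an assumption rather than the measure used by the application.

```python
import math

def topic_similarity(topics_p, topics_q):
    """Return 1 - JSD(p, q) using base-2 logarithms, a score in [0, 1].

    topics_p and topics_q are topic probability distributions of equal
    length, e.g. as inferred by a trained LDA model (not shown here).
    """
    if len(topics_p) != len(topics_q):
        raise ValueError("both documents must use the same number of topics")
    midpoint = [(p + q) / 2 for p, q in zip(topics_p, topics_q)]

    def kl_divergence(dist, ref):
        # Terms with zero probability contribute nothing to the sum.
        return sum(d * math.log2(d / r) for d, r in zip(dist, ref) if d > 0)

    divergence = 0.5 * kl_divergence(topics_p, midpoint) \
        + 0.5 * kl_divergence(topics_q, midpoint)
    return 1.0 - divergence

# Hypothetical topic distributions over three topics.
matching_topics = [0.7, 0.2, 0.1]
template_topics = [0.6, 0.3, 0.1]
second_similarity = topic_similarity(matching_topics, template_topics)
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 and 1, so the score is 1 for identical distributions and 0 for distributions that share no topic, which makes it directly comparable with a cosine-based first similarity.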

Step S30: When the first similarity degree and/or the second similarity degree satisfy a preset similarity degree condition, it is determined that the matching text is a text template similar to the preset text template.

The preset similarity condition may be set in advance as required.

Optionally, in another embodiment of the present application, that the first similarity or the second similarity meets a preset similarity condition includes:

The first similarity is greater than the first preset similarity or the second similarity is greater than the second preset similarity.

The first preset similarity degree and the second preset similarity degree may be preset as required, and the values of the first preset similarity degree and the second preset similarity degree may be the same or different. For example, the first preset similarity is 85%, and the second preset similarity is 90%; or, both the first preset similarity and the second preset similarity are 90%.
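The "or" condition described above can be sketched in Python as follows. The function name `meets_either_threshold` is hypothetical, the default thresholds are the 85% and 90% values from the example, and the strict "greater than" comparison follows the wording of this embodiment.

```python
def meets_either_threshold(first_similarity, second_similarity,
                           first_preset=0.85, second_preset=0.90):
    """True when either similarity exceeds its own preset similarity."""
    return first_similarity > first_preset or second_similarity > second_preset

# The word frequency-based score alone is enough to satisfy the condition.
accepted = meets_either_threshold(0.86, 0.40)
# Scores equal to the thresholds do not satisfy a strict comparison.
rejected = meets_either_threshold(0.85, 0.90)
```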

Optionally, in another embodiment of the present application, that the first similarity and the second similarity satisfy a preset similarity condition includes:

Performing linear weighting according to the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template;

Judging whether the third similarity is greater than the third preset similarity;

If the third similarity is greater than the third preset similarity, it is determined that the first similarity and the second similarity satisfy a preset similarity condition.

Linear weighting is to give a certain weight value to the first similarity and the second similarity and then add them to obtain the third similarity.

The third preset similarity may be set in advance as required.

Optionally, in another embodiment of the present application, performing linear weighting according to the first similarity and the second similarity to obtain the third similarity between the matching text and the preset text template includes:

The first similarity and the second similarity are input into a preset linear weighting formula, and the third similarity between the matched text and the preset text template is output, and the preset linear weighting formula is:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

Where p and q are the matching text and the preset text template respectively, sim_TFIDF(p, q) is the first similarity, sim_LDA(p, q) is the second similarity, sim(p, q) is the third similarity, and α and β are preset weight values.

In this embodiment, 0≤α≤1, 0≤β≤1, and the sum of α and β is 1.
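The linear weighting and its constraints can be sketched in Python as follows. The function names and the default third preset similarity of 0.9 are assumptions for illustration; β is derived as 1 − α so that the constraint on the sum of the weights holds by construction.

```python
def third_similarity(sim_lda, sim_tfidf, alpha):
    """sim(p, q) = alpha * sim_LDA(p, q) + beta * sim_TFIDF(p, q), beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * sim_lda + beta * sim_tfidf

def meets_combined_threshold(sim_lda, sim_tfidf, alpha, third_preset=0.9):
    """True when the third similarity exceeds the third preset similarity."""
    return third_similarity(sim_lda, sim_tfidf, alpha) > third_preset

# Worked example: alpha = 0.4, so beta = 0.6.
combined = third_similarity(sim_lda=0.8, sim_tfidf=0.9, alpha=0.4)
```

In the worked example the third similarity is 0.4 × 0.8 + 0.6 × 0.9, approximately 0.86, which would not exceed a third preset similarity of 0.9.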

Optionally, in another embodiment of the present application, the method further includes: obtaining a weight value for linear weighting. The obtaining a weight value for linear weighting includes:

Assigning a first initial value to the weight value, and calculating the third degree of similarity according to the first initial value;

Judging whether the matching text and the preset text template are in the same category by a preset clustering algorithm, and obtaining a clustering result;

Judging by the clustering result whether the third degree of similarity calculated according to the first initial value is accurate;

If it is determined that the third degree of similarity calculated according to the first initial value is accurate, determining that the first initial value is a weight value used for linear weighting;

If it is determined that the third similarity calculated according to the first initial value is not accurate, the first initial value is adjusted, and the operation of calculating the third similarity is performed again with the adjusted value.

The above steps are used to obtain the value of α or β.

The clustering result is either that the matching text and the preset text template are in the same category, or that the matching text and the preset text template are not in the same category.

The first initial value may be 0.1, and each adjustment may increase it by 0.1. For example, suppose the weight to be obtained is α. At the initial assignment α is 0.1, so β is 0.9. The third similarity between the matching text and the preset text template is calculated according to the preset linear weighting formula, and the clustering algorithm is used to determine whether the matching text and the preset text template are in the same category. If the third similarity is less than 50% and the clustering algorithm determines that the matching text and the preset text template are not in the same category, the third similarity is determined to be inaccurate. Then let α=α+0.1, so that α is 0.2 and β is 0.8; the third similarity is calculated again according to the preset linear weighting formula, and the clustering algorithm again judges whether the matching text and the preset text template are in the same category. If the result is still inaccurate, let α=α+0.1, so that α is 0.3 and β is 0.7, and calculate again, and so on until the optimal values of α and β are found.
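The iterative adjustment of α can be sketched in Python as follows. This is only a sketch under stated assumptions: the clustering result is passed in as a boolean because the clustering algorithm itself is not specified, and the accuracy test is a caller-supplied function because the application does not define precisely when a third similarity counts as accurate. The example criterion `agrees_with_clustering` (the 50% decision must match the clustering result) is hypothetical and is one possible reading, not the application's rule.

```python
def search_alpha(sim_lda, sim_tfidf, same_category, is_accurate,
                 initial=0.1, step=0.1):
    """Raise alpha from `initial` in `step` increments until `is_accurate`
    accepts the resulting third similarity; return (alpha, third) or None."""
    attempts = int(round((1.0 - initial) / step)) + 1
    for i in range(attempts):
        alpha = round(initial + i * step, 10)  # avoid floating-point drift
        beta = 1.0 - alpha
        third = alpha * sim_lda + beta * sim_tfidf
        if is_accurate(third, same_category):
            return alpha, third
    return None  # no tried weight was judged accurate

def agrees_with_clustering(third, same_category, threshold=0.5):
    """Hypothetical criterion: the thresholded score matches the clustering result."""
    return (third >= threshold) == same_category

# Hypothetical scores for one pair that the clustering puts in the same category.
result = search_alpha(sim_lda=0.9, sim_tfidf=0.3, same_category=True,
                      is_accurate=agrees_with_clustering)
```

With these inputs the third similarity is about 0.36, 0.42 and 0.48 for α = 0.1, 0.2 and 0.3, and first reaches the 50% mark at α = 0.4 (about 0.54), so the search stops there with β = 0.6. Passing the criterion as a parameter keeps the search loop independent of how accuracy is ultimately defined.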

In this embodiment, when it is determined that the matching text is a text template similar to the preset text template, the matching text can be added to the template set of the preset text template. In this way, multiple text template sets can be obtained, each containing similar text templates.
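Collecting similar texts into template sets can be sketched in Python as follows. The dictionary layout (preset template mapped to its list of similar texts), the function names, and the threshold are assumptions; the word-overlap (Jaccard) score is only a stand-in for the first, second or third similarity described above and is not the measure used in this application.

```python
def jaccard(text_a, text_b):
    """Stand-in similarity: word overlap between two texts."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def add_to_template_sets(matching_text, template_sets, similarity, threshold):
    """Append matching_text to the set of every preset template it resembles
    and return the list of those preset templates."""
    added = []
    for preset, members in template_sets.items():
        if similarity(matching_text, preset) > threshold:
            members.append(matching_text)
            added.append(preset)
    return added

# Hypothetical preset templates, each starting with an empty template set.
template_sets = {
    "your loan of AMOUNT is approved": [],
    "your account balance is AMOUNT": [],
}
added = add_to_template_sets("your loan of AMOUNT is approved today",
                             template_sets, jaccard, threshold=0.8)
```

The matching text shares six of seven distinct words with the first preset template and is added to its set; it shares only three of nine with the second and is left out.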

The text template recognition method proposed in this embodiment obtains a preset text template and a matching text; calculates a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm, and/or calculates a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determines that the matching text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly without staff judging texts one by one, which improves the efficiency of text template recognition. In addition, computing text similarity with the word frequency-based text similarity algorithm and/or the semantic-based text similarity algorithm can improve the accuracy of text template recognition.

The application also provides a text template recognition device. Referring to FIG. 2, it is a schematic diagram of the internal structure of a text template recognition device provided by an embodiment of this application.

In this embodiment, the text template recognition device 1 may be a PC (Personal Computer, personal computer), or a terminal device such as a smart phone, a tablet computer, or a portable computer. The text template recognition device 1 at least includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.

Wherein, the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the text template recognition device 1, such as a hard disk of the text template recognition device 1. In other embodiments, the memory 11 may also be an external storage device of the text template recognition device 1, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the text template recognition device 1. Further, the memory 11 may also include both an internal storage unit of the text template recognition device 1 and an external storage device. The memory 11 can be used not only to store application software installed in the text template recognition device 1 and various data, such as the code of the text template recognition program 01, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to execute the text template recognition program 01.

The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is usually used to establish a communication connection between the device 1 and other electronic devices.

The communication bus 14 is used to realize the connection and communication between these components.

Optionally, the text template recognition device 1 may also include a user interface. The user interface may include a display (Display) and an input unit such as a keyboard (Keyboard). The optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light emitting diode) touch device, etc. Among them, the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the text template recognition device 1 and to display a visualized user interface.

FIG. 2 only shows the text template recognition device 1 with the components 11-14 and the text template recognition program 01. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation on the text template recognition device 1, which may include fewer or more components than shown, a combination of some components, or a different arrangement of components.

In the embodiment of the text template recognition device 1 shown in FIG. 2, a text template recognition program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the text template recognition program 01 stored in the memory 11:

Get the preset text template and matching text.

Calculate a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm, and/or calculate a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm.

Optionally, in another embodiment of this application, calculating the first similarity between the matching text and the preset text template according to the word frequency-based text similarity algorithm and calculating the second similarity between the matching text and the preset text template according to the semantic-based text similarity algorithm includes:

D=D(T1, W1; T2, W2;..., Tn, Wn)

The preset similarity condition may be preset.

The third preset similarity degree may be preset.

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

In this embodiment, 0≤α≤1, 0≤β≤1, and the sum of α and β is 1.

Optionally, in another embodiment of the present application, the text template recognition program is executed by the processor, and the following steps are further implemented:

Get the weight value used for linear weighting.

The obtaining a weight value for linear weighting includes:

The above steps are used to obtain the value of α or β.

The text template recognition device proposed in this embodiment obtains a preset text template and a matching text; calculates a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm, and/or calculates a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determines that the matching text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly without staff judging texts one by one, which improves the efficiency of text template recognition. In addition, computing text similarity with the word frequency-based text similarity algorithm and/or the semantic-based text similarity algorithm can improve the accuracy of text template recognition.

Optionally, in other embodiments, the text template recognition program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement this application. A module in this application refers to a series of computer program instruction segments capable of completing a specific function, and is used to describe the execution process of the text template recognition program in the text template recognition device.

For example, FIG. 3 is a schematic diagram of the program modules of the text template recognition program 01 in an embodiment of the text template recognition device of this application. In this embodiment, the text template recognition program can be divided into an acquisition module 10, a calculation module 20 and a determining module 30. Exemplarily:

The acquisition module 10 is configured to: obtain a preset text template and a matching text;

The calculation module 20 is configured to: calculate a first similarity between the matching text and the preset text template according to a word frequency-based text similarity algorithm; and/or calculate a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm;

The determining module 30 is configured to determine that the matched text is a text template similar to the preset text template when the first similarity degree and/or the second similarity degree satisfy a preset similarity degree condition.

The functions or operation steps implemented by the program modules such as the acquisition module 10, the calculation module 20, and the determination module 30 when executed are substantially the same as those in the foregoing embodiment, and will not be repeated here.
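The division into three modules can be sketched in Python as three functions. This is an illustrative decomposition, not the structure of the actual program: the function names, the dictionary used as the template store, the default thresholds, and the constant placeholder similarity function are all assumptions.

```python
def acquisition_module(template_store, template_id, matching_text):
    """Module 10: fetch the preset text template and pair it with the matching text."""
    return template_store[template_id], matching_text

def calculation_module(preset, matching, word_frequency_sim=None, semantic_sim=None):
    """Module 20: compute whichever of the two similarities is configured."""
    first = word_frequency_sim(matching, preset) if word_frequency_sim else None
    second = semantic_sim(matching, preset) if semantic_sim else None
    return first, second

def determining_module(first, second, first_preset=0.85, second_preset=0.90):
    """Module 30: apply the preset similarity condition (the 'or' variant)."""
    return ((first is not None and first > first_preset)
            or (second is not None and second > second_preset))

# Hypothetical template store and a placeholder similarity that always returns 0.9.
store = {"loan_notice": "your loan of AMOUNT is approved"}
preset, matching = acquisition_module(store, "loan_notice",
                                      "your loan of AMOUNT is approved today")
first, second = calculation_module(preset, matching,
                                   word_frequency_sim=lambda a, b: 0.9)
is_similar = determining_module(first, second)
```

Passing the similarity functions in as parameters mirrors the "and/or" wording: either algorithm can be omitted, in which case the corresponding similarity is `None` and is ignored by the determining step.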

In addition, an embodiment of the present application also proposes a computer-readable storage medium on which a text template recognition program is stored, and the text template recognition program can be executed by one or more processors to implement the following operations:

Obtain the preset text template and matching text;

The specific implementation of the computer-readable storage medium of the present application is basically the same as the embodiments of the text template recognition device and method described above, and will not be repeated here.

It should be noted that the serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. The terms "include", "comprise" or any other variants thereof herein are intended to cover non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, device, article or method. Unless further restricted, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, device, article or method that includes the element.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of this application.

The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

1. A text template recognition method, characterized in that the method includes:

Obtain the preset text template and matching text;

Calculate the first similarity between the matched text and the preset text template according to a text similarity algorithm based on word frequency; and/or

Calculating the second similarity between the matched text and the preset text template according to a semantic-based text similarity algorithm;

When the first degree of similarity and/or the second degree of similarity satisfy a preset similarity condition, it is determined that the matching text is a text template similar to the preset text template.
2. The text template recognition method according to claim 1, wherein calculating the first similarity between the matching text and the preset text template according to the word frequency-based text similarity algorithm and/or calculating the second similarity between the matching text and the preset text template according to the semantic-based text similarity algorithm includes:

Calculating the first similarity between the matched text and the preset text template by using a vector space model;

Calculating the second similarity between the matching text and the preset text template by using the LDA document topic generation model;

The first similarity degree and the second similarity degree satisfying a preset similarity degree condition includes:

Performing linear weighting according to the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template;

Judging whether the third similarity is greater than the third preset similarity;

If the third similarity is greater than the third preset similarity, it is determined that the first similarity and the second similarity satisfy a preset similarity condition.
3. The text template recognition method according to claim 2, wherein performing linear weighting on the first similarity and the second similarity to obtain the third similarity between the matching text and the preset text template comprises:

Inputting the first similarity and the second similarity into a preset linear weighting formula, and outputting the third similarity between the matching text and the preset text template, the preset linear weighting formula being:

$$\mathrm{sim}(p,q) = \alpha\,\mathrm{sim}_{\mathrm{LDA}}(p,q) + \beta\,\mathrm{sim}_{\mathrm{TFIDF}}(p,q),$$

where $p$ and $q$ are the matching text and the preset text template respectively, $\mathrm{sim}_{\mathrm{TFIDF}}(p,q)$ is the first similarity, $\mathrm{sim}_{\mathrm{LDA}}(p,q)$ is the second similarity, $\mathrm{sim}(p,q)$ is the third similarity, and $\alpha$ and $\beta$ are preset weight values.
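The linear weighting formula can be sketched in a few lines. The function name `third_similarity` and the numeric values below are illustrative assumptions; the claims do not fix particular weights or require that they sum to 1.

```python
def third_similarity(sim_lda: float, sim_tfidf: float, alpha: float, beta: float) -> float:
    """Linear weighting: sim(p, q) = alpha * sim_LDA(p, q) + beta * sim_TFIDF(p, q)."""
    return alpha * sim_lda + beta * sim_tfidf


# Illustrative values only (assumptions, not taken from the application).
combined = third_similarity(sim_lda=0.8, sim_tfidf=0.5, alpha=0.6, beta=0.4)
print(round(combined, 2))  # 0.68
```

With these example values, the semantic score contributes 0.48 and the word-frequency score contributes 0.20 to the combined similarity.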
4. The text template recognition method according to claim 2, wherein the method further comprises:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
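The weight-adjustment loop described above could look roughly like the following sketch. Several points are assumptions for illustration only: the second weight is taken as 1 minus the first, the clustering result is passed in as a boolean `same_cluster` produced elsewhere by the preset clustering algorithm, "accurate" is interpreted as the threshold decision agreeing with the clustering result, and the fixed-step adjustment and the name `tune_alpha` are hypothetical.

```python
def tune_alpha(sim_lda, sim_tfidf, same_cluster, threshold,
               initial_alpha=0.5, step=0.1, max_rounds=20):
    """Adjust the LDA weight until the thresholded third similarity agrees
    with the clustering result; return the weight, or None if none is found."""
    alpha = initial_alpha
    for _ in range(max_rounds):
        # Third similarity with beta = 1 - alpha (an assumption of this sketch).
        third = alpha * sim_lda + (1.0 - alpha) * sim_tfidf
        if (third > threshold) == same_cluster:
            return alpha  # Judged accurate: keep the current value.
        # Judged inaccurate: raise the combined score if the pair should match,
        # lower it otherwise, by shifting weight between the two similarities.
        want_higher = same_cluster
        lda_is_larger = sim_lda >= sim_tfidf
        direction = 1.0 if want_higher == lda_is_larger else -1.0
        new_alpha = min(1.0, max(0.0, alpha + direction * step))
        if new_alpha == alpha:
            return None  # The weight cannot move any further.
        alpha = new_alpha
    return None
```

For example, with `sim_lda=0.9`, `sim_tfidf=0.3`, a threshold of 0.7, and a clustering result that places both texts in the same category, the initial weight 0.5 gives a third similarity of 0.6, which is judged inaccurate, and the loop raises the weight until the threshold is exceeded.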
5. The text template recognition method according to claim 3, wherein the method further comprises:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
6. The text template recognition method according to claim 1, wherein the first similarity or the second similarity satisfying the preset similarity condition comprises:

The first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.
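A minimal sketch of this either-or condition follows. The function name `meets_either_threshold` is hypothetical, and the use of `None` for a similarity that was not calculated is an assumption reflecting the "and/or" wording of claim 1.

```python
from typing import Optional


def meets_either_threshold(
    first_similarity: Optional[float],
    second_similarity: Optional[float],
    first_preset: float,
    second_preset: float,
) -> bool:
    """Return True if either calculated similarity exceeds its own threshold."""
    # A similarity that was not calculated (None) can never satisfy the condition.
    first_ok = first_similarity is not None and first_similarity > first_preset
    second_ok = second_similarity is not None and second_similarity > second_preset
    return first_ok or second_ok
```

Each similarity is compared only against its own preset value, so a high word-frequency score alone, or a high semantic score alone, is enough for the matching text to be treated as a similar template.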
7. The text template recognition method according to any one of claims 2-5, wherein the first similarity or the second similarity satisfying the preset similarity condition comprises:

The first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.
8. A text template recognition device, characterized in that the device comprises a memory and a processor, the memory stores a text template recognition program executable on the processor, and the text template recognition program, when executed by the processor, implements the following steps:

Acquiring a preset text template and a matching text;

Calculating a first similarity between the matching text and the preset text template according to a word-frequency-based text similarity algorithm; and/or

Calculating a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm;

When the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matching text is a text template similar to the preset text template.
9. The text template recognition device according to claim 8, wherein calculating the first similarity between the matching text and the preset text template according to the word-frequency-based text similarity algorithm, and/or calculating the second similarity between the matching text and the preset text template according to the semantic-based text similarity algorithm, comprises:

Calculating the first similarity between the matching text and the preset text template by using a vector space model;

Calculating the second similarity between the matching text and the preset text template by using the LDA document topic generation model;

The first similarity and the second similarity satisfying the preset similarity condition comprises:

Performing linear weighting on the first similarity and the second similarity to obtain a third similarity between the matching text and the preset text template;

Judging whether the third similarity is greater than a third preset similarity;

If the third similarity is greater than the third preset similarity, determining that the first similarity and the second similarity satisfy the preset similarity condition.
10. The text template recognition device according to claim 9, wherein performing linear weighting on the first similarity and the second similarity to obtain the third similarity between the matching text and the preset text template comprises:

Inputting the first similarity and the second similarity into a preset linear weighting formula, and outputting the third similarity between the matching text and the preset text template, the preset linear weighting formula being:

$$\mathrm{sim}(p,q) = \alpha\,\mathrm{sim}_{\mathrm{LDA}}(p,q) + \beta\,\mathrm{sim}_{\mathrm{TFIDF}}(p,q),$$

where $p$ and $q$ are the matching text and the preset text template respectively, $\mathrm{sim}_{\mathrm{TFIDF}}(p,q)$ is the first similarity, $\mathrm{sim}_{\mathrm{LDA}}(p,q)$ is the second similarity, $\mathrm{sim}(p,q)$ is the third similarity, and $\alpha$ and $\beta$ are preset weight values.
11. The text template recognition device according to claim 9, wherein the text template recognition program, when executed by the processor, further implements the following steps:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
12. The text template recognition device according to claim 10, wherein the text template recognition program, when executed by the processor, further implements the following steps:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
13. The text template recognition device according to claim 8, wherein the first similarity or the second similarity satisfying the preset similarity condition comprises:

The first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.
14. The text template recognition device according to any one of claims 9-12, wherein the first similarity or the second similarity satisfying the preset similarity condition comprises:

The first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.
15. A computer-readable storage medium, characterized in that a text template recognition program is stored on the computer-readable storage medium, and the text template recognition program is executable by one or more processors to implement the following steps:

Acquiring a preset text template and a matching text;

Calculating a first similarity between the matching text and the preset text template according to a word-frequency-based text similarity algorithm; and/or

Calculating a second similarity between the matching text and the preset text template according to a semantic-based text similarity algorithm;

When the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matching text is a text template similar to the preset text template.
16. The computer-readable storage medium according to claim 15, wherein calculating the first similarity between the matching text and the preset text template according to the word-frequency-based text similarity algorithm, and/or calculating the second similarity between the matching text and the preset text template according to the semantic-based text similarity algorithm, comprises:

Calculating the first similarity between the matching text and the preset text template by using a vector space model;

Calculating the second similarity between the matching text and the preset text template by using the LDA document topic generation model;

The first similarity and the second similarity satisfying the preset similarity condition comprises:

Performing linear weighting on the first similarity and the second similarity to obtain a third similarity between the matching text and the preset text template;

Judging whether the third similarity is greater than a third preset similarity;

If the third similarity is greater than the third preset similarity, determining that the first similarity and the second similarity satisfy the preset similarity condition.
17. The computer-readable storage medium according to claim 16, wherein performing linear weighting on the first similarity and the second similarity to obtain the third similarity between the matching text and the preset text template comprises:

Inputting the first similarity and the second similarity into a preset linear weighting formula, and outputting the third similarity between the matching text and the preset text template, the preset linear weighting formula being:

$$\mathrm{sim}(p,q) = \alpha\,\mathrm{sim}_{\mathrm{LDA}}(p,q) + \beta\,\mathrm{sim}_{\mathrm{TFIDF}}(p,q),$$

where $p$ and $q$ are the matching text and the preset text template respectively, $\mathrm{sim}_{\mathrm{TFIDF}}(p,q)$ is the first similarity, $\mathrm{sim}_{\mathrm{LDA}}(p,q)$ is the second similarity, $\mathrm{sim}(p,q)$ is the third similarity, and $\alpha$ and $\beta$ are preset weight values.
18. The computer-readable storage medium according to claim 16, wherein the text template recognition program, when executed by the processor, further implements the following steps:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
19. The computer-readable storage medium according to claim 17, wherein the text template recognition program, when executed by the processor, further implements the following steps:

Obtaining the weight value used for the linear weighting, comprising:

Assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

Judging, by a preset clustering algorithm, whether the matching text and the preset text template belong to the same category, to obtain a clustering result;

Judging, according to the clustering result, whether the third similarity calculated according to the first initial value is accurate;

If the third similarity calculated according to the first initial value is determined to be accurate, determining the first initial value as the weight value used for the linear weighting;

If the third similarity calculated according to the first initial value is determined to be inaccurate, adjusting the first initial value, and performing again the operation of calculating the third similarity according to the adjusted first initial value.
20. The computer-readable storage medium according to any one of claims 16-18, wherein the first similarity or the second similarity satisfying the preset similarity condition comprises:

The first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.