CN117332768A - Data processing system for acquiring text generation template - Google Patents

Data processing system for acquiring text generation template

Info

Publication number
CN117332768A
CN117332768A
Authority
CN
China
Prior art keywords
model
text
model type
candidate
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311308051.6A
Other languages
Chinese (zh)
Other versions
CN117332768B (en)
Inventor
石江枫
于伟
靳雯
王全修
赵洲洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rizhao Ruian Information Technology Co ltd
Beijing Rich Information Technology Co ltd
Original Assignee
Rizhao Ruian Information Technology Co ltd
Beijing Rich Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rizhao Ruian Information Technology Co ltd, Beijing Rich Information Technology Co ltd filed Critical Rizhao Ruian Information Technology Co ltd
Priority to CN202311308051.6A priority Critical patent/CN117332768B/en
Publication of CN117332768A publication Critical patent/CN117332768A/en
Application granted granted Critical
Publication of CN117332768B publication Critical patent/CN117332768B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing system for acquiring a text generation template, comprising: a key text corresponding to a target user, a preset model type tree, a processor, and a memory storing a computer program. When the computer program is executed by the processor, the following steps are implemented: obtaining candidate template texts; obtaining a target NLP model according to the candidate template texts and a plurality of first preset sample texts; obtaining an intermediate confidence list according to the key text and the target NLP model; obtaining an intermediate text generation template according to the intermediate confidence list, the number of characters of all keywords corresponding to the key text, and the number of characters in the key text; and adjusting the intermediate text generation template to obtain a target text generation template. With the invention, there is no need to manually create or select a text generation template, which improves the accuracy of acquiring the text generation template, broadens the user's thinking, and improves the user's working efficiency.

Description

Data processing system for acquiring text generation template
Technical Field
The invention relates to the technical field of text processing, in particular to a data processing system for acquiring a text generation template.
Background
With the wide application of the internet, when a user needs to generate a report, the report can be generated directly through a text generation template instead of being written manually, so it is very necessary to obtain an accurate and suitable text generation template.
However, this approach has the following technical problems:
Existing templates are limited to templates made online or made by the user, which is uncontrollable; the user has to create or select a template through a great deal of thinking, and when the user's thinking is not divergent enough, the resulting template is not accurate enough and its application range is limited, so the text generation template obtained in this way is not accurate enough.
Disclosure of Invention
To address the above technical problems, the invention adopts the following technical solution:
a data processing system for retrieving a text generation template, comprising: key text T corresponding to target user 0 The method comprises the following steps of:
s1, according to T 0 And A, obtain T 0 Corresponding candidate template text list t= { T 1 ,T 2 ,……,T b ,……,T c },T b The candidate template text for the b-th candidate template, b=1, 2, … …, c, c is the number of candidate templates, and the candidate template text is a text for describing the applicable range of the candidate template.
S2, training an NLP model according to the T and a plurality of first preset sample texts to obtain a target NLP model, wherein the result of the output of the NLP model is that the first preset sample texts are respectively matched with the T 1 ,T 2 ,……,T b ,……,T c Confidence between them.
S3, T is taken as 0 Input into target NLP model to obtain T 0 Corresponding intermediate confidence list T 1 ={T 1 1 ,T 1 2 ,……,T 1 b ,……,T 1 c },T 1 b Is T 0 And T b An intermediate confidence level between.
S4, when AT/BT is more than or equal to YT, T is processed according to a first processing method 0 T and T 1 Processing to obtain an intermediate text generation template, wherein AT is T 0 The sum of the number of characters of all corresponding keywords, BT is T 0 The first processing method in S4 includes the following steps:
s41, acquiring a first keyword list GJ= { GJ 1 ,GJ 2 ,……,GJ (ai) ,……,GJ (am) },GJ (ai) For the ai first keyword, ai=1, 2, … …, am, am is the number of first keywords, and the first keyword is T 0 Corresponding keywords.
S42, acquiring a second keyword list GJ corresponding to the T 0 ={GJ 0 1 ,GJ 0 2 ,……,GJ 0 b ,……,GJ 0 c },GJ 0 b ={GJ 0 b1 ,GJ 0 b2 ,……,GJ 0 b(aj) ,……,GJ 0 b(an) },GJ 0 b(aj) Is T b Corresponding toIs a second keyword list GJ of (2) 0 b Aj=1, 2, … …, an, an is the number of second keywords in the second keyword list, and the second keywords are keywords in the candidate template text.
S43, according to GJ and GJ 0 Acquiring a list similarity list GJ corresponding to the GJ 1 ={GJ 1 1 ,GJ 1 2 ,……,GJ 1 b ,……,GJ 1 c },GJ 1 b Is GJ and GJ 0 b List similarity between, wherein GJ 1 b Meets the following conditions:
GJ 1 b =Σ am ai=1an aj=1 XS (ai) (aj) /an)/am,XS (ai) (aj) for GJ (ai) With GJ 0 b(aj) Word similarity between.
S44 according to T 1 And GJ 1 Acquiring a first priority list YX= { YX corresponding to T 1 ,YX 2 ,……,YX b ,……,YX c },YX b Is T b Corresponding first priority, YX b Meets the following conditions:
YX b =T 1 b +GJ 1 b
s45, when there is only one maximum value in YX, determining max (YX 1 ,YX 2 ,……,YX b ,……,YX c ) Corresponding T b The corresponding candidate templates are templates for the intermediate text generation.
S46, when there are a plurality of maximum values in YX, according to T 0 And T b An intermediate text generation template is determined.
S5, when AT/BT is less than YT, obtaining an intermediate text generation template according to a second processing method.
S6, the target user adjusts the intermediate text generation template to obtain a target text generation template.
The invention has at least the following beneficial effects:
the invention provides a data processing system for acquiring a text generation template, which comprises the following steps: the method comprises the following steps of key text corresponding to a target user, a preset model type tree, a processor and a memory storing a computer program, wherein when the computer program is executed by the processor, the following steps are realized: obtaining a candidate template text according to the key text and a preset model type tree; obtaining a target NLP model according to the candidate template text and a plurality of first preset sample texts; inputting the key text into a target NLP model, acquiring an intermediate confidence list corresponding to the key text, acquiring an intermediate text generation template according to a first processing method when the ratio of the number of characters of all keywords corresponding to the key text to the number of characters in the key text is not smaller than a preset character number ratio threshold, otherwise, acquiring the intermediate text generation template according to a second processing method; the target user adjusts the intermediate text generation template to obtain a target text generation template; according to the method, the candidate template text list can be obtained through the key text input by the user, the intermediate text generation template is determined according to the key text, the key words corresponding to the key text and the key words in the candidate template text, the intermediate text generation template is adjusted, and the target text generation template is generated.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly described below; it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a data processing system executing a computer program for obtaining a text generation template according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings; it is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a data processing system for acquiring a text generation template, comprising: a key text T_0 corresponding to a target user, a preset model type tree A, a processor, and a memory storing a computer program, where A comprises a plurality of preset model type nodes and each preset model type node corresponds to a plurality of designated text generation templates; when the computer program is executed by the processor, the following steps are implemented, as shown in FIG. 1:
S1, according to T_0 and A, obtain the candidate template text list T = {T_1, T_2, ……, T_b, ……, T_c} corresponding to T_0, where T_b is the candidate template text of the b-th candidate template, b = 1, 2, ……, c, c is the number of candidate templates, and a candidate template text is a text describing the applicable range of its candidate template.
Specifically, the target user is a user who needs to use a text generation template.
Specifically, the candidate template text corresponding to the candidate template is stored in the system.
Specifically, S1 comprises the following steps for obtaining a candidate template:
S11, input T_0 into a preset keyword extraction model to obtain the keyword information list corresponding to T_0, where the keyword information list comprises a plurality of pieces of keyword information; those skilled in the art know that the preset keyword extraction model is a model trained in advance according to actual requirements, which is not described in detail here.
Specifically, each piece of keyword information includes: a keyword and a keyword type.
S12, according to the keyword information list, acquire the candidate word type list B = {B_1, B_2, ……, B_e, ……, B_f}, where B_e is the e-th candidate word type, e = 1, 2, ……, f, f is the number of candidate word types, and the candidate word types are the keyword types obtained by de-duplicating the keyword types in all pieces of keyword information.
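The de-duplication in S11-S12 can be pictured with a short sketch. Here extract_keywords is a hypothetical stand-in for the preset keyword extraction model, which this text does not specify; the hard-coded example output is only for illustration.

```python
# Minimal sketch of S11-S12. extract_keywords() is a placeholder for the
# "preset keyword extraction model" (an assumption); it returns (keyword, keyword type) pairs.
from typing import List, Tuple

def extract_keywords(text: str) -> List[Tuple[str, str]]:
    # Placeholder output; a real system would call the preset keyword extraction model here.
    return [("report", "document"), ("sales", "business"), ("quarter", "time"), ("revenue", "business")]

def candidate_word_types(key_text: str) -> List[str]:
    keyword_info = extract_keywords(key_text)      # S11: keyword information list
    seen, types = set(), []
    for _keyword, keyword_type in keyword_info:    # S12: de-duplicate the keyword types
        if keyword_type not in seen:
            seen.add(keyword_type)
            types.append(keyword_type)
    return types                                   # candidate word type list B
```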
S13, acquire a third model type tree according to A and B, where A = {A_1, A_2, ……, A_i, ……, A_m}, A_i = {A_i1, A_i2, ……, A_ij, ……, A_in(i)}, A_ij is the j-th preset model type node in the i-th layer of the preset model type tree, i = 1, 2, ……, m, m is the number of layers of the preset model type tree, j = 1, 2, ……, n(i), and n(i) is the number of preset model type nodes in the i-th layer of the preset model type tree.
Specifically, S13 comprises the following steps for obtaining the third model type tree:
S131, set all A_ij to NULL to obtain the first specified model type tree A^1 = {A^1_1, A^1_2, ……, A^1_i, ……, A^1_m} corresponding to A, where A^1_i = {A^1_i1, A^1_i2, ……, A^1_ij, ……, A^1_in(i)} and A^1_ij is the specified model type node corresponding to A_ij; those skilled in the art know that the preset model type tree is preset according to actual requirements, which is not described in detail here.
S132, obtain the preset model type A^0_ij corresponding to A_ij, where a preset model type is the model type represented by a preset model type node.
S133, obtain the type similarity AB^0e_ij between A^0_ij and B_e, where a type similarity is the similarity between a preset model type and a candidate word type; a model type can be understood as a label, and any existing method for obtaining the similarity between two labels falls within the protection scope of the present invention and is not described in detail here, for example: cosine similarity or edit distance.
Specifically, the greater the type similarity, the more similar the preset model type and the candidate word type.
S134, when any AB^0e_ij corresponding to B_e satisfies AB^0e_ij ≥ A_2, replace the A^1_ij corresponding to the A_ij corresponding to the A^0_ij corresponding to max(AB^0e_11, AB^0e_12, ……, AB^0e_ij, ……, AB^0e_mn(m)), i.e. the maximum over all preset model type nodes, with B_e to obtain a second specified model tree, where max() is the maximum-value function and A_2 is a preset similarity threshold.
Specifically, the value range of A_2 is [0.8, 1]; those skilled in the art can set a specific value of the preset similarity threshold within [0.8, 1] according to actual requirements, which is not described in detail here.
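As one possible realization of the type similarity of S133 (the text names cosine similarity and edit distance only as examples), a normalized edit-distance similarity between two type labels could be computed as follows; the normalization to [0, 1] is an assumption.

```python
# Minimal sketch of an edit-distance-based type similarity, assuming the common
# normalization sim = 1 - distance / max(len). The exact metric is left open by the text.
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one rolling row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def type_similarity(preset_type: str, candidate_type: str) -> float:
    if not preset_type and not candidate_type:
        return 1.0
    dist = edit_distance(preset_type, candidate_type)
    return 1.0 - dist / max(len(preset_type), len(candidate_type))
```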
S135, when all AB^0e_ij corresponding to B_e satisfy AB^0e_ij < A_2, obtain the similarity priority YX^e_ij between A_ij and B_e, where YX^e_ij satisfies:
YX^e_ij = FJ^e_ij + ZJ^e_ij, where FJ^e_ij is the type similarity between the preset model type corresponding to the parent node of A_ij and B_e, ZJ^e_ij is the type similarity between the preset model type corresponding to the child node of A_ij and B_e, FJ^e_ij is 0 when A_ij has no parent node, and ZJ^e_ij is 0 when A_ij has no child node.
S136, replace the A^1_ij corresponding to the A_ij corresponding to max(YX^e_11, YX^e_12, ……, YX^e_ij, ……, YX^e_mn(m)), i.e. the maximum over all preset model type nodes, with B_e to obtain a third specified model tree.
S137, when the j-th second specified model node of the i-th layer of the second specified model tree is NULL and the j-th third specified model node of the i-th layer of the third specified model tree is not NULL, replace the j-th second specified model node of the i-th layer of the second specified model tree with the j-th third specified model node of the i-th layer of the third specified model tree, so as to obtain the third model type tree.
When any type similarity corresponding to a candidate word type is not smaller than the preset similarity threshold, it indicates that the candidate word type is very similar to a preset model type, so the position of the candidate word type in the tree can be determined directly from that preset model type, yielding the second specified model tree. When all type similarities corresponding to a candidate word type are smaller than the preset similarity threshold, it indicates that the candidate word type is not similar enough to any preset model type and its position cannot be determined directly; in that case the similarity priorities corresponding to the candidate word type are acquired and compared to determine its position, yielding the third specified model tree. Fusing the second specified model tree with the third specified model tree allows the nodes of the third model type tree to be determined accurately, so that the third model type tree is generated reliably and inaccurate candidate texts are avoided.
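A compact sketch of S131-S137 under simplifying assumptions is given below: the preset model type tree A is stored as layers of nodes that record their model type, parent index, and child indices, and type_similarity is the function sketched above. The aggregation over multiple child nodes in S135 is an assumption, since the text speaks of "the child node"; this is illustrative only, not a reference implementation.

```python
# Minimal sketch of building the third model type tree (S131-S137), assuming the preset
# model type tree A is given as layers of Node objects. type_similarity() is assumed above.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    model_type: str
    parent: Optional[int]        # index of the parent node in the previous layer
    children: List[int]          # indices of child nodes in the next layer

def build_third_model_type_tree(A, candidate_types, a2=0.9):   # a2 chosen within [0.8, 1]
    second = [[None] * len(layer) for layer in A]   # S131 + S134 placements
    third = [[None] * len(layer) for layer in A]    # S135-S136 placements

    for b_e in candidate_types:
        # S133: type similarity between every preset model type and this candidate type.
        sims = {(i, j): type_similarity(node.model_type, b_e)
                for i, layer in enumerate(A) for j, node in enumerate(layer)}
        best = max(sims, key=sims.get)
        if sims[best] >= a2:
            # S134: similar enough, so place b_e at the best-matching node of the second tree.
            second[best[0]][best[1]] = b_e
        else:
            # S135-S136: otherwise rank nodes by parent + child type similarity.
            def priority(i, j):
                node = A[i][j]
                fj = sims[(i - 1, node.parent)] if node.parent is not None else 0.0
                # Aggregating over children with max() is an assumption.
                zj = max((sims[(i + 1, c)] for c in node.children), default=0.0)
                return fj + zj
            best = max(sims, key=lambda ij: priority(*ij))
            third[best[0]][best[1]] = b_e

    # S137: fuse, filling empty slots of the second tree from the third specified tree.
    for i, layer in enumerate(second):
        for j, val in enumerate(layer):
            if val is None and third[i][j] is not None:
                layer[j] = third[i][j]
    return second   # the third model type tree
```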
S14, determine the layer of the nearest common ancestor ZX of all non-NULL nodes in the third model type tree and the position of ZX within that layer; those skilled in the art know that any existing method for obtaining the nearest common ancestor falls within the protection scope of the present invention and is not described in detail here. This can be understood as determining which third model type node of which layer of the third model type tree ZX is.
S15, acquire a first candidate model type node from A, where the first candidate model type node is the preset model type node in A whose layer and position within the layer are the same as the layer of ZX in the third model type tree and its position within that layer. This can be understood as follows: if ZX is the j-th third model type node of the i-th layer of the third model type tree, the first candidate model type node is the j-th preset model type node of the i-th layer of the preset model type tree.
S16, acquire a second candidate model type node list according to the first candidate model type node, where the second candidate model type node list comprises a plurality of second candidate model type nodes, and a second candidate model type node is a preset model type node having the same parent node as the first candidate model type node.
Specifically, a designated text generation template is a template stored in the system before the current point in time.
S17, obtain a candidate template, where a candidate template is any designated text generation template corresponding to any second candidate model type node.
In this way, the first candidate model type node is determined from the preset model type tree according to the third model type tree and the preset model type tree, the second candidate model type nodes are further determined, and the designated text generation templates corresponding to the second candidate model type nodes are taken as candidate templates.
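Steps S14-S17 can likewise be sketched; the code reuses the layered tree representation assumed above, and templates_of(i, j) is a hypothetical lookup returning the designated text generation templates attached to preset model type node A[i][j].

```python
# Minimal sketch of S14-S17, assuming the layered Node representation above and a
# hypothetical templates_of(i, j) lookup for the designated text generation templates.
def nearest_common_ancestor(A, third_tree):
    """Return (layer, position) of the nearest common ancestor ZX of all non-NULL nodes."""
    filled = [(i, j) for i, layer in enumerate(third_tree)
              for j, v in enumerate(layer) if v is not None]
    def ancestors(i, j):
        path = [(i, j)]
        while A[i][j].parent is not None:
            i, j = i - 1, A[i][j].parent
            path.append((i, j))
        return path
    common = set(ancestors(*filled[0]))
    for node in filled[1:]:
        common &= set(ancestors(*node))
    return max(common)                      # deepest shared ancestor (largest layer index)

def candidate_templates(A, third_tree, templates_of):
    i, j = nearest_common_ancestor(A, third_tree)           # S14
    first = A[i][j]                                          # S15: same layer and position in A
    # S16: second candidate nodes share the same parent as the first candidate node.
    siblings = [k for k, node in enumerate(A[i]) if node.parent == first.parent]
    # S17: any designated template of any second candidate node is a candidate template.
    return [tpl for k in siblings for tpl in templates_of(i, k)]
```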
S2, train an NLP model according to T and a plurality of first preset sample texts to obtain the target NLP model, where the output of the NLP model is the confidence between a first preset sample text and each of T_1, T_2, ……, T_b, ……, T_c; those skilled in the art know that the first preset sample texts are preset according to actual requirements, and any existing method for training the model falls within the protection scope of the present invention and is not described in detail here, for example: unsupervised training or supervised training.
Specifically, the confidences between a first preset sample text and T_1, T_2, ……, T_b, ……, T_c sum to 1.
S3, input T_0 into the target NLP model to obtain the intermediate confidence list T^1 = {T^1_1, T^1_2, ……, T^1_b, ……, T^1_c} corresponding to T_0, where T^1_b is the intermediate confidence between T_0 and T_b.
Specifically, Σ_{b=1..c} T^1_b = 1.
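Because the confidences over the c candidate template texts sum to 1, the target NLP model behaves like a c-way probabilistic text classifier. The TF-IDF plus logistic-regression pipeline below is only an assumed stand-in, since the text does not fix an architecture or training regime.

```python
# Hedged sketch of S2-S3: any probabilistic c-way classifier whose outputs sum to 1 fits.
# The concrete pipeline is an assumption, not the system's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_target_nlp_model(sample_texts, sample_labels):
    # sample_labels[i] is the index b of the candidate template text that sample i matches.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(sample_texts, sample_labels)
    return model

def intermediate_confidence_list(model, key_text):
    # S3: one row of class probabilities, mirroring sum over b of T^1_b = 1.
    return model.predict_proba([key_text])[0]
```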
S4, when AT/BT ≥ YT, process T_0, T, and T^1 according to the first processing method to obtain the intermediate text generation template, where AT is the total number of characters of all keywords corresponding to T_0, BT is the number of characters in T_0, and YT is a preset character-number ratio threshold.
Specifically, the value range of YT is [0.6, 1]; those skilled in the art can set the preset character-number ratio threshold within [0.6, 1] according to actual requirements, which is not described in detail here.
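The routing between the first and second processing methods is a single ratio test; a sketch, reusing the hypothetical extract_keywords() above and a YT chosen in [0.6, 1]:

```python
# Sketch of the S4/S5 routing decision; extract_keywords() is the assumed stand-in above.
def use_first_processing_method(key_text: str, yt: float = 0.6) -> bool:
    keywords = [kw for kw, _type in extract_keywords(key_text)]
    at = sum(len(kw) for kw in keywords)   # AT: total characters of all keywords
    bt = len(key_text)                     # BT: characters in the key text
    return bt > 0 and at / bt >= yt        # True -> first method (S4); False -> second (S5)
```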
Specifically, the first processing method in S4 includes the following steps:
S41, acquire the first keyword list GJ = {GJ_1, GJ_2, ……, GJ_ai, ……, GJ_am}, where GJ_ai is the ai-th first keyword, ai = 1, 2, ……, am, am is the number of first keywords, and the first keywords are the keywords corresponding to T_0.
S42, acquire the second keyword list GJ^0 = {GJ^0_1, GJ^0_2, ……, GJ^0_b, ……, GJ^0_c} corresponding to T, where GJ^0_b = {GJ^0_b1, GJ^0_b2, ……, GJ^0_b(aj), ……, GJ^0_b(an)}, GJ^0_b(aj) is the aj-th second keyword in the second keyword list GJ^0_b corresponding to T_b, aj = 1, 2, ……, an, an is the number of second keywords in the second keyword list, and the second keywords are the keywords in the candidate template texts; those skilled in the art know that the keywords in a candidate template text are obtained in the same way as the keywords corresponding to the key text, which is not described in detail here.
S43, according to GJ and GJ^0, acquire the list similarity list GJ^1 = {GJ^1_1, GJ^1_2, ……, GJ^1_b, ……, GJ^1_c} corresponding to GJ, where GJ^1_b is the list similarity between GJ and GJ^0_b, and GJ^1_b satisfies:
GJ^1_b = Σ_{ai=1..am} (Σ_{aj=1..an} XS_(ai)(aj) / an) / am, where XS_(ai)(aj) is the word similarity between GJ_ai and GJ^0_b(aj); those skilled in the art know that any existing method for obtaining the word similarity between two words falls within the protection scope of the present invention and is not described in detail here.
Specifically, the larger the value of the word similarity, the more similar the two words are.
Further, the larger the value of the list similarity, the more similar the first keyword list and the second keyword list are.
S44, according to T^1 and GJ^1, acquire the first priority list YX = {YX_1, YX_2, ……, YX_b, ……, YX_c} corresponding to T, where YX_b is the first priority corresponding to T_b and satisfies:
YX_b = T^1_b + GJ^1_b.
S45, when there is only one maximum value in YX, determine the candidate template corresponding to the T_b corresponding to max(YX_1, YX_2, ……, YX_b, ……, YX_c) as the intermediate text generation template.
S46, when there are a plurality of maximum values in YX, determine the intermediate text generation template according to T_0 and T_b.
When the ratio of the number of characters of all keywords corresponding to the key text to the number of characters of the key text is not smaller than the preset character-number ratio threshold, the keywords can express the meaning carried by the text. The similarity between all keywords corresponding to the key text and all keywords corresponding to a candidate template text, i.e. the list similarity, is obtained, the sum of the confidence and the list similarity is taken as the first priority corresponding to the candidate template text, and the candidate template corresponding to the candidate template text with the largest first priority is selected as the intermediate text generation template, which improves the accuracy of acquiring the intermediate text generation template. When there are several largest first priority values, the intermediate text generation template cannot be determined accurately from the first priority alone, so the key text and the candidate template texts need to be analysed again to determine the intermediate text generation template.
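The list similarity of S43 and the first priority of S44-S45 reduce to a nested average plus an addition; the sketch below assumes a word_similarity(w1, w2) function returning a value in [0, 1], since the word-similarity measure is left open.

```python
# Sketch of S43-S45, assuming word_similarity(w1, w2) -> float in [0, 1] is supplied.
def list_similarity(first_keywords, second_keywords, word_similarity):
    # GJ^1_b: average over first keywords of the average similarity to the second keywords.
    if not first_keywords or not second_keywords:
        return 0.0
    return sum(
        sum(word_similarity(gj, gj0) for gj0 in second_keywords) / len(second_keywords)
        for gj in first_keywords
    ) / len(first_keywords)

def pick_by_first_priority(confidences, first_keywords, second_keyword_lists, word_similarity):
    # YX_b = T^1_b + GJ^1_b; returns the index b of the single best candidate, or None on a tie.
    yx = [conf + list_similarity(first_keywords, kws, word_similarity)
          for conf, kws in zip(confidences, second_keyword_lists)]
    best = max(yx)
    winners = [b for b, v in enumerate(yx) if v == best]
    return winners[0] if len(winners) == 1 else None   # None -> fall through to S46
```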
Specifically, S46 includes the steps of:
S461, input T_0 into a preset vector extraction model to obtain the first text vector list U^0 = {U^0_1, U^0_2, ……, U^0_v, ……, U^0_w} corresponding to T_0, where U^0_v is the v-th first text vector corresponding to T_0, v = 1, 2, ……, w, w is the number of first text vectors, and the preset vector extraction model is a neural network model capable of extracting text vectors; those skilled in the art know that any existing neural network model capable of obtaining text vectors falls within the protection scope of the present invention and is not described in detail here.
S462, input T_b into the preset vector extraction model to obtain the second text vector list U_b = {U_b1, U_b2, ……, U_bv, ……, U_bw} corresponding to T_b, where U_bv is the v-th second text vector corresponding to T_b.
S463, according to U^0 and U_b, obtain the text similarities L^0_1, L^0_2, ……, L^0_b, ……, L^0_c between T_0 and each of T_1, T_2, ……, T_b, ……, T_c, where L^0_b is the text similarity between T_0 and T_b and L^0_b satisfies the following condition:
S464, according to T^1_b and L^0_b, acquire the second priority list YX^0 = {YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c} corresponding to T, where YX^0_b is the second priority corresponding to T_b and satisfies:
YX^0_b = T^1_b + L^0_b.
S465, when there is only one maximum value in YX^0, determine the candidate template corresponding to the T_b corresponding to max(YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c) as the intermediate text generation template.
S466, when there are a plurality of maximum values in YX^0, determine the candidate template corresponding to the T_b corresponding to max(T^1_1 + GJ^1_1 + L^0_1, T^1_2 + GJ^1_2 + L^0_2, ……, T^1_b + GJ^1_b + L^0_b, ……, T^1_c + GJ^1_c + L^0_c) as the intermediate text generation template.
When there are several largest first priority values, the text similarity between the key text and each candidate template text is obtained, the sum of the confidence and the text similarity is taken as the second priority corresponding to the candidate template text, and the candidate template corresponding to the candidate template text with the largest second priority is selected as the intermediate text generation template, which improves the accuracy of acquiring the intermediate text generation template. When there are several largest second priority values, the intermediate text generation template cannot be determined accurately from the second priority alone; in that case the candidate template corresponding to the candidate template text with the largest sum of confidence, list similarity, and text similarity is selected as the intermediate text generation template, which improves the accuracy of acquiring the intermediate text generation template.
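The formula defining L^0_b is not reproduced in this text, so the sketch below assumes a common stand-in, the mean cosine similarity between corresponding text vectors; the tie-breaking of S464-S466 follows the steps directly.

```python
# Sketch of S461-S466. The averaged cosine similarity for L^0_b is an assumption;
# the original defines L^0_b by a formula not reproduced here.
import numpy as np

def text_similarity(vectors_a, vectors_b):
    # Mean cosine similarity between corresponding text vectors of the two texts.
    sims = []
    for ua, ub in zip(vectors_a, vectors_b):
        ua, ub = np.asarray(ua, dtype=float), np.asarray(ub, dtype=float)
        denom = np.linalg.norm(ua) * np.linalg.norm(ub)
        sims.append(float(ua @ ub / denom) if denom else 0.0)
    return sum(sims) / len(sims) if sims else 0.0

def pick_by_second_priority(confidences, list_sims, text_sims):
    # S464-S465: YX^0_b = T^1_b + L^0_b; S466 breaks ties with T^1_b + GJ^1_b + L^0_b.
    yx0 = [c + l for c, l in zip(confidences, text_sims)]
    best = max(yx0)
    winners = [b for b, v in enumerate(yx0) if v == best]
    if len(winners) == 1:
        return winners[0]
    totals = [confidences[b] + list_sims[b] + text_sims[b] for b in winners]
    return winners[int(np.argmax(totals))]
```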
S5, when AT/BT < YT, obtain the intermediate text generation template according to the second processing method.
Specifically, the second processing method in S5 comprises the following steps:
S51, when there is only one maximum value in YX^0, determine the candidate template corresponding to the T_b corresponding to max(YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c) as the intermediate text generation template.
S53, when there are a plurality of maximum values in YX^0, determine the candidate template corresponding to the T_b corresponding to max(T^1_1 + GJ^1_1 + L^0_1, T^1_2 + GJ^1_2 + L^0_2, ……, T^1_b + GJ^1_b + L^0_b, ……, T^1_c + GJ^1_c + L^0_c) as the intermediate text generation template.
When the ratio of the number of characters of all keywords corresponding to the key text to the number of characters of the key text is smaller than the preset character-number ratio threshold, the keywords may not express the meaning carried by the text, so the intermediate text generation template cannot be acquired mainly from the keywords. In this case the candidate template corresponding to the candidate template text with the largest second priority is selected as the intermediate text generation template, which improves the accuracy of acquiring the intermediate text generation template. When there are several largest second priority values, the intermediate text generation template cannot be determined accurately from the second priority alone; the candidate template corresponding to the candidate template text with the largest sum of confidence, list similarity, and text similarity is then selected as the intermediate text generation template, which improves the accuracy of acquiring the intermediate text generation template.
S6, the target user adjusts the intermediate text generation template to obtain the target text generation template; those skilled in the art know that the way the target user adjusts the intermediate text generation template is set according to actual requirements and is not described in detail here, for example: adding a data statistical model to the intermediate text generation template, or deleting a data statistical model from the intermediate text generation template.
The invention provides a data processing system for acquiring a text generation template, comprising: a key text corresponding to a target user, a preset model type tree, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps: obtaining candidate template texts according to the key text and the preset model type tree; obtaining a target NLP model according to the candidate template texts and a plurality of first preset sample texts; inputting the key text into the target NLP model to obtain an intermediate confidence list corresponding to the key text; when the ratio of the number of characters of all keywords corresponding to the key text to the number of characters in the key text is not smaller than a preset character-number ratio threshold, obtaining an intermediate text generation template according to a first processing method, and otherwise obtaining the intermediate text generation template according to a second processing method; and the target user adjusting the intermediate text generation template to obtain the target text generation template. In this way, a candidate template text list can be obtained from the key text input by the user, the intermediate text generation template is determined according to the key text, the keywords corresponding to the key text, and the keywords in the candidate template texts, and the intermediate text generation template is then adjusted to generate the target text generation template.
The invention also provides a data statistics system based on the text generation template, which takes the target text generation template as the initial text generation template and performs the following steps after S6:
S100, acquire a first model type list according to the initial text generation template, where the first model type list comprises a plurality of first model types and a first model type is the model type of an initial data statistical model in the initial text generation template; those skilled in the art know that an initial data statistical model is a model trained in advance according to actual requirements that can perform statistics on the data in a data set and output result data, which is not described in detail here.
Specifically, the initial text generation template comprises a plurality of initial data statistical models.
Specifically, the different initial data statistical models output different result data.
Further, the form of the result data output by the initial data statistical model is one or more of a data table form, a text form, a digital form, a data set form and the like.
Specifically, S100 includes the following steps:
S101, acquire an initial data statistical model name list from the initial text generation template, where the initial data statistical model name list comprises the names of the plurality of initial data statistical models in the initial text generation template, and an initial data statistical model name is the name of an initial data statistical model.
S103, according to the initial data statistical model name list, acquire the second model type list corresponding to the initial data statistical model name list, where the second model type list comprises a plurality of second model types and a second model type is the model type of the initial data statistical model corresponding to an initial data statistical model name.
Specifically, the model types of the initial data statistical models are stored in the system.
S105, perform de-duplication processing on the second model type list to obtain the first model type list; any existing de-duplication method falls within the protection scope of the present invention and is not described in detail here.
In this way, the second model type list is obtained from the initial data statistical model names in the initial text generation template, de-duplication is performed on the second model type list to obtain the first model type list, and the first model type trees are then generated; the target statistical data sets are acquired in order from root to leaf, starting from the key data statistical models corresponding to the root node of a first model type tree, so that initial data statistical models of the same model type are not processed repeatedly, repeated calculation and resource waste are avoided, and the operating efficiency of the system is improved.
S200, according to A and the first model type list, acquire the first model type tree list C = {C_1, C_2, ……, C_r, ……, C_s} corresponding to the first model type list, where C_r = {C_r1, C_r2, ……, C_rg, ……, C_rh}, C_rg = {C^1_rg, C^2_rg, ……, C^x_rg, ……, C^p_rg}, C^x_rg is the x-th first model type node of the g-th layer of the r-th first model type tree C_r, r = 1, 2, ……, s, s is the number of first model type trees, g = 1, 2, ……, h, h is the number of layers of a first model type tree, x = 1, 2, ……, p, and p is the number of first model type nodes in one layer of a first model type tree. For example: suppose the node of the first layer of the preset model type tree, i.e. the root node, is of text type, and the child nodes of the root node, i.e. the nodes of the second layer, are the plain text type, the plain digital text type, the plain English text type, and the mixed text type; in the third layer, the child nodes of the plain text type are the traditional Chinese character type and the simplified Chinese character type, the child nodes of the plain digital text type are the decimal type, the integer type, and the fractional type, the child nodes of the plain English text type are the capital letter type and the lowercase letter type, and the child nodes of the mixed text type are the Chinese-English mixed type and the text-and-number combination type. If the first model types are: text, English, mixed, traditional, simplified, lowercase letters, numbers, and Chinese, then 3 first model type trees can be obtained, whose root nodes are respectively: text, English, and mixed; the child nodes of text are traditional and simplified, the child node of English is lowercase letters, and the child nodes of mixed are numbers and Chinese.
Specifically, S200 includes the following steps:
S201, acquire a second model type tree according to A and the first model type list, where the second model type tree comprises a plurality of second model type nodes; those skilled in the art know that the way of acquiring the second model type tree from the preset model type tree and the first model type list is the same as the way of acquiring the third model type tree from the preset model type tree and the candidate word type list in S131-S137, which is not described in detail here. The second model type tree can be understood as the third model type tree that would be obtained in S131-S137 if the candidate word types in S131-S137 were replaced with the first model types.
S203, delete the second model type nodes that are NULL in the second model type tree to obtain C.
In this way, the second model type tree is determined according to the type similarity between the preset model types and the first model types, and all empty nodes in the second model type tree are deleted, so that the first model type trees can be acquired accurately; the target statistical data sets are acquired in order from root to leaf, starting from the key data statistical models corresponding to the root node of a first model type tree, so that repeated calculation and resource waste are avoided and the operating efficiency of the system is improved.
S300, acquire the key data statistical model list D^x_rg = {D^x1_rg, D^x2_rg, ……, D^xy_rg, ……, D^xq_rg} corresponding to C^x_rg, where D^xy_rg is the y-th key data statistical model corresponding to C^x_rg, y = 1, 2, ……, q, q is the number of key data statistical models corresponding to the first model type node, and a key data statistical model is an initial data statistical model whose model type is the same as the model type represented by the first model type node.
Specifically, each first model type node corresponds to q key data statistical models, and q varies with x, r, and g. For example: if the first model type node is traditional Chinese and the model types of a traditional-Chinese text name statistical model, a traditional-Chinese text quantity statistical model, and a traditional-Chinese text publication time statistical model are all traditional Chinese, then the number of key data statistical models corresponding to this first model type node is 3, and the key data statistical models are: the traditional-Chinese text name statistical model, the traditional-Chinese text quantity statistical model, and the traditional-Chinese text publication time statistical model; if the first model type node is simplified Chinese and the model types of a simplified-Chinese text quantity statistical model and a simplified-Chinese text publication time statistical model are both simplified Chinese, then the number of key data statistical models corresponding to this first model type node is 2, and the key data statistical models are: the simplified-Chinese text quantity statistical model and the simplified-Chinese text publication time statistical model.
S400, when g = 1, input the initial data set into D^xy_rg to obtain the target statistical data set G^xy_rg corresponding to D^xy_rg, where a target statistical data set comprises a plurality of pieces of target statistical data.
In particular, the initial data set includes all data for data statistics and the initial data set is stored in the system.
Further, the initial data set includes a plurality of pieces of initial data.
S500, when g ≠ 1, take the target statistical data in the G^xy_r(g-1) corresponding to all the D^xy_r(g-1) corresponding to the node C^x_r(g-1) in C_r(g-1) that is the parent node of C^x_rg as the intermediate statistical data corresponding to C^x_rg, so as to obtain the intermediate statistical data set H^x_rg corresponding to C^x_rg, and perform S600.
In particular, the intermediate statistics set comprises several pieces of intermediate statistics.
S600, input H^x_rg into D^xy_rg to obtain the target statistical data set G^xy_rg corresponding to D^xy_rg.
In this way, when the first model type node corresponding to a key data statistical model is not the root node, its target statistical data set is acquired from the target statistical data sets corresponding to all key data statistical models corresponding to the parent node of that first model type node, which avoids repeated calculation, helps avoid resource waste, and improves the operating efficiency of the system.
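A condensed sketch of the S400-S600 flow, under the assumption that each key data statistical model is a callable mapping a data set to a result data set and that each first model type tree is stored as layers of nodes carrying their key models and parent index:

```python
# Sketch of S400-S600, assuming each key data statistical model is a callable
# dataset -> dataset and each first model type tree is given as layers of dict nodes.
def run_statistics(tree_layers, initial_dataset):
    """tree_layers[g][x] = {"models": [callable, ...], "parent": int or None}."""
    target_sets = []                       # target_sets[g][x][y] corresponds to G^xy_rg
    for g, layer in enumerate(tree_layers):
        layer_results = []
        for x, node in enumerate(layer):
            if g == 0:
                # S400: root layer works directly on the initial data set.
                inputs = initial_dataset
            else:
                # S500: pool the parent's target statistical data as H^x_rg.
                parent = node["parent"]
                inputs = [item
                          for parent_set in target_sets[g - 1][parent]
                          for item in parent_set]
            # S600: feed the (initial or intermediate) data set to every key model.
            layer_results.append([model(inputs) for model in node["models"]])
        target_sets.append(layer_results)
    return target_sets
```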
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention.

Claims (8)

1. A data processing system for acquiring a text generation template, the system comprising: a key text T_0 corresponding to a target user, a preset model type tree A, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps:
S1, according to T_0 and A, obtaining the candidate template text list T = {T_1, T_2, ……, T_b, ……, T_c} corresponding to T_0, where T_b is the candidate template text of the b-th candidate template, b = 1, 2, ……, c, c is the number of candidate templates, and a candidate template text is a text describing the applicable range of its candidate template;
S2, training an NLP model according to T and a plurality of first preset sample texts to obtain a target NLP model, where the output of the NLP model is the confidence between a first preset sample text and each of T_1, T_2, ……, T_b, ……, T_c;
S3, inputting T_0 into the target NLP model to obtain the intermediate confidence list T^1 = {T^1_1, T^1_2, ……, T^1_b, ……, T^1_c} corresponding to T_0, where T^1_b is the intermediate confidence between T_0 and T_b;
S4, when AT/BT ≥ YT, processing T_0, T, and T^1 according to a first processing method to obtain an intermediate text generation template, where AT is the total number of characters of all keywords corresponding to T_0, BT is the number of characters in T_0, and YT is a preset character-number ratio threshold, the first processing method in S4 comprising the following steps:
S41, acquiring the first keyword list GJ = {GJ_1, GJ_2, ……, GJ_ai, ……, GJ_am}, where GJ_ai is the ai-th first keyword, ai = 1, 2, ……, am, am is the number of first keywords, and the first keywords are the keywords corresponding to T_0;
S42, acquiring the second keyword list GJ^0 = {GJ^0_1, GJ^0_2, ……, GJ^0_b, ……, GJ^0_c} corresponding to T, where GJ^0_b = {GJ^0_b1, GJ^0_b2, ……, GJ^0_b(aj), ……, GJ^0_b(an)}, GJ^0_b(aj) is the aj-th second keyword in the second keyword list GJ^0_b corresponding to T_b, aj = 1, 2, ……, an, an is the number of second keywords in the second keyword list, and the second keywords are the keywords in the candidate template texts;
S43, according to GJ and GJ^0, acquiring the list similarity list GJ^1 = {GJ^1_1, GJ^1_2, ……, GJ^1_b, ……, GJ^1_c} corresponding to GJ, where GJ^1_b is the list similarity between GJ and GJ^0_b, and GJ^1_b satisfies:
GJ^1_b = Σ_{ai=1..am} (Σ_{aj=1..an} XS_(ai)(aj) / an) / am, where XS_(ai)(aj) is the word similarity between GJ_ai and GJ^0_b(aj);
S44, according to T^1 and GJ^1, acquiring the first priority list YX = {YX_1, YX_2, ……, YX_b, ……, YX_c} corresponding to T, where YX_b is the first priority corresponding to T_b and satisfies:
YX_b = T^1_b + GJ^1_b;
S45, when there is only one maximum value in YX, determining the candidate template corresponding to the T_b corresponding to max(YX_1, YX_2, ……, YX_b, ……, YX_c) as the intermediate text generation template;
S46, when there are a plurality of maximum values in YX, determining the intermediate text generation template according to T_0 and T_b;
S5, when AT/BT < YT, acquiring the intermediate text generation template according to a second processing method;
S6, the target user adjusting the intermediate text generation template to obtain the target text generation template.
2. The data processing system for acquiring a text generation template according to claim 1, wherein S1 comprises the following steps for obtaining a candidate template:
S11, inputting T_0 into a preset keyword extraction model to obtain the keyword information list corresponding to T_0, where the keyword information list comprises a plurality of pieces of keyword information and the keyword information includes: a keyword and a keyword type;
S12, according to the keyword information list, acquiring the candidate word type list B = {B_1, B_2, ……, B_e, ……, B_f}, where B_e is the e-th candidate word type, e = 1, 2, ……, f, f is the number of candidate word types, and the candidate word types are the keyword types obtained by de-duplicating the keyword types in all pieces of keyword information;
S13, acquiring a third model type tree according to A and B, where A = {A_1, A_2, ……, A_i, ……, A_m}, A_i = {A_i1, A_i2, ……, A_ij, ……, A_in(i)}, A_ij is the j-th preset model type node in the i-th layer of the preset model type tree, i = 1, 2, ……, m, m is the number of layers of the preset model type tree, j = 1, 2, ……, n(i), n(i) is the number of preset model type nodes in the i-th layer of the preset model type tree, and each preset model type node corresponds to a plurality of designated text generation templates;
S14, determining the layer of the nearest common ancestor ZX of all non-NULL nodes in the third model type tree and the position of ZX within that layer;
S15, acquiring a first candidate model type node from A, where the first candidate model type node is the preset model type node in A whose layer and position within the layer are the same as the layer of ZX in the third model type tree and its position within that layer;
S16, acquiring a second candidate model type node list according to the first candidate model type node, where the second candidate model type node list comprises a plurality of second candidate model type nodes, and a second candidate model type node is a preset model type node having the same parent node as the first candidate model type node;
S17, obtaining a candidate template, where a candidate template is any designated text generation template corresponding to any second candidate model type node.
3. The data processing system for acquiring a text generation template according to claim 2, wherein the candidate template text corresponding to the candidate template is stored in the system.
4. The data processing system for acquiring a text generation template according to claim 2, wherein S13 comprises the following steps for obtaining the third model type tree:
S131, setting all A_ij to NULL to obtain the first specified model type tree A^1 = {A^1_1, A^1_2, ……, A^1_i, ……, A^1_m} corresponding to A, where A^1_i = {A^1_i1, A^1_i2, ……, A^1_ij, ……, A^1_in(i)} and A^1_ij is the specified model type node corresponding to A_ij;
S132, obtaining the preset model type A^0_ij corresponding to A_ij, where a preset model type is the model type represented by a preset model type node;
S133, obtaining the type similarity AB^0e_ij between A^0_ij and B_e, where a type similarity is the similarity between a preset model type and a candidate word type;
S134, when any AB^0e_ij corresponding to B_e satisfies AB^0e_ij ≥ A_2, replacing the A^1_ij corresponding to the A_ij corresponding to the A^0_ij corresponding to max(AB^0e_11, AB^0e_12, ……, AB^0e_ij, ……, AB^0e_mn(m)), i.e. the maximum over all preset model type nodes, with B_e to obtain a second specified model tree, where max() is the maximum-value function and A_2 is a preset similarity threshold;
S135, when all AB^0e_ij corresponding to B_e satisfy AB^0e_ij < A_2, obtaining the similarity priority YX^e_ij between A_ij and B_e, where YX^e_ij satisfies:
YX^e_ij = FJ^e_ij + ZJ^e_ij, where FJ^e_ij is the type similarity between the preset model type corresponding to the parent node of A_ij and B_e, ZJ^e_ij is the type similarity between the preset model type corresponding to the child node of A_ij and B_e, FJ^e_ij is 0 when A_ij has no parent node, and ZJ^e_ij is 0 when A_ij has no child node;
S136, replacing the A^1_ij corresponding to the A_ij corresponding to max(YX^e_11, YX^e_12, ……, YX^e_ij, ……, YX^e_mn(m)), i.e. the maximum over all preset model type nodes, with B_e to obtain a third specified model tree;
S137, when the j-th second specified model node of the i-th layer of the second specified model tree is NULL and the j-th third specified model node of the i-th layer of the third specified model tree is not NULL, replacing the j-th second specified model node of the i-th layer of the second specified model tree with the j-th third specified model node of the i-th layer of the third specified model tree, so as to obtain the third model type tree.
5. The data processing system for acquiring a text generation template according to claim 1, wherein S46 comprises the following steps:
S461, inputting T_0 into a preset vector extraction model to obtain the first text vector list U^0 = {U^0_1, U^0_2, ……, U^0_v, ……, U^0_w} corresponding to T_0, where U^0_v is the v-th first text vector corresponding to T_0, v = 1, 2, ……, w, and w is the number of first text vectors;
S462, inputting T_b into the preset vector extraction model to obtain the second text vector list U_b = {U_b1, U_b2, ……, U_bv, ……, U_bw} corresponding to T_b, where U_bv is the v-th second text vector corresponding to T_b;
S463, according to U^0 and U_b, obtaining the text similarities L^0_1, L^0_2, ……, L^0_b, ……, L^0_c between T_0 and each of T_1, T_2, ……, T_b, ……, T_c, where L^0_b is the text similarity between T_0 and T_b and L^0_b satisfies the following condition:
S464, according to T^1_b and L^0_b, acquiring the second priority list YX^0 = {YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c} corresponding to T, where YX^0_b is the second priority corresponding to T_b and satisfies:
YX^0_b = T^1_b + L^0_b;
S465, when there is only one maximum value in YX^0, determining the candidate template corresponding to the T_b corresponding to max(YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c) as the intermediate text generation template;
S466, when there are a plurality of maximum values in YX^0, determining the candidate template corresponding to the T_b corresponding to max(T^1_1 + GJ^1_1 + L^0_1, T^1_2 + GJ^1_2 + L^0_2, ……, T^1_b + GJ^1_b + L^0_b, ……, T^1_c + GJ^1_c + L^0_c) as the intermediate text generation template.
6. The data processing system for acquiring a text generation template according to claim 5, wherein the second processing method in S5 comprises the following steps:
S51, when there is only one maximum value in YX^0, determining the candidate template corresponding to the T_b corresponding to max(YX^0_1, YX^0_2, ……, YX^0_b, ……, YX^0_c) as the intermediate text generation template;
S53, when there are a plurality of maximum values in YX^0, determining the candidate template corresponding to the T_b corresponding to max(T^1_1 + GJ^1_1 + L^0_1, T^1_2 + GJ^1_2 + L^0_2, ……, T^1_b + GJ^1_b + L^0_b, ……, T^1_c + GJ^1_c + L^0_c) as the intermediate text generation template.
7. The data processing system for acquiring a text generation template according to claim 4, wherein the value range of A_2 is [0.8, 1].
8. The data processing system for acquiring a text generation template according to claim 1, wherein the target text generation template is taken as an initial text generation template and the following steps for obtaining target statistical data sets are performed after S6:
S100, acquiring a first model type list according to the initial text generation template, where the first model type list comprises a plurality of first model types and a first model type is the model type of an initial data statistical model in the initial text generation template;
S200, according to A and the first model type list, acquiring the first model type tree list C = {C_1, C_2, ……, C_r, ……, C_s} corresponding to the first model type list, where C_r = {C_r1, C_r2, ……, C_rg, ……, C_rh}, C_rg = {C^1_rg, C^2_rg, ……, C^x_rg, ……, C^p_rg}, C^x_rg is the x-th first model type node of the g-th layer of the r-th first model type tree C_r, r = 1, 2, ……, s, s is the number of first model type trees, g = 1, 2, ……, h, h is the number of layers of a first model type tree, x = 1, 2, ……, p, and p is the number of first model type nodes in one layer of a first model type tree;
S300, acquiring the key data statistical model list D^x_rg = {D^x1_rg, D^x2_rg, ……, D^xy_rg, ……, D^xq_rg} corresponding to C^x_rg, where D^xy_rg is the y-th key data statistical model corresponding to C^x_rg, y = 1, 2, ……, q, q is the number of key data statistical models corresponding to the first model type node, and a key data statistical model is an initial data statistical model whose model type is the same as the model type represented by the first model type node;
S400, when g = 1, inputting the initial data set into D^xy_rg to obtain the target statistical data set G^xy_rg corresponding to D^xy_rg, where the initial data set includes all data for data statistics and is stored in the system, and a target statistical data set comprises a plurality of pieces of target statistical data;
S500, when g ≠ 1, taking the target statistical data in the G^xy_r(g-1) corresponding to all the D^xy_r(g-1) corresponding to the node C^x_r(g-1) in C_r(g-1) that is the parent node of C^x_rg as the intermediate statistical data corresponding to C^x_rg, so as to obtain the intermediate statistical data set H^x_rg corresponding to C^x_rg, and performing S600;
S600, inputting H^x_rg into D^xy_rg to obtain the target statistical data set G^xy_rg corresponding to D^xy_rg.
CN202311308051.6A 2023-10-10 2023-10-10 Data processing system for acquiring text generation template Active CN117332768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311308051.6A CN117332768B (en) 2023-10-10 2023-10-10 Data processing system for acquiring text generation template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311308051.6A CN117332768B (en) 2023-10-10 2023-10-10 Data processing system for acquiring text generation template

Publications (2)

Publication Number Publication Date
CN117332768A true CN117332768A (en) 2024-01-02
CN117332768B CN117332768B (en) 2024-03-08

Family

ID=89275136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311308051.6A Active CN117332768B (en) 2023-10-10 2023-10-10 Data processing system for acquiring text generation template

Country Status (1)

Country Link
CN (1) CN117332768B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160124933A1 (en) * 2014-10-30 2016-05-05 International Business Machines Corporation Generation apparatus, generation method, and program
CN116561388A (en) * 2023-05-09 2023-08-08 每日互动股份有限公司 Data processing system for acquiring labels

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160124933A1 (en) * 2014-10-30 2016-05-05 International Business Machines Corporation Generation apparatus, generation method, and program
CN116561388A (en) * 2023-05-09 2023-08-08 每日互动股份有限公司 Data processing system for acquiring labels

Also Published As

Publication number Publication date
CN117332768B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
US11636264B2 (en) Stylistic text rewriting for a target author
CN111209738B (en) Multi-task named entity recognition method combining text classification
WO2020063092A1 (en) Knowledge graph processing method and apparatus
CN112328742B (en) Training method and device based on artificial intelligence, computer equipment and storage medium
CN108491511B (en) Data mining method and device based on graph data and model training method and device
CN106168954B (en) A kind of negative information mode Method of Fuzzy Matching based on editing distance
CN103870001B (en) A kind of method and electronic device for generating candidates of input method
CN111340661B (en) Automatic application problem solving method based on graph neural network
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
WO2020147409A1 (en) Text classification method and apparatus, computer device, and storage medium
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
WO2021159803A1 (en) Text summary generation method and apparatus, and computer device and readable storage medium
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN112434131A (en) Text error detection method and device based on artificial intelligence, and computer equipment
CN113821635A (en) Text abstract generation method and system for financial field
US20180018321A1 (en) Avoiding sentiment model overfitting in a machine language model
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN114372475A (en) Network public opinion emotion analysis method and system based on RoBERTA model
CN112418320A (en) Enterprise association relation identification method and device and storage medium
CN117312641A (en) Method, device, equipment and storage medium for intelligently acquiring information
Lee et al. Lossless online Bayesian bagging
CN117711600A (en) LLM model-based electronic medical record question-answering system
CN114373554A (en) Drug interaction relation extraction method using drug knowledge and syntactic dependency relation
CN113837307A (en) Data similarity calculation method and device, readable medium and electronic equipment
CN113988048A (en) Emotional cause pair extraction method based on multi-wheel machine reading understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant