CN113434661A - Method and device for prompting draft simulation of official document, electronic equipment and storage medium - Google Patents

Publication number
CN113434661A
Authority
CN
China
Prior art keywords
target
official document
similarity
search results
dictionary tree
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110722850.2A
Other languages
Chinese (zh)
Inventor
邓悦
庄伯金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention relates to the technical field of artificial intelligence and provides an official document drafting prompt method and device, an electronic device, and a storage medium. The method comprises: segmenting the content of the official document to be drafted to obtain a plurality of target clauses; constructing a first dictionary tree and performing contraction processing on it to obtain a second dictionary tree; searching sequentially from the root node of the second dictionary tree based on a prefix to be searched to obtain a plurality of first search results; inputting the official document theme into a pre-trained theme distribution prediction model to perform theme distribution prediction and obtain a plurality of second search results; and calculating the similarity between the plurality of first search results and the plurality of second search results using a plurality of similarity algorithms to determine a target prompt text. Because the target prompt text is determined along several dimensions (the prefix to be searched, the official document theme, and multiple similarity algorithms), the accuracy of the returned target prompt text and the efficiency of drafting official documents are improved.

Description

Method and device for prompting draft simulation of official document, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an official document drafting prompt method and device, an electronic device, and a storage medium.
Background
In intelligent official document drafting, a user who writes document content in a text editor often lacks related material and needs prompts drawn from related material or from texts on similar subjects. The prior art provides drafting completion prompts in a manner similar to an input method.
However, such input-method-style completion outputs prompt text essentially at random based on the text the user has entered. The prompt text often fails to meet the user's needs and its accuracy is low, which slows the drafting process.
Therefore, a method for prompting official document drafting quickly and accurately is needed.
Disclosure of Invention
In view of the above, it is necessary to provide an official document drafting prompt method and device, an electronic device, and a storage medium that determine a target prompt text along several dimensions (the prefix to be searched, the official document theme, and a plurality of similarity algorithms), thereby improving the accuracy of the returned target prompt text and the efficiency of official document drafting prompts.
A first aspect of the invention provides an official document drafting prompt method, which comprises the following steps:
parsing a received official document drafting prompt request to acquire the official document to be drafted;
segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted;
constructing a first dictionary tree based on the target clauses, and performing contraction processing on the first dictionary tree to obtain a second dictionary tree;
when detecting that a user inputs a prefix to be searched, searching sequentially from the root node of the second dictionary tree based on the prefix to be searched to obtain a plurality of first search results;
inputting the official document theme of the official document to be drafted into a pre-trained theme distribution prediction model to perform theme distribution prediction and obtain a plurality of second search results;
calculating the similarity between the plurality of first search results and the plurality of second search results using a plurality of similarity algorithms to obtain the target similarity of each first search result;
and determining a target prompt text corresponding to the official document to be drafted according to the plurality of target similarities of the plurality of first search results.
Optionally, the performing a contraction processing on the first dictionary tree to obtain a second dictionary tree includes:
traversing from the root node of the first dictionary tree, and identifying whether each node has a branch path;
and when any target node has no branch path, combining one or more nodes behind the target node with the target node, and updating the first dictionary tree to obtain a second dictionary tree.
Optionally, the constructing a first dictionary tree based on the plurality of target clauses includes:
performing word segmentation processing on the target clauses respectively to obtain a sample sequence corresponding to each target clause, wherein the sample sequence consists of text characters and/or special characters, and the special characters comprise brackets, quotation marks, colons or mathematical symbols;
randomly selecting a sample sequence corresponding to any one target clause to determine as a target sample sequence, and determining a root node of a preset dictionary tree as a current node;
determining the rest sample sequences as the sequences to be matched of the target sample sequence;
repeatedly executing the following steps until a new path of the preset dictionary tree is generated or the sequence to be matched is empty, updating the preset dictionary tree, and obtaining a first dictionary tree:
determining the subtree of the current node as a current target subtree;
searching a first character of the sequence to be matched in a first layer of the current target subtree;
if the first character of the sequence to be matched is not found in the current target subtree, sequentially inserting the characters of the sequence to be matched into corresponding layers in the current target subtree, and sequentially connecting the inserted characters to generate a new path in the dictionary tree;
and if the first character of the sequence to be matched is found in the current target subtree, updating the current node to the found first character, and removing the first character of the sequence to be matched from the sequence to be matched so as to update the sequence to be matched.
Optionally, the segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted includes:
extracting the official document content from the official document to be drafted;
calculating the length value of each of a plurality of sentences in the official document content;
judging whether the length value of each sentence in the official document content falls within a preset sentence length threshold range;
when the length value of a sentence in the official document content falls within the preset sentence length threshold range, retaining that sentence, and determining the retained sentences as the plurality of target clauses of the official document to be drafted.
Optionally, the training process of the topic distribution prediction model includes:
acquiring a plurality of historical themes and a corpus text of each historical theme;
segmenting the corpus text of each historical topic according to a preset segmentation rule to obtain a plurality of clauses corresponding to each historical topic;
taking the plurality of historical themes and a plurality of clauses corresponding to each historical theme as a sample data set;
dividing the sample data set, according to a preset proportion, into a training set containing a first number of samples and a test set containing a second number of samples;
inputting the training set into a preset convolutional neural network for training to obtain a theme distribution prediction model;
inputting the test set into the theme distribution prediction model for testing to obtain a test passing rate;
judging whether the test passing rate is greater than or equal to a preset passing rate threshold;
when the test passing rate is greater than or equal to the preset passing rate threshold, finishing the training of the theme distribution prediction model; or when the test passing rate is smaller than the preset passing rate threshold, enlarging the training set and retraining the theme distribution prediction model.
Optionally, the calculating the similarity between the plurality of first search results and the plurality of second search results by using a plurality of similarity algorithms, and obtaining the target similarity of each first search result includes:
calculating the similarity between each first search result and the plurality of second search results by using an edit distance algorithm, and averaging the calculated similarities to obtain the first similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a Jaccard similarity algorithm, and averaging the calculated similarities to obtain the second similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a preset first text similarity algorithm, and averaging the calculated similarities to obtain a third similarity of each first search result;
and inputting the first similarity, the second similarity and the third similarity into a preset logistic regression model to obtain the target similarity of each first search result.
Optionally, the determining, according to the plurality of target similarities of the plurality of first search results, a target prompt text corresponding to the official document to be drafted includes:
selecting, from the plurality of target similarities, the several largest target similarities and the target first search results to which they correspond;
acquiring, from the second dictionary tree, the click frequency of each node of each target first search result;
summing the click frequencies of all nodes of each target first search result to obtain the target click frequency of that target first search result;
sorting the target click frequencies of the plurality of target first search results in descending order;
and selecting the top-ranked target first search results from the descending order and determining them as the target prompt texts corresponding to the official document to be drafted.
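The two-stage selection above (filter by target similarity, then rank by summed click frequency) can be sketched as follows. This is a minimal illustration, not the patented implementation: the tuple layout, the function name `rank_prompts`, and the cut-offs `top_similar` and `top_prompts` are assumptions introduced here, since the text does not say how many results are kept at each stage.

```python
def rank_prompts(candidates, top_similar=3, top_prompts=2):
    """Pick prompt texts from (text, target_similarity, node_click_frequencies) tuples.

    top_similar and top_prompts are hypothetical cut-offs; the source only says
    that "several" of the most similar and top-ranked results are kept.
    """
    # Stage 1: keep the candidates with the largest target similarities.
    most_similar = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_similar]
    # Stage 2: the target click frequency is the sum over all nodes on the path;
    # sort by it in descending order.
    by_clicks = sorted(most_similar, key=lambda c: sum(c[2]), reverse=True)
    return [text for text, _, _ in by_clicks[:top_prompts]]


candidates = [
    ("A", 0.9, [1, 2]),
    ("B", 0.8, [5, 5]),
    ("C", 0.7, [9, 9]),
    ("D", 0.1, [100]),   # many clicks, but filtered out by low similarity
]
prompts = rank_prompts(candidates)
```

In this toy input, candidate "D" never reaches the click-frequency stage even though it has the most clicks, which is the point of filtering on similarity first.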
A second aspect of the present invention provides an official document drafting prompt device, including:
the parsing module is used for parsing a received official document drafting prompt request to acquire the official document to be drafted;
the segmentation module is used for segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted;
the contraction processing module is used for constructing a first dictionary tree based on the target clauses and performing contraction processing on the first dictionary tree to obtain a second dictionary tree;
the searching module is used for searching sequentially from the root node of the second dictionary tree based on the prefix to be searched to obtain a plurality of first search results when a prefix to be searched input by the user is detected;
the input module is used for inputting the official document theme of the official document to be drafted into a pre-trained theme distribution prediction model to perform theme distribution prediction and obtain a plurality of second search results;
the calculating module is used for calculating the similarity between the plurality of first search results and the plurality of second search results using a plurality of similarity algorithms to obtain the target similarity of each first search result;
and the determining module is used for determining a target prompt text corresponding to the official document to be drafted according to the plurality of target similarities of the plurality of first search results.
A third aspect of the present invention provides an electronic device, comprising a processor and a memory, wherein the processor is configured to implement the official document drafting prompt method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the official document drafting prompt method.
In summary, the official document drafting prompt method and device, electronic device, and storage medium of the present invention have the following effects. First, constructing the first dictionary tree based on the plurality of target clauses and performing contraction processing on it reduces the number of nodes and the server memory occupied, which improves the efficiency of subsequent prompt text feedback. Second, the plurality of first search results are obtained by searching with the prefix to be searched, the plurality of second search results are obtained from the official document theme of the official document to be drafted, and the similarity between each first search result and the second search results is calculated using a plurality of similarity algorithms to determine the target prompt text; because the target prompt text is determined along several dimensions (the prefix to be searched, the official document theme, and multiple similarity algorithms), the accuracy of the returned target prompt text and the efficiency of drafting are improved. Finally, searching sequentially from the root node of the second dictionary tree based on the prefix to be searched makes the search targeted, reduces search time, and improves the efficiency of obtaining the plurality of first search results.
Drawings
Fig. 1 is a flowchart of an official document drafting prompt method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a constructed first dictionary tree according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a second dictionary tree according to an embodiment of the present invention.
Fig. 4 is a structural diagram of an official document drafting prompt device according to a second embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example one
Fig. 1 is a flowchart of an official document drafting prompt method according to an embodiment of the present invention.
In this embodiment, the official document drafting prompt method may be applied to an electronic device. For an electronic device that needs to provide official document drafting prompts, the prompt function provided by the method of the present invention may be integrated directly on the electronic device, or may run on the electronic device in the form of a Software Development Kit (SDK).
As shown in Fig. 1, the official document drafting prompt method specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11, parsing the received official document drafting prompt request to obtain the official document to be drafted.
In this embodiment, when drafting an official document, the user initiates an official document drafting prompt request to a server through a client, and the request includes the content of the official document to be drafted. The client may be a smartphone, an iPad, or another intelligent device, and the server may be an official document drafting subsystem. During drafting, the client sends the official document drafting prompt request to the official document drafting subsystem, which receives it.
S12, segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted.
In this embodiment, the official document content may include sentences that are too long or too short, and such sentences can degrade the target prompt text. The content of the official document to be drafted is therefore segmented, and sentences that would impair feedback efficiency are deleted.
In an optional embodiment, the segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted includes:
extracting the official document content from the official document to be drafted;
calculating the length value of each of a plurality of sentences in the official document content;
judging whether the length value of each sentence in the official document content falls within a preset sentence length threshold range;
when the length value of a sentence in the official document content falls within the preset sentence length threshold range, retaining that sentence, and determining the retained sentences as the plurality of target clauses of the official document to be drafted; or
when the length value of a sentence in the official document content does not fall within the preset sentence length threshold range, deleting that sentence.
In this embodiment, a sentence length threshold range may be preset; for example, it may be set to at least 5 words and at most 30 words.
In this embodiment, the official document content is obtained from the official document to be drafted and split into sentences. Sentences within the preset sentence length threshold range are retained and all other sentences are deleted, which avoids interference from irrelevant information, reduces the data volume, and improves the efficiency of subsequent prompt text feedback.
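The length filter of step S12 can be sketched as follows. This is a minimal illustration under stated assumptions: the punctuation set used to split sentences, the function names, and the way length is measured (whitespace-separated words, falling back to characters for unspaced text such as Chinese) are choices made here and are not specified in the text. Only the 5 to 30 range comes from the example above.

```python
import re

# Bounds taken from the example in the text (5 to 30 inclusive).
MIN_LEN = 5
MAX_LEN = 30


def split_sentences(content: str) -> list[str]:
    """Split on common Chinese and Western sentence-ending punctuation (assumed set)."""
    parts = re.split(r"[。！？!?.;；\n]+", content)
    return [p.strip() for p in parts if p.strip()]


def sentence_length(sentence: str) -> int:
    """Count whitespace-separated words; use the character count for unspaced text."""
    words = sentence.split()
    return len(words) if len(words) > 1 else len(sentence)


def target_clauses(content: str, lo: int = MIN_LEN, hi: int = MAX_LEN) -> list[str]:
    """Keep only the sentences whose length lies within [lo, hi]; drop the rest."""
    return [s for s in split_sentences(content) if lo <= sentence_length(s) <= hi]


text = "Too short. This sentence has exactly six words. " + " ".join(["word"] * 31) + "."
kept = target_clauses(text)
```

Here the two-word sentence and the 31-word sentence are both discarded, and only the six-word sentence becomes a target clause.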
S13, constructing a first dictionary tree based on the target clauses, and performing contraction processing on the first dictionary tree to obtain a second dictionary tree.
In this embodiment, an efficient trie-based data structure, that is, a dictionary tree (also called a prefix tree), is adopted. A dictionary tree exploits the common prefixes of strings to reduce query time and supports insertion and query operations in O(len) time, where len is the length of the string. It trades space for time, however: an uncontracted dictionary tree occupies a large amount of server memory and is unfriendly to server resources.
In an optional embodiment, said building a first dictionary tree based on said plurality of target clauses comprises:
performing word segmentation processing on the target clauses respectively to obtain a sample sequence corresponding to each target clause, wherein the sample sequence consists of text characters and/or special characters, and the special characters comprise brackets, quotation marks, colons or mathematical symbols;
randomly selecting a sample sequence corresponding to any one target clause to determine as a target sample sequence, and determining a root node of a preset dictionary tree as a current node;
determining the rest sample sequences as the sequences to be matched of the target sample sequence;
repeatedly executing the following steps until a new path of the preset dictionary tree is generated or the sequence to be matched is empty, updating the preset dictionary tree, and obtaining a first dictionary tree:
determining the subtree of the current node as a current target subtree;
searching a first character of the sequence to be matched in a first layer of the current target subtree;
if the first character of the sequence to be matched is not found in the current target subtree, sequentially inserting the characters of the sequence to be matched into corresponding layers in the current target subtree, and sequentially connecting the inserted characters to generate a new path in the dictionary tree;
and if the first character of the sequence to be matched is found in the current target subtree, updating the current node to the found first character, and removing the first character of the sequence to be matched from the sequence to be matched so as to update the sequence to be matched.
Illustratively, a first dictionary tree is constructed from three target clauses: the first clause is "I am Chinese", the second clause is "I am safe in China", and the third clause is "I dislike apples". The constructed first dictionary tree is shown in Fig. 2.
In this embodiment, each node of the first dictionary tree contains one character. Each target clause is subjected to word segmentation processing and converted into a sequence, each such sequence is determined as a sample sequence, and the first dictionary tree is constructed on a preset dictionary tree from the plurality of sample sequences corresponding to the plurality of target clauses.
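The construction of a one-character-per-node dictionary tree can be sketched as follows. This is a standard trie insertion written for illustration, assuming character-level sequences; it is not a line-by-line rendering of the matching procedure listed above, and the names `TrieNode`, `build_trie`, and `count_nodes` are introduced here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the child node
        self.is_end = False   # True if a target clause ends at this node


def build_trie(clauses):
    """Insert every clause character by character, sharing common prefixes."""
    root = TrieNode()
    for clause in clauses:
        node = root
        for ch in clause:
            # Follow the existing child if the character is already present,
            # otherwise start a new path from this node.
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def count_nodes(node):
    """Count this node and every node below it."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = build_trie(["I am Chinese", "I am safe in China", "I dislike apples"])
```

Because the three example clauses share the prefix "I " and two of them also share "am ", the shared characters are stored only once.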
In an optional embodiment, the performing the contraction processing on the first dictionary tree to obtain the second dictionary tree includes:
traversing from the root node of the first dictionary tree, and identifying whether each node has a branch path;
and when any target node has no branch path, combining one or more nodes behind the target node with the target node, and updating the first dictionary tree to obtain a second dictionary tree.
Illustratively, referring to Fig. 3, the one or more nodes following each target node that has no branch path are merged with that target node, and the first dictionary tree is updated to obtain the second dictionary tree.
Further, the method further comprises:
when any target node has a branch path, continuing to traverse the first dictionary tree.
In this embodiment, when any target node has a branch path, it is determined that the branch paths of the target clauses passing through that node have not yet been fully traversed, and traversal of the first dictionary tree continues.
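The contraction processing can be sketched as merging every chain of single-child nodes into one node, which is how a compressed trie (radix tree) is usually obtained. The sketch below is an illustration under assumptions: the `Node` class, the rule that a node where a clause ends is not merged with its successor, and the choice to leave the root untouched are decisions made here, not details given in the text.

```python
class Node:
    def __init__(self, label=""):
        self.label = label     # one character before contraction, a string after
        self.children = {}     # first character of the child's label -> child
        self.is_end = False    # True if a target clause ends at this node


def build_trie(clauses):
    """Build the uncontracted (first) dictionary tree, one character per node."""
    root = Node()
    for clause in clauses:
        node = root
        for ch in clause:
            if ch not in node.children:
                node.children[ch] = Node(ch)
            node = node.children[ch]
        node.is_end = True
    return root


def contract(node):
    """Merge chains of nodes that have no branch path, in place."""
    for child in node.children.values():
        # Absorb the successor while there is exactly one and no clause ends here.
        while len(child.children) == 1 and not child.is_end:
            (only,) = child.children.values()
            child.label += only.label
            child.children = only.children
            child.is_end = only.is_end
        contract(child)   # the child now has a branch or is a leaf; go deeper
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


first_tree = build_trie(["I am Chinese", "I am safe in China", "I dislike apples"])
before = count_nodes(first_tree)
second_tree = contract(first_tree)
after = count_nodes(second_tree)
```

For the three example clauses the node count drops from 40 to 6, which illustrates the memory saving the contraction is meant to provide.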
S14, when detecting that a user inputs a prefix to be searched, sequentially searching from the root node of the second dictionary tree based on the prefix to be searched to obtain a plurality of first search results.
In this embodiment, when a search is performed with the prefix to be searched input by the user, the plurality of first search results obtained may cover different topics. For example, if the input prefix to be searched is "I am", the returned first search results are "I am Chinese" and "I am safe in China".
In this embodiment, searching the second dictionary tree for the prefix to be searched makes the search targeted, reduces search time, and improves the efficiency of obtaining the plurality of first search results.
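A prefix search over a contracted dictionary tree can be sketched as follows. To keep the example self-contained, the tree is written by hand as nested dictionaries mapping a node label to an `(is_end, children)` pair; this representation, the constant `TREE`, and the function names are assumptions made for illustration only.

```python
# Hand-written contracted tree for the three example clauses:
# label -> (a clause ends here, children)
TREE = {
    "I ": (False, {
        "am ": (False, {
            "Chinese": (True, {}),
            "safe in China": (True, {}),
        }),
        "dislike apples": (True, {}),
    }),
}


def collect(children, path):
    """Return every complete clause stored below the given children."""
    results = []
    for label, (is_end, sub) in children.items():
        full = path + label
        if is_end:
            results.append(full)
        results.extend(collect(sub, full))
    return results


def search(children, prefix, path=""):
    """Walk down from the root, consuming the prefix label by label."""
    if not prefix:
        return collect(children, path)
    for label, (is_end, sub) in children.items():
        if label.startswith(prefix):
            # The prefix ends inside this label: this node and its subtree match.
            full = path + label
            return ([full] if is_end else []) + collect(sub, full)
        if prefix.startswith(label):
            # The whole label is consumed: continue with the rest of the prefix.
            return search(sub, prefix[len(label):], path + label)
    return []   # no child continues the prefix


first_results = search(TREE, "I am")
```

Sibling labels in a dictionary tree start with different characters, so at most one child can continue the prefix and the loop can return as soon as it finds that child.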
S15, inputting the official document theme of the official document to be drafted into a pre-trained theme distribution prediction model for theme distribution prediction to obtain a plurality of second search results.
In this embodiment, the theme distribution prediction model may be an LDA (Latent Dirichlet Allocation) topic model. After the official document theme of the official document to be drafted is determined, it is input into the pre-trained theme distribution prediction model for theme distribution prediction, yielding a plurality of clauses corresponding to the official document theme, that is, the plurality of second search results.
Specifically, the training process of the topic distribution prediction model includes:
acquiring a plurality of historical themes and a corpus text of each historical theme;
segmenting the corpus text of each historical topic according to a preset segmentation rule to obtain a plurality of clauses corresponding to each historical topic;
taking the plurality of historical themes and a plurality of clauses corresponding to each historical theme as a sample data set;
dividing the sample data set, according to a preset proportion, into a training set containing a first number of samples and a test set containing a second number of samples;
inputting the training set into a preset convolutional neural network for training to obtain a theme distribution prediction model;
inputting the test set into the theme distribution prediction model for testing to obtain a test passing rate;
judging whether the test passing rate is greater than or equal to a preset passing rate threshold;
when the test passing rate is greater than or equal to the preset passing rate threshold, finishing the training of the theme distribution prediction model; or when the test passing rate is smaller than the preset passing rate threshold, enlarging the training set and retraining the theme distribution prediction model.
In this embodiment, a segmentation rule may be preset; for example, the sentence length threshold range of each clause may be preset to at least 5 words and at most 30 words. The corpus text of each historical topic is segmented to obtain a plurality of clauses for that topic, and the topic distribution prediction model is trained on the plurality of topics and the clauses corresponding to each topic. In subsequent prediction, the clauses of each historical topic are added as new data to enlarge the data set, and the model is retrained on the enlarged data set. Continuously updating the model in this way improves the accuracy of topic distribution prediction and, in turn, the accuracy of the plurality of second search results that the model outputs.
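The control flow of the training process (split, train, test against a pass-rate threshold, enlarge the data and retrain) can be sketched as follows. The network itself is deliberately left out: `train_fn` and `more_samples_fn` are hypothetical callables standing in for the preset convolutional neural network and for whatever source supplies additional samples, and the 0.8 split ratio, 0.9 threshold, and `max_rounds` limit are assumed values, not figures from the text.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split (clause, topic) pairs by a preset proportion (assumed 80/20)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def pass_rate(predict, test_set):
    """Fraction of test clauses whose predicted topic equals the labeled topic."""
    if not test_set:
        return 0.0
    correct = sum(1 for clause, topic in test_set if predict(clause) == topic)
    return correct / len(test_set)


def train_until_pass(samples, train_fn, more_samples_fn, threshold=0.9, max_rounds=5):
    """Retrain with more data until the test passing rate reaches the threshold.

    train_fn(train_set) must return a predict(clause) -> topic callable;
    more_samples_fn() must return extra (clause, topic) pairs. Both are
    placeholders for components the text does not specify in detail.
    """
    predict, rate = None, 0.0
    for _ in range(max_rounds):
        train_set, test_set = split_dataset(samples)
        predict = train_fn(train_set)
        rate = pass_rate(predict, test_set)
        if rate >= threshold:
            break                                   # training is finished
        samples = list(samples) + list(more_samples_fn())   # enlarge and retry
    return predict, rate
```

The `max_rounds` limit is a safeguard added in this sketch so that the loop terminates even if the threshold is never reached; the text does not describe such a limit.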
S16, calculating the similarity between the plurality of first search results and the plurality of second search results by adopting a plurality of similarity algorithms to obtain the target similarity of each first search result.
In this embodiment, calculating the similarity between the plurality of first search results and the plurality of second search results prevents the fed-back prompt text from being irrelevant to the official document theme of the official document to be drafted, and thus improves the accuracy of the fed-back prompt text.
In an optional embodiment, the calculating the similarity between the plurality of first search results and the plurality of second search results by using a plurality of similarity algorithms, and obtaining the target similarity of each first search result includes:
calculating the similarity between each first search result and the plurality of second search results by using an edit distance algorithm, and averaging the calculated similarities to obtain the first similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a Jaccard similarity algorithm, and averaging the calculated similarities to obtain the second similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a preset first text similarity algorithm, and averaging the calculated similarities to obtain a third similarity of each first search result;
and inputting the first similarity, the second similarity and the third similarity into a preset logistic regression model to obtain the target similarity of each first search result.
In this embodiment, the first similarity, the second similarity, and the third similarity of each first search result, calculated respectively by an edit distance algorithm (Edit Distance), a Jaccard similarity algorithm (Jaccard Similarity), and a preset first text similarity algorithm (BM25), are input as three feature values into a preset logistic regression model (Logistic Regression) to obtain a value between 0 and 1, and this value is used as the target similarity of each first search result.
In this embodiment, the edit distance algorithm (Edit Distance), the Jaccard similarity algorithm (Jaccard Similarity), and the preset first text similarity algorithm (BM25) are prior art and are not described in detail here.
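As a rough illustration of the first and second similarities, the sketch below computes an edit-distance-based similarity and a Jaccard similarity between one first search result and each second search result, then averages them. Normalizing the edit distance by the longer string length and comparing character sets for Jaccard are assumptions made here; the source text does not fix these details.

```python
from typing import Callable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over character sets (an assumed tokenization)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def mean_similarity(candidate: str, references: List[str],
                    measure: Callable[[str, str], float]) -> float:
    """Average the similarity of one first search result against all second search results."""
    return sum(measure(candidate, r) for r in references) / len(references)
```

For instance, `mean_similarity(result, second_results, edit_similarity)` would play the role of the first similarity and the same call with `jaccard_similarity` the role of the second similarity.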
Illustratively, when performing a logistic regression algorithm, the present embodiment selects a Sigmoid function, and specifically, an expression of the Sigmoid function is as follows:
$$g(z) = \frac{1}{1 + e^{-z}}$$
and combining the Sigmoid function and the linear regression function, and taking the output of the linear regression model as the input of the Sigmoid function to obtain a logistic regression model:
$$y = \frac{1}{1 + e^{-w^{T}x}}$$
where $w^{T}$ represents the transpose of a preset weight vector, x represents the feature vector [first similarity of each first search result, second similarity of each first search result, third similarity of each first search result], and y represents the target similarity of each first search result.
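The logistic regression step described above can be sketched in a few lines. The weight values used in any call are hypothetical, and the optional `bias` term is an assumption added here; the source text mentions only the weight vector.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def target_similarity(features: Sequence[float], weights: Sequence[float],
                      bias: float = 0.0) -> float:
    """Apply the sigmoid to the linear combination of the three similarities."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)
```

A call such as `target_similarity([0.8, 0.5, 0.3], [1.0, 1.0, 1.0])` returns a value strictly between 0 and 1, which would serve as the target similarity of one first search result.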
In this embodiment, for the plurality of first search results obtained by searching with the prefix to be searched and the plurality of second search results obtained from the official document theme of the official document to be drafted, the similarity between each first search result and each second search result is calculated by a plurality of similarity algorithms. Because the target similarity of each first search result is determined from multiple dimensions, the accuracy of the target similarity is improved, which in turn improves the accuracy of the fed-back target prompt text.
S17, determining a target prompt text corresponding to the official document to be drafted according to the plurality of target similarities of the plurality of first search results.
In this embodiment, the target prompt text is the prompt text fed back by the official document draft simulation subsystem in response to the official document draft simulation prompt request, and the user can draft the official document with reference to the prompt text.
In an optional embodiment, the determining, according to the plurality of target similarities of the plurality of first search results, a target prompt text corresponding to the document to be drawn includes:
selecting, from the plurality of target similarities, a plurality of larger target similarities, and taking the first search results corresponding to them as a plurality of target first search results;
acquiring the click frequency corresponding to each node of the plurality of target first search results from the second dictionary tree;
calculating the sum of the click frequency of all nodes of each target first search result to obtain the target click frequency of each target first search result;
sorting a plurality of target click frequencies of the plurality of target first search results in a descending order;
and selecting, from the descending sorting result, a plurality of top-ranked target first search results, and determining them as the target prompt texts corresponding to the official document to be drafted.
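The selection steps above can be sketched as a two-stage ranking. The dictionary layouts and the two cut-off parameters are assumptions for illustration; in the described method the per-node click frequencies would be read from the second dictionary tree.

```python
from typing import Dict, List


def pick_prompts(similarities: Dict[str, float],
                 node_clicks: Dict[str, List[int]],
                 top_n_similar: int = 3,
                 top_n_prompts: int = 2) -> List[str]:
    """Shortlist by target similarity, then rank the shortlist by summed click frequency."""
    # Stage 1: keep the first search results with the largest target similarities.
    shortlisted = sorted(similarities, key=similarities.get, reverse=True)[:top_n_similar]
    # Stage 2: sum the click frequency of all nodes of each shortlisted result
    # and sort in descending order of that sum.
    ranked = sorted(shortlisted,
                    key=lambda result: sum(node_clicks.get(result, [])),
                    reverse=True)
    return ranked[:top_n_prompts]
```

With this ordering, a result that is clicked very often but has a low target similarity never reaches the second stage, which matches the intent of filtering by similarity before ranking by popularity.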
In this embodiment, the theme distribution prediction model performs theme distribution prediction and returns a plurality of second search results, and the similarity between the plurality of first search results and the plurality of second search results is calculated. This prevents the fed-back target prompt text from being unrelated to the theme of the official document, improves the accuracy of the fed-back target prompt text and the efficiency of official document drafting, and improves the utilization rate and stability of the completed official document.
In summary, the official document draft simulation prompting method of this embodiment has the following effects. First, a first dictionary tree is constructed based on the plurality of target clauses and then contracted, which reduces the number of nodes and the memory occupied on the server and thereby improves the feedback efficiency of the subsequent prompt text. Second, the similarity between each first search result, obtained by searching with the prefix to be searched, and each second search result, obtained from the official document theme of the official document to be drafted, is calculated by a plurality of similarity algorithms, and the target prompt text corresponding to the official document to be drafted is determined from these similarities. Because the prefix to be searched, the official document theme, and the plurality of similarity algorithms are all taken into account, the accuracy of the fed-back target prompt text and the efficiency of official document drafting are improved. Finally, the search proceeds sequentially from the root node of the second dictionary tree based on the prefix to be searched, so the search is targeted, the search time is reduced, and the plurality of first search results are obtained more efficiently.
Example two
Fig. 4 is a structural diagram of an official document draft simulation prompting device according to a second embodiment of the present invention.
In some embodiments, the official document draft simulation prompting device 40 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the device 40 may be stored in a memory of the electronic device and executed by the at least one processor to perform the official document draft simulation prompting function (described in detail with reference to Figs. 1 to 3).
In this embodiment, the official document draft simulation prompting device 40 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a parsing module 401, a dividing module 402, a contraction processing module 403, a searching module 404, an input module 405, a calculating module 406, and a determining module 407. A module as referred to herein is a series of computer readable instruction segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function. The functions of the modules are described in detail below.
The parsing module 401 is configured to parse the received document-drafting request to obtain a document to be drafted.
In this embodiment, when a user drafts an official document, the user initiates an official document draft simulation prompt request to a server through a client, where the request includes the official document content of the official document to be drafted. The client may be a smart phone, an iPad, or another intelligent device, and the server may be an official document draft simulation subsystem. During the drafting process, the client sends the official document draft simulation prompt request to the official document draft simulation subsystem, which receives it.
The dividing module 402 is configured to divide the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted.
In this embodiment, the official document content may include sentences that are excessively long or excessively short, and such sentences may affect the target prompt text. The official document content of the official document to be drafted is therefore divided, and the sentences that would affect the feedback efficiency are deleted.
In an optional embodiment, the dividing, by the dividing module 402, of the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted includes:
extracting the official document content of the official document to be drawn from the official document to be drawn;
calculating length values of a plurality of sentences in the official document content;
judging whether the length value of each sentence in the official document content meets a preset sentence length threshold range or not;
when the length value of a sentence in the official document content falls within the preset sentence length threshold range, retaining that sentence, and determining the plurality of retained sentences as the plurality of target clauses of the official document to be drafted; or
when the length value of a sentence in the official document content does not fall within the preset sentence length threshold range, deleting that sentence.
In this embodiment, a length threshold range of a sentence may be preset, for example, the preset length threshold range of the sentence may be set to be greater than or equal to 5 words and less than or equal to 30 words.
In the embodiment, the official document content of the official document to be drawn is obtained from the official document to be drawn, each sentence in the official document content is divided, the sentences within the preset sentence length threshold range are reserved, and other sentences in the official document content are deleted, so that the interference of irrelevant information is avoided, the data volume is reduced, and the subsequent text prompting feedback efficiency is improved.
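The length filter described above can be sketched as follows. The sentence-ending punctuation set and the whitespace-based word count are assumptions made for this illustration; for Chinese text a character count (for example `length=len`) may be the more natural measure.

```python
import re
from typing import Callable, List

# Assumed sentence delimiters, covering common Western and Chinese punctuation.
SENTENCE_END = re.compile(r"[.!?;。！？；]+")


def target_clauses(content: str, min_len: int = 5, max_len: int = 30,
                   length: Callable[[str], int] = lambda s: len(s.split())) -> List[str]:
    """Split the content into sentences and keep those within the length range."""
    sentences = [s.strip() for s in SENTENCE_END.split(content) if s.strip()]
    return [s for s in sentences if min_len <= length(s) <= max_len]
```

Sentences outside the range are simply not returned, which corresponds to the deletion step in the text.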
The contraction processing module 403 is configured to construct a first dictionary tree based on the plurality of target clauses, and perform contraction processing on the first dictionary tree to obtain a second dictionary tree.
In this embodiment, an efficient variant of the hash tree, namely a dictionary tree (also called a trie or prefix tree), is adopted as the basic data structure. By exploiting the common prefixes of character strings, the dictionary tree reduces query time and implements insertion and query operations in O(len) time, where len is the length of the string being inserted or queried. It is a data structure that trades space for time; however, a dictionary tree occupies a large amount of server memory and is unfriendly to server resources, which is why the first dictionary tree is contracted into the second dictionary tree.
In an alternative embodiment, the constructing the first dictionary tree based on the plurality of target clauses by the contraction processing module 403 includes:
performing word segmentation processing on the target clauses respectively to obtain a sample sequence corresponding to each target clause, wherein the sample sequence consists of text characters and/or special characters, and the special characters comprise brackets, quotation marks, colons or mathematical symbols;
randomly selecting a sample sequence corresponding to any one target clause to determine as a target sample sequence, and determining a root node of a preset dictionary tree as a current node;
determining the rest sample sequences as the sequences to be matched of the target sample sequence;
repeatedly executing the following steps until a new path of the preset dictionary tree is generated or the sequence to be matched is empty, updating the preset dictionary tree, and obtaining a first dictionary tree:
determining the subtree of the current node as a current target subtree;
searching a first character of the sequence to be matched in a first layer of the current target subtree;
if the first character of the sequence to be matched is not found in the current target subtree, sequentially inserting the characters of the sequence to be matched into corresponding layers in the current target subtree, and sequentially connecting the inserted characters to generate a new path in the dictionary tree;
and if the first character of the sequence to be matched is found in the current target subtree, updating the current node to the found first character, and removing the first character of the sequence to be matched from the sequence to be matched so as to update the sequence to be matched.
For example, a first dictionary tree is constructed from 3 target clauses: the first clause is "I am Chinese"; the second clause is "I am safe in China"; and the third clause is "I dislike apples". The constructed first dictionary tree is shown in Fig. 2.
In this embodiment, each node of the first dictionary tree contains one character. Each target clause is subjected to word segmentation processing and converted into a sequence, each such sequence is determined as a sample sequence, and the first dictionary tree is constructed on the preset dictionary tree from the plurality of sample sequences corresponding to the plurality of target clauses.
In an optional embodiment, the performing, by the contraction processing module 403, of the contraction processing on the first dictionary tree to obtain a second dictionary tree includes:
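A minimal sketch of building a one-character-per-node dictionary tree from several clauses is given below. The `TrieNode` class, the `is_end` flag, and the helper names are hypothetical and introduced only for illustration.

```python
from typing import Dict, Iterable


class TrieNode:
    """One node of the dictionary tree; each child edge is labeled with one character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # assumed marker: a clause ends at this node


def insert(root: TrieNode, sequence: str) -> None:
    """Follow existing characters from the root; create a new path once one is missing."""
    node = root
    for ch in sequence:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def build_trie(clauses: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for clause in clauses:
        insert(root, clause)
    return root


def count_nodes(node: TrieNode) -> int:
    """Number of nodes in the subtree, including the given node."""
    return 1 + sum(count_nodes(child) for child in node.children.values())
```

Clauses that share a prefix reuse the same nodes, so inserting "abc" and "abd" creates the path a, b once and branches only at the last character.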
traversing from the root node of the first dictionary tree, and identifying whether each node has a branch path;
and when any target node has no branch path, combining one or more nodes behind the target node with the target node, and updating the first dictionary tree to obtain a second dictionary tree.
For example, referring to Fig. 3, the one or more nodes following a target node that has no branch path are merged with the target node, and the first dictionary tree is updated to obtain the second dictionary tree.
Further, when any one target node has a branch path, the first dictionary tree is continuously traversed.
In this embodiment, when any target node has a branch path, it is determined that the branch paths under that target node have not yet been fully traversed, and traversal of the first dictionary tree continues.
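The contraction step can be sketched on a nested-dictionary representation of the tree, merging every chain of nodes that has no branch into a single edge. The nested-dict layout and the `END` marker key are assumptions made for this sketch; the marker keeps the end of a shorter clause from being swallowed when it is a prefix of a longer one.

```python
from typing import Dict, Iterable

END = ""  # hypothetical end-of-clause marker key


def build(clauses: Iterable[str]) -> Dict:
    """Build an uncontracted tree: one character per edge, END marks a clause end."""
    root: Dict = {}
    for clause in clauses:
        node = root
        for ch in clause:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def contract(node: Dict) -> Dict:
    """Merge each node that has a single non-END child with that child."""
    merged: Dict = {}
    for label, child in node.items():
        # Follow the chain while there is exactly one outgoing edge and no clause ends here.
        while len(child) == 1 and END not in child:
            next_label, next_child = next(iter(child.items()))
            label += next_label
            child = next_child
        merged[label] = contract(child)
    return merged


def count_nodes(node: Dict) -> int:
    """Number of nodes in the subtree, including the given node."""
    return 1 + sum(count_nodes(child) for child in node.values())
```

On the clauses "abc" and "abd", the chain a, b collapses into one edge labeled "ab", so the contracted tree has fewer nodes than the original while representing the same clauses.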
The searching module 404 is configured to, when it is detected that a prefix to be searched is input by a user, sequentially search from a root node of the second trie based on the prefix to be searched to obtain a plurality of first search results.
In this embodiment, when a search is performed with the prefix to be searched input by the user, the plurality of first search results obtained may cover different topics. For example, if the input prefix to be searched is "I am", the returned first search results are "I am Chinese" and "I am safe in China".
In the embodiment, the prefixes to be searched are searched in the second dictionary tree, so that targeted search is achieved, the search time is reduced, and the efficiency of obtaining a plurality of first search results is improved.
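A prefix search that starts at the root node can be sketched as below. For brevity the sketch searches an uncontracted nested-dict tree rather than the contracted second dictionary tree; the layout and the `END` marker are assumptions made here.

```python
from typing import Dict, Iterable, List

END = ""  # hypothetical end-of-clause marker key


def build(clauses: Iterable[str]) -> Dict:
    root: Dict = {}
    for clause in clauses:
        node = root
        for ch in clause:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def search_prefix(root: Dict, prefix: str) -> List[str]:
    """Walk down from the root along the prefix, then collect every clause below it."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []          # no stored clause starts with this prefix
        node = node[ch]
    results: List[str] = []
    stack = [(node, prefix)]
    while stack:
        current, text = stack.pop()
        for label, child in current.items():
            if label == END:
                results.append(text)
            else:
                stack.append((child, text + label))
    return sorted(results)
```

Only the subtree under the prefix is visited, so clauses that do not share the prefix are never examined.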
The input module 405 is configured to input the official document theme of the official document to be drafted into a pre-trained theme distribution prediction model for theme distribution prediction, so as to obtain a plurality of second search results.
In this embodiment, the theme distribution prediction model may be an LDA theme model. After the official document theme of the official document to be drafted is determined, the official document theme is input into the pre-trained theme distribution prediction model for theme distribution prediction, so as to obtain a plurality of clauses corresponding to the official document theme, that is, the plurality of second search results.
Specifically, the training process of the topic distribution prediction model includes:
acquiring a plurality of historical themes and a corpus text of each historical theme;
segmenting the corpus text of each historical topic according to a preset segmentation rule to obtain a plurality of clauses corresponding to each historical topic;
taking the plurality of historical themes and a plurality of clauses corresponding to each historical theme as a sample data set;
dividing the data set into a first number of training sets and a second number of testing sets according to a preset proportion;
inputting the training set into a preset convolutional neural network for training to obtain a theme distribution prediction model;
inputting the test set into the theme distribution prediction model for testing to obtain a test passing rate;
judging whether the test passing rate is greater than or equal to a preset passing rate threshold value;
when the test passing rate is greater than or equal to the preset passing rate threshold value, finishing the training of the theme distribution prediction model; or when the test passing rate is smaller than the preset passing rate threshold, increasing the number of the training sets, and re-training the theme distribution prediction model.
In this embodiment, a segmentation rule may be preset; for example, the sentence length threshold range of each clause may be preset to be greater than or equal to 5 words and less than or equal to 30 words. The corpus text of each historical topic is segmented to obtain a plurality of clauses of each historical topic, and the topic distribution prediction model is trained on the plurality of topics and the plurality of clauses corresponding to each topic. In the subsequent prediction process, the clauses of each historical topic are used as new data to enlarge the data set, and the topic distribution prediction model is retrained on the new data set. The model is thus continuously updated, which continuously improves the accuracy of topic distribution prediction and, in turn, the accuracy of the plurality of second search results output by the model.
The calculating module 406 is configured to calculate the similarity between the plurality of first search results and the plurality of second search results by using a plurality of similarity algorithms, so as to obtain a target similarity of each first search result.
In this embodiment, calculating the similarity between the plurality of first search results and the plurality of second search results prevents the fed-back prompt text from being irrelevant to the official document theme of the official document to be drafted, and thus improves the accuracy of the fed-back prompt text.
In an optional embodiment, the calculating module 406 calculates the similarity between the first search results and the second search results by using a plurality of similarity algorithms, and obtaining the target similarity of each first search result includes:
calculating the similarity between each first search result and the plurality of second search results by using an edit distance algorithm, and averaging the calculated similarities to obtain the first similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a Jaccard similarity algorithm, and averaging the calculated similarities to obtain the second similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a preset first text similarity algorithm, and averaging the calculated similarities to obtain a third similarity of each first search result;
and inputting the first similarity, the second similarity and the third similarity into a preset logistic regression model to obtain the target similarity of each first search result.
In this embodiment, the first similarity, the second similarity, and the third similarity of each first search result, calculated respectively by an edit distance algorithm (Edit Distance), a Jaccard similarity algorithm (Jaccard Similarity), and a preset first text similarity algorithm (BM25), are input as three feature values into a preset logistic regression model (Logistic Regression) to obtain a value between 0 and 1, and this value is used as the target similarity of each first search result.
In this embodiment, the edit distance algorithm (Edit Distance), the Jaccard similarity algorithm (Jaccard Similarity), and the preset first text similarity algorithm (BM25) are prior art and are not described in detail here.
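For the third similarity, a common Okapi BM25 formulation can be sketched as follows, treating one tokenized first search result as the query and the tokenized second search results as the documents. The parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the pre-tokenized inputs are assumptions of this sketch; the source text does not specify them.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], documents: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """BM25 score of the query against each document (documents must be non-empty)."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in documents for term in set(d))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def third_similarity(query: List[str], documents: List[List[str]]) -> float:
    """Average BM25 score of one first search result over all second search results."""
    scores = bm25_scores(query, documents)
    return sum(scores) / len(scores)
```

BM25 scores are not bounded to the interval from 0 to 1, which is unproblematic here because the logistic regression model maps the combined features into that interval.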
Illustratively, when performing a logistic regression algorithm, the present embodiment selects a Sigmoid function, and specifically, an expression of the Sigmoid function is as follows:
$$g(z) = \frac{1}{1 + e^{-z}}$$
and combining the Sigmoid function and the linear regression function, and taking the output of the linear regression model as the input of the Sigmoid function to obtain a logistic regression model:
$$y = \frac{1}{1 + e^{-w^{T}x}}$$
where $w^{T}$ represents the transpose of a preset weight vector, x represents the feature vector [first similarity of each first search result, second similarity of each first search result, third similarity of each first search result], and y represents the target similarity of each first search result.
In this embodiment, for the plurality of first search results obtained by searching with the prefix to be searched and the plurality of second search results obtained from the official document theme of the official document to be drafted, the similarity between each first search result and each second search result is calculated by a plurality of similarity algorithms. Because the target similarity of each first search result is determined from multiple dimensions, the accuracy of the target similarity is improved, which in turn improves the accuracy of the fed-back target prompt text.
The determining module 407 is configured to determine, according to the multiple target similarities of the multiple first search results, a target prompt text corresponding to the to-be-drawn official document.
In this embodiment, the target prompt text is the prompt text fed back by the official document draft simulation subsystem in response to the official document draft simulation prompt request, and the user can draft the official document with reference to the prompt text.
In an optional embodiment, the determining module 407 determines, according to the target similarities of the first search results, that the target prompt text corresponding to the document to be drawn includes:
selecting, from the plurality of target similarities, a plurality of larger target similarities, and taking the first search results corresponding to them as a plurality of target first search results;
acquiring the click frequency corresponding to each node of the plurality of target first search results from the second dictionary tree;
calculating the sum of the click frequency of all nodes of each target first search result to obtain the target click frequency of each target first search result;
sorting a plurality of target click frequencies of the plurality of target first search results in a descending order;
and selecting, from the descending sorting result, a plurality of top-ranked target first search results, and determining them as the target prompt texts corresponding to the official document to be drafted.
In this embodiment, the theme distribution prediction model performs theme distribution prediction and returns a plurality of second search results, and the similarity between the plurality of first search results and the plurality of second search results is calculated. This prevents the fed-back target prompt text from being unrelated to the theme of the official document, improves the accuracy of the fed-back target prompt text and the efficiency of official document drafting, and improves the utilization rate and stability of the completed official document.
In summary, the official document draft simulation prompting device of this embodiment has the following effects. First, a first dictionary tree is constructed based on the plurality of target clauses and then contracted, which reduces the number of nodes and the memory occupied on the server and thereby improves the feedback efficiency of the subsequent prompt text. Second, the similarity between each first search result, obtained by searching with the prefix to be searched, and each second search result, obtained from the official document theme of the official document to be drafted, is calculated by a plurality of similarity algorithms, and the target prompt text corresponding to the official document to be drafted is determined from these similarities. Because the prefix to be searched, the official document theme, and the plurality of similarity algorithms are all taken into account, the accuracy of the fed-back target prompt text and the efficiency of official document drafting are improved. Finally, the search proceeds sequentially from the root node of the second dictionary tree based on the prefix to be searched, so the search is targeted, the search time is reduced, and the plurality of first search results are obtained more efficiently.
Example three
Fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 5 comprises a memory 51, at least one processor 52, at least one communication bus 53 and a transceiver 54.
It will be appreciated by those skilled in the art that the configuration of the electronic device shown in Fig. 5 does not limit the embodiments of the present invention. The configuration may be a bus-type or a star-type configuration, and the electronic device 5 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the electronic device 5 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 5 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 5 is only an example; other existing or future electronic products that can be adapted to the present invention should also be included in the scope of protection of the present invention, and are incorporated herein by reference.
In some embodiments, the memory 51 is used for storing program code and various data, such as the official document draft simulation prompting device 40 installed in the electronic device 5, and provides high-speed, automatic access to programs or data during the operation of the electronic device 5. The memory 51 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, a magnetic disk memory, a tape memory, or any other computer-readable medium capable of carrying or storing data.
In some embodiments, the at least one processor 52 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 52 is a Control Unit (Control Unit) of the electronic device 5, connects various components of the electronic device 5 by using various interfaces and lines, and executes various functions and processes data of the electronic device 5 by running or executing programs or modules stored in the memory 51 and calling data stored in the memory 51.
In some embodiments, the at least one communication bus 53 is arranged to enable connection communication between the memory 51 and the at least one processor 52, etc.
Although not shown, the electronic device 5 may further include a power source (such as a battery) for supplying power to each component, and optionally, the power source may be logically connected to the at least one processor 52 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 5 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In a further embodiment, in conjunction with fig. 4, the at least one processor 52 may execute the operating system of the electronic device 5, as well as various installed applications (e.g., the official document draft simulation prompting device 40), program code, and the like, such as the various modules described above.
The memory 51 has program code stored therein, and the at least one processor 52 can call the program code stored in the memory 51 to perform related functions. For example, the modules illustrated in fig. 4 are program code stored in the memory 51 and executed by the at least one processor 52 to implement the functions of those modules for the purpose of official document draft simulation prompting.
Illustratively, the program code may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 52 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the program code in the electronic device 5. For example, the program code may be divided into the analysis module 401, segmentation module 402, contraction processing module 403, searching module 404, input module 405, calculating module 406, and determining module 407.
In one embodiment of the invention, the memory 51 stores a plurality of computer-readable instructions that are executed by the at least one processor 52 to implement the function of official document draft simulation prompting.
Specifically, for the manner in which the at least one processor 52 implements these instructions, refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 3, which is not repeated herein.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for prompting draft simulation of an official document, characterized by comprising the following steps:
analyzing the received official document draft making request to acquire the official document to be drafted;
segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted;
constructing a first dictionary tree based on the target clauses, and performing contraction processing on the first dictionary tree to obtain a second dictionary tree;
when detecting that a user inputs a prefix to be searched, sequentially searching from a root node of the second dictionary tree based on the prefix to be searched to obtain a plurality of first search results;
inputting the official document theme of the official document to be drafted into a theme distribution prediction model trained in advance to perform theme distribution prediction to obtain a plurality of second search results;
calculating the similarity between the plurality of first search results and the plurality of second search results by adopting a plurality of similarity algorithms to obtain the target similarity of each first search result;
and determining a target prompt text corresponding to the official document to be drafted according to the plurality of target similarities of the plurality of first search results.
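The prefix search step of claim 1 can be illustrated with a minimal sketch. It assumes a plain character-level dictionary tree (no contraction) and hypothetical names `TrieNode`, `insert`, and `search_prefix`, none of which appear in the source; it is not the applicant's implementation.

```python
from typing import Dict, List


class TrieNode:
    """One node of a character-level dictionary tree (hypothetical layout)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True when a stored clause ends at this node


def insert(root: TrieNode, clause: str) -> None:
    """Store one clause, creating missing nodes along its path."""
    node = root
    for ch in clause:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def search_prefix(root: TrieNode, prefix: str) -> List[str]:
    """Walk down from the root along the prefix, then collect every
    stored clause found below the node where the prefix ends."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []  # no stored clause starts with this prefix
        node = node.children[ch]
    results: List[str] = []
    stack = [(node, prefix)]
    while stack:
        current, text = stack.pop()
        if current.is_end:
            results.append(text)
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    return sorted(results)  # sorted only to make the output deterministic
```

In this sketch the returned clauses play the role of the "first search results"; how they are later scored is left to the similarity and ranking steps.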
2. The method of claim 1, wherein the performing contraction processing on the first dictionary tree to obtain a second dictionary tree comprises:
traversing from the root node of the first dictionary tree, and identifying whether each node has a branch path;
and when any target node has no branch path, combining one or more nodes behind the target node with the target node, and updating the first dictionary tree to obtain a second dictionary tree.
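The contraction described in claim 2 resembles the path compression used in radix trees. The sketch below assumes that reading; the names `Node`, `build_trie`, `contract`, and `count_nodes` are hypothetical. It also assumes that a node at which a clause ends is not merged with its successor, a detail the claim does not state.

```python
from typing import Dict, List


class Node:
    """Dictionary tree node whose label may grow to several characters."""

    def __init__(self, label: str = "") -> None:
        self.label = label
        self.children: Dict[str, "Node"] = {}
        self.is_end = False  # True when a stored clause ends here


def build_trie(clauses: List[str]) -> Node:
    """Build the uncontracted (first) dictionary tree, one character per node."""
    root = Node()
    for clause in clauses:
        node = root
        for ch in clause:
            if ch not in node.children:
                node.children[ch] = Node(ch)
            node = node.children[ch]
        node.is_end = True
    return root


def contract(node: Node) -> None:
    """Merge every chain of non-branching nodes into one node, in place."""
    for child in node.children.values():
        # Absorb the single successor while there is no branch and
        # no clause ends at the node being extended (assumption).
        while len(child.children) == 1 and not child.is_end:
            (successor,) = child.children.values()
            child.label += successor.label
            child.is_end = successor.is_end
            child.children = successor.children
        contract(child)


def count_nodes(node: Node) -> int:
    """Count all nodes, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())
```

For the clauses "abcd" and "abef", the first tree has seven nodes (root, a, b, c, d, e, f) and the contracted tree has four (root, "ab", "cd", "ef"), which shortens the walk needed for each lookup.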
3. The method of claim 1, wherein constructing a first dictionary tree based on the plurality of target clauses comprises:
performing word segmentation processing on the target clauses respectively to obtain a sample sequence corresponding to each target clause, wherein the sample sequence consists of text characters and/or special characters, and the special characters comprise brackets, quotation marks, colons or mathematical symbols;
randomly selecting a sample sequence corresponding to any one target clause to determine as a target sample sequence, and determining a root node of a preset dictionary tree as a current node;
determining the rest sample sequences as the sequences to be matched of the target sample sequence;
repeatedly executing the following steps until a new path of the preset dictionary tree is generated or the sequence to be matched is empty, updating the preset dictionary tree, and obtaining a first dictionary tree:
determining the subtree of the current node as a current target subtree;
searching a first character of the sequence to be matched in a first layer of the current target subtree;
if the first character of the sequence to be matched is not found in the current target subtree, sequentially inserting the characters of the sequence to be matched into corresponding layers in the current target subtree, and sequentially connecting the inserted characters to generate a new path in the dictionary tree;
and if the first character of the sequence to be matched is found in the current target subtree, updating the current node to the found first character, and removing the first character of the sequence to be matched from the sequence to be matched so as to update the sequence to be matched.
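The matching loop of claim 3 can be sketched as follows. This is a simplified reading in which each sample sequence is inserted in turn into a nested-dictionary tree; the names `insert_sequence` and `build_first_trie` are hypothetical, and the sketch records no end-of-clause marker, so a sequence that is a prefix of another leaves no separate trace.

```python
from typing import Dict, List


def insert_sequence(root: Dict[str, dict], sequence: List[str]) -> None:
    """Insert one token sequence, following the loop described in claim 3."""
    current = root              # children of the current node
    pending = list(sequence)    # the "sequence to be matched"
    while pending:
        first = pending[0]
        if first in current:
            # Found in the first layer: move to that node, drop the token.
            current = current[first]
            pending.pop(0)
        else:
            # Not found: insert the remaining tokens as a new path and stop.
            for token in pending:
                current[token] = {}
                current = current[token]
            break


def build_first_trie(sequences: List[List[str]]) -> Dict[str, dict]:
    """Build the first dictionary tree from all sample sequences."""
    root: Dict[str, dict] = {}
    for sequence in sequences:
        insert_sequence(root, sequence)
    return root
```

The loop ends exactly under the two conditions named in the claim: a new path has been generated, or the sequence to be matched is empty.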
4. The method of claim 1, wherein the segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted comprises the following steps:
extracting the official document content of the official document to be drafted from the official document to be drafted;
calculating length values of a plurality of sentences in the official document content;
judging whether the length value of each sentence in the official document content meets a preset sentence length threshold range or not;
when the length value of a sentence in the official document content meets the preset sentence length threshold range, retaining that sentence, and determining the plurality of retained sentences as the plurality of target clauses of the official document to be drafted.
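The length filter of claim 4 can be sketched in a few lines. The sentence-splitting rule and the threshold values (`min_len=5`, `max_len=50`) are assumptions chosen for illustration; the source does not specify them.

```python
import re
from typing import List


def split_sentences(content: str) -> List[str]:
    """Split on common Chinese and English sentence terminators (assumed rule)."""
    parts = re.findall(r"[^。！？!?.]+[。！？!?.]?", content)
    return [p.strip() for p in parts if p.strip()]


def select_target_clauses(content: str, min_len: int = 5, max_len: int = 50) -> List[str]:
    """Keep only sentences whose character length lies within [min_len, max_len]."""
    return [s for s in split_sentences(content) if min_len <= len(s) <= max_len]
```

Sentences that are too short carry little reusable wording, and overly long ones make poor completion suggestions, which is one plausible motivation for bounding the length on both sides.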
5. The method of claim 1, wherein the training process of the topic distribution prediction model comprises:
acquiring a plurality of historical themes and a corpus text of each historical theme;
segmenting the corpus text of each historical topic according to a preset segmentation rule to obtain a plurality of clauses corresponding to each historical topic;
taking the plurality of historical themes and a plurality of clauses corresponding to each historical theme as a sample data set;
dividing the data set into a first number of training sets and a second number of testing sets according to a preset proportion;
inputting the training set into a preset convolutional neural network for training to obtain a theme distribution prediction model;
inputting the test set into the theme distribution prediction model for testing to obtain a test passing rate;
judging whether the test passing rate is greater than or equal to a preset passing rate threshold value or not;
when the test passing rate is greater than or equal to the preset passing rate threshold value, finishing the training of the theme distribution prediction model; or when the test passing rate is smaller than the preset passing rate threshold, increasing the number of the training sets, and re-training the theme distribution prediction model.
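The split, test, and retrain loop of claim 5 can be sketched independently of any particular network. The sketch below takes the training and evaluation routines as hypothetical callables (`train_fn`, `pass_rate_fn`) rather than implementing a convolutional neural network, and it interprets "increasing the number of the training sets" as raising the training share of the split, which is an assumption.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def split_dataset(samples: Sequence[Any], train_ratio: float, seed: int = 0) -> Tuple[List[Any], List[Any]]:
    """Shuffle deterministically and split into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def train_until_pass(
    samples: Sequence[Any],
    train_fn: Callable[[List[Any]], Any],
    pass_rate_fn: Callable[[Any, List[Any]], float],
    threshold: float = 0.9,
    train_ratio: float = 0.7,
    step: float = 0.05,
    max_rounds: int = 5,
) -> Tuple[Any, float]:
    """Train, test, and retrain with more training data until the pass rate
    reaches the threshold or max_rounds is exhausted."""
    for _ in range(max_rounds):
        train_set, test_set = split_dataset(samples, train_ratio)
        model = train_fn(train_set)
        rate = pass_rate_fn(model, test_set)
        if rate >= threshold:
            return model, rate
        # Assumed interpretation: enlarge the training share and retrain.
        train_ratio = min(train_ratio + step, 0.95)
    raise RuntimeError("pass rate stayed below the threshold")
```

The `max_rounds` cap is not in the claim; it is added here only so the sketch always terminates.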
6. The method of claim 1, wherein the calculating the similarity between the first search results and the second search results using a plurality of similarity algorithms to obtain the target similarity of each first search result comprises:
calculating the similarity between each first search result and the plurality of second search results by using an edit distance algorithm, and averaging the calculated similarities to obtain the first similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a Jaccard similarity algorithm, and averaging the calculated similarities to obtain the second similarity of each first search result;
calculating the similarity between each first search result and the plurality of second search results by using a preset first text similarity algorithm, and averaging the calculated similarities to obtain a third similarity of each first search result;
and inputting the first similarity, the second similarity and the third similarity into a preset logistic regression model to obtain the target similarity of each first search result.
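The three-score fusion of claim 6 can be sketched as follows. The edit distance and Jaccard measures follow their standard definitions; the third measure is unspecified in the source, so `difflib.SequenceMatcher` is used purely as a stand-in, and the logistic regression weights and bias are hypothetical values rather than trained parameters.

```python
import math
from difflib import SequenceMatcher
from typing import Callable, Sequence


def edit_similarity(a: str, b: str) -> float:
    """1 minus the Levenshtein distance divided by the longer length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over the sets of characters of the two texts."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in for the unspecified third text similarity algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(metric: Callable[[str, str], float], first: str, seconds: Sequence[str]) -> float:
    """Average one metric over all second search results."""
    return sum(metric(first, s) for s in seconds) / len(seconds)


def target_similarity(first: str, seconds: Sequence[str],
                      weights: Sequence[float] = (1.0, 1.0, 1.0), bias: float = -1.5) -> float:
    """Fuse the three averaged scores with a logistic function
    (weights and bias are illustrative, not trained)."""
    metrics = (edit_similarity, jaccard_similarity, ratio_similarity)
    features = [mean_similarity(m, first, seconds) for m in metrics]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Averaging each metric before fusion gives every first search result exactly three features, regardless of how many second search results the topic model returned.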
7. The method of claim 1, wherein the determining the target prompt text corresponding to the official document to be drafted according to the target similarities of the first search results comprises:
selecting, from the plurality of target similarities, a plurality of the highest target similarities, and taking the first search results corresponding to them as a plurality of target first search results;
acquiring the click frequency corresponding to each node of the plurality of target first search results from the second dictionary tree;
calculating the sum of the click frequency of all nodes of each target first search result to obtain the target click frequency of each target first search result;
sorting a plurality of target click frequencies of the plurality of target first search results in a descending order;
and selecting the top-ranked target first search results from the descending sorting result, and determining them as target prompt texts corresponding to the official document to be drafted.
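The two-stage ranking of claim 7 can be sketched as a shortlist by similarity followed by a sort on summed click frequency. The function name `pick_prompts`, the cut-off sizes, and the flat dictionaries used in place of a lookup in the second dictionary tree are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def pick_prompts(
    similarities: Dict[str, float],
    node_clicks: Dict[str, Sequence[int]],
    top_by_similarity: int = 3,
    top_by_clicks: int = 2,
) -> List[str]:
    """Shortlist results by target similarity, then rank the shortlist
    by the sum of click frequencies over each result's nodes."""
    shortlisted = sorted(similarities, key=similarities.get, reverse=True)[:top_by_similarity]
    ranked = sorted(shortlisted, key=lambda text: sum(node_clicks.get(text, ())), reverse=True)
    return ranked[:top_by_clicks]
```

Because the click-frequency sort is applied only to the shortlist, a heavily clicked but dissimilar result cannot displace the topically relevant ones.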
8. A device for prompting draft simulation of an official document, characterized in that the device comprises:
the analysis module is used for analyzing the received official document draft making request to acquire the official document to be drafted;
the segmentation module is used for segmenting the official document content extracted from the official document to be drafted to obtain a plurality of target clauses of the official document to be drafted;
the contraction processing module is used for constructing a first dictionary tree based on the target clauses and carrying out contraction processing on the first dictionary tree to obtain a second dictionary tree;
the searching module is used for sequentially searching from the root node of the second dictionary tree based on the prefix to be searched to obtain a plurality of first search results when the prefix to be searched input by the user is detected;
the input module is used for inputting the official document theme of the official document to be drafted into a theme distribution prediction model trained in advance to perform theme distribution prediction so as to obtain a plurality of second search results;
the calculating module is used for calculating the similarity between the plurality of first search results and the plurality of second search results by adopting a plurality of similarity algorithms to obtain the target similarity of each first search result;
and the determining module is used for determining a target prompt text corresponding to the official document to be drafted according to the plurality of target similarities of the plurality of first search results.
9. An electronic device, comprising a processor and a memory, wherein the processor is configured to implement the method for prompting draft simulation of an official document according to any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the method for prompting draft simulation of an official document according to any one of claims 1 to 7.
CN202110722850.2A 2021-06-29 2021-06-29 Method and device for prompting draft simulation of official document, electronic equipment and storage medium Pending CN113434661A (en)

Publications (1)

Publication Number Publication Date
CN113434661A (en) 2021-09-24

Family

ID=77757374

Country Status (1)

Country Link
CN (1) CN113434661A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899322A (en) * 2015-06-18 2015-09-09 百度在线网络技术(北京)有限公司 Search engine and implementation method thereof
CN107526846A (en) * 2017-09-27 2017-12-29 百度在线网络技术(北京)有限公司 Generation, sort method, device, server and the medium of channel sequencing model
CN108345601A (en) * 2017-01-23 2018-07-31 腾讯科技(深圳)有限公司 Search result ordering method and device
CN109740165A (en) * 2019-01-09 2019-05-10 网易(杭州)网络有限公司 Dictionary tree constructing method, sentence data search method, apparatus, equipment and storage medium
CN109962983A (en) * 2019-03-29 2019-07-02 北京搜狗科技发展有限公司 A kind of clicking rate statistical method and device
CN111966654A (en) * 2020-08-18 2020-11-20 浪潮云信息技术股份公司 Mixed filter based on Trie dictionary tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210924