CN116128364A - Text writing quality monitoring method and system - Google Patents
Text writing quality monitoring method and system
- Publication number
- CN116128364A CN202310134696.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- General Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Technology Law (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a text writing quality monitoring method and system that address the technical problem that the writing quality of standard-format texts in the prior art lacks quantitative supervision. The method comprises: forming a behavior acquisition framework for quantifying text input behaviors in a text writing environment; recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log; forming an effective output assessment of the text writing according to the writing log; forming an effective information carrying capacity assessment of the text according to the writing log; and forming an information integrity assessment of the text according to the writing log. This forms a hierarchical monitoring mechanism covering time cost, personnel quality, and text quality throughout the writing process. The evaluation data of the monitoring process gives writers in technical fields, such as patent agents, a basis for evaluation across personal professional quality, technical literacy, the technical scheme of the agency case, and the finalized text, and gives industries that depend mainly on human resource investment a quantitative basis for resource optimization.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a text writing quality monitoring method and system.
Background
In the existing process of writing standard-format texts, a standard text template satisfies the specific requirements of the text format framework, and the substantive text content is then formed within that framework. Textual content that expresses information is highly flexible: substantive content carrying the same information load may differ considerably in composition, including grammar, ordering, lexical elements, and the logical mapping of context. Taking an application text or a litigation complaint as an example, its formation involves combined factors such as the practitioner's professional quality, technical literacy, time span, and completeness of argument, all of which directly affect text quality. A technology-related patent application text must exhibit both statement logic and technical logic, and often has to be written by a senior practitioner to achieve the required rigor of technical description and accuracy of expression. The writing quality of junior practitioners has to be ensured through quality inspection, which consumes a large amount of valuable human resources. A relatively objective automatic evaluation means would facilitate quality supervision of junior practitioners, relieve the consumption of quality inspection resources, and further reduce the cost of training junior practitioners.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a text writing quality monitoring method and system that address the technical problem that the writing quality of standard-format texts in the prior art lacks quantitative supervision.
The text writing quality monitoring method of the embodiment of the invention comprises:
forming a behavior acquisition framework for quantifying text input behaviors in a text writing environment;
recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log;
forming an effective output assessment of the text writing according to the writing log;
forming an effective information carrying capacity assessment of the text according to the writing log;
and forming an information integrity assessment of the text according to the writing log.
The text writing quality monitoring system of the embodiment of the invention comprises:
a memory for storing the program code used in the processing of the text writing quality monitoring method;
and a processor for executing the program code.
The text writing quality monitoring system of the embodiment of the invention comprises:
an input behavior quantification device for forming a behavior acquisition framework for quantifying text input behaviors in the text writing environment;
an input behavior recording device for recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log;
an effective output evaluation device for forming an effective output assessment of the text writing according to the writing log;
an information carrying evaluation device for forming an effective information carrying capacity assessment of the text according to the writing log;
and an information integrity evaluation device for forming an information integrity assessment of the text according to the writing log.
The text writing training system of the embodiment of the invention comprises:
a training client for deploying an authentication service and a word segmentation service, and for acquiring a text writing environment, a behavior acquisition framework, and a standard text template or an existing writing file according to the authentication result;
and a training server for forming a writing log from the quantized text input behaviors collected through the behavior acquisition framework, performing the patent text evaluation corresponding to the training client according to the writing log, and feeding back the evaluation result in the text writing environment.
The text writing quality monitoring method and system of the embodiment of the invention form a hierarchical monitoring mechanism covering time cost, personnel quality, and text quality throughout the writing process. The evaluation data of the monitoring process gives writers in technical fields, such as patent agents, a basis for evaluation across personal professional quality, technical literacy, the technical scheme of the agency case, and the finalized text, and provides a quantitative basis for matching writers to agency cases by technical field. The data can further serve as a weighting factor for a writer's effective workload and writing level, and gives industries that depend mainly on human resource investment a quantitative basis for resource optimization.
Drawings
FIG. 1 is a flow chart of a text writing quality monitoring method according to an embodiment of the invention.
FIG. 2 is a flow chart of input behavior quantification in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 3 is a flow chart of writing log formation in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 4 is a flow chart of effective output assessment in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 5 is a schematic diagram of a primary draft time coordinate system in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 6 is a schematic diagram of a secondary draft behavior coordinate system in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 7 is a flow chart of effective information carrying capacity assessment in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 8 is a schematic diagram of an information carrying quantization coordinate system in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 9 is a flow chart of information integrity assessment in the text writing quality monitoring method according to an embodiment of the invention.
FIG. 10 is a schematic diagram of a system for monitoring quality of patent text writing according to an embodiment of the present invention.
FIG. 11 is a schematic diagram of a training system for patent writing according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and the detailed description below, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A text writing quality monitoring method according to an embodiment of the invention is shown in FIG. 1. As shown in FIG. 1, the embodiment includes:
Step 100: A behavior acquisition framework is formed that quantifies text input behaviors in a text writing environment.
It will be appreciated by those skilled in the art that the basic functions of the writing environment are to provide a graphical interface for receiving text input, to provide text editing and typesetting functions within that interface, and to provide text presentation and storage functions. These basic functions are extended to form a behavior acquisition framework composed of individual behavior acquisition processes. The framework provides a series of automatic measurement tools that quantify text input behaviors as text is entered in the writing environment; these may include, but are not limited to, a time sequence measurement tool, a text word segmentation tool, a technical vocabulary resource, a vocabulary feature quantization tool, and a vocabulary distribution feature quantization tool. A label kit that uses the standard text template is also provided for locating text regions, and a writer authentication kit that uses a system service is provided for binding the text and the behavior quantification data formed at each stage to the writer.
Step 200: a composition log is formed by recording text input behaviors in a composition environment through a behavior acquisition framework.
The identity of the writer in the writing environment is authenticated so that the text content can be bound to the quantified data of the writing action. The log is written to form behavior related data in the text content process, the behavior related data comprise operations and operation contents such as sequential input, partial insertion or partial deletion and the like corresponding to new building, merging or eliminating of text editing behaviors, and the operation contents can also comprise quantization types and quantization data of operations such as new building, linkage or expansion and the like corresponding to staged text content, for example, obtained word analysis results, attribute according to word change and the like. The composition log may also include, but is not limited to, the identity of the composer, the type of behavior quantification of the corresponding stage input text content, and behavior quantification data.
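As an illustration only, the kind of record described above could be sketched as follows. The class names, field names, and operation labels are hypothetical; the source lists the kinds of data recorded but does not define a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    # All field names are assumptions made for this sketch.
    writer_id: str    # authenticated writer identity
    timestamp: float  # system time of the operation, in seconds
    operation: str    # "sequential_input", "insert", or "delete"
    content: str      # the text that was entered or removed
    position: int     # character offset in the writing file


@dataclass
class WritingLog:
    writer_id: str
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, timestamp: float, operation: str,
               content: str, position: int) -> LogEntry:
        """Append one quantified input behavior bound to the writer."""
        entry = LogEntry(self.writer_id, timestamp, operation, content, position)
        self.entries.append(entry)
        return entry


log = WritingLog("writer-001")
log.record(0.0, "sequential_input", "A sensor", 0)
log.record(2.5, "insert", "pressure ", 2)
```

Binding the writer identity to every entry keeps each record self-describing if entries are later stored or forwarded separately.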
Step 300: an effective output assessment of the text authoring is formed from the composition log.
The associated data of the input behaviors included in the composition log can quantify composition professional quality dimensions such as the stepwise time sequence characteristics, the text accumulation rate, the text modification change scale and the like when the text content is input into the paragraph, and effectively evaluate the text composition efficiency.
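A minimal sketch of two of the dimensions named above, assuming the log is available as a list of `(timestamp, operation, text)` tuples. The function names and the exact formulas are assumptions; the source names the dimensions without defining how they are computed.

```python
def accumulation_rate(events):
    """Net characters added per second over the logged period.

    events: list of (timestamp_seconds, operation, text) tuples,
    where operation is "sequential_input", "insert", or "delete".
    """
    if len(events) < 2:
        return 0.0
    added = sum(len(text) for _, op, text in events if op != "delete")
    removed = sum(len(text) for _, op, text in events if op == "delete")
    elapsed = events[-1][0] - events[0][0]
    return (added - removed) / elapsed if elapsed > 0 else 0.0


def modification_scale(events):
    """Share of all logged characters that belong to insert or delete edits."""
    total = sum(len(text) for _, _, text in events)
    changed = sum(len(text) for _, op, text in events if op in ("insert", "delete"))
    return changed / total if total else 0.0


events = [
    (0.0, "sequential_input", "abcdefghij"),
    (5.0, "insert", "xy"),
    (10.0, "delete", "ab"),
]
rate = accumulation_rate(events)     # (12 added - 2 removed) / 10 s
scale = modification_scale(events)   # 4 edited characters out of 14 logged
```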
Step 400: an effective information bearing capacity assessment of the text is formed according to the written log.
The related data of word analysis (such as technical words and common words) performed by the corresponding system service included in the composition log can quantify technical literacy dimensions such as the staged professional keyword number, professional keyword word frequency, grammar keyword number, grammar keyword word frequency, keyword distribution topological path, keyword topological path superposition dimension, keyword similarity, cross-reference complexity and the like when the text content is input into the paragraph, and effectively evaluate the technical information load.
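The two simplest of these dimensions, the number of professional keywords and their frequency, can be sketched as below. The function name, the example tokens, and the example lexicon are assumptions for illustration; the remaining dimensions (topological paths, cross-reference complexity) are not attempted here.

```python
from collections import Counter


def keyword_stats(tokens, lexicon):
    """Count distinct professional keywords and their relative frequency.

    tokens: segmented words of one writing stage, in order.
    lexicon: set of professional keywords (assumed to be given).
    """
    counts = Counter(t for t in tokens if t in lexicon)
    total = len(tokens)
    freq = {word: n / total for word, n in counts.items()} if total else {}
    return len(counts), counts, freq


tokens = ["the", "sensor", "sends", "signal", "to", "the", "sensor", "controller"]
lexicon = {"sensor", "signal", "controller"}
n_keywords, counts, freq = keyword_stats(tokens, lexicon)
```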
Step 500: an information integrity assessment of the text is formed from the composition log.
The associated data of the word quantization basis included in the composition log can quantize the maturity dimension of information coordination matching among information loads in text expression processes such as global information logic level, local information logic level, information distribution logic similarity of each area in a standard text template and information distribution continuity of text content input stages, and effectively evaluate the overall rationality of the composition text.
Step 600: and forming a graphic interaction prompt of the text quality deficiency in the text writing environment according to the evaluation result.
The evaluation result corresponds to the writing stage and has time sequence. And displaying the data of the evaluation result in the writing environment in a graphic logic interaction mode when writing the paragraph, and carrying out adaptation prompt with corresponding text content or text area. Meanwhile, each evaluation result data can be independently stored as time sequence branch data in the writing log to form a continuous evaluation basis for text quality deficiency.
According to the text writing quality monitoring method, a hierarchical monitoring mechanism of time cost, personnel quality and text quality throughout the writing process is formed. The evaluation data of the monitoring process provides evaluation basis among personal professional quality, technical literacy, agent case technical scheme and manuscript determination text for writers in the technical field, such as writers in the patent agent field, and provides coordination quantification basis for coordinating writers and agent cases according to the technical field. The method can be further used as a weighing and weighting factor for the effective workload and the writing level of a writer, and provides a quantification basis for resource optimization for industries mainly based on human resource investment.
The quantification of input behavior in the text writing quality monitoring method according to an embodiment of the present invention is shown in FIG. 2. As shown in FIG. 2, forming the behavior acquisition framework includes:
Step 110: The writer's identity is authenticated, the corresponding writing environment and behavior acquisition framework are matched according to the authentication result, and the corresponding writing log and writing file are created or retrieved.
It will be appreciated by those skilled in the art that the user identity authentication process may be accomplished by a generic authentication module, an authentication component, or a call to an authentication service. The writer identity authentication process comprises identifying the writer and selecting the associated post type, text template type, current text, and so on. The writer identity corresponds to a post type, the post type corresponds to the type of standard text to be formed, and each standard text type has its own customized writing environment. Writer post types include, for example, patent agent and lawyer. Standard text types include, for example, a patent application text or a civil divorce complaint, according to the post type. After the writer passes identity authentication and the post and standard text type are determined, an existing text and its writing log are matched, or new ones are created, depending on whether the writer selects an existing text or starts a new one, and the writing file is opened or created in the writing environment. Those skilled in the art will appreciate that the writing environment may be built on a mature existing text editing environment such as Word or WPS, or customized through secondary development, and provides basic text editing, storage, and presentation functions. It also provides synchronous forwarding or retrieval of the state data and text data of text operation behaviors. The writing log mainly records behavior quantification data in a customized data format, while the writing file mainly records the entered and edited text content together with the internal identifiers of the graphic and text objects that the writing environment can recognize.
The text writing quality monitoring method of the embodiment of the invention integrates the behavior acquisition function into an existing writing environment and authentication service, so that the monitoring process and the result analysis can be modularized. The behavior acquisition process can be distributed across locations in a network environment while the monitoring and evaluation process is managed centrally, which facilitates a browser/server (B/S) or client/server (C/S) monitoring mode and allows the monitoring scale to be expanded.
Step 120: by capturing the time sequential state of text input, time sequential quantized data of text input behavior is formed.
The timeliness of text input is related to the continuous working state of the writer, to the professional quality of the writer, and to the performance characteristics of on-time billing. In one embodiment of the present invention, the formation of time-series quantized data includes:
step 121: the time interval of the initial text input is recorded.
Initial text entry refers to text content that sequentially increases at the end of an existing entered text or at the beginning of a newly created document in a composition document formed by a text template. The input time of two continuous phrases can be obtained by calling the system time, and the word numbers of the two phrases are recorded at the same time. Typically for inputs that do not use an input method, the two phrases may be two words that are split by space. Typically for input using an input method, each phrase may be a string of chinese-western characters of unequal length. The smallest scale temporal feature of the continuous input can be quantized according to the input time difference of adjacent phrases and the phrase word number.
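The phrase-level measurement in step 121 can be sketched as follows, assuming each committed phrase arrives with the system time at which it was entered. The function name and the record layout are hypothetical.

```python
def phrase_intervals(inputs):
    """Pair each phrase with its successor and record the time difference.

    inputs: list of (timestamp_seconds, phrase) in input order.
    Returns one record per adjacent pair: the interval between the two
    inputs and the character counts of both phrases.
    """
    records = []
    for (t0, p0), (t1, p1) in zip(inputs, inputs[1:]):
        records.append({"interval": t1 - t0, "chars": (len(p0), len(p1))})
    return records


inputs = [(0.0, "text"), (1.5, "writing"), (4.0, "quality")]
records = phrase_intervals(inputs)
```

In a real writing environment the timestamps would come from the system clock (for example `time.time()`) at the moment each phrase is committed; fixed values are used here so the example is reproducible.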
Step 122: recording the time interval between sentences according to the sentence reading punctuation in the initial text input.
Periods such as commas and periods in the initial text entry may be used to mark a complete sentence. And acquiring the input time of two continuous sentences by calling the system time, and simultaneously recording the word numbers of the two sentences. The time features of the larger scale of the continuous input can be quantified according to the input time differences of the adjacent sentences.
Step 123: the time interval for paragraph formation is recorded based on paragraph marks in the initial text input.
Paragraph labels such as carriage return + line feed symbols in the initial text input may be used to label a complete paragraph. And acquiring the input time of two continuous paragraphs by calling the system time, and simultaneously recording the word numbers of the two sentences. Large scale temporal features of successive inputs can be quantified from the input time differences of adjacent paragraphs.
The combination of the above time features of different scales can accurately quantize the accurate formation time of characters, sentences and paragraphs in the initial character input (i.e. sequential input) process and form a quantization reference for subsequent reediting. The text object imported or embedded in the initial text input determines the time interval with the adjacent input text by the system time when the behavior occurs. Editable content in the graphic object is processed as initial text input.
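The sentence-scale and paragraph-scale measurements of steps 122 and 123 can share one routine that differs only in which marks close a unit. The sketch below assumes a stream of `(timestamp, character)` keystrokes; the names and the choice of marks are assumptions.

```python
SENTENCE_MARKS = {",", ".", "，", "。"}  # sentence punctuation, per step 122
PARAGRAPH_MARKS = {"\n"}               # paragraph mark, per step 123


def boundary_intervals(keystrokes, marks):
    """Return (elapsed_seconds, char_count) for each unit closed by a mark.

    keystrokes: list of (timestamp_seconds, char) in input order.
    The first unit is timed from the first keystroke; each later unit is
    timed from the mark that closed the previous one. Trailing characters
    without a closing mark are not reported.
    """
    results = []
    start = keystrokes[0][0] if keystrokes else 0.0
    count = 0
    for ts, ch in keystrokes:
        if ch in marks:
            results.append((ts - start, count))
            start, count = ts, 0
        else:
            count += 1
    return results


keystrokes = [(0.0, "a"), (1.0, "b"), (2.0, "."),
              (3.0, "c"), (5.0, "."), (6.0, "\n")]
sentences = boundary_intervals(keystrokes, SENTENCE_MARKS)
paragraphs = boundary_intervals(keystrokes, PARAGRAPH_MARKS)
```

Running the same keystroke stream through both mark sets yields the two coarser time scales from a single capture.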
Step 124: a draft time series is formed according to the termination mark of the initial text input.
The end mark may typically select a trigger event such as closing of the composition file, a timed store, or a specific button, and a time to finalize the document may be determined based on the system time at the time of the event trigger. The sequential process of initial text input forms a sequence of successive one-time contribution times with sequential monotonicity, each one-time contribution time being one of the sequential nodes in the initial text input.
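A minimal sketch of step 124, assuming the writing environment emits `(timestamp, kind)` events. The event kind names are hypothetical labels for the trigger events mentioned above.

```python
# Assumed labels for the termination events named in the text.
TERMINATION_KINDS = {"file_close", "timed_save", "button"}


def draft_time_sequence(events):
    """Collect the times of termination events as a strictly increasing sequence.

    events: list of (timestamp_seconds, kind) in the order they occurred.
    """
    sequence = []
    for ts, kind in events:
        # Keep only termination events that move time forward, so the
        # resulting node sequence stays monotonic.
        if kind in TERMINATION_KINDS and (not sequence or ts > sequence[-1]):
            sequence.append(ts)
    return sequence


events = [
    (10.0, "input"),
    (60.0, "timed_save"),
    (90.0, "input"),
    (120.0, "timed_save"),
    (130.0, "file_close"),
]
nodes = draft_time_sequence(events)
```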
Step 125: Each primary draft time is taken as a measurement node, and the operations that re-edit the initial text are recorded relative to the measurement node.
The primary draft time sequence forms staged time features of the overall draft and can serve as a measure of position within the overall text. For editing operations such as insertion, replacement, and deletion performed after a primary draft time has been determined, the specific time or period of the operation is obtained by calling the system time. The time characteristics and operation characteristics of text modification behavior can be quantified from these operation times.
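One way to record an edit relative to a measurement node is to look up the latest node that precedes the edit and store the offset from it. The sketch below uses the standard `bisect` module; the function name and return shape are assumptions.

```python
from bisect import bisect_right


def relative_to_node(draft_times, edit_time):
    """Locate an edit relative to the latest preceding measurement node.

    draft_times: sorted list of node times in seconds.
    Returns (node_index, seconds_since_node), or None when the edit
    happened before the first node.
    """
    index = bisect_right(draft_times, edit_time) - 1
    if index < 0:
        return None
    return index, edit_time - draft_times[index]


draft_times = [60.0, 120.0, 130.0]
after_second_node = relative_to_node(draft_times, 125.0)
before_first_node = relative_to_node(draft_times, 30.0)
```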
In one embodiment of the present invention, the time interval may be weighted according to the type of the input graphic object.
In the text writing quality monitoring method of the embodiment of the invention, three time quantification modes that express writing behavior are formed during time feature acquisition, based on an understanding of how writing proceeds. The first records the minimum quantized time difference of initial text input and establishes the minimum time granularity of quality evaluation. The second establishes time sequence nodes from the pauses in initial text input, forming a time difference reference for the process in which intermittent input builds up continuous text. The third records, relative to the time sequence nodes, the additional time features caused by modifications to the continuous text. Associating the three modes provides a measurement basis for properties such as the consistency of modified content and the time spent on modification during writing, so that time measurement and position measurement are correlated.
Step 130: sentence segmentation is carried out according to the text paragraphs, so that word quantization data of text input are formed.
Sentence segmentation is performed with paragraphs at the time of advertising the paragraphs. The word segmentation mainly adopts general natural language processing technology such as dictionary word segmentation algorithm, HMM, CRF, SVM, deep learning algorithm and the like, and more mature tools such as stanford word segmentation tool, hanlp word segmentation tool and system services such as hundred-degree natural language processing service. The forming of lexical quantization data includes:
Step 131: the segmentation in the paragraph determines a corresponding set of words.
The term set corresponds to a paragraph. The method has the advantages that the paragraph dividing word set can embody the association relation of all words in a single set, the main words in the word set are quantized, and the paragraph is taken as a basic unit to better adapt to the effective word dividing of adjacent words.
Step 132: and determining the association relationship among the words in the word set.
Through the special name recognition, part-of-speech tagging and lexical analysis in the system service word segmentation process, the association relation among the words in the paragraphs is established, the paragraphs are taken as basic units, the reliable association mapping sequence among the words with shorter distance can be better obtained, and the inter-word association sequence and meaning transmission sequence are defined.
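The paragraph-as-unit idea of steps 131 and 132 can be sketched as below. A simple regular-expression tokenizer stands in for the segmentation tools named above, and adjacent word pairs stand in for the association relations that a real lexical analysis service would produce; both substitutions are assumptions made to keep the example self-contained.

```python
import re


def paragraph_word_sets(text):
    """Split text into paragraphs and tokenize each one separately.

    Returns one ordered token list per non-empty paragraph. The regex
    tokenizer is a stand-in for a real word segmenter.
    """
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return [re.findall(r"\w+", p.lower()) for p in paragraphs]


def adjacent_pairs(tokens):
    """Ordered pairs of neighbouring words, a crude proxy for association."""
    return list(zip(tokens, tokens[1:]))


text = "The sensor sends a signal.\nThe controller receives the signal."
word_sets = paragraph_word_sets(text)
pairs = adjacent_pairs(word_sets[0])
```

Because each paragraph is tokenized on its own, no pair ever spans a paragraph boundary, which mirrors the isolation between paragraphs described in the text.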
Step 133: obtain a technical field professional word stock.
The technical field professional word stock is derived from technical literature organized by technical classification. It covers the simple and compound technical vocabulary used in writing, including the diverse theoretical vocabulary, device vocabulary, functional vocabulary, public welfare vocabulary, and the like, and can fully cover the technical description vocabulary of non-creative technical content.
Step 134: determine the technical attributes of the words in the word set and the technical associations among them according to the technical field professional word stock and the association relationships between words, and form a technical word set and a common word set according to the mapping relationships among the words.
Once the word set of each paragraph has its association relationships, the technical associations of identical words and identical technical phrases across paragraphs are determined. The quantized data of the technical vocabulary can reflect the effective internal associations of the technical logic, and the quantized data of the common vocabulary can reflect the effective internal associations of the writing logic.
Step 135: determine similar technical words among the word sets according to the technical field professional word stock and the technical word sets.
Reference technical words are determined from the professional word stock, and similar technical words, which have the same number of characters as a reference technical word but differ in some characters, are matched in the technical word set against the reference technical words. In one mode of determination, a similar technical word tends to differ from its reference technical word in one or two characters, and tends not to be a standard technical term in the professional word stock.
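The matching rule of Step 135 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names `is_similar_term` and `find_similar_terms`, the `max_diff` parameter, and the English sample words are all assumptions introduced here, and the rule shown (equal length, one or two differing character positions, word absent from the word stock) is one reading of the mode of determination described above.

```python
def is_similar_term(candidate: str, reference: str, max_diff: int = 2) -> bool:
    """True when both strings have equal length and differ in 1..max_diff positions."""
    if len(candidate) != len(reference):
        return False
    diff = sum(1 for a, b in zip(candidate, reference) if a != b)
    return 1 <= diff <= max_diff


def find_similar_terms(technical_words, word_stock, max_diff=2):
    """Map each non-standard technical word to the reference words it resembles."""
    similar = {}
    for word in technical_words:
        if word in word_stock:
            continue  # a standard term is itself a reference, not a near miss
        matches = [ref for ref in sorted(word_stock)
                   if is_similar_term(word, ref, max_diff)]
        if matches:
            similar[word] = matches
    return similar


# Hypothetical sample data for illustration only.
word_stock = {"quantization", "segmentation"}
print(find_similar_terms(["quantisation", "segmentation", "vector"], word_stock))
```

A positional comparison is used rather than an edit distance because the text requires the same number of characters; other similarity measures could be substituted.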
In the text writing quality monitoring method of this embodiment, words are segmented around the natural paragraphs of the text using general word segmentation technology, which improves segmentation accuracy and keeps writing errors in one paragraph from affecting other paragraphs. The association between the professional word stock and the segmented words of each paragraph determines the technical vocabulary and the effective internal associations of the technical logic among technical words. As a result, the technical words carrying the technical logic are highlighted among the common words with fault tolerance, which secures the basis for measuring the writing quality of the technical content.
Step 140: form word vectors from the word quantization data, and preliminarily quantize the word distribution according to the word vectors.
Word vectorization can take as its writing reference the starting point from which the writer's thinking develops into text. The writing reference provides the basis for measuring the persistence, importance, and interaction strength of words.
Forming the word vector data includes:
Step 141: using a writing reference set in a preset area of the prefabricated template text, perform distance vectorization on all technical words in the area to form region vector data for each technical word.
As will be appreciated by those skilled in the art, template-based text has a fixed composition template that includes fixed preset areas, and the writing area within each preset area can be marked by tags at its start and end positions. A tag may be set at the end of the template heading paragraph of the preset area. The start tag or the stop tag serves as the writing reference. Taking a patent writing template as an example, a start tag and a stop tag are set after the template heading 'specific implementation' to delimit the preset area, the start tag serves as the writing reference for word vectorization, and after a paragraph is written a corresponding region vector is generated for every technical word in the preset area. The region vector of each technical word may be defined as the number of characters between the technical word and the start tag of the preset area. It expresses the degree of output of the technical information corresponding to the technical words in that area.
Step 142: using a paragraph reference set for each written paragraph, perform distance vectorization on all technical words in the paragraph to form local vector data for each technical word.
Paragraphs in the preset area express technical logic independently. The beginning of each paragraph serves as the writing reference for vectorizing the words in that paragraph. For example, in the 'detailed description' preset area, a start tag is set at the beginning of each paragraph and a stop tag at its end, and a corresponding local vector is generated for every technical word in each paragraph. The local vector of each technical word may be the number of characters between the technical word and the start tag of its paragraph. It can express the degree of output of the technical information corresponding to the technical words in each paragraph.
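Steps 141 and 142 can be illustrated together. The sketch below assumes that the start tag of the preset area sits immediately before the first paragraph, that paragraph breaks contribute no characters, and that technical words are located by plain substring search; `TermPosition` and `locate_terms` are hypothetical names, not part of the described system.

```python
from typing import NamedTuple


class TermPosition(NamedTuple):
    term: str
    region_offset: int  # characters from the start tag of the preset area
    paragraph_no: int   # 1-based paragraph number inside the preset area
    local_offset: int   # characters from the start tag of the paragraph


def locate_terms(paragraphs, technical_words):
    """Return the region and local vector components of every occurrence."""
    positions = []
    region_base = 0  # characters written before the current paragraph
    for paragraph_no, text in enumerate(paragraphs, start=1):
        for term in technical_words:
            start = text.find(term)
            while start != -1:
                positions.append(
                    TermPosition(term, region_base + start, paragraph_no, start))
                start = text.find(term, start + 1)
        region_base += len(text)
    positions.sort(key=lambda p: p.region_offset)
    return positions


# Hypothetical sample paragraphs for illustration only.
for position in locate_terms(["the sensor reads data", "data feeds the sensor"],
                             ["sensor"]):
    print(position)
```

Each returned tuple carries the same fields as the technical word vector key value pair used later for the writing vector log, which is why a single pass can serve both vectors.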
Step 143: taking the paragraph start as the paragraph writing reference, vectorize the sentence complexity of the common words in each paragraph to form expression vector data for the common word segments.
Paragraphs in the preset area express language logic independently. The start tag of each paragraph serves as the reference for vectorizing the words in that paragraph. For example, in the 'detailed description' preset area of the template, a start tag is set at the beginning of each paragraph and a stop tag at its end, and corresponding expression vectors are generated for all non-technical words in each paragraph. In one embodiment of the present invention, the expression vector of non-technical words may be the number of characters of the non-technical word segment between technical words, located relative to the start tag of the paragraph. It can express the complexity of the grammatical expression of the common words in each paragraph.
Step 144: determine distribution vector data for identical technical words in each area and each paragraph.
Distribution vector data for identical technical words is formed from the region vector data and the local vector data. The distribution vector of an identical technical word may be defined as its repeated occurrences in each preset area of the prefabricated template, grouped according to the local vectors. It can express the distribution characteristics of a technical word.
Step 145: determine distribution state vector data for the common words in each paragraph.
A distribution state vector for common words is formed from the expression vector data. It may be defined as the repeated occurrences of the same non-technical word segment in each preset area of the prefabricated template, and can express the distribution characteristics of a non-technical word segment.
Step 146: determine the distribution difference vector data in each area and each paragraph after similar technical words are merged in favor of the dominant word.
The distribution state vector data of similar technical words is associated: the distribution state vector data of the reference technical word with the highest occurrence frequency is taken as primary, and the distribution state vector data of the similar technical words is added to it and marked.
In the text writing quality monitoring method of this embodiment, word vectorization quantizes technical words in terms of technical expression logic and common words in terms of grammatical expression complexity, while keeping the two quantizations compatible. The quantization references follow the way technical thinking develops, and can fully quantize measurement dimensions of a technical word such as when it is first introduced, how often it is referenced, and its importance.
The construction of the writing log in the text writing quality monitoring method according to an embodiment of the invention is shown in fig. 3. In FIG. 3, the composition log construction of an embodiment of the present invention includes:
Step 210: form the following initial text input key value pair structure, used to cache the image-text object state of the current initial text input process:
text editing system time= (text object internal identification sequence, editing state).
The initial text refers to new written content that continues after the existing written content. The text editing system time, used as the key, is the system time at which a single character or a character string segment is entered, or an image-text object is inserted, at the tail of the existing written text in the writing environment. The internal identification of an image-text object, used in the value, is the consecutive unique internal identification assigned by the writing environment to text or to an image-text object. Text content includes words, punctuation marks, and paragraph symbols. Image-text objects include embedded objects such as tables or graphics. The editing state is the current state of the image-text object corresponding to the internal identification sequence, and is one of the determined states of new creation, insertion, and deletion.
Step 220: when the current initial text input process completes a paragraph or ends, consolidate the cached initial text input key value pairs into a time sequence log segment corresponding to that stage of initial text input, append it to the primary manuscript forming section of the writing time sequence log, and add a time sequence node to the primary manuscript forming time sequence.
Completion of a written paragraph may be triggered by closing the writing file, by a timed save of the writing file, or by a specific identification. The cached sequence of initial text input key value pairs records the editing process of the initial text input. The internal identification sequence in each initial text input key value pair corresponds to the determined editing content of each editing step, including but not limited to text, image-text object description parameters, and punctuation marks. The editing content of the initial text input is consolidated according to the editing state, forming a time sequence log segment that corresponds to the complete and clean initial text input at the time the paragraph is written. The primary manuscript forming duration can be determined relatively from the first text writing time, the last text writing time, the punctuation marks, and the paragraph symbols of the time sequence log segment. Using the paragraph completion time as a time sequence node in the primary manuscript forming time sequence ensures stable quantification of the primary manuscript forming process and allows it to be distinguished from overlapping secondary editing operations.
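The caching and consolidation in Steps 210 and 220 can be sketched as a dictionary keyed by system time. The sketch assumes that a deletion state cancels the earlier entry of the same internal identification and that every other state keeps it; `EditState`, `consolidate`, and the sample timestamps are hypothetical and only illustrate one way to obtain a clean log segment.

```python
from enum import Enum


class EditState(Enum):
    NEW = "new"
    INSERT = "insert"
    DELETE = "delete"


def consolidate(buffer):
    """Reduce cached key value pairs to the identifications that survive editing.

    buffer maps a system time to (internal identification sequence, EditState).
    Returns [(system time, identification), ...] in entry order.
    """
    live = {}  # identification -> system time; dicts keep insertion order
    for timestamp in sorted(buffer):
        identifications, state = buffer[timestamp]
        for identification in identifications:
            if state is EditState.DELETE:
                live.pop(identification, None)  # assumed: delete cancels the entry
            else:
                live[identification] = timestamp
    return [(timestamp, identification)
            for identification, timestamp in live.items()]


# Hypothetical cache: three characters entered, one deleted, one more entered.
cache = {
    1.0: ((1, 2, 3), EditState.NEW),
    2.0: ((3,), EditState.DELETE),
    3.0: ((4,), EditState.NEW),
}
print(consolidate(cache))
```

The first and last timestamps of the returned segment give the bounds from which a primary manuscript forming duration could be derived.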
Step 230: using the time sequence nodes, establish the following secondary editing key value pair structure, used to cache the image-text object state during secondary editing of the written text:
text editing system time= (text object internal identification sequence, editing state, editing position of reference time sequence node section).
The time sequence nodes serve as the reference standard: the time features of the time sequence node sequence are converted into position features of relative time sequence intervals, which establishes a relative position standard for secondary editing behavior, and editing positions are marked relative to the time sequence node sections formed by the time sequence nodes.
Step 240: when the secondary editing process completes a paragraph, append the secondary editing key value pairs to the secondary manuscript forming section of the writing time sequence log.
All behavior data of the secondary editing is stored as time sequence log segments.
Step 250: when a written text paragraph is completed or writing ends, call the system word segmentation service to segment the written text corresponding to the writing time sequence log, and store the segmentation results in order in the writing word segmentation log using the following word segmentation key value pairs:
word internal identification sequence= (word segmentation word, word technical attribute, (((associated word) word internal identification sequence) associated order label)).
The word internal identification sequence corresponds to a segmented word in the segmentation result. Recognizable embedded text in an image-text object is segmented as part of the text, combining the internal identification sequence of the image-text object with the internal identification sequence within the object. The word segmentation word field records the segmented text. The word technical attribute records whether the segmented text matches the technical field professional word stock, and may be a binary mark. The associated words are the associated segmented texts obtained from the system word segmentation service, recorded with the word internal identification sequence of each associated word and an association order label, which may be a quantized mark of the association distance between the word and its associated word.
Later written text often differs from the preceding text through discrete editing behaviors and edited content, so the system word segmentation service is applied to the whole written text, and segmentation is performed when a paragraph is completed or writing ends, according to parallel processing requirements and processing resources.
Step 260: when a written text paragraph is completed or writing ends, build the region vectors and local vectors of technical words from the segmentation results, the preset area writing reference, and the paragraph writing reference using the following technical word vector key value pairs, and store them in the writing vector log:
technical word internal identification sequence = (technical word, number of characters from the preset area writing reference, number of the paragraph containing the technical word, number of characters from the writing reference of that paragraph).
The technical word internal identification sequence corresponds to a technical word in the segmentation result. Recognizable embedded technical words in an image-text object are vectorized as technical words, combining the internal identification sequence of the image-text object with the internal identification sequence within the object, and the image-text object is treated as an independent paragraph. The technical word field records the word text. The number of characters from the preset area writing reference establishes the absolute region vector of each technical word in the preset area, and the number of the paragraph containing the technical word together with the number of characters from that paragraph's writing reference establishes the local vector of each technical word relative to the preset area.
Step 270: when a written text paragraph is completed or writing ends, build the expression vectors of common words on the basis of the technical word vectorization, and store them in the writing vector log using the following common word vector key value pairs:
internal identification sequence of two adjacent technical words= (common word sequence between two adjacent technical words, region vector gap between two adjacent technical words).
The internal identification sequence of two adjacent technical words combines the internal identifications of two adjacent technical words separated only by common words. The common word sequence between the two adjacent technical words is the corresponding common word text. The region vector gap between the two adjacent technical words is obtained by comparing their region vectors, and can reflect the gap between adjacent technical words both within a paragraph and across paragraphs.
In one embodiment of the invention, certain sentence symbols such as commas and periods are treated as words in vectorization, as components with constant weight.
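The common word vector key value pair of Step 270 can be sketched as a single pass over the segmented words. The token layout `(internal identification, text, is_technical, region offset)` and the function name `common_word_vectors` are assumptions made for this illustration, and pairs of technical words with no common word between them are simply skipped.

```python
def common_word_vectors(tokens):
    """Build {(left id, right id): (common words between, region vector gap)}.

    tokens: iterable of (internal id, text, is_technical, region offset)
    in reading order.
    """
    vectors = {}
    previous = None  # (internal id, region offset) of the last technical word
    between = []     # common words seen since that technical word
    for word_id, text, is_technical, region_offset in tokens:
        if is_technical:
            if previous is not None and between:
                gap = region_offset - previous[1]
                vectors[(previous[0], word_id)] = (tuple(between), gap)
            previous = (word_id, region_offset)
            between = []
        elif previous is not None:
            between.append(text)
    return vectors


# Hypothetical segmentation result for illustration only.
tokens = [
    (1, "sensor", True, 0),
    (2, "is", False, 7),
    (3, "connected", False, 10),
    (4, "to", False, 20),
    (5, "valve", True, 23),
    (6, "pump", True, 29),
]
print(common_word_vectors(tokens))
```

Punctuation treated as words, as described above, would appear here as ordinary non-technical tokens.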
Step 280: when a written text paragraph is completed or writing ends, build the distribution features of each identical technical word on the basis of the technical word vectorization, and store them in the writing vector log using the following technical word distribution key value pairs:
single technical word= (sequence of region vector and local vector at each occurrence of single technical word).
A single technical word refers to technical words with identical text. The distribution features of a single technical word in the written text are quantized through the region vectors of its repeated occurrences in the written text. On the basis of single technical word distribution features, similar technical words can be merged in favor of the dominant word and normalized into the distribution features of one single technical word.
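The technical word distribution key value pair can be sketched as a grouping of occurrences by word text. In this sketch the optional `aliases` mapping, which folds a similar technical word into its reference word, is an assumed stand-in for the merging described above; `build_distribution` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_distribution(occurrences, aliases=None):
    """Group (region vector, local vector) pairs by single technical word.

    occurrences: iterable of (word, region offset, paragraph no, local offset).
    aliases: optional {similar word: reference word} used for merging.
    """
    aliases = aliases or {}
    distribution = defaultdict(list)
    for word, region_offset, paragraph_no, local_offset in occurrences:
        canonical = aliases.get(word, word)
        distribution[canonical].append(
            (region_offset, (paragraph_no, local_offset)))
    return {word: sorted(entries) for word, entries in distribution.items()}


# Hypothetical occurrences; "senser" is treated as a similar word of "sensor".
occurrences = [
    ("sensor", 4, 1, 4),
    ("senser", 30, 2, 9),
    ("valve", 12, 1, 12),
]
print(build_distribution(occurrences, aliases={"senser": "sensor"}))
```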
In the text writing quality monitoring method of this embodiment, the writing time sequence log, the writing word segmentation log, and the writing vector log progressively form quantized data of different dimensions and scales for personalized text input behavior, which ensures real-time recording and staged consolidation of that behavior. Word segmentation of the written text relies on mature technology for efficiency and accuracy, and on that basis a complete set of vectorized measurement dimensions is established, so that assessments of workload, work efficiency, and work quality draw on diverse data and text input behavior patterns can be mined in depth.
The effective throughput evaluation in the text composition quality monitoring method of an embodiment of the present invention is shown in fig. 4. In fig. 4, the effective output evaluation of the embodiment of the present invention includes:
Step 310: read the primary manuscript forming section of the writing time sequence log, and obtain the time sequence nodes, the initial text input time points, and the corresponding numbers of input characters of the primary manuscript forming process to form a primary manuscript forming time coordinate system.
In one embodiment of the invention, text and image-text objects with identifiable content are processed in the same way, and image-text objects with unrecognizable content are weighted by a number of characters corresponding to the object type. In an embodiment of the present invention, the primary manuscript forming time coordinate system (as shown in fig. 5) quantifies the data content of the primary manuscript forming process: the x-axis is the primary manuscript forming time coordinate, on which the time sequence nodes mark the primary manuscript forming time span and the initial text input time points mark the input behavior nodes, and the y-axis is the number of input characters produced by each input behavior.
Step 320: establish an evaluation time window and a corresponding minimum input quantity threshold, slide the window along the primary manuscript forming time axis, and quantify the maximum value and the average value of the number of input characters per unit time, which are used to evaluate the maximum efficiency and the average efficiency of the primary manuscript forming stage; quantify the effective input duration of the initial text according to the minimum input quantity threshold.
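The sliding evaluation window of Step 320 can be sketched as below. The text does not fix the window step or the way the effective duration is accumulated, so this sketch assumes a step parameter and counts one step of effective input time for every window whose character count reaches the threshold; `evaluate_input_efficiency` and the sample events are hypothetical.

```python
def evaluate_input_efficiency(events, window, step, min_chars):
    """Slide a time window over (time, characters entered) events.

    Returns (maximum rate, average rate, effective input duration), where a
    rate is characters per unit time within one window position.
    """
    if not events:
        return 0.0, 0.0, 0.0
    events = sorted(events)
    start, end = events[0][0], events[-1][0]
    rates = []
    effective = 0.0
    t = start
    while t <= end:
        chars = sum(n for ts, n in events if t <= ts < t + window)
        rates.append(chars / window)
        if chars >= min_chars:
            effective += step  # assumed: each qualifying window adds one step
        t += step
    return max(rates), sum(rates) / len(rates), effective


# Hypothetical input events: (seconds since start, characters entered).
events = [(0, 10), (5, 20), (25, 5)]
print(evaluate_input_efficiency(events, window=10, step=10, min_chars=10))
```

With a step equal to the window length the windows do not overlap, which keeps the effective duration from being counted twice; an overlapping step would need a different accumulation rule.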
Step 330: map the time sequence node data in the primary manuscript forming section to relative position data of the primary manuscript forming content, quantitatively determine the primary manuscript forming position intervals according to the relative position data, read the secondary editing behavior data in the secondary manuscript forming section of the writing time sequence log, and quantitatively locate the editing behaviors of the secondary manuscript forming content within the primary manuscript forming position intervals to form a secondary manuscript forming behavior coordinate system.
The determined time data includes at least the time sequence nodes of the primary manuscript forming process and the punctuation time points and paragraph mark time points of the initial text input, and the determined time data of the primary manuscript forming process is mapped to relative position data of the primary manuscript forming content. In an embodiment of the present invention, the secondary manuscript forming behavior coordinate system (as shown in fig. 6) quantifies the editing operation behavior of the secondary manuscript forming process: the x-axis may be the relative position intervals of the primary manuscript forming content formed from the relative position data, and the y-axis may be the quantified amplitude of the complexity of the editing state. For example, deletion, pasting, and insertion in the editing state are each assigned an amplitude; when 20 words are inserted at one time in one relative position interval (corresponding to the editing position), the operation amplitude is assigned 1 (the 20 words are recorded implicitly or explicitly for the operation). When editing is repeated in the same position interval, the operation amplitudes are superposed.
Step 340: establish an evaluation distance window, slide it in the monotonic direction of the primary manuscript forming position intervals, and quantify the maximum change value and the average value of the editing behavior per unit of initial text, which are used to evaluate the maximum modification degree and the average modification degree of the secondary editing stage.
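Steps 330 and 340 can be sketched together: editing amplitudes are superposed per position interval, and a distance window then slides over the intervals. The amplitude table, the interval indexing, and the name `modification_profile` are assumptions for illustration; the text only states that each editing state has an amplitude and that repeated edits in one interval superpose.

```python
def modification_profile(edits, n_intervals, amplitude, window):
    """Superpose editing amplitudes per interval and slide a distance window.

    edits: iterable of (interval index, editing state).
    Returns (per-interval profile, maximum window sum, average window sum).
    """
    if not 1 <= window <= n_intervals:
        raise ValueError("window must be between 1 and n_intervals")
    profile = [0] * n_intervals
    for interval, state in edits:
        profile[interval] += amplitude[state]  # repeated edits superpose
    sums = [sum(profile[i:i + window])
            for i in range(n_intervals - window + 1)]
    return profile, max(sums), sum(sums) / len(sums)


# Hypothetical amplitudes and edits for illustration only.
amplitude = {"insert": 1, "delete": 1, "paste": 1}
edits = [(0, "insert"), (0, "delete"), (2, "paste")]
print(modification_profile(edits, n_intervals=4, amplitude=amplitude, window=2))
```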
In the text writing quality monitoring method of this embodiment, the primary manuscript forming process determines the behavior quantification of the basic written content as the complete written text is formed step by step. The secondary manuscript forming process determines the behavior quantification of edits to existing text content during the same step-by-step formation. To keep the behavior quantification of the primary manuscript forming process simple, its result is used, through a type mapping, to convert the time attribute of the primary manuscript forming process into distance intervals of the secondary manuscript forming content. This ensures that the two behavior quantifications address different dimensions affecting text output while remaining continuous with each other, so that accurate labor and time assessments can be formed for orderly writing and for error correction of the written text. It also provides the data basis and basic steps required for evaluation.
The evaluation of the effective information bearing capacity in the text writing quality monitoring method according to an embodiment of the present invention is shown in fig. 7. In fig. 7, the effective information bearing capacity evaluation of the embodiment of the present invention includes:
Step 410: read the writing vector log, form the set of single technical words in the determined preset area, and determine the set of region vectors and local vectors of each single technical word in the determined preset area.
Each single technical word is taken as a classification unit, and vector sets of the technical words are established for the determined area and its paragraphs in the selected standard text model, so that the information bearing of each single technical word in the written text can be quantified in a concentrated way.
Step 420: establish the linear relative distance between single technical words according to the region vector of the first occurrence of each single technical word in the determined preset area.
Those skilled in the art will appreciate that the effective technical information borne by the written text is presented through the progressive introduction of technical words. The order in which single technical words appear reflects when the effective information bearing begins to expand and overlap, and quantitatively reflects the basic logic level and degree of inclusion of the effective information. The linear relative distance quantifies the effective connection between logic levels and technical features.
Step 430: according to the number of paragraphs in the determined preset area, establish paragraph axes that share a common endpoint and are spaced at equal angles around the linear position of the single technical word.
The number of paragraphs within the determined preset area is stable and quantifiable when a written text paragraph is completed or writing ends. The linear position of a single technical word is taken as the origin of the polar axes, the number of paragraphs as the number of polar axes, and the polar axes are distributed uniformly in a two-dimensional space. By establishing this polar-coordinate two-dimensional space, paragraph distribution features and word distribution features are combined, and the feature distribution among identical words in the preset area can be quantitatively determined.
Step 440: on the axis of the corresponding paragraph, establish the reproduction relative distance of the single technical word according to its local vector in that paragraph.
The polar axes reflect the relative distances of a single technical word within the same paragraph and across different paragraphs.
Step 450: form an information bearing quantization coordinate system from the linear relative distances between the single technical words and the reproduction relative distances of the single technical words.
The above quantization steps establish an information bearing quantization coordinate system (as shown in fig. 8) for the written text within the determined preset area. The linear relative distance establishes the logic feature association among single technical words, and the two-dimensional polar axis space, with its regular polar axis distribution, establishes the distribution feature association among single technical words, so that the effective information bearing capacity of the technical words can be compared and quantified.
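One possible geometric reading of Steps 420 to 450 is sketched below. It assumes that each single technical word is placed on a horizontal line at the region offset of its first occurrence, that paragraph k of n lies on an axis at angle 2π(k−1)/n around that point, and that each reproduction is plotted on its paragraph axis at a radius equal to its local offset. The name `information_bearing_points`, the rounding to six decimals, and the sample data are illustrative choices, not details given in the text.

```python
import math


def information_bearing_points(occurrences, n_paragraphs):
    """Place each occurrence of a technical word in the quantization plane.

    occurrences: iterable of (word, region offset, paragraph no, local offset).
    Returns (first positions, points) where points maps a word to a list of
    (paragraph no, x, y) coordinates.
    """
    first = {}
    for word, region_offset, _, _ in occurrences:
        first[word] = min(region_offset, first.get(word, region_offset))
    points = {}
    for word, _, paragraph_no, local_offset in occurrences:
        angle = 2 * math.pi * (paragraph_no - 1) / n_paragraphs
        x = first[word] + local_offset * math.cos(angle)
        y = local_offset * math.sin(angle)
        points.setdefault(word, []).append(
            (paragraph_no, round(x, 6), round(y, 6)))
    return first, points


# Hypothetical occurrences in a preset area of four paragraphs.
occurrences = [("sensor", 4, 1, 4), ("sensor", 36, 2, 15)]
print(information_bearing_points(occurrences, n_paragraphs=4))
```

The differences between entries of the first returned mapping correspond to the linear relative distances between single technical words.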
Step 460: read the associated word information in the writing word segmentation log, determine the association relationships among technical words, and quantify them according to the linear relative distances between single technical words to form an evaluation of the information load association strength of the technical words.
The associated word information recorded for technical words in the writing word segmentation log determines the basic association relationships among technical words, which forms a qualitative assessment of their information load relationships. The linear relative distances between single technical words in the information bearing quantization coordinate system form the quantitative assessment of those relationships. The evaluation conditions may include, but are not limited to, the minimum linear relative distance between two associated single technical words, the maximum linear relative distance between two associated single technical words, the number of words and the accumulated linear relative distance of a serial chain of single technical words, and the degree of overlap between such serial chains. These evaluation conditions provide a basic quantization basis for determining the information bearing distribution and the information logic architecture of the written text in the preset area.
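Some of the evaluation conditions listed for Step 460 can be computed directly from first-occurrence positions. In this sketch the association pairs are assumed to come from the associated word information of the segmentation log, and `association_distances` and `chain_distance` are hypothetical helper names.

```python
def association_distances(first_positions, associations):
    """Linear relative distance for each associated pair of technical words.

    Returns (distances by pair, minimum distance, maximum distance).
    """
    distances = {
        (a, b): abs(first_positions[a] - first_positions[b])
        for a, b in associations
        if a in first_positions and b in first_positions
    }
    if not distances:
        return {}, None, None
    return distances, min(distances.values()), max(distances.values())


def chain_distance(first_positions, chain):
    """Accumulated linear relative distance along a serial chain of words."""
    return sum(abs(first_positions[a] - first_positions[b])
               for a, b in zip(chain, chain[1:]))


# Hypothetical first-occurrence positions and association pairs.
first_positions = {"sensor": 4, "valve": 12, "pump": 40}
associations = [("sensor", "valve"), ("valve", "pump")]
print(association_distances(first_positions, associations))
print(chain_distance(first_positions, ["sensor", "valve", "pump"]))
```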
Step 470: evaluate the information load logic state of key technical content according to the state of the reproduction relative distances of a subset of single technical words in the same paragraph.
The reproduction relative distance of a single technical word in the information bearing quantization coordinate system is reflected on the axis of the corresponding paragraph. By superposing the axes of corresponding paragraphs, the reproduction logic of the single technical words involved in each paragraph can be obtained, which in turn quantifies the technical logic complexity and the information load density of the technical content of each paragraph.
In the text writing quality monitoring method of this embodiment, the formation of the information bearing quantization coordinate system establishes a data association basis and a basic evaluation procedure for assessing the linear logic and distribution features of the information load of technical words. The technical logic among single technical words can be objectively quantized and displayed in terms of the closeness of linear relationships and distribution density, which facilitates visual quantization of evaluation results and the establishment of evaluation feedback.
The evaluation of information integrity in the text composition quality monitoring method according to an embodiment of the present invention is shown in fig. 9. In fig. 9, the information integrity assessment includes:
Step 510: form a corresponding information bearing quantization coordinate system in each preset area of the written text, and determine the technical word set of each preset area.
The information bearing quantization coordinate system of each preset area is formed in the same way as that of the determined preset area, so that the information bearing quantization process in every preset area uses a consistent reference.
Step 520: and comparing the differences of the single technical words in the information bearing quantization coordinate system, and determining the technical logic differences of the technical contents in different preset areas.
The differences in the single technical terms may include, but are not limited to, differences in adjacency of the single technical terms determined from linear relative distances between the single technical terms, differences in term absence, and the like. The variation or expression difference of the technical logic of the technical content in different preset areas can be directly evaluated.
Step 530: and comparing the differences of the reproduction relative distances of the single technical words in the information bearing quantization coordinate system to determine the technical description differences of the technical contents in different preset areas.
Differences in the relative distances of the reproductions of the single technical word include, but are not limited to, the number of reproductions and the distances of reproductions of the single technical word, etc. The similarity of the parts of the technical content in different preset areas can be directly evaluated.
The text writing quality monitoring method provided by the embodiment of the invention can compare different volumes of the overall technical content of the written text so as to determine the degree of tightness of reference, the degree of similarity of text paragraphs and the degree of technical logic complexity in the overall technical scheme.
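As an illustration only, the comparison in steps 510 to 530 could be sketched as follows. The sketch assumes each preset area has already been reduced to a mapping from technical word to the character offsets of its occurrences; the function name `compare_areas`, the data layout and the use of first-occurrence order as the adjacency measure are assumptions, not details given in the patent.

```python
from typing import Dict, List

# Hypothetical layout: technical word -> character offsets of each occurrence
# within one preset area, measured from that area's writing reference.
AreaIndex = Dict[str, List[int]]


def compare_areas(a: AreaIndex, b: AreaIndex) -> dict:
    """Compare two preset areas on missing words, adjacency order and recurrence."""
    # Step 520: words present in one area but missing from the other.
    missing_in_b = sorted(set(a) - set(b))
    missing_in_a = sorted(set(b) - set(a))

    # Step 520: adjacency, approximated here by the order of first occurrence
    # of the words that both areas share (ties broken alphabetically).
    shared = set(a) & set(b)
    order_a = sorted(shared, key=lambda w: (min(a[w]), w))
    order_b = sorted(shared, key=lambda w: (min(b[w]), w))

    # Step 530: difference in the number of recurrences of each shared word.
    recurrence_delta = {w: len(a[w]) - len(b[w]) for w in sorted(shared)}

    return {
        "missing_in_a": missing_in_a,
        "missing_in_b": missing_in_b,
        "same_adjacency": order_a == order_b,
        "recurrence_delta": recurrence_delta,
    }


# Example data (invented): one area standing for the claims, one for the description.
claims = {"sensor": [10, 90], "controller": [40], "valve": [70]}
description = {"controller": [5], "sensor": [30, 60, 120]}
result = compare_areas(claims, description)
```

In this invented example the word "valve" is reported as missing from the second area, and the two areas introduce the shared words in a different order, which is the kind of technical logic difference step 520 is meant to surface.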
The text writing quality monitoring system according to an embodiment of the present invention includes:
a memory for storing the program code used in the processing of the text writing quality monitoring method of the above embodiment;
a processor for executing the program code used in the processing of the text writing quality monitoring method of the above embodiment.
The processor may be a DSP (digital signal processor), an FPGA (field-programmable gate array), an MCU (microcontroller unit) system board, an SoC (system on a chip) system board, a PLC (programmable logic controller) minimum system including I/O, or cloud processing resources.
A text writing quality monitoring system according to an embodiment of the present invention is shown in FIG. 10. In FIG. 10, the system of the embodiment includes:
an input behavior quantification device 10 for forming a behavior acquisition framework that quantifies text input behaviors in a text writing environment;
an input behavior recording device 20 for recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log;
an effective output evaluation device 30 for forming an effective output evaluation of the text writing from the writing log;
an information bearing evaluation device 40 for forming an effective information bearing capacity evaluation of the text from the writing log;
an information integrity evaluation device 50 for forming an information integrity evaluation of the text from the writing log;
an evaluation result prompting device 60 for forming, according to the evaluation results, a graphical interactive prompt of text quality deficiencies in the text writing environment.
As shown in FIG. 10, in an embodiment of the present invention, the input behavior quantification device 10 includes:
a quantization environment setting module 11 for authenticating the identity of the writer, matching the corresponding writing environment and behavior acquisition framework according to the identity authentication, and creating or retrieving the corresponding writing log and writing file according to the identity authentication;
a time-series data quantization module 12 for forming time-series quantized data of text input behavior by capturing the time-series states of text input;
a word segmentation service calling module 13 for segmenting sentences by text paragraph to form word quantization data of the text input;
a word distribution quantization module 14 for forming word vectors from the word quantization data and preliminarily quantizing the word distribution according to the word vectors.
As shown in fig. 10, in an embodiment of the present invention, the time-series data quantization module 12 includes:
an initial segment quantization unit 12a for recording the time intervals of initial text input;
an initial sentence quantization unit 12b for recording the time intervals between sentences on the basis of the sentence punctuation in the initial text input;
an initial paragraph quantization unit 12c for recording the time intervals in which paragraphs are formed on the basis of the paragraph marks in the initial text input;
an initial input marking unit 12d for forming a primary finalization time sequence on the basis of the termination mark of the initial text input;
a secondary modification quantization unit 12e for recording, with the primary finalization time as the measurement node, the operations that re-edit the initial text relative to that node.
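The interval recording performed by units 12b and 12c can be pictured with a minimal sketch. It assumes the behavior acquisition framework delivers input as `(timestamp, character)` pairs; that event format, the punctuation set and the function name `quantize_intervals` are hypothetical and are not specified in the patent.

```python
from typing import List, Tuple

SENTENCE_MARKS = {".", "!", "?", "。", "！", "？"}  # assumed sentence punctuation
PARAGRAPH_MARK = "\n"                               # assumed paragraph mark


def quantize_intervals(events: List[Tuple[float, str]]) -> dict:
    """Split (timestamp, character) input events into sentence and paragraph durations."""
    if not events:
        return {"sentences": [], "paragraphs": [], "total": 0.0}

    sentences: List[float] = []
    paragraphs: List[float] = []
    sentence_start = paragraph_start = events[0][0]

    for timestamp, char in events:
        if char in SENTENCE_MARKS:
            # Unit 12b: time elapsed since the previous sentence boundary.
            sentences.append(timestamp - sentence_start)
            sentence_start = timestamp
        elif char == PARAGRAPH_MARK:
            # Unit 12c: time taken to form the paragraph that just ended.
            paragraphs.append(timestamp - paragraph_start)
            paragraph_start = sentence_start = timestamp

    # Unit 12a: overall duration of the initial text input.
    total = events[-1][0] - events[0][0]
    return {"sentences": sentences, "paragraphs": paragraphs, "total": total}


# Invented example: two sentences, a paragraph break, then a third sentence.
events = [(0.0, "A"), (1.0, "."), (2.0, "B"), (4.0, "."),
          (5.0, "\n"), (6.0, "C"), (9.0, ".")]
intervals = quantize_intervals(events)
```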
As shown in fig. 10, in an embodiment of the present invention, the word segmentation service calling module 13 includes:
a paragraph segmentation quantization unit 13a, configured to segment words in a paragraph to determine a corresponding word set;
a word segmentation and wave observation quantization unit 13b, configured to determine association relationships between words in the word set;
a technical word stock reading unit 13c, configured to obtain a technical field professional word stock;
a technical word marking unit 13d for determining, according to the technical field professional word stock and the association relationships between words, the technical attributes of the words in the word set and the technical associations between words, and forming a technical word set and a common word set according to the mapping relationships between words;
a similar word marking unit 13e for determining similar technical words among the word sets according to the technical field professional word stock and the technical word set.
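The marking performed by units 13c to 13e can be sketched as below, under the assumption that the word segmentation service has already produced a token list. The lexicon contents, the `similar` synonym table and the function name `mark_technical_words` are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def mark_technical_words(
    tokens: Iterable[str],
    lexicon: Set[str],
    similar: Dict[str, str],
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Split tokens into technical and common word sets and map similar words together."""
    tokens = list(tokens)
    # Unit 13d: a token is treated as technical if the professional word stock contains it.
    technical = list(dict.fromkeys(t for t in tokens if t in lexicon))
    common = list(dict.fromkeys(t for t in tokens if t not in lexicon))
    # Unit 13e: map each technical word to a canonical form shared by similar words.
    canonical = {t: similar.get(t, t) for t in technical}
    return technical, common, canonical


# Invented example data; a real system would read these from the word stock.
tokens = ["the", "sensor", "sends", "data", "to", "the", "controller", "and", "detector"]
lexicon = {"sensor", "controller", "detector"}
similar = {"detector": "sensor"}  # hypothetical: "detector" is treated as similar to "sensor"

technical, common, canonical = mark_technical_words(tokens, lexicon, similar)
```

Both output lists keep the order of first appearance, which matters later when positions relative to a writing reference are computed.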
As shown in fig. 10, in an embodiment of the present invention, the word distribution quantization module 14 includes:
a technical area quantization unit 14a for vectorizing the distance of all technical words in a preset area of the text relative to a writing reference set in that area, forming area vector data for each technical word;
a technical local quantization unit 14b for vectorizing the distance of all technical words in a paragraph relative to a paragraph writing reference set in that paragraph, forming local vector data for each technical word;
a grammar local quantization unit 14c for vectorizing the expression complexity of the common words in a paragraph, with the paragraph start as the paragraph writing reference, forming expression vector data of the common word segments;
a technical distribution quantization unit 14d for determining distribution vector data of the same technical word in each region and each paragraph;
a grammar distribution quantization unit 14e for determining distribution state vector data of the common words in each paragraph;
a technical similarity quantization unit 14f for determining the distribution difference vector data in each region and each paragraph after similar technical words are merged.
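A minimal sketch of the distance vectorization in units 14a and 14b follows. It assumes the writing reference of a preset area is its first character and the paragraph writing reference is the first character of each paragraph, and it locates words by plain substring search; these choices and the name `vectorize` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (characters from the area reference, paragraph index, characters from the paragraph reference)
Occurrence = Tuple[int, int, int]


def vectorize(paragraphs: List[str], technical_words: List[str]) -> Dict[str, List[Occurrence]]:
    """Record, for every occurrence of a technical word, its area and local offsets."""
    vectors: Dict[str, List[Occurrence]] = {w: [] for w in technical_words}
    area_offset = 0  # characters consumed by earlier paragraphs in this preset area
    for p_index, paragraph in enumerate(paragraphs):
        for word in technical_words:
            start = paragraph.find(word)
            while start != -1:
                vectors[word].append((area_offset + start, p_index, start))
                start = paragraph.find(word, start + len(word))
        area_offset += len(paragraph) + 1  # +1 accounts for the paragraph mark
    return vectors


# Invented two-paragraph preset area.
paragraphs = ["the sensor reads", "sensor and valve"]
vectors = vectorize(paragraphs, ["sensor", "valve"])
```

Substring search is used only to keep the sketch short; a real system would take positions from the word segmentation results so that a word embedded in a longer word is not counted.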
As shown in FIG. 10, in an embodiment of the present invention, the input behavior recording device 20 includes:
a draft process recording module 21 for forming the following initial text input key-value pair structure, used to buffer the state of the image-text objects during the current initial text input process:
text editing system time = (text object internal identification sequence, editing state);
a primary draft recording module 22 for, when the current initial text input process is completed or terminated, sorting the initial text input key-value pairs in the buffer to form a time-series log segment corresponding to the initial text stage input, appending the segment to the primary draft section of the writing time-series log, and appending a time-series node to the primary draft time series;
an editing process recording module 23 for establishing, using the time-series node, the following secondary editing key-value pair structure, used to buffer the state of the image-text objects during secondary editing of the written text:
text editing system time = (text object internal identification sequence, editing state, editing position of reference time sequence node section);
a secondary draft recording module 24 for, when the secondary editing process comes to a close, appending the secondary editing key-value pairs to the secondary draft section of the writing time-series log to form a time-series log segment of the secondary text editing;
a word segmentation content recording module 25 for, when writing of the text comes to a close or is terminated, segmenting the written text corresponding to the writing time-series log by calling the system word segmentation service, and storing the segmentation results sequentially in the writing word segmentation log using the following word segmentation key-value pair:
word internal identification sequence = (word segmentation word, word technical attribute, ((associated word) word internal identification sequence) associated order label);
a technical vector recording module 26 for, when writing of the text comes to a close or is terminated, constructing the area vector and local vector of each technical word from the word segmentation results, the preset area writing reference and the paragraph writing reference, and storing them in the writing vector log using the following technical word vector key-value pair:
technical word internal identification sequence = (technical word, number of characters from the preset area writing reference, number of the paragraph in which the technical word is located, number of characters from the writing reference of that paragraph);
a grammar vector recording module 27 for, when writing of the text comes to a close or is terminated, constructing the expression vectors of the common words on the basis of the vectorized technical words, and storing them in the writing vector log using the following common word vector key-value pair:
internal identification sequences of two adjacent technical words = (sequence of common words between the two adjacent technical words, area vector gap between the two adjacent technical words);
a word distribution recording module 28 for, when writing of the text comes to a close or is terminated, constructing the technical word distribution features of each technical word on the basis of the vectorized technical words, and storing them in the writing vector log using the following technical word distribution key-value pair:
single technical word = (sequence of area vectors and local vectors at each occurrence of the single technical word).
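The buffering and appending behavior of modules 21 to 24 could look roughly like the following sketch. The class name `WritingLog`, its field names and the use of the last buffered timestamp as the time-series node are assumptions; the patent only specifies the key-value pair shapes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WritingLog:
    """Hypothetical in-memory stand-in for the writing time-series log."""
    primary_draft: List[Tuple[float, str, str]] = field(default_factory=list)
    timing_nodes: List[float] = field(default_factory=list)
    secondary_draft: List[Tuple[float, str, str, int]] = field(default_factory=list)

    def flush_draft(self, buffer: Dict[float, Tuple[str, str]]) -> None:
        """Module 22: sort buffered pairs by editing time and append them as one segment."""
        if not buffer:
            return
        segment = [(t, obj_id, state) for t, (obj_id, state) in sorted(buffer.items())]
        self.primary_draft.extend(segment)
        # Assumption: the node marks the time of the last entry in the segment.
        self.timing_nodes.append(segment[-1][0])
        buffer.clear()

    def record_edit(self, time: float, obj_id: str, state: str, position: int) -> None:
        """Modules 23 and 24: store an edit with its position relative to a node section."""
        self.secondary_draft.append((time, obj_id, state, position))


# Module 21: key = text editing system time, value = (object id, editing state).
log = WritingLog()
buffer = {2.0: ("obj-2", "inserted"), 1.0: ("obj-1", "inserted")}
log.flush_draft(buffer)
log.record_edit(5.0, "obj-1", "modified", 12)
```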
As shown in fig. 10, in an embodiment of the present invention, the effective output evaluating device 30 includes:
a draft data quantization module 31 for reading the primary draft section of the writing time-series log and obtaining the time-series nodes of the primary drafting process, the initial text input time points and the corresponding numbers of input characters, to form a primary draft time coordinate system;
a draft efficiency quantization module 32 for establishing an evaluation time window and a corresponding minimum input amount threshold, sliding the window along the primary draft time axis, quantifying the maximum and average number of characters input per unit time to evaluate the maximum and average efficiency of the primary draft stage, and quantifying the effective input duration of the initial text according to the minimum input amount threshold;
an edit data quantization module 33 for mapping the time-series node data in the primary draft section to relative position data of the primary draft content, determining primary draft position intervals from the relative position data, reading the secondary editing behavior data in the secondary draft section of the writing time-series log, and quantifying the editing behaviors applied to the content within the primary draft position intervals, to form a secondary draft behavior coordinate system;
an editing efficiency quantization module 34 for establishing an evaluation distance window, sliding it along the monotonic direction of the primary draft position intervals, and quantifying the maximum and average change in editing behavior per unit of initial text, to evaluate the maximum and average degree of modification in the secondary editing stage.
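The sliding evaluation window of module 32 can be illustrated with the sketch below. It assumes the primary draft section has been reduced to `(second, characters entered)` samples and that the window advances by a fixed step; the sample format, the step parameter and the function name `output_efficiency` are assumptions.

```python
from typing import List, Tuple


def output_efficiency(
    samples: List[Tuple[int, int]],
    window: int,
    step: int,
    min_chars: int,
) -> dict:
    """Slide a time window over (second, characters) samples sorted by time."""
    if not samples:
        return {"max_per_window": 0, "avg_per_window": 0.0, "effective_seconds": 0}

    start, end = samples[0][0], samples[-1][0]
    counts: List[int] = []
    t = start
    while t <= end:
        # Characters entered inside the half-open window [t, t + window).
        counts.append(sum(n for s, n in samples if t <= s < t + window))
        t += step

    # Windows that reach the minimum input threshold count as effective input time.
    # With step == window the windows do not overlap, so this is a direct sum;
    # with overlapping windows it is only an approximation.
    effective = sum(1 for c in counts if c >= min_chars) * step
    return {
        "max_per_window": max(counts),
        "avg_per_window": sum(counts) / len(counts),
        "effective_seconds": effective,
    }


# Invented samples: bursts of typing with a long pause before the last one.
samples = [(0, 5), (10, 20), (20, 0), (30, 15), (70, 10)]
stats = output_efficiency(samples, window=30, step=30, min_chars=12)
```

The evaluation distance window of module 34 could follow the same pattern, with character positions in the primary draft taking the place of timestamps and edit counts taking the place of input character counts.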
As shown in fig. 10, in an embodiment of the present invention, the information bearer evaluation device 40 includes:
a technical vector collection module 41 for reading the writing vector log, forming the set of single technical words in a determined preset area, and determining the area vector and the local vector set of each single technical word in the determined preset area;
an area vector measurement module 42 for establishing the linear relative distances between the single technical words according to the area vector of each single technical word's first occurrence in the determined preset area;
a local vector measurement module 43 for establishing, according to the number of paragraphs in the determined preset area, paragraph axes that share a common endpoint and are spaced at equal angular intervals around the linear position of each single technical word;
a local vector marking module 44 for establishing the recurrence relative distance of each single technical word on the axis of the corresponding paragraph according to the local vector of the single technical word in that paragraph;
a technical vector integration module 45 for forming an information-bearing quantization coordinate system according to the linear relative distances between the single technical words and the recurrence relative distances of the single technical words;
a technical word evaluation module 46 for reading the associated word information in the writing word segmentation log, determining the association relationships among the technical words, and quantifying those relationships according to the linear relative distances between the single technical words, to form an evaluation of the information load association strength of the technical words;
a technical paragraph evaluation module 47 for evaluating the information load logic state of the key technical content according to the state of the recurrence relative distances of a subset of single technical words in the same paragraph.
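One possible reading of modules 42 to 45 is sketched below. Each occurrence is assumed to be a tuple of area offset, paragraph index and local offset; the paragraph axes are assumed to span a full circle, and the local offset is used directly as the recurrence relative distance. None of these choices, nor the name `build_coordinate_system`, is fixed by the patent.

```python
import math
from typing import Dict, List, Tuple

# (characters from the area reference, paragraph index, characters from the paragraph reference)
Occurrence = Tuple[int, int, int]


def build_coordinate_system(
    vectors: Dict[str, List[Occurrence]],
    paragraph_count: int,
) -> Dict[str, dict]:
    """Place each technical word on a line and its recurrences on paragraph axes."""
    # Module 42: linear position from the first occurrence of each word in the area.
    first = {w: min(o[0] for o in occ) for w, occ in vectors.items() if occ}
    if not first:
        return {}
    origin = min(first.values())

    system: Dict[str, dict] = {}
    for word, linear_abs in first.items():
        linear = linear_abs - origin
        points = []
        for _, p_index, local in vectors[word]:
            # Module 43: one axis per paragraph, equally spaced around the word's position.
            angle = 2 * math.pi * p_index / paragraph_count
            # Module 44: the local offset gives the distance along that paragraph's axis.
            points.append((linear + local * math.cos(angle), local * math.sin(angle)))
        # Module 45: linear distance plus recurrence points form the coordinate system.
        system[word] = {"linear": linear, "points": points}
    return system


# Invented occurrences for a preset area with two paragraphs.
vectors = {"sensor": [(4, 0, 4), (17, 1, 0)], "valve": [(28, 1, 11)]}
system = build_coordinate_system(vectors, paragraph_count=2)
```

Words whose linear positions lie close together were introduced close together in the text, while a wide spread of points around one word indicates that it recurs at varying positions across paragraphs.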
As shown in FIG. 10, in an embodiment of the present invention, the information integrity evaluation device 50 includes:
an evaluation reference establishing module 51 for forming a corresponding information-bearing quantization coordinate system in each preset area of the written text and determining the technical word set of each preset area;
a technical logic evaluation module 52 for comparing the differences of the single technical words across the information-bearing quantization coordinate systems and determining the technical logic differences of the technical content in different preset areas;
a technical load evaluation module 53 for comparing the differences in the recurrence relative distances of the single technical words across the information-bearing quantization coordinate systems and determining the technical description differences of the technical content in different preset areas.
A patent writing training system according to an embodiment of the present invention is shown in FIG. 11. In FIG. 11, the system, which uses the text writing quality monitoring method of the embodiment of the present invention, includes:
a training client for deploying an authentication service and a word segmentation service, and for acquiring a text writing environment, a behavior acquisition framework, and a standard text template or an existing writing file according to the authentication result;
a training server for forming a writing log by collecting quantized text input behaviors through the behavior acquisition framework, evaluating the patent text of the corresponding training client according to the writing log, and feeding back the evaluation results in the text writing environment.
The patent writing training system of the embodiment of the present invention deploys the processing of the text writing quality monitoring method on a distributed network structure, so that, by relying on distributed network resources, it can adapt quickly to the number of writers and to the evaluation efficiency required during patent writing training.
The present invention is not limited to the above embodiments. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A text composition quality monitoring method, comprising:
forming a behavior acquisition framework for quantifying text input behaviors in a text writing environment;
recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log;
forming an effective output assessment of the text writing according to the writing log;
forming an effective information bearing capacity assessment of the text according to the writing log;
forming an information integrity assessment of the text according to the writing log.
2. The text composition quality monitoring method of claim 1, wherein the forming a behavior acquisition framework in a text composition environment that quantifies text input behavior comprises:
carrying out the identity authentication of the writer, matching the corresponding writing environment and behavior acquisition framework according to the identity authentication, and establishing or calling the corresponding writing log and writing file according to the identity authentication;
forming time sequence quantized data of text input behaviors by capturing time sequence states of text input;
sentence segmentation is carried out according to the text paragraphs, so that word quantization data of text input are formed;
forming word vectors according to the word quantization data, and primarily quantizing word distribution according to the word vectors.
3. The text composition quality monitoring method of claim 1, wherein the forming a composition log comprises:
when the current initial text input process is completed or terminated, sorting the initial text input key-value pairs in a buffer to form a time-series log segment corresponding to the initial text stage input, appending the segment to a primary draft section of a writing time-series log, and appending a time-series node to the primary draft time series;
when the secondary editing process comes to a close, forming a time-series log segment of the secondary text editing from the secondary editing key-value pairs, and appending the segment to a secondary draft section of the writing time-series log;
when writing of the text comes to a close or is terminated, segmenting the written text corresponding to the writing time-series log by calling a system word segmentation service, and storing the segmentation results sequentially in a writing word segmentation log using word segmentation key-value pairs;
when writing of the text comes to a close or is terminated, establishing area vectors and local vectors of the technical words according to the word segmentation results, the preset area writing reference and the paragraph writing reference, and storing them in a writing vector log;
when writing of the text comes to a close or is terminated, constructing the technical word distribution features of each technical word on the basis of the vectorized technical words, and storing them in the writing vector log using technical word distribution key-value pairs.
4. The text composition quality monitoring method of claim 1, wherein the forming an effective output assessment of text composition from the composition log comprises:
reading the primary draft section of the writing time-series log, and obtaining the time-series nodes of the primary drafting process, the initial text input time points and the corresponding numbers of input characters, to form a primary draft time coordinate system;
establishing an evaluation time window and a corresponding minimum input amount threshold, sliding the window along the primary draft time axis, quantifying the maximum and average number of characters input per unit time to evaluate the maximum and average efficiency of the primary draft stage, and quantifying the effective input duration of the initial text according to the minimum input amount threshold;
mapping the time-series node data in the primary draft section to relative position data of the primary draft content, determining primary draft position intervals from the relative position data, reading the secondary editing behavior data in the secondary draft section of the writing time-series log, and quantifying the editing behaviors applied to the content within the primary draft position intervals, to form a secondary draft behavior coordinate system;
establishing an evaluation distance window, sliding it along the monotonic direction of the primary draft position intervals, and quantifying the maximum and average change in editing behavior per unit of initial text, to evaluate the maximum and average degree of modification in the secondary editing stage.
5. The text composition quality monitoring method of claim 1, wherein the forming an effective information bearing capacity assessment of the text from the composition log comprises:
reading the writing vector log, forming the set of single technical words in a determined preset area, and determining the area vector and the local vector set of each single technical word in the determined preset area;
establishing the linear relative distances between the single technical words according to the area vector of each single technical word's first occurrence in the preset area;
establishing, according to the number of paragraphs in the determined preset area, paragraph axes that share a common endpoint and are spaced at equal angular intervals around the linear position of each single technical word;
establishing the recurrence relative distance of each single technical word on the axis of the corresponding paragraph according to the local vector of the single technical word in that paragraph;
forming an information-bearing quantization coordinate system according to the linear relative distances between the single technical words and the recurrence relative distances of the single technical words;
reading the associated word information in the writing word segmentation log, determining the association relationships among the technical words, and quantifying those relationships according to the linear relative distances between the single technical words, to form an evaluation of the information load association strength of the technical words.
6. The text composition quality monitoring method of claim 5, further comprising:
evaluating the information load logic state of the key technical content according to the state of the recurrence relative distances of a subset of single technical words in the same paragraph.
7. The text composition quality monitoring method of claim 1, wherein the forming an information integrity assessment of the text from the composition log comprises:
forming a corresponding information-bearing quantization coordinate system in each preset area of the written text, and determining the technical word set of each preset area;
comparing the differences of the single technical words across the information-bearing quantization coordinate systems to determine the technical logic differences of the technical content in different preset areas;
comparing the differences in the recurrence relative distances of the single technical words across the information-bearing quantization coordinate systems to determine the technical description differences of the technical content in different preset areas.
8. A text composition quality monitoring system, comprising:
a memory for storing program code employed in the processing of the text composition quality monitoring method of any one of claims 1 to 7;
and a processor for executing the program code.
9. A text composition quality monitoring system, comprising:
an input behavior quantification device for forming a behavior acquisition framework that quantifies text input behaviors in a text writing environment;
an input behavior recording device for recording text input behaviors in the writing environment through the behavior acquisition framework to form a writing log;
an effective output evaluation device for forming an effective output evaluation of the text writing according to the writing log;
an information bearing evaluation device for forming an effective information bearing capacity evaluation of the text according to the writing log;
an information integrity evaluation device for forming an information integrity evaluation of the text according to the writing log.
10. A patent composition training system, characterized by using the text composition quality monitoring method according to any one of claims 1 to 7, comprising:
the training client is used for deploying authentication service and word segmentation service, and acquiring a text writing environment, a behavior acquisition frame, a standard text template or an existing writing file according to an authentication result;
the training server side is used for forming a writing log by collecting quantized text input behaviors through the behavior collection framework, carrying out patent text evaluation corresponding to the training client side according to the writing log, and feeding back an evaluation result in a text writing environment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310134696.6A CN116128364B (en) | 2023-02-20 | 2023-02-20 | Text writing quality monitoring method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310134696.6A CN116128364B (en) | 2023-02-20 | 2023-02-20 | Text writing quality monitoring method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116128364A (en) | 2023-05-16 |
CN116128364B (en) | 2024-01-16 |
Family
ID=86300980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310134696.6A Active CN116128364B (en) | 2023-02-20 | 2023-02-20 | Text writing quality monitoring method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116128364B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2004218661A1 (en) * | 2003-10-09 | 2005-04-28 | Walter Fiori | A system and method for linguistic assessment of written text |
CN103208038A (en) * | 2013-05-03 | 2013-07-17 | 武汉大学 | Patent introduction predicted value calculation method |
CN105069511A (en) * | 2015-08-25 | 2015-11-18 | 长沙市麓智信息科技有限公司 | On-line patent writing business monitoring system |
US20160133147A1 (en) * | 2014-11-10 | 2016-05-12 | Educational Testing Service | Generating Scores and Feedback for Writing Assessment and Instruction Using Electronic Process Logs |
CN111832266A (en) * | 2020-07-14 | 2020-10-27 | 广东聚智诚科技有限公司 | On-line analysis method and system for patent application file writing quality |
CN112017078A (en) * | 2020-08-26 | 2020-12-01 | 深圳市唯德科创信息有限公司 | Auxiliary writing method, processing device and storage medium of patent document |
US10964224B1 (en) * | 2016-03-15 | 2021-03-30 | Educational Testing Service | Generating scores and feedback for writing assessment and instruction using electronic process logs |
CN113221536A (en) * | 2020-12-29 | 2021-08-06 | 广东电网有限责任公司 | Method and device for analyzing similar paragraphs in file based on natural language |
CN115496630A (en) * | 2022-09-02 | 2022-12-20 | 维正知识产权科技有限公司 | Patent writing quality checking method and system based on natural language algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN116128364B (en) | 2024-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108170468B (en) | Method and system for automatically detecting annotation and code consistency | |
CN110717324B (en) | Judgment document answer information extraction method, device, extractor, medium and equipment | |
CN113254574A (en) | Method, device and system for auxiliary generation of customs official documents | |
CN111462752B (en) | Attention mechanism, feature embedding and Bi-LSTM based customer intention recognition method | |
Shen et al. | A hybrid model for quality assessment of Wikipedia articles | |
CN112417083B (en) | Method for constructing and deploying text entity relationship extraction model and storage device | |
WO2022226716A1 (en) | Deep learning-based java program internal annotation generation method and system | |
CN113076720B (en) | Long text segmentation method and device, storage medium and electronic device | |
CN114218379B (en) | Attribution method for question answering incapacity of intelligent question answering system | |
CN112541337B (en) | Document template automatic generation method and system based on recurrent neural network language model | |
CN115357719B (en) | Power audit text classification method and device based on improved BERT model | |
CN112560419B (en) | Automatic document generation method and system | |
CN113204967B (en) | Resume named entity identification method and system | |
CN109885821B (en) | Article writing method and device based on artificial intelligence and computer storage medium | |
Fong et al. | What did they do? deriving high-level edit histories in wikis | |
CN113220768A (en) | Resume information structuring method and system based on deep learning | |
CN117520561A (en) | Entity relation extraction method and system for knowledge graph construction in helicopter assembly field | |
CN116795789B (en) | Method and device for automatically generating patent retrieval report | |
CN116128364B (en) | Text writing quality monitoring method and system | |
CN117421226A (en) | Defect report reconstruction method and system based on large language model | |
CN113033536A (en) | Work note generation method and device | |
CN111797236A (en) | Automatic text quality evaluation method based on long text segmentation | |
CN112528642A (en) | Implicit discourse relation automatic identification method and system | |
CN116258131A (en) | Template engine-based scheme compiling method and system | |
CN115577712A (en) | Text error correction method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20230914. Address after: Room 107, 1st Floor, 101, Building 1 to 2, Yard 1, Hangfeng Road, Fengtai District, Beijing, 100160. Applicant after: Beijing Tianfang Intellectual Property Agency Co.,Ltd. Address before: Room 316, 3rd Floor, North 2nd Gate, East Passage, Building 2, Bishui Garden Public Building, Xiwengzhuang Town, Miyun District, Beijing, 101512. Applicant before: Beijing Zhonglian Xunjie Communication Technology Co.,Ltd. |
GR01 | Patent grant | |