CN110069785A - Paper specification control and analysis platform and system based on component agreement
- Publication number: CN110069785A
- Application number: CN201910368553.5A
- Authority: CN (China)
- Prior art keywords: module, component, syntagma, writer, paper
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a paper specification control and analysis platform and system based on component agreement. According to the component style conventions provided by a university and its majors, the input data of a student (writer) is converted into XML components carrying the relevant styles and compressed into a text document; at the same time, the student's writing process is checked for duplication in real time and the writing history is recorded. The invention addresses the low efficiency, arbitrariness, and high error rate of traditional manual typesetting. By tracking and analyzing the historical record, plagiarism by the student (writer) can be monitored in real time while the paper is being written, which effectively curbs academic misconduct. Further, a text analysis method based on deep learning can mine the research trends of the papers' fields and the latent relationships among domain entities, and keep track of the overall progress of thesis work.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a paper specification control and analysis platform and system based on component agreement.
Background art
At present, universities differ in how they handle thesis format specifications and quality control. For format specifications, they rely mainly on offline editors and Microsoft Word, with manual typesetting and checking against the format requirements. A paper generated by a template-based offline editor meets the requirements of its field but not those of a particular university and major, and needs extensive manual modification. A document edited in Microsoft Word needs extensive manual typesetting, which is arbitrary and error-prone. Quality control relies mainly on structural or segmented-text duplicate checking systems applied to the final version of the paper. A tutor cannot track in real time how a student is writing the body of the paper, and the duplicate checking result is only the plagiarism rate of the finished manuscript, so the plagiarism process and behavior cannot be traced through the writing history. Meanwhile, higher-level supervisory units such as schools and education authorities cannot use the large body of papers under their control to gain a macroscopic grasp of the research within their units, such as field distribution, research preferences, paper quality, and plagiarism.
Summary of the invention
In view of the above problems, the present invention provides a paper specification control and analysis platform and system based on component agreement. According to the component style conventions provided by a university and its majors, the input data of a student (writer) is converted into XML components carrying the relevant styles and compressed into a text document. At the same time, the student's writing process is checked for duplication in real time and the writing history is recorded, providing data support for statistical analysis.
Specifically:
In a first aspect, a paper specification control and analysis platform based on component agreement is provided, comprising:
an editing support system, a command and control system, and a supervision module. The editing support system includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma storage module. The command and control system includes a component definition module and a binding module.
The syntagma extraction module is configured to extract syntagmas (sentence segments) from the writer's input data based on a natural language processing (NLP) model, and to pass the syntagmas to the syntagma analysis module and the syntagma storage module.
The syntagma analysis module is configured to classify the syntagmas according to a preset algorithm, perform duplicate checking on the classified syntagmas, and pass the duplicate checking result to the syntagma storage module. The classification categories are: definition type, reference type, and common type.
The component generation module is configured to receive a text export instruction issued by the writer, export the writer's input data as a text file, and return the text file to the writer.
The syntagma storage module is configured to save the writer's input data, the syntagmas, and the duplicate checking results.
The component definition module is configured to define component styles and component order according to the paper typesetting specification, and to exchange data with the component generation module, transmitting the component style and component order configuration information to it.
The binding module is configured to receive and process requests to bind a writer.
The supervision module is configured to send requests to bind a writer to the binding module.
The component definition module defines component styles and component order according to the paper typesetting specifications of the university and discipline. In operation, the definition can be completed by dragging and arranging controls in a graphical interface. The available components include but are not limited to: title, subtitle, tutor, student name, major, table of contents, header, footer, cover template, abstract title, abstract content, abstract keyword title, abstract keyword content, applicable subject category, first-level heading, second-level heading, third-level heading, fourth-level heading, body content, and appendix content. The component styles include paragraph styles, text styles, and custom styles. Paragraph styles include but are not limited to paragraph alignment, paragraph indentation, paragraph spacing, line spacing, outline level, and paragraph pagination attributes. Text styles include but are not limited to the applicable Chinese font, the applicable English font, font size, character spacing, and vertical alignment of the font. When defining components, a paragraph style and a text style must be set for each component, and the defined components are bound to the university's agreement. A university can define the corresponding paper components for each of its disciplines or majors and save them to the database.
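As an illustration only, a component agreement of this kind could be represented as a small data structure that is sorted by component order and serialized to JSON. The class names, field names, and default values below (ParagraphStyle, TextStyle, ComponentDefinition, build_agreement, the font names) are assumptions made for this sketch and do not appear in the original text.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ParagraphStyle:
    # Hypothetical subset of the paragraph attributes listed above.
    alignment: str = "both"          # justified text
    first_line_indent_chars: int = 2
    line_spacing: float = 1.5
    outline_level: int = 0           # 0 means body text


@dataclass
class TextStyle:
    # Hypothetical subset of the text attributes listed above.
    chinese_font: str = "SimSun"
    english_font: str = "Times New Roman"
    size_pt: float = 12.0


@dataclass
class ComponentDefinition:
    name: str
    order: int
    paragraph: ParagraphStyle = field(default_factory=ParagraphStyle)
    text: TextStyle = field(default_factory=TextStyle)


def build_agreement(school, discipline, components):
    """Bind a set of component definitions to one school and discipline."""
    ordered = sorted(components, key=lambda c: c.order)
    return {
        "school": school,
        "discipline": discipline,
        "components": [asdict(c) for c in ordered],
    }


agreement = build_agreement(
    "Example University",
    "Computer Science",
    [
        ComponentDefinition("body_text", order=3),
        ComponentDefinition(
            "title",
            order=1,
            paragraph=ParagraphStyle(alignment="center", first_line_indent_chars=0, outline_level=1),
            text=TextStyle(chinese_font="SimHei", size_pt=22.0),
        ),
        ComponentDefinition("abstract_title", order=2),
    ],
)

# JSON form that could be saved to a database or sent to another module.
payload = json.dumps(agreement, ensure_ascii=False)
```

Keeping the order as an explicit field, rather than relying on list position, lets a drag-and-drop interface reorder components without touching their styles.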
Further, the syntagma extraction module is specifically configured to:
identify the language of the input data based on natural language processing, cut the input data into syntagmas according to a preset regular-expression template, and forward the cut syntagmas to the syntagma analysis module and the syntagma storage module.
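A minimal sketch of this step, under stated assumptions: the language check below is a crude character-range heuristic standing in for NLP-based language identification, and the two regular expressions are hypothetical templates, since the original text does not give the actual patterns.

```python
import re

# Hypothetical templates: cut after sentence-ending punctuation.
SPLIT_PATTERNS = {
    "zh": re.compile(r"(?<=[。！？；])"),
    "en": re.compile(r"(?<=[.!?;])\s+"),
}


def detect_language(text):
    """Crude stand-in for language identification: share of CJK characters."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk > len(text) / 4 else "en"


def cut_syntagmas(text):
    """Cut input text into syntagmas using the template for its language."""
    pattern = SPLIT_PATTERNS[detect_language(text)]
    parts = pattern.split(text.strip())
    # Drop the empty strings the split can leave at the end.
    return [part.strip() for part in parts if part.strip()]


english = cut_syntagmas("This is one. Is this two? Yes!")
chinese = cut_syntagmas("本文提出一种方法。该方法有效！")
```

Each resulting syntagma would then be forwarded to the analysis and storage modules.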
Further, the syntagma analysis module is specifically configured to:
perform part-of-speech identification and text classification on the cut syntagmas based on natural language processing, calculate the repetition rate of each cut syntagma, mark the repetition type, and pass the repetition rate and repetition type to the syntagma storage module. The repetition rate is calculated as follows:
Word vectors are trained with a word2vec model on public general corpora and on public corpora of each discipline. A given syntagma is quickly segmented into words using a predetermined dictionary and the AhoCorasickDoubleArrayTrie algorithm, and the syntagmas of other papers or documents in the syntagma repository are segmented in the same way. The word vector of each segmented word is looked up, and the word vectors are summed and averaged to obtain the vector representation of the syntagma, from which the cosine of the angle between two syntagma vectors is calculated. This procedure gives the similarity of two given syntagmas; the similarity analysis keeps only the three syntagmas closest in meaning to the target syntagma. The similarity score serves as the baseline for the raw repetition score. The repeated paragraph, its position, and its repetition rate are recorded, and a comprehensive repetition score is computed for the syntagma using the weights in the university's agreement, giving the comprehensive repetition rate. The repetition types are: definition-type repetition, reference-type repetition, and high-probability plagiarism-type repetition.
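The similarity step described above can be sketched as follows. The two-dimensional word vectors and the tiny repository are toy values invented for illustration; in the described scheme the vectors would come from a trained word2vec model and the tokens from dictionary-based segmentation. The function names are assumptions of this sketch.

```python
import math


def average_vector(tokens, vectors):
    """Average the word vectors of the tokens that have a known vector."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(target_tokens, repository, vectors, k=3):
    """Return the k repository syntagmas most similar to the target."""
    target = average_vector(target_tokens, vectors)
    if target is None:
        return []
    scored = []
    for syntagma_id, tokens in repository.items():
        candidate = average_vector(tokens, vectors)
        if candidate is not None:
            scored.append((cosine(target, candidate), syntagma_id))
    scored.sort(reverse=True)
    return scored[:k]


# Toy data, for illustration only.
vectors = {
    "neural": [1.0, 0.0], "network": [0.9, 0.1], "training": [0.8, 0.3],
    "soil": [0.0, 1.0], "erosion": [0.1, 0.9],
}
repository = {
    "s1": ["neural", "network"],
    "s2": ["soil", "erosion"],
    "s3": ["network", "training"],
    "s4": ["soil"],
}
matches = top_matches(["neural", "training"], repository, vectors)
```

The similarity scores in `matches` would then be the baseline that the university's weights adjust into a comprehensive repetition rate; that weighting is not modeled here.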
Further, the syntagma storage module is based on a columnar storage structure. It stores every version of the writer's modified input data in the same data column and saves the corresponding timestamp. The data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate checking results of the syntagmas.
Further, the editing support system also includes a history tracking module, configured to obtain the data of each modified version from the syntagma storage module, filter it by editor ID and time parameters, and output the result as historical editing data.
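The versioned storage and the history query can be pictured with a small in-memory model. This is only a sketch: a real deployment would presumably use a columnar database, and the names Version, SyntagmaStore, put, latest, and history are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    timestamp: float
    editor_id: str
    input_data: str
    syntagmas: list
    check_results: list


class SyntagmaStore:
    """Keeps every version of a paragraph under one key, oldest first."""

    def __init__(self):
        self._cells = {}

    def put(self, key, version):
        versions = self._cells.setdefault(key, [])
        versions.append(version)
        versions.sort(key=lambda v: v.timestamp)

    def latest(self, key):
        versions = self._cells.get(key, [])
        return versions[-1] if versions else None

    def history(self, key, editor_id=None, since=0.0):
        """Versions of one paragraph, filtered by editor ID and start time."""
        return [
            v for v in self._cells.get(key, [])
            if v.timestamp >= since and (editor_id is None or v.editor_id == editor_id)
        ]


store = SyntagmaStore()
store.put("paper-1/p-1", Version(200.0, "writer-001", "Second draft.", ["Second draft."], [0.1]))
store.put("paper-1/p-1", Version(100.0, "writer-001", "First draft.", ["First draft."], [0.4]))
store.put("paper-1/p-1", Version(300.0, "writer-002", "Third draft.", ["Third draft."], [0.0]))
```

Because no version is overwritten, a tutor can compare how the duplicate checking result of a paragraph changed between drafts.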
In a second aspect, a paper specification control and analysis system based on component agreement is provided, comprising the above platform, an editor terminal, a command and control terminal, and a guidance and supervision terminal.
The editor terminal exchanges data with the editing support system and provides the writer with a text editing and export interface.
The command and control terminal exchanges data with the command and control system and provides the administrator with a command and control interface.
The guidance and supervision terminal exchanges data with the supervision module and provides the teacher with an interface for guiding and supervising the writer.
Further, the editor terminal specifically includes:
an editor module, configured to provide the writer with a rich text editing environment;
a monitoring module, configured to monitor the writer's input state, obtain the writer's input data, and pass the input data to the syntagma extraction module and the syntagma storage module. Specifically, when the monitoring module detects that the writer has finished writing a paragraph, it forwards the written data to the syntagma extraction module and the syntagma storage module as JSON (JavaScript Object Notation) serialized data;
a text export module, configured to obtain the text export instruction sent by the writer and send it to the component generation module, which exports the input data as a text file, and to receive the text file returned by the component generation module.
The editor terminal is a device such as a mobile phone or tablet computer on which dedicated software or an application is installed. The dedicated software or application integrates the editor module and the monitoring module, takes the form of an app or web application, provides the writer with a data input interface, guides the writer's editing, and monitors the editing state.
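The monitoring behavior can be sketched as a small buffer that emits one JSON payload per completed paragraph. Two assumptions are made here that the original text does not state: a newline marks the end of a paragraph, and the payload carries the fields writer_id, paragraph_index, text, and timestamp. The class name ParagraphMonitor is also hypothetical.

```python
import json
import time


class ParagraphMonitor:
    """Buffers typed text and sends one JSON payload per finished paragraph."""

    def __init__(self, writer_id, send):
        self._writer_id = writer_id
        self._send = send      # callable that receives the JSON string
        self._buffer = ""
        self._index = 0

    def feed(self, typed_text):
        self._buffer += typed_text
        # Assumption: a newline means the writer has completed a paragraph.
        while "\n" in self._buffer:
            paragraph, self._buffer = self._buffer.split("\n", 1)
            if paragraph.strip():
                self._send(self._serialize(paragraph.strip()))
                self._index += 1

    def _serialize(self, paragraph):
        return json.dumps(
            {
                "writer_id": self._writer_id,
                "paragraph_index": self._index,
                "text": paragraph,
                "timestamp": time.time(),
            },
            ensure_ascii=False,
        )


outbox = []
monitor = ParagraphMonitor("writer-001", outbox.append)
monitor.feed("Deep learning has ")
monitor.feed("changed text analysis.\nThe second paragraph is still being typed")
```

In this sketch `outbox.append` stands in for the network call that would deliver the payload to the syntagma extraction and storage modules.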
Further, after the writer triggers the text export module, the component generation module is specifically configured to:
exchange data with the component definition module and, according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; convert the writer's input data into XML components conforming to the Office Open XML specification according to the obtained component styles; sort the XML components according to the obtained component order; and compress the sorted XML components as a set to generate a text file. In this process, the component definition module passes the defined component styles and component order to the component generation module as JSON serialized data. The component generation module extracts the order of the components from the component-related JSON serialized data, generates XML components conforming to the Office Open XML specification according to the component styles, arranges the XML components in component order, then compresses the arranged set of XML components into a docx file according to the Office Open XML specification, or optionally converts the docx file into a PDF file, and finally returns the docx or PDF file to the writer. The parameters provided by the writer include the writer's school, institute, and discipline; that is, the writer requests from the component definition module the component data defined for the writer's own school, institute, and discipline. The formats of the text file include docx and PDF.
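The export path (style each component, order the components, compress the set into a docx package) can be sketched with the standard library alone. The style dictionary keys (name, order, alignment, size_pt) and the function names are assumptions of this sketch; the package contains only the three parts a minimal WordprocessingML document needs, and a real exporter would add styles, fonts, headers, and section properties.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def paragraph_xml(text, style):
    """Render one component as a WordprocessingML paragraph."""
    half_points = int(style["size_pt"] * 2)   # w:sz is measured in half-points
    return (
        '<w:p><w:pPr><w:jc w:val="{jc}"/></w:pPr>'
        '<w:r><w:rPr><w:sz w:val="{sz}"/></w:rPr>'
        '<w:t xml:space="preserve">{text}</w:t></w:r></w:p>'
    ).format(jc=style["alignment"], sz=half_points, text=escape(text))


def build_docx(components, contents):
    """Order the components, render them, and zip them into docx bytes."""
    ordered = sorted(components, key=lambda c: c["order"])
    body = "".join(
        paragraph_xml(contents[c["name"]], c) for c in ordered if c["name"] in contents
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="{ns}"><w:body>{body}</w:body></w:document>'
    ).format(ns=W_NS, body=body)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", PACKAGE_RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()


components = [
    {"name": "body_text", "order": 2, "alignment": "both", "size_pt": 12},
    {"name": "title", "order": 1, "alignment": "center", "size_pt": 22},
]
docx_bytes = build_docx(components, {"title": "Sample Thesis", "body_text": "R&D overview."})
```

Writing `docx_bytes` to a file with a .docx extension gives the exported document; conversion to PDF would need an external tool and is not shown.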
Further, the command and control terminal specifically includes:
a definition input module, which exchanges data with the component definition module and is configured to edit the paper typesetting specification and send it to the component definition module;
a binding processing module, configured to receive and process requests, sent by the guidance and supervision terminal, to bind a specified writer, and to grant the guidance and supervision terminal permission to guide and supervise the corresponding writer. After obtaining this permission, the guidance and supervision terminal can supervise all data edited by the corresponding writer throughout the writing process, and guide and supervise both the current state and the history of the writer's paper.
Further, a statistics and analysis module is also included. Its target users are higher-level supervisory units such as schools and education authorities. It connects to the data of all editor terminals within its scope of authority, obtains the full-text data of the syntagma storage module, and performs data mining according to preset statistics and analysis algorithms to obtain statistics and analysis data. The statistics and analysis data are returned to the command and control terminal for visualization.
The tasks of the statistics and analysis module include data mining of a single paper and data mining of the large number of papers within a higher-level unit's scope of control. The analysis mines the research entities of a field from the paper data and can cluster papers whose research entities belong to similar fields. Mining a single paper can extract the research entities in the paper and the latent relationships among them.
The data mining procedure for a single paper is as follows:
1. Extract all syntagmas of the paper from the syntagma storage module.
2. Perform named entity recognition by combining a predefined dictionary with machine learning. The predefined dictionary contains the technical terms of each discipline, common personal names, place names, and other common words. Using the custom dictionary, the paper's syntagmas are segmented at high speed with the AhoCorasickDoubleArrayTrie algorithm model, and personal names, place names, and the technical terms of each discipline are extracted from the segmentation result. Dictionary-based segmentation is limited by the coverage and precision of the dictionary itself, whereas segmentation based on a CRF (conditional random field) recognizes new words better. It is implemented as follows: a CRF model is trained on a public corpus, a segmentation interface is provided based on the training result, and segmentation is performed with the syntagma as the input parameter. All segmentation results are merged and sorted by their frequency of occurrence in the syntagmas.
3. From the named entity extraction results, further extract the entity relationships in the syntagmas using a BiLSTM+attention model, so as to mine the latent relationships among the research objects of the field.
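The dictionary side of step 2 can be illustrated with a much simpler technique than the one named above: forward maximum matching against a small term set, followed by frequency ranking. This is a swapped-in stand-in chosen for brevity, not the AhoCorasickDoubleArrayTrie algorithm, and the CRF and BiLSTM+attention stages are not modeled. The dictionary contents and function names are invented for the example.

```python
from collections import Counter


def max_match(text, dictionary):
    """Forward maximum matching: take the longest dictionary term at each position."""
    max_len = max((len(term) for term in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no term matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def rank_entities(syntagmas, dictionary):
    """Count dictionary terms across syntagmas, most frequent first."""
    counts = Counter()
    for syntagma in syntagmas:
        counts.update(token for token in max_match(syntagma, dictionary) if token in dictionary)
    return counts.most_common()


# Toy dictionary and syntagmas, for illustration only.
dictionary = {"深度学习", "神经网络", "北京"}
syntagmas = ["深度学习依赖神经网络", "神经网络在北京被研究", "神经网络"]
ranking = rank_entities(syntagmas, dictionary)
```

Terms missing from the dictionary fall apart into single characters here, which is the coverage limitation the text attributes to dictionary-based segmentation.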
The data mining procedure for the large number of papers within a higher-level unit's scope of control is as follows:
1. Group the documents by scope of control.
2. Extract the named entities in the syntagmas of each paper in the same way as for a single paper.
3. Based on the high-frequency named entities extracted from each paper, cluster all papers within the scope of control using a KNN (K-Nearest Neighbor) algorithm model, and output the clustering result of named entities and papers.
The beneficial effects of the present invention are as follows:
The invention can format papers automatically under a modular agreement according to the paper format requirements of different universities, disciplines, and majors, addressing the low efficiency, arbitrariness, and high error rate of traditional manual typesetting. By tracking and analyzing the historical record, plagiarism by the student (writer) can be monitored in real time while the paper is being written, rather than attending only to the final duplicate checking result, which effectively curbs academic misconduct. Further, a text analysis method based on deep learning can mine the research trends of the papers' fields and the latent relationships among domain entities, and keep track of the overall progress of thesis work.
Brief description of the drawings
To explain the specific embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals. The elements and parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a structural schematic diagram of a paper specification control and analysis platform based on component agreement according to an embodiment of the invention;
Fig. 2 is a structural schematic diagram of another paper specification control and analysis platform based on component agreement according to an embodiment of the invention;
Fig. 3 is a structural schematic diagram of a paper specification control and analysis system based on component agreement according to an embodiment of the invention;
Fig. 4 is a structural schematic diagram of another paper specification control and analysis system based on component agreement according to an embodiment of the invention;
Fig. 5 is a data transmission schematic diagram of a paper specification control and analysis system based on component agreement according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the technical solution of the invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the scope of protection of the invention.
It should be noted that, unless otherwise indicated, the technical and scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the invention belongs.
As shown in Fig. 1, an embodiment of the paper specification control and analysis platform based on component agreement of the present invention comprises:
an editing support system 11, a command and control system 12, and a supervision module 13. The editing support system 11 includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma storage module. The command and control system 12 includes a component definition module and a binding module.
The syntagma extraction module is configured to extract syntagmas (sentence segments) from the writer's input data based on a natural language processing (NLP) model, and to pass the syntagmas to the syntagma analysis module and the syntagma storage module.
The syntagma analysis module is configured to classify the syntagmas according to a preset algorithm, perform duplicate checking on the classified syntagmas, and pass the duplicate checking result to the syntagma storage module. The classification categories are: definition type, reference type, and common type.
The component generation module is configured to receive a text export instruction issued by the writer, export the writer's input data as a text file, and return the text file to the writer.
The syntagma storage module is configured to save the writer's input data, the syntagmas, and the duplicate checking results.
The component definition module is configured to define component styles and component order according to the paper typesetting specification, and to exchange data with the component generation module, transmitting the component style and component order configuration information to it.
The binding module is configured to receive and process requests to bind a writer.
The supervision module is configured to send requests to bind a writer to the binding module.
The component definition module defines component styles and component order according to the paper typesetting specifications of the university and discipline. In operation, the definition can be completed by dragging and arranging controls in a graphical interface. The available components include but are not limited to: title, subtitle, tutor, student name, major, table of contents, header, footer, cover template, abstract title, abstract content, abstract keyword title, abstract keyword content, applicable subject category, first-level heading, second-level heading, third-level heading, fourth-level heading, body content, and appendix content. The component styles include paragraph styles, text styles, and custom styles. Paragraph styles include but are not limited to paragraph alignment, paragraph indentation, paragraph spacing, line spacing, outline level, and paragraph pagination attributes. Text styles include but are not limited to the applicable Chinese font, the applicable English font, font size, character spacing, and vertical alignment of the font. When defining components, a paragraph style and a text style must be set for each component, and the defined components are bound to the university's agreement. A university can define the corresponding paper components for each of its disciplines or majors and save them to the database.
Preferably, the syntagma extraction module is specifically configured to:
identify the language of the input data based on natural language processing, cut the input data into syntagmas according to a preset regular-expression template, and forward the cut syntagmas to the syntagma analysis module and the syntagma storage module.
Preferably, the syntagma analysis module is specifically configured to:
perform part-of-speech identification and text classification on the cut syntagmas based on natural language processing, calculate the repetition rate of each cut syntagma, mark the repetition type, and pass the repetition rate and repetition type to the syntagma storage module. The repetition rate is calculated as follows:
Word vectors are trained with a word2vec model on public general corpora and on public corpora of each discipline. A given syntagma is quickly segmented into words using a predetermined dictionary and the AhoCorasickDoubleArrayTrie algorithm, and the syntagmas of other papers or documents in the syntagma repository are segmented in the same way. The word vector of each segmented word is looked up, and the word vectors are summed and averaged to obtain the vector representation of the syntagma, from which the cosine of the angle between two syntagma vectors is calculated. This procedure gives the similarity of two given syntagmas; the similarity analysis keeps only the three syntagmas closest in meaning to the target syntagma. The similarity score serves as the baseline for the raw repetition score. The repeated paragraph, its position, and its repetition rate are recorded, and a comprehensive repetition score is computed for the syntagma using the weights in the university's agreement, giving the comprehensive repetition rate. The repetition types are: definition-type repetition, reference-type repetition, and high-probability plagiarism-type repetition.
Preferably, the syntagma storage module is based on a columnar storage structure. It stores every version of the writer's modified input data in the same data column and saves the corresponding timestamp. The data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate checking results of the syntagmas.
Preferably, as shown in Fig. 2, the editing support system 11 also includes a history tracking module, configured to obtain the data of each modified version from the syntagma storage module, filter it by editor ID and time parameters, and output the result as historical editing data.
As shown in Fig. 3, an embodiment of the paper specification control and analysis system based on component agreement of the present invention comprises the above platform, an editor terminal 31, a command and control terminal 32, and a guidance and supervision terminal 33.
The editor terminal 31 exchanges data with the editing support system 11 and provides the writer with a text editing and export interface.
The command and control terminal 32 exchanges data with the command and control system 12 and provides the administrator with a command and control interface.
The guidance and supervision terminal 33 exchanges data with the supervision module 13 and provides the teacher with an interface for guiding and supervising the writer.
Preferably, the editor terminal 31 specifically includes:
An editor module, which provides the writer with a rich-text editing environment;
A monitoring module, which monitors the writer's input state, obtains the writer's input data, and passes the input data to the syntagma extraction module and the syntagma memory module. Specifically, when the module detects that the writer has finished writing a paragraph, it forwards the written data to the syntagma extraction module and the syntagma memory module as JSON (JavaScript Object Notation) serialized data;
A text export module, which obtains the text export instruction sent by the writer, sends the instruction to the component generation module so that the component generation module exports the input data as a text file, and receives the text file returned by the component generation module.
The editor terminal 31 is a device, such as a mobile phone or tablet computer, on which dedicated software or an application is installed. The dedicated software or application integrates the editor module and the monitoring module, takes the form of an app or a Web application, provides the writer with a data input interface, guides the writer through editing, and monitors the editing state.
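The forwarding step performed by the monitoring module can be sketched as follows. This is an illustration only: the function name `on_paragraph_complete`, the field names `writer_id`, `paragraph_index`, and `text`, and the use of plain callables to stand in for the two receiving modules are assumptions, since the source states only that the completed paragraph is forwarded as JSON serialized data.

```python
import json


def on_paragraph_complete(writer_id, paragraph_index, text, sinks):
    """Serialize one finished paragraph and hand it to every receiving module.

    `sinks` is a list of callables standing in for the syntagma extraction
    module and the syntagma memory module (hypothetical interfaces).
    """
    payload = json.dumps(
        {"writer_id": writer_id, "paragraph_index": paragraph_index, "text": text},
        ensure_ascii=False,  # keep non-ASCII text readable in the payload
    )
    for sink in sinks:
        sink(payload)
    return payload
```

For example, `on_paragraph_complete("w1", 0, "First paragraph.", [extractor, storage])` delivers the same JSON string to both receivers, so each module can parse it independently.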
Preferably, after the writer triggers the text export module, the component generation module is specifically used to:
Exchange data with the component definition module and, according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; convert the writer's input data into XML components that conform to the Office Open XML specification according to the obtained component styles; sort the XML components according to the obtained component order; and package and compress the sorted XML components to generate a text file. In this process, the component definition module passes the defined component styles and component order to the component generation module as JSON serialized data. The component generation module extracts the dependency order of the components from the component-related JSON serialized data, generates XML components conforming to the Office Open XML specification according to the component styles, arranges the XML components in the component order, and then compresses the arranged set of XML components into a docx file according to the Office Open XML specification, optionally converting the docx file into a PDF file. Finally, the docx file or PDF file is returned to the writer. The parameters provided by the writer include the writer's school, college, and subject; that is, the writer requests from the component definition module the component data defined for the writer's own school, college, and subject. The formats of the text file include docx and PDF.
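The export path described above (styles and order in, ordered XML components packed into a docx out) can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the disclosed implementation: the layout of `definition` (an `order` list plus a `styles` mapping with `bold` and `size_pt` keys) is hypothetical, each component is reduced to a single paragraph, and PDF conversion is omitted.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def paragraph_xml(text, style):
    """Render one component as a WordprocessingML paragraph."""
    bold = "<w:b/>" if style.get("bold") else ""
    # w:sz is expressed in half-points, so 16 pt becomes 32.
    size = f'<w:sz w:val="{int(style.get("size_pt", 12)) * 2}"/>'
    return (
        f"<w:p><w:r><w:rPr>{bold}{size}</w:rPr>"
        f'<w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
    )


def build_docx(content, definition):
    """Order the components per the definition and pack them into docx bytes."""
    parts = [
        paragraph_xml(content[name], definition["styles"].get(name, {}))
        for name in definition["order"]  # the component order drives the layout
        if name in content
    ]
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{"".join(parts)}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Keeping the order and styles in a separate definition means the same writer content can be re-exported under another school's rules by swapping only the definition.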
Preferably, the command control terminal 32 specifically includes:
A definition input module, which exchanges data with the component definition module and is used to edit the paper typesetting specification and send it to the component definition module;
A binding processing module, which receives and processes requests sent by the guidance and supervision terminal 33 to bind a specified writer, and assigns to the guidance and supervision terminal 33 the permission to guide and supervise the corresponding writer. After the guidance and supervision terminal 33 obtains this permission, it can supervise all data edited by the corresponding writer throughout the process and can guide and supervise the writer's current and historical paper writing.
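The binding and permission logic described above can be sketched as a small registry. The class and method names (`BindingRegistry`, `request`, `approve`, `can_supervise`) and the two-step request-then-approve flow are assumptions for illustration; the source states only that a binding request is received and processed and that the supervision permission is then assigned.

```python
class BindingRegistry:
    """Tracks which supervisor may guide and supervise which writer."""

    def __init__(self):
        self._pending = set()  # (supervisor_id, writer_id) pairs awaiting processing
        self._bound = {}       # supervisor_id -> set of writer_ids

    def request(self, supervisor_id, writer_id):
        """Record a binding request sent from a guidance and supervision terminal."""
        self._pending.add((supervisor_id, writer_id))

    def approve(self, supervisor_id, writer_id):
        """Process a pending request and grant the supervision permission."""
        pair = (supervisor_id, writer_id)
        if pair not in self._pending:
            raise KeyError("no pending request for this pair")
        self._pending.discard(pair)
        self._bound.setdefault(supervisor_id, set()).add(writer_id)

    def can_supervise(self, supervisor_id, writer_id):
        """Return True only if the binding has been granted."""
        return writer_id in self._bound.get(supervisor_id, set())
```

A data-access layer would call `can_supervise` before returning a writer's current or historical data to a supervisor.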
Preferably, as shown in Figure 4, the system further includes a statistics and analysis module, whose target users are high-level control units such as schools and education authorities. The module interfaces with all editor terminals 31 within its scope of authority, obtains the full-text data of the syntagma memory module, and performs data mining according to preset statistics and analysis algorithms to obtain statistics and analysis data. The statistics and analysis data are returned to the command control terminal 32 for visualization. The tasks of the statistics and analysis module include data mining on a single paper and data mining on a large number of papers within the high-level control scope. The analysis mines domain research entities from the paper data and can cluster papers whose research entities belong to similar fields. Mining a single paper can extract the research entities in the paper and the potential relationships between them.
The data mining process for a single paper is as follows:
1. Extract all syntagmas of the paper from the syntagma memory module.
2. Perform named entity recognition using a predefined dictionary combined with machine learning. The predefined dictionary contains the technical terms of each subject as well as common person names, place names, and other common words. Based on the AhoCorasickDoubleArrayTrie algorithm model, the paper's syntagmas are segmented at high speed using the custom dictionary, and person names, place names, and subject-specific technical terms are extracted from the segmentation result. Dictionary-based segmentation is limited by the coverage and precision of the dictionary itself, whereas segmentation based on a CRF (conditional random field) recognizes new words better. The CRF step is implemented as follows: a CRF model is trained on a publicly available corpus, a segmentation interface is provided based on the trained model, and each syntagma is passed in as the input parameter for segmentation. All segmentation results are merged and sorted by their frequency of occurrence in the syntagmas.
3. From the named entity extraction results, further extract the entity relationships in the syntagmas using a BiLSTM+attention model, in order to mine the potential relationships between research objects in the field.
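The dictionary-based part of step 2 can be sketched as follows. Note the assumptions: this is a plain Aho-Corasick automaton built on Python dictionaries, not the double-array-trie variant named in the text; it reports every dictionary hit, including overlapping ones, rather than producing a full segmentation; and the CRF and BiLSTM+attention steps are not covered. The names `AhoCorasick`, `find`, and `extract_entities` are hypothetical.

```python
from collections import Counter, deque


class AhoCorasick:
    """Multi-pattern matcher over a term -> label dictionary."""

    def __init__(self, dictionary):
        self.goto = [{}]   # per-node transitions: character -> node index
        self.fail = [0]    # per-node failure link
        self.out = [[]]    # per-node list of (term, label) ending here
        for term, label in dictionary.items():
            node = 0
            for ch in term:
                nxt = self.goto[node].get(ch)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto[node][ch] = nxt
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                node = nxt
            self.out[node].append((term, label))
        # Breadth-first pass: failure links always point to shallower nodes.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def find(self, text):
        """Return (start_index, term, label) for every dictionary hit."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for term, label in self.out[node]:
                hits.append((i - len(term) + 1, term, label))
        return hits


def extract_entities(syntagmas, dictionary):
    """Count dictionary entities across syntagmas, most frequent first."""
    matcher = AhoCorasick(dictionary)
    counts = Counter()
    for syntagma in syntagmas:
        for _, term, label in matcher.find(syntagma):
            counts[(term, label)] += 1
    return counts.most_common()
```

The automaton scans each syntagma once regardless of dictionary size, which is the property that makes dictionary matching fast; terms absent from the dictionary are simply missed, which is the limitation the CRF step is meant to address.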
The data mining process for a large number of papers within the high-level control scope is as follows:
1. Group the documents by control scope.
2. Extract the named entities in the syntagmas of each paper in the same way as for a single paper.
3. Based on the high-frequency named entities extracted from each paper, cluster all papers in the control scope using the KNN (K-Nearest Neighbor) algorithm model, and output the clustering results for the named entities and the papers.
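KNN is usually a supervised classification method, and the text does not say how it is applied to clustering. The sketch below shows one possible reading, stated as an assumption: each paper is represented by its set of high-frequency entities, each paper is linked to its k most similar papers by Jaccard similarity, and the connected groups of that neighbour graph are reported as clusters. The function names and the choice of Jaccard similarity are illustrative, not taken from the source.

```python
def jaccard(a, b):
    """Similarity of two entity sets: shared entities over all entities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_groups(paper_entities, k=1):
    """Group papers by linking each one to its k nearest neighbours.

    `paper_entities` maps a paper ID to the set of its high-frequency entities.
    """
    ids = list(paper_entities)
    parent = {pid: pid for pid in ids}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pid in ids:
        scored = sorted(
            ((jaccard(paper_entities[pid], paper_entities[other]), other)
             for other in ids if other != pid),
            reverse=True,
        )
        for score, other in scored[:k]:
            if score > 0:  # never link papers with no shared entity
                parent[find(pid)] = find(other)

    groups = {}
    for pid in ids:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(members) for members in groups.values())
```

With `k=1`, two papers that share most of their entities end up in one group, while a paper sharing no entity with any other remains a group of its own.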
To further illustrate the data transmission flow of the present invention, a data transmission diagram of the paper specification control and analysis system based on a component protocol is provided. As shown in Figure 5, it includes:
The user-facing editor terminal 31, command control terminal 32, and guidance and supervision terminal 33, together with the modules that provide functional support and business processing: the syntagma extraction module, syntagma analysis module, syntagma memory module, history tracking module, component generation module, binding module, component definition module, supervision module, and statistics and analysis module. Each terminal exchanges data with each module through a gateway. The component definition module defines components according to the paper component styles and component order provided by the command control terminal 32 and stores the definitions in the module itself. The editor terminal obtains the component definitions from the component definition module, guides the writer to edit the paper paragraph by paragraph, and monitors the editing state of each natural paragraph. When a paragraph has been edited, the editor terminal uploads the completed paragraph to the syntagma extraction module, which extracts the syntagmas; the extracted syntagmas and the complete uploaded paragraph are sent to the syntagma memory module for initial storage, and the extracted syntagmas are then sent to the syntagma analysis module. The syntagma analysis module classifies each syntagma, scores it for duplication, and analyzes its grammar, and the analysis results are stored back in the syntagma memory module;
The editor terminal 31 issues an instruction to generate the paper document to the component generation module. The component generation module obtains the component definition data from the component definition module and the latest version of the paper editing data from the syntagma memory module, generates multiple components from the component style data in the component definition and the editing data, arranges the components in the component order given in the component definition, compresses the arranged component structure into a docx file according to the Office Open XML protocol, and returns the file to the editor terminal 31;
The guidance and supervision terminal 33 requests to bind a student through the command control terminal 32. The command control terminal 32 hands the binding request to the binding module, which forwards it to the editor terminal 31 of the corresponding student. The editor terminal 31 of the bound student's account receives the request and establishes a supervision and guidance relationship with the specified teacher. The command control terminal 32 can trace the historical data of a supervised student's paper through the history tracking module;
The statistics and analysis module obtains the syntagma data in each control scope from the syntagma memory module, performs statistics and analysis on the data in each control scope according to predetermined statistics and mining models, and stores the results in the system. The command control terminal 32 can also request the statistics and analysis results for a specified scope from the statistics and analysis module and visualize them.
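The syntagma extraction step in the data flow above, in which a completed paragraph is cut into syntagmas, can be illustrated with a regular-expression template. The pattern below is an assumption made for illustration (the source does not disclose its template): it ends a syntagma at Chinese sentence punctuation, or at Western sentence punctuation followed by whitespace or the end of the text, so that decimals such as 3.14 are not split. Language identification is omitted.

```python
import re

# Hypothetical template: a syntagma runs up to the nearest sentence terminator.
SYNTAGMA = re.compile(r"\s*(.+?(?:[。！？；]|[.!?;](?=\s|$)|$))", re.S)


def split_syntagmas(paragraph):
    """Cut one paragraph into syntagmas, dropping empty pieces."""
    pieces = (match.group(1).strip() for match in SYNTAGMA.finditer(paragraph))
    return [piece for piece in pieces if piece]
```

For example, `split_syntagmas("Pi is 3.14. It works! 好的。再见")` returns `["Pi is 3.14.", "It works!", "好的。", "再见"]`; each resulting piece would then be stored and passed on for analysis.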
The present invention can automatically format papers using a modular protocol according to the paper format requirements of different institutions, subjects, and majors, which addresses the low efficiency, arbitrariness, and high error rate of traditional manual editing. Based on tracking and analysis of historical records, it can monitor plagiarism by students (writers) in real time during writing, rather than focusing only on the final duplicate-checking result, and thus effectively controls academic misconduct. Furthermore, the deep-learning-based text analysis method can mine research trends in the papers' fields and the potential relationships between domain entities, and can track the overall progress of thesis work.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the scope of the claims and the description of the present invention.
Claims (10)
1. A paper specification control and analysis platform based on a component protocol, characterized by comprising:
An editing support system, a command and control system, and a supervision module; the editing support system includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma memory module; the command and control system includes a component definition module and a binding module;
The syntagma extraction module is used to extract syntagmas from the writer's input data based on a natural language processing model and to pass the syntagmas to the syntagma analysis module and the syntagma memory module;
The syntagma analysis module is used to classify the syntagmas according to a preset algorithm, check the classified syntagmas for duplication, and pass the duplicate-checking result to the syntagma memory module;
The component generation module is used to receive the text export instruction issued by the writer, export the writer's input data as a text file, and return the text file to the writer;
The syntagma memory module is used to save the writer's input data, the syntagmas, and the duplicate-checking results;
The component definition module is used to define component styles and component order according to the paper typesetting specification, and to exchange data with the component generation module, passing it the configuration information of the component styles and component order;
The binding module is used to receive and process requests to bind a writer;
The supervision module is used to send requests to bind a writer to the binding module.
2. The platform according to claim 1, characterized in that the syntagma extraction module is specifically used to:
Identify the language of the input data based on natural language processing, cut the input data into syntagmas according to a preset regular-expression template, and forward the cut syntagmas to the syntagma analysis module and the syntagma memory module.
3. The platform according to claim 2, characterized in that the syntagma analysis module is specifically used to:
Perform part-of-speech recognition and text classification on the cut syntagmas based on natural language processing, calculate the repetition rate of the cut syntagmas, mark the repetition type, and pass the repetition rate and repetition type to the syntagma memory module; the repetition types include definition-type repetition, citation-type repetition, and high-probability-plagiarism-type repetition.
4. The platform according to claim 3, characterized in that the syntagma memory module is based on a column-oriented storage structure; the same data column stores each version of the writer's modifications to the input data together with the corresponding timestamp, and the data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate-checking results of the syntagmas.
5. The platform according to claim 4, characterized in that the editing support system further includes a history tracking module, which obtains the version data of each modification from the syntagma memory module, labels the version data of each modification with an ID and a time parameter, and outputs the result as historical editing data.
6. A paper specification control and analysis system based on a component protocol, characterized by comprising the platform according to any one of claims 1-5, an editor terminal, a command control terminal, and a guidance and supervision terminal;
The editor terminal exchanges data with the editing support system and provides the writer with a text editing and export interface;
The command control terminal exchanges data with the command and control system and provides the manager with a command and control interface;
The guidance and supervision terminal exchanges data with the supervision module and provides the teacher with an interface for guiding and supervising the writer.
7. The paper specification control and analysis system based on a component protocol according to claim 6, characterized in that the editor terminal specifically includes:
An editor module, which provides the writer with a rich-text editing environment;
A monitoring module, which monitors the writer's input state, obtains the writer's input data, and passes the input data to the syntagma extraction module and the syntagma memory module;
A text export module, which obtains the text export instruction sent by the writer, sends the instruction to the component generation module so that the component generation module exports the input data as a text file, and receives the text file returned by the component generation module.
8. The paper specification control and analysis system based on a component protocol according to claim 7, characterized in that, after the writer triggers the text export module, the component generation module is specifically used to:
Exchange data with the component definition module and, according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; convert the writer's input data into XML components that conform to the Office Open XML specification according to the obtained component styles; sort the XML components according to the obtained component order; and package and compress the sorted XML components to generate a text file; the parameters provided by the writer include the writer's school, college, and subject; the formats of the text file include docx and PDF.
9. The paper specification control and analysis system based on a component protocol according to claim 8, characterized in that the command control terminal specifically includes:
A definition input module, which exchanges data with the component definition module and is used to edit the paper typesetting specification and send it to the component definition module;
A binding processing module, which receives and processes requests sent by the guidance and supervision terminal to bind a specified writer, and assigns to the guidance and supervision terminal the permission to guide and supervise the corresponding writer.
10. The paper specification control and analysis system based on a component protocol according to claim 9, characterized by further comprising a statistics and analysis module, which interfaces with all editor terminals within its scope of authority, obtains the full-text data of the syntagma memory module, and performs data mining according to preset statistics and analysis algorithms to obtain statistics and analysis data; the statistics and analysis data are returned to the command control terminal for visualization; the statistics and analysis data include research trends in the papers' fields and the overall progress of thesis work.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368553.5A CN110069785A (en) | 2019-05-05 | 2019-05-05 | A kind of paper Authority Contro1 and analysis platform and system based on component agreement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110069785A (en) | 2019-07-30 |
Family
ID=67370171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910368553.5A Pending CN110069785A (en) | 2019-05-05 | 2019-05-05 | A kind of paper Authority Contro1 and analysis platform and system based on component agreement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110069785A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507073A (en) * | 2020-04-10 | 2020-08-07 | 甯航 | Thesis editing and intelligent typesetting method and platform based on web rich text |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101178786A (en) * | 2006-11-09 | 2008-05-14 | 上海晨鸟信息科技有限公司 | Online dissertation management method for realizing plagiarize and format checking by network resource |
Non-Patent Citations (1)
Title |
---|
方婷云: "基于XML的社科期刊自适应排版技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190730 |