CN110069785A - A paper authority control and analysis platform and system based on component agreement - Google Patents

A paper authority control and analysis platform and system based on component agreement

Info

Publication number
CN110069785A
Authority
CN
China
Prior art keywords
module
component
syntagma
writer
paper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910368553.5A
Other languages
Chinese (zh)
Inventor
甯航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910368553.5A
Publication of CN110069785A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/237: Lexical tools
    • G06F 40/247: Thesauruses; Synonyms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/253: Grammatical analysis; Style critique
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a paper authority control and analysis platform and system based on component agreement. According to the component style agreements provided by institutions and majors, the input data of a student (writer) is converted into XML components carrying the relevant styles and compressed into a text document; at the same time, the student's writing process is checked for duplication in real time and the writing history is recorded. The invention addresses the low efficiency, arbitrariness, and high error rate of traditional manual formatting. Through tracking and analysis based on historical records, plagiarism by a student (writer) can be monitored in real time during writing, which effectively curbs academic misconduct. Further, a text analysis method based on deep learning can mine research trends in the field of the papers and latent relationships among domain entities, and can track the overall progress of thesis work.

Description

A paper authority control and analysis platform and system based on component agreement
Technical field
The present invention relates to the technical field of data processing, and in particular to a paper authority control and analysis platform and system based on component agreement.
Background
At present, institutions differ in how they specify thesis formats and control thesis quality. For format compliance, they rely mainly on offline editors and Microsoft Word, with typesetting and checking done manually against the format requirements. A paper generated by a template-based offline editor meets the requirements of its field but not those of the institution and its specific major, so it needs extensive manual modification; a document edited in Microsoft Word needs extensive manual typesetting, which is arbitrary and error-prone. Quality control relies mainly on structural or segmented-text duplicate checking of the final version. A tutor cannot track in real time how a student is writing the body of the paper, and the duplicate-checking result is only the plagiarism rate of the finished manuscript, so the plagiarism process and behavior cannot be traced from the writing history. Meanwhile, higher-level administrative units such as schools and education authorities cannot gain a macroscopic view, based on a large number of papers, of the research situation within their remit, such as field distribution, research preferences, paper quality, and plagiarism.
Summary of the invention
In view of the above problems, the present invention provides a paper authority control and analysis platform and system based on component agreement. According to the component style agreements provided by institutions and majors, the input data of a student (writer) is converted into XML components carrying the relevant styles and compressed into a text document; at the same time, the student's writing process is checked for duplication in real time and the writing history is recorded, providing data support for statistical analysis.
Specifically, the present invention is as follows:
In a first aspect, a paper authority control and analysis platform based on component agreement is provided, comprising:
An editing support system, a command and control system, and a supervision module. The editing support system includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma storage module; the command and control system includes a component definition module and a binding module;
The syntagma extraction module extracts syntagmas (sentence segments) from the writer's input data based on a natural language processing (NLP) model and passes the syntagmas to the syntagma analysis module and the syntagma storage module;
The syntagma analysis module classifies the syntagmas according to a preset algorithm, performs duplicate checking on the classified syntagmas, and passes the duplicate-checking results to the syntagma storage module; the categories are: definition type, reference (citation) type, and ordinary type;
The component generation module receives the text export instruction issued by the writer, exports the writer's input data as a text file, and returns the text file to the writer;
The syntagma storage module stores the writer's input data, the syntagmas, and the duplicate-checking results;
The component definition module defines component styles and component order according to the paper typesetting specification, and exchanges data with the component generation module, sending it the component style and component order configuration information;
The binding module receives and processes requests to bind a writer;
The supervision module sends requests to bind a writer to the binding module.
The component definition module defines component styles and component order according to the paper typesetting specification of the institution and discipline; in operation, the definition can be completed by dragging and arranging controls in a graphical interface. The optional components include but are not limited to: title, subtitle, tutor, student name, major, table of contents, header, footer, cover template, abstract title, abstract text, abstract keyword title, abstract keyword content, applicable discipline category, first-level heading, second-level heading, third-level heading, fourth-level heading, body text, and appendix content. The component styles include paragraph styles, text styles, and custom styles. Paragraph styles include but are not limited to paragraph alignment, paragraph indentation, paragraph spacing, line spacing, outline level, and paragraph pagination attributes; text styles include but are not limited to the applicable Chinese font, the applicable English font, font size, character spacing, and vertical font alignment. When defining, a paragraph style and a text style must be set for each component, and the defined components are bound to the institution's agreement; an institution can define paper components for each of its disciplines or majors and save them to a database.
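As an illustration only, a component definition of this kind could be modelled as a small data structure and serialized to JSON. The class names, field names, and default values below are assumptions made for the sketch; the patent does not specify a schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class ParagraphStyle:
    # Hypothetical field names covering the paragraph attributes listed above.
    alignment: str = "both"            # e.g. "left", "center", "both" (justified)
    first_line_indent_chars: float = 2.0
    spacing_before_pt: float = 0.0
    spacing_after_pt: float = 0.0
    line_spacing: float = 1.5
    outline_level: Optional[int] = None
    page_break_before: bool = False


@dataclass
class TextStyle:
    # Hypothetical field names covering the text attributes listed above.
    chinese_font: str = "SimSun"
    english_font: str = "Times New Roman"
    font_size_pt: float = 12.0
    char_spacing_pt: float = 0.0
    vertical_align: str = "baseline"


@dataclass
class ComponentDefinition:
    name: str                          # e.g. "title", "abstract_text"
    order: int                         # position of the component in the paper
    paragraph: ParagraphStyle = field(default_factory=ParagraphStyle)
    text: TextStyle = field(default_factory=TextStyle)


def serialize_agreement(school: str, discipline: str,
                        components: List[ComponentDefinition]) -> str:
    """Bind component definitions to an institution and emit them as JSON."""
    ordered = sorted(components, key=lambda c: c.order)
    return json.dumps(
        {"school": school, "discipline": discipline,
         "components": [asdict(c) for c in ordered]},
        ensure_ascii=False,
    )
```

A graphical drag-and-drop editor would then only need to change the `order` values and style fields before the agreement is saved.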
Further, the syntagma extraction module is specifically configured to:
Identify the language of the input data based on natural language processing, cut the input data into syntagmas according to a preset regular-expression template, and forward the cut syntagmas to the syntagma analysis module and the syntagma storage module.
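A minimal sketch of this step, under stated assumptions: language identification is reduced to a crude ratio of CJK characters, and the "preset regular-expression templates" are two hypothetical patterns that cut on sentence-ending punctuation. A real system would likely use a proper language identifier and richer templates.

```python
import re

# Hypothetical templates: one syntagma = a run of text up to a terminator.
_TEMPLATES = {
    "zh": re.compile(r"[^。！？；]+[。！？；]?"),
    "en": re.compile(r"[^.!?;]+[.!?;]?"),
}
_CJK = re.compile(r"[\u4e00-\u9fff]")


def detect_language(text: str) -> str:
    """Crude stand-in for NLP language identification (assumption)."""
    cjk_count = len(_CJK.findall(text))
    return "zh" if cjk_count > 0.3 * len(text) else "en"


def split_syntagmas(text: str):
    """Return the detected language and the list of cut syntagmas."""
    lang = detect_language(text)
    pieces = _TEMPLATES[lang].findall(text)
    return lang, [p.strip() for p in pieces if p.strip()]
```

The English template deliberately ignores abbreviations such as "e.g."; handling them would need a more careful pattern.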
Further, the syntagma analysis module is specifically configured to:
Perform part-of-speech recognition and text classification on the cut syntagmas based on natural language processing, calculate the repetition rate of each cut syntagma, mark the repetition type, and pass the repetition rate and repetition type to the syntagma storage module. The repetition rate is calculated as follows:
Word vectors are trained with a word2vec model on public general-purpose corpora and public corpora of the individual disciplines. A given syntagma is segmented into words quickly using a predefined dictionary and the AhoCorasickDoubleArrayTrie algorithm, and the syntagmas of other papers or documents in the syntagma repository are segmented in the same way. The word vector of each segmented word is looked up, and the word vectors are summed and averaged to obtain the vector representation of the syntagma; the cosine of the angle between two such vectors is then calculated. This procedure computes the similarity of two given syntagmas, and the similarity analysis keeps only the three syntagmas semantically closest to the target syntagma. The similarity score serves as the baseline for the raw repetition score. The repeated paragraph, its position, and its repetition rate are recorded, and the syntagma's repetition is scored comprehensively using the weights of the institution's agreement to obtain a comprehensive repetition rate. The repetition types are: definition-type repetition, reference-type repetition, and high-probability plagiarism.
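The averaging and cosine steps can be sketched as follows. This is an illustration, not the patented implementation: the word vectors are passed in as a plain dictionary (in practice they would come from a trained word2vec model), the syntagmas are assumed to be already segmented into tokens, and the function names are hypothetical.

```python
import math


def average_vector(tokens, vectors):
    """Mean of the word vectors of the tokens that have a known vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(target_tokens, repository, vectors, k=3):
    """Return the k repository syntagmas most similar to the target.

    repository maps a syntagma id to its token list; the result is a list
    of (syntagma_id, similarity) pairs, most similar first.
    """
    target = average_vector(target_tokens, vectors)
    if target is None:
        return []
    scored = []
    for seg_id, tokens in repository.items():
        vec = average_vector(tokens, vectors)
        if vec is not None:
            scored.append((cosine(target, vec), seg_id))
    scored.sort(reverse=True)
    return [(seg_id, score) for score, seg_id in scored[:k]]
```

The returned similarity would then be the baseline score to which the institution-specific weights are applied.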
Further, the syntagma storage module is based on a column-oriented storage structure: every version of the input data produced by each modification by the writer is stored in the same data column together with its timestamp, and the data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate-checking results of the syntagmas.
Further, the editing support system also includes a history tracking module, which obtains the version data of each modification from the syntagma storage module and, given an editor ID and time parameters, outputs the matching version data as historical editing data.
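The versioned storage and the history query can be sketched with an in-memory stand-in for the column store. The class and field names are hypothetical, and a production system would presumably use a real column-oriented database that keeps multiple timestamped versions per cell.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedSyntagmaStore:
    """In-memory stand-in: one list of timestamped versions per row key."""
    _cells: Dict[str, List[dict]] = field(default_factory=dict)

    def put(self, row_key: str, timestamp: float, editor_id: str,
            input_data: str, syntagmas: List[str], dup_results: List[dict]) -> None:
        """Append a new version instead of overwriting the previous one."""
        self._cells.setdefault(row_key, []).append({
            "timestamp": timestamp,
            "editor_id": editor_id,
            "input_data": input_data,
            "syntagmas": syntagmas,
            "dup_results": dup_results,
        })

    def history(self, row_key: str, editor_id: Optional[str] = None,
                since: Optional[float] = None,
                until: Optional[float] = None) -> List[dict]:
        """Versions for a row, filtered by editor and time window, oldest first."""
        versions = self._cells.get(row_key, [])
        selected = [
            v for v in versions
            if (editor_id is None or v["editor_id"] == editor_id)
            and (since is None or v["timestamp"] >= since)
            and (until is None or v["timestamp"] <= until)
        ]
        return sorted(selected, key=lambda v: v["timestamp"])
```

Because no version is overwritten, a sudden jump in repetition rate between two consecutive versions can be located in time.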
In a second aspect, a paper authority control and analysis system based on component agreement is provided, comprising the above platform, an editor terminal, a command control terminal, and a guidance and supervision terminal;
The editor terminal exchanges data with the editing support system and provides the writer with text editing and export interfaces;
The command control terminal exchanges data with the command and control system and provides the administrator with a command and control interface;
The guidance and supervision terminal exchanges data with the supervision module and provides the teacher with an interface for guiding and supervising the writer.
Further, the editor terminal specifically includes:
An editing module, which provides the writer with a rich-text editing environment;
A monitoring module, which monitors the writer's input state, obtains the writer's input data, and passes the input data to the syntagma extraction module and the syntagma storage module. Specifically, when the module detects that the writer has finished writing a paragraph, it forwards the written data to the syntagma extraction module and the syntagma storage module as JSON (JavaScript Object Notation) serialized data;
A text export module, which obtains the text export instruction sent by the writer, sends the instruction to the component generation module so that the component generation module exports the input data as a text file, and receives the returned text file.
The editor terminal is a device such as a mobile phone or tablet computer on which dedicated software or an application is installed. The dedicated software or application integrates the editing module and the monitoring module and, in the form of an app or web application, provides the writer with a data input interface, guides the writer's editing, monitors the editing state, and so on.
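The paragraph-completion behaviour of the monitoring module might look like the sketch below. The event field names, the use of a newline as the "paragraph finished" signal, and the `send` callback are all assumptions; the patent only states that completed paragraphs are forwarded as JSON-serialized data.

```python
import json
import time
from typing import Callable


class ParagraphMonitor:
    """Buffers typed text and emits one JSON event per completed paragraph."""

    def __init__(self, writer_id: str, paper_id: str,
                 send: Callable[[str], None],
                 clock: Callable[[], float] = time.time):
        self._writer_id = writer_id
        self._paper_id = paper_id
        self._send = send          # hypothetical transport to the platform
        self._clock = clock
        self._buffer = ""
        self._index = 0

    def feed(self, chunk: str) -> None:
        """Accept newly typed text; a newline is taken to end a paragraph."""
        self._buffer += chunk
        while "\n" in self._buffer:
            paragraph, self._buffer = self._buffer.split("\n", 1)
            if paragraph.strip():
                self._send(json.dumps({
                    "writer_id": self._writer_id,
                    "paper_id": self._paper_id,
                    "paragraph_index": self._index,
                    "text": paragraph,
                    "timestamp": self._clock(),
                }, ensure_ascii=False))
                self._index += 1
```

In a web client the same logic would normally live in the browser; Python is used here only to keep the examples in one language.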
Further, after the writer triggers the text export module, the component generation module is specifically configured to:
Exchange data with the component definition module: according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; according to the obtained component styles, convert the writer's input data into XML components conforming to the Office Open XML specification; sort the XML components by the obtained component order; and package and compress the sorted XML components to generate a text file. In this process, the component definition module passes the defined component styles and component order to the component generation module as JSON-serialized data. The component generation module extracts the order of the components from the component-related JSON data, generates XML components conforming to the Office Open XML specification according to the component styles, arranges the XML components in the component order, and then, following the Office Open XML specification, compresses the arranged set of XML components into a docx file, optionally converting the docx file to a PDF file; finally, the docx or PDF file is returned to the writer. The parameters provided by the writer include the writer's school, college, and discipline; that is, the writer requests from the component definition module the component data defined for the writer's own school, college, and discipline. The formats of the text file include docx and PDF.
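The conversion and packaging steps can be illustrated with the standard library alone. This is a minimal sketch under assumptions: the style dictionary keys are hypothetical, only alignment, fonts, and font size are applied, and the package contains just the three parts needed for a minimal WordprocessingML document. PDF conversion is not shown.

```python
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def paragraph_xml(text, style):
    """One styled <w:p> component; style keys are hypothetical."""
    align = quoteattr(style["alignment"])          # "left", "center", "both"
    ascii_font = quoteattr(style["english_font"])
    east_font = quoteattr(style["chinese_font"])
    half_points = int(round(style["font_size_pt"] * 2))  # w:sz is in half-points
    return (
        f"<w:p><w:pPr><w:jc w:val={align}/></w:pPr>"
        f"<w:r><w:rPr><w:rFonts w:ascii={ascii_font} w:eastAsia={east_font}/>"
        f'<w:sz w:val="{half_points}"/></w:rPr>'
        f'<w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
    )


def build_docx(inputs, definitions):
    """Arrange components by their defined order and zip them into docx bytes."""
    ordered = sorted(definitions, key=lambda d: d["order"])
    body = "".join(
        paragraph_xml(inputs[d["name"]], d["style"])
        for d in ordered if d["name"] in inputs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the returned bytes to a file with a `.docx` extension should give a document that word processors can open, though a full implementation would also emit styles, headers, footers, and section properties.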
Further, the command control terminal specifically includes:
A definition input module, which exchanges data with the component definition module; it is used to edit the paper typesetting specification and send the specification to the component definition module;
A binding processing module, which receives and processes the requests sent by the guidance and supervision terminal to bind a specified writer, and assigns to the guidance and supervision terminal the permission to guide and supervise the corresponding writer. After obtaining this permission, the guidance and supervision terminal can supervise all data edited by the corresponding writer throughout the process, and can guide and supervise both the current state and the writing history of the writer's paper.
Further, the system also includes a statistics and analysis module whose target users are higher-level administrative units such as schools and education authorities. It interfaces with all editor terminals within its scope of authority, obtains the full-text data of the syntagma storage module, performs data mining according to preset statistics and analysis algorithms to obtain statistics and analysis data, and returns these data to the command control terminal for visualization;
The tasks performed by the statistics and analysis module include data mining on a single paper and data mining on the large number of papers within a higher-level unit's scope of control. The analysis mines domain research entities from the paper data and can cluster papers that study similar domain entities. Mining a single paper can extract the research entities in the paper and the latent relationships among them.
The data mining process for a single paper is as follows: (1) Extract all syntagmas of the paper from the syntagma storage module. (2) Perform named entity recognition by combining a predefined dictionary with machine learning. The predefined dictionary contains the technical terms of each discipline as well as common personal names, place names, and other common words. Using the AhoCorasickDoubleArrayTrie algorithm with the custom dictionary, the paper's syntagmas are segmented at high speed, and personal names, place names, and the technical terms of each discipline are extracted from the segmentation result. Dictionary-based segmentation is limited by the coverage and precision of the dictionary itself, whereas segmentation based on a CRF (conditional random field) recognizes new words better. It is implemented by training a CRF model on public corpora, providing a segmentation interface based on the trained model, and segmenting with the syntagma as the input parameter. All segmentation results are then merged and sorted by their frequency of occurrence in the syntagmas. (3) From the named entity extraction results, a BiLSTM+attention model is further used to extract the entity relationships in the syntagmas, in order to mine the latent relationships among the research objects of the field.
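The dictionary-based part of step (2) and the frequency ranking can be sketched as below. Note the substitution: this sketch uses simple forward maximum matching over a Python dictionary rather than the AhoCorasickDoubleArrayTrie algorithm named in the text, and the lexicon format (surface form mapped to an entity label) is an assumption. The CRF and BiLSTM+attention stages are not shown.

```python
from collections import Counter


def dictionary_match(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon entry.

    lexicon maps a surface form to an entity label such as "term" or "place".
    Characters not covered by any entry are skipped.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    hits = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                hits.append((candidate, lexicon[candidate]))
                i += size
                break
        else:
            i += 1  # no entry starts here; move on by one character
    return hits


def rank_entities(syntagmas, lexicon):
    """Count entity occurrences over all syntagmas, most frequent first."""
    counts = Counter()
    for syntagma in syntagmas:
        counts.update(dictionary_match(syntagma, lexicon))
    return counts.most_common()
```

A trie-based matcher would give the same matches for this lexicon with better performance on large dictionaries.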
The data mining process for the large number of papers within a higher-level unit's scope of control is as follows: (1) Group the documents by scope of control. (2) Extract the named entities in the syntagmas of each paper in the same way as for a single paper. (3) Based on the high-frequency named entities extracted from each paper, cluster all papers within the scope of control using the KNN (K-Nearest Neighbor) algorithm model, and output the clustering results of the named entities and papers.
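The text does not say how a nearest-neighbour model is turned into clusters. One possible reading, shown here purely as an assumption, is to link each paper to its k nearest neighbours by Jaccard similarity of their high-frequency entity sets and to take the connected components of that graph as groups. The similarity measure, the threshold, and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two entity sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_groups(papers, k=1, min_similarity=0.2):
    """Group papers by linking each one to its k most similar neighbours.

    papers maps a paper id to its set of high-frequency named entities.
    Returns a sorted list of groups, each a sorted list of paper ids.
    """
    ids = sorted(papers)
    parent = {p: p for p in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in ids:
        scored = sorted(
            ((jaccard(papers[p], papers[q]), q) for q in ids if q != p),
            reverse=True,
        )
        for score, q in scored[:k]:
            if score >= min_similarity:
                parent[find(p)] = find(q)

    groups = {}
    for p in ids:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())
```

The `min_similarity` threshold keeps a paper with no close neighbour in a group of its own instead of forcing it into an unrelated cluster.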
The beneficial effects of the present invention are as follows:
The present invention can automatically format papers using modular agreements according to the paper format requirements of different institutions, disciplines, and majors, which addresses the low efficiency, arbitrariness, and high error rate of traditional manual formatting. Through tracking and analysis based on historical records, plagiarism by a student (writer) can be monitored in real time during writing rather than judged only from the final duplicate-checking result, which effectively curbs academic misconduct. Further, the text analysis method based on deep learning can mine research trends in the field of the papers and latent relationships among domain entities, and can track the overall progress of thesis work.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals. The elements or parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a structural schematic diagram of a paper authority control and analysis platform based on component agreement according to an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of another paper authority control and analysis platform based on component agreement according to an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a paper authority control and analysis system based on component agreement according to an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of another paper authority control and analysis system based on component agreement according to an embodiment of the present invention;
Fig. 5 is a data transmission schematic diagram of a paper authority control and analysis system based on component agreement according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the protection scope of the present invention.
It should be noted that, unless otherwise indicated, the technical and scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the present invention belongs.
As shown in Fig. 1, an embodiment of a paper authority control and analysis platform based on component agreement according to the present invention comprises:
An editing support system 11, a command and control system 12, and a supervision module 13. The editing support system 11 includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma storage module; the command and control system 12 includes a component definition module and a binding module;
The syntagma extraction module extracts syntagmas (sentence segments) from the writer's input data based on a natural language processing (NLP) model and passes the syntagmas to the syntagma analysis module and the syntagma storage module;
The syntagma analysis module classifies the syntagmas according to a preset algorithm, performs duplicate checking on the classified syntagmas, and passes the duplicate-checking results to the syntagma storage module; the categories are: definition type, reference (citation) type, and ordinary type;
The component generation module receives the text export instruction issued by the writer, exports the writer's input data as a text file, and returns the text file to the writer;
The syntagma storage module stores the writer's input data, the syntagmas, and the duplicate-checking results;
The component definition module defines component styles and component order according to the paper typesetting specification, and exchanges data with the component generation module, sending it the component style and component order configuration information;
The binding module receives and processes requests to bind a writer;
The supervision module sends requests to bind a writer to the binding module.
The component definition module defines component styles and component order according to the paper typesetting specification of the institution and discipline; in operation, the definition can be completed by dragging and arranging controls in a graphical interface. The optional components include but are not limited to: title, subtitle, tutor, student name, major, table of contents, header, footer, cover template, abstract title, abstract text, abstract keyword title, abstract keyword content, applicable discipline category, first-level heading, second-level heading, third-level heading, fourth-level heading, body text, and appendix content. The component styles include paragraph styles, text styles, and custom styles. Paragraph styles include but are not limited to paragraph alignment, paragraph indentation, paragraph spacing, line spacing, outline level, and paragraph pagination attributes; text styles include but are not limited to the applicable Chinese font, the applicable English font, font size, character spacing, and vertical font alignment. When defining, a paragraph style and a text style must be set for each component, and the defined components are bound to the institution's agreement; an institution can define paper components for each of its disciplines or majors and save them to a database.
Preferably, the syntagma extraction module is specifically configured to:
Identify the language of the input data based on natural language processing, cut the input data into syntagmas according to a preset regular-expression template, and forward the cut syntagmas to the syntagma analysis module and the syntagma storage module.
Preferably, the syntagma analysis module is specifically configured to:
Perform part-of-speech recognition and text classification on the cut syntagmas based on natural language processing, calculate the repetition rate of each cut syntagma, mark the repetition type, and pass the repetition rate and repetition type to the syntagma storage module. The repetition rate is calculated as follows:
Word vectors are trained with a word2vec model on public general-purpose corpora and public corpora of the individual disciplines. A given syntagma is segmented into words quickly using a predefined dictionary and the AhoCorasickDoubleArrayTrie algorithm, and the syntagmas of other papers or documents in the syntagma repository are segmented in the same way. The word vector of each segmented word is looked up, and the word vectors are summed and averaged to obtain the vector representation of the syntagma; the cosine of the angle between two such vectors is then calculated. This procedure computes the similarity of two given syntagmas, and the similarity analysis keeps only the three syntagmas semantically closest to the target syntagma. The similarity score serves as the baseline for the raw repetition score. The repeated paragraph, its position, and its repetition rate are recorded, and the syntagma's repetition is scored comprehensively using the weights of the institution's agreement to obtain a comprehensive repetition rate. The repetition types are: definition-type repetition, reference-type repetition, and high-probability plagiarism.
Preferably, the syntagma storage module is based on a column-oriented storage structure: every version of the input data produced by each modification by the writer is stored in the same data column together with its timestamp, and the data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate-checking results of the syntagmas.
Preferably, as shown in Fig. 2, the editing support system 11 also includes a history tracking module, which obtains the version data of each modification from the syntagma storage module and, given an editor ID and time parameters, outputs the matching version data as historical editing data.
As shown in Fig. 3, an embodiment of a paper authority control and analysis system based on component agreement according to the present invention comprises the above platform, an editor terminal 31, a command control terminal 32, and a guidance and supervision terminal 33;
The editor terminal 31 exchanges data with the editing support system 11 and provides the writer with text editing and export interfaces;
The command control terminal 32 exchanges data with the command and control system 12 and provides the administrator with a command and control interface;
The guidance and supervision terminal 33 exchanges data with the supervision module 13 and provides the teacher with an interface for guiding and supervising the writer.
Preferably, the editor terminal 31 specifically includes:
An editing module, which provides the writer with a rich-text editing environment;
A monitoring module, which monitors the writer's input state, obtains the writer's input data, and passes the input data to the syntagma extraction module and the syntagma storage module. Specifically, when the module detects that the writer has finished writing a paragraph, it forwards the written data to the syntagma extraction module and the syntagma storage module as JSON (JavaScript Object Notation) serialized data;
A text export module, which obtains the text export instruction sent by the writer, sends the instruction to the component generation module so that the component generation module exports the input data as a text file, and receives the returned text file.
The editor terminal 31 is a device such as a mobile phone or tablet computer on which dedicated software or an application is installed. The dedicated software or application integrates the editing module and the monitoring module and, in the form of an app or web application, provides the writer with a data input interface, guides the writer's editing, monitors the editing state, and so on.
Preferably, after the writer triggers the text export module, the component generation module is specifically configured to:
Exchange data with the component definition module: according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; according to the obtained component styles, convert the writer's input data into XML components conforming to the Office Open XML specification; sort the XML components by the obtained component order; and package and compress the sorted XML components to generate a text file. In this process, the component definition module passes the defined component styles and component order to the component generation module as JSON-serialized data. The component generation module extracts the order of the components from the component-related JSON data, generates XML components conforming to the Office Open XML specification according to the component styles, arranges the XML components in the component order, and then, following the Office Open XML specification, compresses the arranged set of XML components into a docx file, optionally converting the docx file to a PDF file; finally, the docx or PDF file is returned to the writer. The parameters provided by the writer include the writer's school, college, and discipline; that is, the writer requests from the component definition module the component data defined for the writer's own school, college, and discipline. The formats of the text file include docx and PDF.
Preferably, the command control terminal 32 specifically includes:
A definition input module, which exchanges data with the component definition module and is used to edit the paper typesetting specification and send the paper typesetting specification to the component definition module;
A binding processing module, which receives and processes requests, sent by the guidance and supervision terminal 33, to bind a specified writer, and grants the guidance and supervision terminal 33 permission to guide and supervise the corresponding writer. After the guidance and supervision terminal 33 obtains this permission, it can supervise all data edited by the corresponding writer and can guide and supervise both the current and the historical writing of the writer's paper.
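The binding step can be pictured with a small Python sketch. The class and method names are hypothetical, and the explicit approval step is an assumption made for illustration; the text above only states that bind requests are received and processed and that permission is then granted.

```python
from dataclasses import dataclass, field


@dataclass
class BindingProcessor:
    """Hypothetical stand-in for the binding processing module."""

    pending: set = field(default_factory=set)  # (teacher_id, writer_id) awaiting processing
    granted: set = field(default_factory=set)  # pairs with guidance/supervision permission

    def request_binding(self, teacher_id, writer_id):
        """Record a bind request sent by a guidance and supervision terminal."""
        self.pending.add((teacher_id, writer_id))

    def approve(self, teacher_id, writer_id):
        """Process a pending request and grant the supervision permission."""
        pair = (teacher_id, writer_id)
        if pair not in self.pending:
            raise KeyError("no pending bind request for this teacher and writer")
        self.pending.remove(pair)
        self.granted.add(pair)

    def can_supervise(self, teacher_id, writer_id):
        """Check whether the teacher may view and supervise the writer's data."""
        return (teacher_id, writer_id) in self.granted
```

A caller would check `can_supervise` before returning a writer's current or historical editing data to a teacher's terminal.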
Preferably, as shown in Figure 4, the system further includes a statistics and analysis module, whose target users are higher-level administrative units such as schools and education authorities. The module exchanges data with all editor terminals 31 within its scope of authority, obtains the full-text data of the syntagma storage module, and performs data mining according to preset statistical and analysis algorithms to obtain statistics and analysis data; the statistics and analysis data are returned to the command control terminal 32 for visualization. The tasks performed by the statistics and analysis module include data mining on a single paper and data mining on the large number of papers within a higher-level control scope. The analysis mines domain research entities from the paper data and can cluster papers that study entities in similar fields. Mining a single paper extracts the research entities in the paper and the potential relationships among those entities.
The data mining process for a single paper is as follows: 1. Extract all syntagmas of the paper from the syntagma storage module. 2. Perform named entity recognition by combining a predefined dictionary with machine learning. The predefined dictionary contains the technical terms of each discipline together with common personal names, place names, and other common words. Based on the AhoCorasickDoubleArrayTrie algorithm model, the paper's syntagmas are segmented at high speed using the custom dictionary, and personal names, place names, and the technical terms of each discipline are extracted from the segmentation result. Dictionary-based segmentation is limited by the coverage and precision of the dictionary itself, whereas segmentation based on a CRF (conditional random field) recognizes new words better. It is implemented as follows: a CRF model is trained on a public corpus, a segmentation interface is provided based on the training result, and each syntagma is passed in as the input parameter for segmentation. All segmentation results are merged and sorted by their frequency of occurrence in the syntagmas. 3. From the named entity extraction results, the entity relationships in the syntagmas are further extracted using a BiLSTM+attention model, in order to mine the potential relationships among the research objects of the field.
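The dictionary-based part of step 2 and the frequency ranking can be sketched in Python as follows. This is a simplified stand-in, not the method named above: it uses plain forward maximum matching over a dictionary instead of an AhoCorasickDoubleArrayTrie, and it omits the CRF segmenter and the BiLSTM+attention relation extractor. The dictionary entries and function names are hypothetical.

```python
from collections import Counter

# Hypothetical predefined dictionary: surface form -> entity category.
DICTIONARY = {
    "条件随机场": "term",   # "conditional random field"
    "随机": "term",         # "random" (shorter entry, to show longest-match behaviour)
    "北京": "place",        # "Beijing"
}


def extract_entities(syntagma, dictionary):
    """Scan left to right, always taking the longest dictionary entry that matches."""
    max_len = max(map(len, dictionary), default=0)
    found = []
    i = 0
    while i < len(syntagma):
        for length in range(min(max_len, len(syntagma) - i), 0, -1):
            candidate = syntagma[i:i + length]
            if candidate in dictionary:
                found.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            i += 1  # no dictionary entry starts here; move on one character
    return found


def rank_entities(syntagmas, dictionary):
    """Merge the entities of all syntagmas and sort them by frequency, highest first."""
    counts = Counter()
    for syntagma in syntagmas:
        counts.update(extract_entities(syntagma, dictionary))
    return counts.most_common()
```

Because the longest entry wins at each position, "条件随机场" is reported as one term and the shorter entry "随机" inside it is not counted separately. A trie-based matcher would find the same dictionary hits far faster on long texts.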
The process of data mining over the large number of papers within a higher-level control scope is as follows: 1. Group the documents by control scope. 2. Extract the named entities from the syntagmas of each paper in the same way as for a single paper. 3. Based on the high-frequency named entities extracted from each paper, cluster all papers within the control scope using the KNN (K-Nearest Neighbor) algorithm model, and output the clustering result of named entities and papers.
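Step 3 does not spell out how nearest neighbours are turned into clusters, so the following Python sketch shows only one possible reading, under stated assumptions: each paper is represented by its set of high-frequency entities, similarity is the Jaccard index, each paper is linked to its `k` most similar papers when the similarity reaches `min_similarity`, and linked papers form one cluster. The parameter names and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_papers(paper_entities, k=1, min_similarity=0.2):
    """Group papers by linking each one to its k nearest neighbours.

    paper_entities: paper id -> set of high-frequency named entities.
    Returns a sorted list of clusters, each a sorted list of paper ids.
    """
    ids = sorted(paper_entities)
    parent = {p: p for p in ids}  # union-find forest over paper ids

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p in ids:
        neighbours = sorted(
            ((jaccard(paper_entities[p], paper_entities[q]), q)
             for q in ids if q != p),
            reverse=True,
        )[:k]
        for similarity, q in neighbours:
            if similarity >= min_similarity:
                parent[find(p)] = find(q)  # merge the two groups

    clusters = {}
    for p in ids:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())
```

With four papers where two share the entities "CRF" and "segmentation" and the other two share "KNN" and "clustering", the function returns two clusters of two papers each. Papers with no sufficiently similar neighbour remain in singleton clusters.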
To further illustrate the data transmission flow of the present invention, a data transmission schematic diagram of the paper specification control and analysis system based on a component protocol is provided, as shown in Figure 5, comprising:
The user-facing editor terminal 31, command control terminal 32, and guidance and supervision terminal 33; and the modules that provide functional support and business processing, namely the syntagma extraction module, syntagma analysis module, syntagma storage module, history tracking module, component generation module, binding module, component definition module, supervision module, and statistics and analysis module. Each terminal exchanges data with each module through a gateway. The component definition module defines components according to the paper component styles and component ordering sequence provided by the command control terminal 32, and stores them in this module. The editor terminal obtains the component definitions from the component definition module, guides the writer to edit the paper paragraph by paragraph, and monitors the editing state of each natural paragraph. When a paragraph has been edited, the completed paragraph is uploaded to the syntagma extraction module, which extracts its syntagmas; the extracted syntagmas and the complete uploaded paragraph are sent to the syntagma storage module for initial storage, and the extracted syntagmas are then sent to the syntagma analysis module. The syntagma analysis module classifies each syntagma, scores it for duplication, and analyzes its grammar, and stores the analysis results back in the syntagma storage module;
The editor terminal 31 issues a generate-paper-document instruction to the component generation module. The component generation module obtains the component definition data from the component definition module and the most recent version of the paper's editing data from the syntagma storage module, generates multiple components from the component style data in the component definition and the editing data, arranges the components in the component order given in the component definition, compresses the arranged component structure into a docx file according to the Office Open XML protocol, and returns the file to the editor terminal 31;
The guidance and supervision terminal 33 requests, through the command control terminal 32, to bind a student. The command control terminal 32 passes the bind request to the binding module, which forwards it to the editor terminal 31 of the corresponding student; the editor terminal 31 logged in to the student's account accepts the request and establishes a guidance and supervision relationship with the specified teacher. The command control terminal 32 can retrieve the historical data of paper guidance from the history tracking module;
The statistics and analysis module obtains the syntagma data within each control scope from the syntagma storage module, performs statistics and analysis on the data in each control scope according to predetermined statistical and mining models, and stores the statistics and analysis results in this system. The command control terminal 32 can also request the statistics and analysis results for a specified scope from the statistics and analysis module and visualize them.
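The paragraph upload path in this flow (extraction, initial storage, analysis write-back) can be sketched in Python as below. All names are hypothetical. The per-language split patterns stand in for the preset regular-expression templates, the language check is a crude stand-in for real language identification, and the in-memory store only illustrates keeping every version of a paragraph with its timestamp.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical split templates: break after sentence-ending punctuation.
SPLIT_TEMPLATES = {
    "zh": re.compile(r"(?<=[。！？；])"),
    "en": re.compile(r"(?<=[.!?;])\s+"),
}


def detect_language(text):
    """Crude stand-in for language identification: any CJK character means Chinese."""
    return "zh" if re.search(r"[\u4e00-\u9fff]", text) else "en"


def extract_syntagmas(paragraph):
    """Split a finished paragraph into syntagmas using the template for its language."""
    pattern = SPLIT_TEMPLATES[detect_language(paragraph)]
    return [part.strip() for part in pattern.split(paragraph) if part.strip()]


@dataclass
class ParagraphVersion:
    timestamp: float
    text: str
    syntagmas: list
    analysis: dict = field(default_factory=dict)  # filled in by the analysis step


class SyntagmaStore:
    """In-memory stand-in for the syntagma storage module."""

    def __init__(self):
        self._versions = {}  # paragraph id -> list of ParagraphVersion, oldest first

    def save(self, paragraph_id, text, timestamp=None):
        """Store a new version of a paragraph together with its extracted syntagmas."""
        stamp = time.time() if timestamp is None else timestamp
        version = ParagraphVersion(stamp, text, extract_syntagmas(text))
        self._versions.setdefault(paragraph_id, []).append(version)
        return version

    def attach_analysis(self, paragraph_id, analysis):
        """Write analysis results back onto the latest version of a paragraph."""
        self._versions[paragraph_id][-1].analysis = analysis

    def history(self, paragraph_id):
        """Return all stored versions of a paragraph, oldest first."""
        return list(self._versions.get(paragraph_id, []))
```

Because earlier versions are never overwritten, a history tracking component can replay how a paragraph changed over time by reading `history` for its identifier.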
The present invention can automatically format papers according to the paper format requirements of different institutions, disciplines, and majors by using a modular protocol, which addresses the low efficiency, high arbitrariness, and high error rate of traditional manual editing. Based on tracking and analysis of the history records, plagiarism by students (writers) during the writing of a paper can be monitored in real time, rather than attending only to the final duplicate-check result, so that academic misconduct is effectively controlled. Further, the deep-learning-based text analysis method can mine research trends in the papers' fields and the potential relationships among domain entities, and track the overall progress of thesis work.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of the claims and the description of the present invention.

Claims (10)

1. A paper specification control and analysis platform based on a component protocol, characterized by comprising:
An editing support system, a command and control system, and a supervision module; the editing support system includes a syntagma extraction module, a syntagma analysis module, a component generation module, and a syntagma storage module; the command and control system includes a component definition module and a binding module;
The syntagma extraction module is configured to extract syntagmas from the writer's input data based on a natural language processing model, and to pass the syntagmas to the syntagma analysis module and the syntagma storage module;
The syntagma analysis module is configured to classify the syntagmas according to a preset algorithm, perform duplicate checking on the classified syntagmas, and pass the duplicate-check results to the syntagma storage module;
The component generation module is configured to receive the text export instruction issued by the writer, export the writer's input data as a text file, and return the text file to the writer;
The syntagma storage module is configured to store the writer's input data, the syntagmas, and the duplicate-check results;
The component definition module is configured to define component styles and component order according to the paper typesetting specification, and to exchange data with the component generation module, passing it the configuration information for the component styles and component order;
The binding module is configured to receive and process requests to bind a writer;
The supervision module is configured to send requests to bind a writer to the binding module.
2. The platform according to claim 1, characterized in that the syntagma extraction module is specifically configured to:
Identify the language of the input data based on natural language processing, split the input data into syntagmas according to a preset regular-expression template, and forward the split syntagmas to the syntagma analysis module and the syntagma storage module.
3. The platform according to claim 2, characterized in that the syntagma analysis module is specifically configured to:
Perform part-of-speech recognition and text classification on the split syntagmas based on natural language processing, calculate the repetition rate of the split syntagmas, label the repetition type, and pass the repetition rate and repetition type to the syntagma storage module; the repetition types include definition-type repetition, citation-type repetition, and high-probability plagiarism-type repetition.
4. The platform according to claim 3, characterized in that the syntagma storage module is based on a column-oriented storage structure; within the same data column it stores each version of the writer's modifications to the input data and saves the corresponding timestamp, and the data entity stored for each version includes the writer's input data, the syntagmas extracted from the input data, and the duplicate-check results of the syntagmas.
5. The platform according to claim 4, characterized in that the editing support system further includes a history tracking module, configured to obtain the data of each modified version from the syntagma storage module and, for each modified version, export the editor ID and time parameter as historical editing data.
6. A paper specification control and analysis system based on a component protocol, characterized by comprising the platform according to any one of claims 1-5, an editor terminal, a command control terminal, and a guidance and supervision terminal;
The editor terminal exchanges data with the editing support system and provides the writer with a text editing and export interface;
The command control terminal exchanges data with the command and control system and provides the administrator with a command and control interface;
The guidance and supervision terminal exchanges data with the supervision module and provides the teacher with an interface for guiding and supervising the writer.
7. The paper specification control and analysis system based on a component protocol according to claim 6, characterized in that the editor terminal specifically includes:
An editor module, configured to provide the writer with a rich-text editing environment;
A monitoring module, configured to monitor the writer's input state, obtain the writer's input data, and pass the input data to the syntagma extraction module and the syntagma storage module;
A text export module, configured to obtain the text export instruction sent by the writer, send the text export instruction to the component generation module so that the input data is exported as a text file, and receive the text file returned by the component generation module.
8. The paper specification control and analysis system based on a component protocol according to claim 7, characterized in that, after the writer triggers the text export module, the component generation module is specifically configured to:
Exchange data with the component definition module and, according to the parameters provided by the writer, obtain the corresponding component styles and component order from the component definition module; convert the writer's input data into XML components conforming to the Office Open XML specification according to the obtained component styles; sort the XML components according to the obtained component order; and package and compress the sorted XML components to generate a text file; the parameters provided by the writer include the writer's school, college, and discipline; the formats of the text file include docx and PDF.
9. The paper specification control and analysis system based on a component protocol according to claim 8, characterized in that the command control terminal specifically includes:
A definition input module, which exchanges data with the component definition module and is used to edit the paper typesetting specification and send the paper typesetting specification to the component definition module;
A binding processing module, configured to receive and process requests, sent by the guidance and supervision terminal, to bind a specified writer, and to grant the guidance and supervision terminal permission to guide and supervise the corresponding writer.
10. The paper specification control and analysis system based on a component protocol according to claim 9, characterized by further comprising a statistics and analysis module, configured to exchange data with all editor terminals within its scope of authority, obtain the full-text data of the syntagma storage module, and perform data mining according to preset statistical and analysis algorithms to obtain statistics and analysis data; the statistics and analysis data are returned to the command control terminal for visualization; the statistics and analysis data include: research field trends of the papers and the overall progress of thesis work.
CN201910368553.5A 2019-05-05 2019-05-05 A kind of paper Authority Contro1 and analysis platform and system based on component agreement Pending CN110069785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910368553.5A CN110069785A (en) 2019-05-05 2019-05-05 A kind of paper Authority Contro1 and analysis platform and system based on component agreement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910368553.5A CN110069785A (en) 2019-05-05 2019-05-05 A kind of paper Authority Contro1 and analysis platform and system based on component agreement

Publications (1)

Publication Number Publication Date
CN110069785A true CN110069785A (en) 2019-07-30

Family

ID=67370171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910368553.5A Pending CN110069785A (en) 2019-05-05 2019-05-05 A kind of paper Authority Contro1 and analysis platform and system based on component agreement

Country Status (1)

Country Link
CN (1) CN110069785A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507073A (en) * 2020-04-10 2020-08-07 甯航 Thesis editing and intelligent typesetting method and platform based on web rich text

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178786A (en) * 2006-11-09 2008-05-14 上海晨鸟信息科技有限公司 Online dissertation management method for realizing plagiarize and format checking by network resource

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
方婷云: "基于XML的社科期刊自适应排版技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190730