CN110069772A - Device, method, and storage medium for predicting the score of question-and-answer content - Google Patents
- Publication number
- CN110069772A (CN201910185054.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention relates to big data technology and discloses a device, a method, and a storage medium for predicting the score of question-and-answer (Q&A) content. The method comprises: collecting historical Q&A content from the written-test stage together with the actual score of each item of Q&A content; building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content and saving them to a database; importing the lexicon, corpus, TF-IDF model, and LDA model from the database, segmenting the Q&A content to be scored, counting the word frequencies of the resulting segments, and feeding the word-frequency statistics into the TF-IDF model and the LDA model to obtain the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored; and calculating the prediction score of the Q&A content to be scored from the actual scores corresponding to that maximum-probability queue. The invention can ensure the fairness of scoring.
Description
Technical field
The present invention relates to the technical field of data analysis, and in particular to a device, a method, and a storage medium for predicting the score of question-and-answer (Q&A) content.
Background art
At present, enterprise recruitment usually includes a written-test stage, and the written test generally contains essay-type (question-and-answer) questions. For management and product positions in particular, such questions make up a large part of the written test. They are typically graded manually. Manual grading is highly subjective: it is influenced by the grader's personal thinking and preferences, which undermines the objectivity of the score, and it is also time-consuming and laborious.
Summary of the invention
The purpose of the present invention is to provide a device, a method, and a storage medium for predicting the score of question-and-answer (Q&A) content, so that Q&A content in the written-test stage is scored objectively and fairly.
To achieve the above object, the present invention provides a device for predicting the score of Q&A content. The device comprises a memory and a processor connected to the memory; the memory stores a processing system that can run on the processor, and the processing system, when executed by the processor, implements the following steps:
collecting historical Q&A content from the written-test stage and the actual score corresponding to each item of Q&A content;
building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content, and saving them to a database;
importing the lexicon, corpus, TF-IDF model, and LDA model from the database; segmenting the Q&A content to be scored using the lexicon; counting the word frequencies of the segments of that content using the corpus; and feeding the word-frequency statistics into the TF-IDF model and then into the LDA model to obtain, as output, the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
selecting, from the maximum-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, to form a similar queue;
if the length of the similar queue is greater than or equal to 2, obtaining the actual score of each item of Q&A content in the similar queue, and calculating the prediction score of the Q&A content to be scored from those actual scores.
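The third step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each answer has already been mapped to a list of topic probabilities by a topic model, and it interprets the "maximum-probability queue" as the historical answers whose most probable topic matches that of the answer to be scored, ordered by their probability for that topic. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence


def dominant_topic(topic_probs: Sequence[float]) -> int:
    """Return the index of the most probable topic."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def max_probability_queue(
    new_probs: Sequence[float],
    history: Dict[str, Sequence[float]],
) -> List[str]:
    """Return ids of historical answers sharing the new answer's dominant topic.

    Assumption: `history` maps an answer id to its topic distribution.
    The result is ordered by descending probability for the shared topic.
    """
    topic = dominant_topic(new_probs)
    same_topic = [
        (answer_id, probs[topic])
        for answer_id, probs in history.items()
        if dominant_topic(probs) == topic
    ]
    same_topic.sort(key=lambda pair: pair[1], reverse=True)
    return [answer_id for answer_id, _ in same_topic]
```

The queue returned here would then be filtered by similarity to the answer to be scored, as described in the next step.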
Preferably, the step of calculating the prediction score of the Q&A content to be scored from the actual scores of the Q&A content in the similar queue specifically comprises:
(formula image not reproduced in this text)
where $P_j$ is the prediction score of the Q&A content $j$ to be scored, $\bar{r} = \frac{1}{L}\sum_{i=1}^{L} r_i$ is the mean of the actual scores of all Q&A content in the similar queue, $L$ is the length of the similar queue, $\mathrm{Sim}(i, j)$ is the similarity between Q&A content $i$ in the similar queue and the Q&A content $j$ to be scored, and $r_i$ is the actual score corresponding to Q&A content $i$ in the similar queue.
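Because the formula itself is not reproduced in this text, the sketch below shows only one plausible form that uses exactly the symbols defined above: the queue mean adjusted by a similarity-weighted average of each score's deviation from that mean, as in neighbourhood-based collaborative filtering. This form is an assumption, and the function name `predict_score` is hypothetical.

```python
from typing import List, Tuple


def predict_score(similar_queue: List[Tuple[float, float]]) -> float:
    """Predict a score from (Sim(i, j), r_i) pairs in the similar queue.

    Assumed form (not confirmed by the source):
        P_j = r_bar + sum(Sim(i, j) * (r_i - r_bar)) / sum(|Sim(i, j)|)
    """
    length = len(similar_queue)
    if length < 2:
        # The text only predicts when the similar queue has length >= 2.
        raise ValueError("similar queue must contain at least 2 answers")
    r_bar = sum(score for _, score in similar_queue) / length
    weight_total = sum(abs(sim) for sim, _ in similar_queue)
    if weight_total == 0:
        return r_bar  # no usable similarity information; fall back to the mean
    weighted_deviation = sum(sim * (score - r_bar) for sim, score in similar_queue)
    return r_bar + weighted_deviation / weight_total
```

With this form, answers that are more similar to the one being scored pull the prediction further toward their own actual scores.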
Preferably, when the processing system is executed by the processor, the following steps are also implemented:
obtaining the actual scores of N items of Q&A content to be scored, and calculating the mean absolute error (MAE) of the prediction scores of those N items from the actual scores:
$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N} \lvert P_j - r_j \rvert$
where $P_j$ is the prediction score of the Q&A content $j$ to be scored, $r_j$ is the actual score corresponding to the Q&A content $j$ to be scored, and $N$ is an integer greater than or equal to 2;
analyzing the accuracy of the prediction scores based on the mean absolute error.
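The mean absolute error is the standard average of absolute differences between predicted and actual scores. A minimal sketch follows; the function name and argument layout are illustrative, not taken from the source.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |P_j - r_j| over N scored answers (N >= 2, as in the text)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("N must be an integer greater than or equal to 2")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / n
```

A smaller value indicates that the prediction scores track the human-assigned scores more closely.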
Preferably, the step of building the word-segmentation lexicon, the corpus, the term frequency-inverse document frequency (TF-IDF) model, and the latent Dirichlet allocation (LDA) model from the Q&A content and saving them to the database specifically comprises:
segmenting each item of Q&A content with a predetermined word-segmentation algorithm to obtain its segmentation result; building the corresponding lexicon from the segmentation results; generating the corresponding corpus from the lexicon; building the TF-IDF model from the corpus; building the LDA model from the TF-IDF model; and saving the lexicon, corpus, TF-IDF model, and LDA model after each has been iteratively trained a predetermined number of times.
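The first three artifacts in this chain can be sketched in plain Python. This is a simplified illustration under assumptions: the lexicon is taken to be a word-to-count table, the corpus a bag-of-words list per answer, and TF-IDF the textbook `tf * log(N / df)` weighting. The LDA model and the iterative training are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

BagOfWords = List[Tuple[int, int]]  # (word id, count) pairs


def build_lexicon(segmented: List[List[str]]) -> Counter:
    """Count every word across all segmented answers."""
    return Counter(word for answer in segmented for word in answer)


def build_corpus(segmented: List[List[str]], lexicon: Counter):
    """Assign each lexicon word an id and encode each answer as a bag of words."""
    ids = {word: i for i, word in enumerate(sorted(lexicon))}
    corpus = [sorted(Counter(ids[w] for w in answer).items()) for answer in segmented]
    return ids, corpus


def tfidf_weights(corpus: List[BagOfWords]) -> List[Dict[int, float]]:
    """Weight each word by term frequency times log inverse document frequency."""
    n_docs = len(corpus)
    doc_freq = Counter(word_id for doc in corpus for word_id, _ in doc)
    weights = []
    for doc in corpus:
        total = sum(count for _, count in doc)
        weights.append(
            {wid: (count / total) * math.log(n_docs / doc_freq[wid]) for wid, count in doc}
        )
    return weights
```

In practice a topic-modelling library would typically provide these steps together with the LDA model; the plain-Python version is shown only to make the data flow between lexicon, corpus, and TF-IDF weights explicit.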
To achieve the above object, the present invention also provides a method for predicting the score of question-and-answer (Q&A) content, the method comprising:
S1, collecting historical Q&A content from the written-test stage and the actual score corresponding to each item of Q&A content;
S2, building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content, and saving them to a database;
S3, importing the lexicon, corpus, TF-IDF model, and LDA model from the database; segmenting the Q&A content to be scored using the lexicon; counting the word frequencies of the segments of that content using the corpus; and feeding the word-frequency statistics into the TF-IDF model and then into the LDA model to obtain, as output, the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
S4, selecting, from the maximum-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, to form a similar queue;
S5, if the length of the similar queue is greater than or equal to 2, obtaining the actual score of each item of Q&A content in the similar queue, and calculating the prediction score of the Q&A content to be scored from those actual scores.
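Steps S4 and S5 amount to a threshold filter followed by a length check. The sketch below illustrates them under assumptions the text does not fix: answers are represented as numeric vectors, similarity is cosine similarity, and the threshold value 0.8 is arbitrary. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similar_queue(
    target: Sequence[float],
    queue: List[Tuple[Sequence[float], float]],
    threshold: float = 0.8,  # assumed value; the text only says "predetermined"
) -> List[Tuple[float, float]]:
    """S4: keep (similarity, actual score) for answers at or above the threshold."""
    similar = []
    for vector, actual_score in queue:
        sim = cosine_similarity(target, vector)
        if sim >= threshold:
            similar.append((sim, actual_score))
    return similar


def can_predict(similar_queue: List[Tuple[float, float]]) -> bool:
    """S5 precondition: the similar queue must contain at least 2 answers."""
    return len(similar_queue) >= 2
```

The text does not say what happens when the similar queue has fewer than 2 entries, so the sketch only reports whether the precondition holds.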
Preferably, the step of calculating the prediction score of the Q&A content to be scored from the actual scores of the Q&A content in the similar queue specifically comprises:
(formula image not reproduced in this text)
where $P_j$ is the prediction score of the Q&A content $j$ to be scored, $\bar{r} = \frac{1}{L}\sum_{i=1}^{L} r_i$ is the mean of the actual scores of all Q&A content in the similar queue, $L$ is the length of the similar queue, $\mathrm{Sim}(i, j)$ is the similarity between Q&A content $i$ in the similar queue and the Q&A content $j$ to be scored, and $r_i$ is the actual score corresponding to Q&A content $i$ in the similar queue.
Preferably, after step S5, the method further comprises:
obtaining the actual scores of N items of Q&A content to be scored, and calculating the mean absolute error (MAE) of the prediction scores of those N items from the actual scores:
$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N} \lvert P_j - r_j \rvert$
where $P_j$ is the prediction score of the Q&A content $j$ to be scored, $r_j$ is the actual score corresponding to the Q&A content $j$ to be scored, and $N$ is an integer greater than or equal to 2;
analyzing the accuracy of the prediction scores based on the mean absolute error.
Preferably, the step of building the word-segmentation lexicon, the corpus, the term frequency-inverse document frequency (TF-IDF) model, and the latent Dirichlet allocation (LDA) model from the Q&A content and saving them to the database specifically comprises:
segmenting each item of Q&A content with a predetermined word-segmentation algorithm to obtain its segmentation result; building the corresponding lexicon from the segmentation results; generating the corresponding corpus from the lexicon; building the TF-IDF model from the corpus; building the LDA model from the TF-IDF model; and saving the lexicon, corpus, TF-IDF model, and LDA model after each has been iteratively trained a predetermined number of times.
Preferably, the predetermined word-segmentation algorithm is a hidden Markov algorithm.
The present invention also provides a computer-readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the above method for predicting the score of Q&A content.
The beneficial effects of the present invention are as follows. The invention first builds a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the large volume of existing question-and-answer (Q&A) content of the written-test stage. In use, it segments the Q&A content to be scored using the lexicon and counts word frequencies using the corpus. Finally, it feeds the word-frequency statistics into the TF-IDF model and then into the LDA model to obtain the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored, selects the more similar entries of that queue as a similar queue, obtains their actual scores, and calculates the prediction score of the Q&A content to be scored from those actual scores. Because the models are obtained by repeated training on a large volume of Q&A content, the invention eliminates the influence of the grader's subjective thinking and preferences on the objectivity of the score, ensures fair scoring, and saves time and labor.
Brief description of the drawings
Fig. 1 is a schematic diagram of an optional application environment for the embodiments of the present invention;
Fig. 2 is a schematic diagram of the hardware architecture of one embodiment of the device for predicting the score of Q&A content in Fig. 1;
Fig. 3 is a program module diagram of one embodiment of the processing system in Fig. 1 and Fig. 2;
Fig. 4 is a flow diagram of one embodiment of the method for predicting the score of Q&A content according to the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, provided that the combination can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, that combination should be considered not to exist and falls outside the scope of protection claimed by the invention.
Fig. 1 is a schematic diagram of the application environment of a preferred embodiment of the present invention. In this application environment, the device 1 for predicting the score of question-and-answer (Q&A) content is connected to an input device 2 and an output device 3 through a network 4. The Q&A content to be scored is entered through the input device 2; the device 1 predicts a score for it and transmits the prediction score to the output device 3 through the network 4. The device 1 includes a processing system 10 (APP), which analyzes the Q&A content to be scored to obtain the prediction score; the score is then output by the output device 3.
The device 1 is a piece of equipment that can automatically perform numerical computation and/or information processing according to preset or stored instructions. It may be a computer, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing and composed of a large number of hosts or network servers, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer.
In this embodiment, as shown in Fig. 2, the device 1 for predicting the score of Q&A content may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicatively connected to one another through a system bus; the memory 11 stores a processing system that can run on the processor 12. Note that Fig. 2 shows the device 1 with only components 11-13; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the device 1 for predicting the score of Q&A content. The readable storage medium may be a storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the device 1, such as its hard disk. In other embodiments, the readable storage medium may be an external storage device of the device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the device 1, for example the program code of the processing system in one embodiment of the present invention. The memory 11 may also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the device 1 for predicting the score of Q&A content, for example to perform control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 runs the program code stored in the memory 11 or processes data, for example by running the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish communication connections between the device 1 for predicting the score of Q&A content and other devices. In this embodiment, the network interface 13 mainly connects the device 1 to the input device 2 and the output device 3, establishing a data transmission channel and a communication connection.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application, and it can be divided into different logic modules according to the functions that its parts implement.
In one embodiment, the following steps are implemented when the above processing system is executed by the processor 12:
collecting historical question-and-answer (Q&A) content from the written-test stage and the actual score corresponding to each item of Q&A content;
building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content, and saving them to a database;
importing the lexicon, corpus, TF-IDF model, and LDA model from the database; segmenting the Q&A content to be scored using the lexicon; counting the word frequencies of the segments of that content using the corpus; and feeding the word-frequency statistics into the TF-IDF model and then into the LDA model to obtain, as output, the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
selecting, from the maximum-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, to form a similar queue;
if the length of the similar queue is greater than or equal to 2, obtaining the actual score of each item of Q&A content in the similar queue, and calculating the prediction score of the Q&A content to be scored from those actual scores.
Fig. 3 is a program module diagram of the processing system 10 in Fig. 1 and Fig. 2. The processing system 10 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out the present invention. A module, as the term is used in the present invention, is a series of computer program instruction segments capable of performing a specific function.
Collection module 101, for collecting historical question-and-answer (Q&A) content from the written-test stage and the actual score corresponding to each item of Q&A content;
Construction module 102, for building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content and saving them to a database;
Output module 103, for importing the lexicon, corpus, TF-IDF model, and LDA model from the database, segmenting the Q&A content to be scored using the lexicon, counting the word frequencies of the segments of that content using the corpus, and feeding the word-frequency statistics into the TF-IDF model and then into the LDA model to obtain, as output, the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
Selection module 104, for selecting, from the maximum-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, to form a similar queue;
Prediction scoring module 105, for obtaining, if the length of the similar queue is greater than or equal to 2, the actual score of each item of Q&A content in the similar queue, and calculating the prediction score of the Q&A content to be scored from those actual scores.
Fig. 4 is a flow diagram of one embodiment of the method for predicting the score of question-and-answer (Q&A) content according to the present invention. The method comprises the following steps:
Step S1, collecting historical Q&A content from the written-test stage and the actual score corresponding to each item of Q&A content;
Here, a large volume of historical Q&A content from the written-test stage is collected from various enterprises. The Q&A content includes the questions set by the recruiting enterprise and the answers given by the applicants. For example, the question is "How do you handle a disagreement with a colleague?" and the answer is "When I cannot reach agreement with a colleague, I communicate with them in light of the actual situation". The enterprise gives a corresponding actual score for each answer.
Further, to reduce interfering information, the Q&A content may be cleaned, including cleaning spelling errors, garbled characters, and the like. Repeated answers do not need to be removed, however, because the subsequent steps rely on word-frequency statistics.
Step S2, building a word-segmentation lexicon, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a latent Dirichlet allocation (LDA) model from the Q&A content, and saving them to a database;
First, each item of Q&A content is segmented with a predetermined word-segmentation algorithm to obtain its segmentation result. The corresponding lexicon is built from the segmentation results, the corresponding corpus is generated from the lexicon, the TF-IDF model is built from the corpus, and the LDA model is built from the TF-IDF model. To obtain a better lexicon and better models, the lexicon, corpus, TF-IDF model, and LDA model are each saved after being iteratively trained a predetermined number of times (for example, 5000 times).
The predetermined word segmentation algorithm may be a hidden Markov model algorithm, or another algorithm such as: the forward maximum matching algorithm (scan the Q&A content from left to right and take m characters as the matching field, where m is the number of characters in the longest entry of the machine dictionary; if the field matches a dictionary entry, it is split off as a word); the reverse maximum matching algorithm (the forward maximum matching idea applied in the opposite direction); or the bidirectional maximum matching method (compare the segmentation result obtained by forward maximum matching with the result obtained by reverse maximum matching to decide which segmentation is correct).
In a preferred embodiment, segmentation may also follow the longest-word-first rule. First, the Q&A content is split into short sentences at punctuation marks of preset types (for example, ","): the text from the start of the Q&A content to the first preset punctuation mark is one short sentence, the text from the first preset punctuation mark to the second is another short sentence, and so on. Each short sentence is then segmented according to the longest-word-first principle: for a short sentence T1 to be segmented, start from its first character A and find, in a pre-built dictionary, the longest word X1 beginning with A; remove X1 from T1 to leave T2; then apply the same splitting principle to T2. The result after splitting is "X1/X2/...".
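The longest-word-first procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names, the punctuation set, and the fallback of emitting a single character when no dictionary word matches are all assumptions made for the example.

```python
import re

def split_short_sentences(text, punctuation=",.;!?"):
    """Split text into short sentences at the given punctuation marks (assumed set)."""
    pattern = "[" + re.escape(punctuation) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

def longest_word_first(sentence, dictionary):
    """At each position, take the longest dictionary word starting there."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, pos = [], 0
    while pos < len(sentence):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + size]
            # Assumption: an unknown single character becomes its own token.
            if candidate in dictionary or size == 1:
                tokens.append(candidate)
                pos += size
                break
    return tokens

def segment(text, dictionary):
    """Split into short sentences, then segment each one longest-word-first."""
    return [longest_word_first(s, dictionary) for s in split_short_sentences(text)]
```

For instance, with the toy dictionary `{"ab", "abc", "d", "de"}`, the short sentence `"abcde"` is split as `abc/de`, because `abc` is preferred over the shorter `ab`.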
A corresponding word segmentation library is constructed from the segmentation results. Its form is, for example: ("colleague", 4), ("opinion", 3), ("in light of the actual situation", 1), ("communicate", 2). Here, ("colleague", 4) indicates that the token "colleague" is numbered 4 in the word segmentation library.
A corresponding corpus is generated from the word segmentation library. The corpus records how many times each token occurs in a Q&A content, i.e., the word frequency. Its form is, for example: [(0,2), (1,1), (2,1)], [(3,1), (4,1), (5,1)]. Each pair of square brackets represents one Q&A content, and the Q&A contents are separated by commas; (0,2) means that the token numbered 0 occurs 2 times in that Q&A content.
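As a rough illustration of how a token-numbering library and a word-frequency corpus of this form could be built, the following sketch uses only the Python standard library. The helper names and the sample tokens are hypothetical; the numbering scheme (IDs assigned in order of first appearance) is an assumption.

```python
from collections import Counter

def build_token_ids(segmented_docs):
    """Assign an integer number to every distinct token, in order of first appearance."""
    token_ids = {}
    for tokens in segmented_docs:
        for token in tokens:
            token_ids.setdefault(token, len(token_ids))
    return token_ids

def build_corpus(segmented_docs, token_ids):
    """Turn each Q&A content into a sorted list of (token number, count) pairs."""
    corpus = []
    for tokens in segmented_docs:
        counts = Counter(token_ids[token] for token in tokens)
        corpus.append(sorted(counts.items()))
    return corpus

# Two already-segmented Q&A contents (made-up tokens for illustration).
docs = [
    ["colleague", "opinion", "colleague", "communicate"],
    ["practice", "listen", "respect"],
]
token_ids = build_token_ids(docs)
corpus = build_corpus(docs, token_ids)
print(corpus)
```

With these sample tokens the corpus has the same shape as the example in the text: the first Q&A content contains the token numbered 0 twice and the tokens numbered 1 and 2 once each.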
A term frequency-inverse document frequency (TF-IDF) model is constructed from the corpus. The TF-IDF model consists of two parts. One part is TF (Term Frequency), which is the number of times a token occurs in one Q&A content, i.e., the word frequency. The other part is IDF (Inverse Document Frequency), which reflects how many Q&A contents a token appears in: the more Q&A contents contain the token, the lower its IDF. If a token has a high frequency TF in one Q&A content but rarely appears in other Q&A contents, the token is considered to discriminate well between categories and is suitable for classification. The form of the TF-IDF model is, for example: [(0,0.1469), (1,0.2842), (2,0.2561), (3,0.1528)], where (0,0.1469) indicates that the importance of the token numbered 0 to this Q&A content is 0.1469.
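To make the weighting concrete, here is a small sketch that converts a word-frequency corpus into TF-IDF weights. The text does not specify the exact weighting formula, so this example assumes one common variant: weight = count × log2(N / document frequency), followed by L2 normalisation per Q&A content. The function name `tfidf` is hypothetical.

```python
import math

def tfidf(corpus):
    """Weight (token number, count) pairs by count * log2(N / df), then L2-normalise.

    Assumed variant; tokens that occur in every Q&A content get weight 0
    and are dropped from the output.
    """
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for token_id, _ in doc:
            doc_freq[token_id] = doc_freq.get(token_id, 0) + 1
    weighted = []
    for doc in corpus:
        weights = [(tid, count * math.log2(n_docs / doc_freq[tid])) for tid, count in doc]
        norm = math.sqrt(sum(w * w for _, w in weights))
        weighted.append([(tid, w / norm) for tid, w in weights if norm > 0 and w > 0])
    return weighted

# Token 0 appears in both Q&A contents, so it carries no discriminating weight.
print(tfidf([[(0, 2), (1, 1)], [(0, 1), (2, 1)]]))
```

In this toy corpus the token numbered 0 occurs in both Q&A contents, so only the tokens numbered 1 and 2 keep a non-zero weight, which matches the intuition that rare tokens discriminate best.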
A Latent Dirichlet Allocation (LDA) model is constructed from the TF-IDF model. The LDA model is a document topic generation model. It records the probability distribution of each Q&A content over the different topics, in the following form:
[(topic one), (topic two), (topic three)]
[(0,0.7188),(1,0.1550),(2,0.1260)]
[(0,0.2856),(1,0.6423),(2,0.0719)]
[(0,0.4189),(1,0.3004),(2,0.2806)]
Here, (0,0.7188) indicates that the probability that Q&A content one belongs to topic one is 0.7188; 0 denotes Q&A content one, 1 denotes Q&A content two, and 2 denotes Q&A content three. By comparison, Q&A content one and Q&A content three most probably belong to topic one, and Q&A content two most probably belongs to topic two.
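The comparison in this example can be reproduced with a few lines of code. The sketch below assumes that each row of the table is one Q&A content and that each pair holds a topic index and a probability, which is the reading that yields the stated conclusion; the helper name `dominant_topic` is hypothetical.

```python
# Topic distributions copied from the example above, one row per Q&A content
# (assumed layout: each pair is (topic index, probability)).
doc_topics = [
    [(0, 0.7188), (1, 0.1550), (2, 0.1260)],
    [(0, 0.2856), (1, 0.6423), (2, 0.0719)],
    [(0, 0.4189), (1, 0.3004), (2, 0.2806)],
]

def dominant_topic(distribution):
    """Return the (topic index, probability) pair with the highest probability."""
    return max(distribution, key=lambda pair: pair[1])

dominant = [dominant_topic(d)[0] for d in doc_topics]
print(dominant)  # topic index per Q&A content, zero-based
```

Under this reading, the first and third rows select topic index 0 (topic one) and the second row selects topic index 1 (topic two).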
Step S3: import the word segmentation library, corpus, TF-IDF model, and LDA model from the database; segment the Q&A content to be scored based on the word segmentation library; count the word frequencies of its tokens based on the corpus; and input the word-frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output a queue of the historical Q&A contents that have the highest probability of belonging to the same topic as the Q&A content to be scored.
When the iteratively trained word segmentation library, corpus, TF-IDF model, and LDA model are applied in a scoring scenario, they are imported and applied together. The Q&A content to be scored may first be cleaned (in the same way as described above), then segmented based on the word segmentation library, and its word frequencies counted based on the corpus. The word-frequency statistics are input into the TF-IDF model, and the output of the TF-IDF model is then input into the LDA model, which outputs the queue of historical Q&A contents that have the highest probability of belonging to the same topic as the Q&A content to be scored.
Step S4: from the highest-probability queue, select the Q&A contents whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as the similar queue.
For example, the highest-probability queue is (3,0.8550), (4,0.6423), (7,0.9004), where 3 denotes Q&A content four, 4 denotes Q&A content five, and 7 denotes Q&A content eight. The probability of belonging to the same topic is used as the similarity, and the Q&A contents whose similarity to the Q&A content to be scored is greater than or equal to the predetermined threshold (for example, 0.85) are selected. With a threshold of 0.85, Q&A content four and Q&A content eight form the similar queue of the Q&A content to be scored.
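The selection in this example amounts to a simple threshold filter, sketched below. The function name and the representation of the queue as (content index, similarity) pairs are assumptions for illustration.

```python
def select_similar_queue(topic_queue, threshold=0.85):
    """Keep the (content index, similarity) pairs whose similarity meets the threshold."""
    return [(index, sim) for index, sim in topic_queue if sim >= threshold]

# Values from the example above; indices are zero-based, so 3 is Q&A content four.
queue = [(3, 0.8550), (4, 0.6423), (7, 0.9004)]
selected = select_similar_queue(queue)
print(selected)
```

With the example values, the entries with indices 3 and 7 (Q&A contents four and eight) pass the 0.85 threshold, and the entry with index 4 is dropped.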
Step S5: if the length of the similar queue is greater than or equal to 2, obtain the actual score corresponding to each Q&A content in the similar queue, and calculate the predicted score of the Q&A content to be scored based on those actual scores.
If the length of the similar queue is 1, the actual score of the single Q&A content in the similar queue is used as the predicted score of the Q&A content to be scored.
If the length of the similar queue is greater than or equal to 2, the predicted score of the Q&A content to be scored is calculated from the actual scores corresponding to the Q&A contents in the similar queue, by a formula defined over the following quantities: $P_j$ is the predicted score of the Q&A content $j$ to be scored; the mean of the actual scores corresponding to all Q&A contents in the similar queue; $L$ is the length of the similar queue (an integer greater than or equal to 2); $\mathrm{Sim}(i, j)$ is the similarity between Q&A content $i$ in the similar queue and the Q&A content $j$ to be scored; and $r_i$ is the actual score corresponding to Q&A content $i$ in the similar queue.
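The exact formula is not available in this text, so the following sketch should be read as one plausible way to combine the listed quantities, not as the formula of the invention. It assumes a mean-centred, similarity-weighted form that is common in neighbourhood-based prediction: the mean actual score plus the similarity-weighted average deviation from that mean. The function name `predict_score` is hypothetical.

```python
def predict_score(similar_queue):
    """Predict a score from (similarity, actual score) pairs.

    Assumed form: mean + sum(sim * (r - mean)) / sum(sim).
    A queue of length 1 returns its single actual score, as stated in the text.
    """
    if not similar_queue:
        raise ValueError("similar queue is empty")
    if len(similar_queue) == 1:
        return similar_queue[0][1]
    mean_score = sum(r for _, r in similar_queue) / len(similar_queue)
    weight_sum = sum(sim for sim, _ in similar_queue)
    if weight_sum == 0:
        return mean_score
    deviation = sum(sim * (r - mean_score) for sim, r in similar_queue)
    return mean_score + deviation / weight_sum

# Similarities from the earlier example, paired with made-up actual scores.
print(predict_score([(0.8550, 80.0), (0.9004, 90.0)]))
```

Under this assumed form, the prediction always lies between the lowest and highest actual scores in the similar queue and leans towards the scores of the more similar Q&A contents.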
Further, to evaluate the accuracy of the predicted scores, the actual scores of N Q&A contents to be scored may also be obtained, the mean absolute error (MAE) of the predicted scores of the N Q&A contents calculated based on those actual scores, and the LDA model evaluated based on the MAE, where:
$$\mathrm{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left| P_j - r_j \right|$$
Here, $P_j$ is the predicted score of the Q&A content $j$ to be scored, $r_j$ is the actual score corresponding to the Q&A content $j$ to be scored, and N is an integer greater than or equal to 2.
The closer the MAE is to 0, the more accurate the predicted scores are, and the better the iterative training of the word segmentation library, corpus, TF-IDF model, and LDA model.
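The evaluation step can be sketched with the standard definition of mean absolute error, the average of the absolute differences between predicted and actual scores. The function name and the sample scores are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

# Two Q&A contents with hypothetical predicted and actual scores.
print(mean_absolute_error([80.0, 90.0], [82.0, 86.0]))
```

A value near 0 means the predicted scores track the actual scores closely; larger values suggest that the libraries and models need further training or tuning.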
Compared with the prior art, the present invention first constructs a word segmentation library, a corpus, a TF-IDF model, and an LDA model from a large volume of existing Q&A content from the written-test stage. In use, the Q&A content to be scored is segmented based on the word segmentation library and its word frequencies are counted based on the corpus. Finally, the word-frequency statistics are input into the TF-IDF model and the LDA model in sequence to obtain the queue of historical Q&A contents that have the highest probability of belonging to the same topic as the Q&A content to be scored; the more similar Q&A contents in that queue are taken as the similar queue, their actual scores are obtained, and the predicted score of the Q&A content to be scored is calculated from those actual scores. Because the models are obtained by repeated training on a large volume of Q&A content, the present invention eliminates the influence of a scorer's subjective thinking and preferences on the objectivity of scoring, ensures fair scoring, and saves time and effort.
The present invention also provides a computer-readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the above method for predicting the score of Q&A content.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A device for predicting the score of question-and-answer (Q&A) content, characterized in that the device comprises a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, and the processing system implementing the following steps when executed by the processor:
collecting historical Q&A content from the written-test stage and the actual score corresponding to each Q&A content;
constructing, based on the Q&A content, a word segmentation library, a corpus, a term frequency-inverse document frequency (TF-IDF) model, and a Latent Dirichlet Allocation (LDA) model, and saving them to a database;
importing the word segmentation library, corpus, TF-IDF model, and LDA model from the database, segmenting the Q&A content to be scored based on the word segmentation library, counting the word frequencies of the tokens of the Q&A content to be scored based on the corpus, and inputting the word-frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output a queue of the historical Q&A contents that have the highest probability of belonging to the same topic as the Q&A content to be scored;
selecting, from the highest-probability queue, the Q&A contents whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
if the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on the actual scores corresponding to the Q&A contents in the similar queue.
2. The device for predicting the score of Q&A content according to claim 1, characterized in that the step of calculating the predicted score of the Q&A content to be scored based on the actual scores corresponding to the Q&A contents in the similar queue specifically uses a formula defined over the following quantities: $P_j$ is the predicted score of the Q&A content $j$ to be scored; the mean of the actual scores corresponding to all Q&A contents in the similar queue; $L$ is the length of the similar queue; $\mathrm{Sim}(i, j)$ is the similarity between Q&A content $i$ in the similar queue and the Q&A content $j$ to be scored; and $r_i$ is the actual score corresponding to Q&A content $i$ in the similar queue.
3. The device for predicting the score of Q&A content according to claim 2, characterized in that the processing system further implements the following steps when executed by the processor:
obtaining the actual scores of N Q&A contents to be scored, and calculating, based on the actual scores, the mean absolute error (MAE) of the predicted scores of the N Q&A contents to be scored, comprising:
$$\mathrm{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left| P_j - r_j \right|$$
where $r_j$ is the actual score corresponding to the Q&A content $j$ to be scored, and N is an integer greater than or equal to 2;
analyzing the accuracy of the predicted scores based on the mean absolute error.
4. The device for predicting the score of Q&A content according to any one of claims 1 to 3, characterized in that the step of constructing, based on the Q&A content, a word segmentation library, a corpus, a TF-IDF model, and an LDA model and saving them to a database specifically comprises:
segmenting each Q&A content using a predetermined word segmentation algorithm to obtain the segmentation result of each Q&A content, constructing a corresponding word segmentation library based on the segmentation results, generating a corresponding corpus based on the word segmentation library, constructing a TF-IDF model based on the corpus, constructing an LDA model based on the TF-IDF model, and saving the word segmentation library, corpus, TF-IDF model, and LDA model after each has been iteratively trained a predetermined number of times.
5. A method for predicting the score of Q&A content, characterized in that the method comprises:
S1: collecting historical Q&A content from the written-test stage and the actual score corresponding to each Q&A content;
S2: constructing, based on the Q&A content, a word segmentation library, a corpus, a TF-IDF model, and an LDA model, and saving them to a database;
S3: importing the word segmentation library, corpus, TF-IDF model, and LDA model from the database, segmenting the Q&A content to be scored based on the word segmentation library, counting the word frequencies of the tokens of the Q&A content to be scored based on the corpus, and inputting the word-frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output a queue of the historical Q&A contents that have the highest probability of belonging to the same topic as the Q&A content to be scored;
S4: selecting, from the highest-probability queue, the Q&A contents whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
S5: if the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on the actual scores corresponding to the Q&A contents in the similar queue.
6. The method for predicting the score of Q&A content according to claim 5, characterized in that the step of calculating the predicted score of the Q&A content to be scored based on the actual scores corresponding to the Q&A contents in the similar queue specifically uses a formula defined over the following quantities: $P_j$ is the predicted score of the Q&A content $j$ to be scored; the mean of the actual scores corresponding to all Q&A contents in the similar queue; $L$ is the length of the similar queue; $\mathrm{Sim}(i, j)$ is the similarity between Q&A content $i$ in the similar queue and the Q&A content $j$ to be scored; and $r_i$ is the actual score corresponding to Q&A content $i$ in the similar queue.
7. The method for predicting the score of Q&A content according to claim 6, characterized in that, after step S5, the method further comprises:
obtaining the actual scores of N Q&A contents to be scored, and calculating, based on the actual scores, the mean absolute error (MAE) of the predicted scores of the N Q&A contents to be scored, comprising:
$$\mathrm{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left| P_j - r_j \right|$$
where $r_j$ is the actual score corresponding to the Q&A content $j$ to be scored, and N is an integer greater than or equal to 2;
analyzing the accuracy of the predicted scores based on the mean absolute error.
8. The method for predicting the score of Q&A content according to any one of claims 5 to 7, characterized in that the step of constructing, based on the Q&A content, a word segmentation library, a corpus, a TF-IDF model, and an LDA model and saving them to a database specifically comprises:
segmenting each Q&A content using a predetermined word segmentation algorithm to obtain the segmentation result of each Q&A content, constructing a corresponding word segmentation library based on the segmentation results, generating a corresponding corpus based on the word segmentation library, constructing a TF-IDF model based on the corpus, constructing an LDA model based on the TF-IDF model, and saving the word segmentation library, corpus, TF-IDF model, and LDA model after each has been iteratively trained a predetermined number of times.
9. The method for predicting the score of Q&A content according to claim 8, characterized in that the predetermined word segmentation algorithm is a hidden Markov model algorithm.
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and the processing system, when executed by a processor, implements the steps of the method for predicting the score of Q&A content according to any one of claims 5 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185054.2A CN110069772B (en) | 2019-03-12 | 2019-03-12 | Device, method and storage medium for predicting scoring of question-answer content |
PCT/CN2019/116548 WO2020181800A1 (en) | 2019-03-12 | 2019-11-08 | Apparatus and method for predicting score for question and answer content, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185054.2A CN110069772B (en) | 2019-03-12 | 2019-03-12 | Device, method and storage medium for predicting scoring of question-answer content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110069772A true CN110069772A (en) | 2019-07-30 |
CN110069772B CN110069772B (en) | 2023-10-20 |
Family
ID=67366178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910185054.2A Active CN110069772B (en) | 2019-03-12 | 2019-03-12 | Device, method and storage medium for predicting scoring of question-answer content |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110069772B (en) |
WO (1) | WO2020181800A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181800A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Apparatus and method for predicting score for question and answer content, and storage medium |
CN113342942A (en) * | 2021-08-02 | 2021-09-03 | 平安科技(深圳)有限公司 | Corpus automatic acquisition method and device, computer equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101621391A (en) * | 2009-08-07 | 2010-01-06 | 北京百问百答网络技术有限公司 | Method and system for classifying short texts based on probability topic |
US20150254565A1 (en) * | 2014-03-07 | 2015-09-10 | Educational Testing Service | Systems and Methods for Constructed Response Scoring Using Metaphor Detection |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559894B (en) * | 2013-11-08 | 2016-04-20 | 科大讯飞股份有限公司 | Oral evaluation method and system |
CN107133238A (en) * | 2016-02-29 | 2017-09-05 | 阿里巴巴集团控股有限公司 | A kind of text message clustering method and text message clustering system |
CN108153876B (en) * | 2017-12-26 | 2021-07-23 | 爱因互动科技发展(北京)有限公司 | Intelligent question and answer method and system |
CN108415980A (en) * | 2018-02-09 | 2018-08-17 | 平安科技(深圳)有限公司 | Question and answer data processing method, electronic device and storage medium |
CN108595427B (en) * | 2018-04-24 | 2021-06-08 | 成都海天数联科技有限公司 | Subjective question scoring method and device, readable storage medium and electronic equipment |
CN108960574A (en) * | 2018-06-07 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | Quality determination method, device, server and the storage medium of question and answer |
CN110069772B (en) * | 2019-03-12 | 2023-10-20 | 平安科技(深圳)有限公司 | Device, method and storage medium for predicting scoring of question-answer content |
Also Published As
Publication number | Publication date |
---|---|
WO2020181800A1 (en) | 2020-09-17 |
CN110069772B (en) | 2023-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||