CN110069772A - Device, method and storage medium for predicting the score of question-and-answer content - Google Patents

Info

Publication number
CN110069772A
CN110069772A CN201910185054.2A CN201910185054A CN110069772A CN 110069772 A CN110069772 A CN 110069772A CN 201910185054 A CN201910185054 A CN 201910185054A CN 110069772 A CN110069772 A CN 110069772A
Authority
CN
China
Prior art keywords
question
answer content
scoring
answer
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910185054.2A
Other languages
Chinese (zh)
Other versions
CN110069772B (en
Inventor
程磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910185054.2A priority Critical patent/CN110069772B/en
Publication of CN110069772A publication Critical patent/CN110069772A/en
Priority to PCT/CN2019/116548 priority patent/WO2020181800A1/en
Application granted granted Critical
Publication of CN110069772B publication Critical patent/CN110069772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to big data technology and discloses a device, a method and a storage medium for predicting the score of question-and-answer (Q&A) content. The method comprises: collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content; constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database; importing the segmentation dictionary, corpus, TF-IDF model and LDA model from the database, segmenting the Q&A content to be scored, counting the word frequencies of the resulting tokens, and inputting the result into the TF-IDF model and the LDA model to obtain, as output, the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored; and calculating the predicted score of the Q&A content to be scored based on the actual scores corresponding to that highest-probability queue. The present invention can ensure the fairness of scoring.

Description

Device, method and storage medium for predicting the score of question-and-answer content
Technical field
The present invention relates to the technical field of data analysis, and more particularly to a device, a method and a storage medium for predicting the score of question-and-answer (Q&A) content.
Background art
At present, enterprise recruitment typically includes a written test stage, and the written test generally contains question-and-answer (open-ended) questions. For positions such as management and product roles in particular, such questions make up a large part of the written test. These questions are usually scored manually. Manual scoring is largely subjective: it is influenced by the scorer's personal thinking and preferences, which undermines the objectivity of the score, and it is time-consuming and laborious.
Summary of the invention
The purpose of the present invention is to provide a device, a method and a storage medium for predicting the score of Q&A content, with the aim of scoring the Q&A content in the written test stage objectively and fairly.
To achieve the above purpose, the present invention provides a device for predicting the score of Q&A content. The device comprises a memory and a processor connected to the memory; the memory stores a processing system that can run on the processor, and the processing system implements the following steps when executed by the processor:
Collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content;
Constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database;
Importing the segmentation dictionary, corpus, TF-IDF model and LDA model from the database; segmenting the Q&A content to be scored based on the segmentation dictionary; counting, based on the corpus, the word frequencies of the tokens of the Q&A content to be scored; and inputting the word frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
Selecting, from the highest-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
If the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each item of Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on those actual scores.
Preferably, the step of calculating the predicted score of the Q&A content to be scored based on the actual score corresponding to each item of Q&A content in the similar queue specifically comprises:
(Formula not reproduced.) Wherein, P_j is the predicted score of the Q&A content j to be scored; the formula also uses the mean of the actual scores of all Q&A content in the similar queue; L is the length of the similar queue; Sim(i, j) is the similarity between Q&A content i in the similar queue and the Q&A content j to be scored; and r_i is the actual score corresponding to Q&A content i in the similar queue.
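As a non-limiting illustration, the quantities defined above can be combined as in the following sketch. The exact formula is not available in this text, so the sketch assumes a common similarity-weighted, mean-centred form, P_j = mean + sum(Sim(i, j) * (r_i - mean)) / sum(|Sim(i, j)|); both this form and the function name `predict_score` are assumptions made for illustration only.

```python
def predict_score(similar_queue):
    """Predict a score from a similar queue.

    similar_queue: list of (similarity, actual_score) pairs, one per
    historical Q&A item i, i.e. (Sim(i, j), r_i).
    ASSUMPTION: a similarity-weighted, mean-centred formula is used;
    the source does not reproduce its own formula.
    """
    length = len(similar_queue)  # L in the text
    if length < 2:
        # The text only predicts when the similar queue has length >= 2.
        raise ValueError("similar queue must contain at least 2 items")

    # Mean of the actual scores of all items in the similar queue.
    mean_score = sum(score for _, score in similar_queue) / length

    weight_sum = sum(abs(sim) for sim, _ in similar_queue)
    if weight_sum == 0:
        # No usable similarity information: fall back to the plain mean.
        return mean_score

    weighted_deviation = sum(
        sim * (score - mean_score) for sim, score in similar_queue
    )
    return mean_score + weighted_deviation / weight_sum
```

Under this assumed form, items that are more similar to the Q&A content to be scored pull the prediction further away from the plain mean of the queue.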
Preferably, when the processing system is executed by the processor, the following steps are also implemented:
Obtaining the actual scores of N items of Q&A content to be scored, and calculating, based on the actual scores, the mean absolute error of the predicted scores of the N items of Q&A content to be scored, comprising:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|P_j - r_j\right|$$
Wherein, P_j is the predicted score of the Q&A content j to be scored, r_j is the actual score corresponding to the Q&A content j to be scored, and N is an integer greater than or equal to 2;
Analyzing the accuracy of the predicted scores based on the mean absolute error.
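As a non-limiting illustration, the mean absolute error described above can be computed as in the following sketch, which uses the standard definition (the average of the absolute differences between predicted and actual scores). The function name `mean_absolute_error` is chosen here for illustration.

```python
def mean_absolute_error(predicted_scores, actual_scores):
    """Average absolute difference between predicted and actual scores."""
    if len(predicted_scores) != len(actual_scores):
        raise ValueError("both lists must have the same length")
    n = len(predicted_scores)
    if n < 2:
        # The text requires N to be an integer greater than or equal to 2.
        raise ValueError("N must be at least 2")
    total = sum(abs(p - r) for p, r in zip(predicted_scores, actual_scores))
    return total / n
```

A smaller value indicates that the predicted scores are closer to the actual scores.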
Preferably, the step of constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a TF-IDF model and an LDA model and saving them into a database specifically comprises:
Segmenting each item of Q&A content using a predetermined segmentation algorithm to obtain the segmentation result of each item; constructing the corresponding segmentation dictionary based on the segmentation results; generating the corresponding corpus based on the segmentation dictionary; constructing the TF-IDF model based on the corpus; constructing the LDA model based on the TF-IDF model; and saving the segmentation dictionary, corpus, TF-IDF model and LDA model after each has been iteratively trained a predetermined number of times.
To achieve the above purpose, the present invention also provides a method for predicting the score of Q&A content, the method comprising:
S1, collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content;
S2, constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database;
S3, importing the segmentation dictionary, corpus, TF-IDF model and LDA model from the database; segmenting the Q&A content to be scored based on the segmentation dictionary; counting, based on the corpus, the word frequencies of the tokens of the Q&A content to be scored; and inputting the word frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
S4, selecting, from the highest-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
S5, if the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each item of Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on those actual scores.
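As a non-limiting illustration, steps S4 and S5 can be sketched as follows. The sketch assumes that the similarity between each historical item and the Q&A content to be scored has already been computed by some measure (the text does not specify one), and it returns `None` when the similar queue is shorter than 2, a case the text does not describe. The function names are hypothetical.

```python
def select_similar_queue(topic_queue, threshold):
    """S4: keep items whose similarity is >= the predetermined threshold.

    topic_queue: list of (similarity, actual_score) pairs for the
    historical Q&A items in the highest-probability topic queue.
    """
    return [(sim, score) for sim, score in topic_queue if sim >= threshold]


def actual_scores_for_prediction(topic_queue, threshold):
    """S5 precondition: return the actual scores of the similar queue
    only if its length is >= 2.

    ASSUMPTION: None is returned otherwise; the source does not say
    what happens when the similar queue is too short.
    """
    similar_queue = select_similar_queue(topic_queue, threshold)
    if len(similar_queue) >= 2:
        return [score for _, score in similar_queue]
    return None
```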
Preferably, the step of calculating the predicted score of the Q&A content to be scored based on the actual score corresponding to each item of Q&A content in the similar queue specifically comprises:
(Formula not reproduced.) Wherein, P_j is the predicted score of the Q&A content j to be scored; the formula also uses the mean of the actual scores of all Q&A content in the similar queue; L is the length of the similar queue; Sim(i, j) is the similarity between Q&A content i in the similar queue and the Q&A content j to be scored; and r_i is the actual score corresponding to Q&A content i in the similar queue.
Preferably, after step S5, the method further comprises:
Obtaining the actual scores of N items of Q&A content to be scored, and calculating, based on the actual scores, the mean absolute error of the predicted scores of the N items of Q&A content to be scored, comprising:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|P_j - r_j\right|$$
Wherein, P_j is the predicted score of the Q&A content j to be scored, r_j is the actual score corresponding to the Q&A content j to be scored, and N is an integer greater than or equal to 2;
Analyzing the accuracy of the predicted scores based on the mean absolute error.
Preferably, the step of constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a TF-IDF model and an LDA model and saving them into a database specifically comprises:
Segmenting each item of Q&A content using a predetermined segmentation algorithm to obtain the segmentation result of each item; constructing the corresponding segmentation dictionary based on the segmentation results; generating the corresponding corpus based on the segmentation dictionary; constructing the TF-IDF model based on the corpus; constructing the LDA model based on the TF-IDF model; and saving the segmentation dictionary, corpus, TF-IDF model and LDA model after each has been iteratively trained a predetermined number of times.
Preferably, the predetermined segmentation algorithm is a hidden Markov algorithm.
The present invention also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the above method for predicting the score of Q&A content.
The beneficial effects of the present invention are as follows. The present invention first constructs a word segmentation dictionary, a corpus, a TF-IDF model and an LDA model from the massive Q&A content of existing written tests. Then, in use, the Q&A content to be scored is segmented based on the segmentation dictionary, and word frequencies are counted based on the corpus. Finally, the word frequency statistics are input into the TF-IDF model and the LDA model in sequence, which output the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored. The actual scores corresponding to the more similar items in that queue (the similar queue) are obtained, and the predicted score of the Q&A content to be scored is calculated from them. Because the models are obtained by repeated training on massive Q&A content, the present invention eliminates the influence of the scorer's subjective thinking and preferences on the objectivity of scoring, ensures fair scoring, and saves time and labor.
Brief description of the drawings
Fig. 1 is a schematic diagram of an optional application environment for the embodiments of the present invention;
Fig. 2 is a schematic diagram of the hardware architecture of one embodiment of the device for predicting the score of Q&A content in Fig. 1;
Fig. 3 is a program module diagram of one embodiment of the processing system in Fig. 1 and Fig. 2;
Fig. 4 is a flow diagram of one embodiment of the method for predicting the score of Q&A content according to the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second" and the like in the present invention are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, provided that the combination can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the protection scope claimed by the present invention.
Fig. 1 is a schematic diagram of the application environment of a preferred embodiment of the present invention. In this application environment, the device 1 for predicting the score of Q&A content is connected to an input device 2 and an output device 3 through a network 4. The Q&A content to be scored is entered through the input device 2; the device 1 predicts a score for it and transmits the predicted score to the output device 3 through the network 4. The device 1 includes a processing system 10 (APP), which analyzes the Q&A content to be scored to obtain the predicted score; the predicted score is output by the output device 3.
The device 1 for predicting the score of Q&A content is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. The device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, as shown in Fig. 2, the device 1 may include, but is not limited to, a memory 11, a processor 12 and a network interface 13 that can communicate with each other through a system bus; the memory 11 stores a processing system that can run on the processor 12. It should be pointed out that Fig. 2 only shows the device 1 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the device 1. The readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the device 1, such as its hard disk; in other embodiments, it may be an external storage device of the device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card fitted to the device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the device 1, for example the program code of the processing system in one embodiment of the present invention. In addition, the memory 11 can also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 12 is generally used to control the overall operation of the device 1, for example to perform control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the device 1 and other devices. In this embodiment, the network interface 13 is mainly used to connect the device 1 to the input device 2 and the output device 3 and to establish a data transmission channel and a communication connection.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application, and can be divided into different logical modules according to the functions realized by its parts.
In one embodiment, the above processing system implements the following steps when executed by the processor 12:
Collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content;
Constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database;
Importing the segmentation dictionary, corpus, TF-IDF model and LDA model from the database; segmenting the Q&A content to be scored based on the segmentation dictionary; counting, based on the corpus, the word frequencies of the tokens of the Q&A content to be scored; and inputting the word frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
Selecting, from the highest-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
If the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each item of Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on those actual scores.
Fig. 3 is a program module diagram of the processing system 10 in Fig. 1 and Fig. 2. The processing system 10 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out the present invention. A module, as the term is used in the present invention, is a series of computer program instruction segments capable of completing a specific function.
Collection module 101, for collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content;
Construction module 102, for constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database;
Output module 103, for importing the segmentation dictionary, corpus, TF-IDF model and LDA model from the database; segmenting the Q&A content to be scored based on the segmentation dictionary; counting, based on the corpus, the word frequencies of the tokens of the Q&A content to be scored; and inputting the word frequency statistics into the TF-IDF model and the LDA model in sequence, to obtain as output the queue of historical Q&A content with the highest probability of belonging to the same topic as the Q&A content to be scored;
Selection module 104, for selecting, from the highest-probability queue, the Q&A content whose similarity to the Q&A content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
Prediction scoring module 105, for obtaining, if the length of the similar queue is greater than or equal to 2, the actual score corresponding to each item of Q&A content in the similar queue, and calculating the predicted score of the Q&A content to be scored based on those actual scores.
Fig. 4 is a flow diagram of one embodiment of the method for predicting the score of Q&A content according to the present invention. The method comprises the following steps:
Step S1, collecting historical Q&A content from the written test stage and the actual score corresponding to each item of Q&A content;
Here, massive historical Q&A content from the written test stage of each enterprise is collected. The Q&A content includes the question content given by the recruiting enterprise and the answer content given by the applicant. For example, the question content is "How do you handle a disagreement with a colleague?" and the answer content is "When I cannot see eye to eye with a colleague, I communicate with them in light of the actual situation." The enterprise gives a corresponding actual score for the answer content.
Further, in order to reduce interfering information, data cleaning can be performed on the Q&A content, including cleaning spelling mistakes, garbled characters and the like. Duplicate answers do not need to be removed, however, because word frequency statistics are involved later.
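As a non-limiting illustration, the cleaning of garbled characters described above can be sketched as follows. The sketch only strips the Unicode replacement character and control or unassigned characters and normalises whitespace; spelling correction is not shown, and the function names are hypothetical. Duplicate answers are deliberately kept, as the text requires.

```python
import unicodedata


def clean_answer(text):
    """Remove garbled characters from one answer and normalise whitespace."""
    kept = []
    for ch in text:
        if ch == "\ufffd":
            # Unicode replacement character, typical of decoding errors.
            continue
        if unicodedata.category(ch).startswith("C") and ch not in "\n\t":
            # Control, format, surrogate, private-use or unassigned character.
            continue
        kept.append(ch)
    # Collapse runs of whitespace (including newlines and tabs) to one space.
    return " ".join("".join(kept).split())


def clean_answers(answers):
    """Clean every answer; drop empty results but keep duplicates,
    because later word frequency statistics depend on them."""
    cleaned = (clean_answer(a) for a in answers)
    return [a for a in cleaned if a]
```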
Step S2, constructing, based on the Q&A content, a word segmentation dictionary, a corpus, a term frequency-inverse document frequency (TF-IDF) model and a latent Dirichlet allocation (LDA) model, and saving them into a database;
First, each item of Q&A content is segmented using a predetermined segmentation algorithm to obtain its segmentation result. The corresponding segmentation dictionary is constructed based on the segmentation results, the corresponding corpus is generated based on the segmentation dictionary, the TF-IDF model is constructed based on the corpus, and the LDA model is constructed based on the TF-IDF model. In order to obtain a better dictionary and better models, the segmentation dictionary, corpus, TF-IDF model and LDA model are each saved after being iteratively trained a predetermined number of times (for example, 5000 times).
The predetermined segmentation algorithm may be a hidden Markov algorithm or another algorithm, for example: the forward maximum matching algorithm (taking m characters of the Q&A content from left to right as the matching field, where m is the length of the longest entry in the machine dictionary, and segmenting when a match succeeds); the reverse maximum matching algorithm (the reverse of the forward maximum matching algorithm); or the bidirectional maximum matching method (comparing the segmentation result obtained by forward maximum matching with the result obtained by reverse maximum matching to determine the correct segmentation). In a preferred embodiment, segmentation can also be performed using a long-word-priority rule. First, the Q&A content is split into short sentences at punctuation marks of preset types (for example, ","): the text from the start of the Q&A content to the first preset punctuation mark is one short sentence, the text from the first preset punctuation mark to the second is another short sentence, and so on. Each short sentence is then segmented using the long-word-priority principle, which is as follows: for a short sentence T1 to be segmented, starting from its first character A, the longest word X1 beginning with A is found in a pre-built dictionary; X1 is then removed from T1, leaving T2; the same rule is applied to T2; and the result after segmentation is "X1/X2/...".
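As a non-limiting illustration, the long-word-priority rule described above can be sketched as follows. The punctuation set, the fallback to a single character when no dictionary word matches, and the function names are assumptions made for illustration; the text does not specify them.

```python
import re


def split_short_sentences(text, punctuation=",.;!?，。；！？"):
    """Split Q&A content into short sentences at preset punctuation marks.

    ASSUMPTION: the default punctuation set is illustrative only.
    """
    parts = re.split("[" + re.escape(punctuation) + "]", text)
    return [part.strip() for part in parts if part.strip()]


def long_word_priority(sentence, dictionary):
    """Segment one short sentence, always taking the longest dictionary
    word that starts at the current position (X1, then X2, ...)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    pos = 0
    while pos < len(sentence):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + size]
            # ASSUMPTION: an unknown single character becomes its own token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens
```

For example, with the dictionary {"ab", "abc", "cd", "d"}, the short sentence "abcd" is segmented as "abc/d" rather than "ab/cd", because the longest word starting at the first character is taken first.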
A corresponding word segmentation library is constructed based on the word segmentation results. The word segmentation library has a form such as ("colleague", 4), ("opinion", 3), ("in line with reality", 1), ("communication", 2), where ("colleague", 4) indicates that the number of the token "colleague" in the word segmentation library is 4.
A corresponding corpus is generated based on the word segmentation library. The corpus records the number of times each token occurs in a question-answer content, i.e., the word frequency. The corpus has a form such as [(0, 2), (1, 1), (2, 1)], [(3, 1), (4, 1), (5, 1)]. Each pair of square brackets represents one question-answer content, and the brackets are separated by commas; (0, 2) means that the token numbered 0 occurs 2 times in this question-answer content.
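The token numbering and the per-content word counts can be illustrated with plain Python. The sketch below is only an assumption about how such structures might be built; the helper names `build_dictionary` and `to_bag_of_words` are hypothetical, and tokens are numbered in order of first appearance.

```python
from collections import Counter


def build_dictionary(tokenized_contents):
    """Assign each distinct token a number, in order of first appearance."""
    token_to_id = {}
    for tokens in tokenized_contents:
        for token in tokens:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id


def to_bag_of_words(tokens, token_to_id):
    """Return sorted (token number, count) pairs for one content.

    Tokens that are not in the dictionary are ignored.
    """
    counts = Counter(token_to_id[t] for t in tokens if t in token_to_id)
    return sorted(counts.items())


# Two toy tokenized contents (placeholder tokens, not from the source).
contents = [["a", "b", "a", "c"], ["d", "e", "f"]]
dictionary = build_dictionary(contents)
corpus = [to_bag_of_words(tokens, dictionary) for tokens in contents]
```

For these two toy contents the resulting corpus has the same shape as the example in the text: one list of (number, count) pairs per question-answer content.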
A term frequency-inverse document frequency (TF-IDF) model is constructed based on the corpus. The TF-IDF model consists of two parts. One part is TF (term frequency), which indicates the number of times a token occurs in one question-answer content, i.e., the word frequency. The other part is IDF (inverse document frequency), which reflects how many question-answer contents a token appears in. If a token has a high frequency TF in one question-answer content and rarely appears in other question-answer contents, the token is considered to have good class discrimination ability and to be suitable for classification. The TF-IDF model has a form such as [(0, 0.1469), (1, 0.2842), (2, 0.2561), (3, 0.1528)], where (0, 0.1469) indicates that the importance weight of the token numbered 0 for this question-answer content is 0.1469.
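As a rough illustration of how TF and IDF combine, the sketch below computes one common TF-IDF variant over a corpus of (token number, count) pairs. The exact weighting and normalization used by the invention are not stated in the text, so the formula here (relative term frequency multiplied by the natural logarithm of N divided by document frequency) and the function name `tfidf` are assumptions.

```python
import math


def tfidf(corpus):
    """Weight each (token number, count) pair by tf * log(N / df).

    corpus: list of contents, each a list of (token number, count) pairs.
    This is one textbook variant, assumed here for illustration only.
    """
    n_contents = len(corpus)

    # Document frequency: in how many contents each token appears.
    doc_freq = {}
    for content in corpus:
        for token_id, _ in content:
            doc_freq[token_id] = doc_freq.get(token_id, 0) + 1

    weighted = []
    for content in corpus:
        total = sum(count for _, count in content)
        weighted.append([
            (token_id, (count / total) * math.log(n_contents / doc_freq[token_id]))
            for token_id, count in content
        ])
    return weighted
```

Under this variant a token that appears in every content receives weight 0, while a token that is frequent in one content and absent elsewhere receives a high weight, matching the intuition given in the text.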
A latent Dirichlet allocation (LDA) model is constructed based on the TF-IDF model. The LDA model is a document topic generation model. It records the probability distribution of each question-answer content over the different topics, in the following form:
[(topic one), (topic two), (topic three)]
[(0,0.7188),(1,0.1550),(2,0.1260)]
[(0,0.2856),(1,0.6423),(2,0.0719)]
[(0,0.4189),(1,0.3004),(2,0.2806)]
Here, (0, 0.7188) indicates that the probability that question-answer content one belongs to topic one is 0.7188. Each row corresponds to one question-answer content (content one, content two and content three, in order), and within a row the indices 0, 1 and 2 denote topic one, topic two and topic three. By comparison, question-answer content one and question-answer content three have the highest probability of belonging to topic one, and question-answer content two has the highest probability of belonging to topic two.
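The comparison described above amounts to taking, for each question-answer content, the topic with the largest probability. A minimal sketch, assuming each row lists (topic index, probability) pairs for one content; the function name `dominant_topic` is hypothetical.

```python
def dominant_topic(topic_distribution):
    """Return the (topic index, probability) pair with the largest probability."""
    return max(topic_distribution, key=lambda pair: pair[1])


# The three example rows from the text, one per question-answer content.
rows = [
    [(0, 0.7188), (1, 0.1550), (2, 0.1260)],
    [(0, 0.2856), (1, 0.6423), (2, 0.0719)],
    [(0, 0.4189), (1, 0.3004), (2, 0.2806)],
]
dominant = [dominant_topic(row)[0] for row in rows]
```

Read this way, the first and third rows peak at index 0 and the second row peaks at index 1.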
Step S3: import the word segmentation library, corpus, TF-IDF model and LDA model from the database; segment the question-answer content to be scored based on the word segmentation library; count the word frequencies of the tokens of the question-answer content to be scored based on the corpus; input the word frequency statistics sequentially into the TF-IDF model and the LDA model; and obtain the output queue of historical question-answer contents that have the highest probability of belonging to the same topic as the question-answer content to be scored.
When the iteratively trained word segmentation library, corpus, TF-IDF model and LDA model are applied to a scoring scenario, they are imported and applied together. The question-answer content to be scored may first be cleaned (in the same way as the cleaning described above) and then segmented based on the word segmentation library. Word frequencies are counted based on the corpus, the word frequency statistics are input into the TF-IDF model, and the output of the TF-IDF model is then input into the LDA model. The LDA model outputs the queue of historical question-answer contents that have the highest probability of belonging to the same topic as the question-answer content to be scored.
Step S4: from the maximum-probability queue, select the question-answer contents whose similarity to the question-answer content to be scored is greater than or equal to a predetermined threshold, as the similar queue.
For example, the maximum-probability queue is (3, 0.8550), (4, 0.6423), (7, 0.9004), where 3 denotes question-answer content four, 4 denotes question-answer content five, and 7 denotes question-answer content eight. The probability of belonging to the same topic is used as the similarity, and the question-answer contents whose similarity to the question-answer content to be scored is greater than or equal to the predetermined threshold (for example, 0.85) are selected. If the threshold is 0.85, question-answer content four and question-answer content eight form the similar queue of the question-answer content to be scored.
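The selection in step S4 is a simple threshold filter. The sketch below reproduces the example with the numbers given in the text; the function name `select_similar` is hypothetical.

```python
def select_similar(queue, threshold):
    """Keep (content index, similarity) pairs whose similarity >= threshold."""
    return [(index, sim) for index, sim in queue if sim >= threshold]


# Maximum-probability queue and threshold taken from the example in the text.
max_probability_queue = [(3, 0.8550), (4, 0.6423), (7, 0.9004)]
similar_queue = select_similar(max_probability_queue, 0.85)
```

Indices 3 and 7 (question-answer contents four and eight) pass the threshold, while index 4 is dropped.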
Step S5: if the length of the similar queue is greater than or equal to 2, obtain the actual score corresponding to each question-answer content in the similar queue, and calculate the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue.
If the length of the similar queue is 1, the actual score of the single question-answer content in the similar queue is used as the predicted score of the question-answer content to be scored.
If the length of the similar queue is greater than or equal to 2, calculating the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue comprises:
where P_j is the predicted score of the question-answer content j to be scored; the formula also uses the mean of the actual scores corresponding to all question-answer contents in the similar queue; L is the length of the similar queue (an integer greater than or equal to 2); Sim(i, j) is the similarity between question-answer content i of the similar queue and the question-answer content j to be scored; and r_i is the actual score corresponding to question-answer content i of the similar queue.
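The prediction formula itself does not appear in this text, so the following Python sketch should be read as an assumption, not as the formula of the invention. It uses a mean-centred, similarity-weighted average, a form widely used in neighbourhood-based prediction that involves exactly the quantities listed above (the mean of the actual scores, L, Sim(i, j) and r_i). The function name `predict_score` and the handling of a zero weight sum are hypothetical.

```python
def predict_score(neighbors):
    """Predict a score from (similarity, actual score) pairs of the similar queue.

    Assumed form (not confirmed by the source):
        P_j = mean + sum(Sim(i, j) * (r_i - mean)) / sum(|Sim(i, j)|)
    A queue of length 1 returns its single actual score, as the text states.
    """
    if not neighbors:
        raise ValueError("the similar queue must not be empty")
    if len(neighbors) == 1:
        return neighbors[0][1]

    mean_score = sum(score for _, score in neighbors) / len(neighbors)
    weight_sum = sum(abs(sim) for sim, _ in neighbors)
    if weight_sum == 0:
        # No usable similarity information: fall back to the plain mean.
        return mean_score

    adjustment = sum(sim * (score - mean_score) for sim, score in neighbors) / weight_sum
    return mean_score + adjustment
```

With equal similarities the prediction reduces to the plain mean of the actual scores; a neighbour with higher similarity pulls the prediction towards its own score.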
Further, to evaluate the accuracy of the predicted scores, the actual scores of N question-answer contents to be scored may also be obtained, the mean absolute error (MAE) of the predicted scores of the N question-answer contents to be scored is calculated based on those actual scores, and the LDA model is evaluated based on the mean absolute error:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|P_j - r_j\right|$$
where P_j is the predicted score of the question-answer content j to be scored, r_j is the actual score corresponding to the question-answer content j to be scored, and N is an integer greater than or equal to 2.
The closer the mean absolute error is to 0, the higher the accuracy of the predicted scores, and the better the iterative training of the word segmentation library, corpus, TF-IDF model and LDA model.
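The mean absolute error is the average of the absolute differences between predicted and actual scores. A minimal sketch; the function name `mean_absolute_error` and the sample scores in the usage note are illustrative only.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)
```

For instance, predicted scores of 85 and 70 against actual scores of 80 and 75 give an error of 5; perfect predictions give 0.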
Compared with the prior art, the present invention first constructs a word segmentation library, a corpus, a TF-IDF model and an LDA model based on the massive existing question-answer contents of the written examination stage. In use, the question-answer content to be scored is segmented based on the word segmentation library, and word frequencies are counted based on the corpus. Finally, the word frequency statistics are input sequentially into the TF-IDF model and the LDA model to obtain the output queue of historical question-answer contents that have the highest probability of belonging to the same topic as the question-answer content to be scored. A more similar queue is selected from this queue, its corresponding actual scores are obtained, and the predicted score of the question-answer content to be scored is calculated based on those actual scores. The invention obtains its models through continuous repeated training on massive question-answer contents, removes the influence of the scorer's subjective thinking and preferences on the objectivity of scoring, ensures fair scoring, and saves time and effort.
The present invention also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the above method for predicting the score of question-answer content.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims (10)

1. A device for predicting the score of question-answer content, characterized in that the device comprises a memory and a processor connected to the memory, the memory stores a processing system executable on the processor, and the processing system, when executed by the processor, implements the following steps:
collecting historical question-answer contents of the written examination stage and the actual score corresponding to each question-answer content;
constructing a word segmentation library, a corpus, a term frequency-inverse document frequency model and a latent Dirichlet allocation model based on the question-answer contents, and saving them in a database;
importing the word segmentation library, the corpus, the term frequency-inverse document frequency model and the latent Dirichlet allocation model from the database; segmenting the question-answer content to be scored based on the word segmentation library; counting the word frequencies of the tokens of the question-answer content to be scored based on the corpus; inputting the word frequency statistics sequentially into the term frequency-inverse document frequency model and the latent Dirichlet allocation model; and obtaining the output queue of historical question-answer contents that have the highest probability of belonging to the same topic as the question-answer content to be scored;
selecting, from the maximum-probability queue, the question-answer contents whose similarity to the question-answer content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
if the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each question-answer content in the similar queue, and calculating the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue.
2. The device for predicting the score of question-answer content according to claim 1, characterized in that the step of calculating the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue specifically comprises:
where P_j is the predicted score of the question-answer content j to be scored; the formula also uses the mean of the actual scores corresponding to all question-answer contents in the similar queue; L is the length of the similar queue; Sim(i, j) is the similarity between question-answer content i of the similar queue and the question-answer content j to be scored; and r_i is the actual score corresponding to question-answer content i of the similar queue.
3. The device for predicting the score of question-answer content according to claim 2, characterized in that the processing system, when executed by the processor, further implements the following steps:
obtaining the actual scores of N question-answer contents to be scored, and calculating the mean absolute error (MAE) of the predicted scores of the N question-answer contents to be scored based on those actual scores, comprising:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|P_j - r_j\right|$$
where r_j is the actual score corresponding to the question-answer content j to be scored, and N is an integer greater than or equal to 2;
analyzing the accuracy of the predicted scores based on the mean absolute error.
4. The device for predicting the score of question-answer content according to any one of claims 1 to 3, characterized in that the step of constructing a word segmentation library, a corpus, a term frequency-inverse document frequency model and a latent Dirichlet allocation model based on the question-answer contents and saving them in a database specifically comprises:
segmenting each question-answer content using a predetermined word segmentation algorithm to obtain the word segmentation result of each question-answer content; constructing a corresponding word segmentation library based on the word segmentation results; generating a corresponding corpus based on the word segmentation library; constructing a term frequency-inverse document frequency model based on the corpus; constructing a latent Dirichlet allocation model based on the term frequency-inverse document frequency model; and saving the word segmentation library, the corpus, the term frequency-inverse document frequency model and the latent Dirichlet allocation model after each has been iteratively trained a predetermined number of times.
5. A method for predicting the score of question-answer content, characterized in that the method comprises:
S1: collecting historical question-answer contents of the written examination stage and the actual score corresponding to each question-answer content;
S2: constructing a word segmentation library, a corpus, a term frequency-inverse document frequency model and a latent Dirichlet allocation model based on the question-answer contents, and saving them in a database;
S3: importing the word segmentation library, the corpus, the term frequency-inverse document frequency model and the latent Dirichlet allocation model from the database; segmenting the question-answer content to be scored based on the word segmentation library; counting the word frequencies of the tokens of the question-answer content to be scored based on the corpus; inputting the word frequency statistics sequentially into the term frequency-inverse document frequency model and the latent Dirichlet allocation model; and obtaining the output queue of historical question-answer contents that have the highest probability of belonging to the same topic as the question-answer content to be scored;
S4: selecting, from the maximum-probability queue, the question-answer contents whose similarity to the question-answer content to be scored is greater than or equal to a predetermined threshold, as a similar queue;
S5: if the length of the similar queue is greater than or equal to 2, obtaining the actual score corresponding to each question-answer content in the similar queue, and calculating the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue.
6. The method for predicting the score of question-answer content according to claim 5, characterized in that the step of calculating the predicted score of the question-answer content to be scored based on the actual scores corresponding to the question-answer contents in the similar queue specifically comprises:
where P_j is the predicted score of the question-answer content j to be scored; the formula also uses the mean of the actual scores corresponding to all question-answer contents in the similar queue; L is the length of the similar queue; Sim(i, j) is the similarity between question-answer content i of the similar queue and the question-answer content j to be scored; and r_i is the actual score corresponding to question-answer content i of the similar queue.
7. The method for predicting the score of question-answer content according to claim 6, characterized in that, after step S5, the method further comprises:
obtaining the actual scores of N question-answer contents to be scored, and calculating the mean absolute error (MAE) of the predicted scores of the N question-answer contents to be scored based on those actual scores, comprising:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|P_j - r_j\right|$$
where r_j is the actual score corresponding to the question-answer content j to be scored, and N is an integer greater than or equal to 2;
analyzing the accuracy of the predicted scores based on the mean absolute error.
8. The method for predicting the score of question-answer content according to any one of claims 5 to 7, characterized in that the step of constructing a word segmentation library, a corpus, a term frequency-inverse document frequency model and a latent Dirichlet allocation model based on the question-answer contents and saving them in a database specifically comprises:
segmenting each question-answer content using a predetermined word segmentation algorithm to obtain the word segmentation result of each question-answer content; constructing a corresponding word segmentation library based on the word segmentation results; generating a corresponding corpus based on the word segmentation library; constructing a term frequency-inverse document frequency model based on the corpus; constructing a latent Dirichlet allocation model based on the term frequency-inverse document frequency model; and saving the word segmentation library, the corpus, the term frequency-inverse document frequency model and the latent Dirichlet allocation model after each has been iteratively trained a predetermined number of times.
9. The method for predicting the score of question-answer content according to claim 8, characterized in that the predetermined word segmentation algorithm is a hidden Markov model algorithm.
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and the processing system, when executed by a processor, implements the steps of the method for predicting the score of question-answer content according to any one of claims 5 to 9.
CN201910185054.2A 2019-03-12 2019-03-12 Device, method and storage medium for predicting scoring of question-answer content Active CN110069772B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910185054.2A CN110069772B (en) 2019-03-12 2019-03-12 Device, method and storage medium for predicting scoring of question-answer content
PCT/CN2019/116548 WO2020181800A1 (en) 2019-03-12 2019-11-08 Apparatus and method for predicting score for question and answer content, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910185054.2A CN110069772B (en) 2019-03-12 2019-03-12 Device, method and storage medium for predicting scoring of question-answer content

Publications (2)

Publication Number Publication Date
CN110069772A true CN110069772A (en) 2019-07-30
CN110069772B CN110069772B (en) 2023-10-20

Family

ID=67366178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910185054.2A Active CN110069772B (en) 2019-03-12 2019-03-12 Device, method and storage medium for predicting scoring of question-answer content

Country Status (2)

Country Link
CN (1) CN110069772B (en)
WO (1) WO2020181800A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181800A1 (en) * 2019-03-12 2020-09-17 平安科技(深圳)有限公司 Apparatus and method for predicting score for question and answer content, and storage medium
CN113342942A (en) * 2021-08-02 2021-09-03 平安科技(深圳)有限公司 Corpus automatic acquisition method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621391A (en) * 2009-08-07 2010-01-06 北京百问百答网络技术有限公司 Method and system for classifying short texts based on probability topic
US20150254565A1 (en) * 2014-03-07 2015-09-10 Educational Testing Service Systems and Methods for Constructed Response Scoring Using Metaphor Detection

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559894B (en) * 2013-11-08 2016-04-20 科大讯飞股份有限公司 Oral evaluation method and system
CN107133238A (en) * 2016-02-29 2017-09-05 阿里巴巴集团控股有限公司 A kind of text message clustering method and text message clustering system
CN108153876B (en) * 2017-12-26 2021-07-23 爱因互动科技发展(北京)有限公司 Intelligent question and answer method and system
CN108415980A (en) * 2018-02-09 2018-08-17 平安科技(深圳)有限公司 Question and answer data processing method, electronic device and storage medium
CN108595427B (en) * 2018-04-24 2021-06-08 成都海天数联科技有限公司 Subjective question scoring method and device, readable storage medium and electronic equipment
CN108960574A (en) * 2018-06-07 2018-12-07 百度在线网络技术(北京)有限公司 Quality determination method, device, server and the storage medium of question and answer
CN110069772B (en) * 2019-03-12 2023-10-20 平安科技(深圳)有限公司 Device, method and storage medium for predicting scoring of question-answer content

Also Published As

Publication number Publication date
WO2020181800A1 (en) 2020-09-17
CN110069772B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant