WO2020147238A1 - Keyword determination method, automatic scoring method, apparatus and device, and medium - Google Patents

Info

Publication number
WO2020147238A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
keywords
test site
keyword
decision tree
Application number
PCT/CN2019/088544
Other languages
French (fr)
Chinese (zh)
Inventor
金戈 (Jin Ge)
徐亮 (Xu Liang)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020147238A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • This application relates to the field of intelligent decision-making, and in particular to a keyword determination method, an automatic scoring method, and corresponding apparatuses, devices, and storage media.
  • the conventional way to score test takers' answers to subjective questions is to manually establish, in advance, the test sites (the knowledge points being examined) and their related keywords from the grading rule information, and then to identify the content of each answer by regular-expression matching against those test sites and keywords in order to score it.
  • test sites and keywords determined this way have both low generalization ability and low accuracy. As a result, the final scores assigned to test takers' answers are biased and cannot reflect the examinees' true level.
  • the embodiments of the present application provide a keyword determination method, apparatus, device, and storage medium to solve the problem of low keyword generalization ability and low accuracy.
  • the embodiments of the present application also provide an automatic scoring method, apparatus, device, and storage medium to solve the problem that test takers' answers cannot be scored efficiently and accurately.
  • a keyword determination method includes:
  • obtaining N first sample answer data, where each of the first sample answer data includes sample answer information and a first score value, and N is a positive integer;
  • using the sample word segmentation set to perform feature conversion on the sample answer information of each of the first sample answer data to obtain sample training features;
  • An automatic scoring method includes:
  • using the target test site to perform feature transformation on the core keywords to obtain the features of the test site to be scored, wherein the target test site is obtained by the keyword determination method of claim 2;
  • inputting the features of the test site to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • a keyword determining device includes:
  • the first sample answer data acquisition module is used to acquire N first sample answer data, each of the first sample answer data includes sample answer information and a first score value, and N is a positive integer;
  • the word segmentation processing module is configured to perform word segmentation processing on the sample answer information of each of the first sample answer data to obtain the sample word segmentation of each of the first sample answer data;
  • the total vocabulary module is used to aggregate the sample word segmentation of each of the first sample answer data to obtain a sample word segmentation set;
  • a sample feature conversion module is configured to use the sample word segmentation set to perform feature conversion on the sample answer information of each of the first sample answer data to obtain sample training features;
  • the decision tree sample model training module is used to train the decision tree model according to the sample training feature and the corresponding first score value to obtain the decision tree sample model;
  • the sample keyword extraction module is used to extract sample keywords from the decision tree sample model.
  • An automatic scoring device including:
  • the to-be-scored answer information acquisition module is used to obtain the answer information to be scored;
  • the keyword extraction module is used to extract keywords from the answer information to be scored to obtain core keywords;
  • the to-be-scored test site feature conversion module is used to perform feature transformation on the core keywords with the target test site to obtain the features of the test site to be scored, wherein the target test site is obtained by the keyword determination method according to claim 2;
  • the input module is used to input the features of the test site to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, it implements the steps of the above keyword determination method or the steps of the above automatic scoring method.
  • a computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the above keyword determination method or the steps of the above automatic scoring method.
  • FIG. 1 is a schematic diagram of an application environment of the keyword determination method or the automatic scoring method in an embodiment of the present application.
  • FIG. 2 is an example diagram of the keyword determination method in an embodiment of the present application.
  • FIG. 3 is another example diagram of the keyword determination method in an embodiment of the present application.
  • FIG. 4 is a functional block diagram of a keyword determination device in an embodiment of the present application.
  • FIG. 5 is another functional block diagram of the keyword determination device in an embodiment of the present application.
  • FIG. 6 is an example diagram of the automatic scoring method in an embodiment of the present application.
  • FIG. 7 is another example diagram of the automatic scoring method in an embodiment of the present application.
  • FIG. 8 is another example diagram of the automatic scoring method in an embodiment of the present application.
  • FIG. 9 is a functional block diagram of an automatic scoring device in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the embodiment of the present application provides a method for determining keywords, and the method for determining keywords can be applied to the application environment shown in FIG. 1.
  • the keyword determination method is applied in a keyword determination system.
  • the keyword determination system includes the client and the server shown in FIG. 1.
  • the client and the server communicate over a network; together they address the low generalization ability and low accuracy of test-site keywords determined from scoring rule information.
  • the client, also called the user end, is the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablets, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • a method for determining keywords is provided.
  • taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:
  • S11 Obtain N first sample answer data, where each first sample answer data includes sample answer information and a first score value, and N is a positive integer.
  • the first sample answer data refers to the test taker's answer data.
  • Each first sample answer data includes sample answer information and the corresponding first score value obtained by preliminarily scoring that sample answer information.
  • the sample answer information refers to the candidate's answer information of a certain subjective question obtained from the answer text of the scoring system.
  • the first sample answer data can be obtained from a scoring system. The scoring system can perform preliminary scoring on sample answer information and obtain the first scoring value.
  • alternatively, the answer information written by examinees on paper answer sheets can be collected in advance, scanned, and recognized to generate the corresponding answer text, which is submitted to the scoring system to obtain the sample answer information.
  • the first scoring value refers to the scoring value obtained after preliminary scoring of the sample answer information by manual scoring or computer scoring.
  • the first sample answer data may also be obtained by scanning and identifying the answer information written on the paper answer sheet and manual scoring.
  • the first sample answer data may include one piece of sample answer information and the first score value obtained by preliminarily scoring it, or multiple pieces of sample answer information and the first score values obtained by preliminarily scoring each of them.
  • the number of the first sample answer data obtained is N, where N is a positive integer.
  • the specific value of N can be set according to actual needs. The larger N is, the more accurate the subsequent sample keyword extraction, but the lower the extraction efficiency, so N should be chosen by weighing accuracy against efficiency.
  • S12 Perform word segmentation processing on sample answer information of each first sample answer data to obtain sample segmentation of each first sample answer data.
  • the sample word segmentation refers to the individual words obtained by performing word segmentation on the sample answer information of each first sample answer data.
  • performing word segmentation processing on the sample answer information of each first sample answer data includes: first adopting a word segmentation algorithm to perform vocabulary splitting on the sample answer information of each first sample answer data.
  • the word segmentation algorithm may adopt a word segmentation algorithm based on string matching, or a word segmentation algorithm based on understanding, or a word segmentation algorithm based on statistics.
  • the sample answer information of each first sample answer data can also be split automatically with the split function of the Java language, or by importing it into software with automatic character splitting, such as Excel or PowerPoint. The split sample answer information is then filtered with Java regular expressions to remove specific words that carry no meaning, such as auxiliary words, modal particles, and conjunctions, which finally yields the sample word segmentation of each first sample answer data.
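The string-matching option mentioned above can be illustrated with forward maximum matching, a classic dictionary-based segmentation technique. This Python sketch is not the embodiment's Java implementation: the dictionary, the stop-word list, and the function names are hypothetical, and a real system would load a full lexicon. It segments a Chinese rendering of the Du Fu example used later in this description, then drops stop words and punctuation with a regular expression.

```python
import re

# Hypothetical dictionary and stop-word list, for illustration only.
DICTIONARY = {"杜甫", "唐代", "现实主义", "诗人", "伟大"}
STOP_WORDS = {"是", "的", "了", "和"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Split text by forward maximum matching against a dictionary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def segment_answer(text):
    """Segment one answer, then remove punctuation and stop words."""
    tokens = forward_max_match(text)
    # In Python 3 str patterns, \w also matches CJK characters, so only
    # punctuation such as "。" fails this check.
    return [t for t in tokens if re.fullmatch(r"\w+", t) and t not in STOP_WORDS]
```

For example, `segment_answer("杜甫是唐代伟大的现实主义诗人。")` keeps the dictionary words and discards "是", "的", and the full stop.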
  • S13 Aggregate the sample word segmentation of each first sample answer data to obtain a sample word segmentation set.
  • the sample word segmentation set is the set of words obtained by aggregating the sample word segmentation of all first sample answer data. Specifically, the sample word segmentation of each first sample answer data is obtained first, and these sample words are then aggregated into the sample word segmentation set. Preferably, if a first sample answer data item contains multiple pieces of sample answer information, aggregation is performed per piece of sample answer information, so that the sample word segmentation set corresponds to each piece of sample answer information.
  • aggregating the sample word segmentation of each first sample answer data includes: obtaining the sample words of the sample answer information in each first sample answer data, assigning identification numbers in ascending order to all sample words of that answer information, and finally obtaining a sample word segmentation set ordered by identification number from smallest to largest.
  • the sample word segmentation of each first sample answer data may also be deduplicated first, and the deduplicated sample words are then aggregated to obtain the sample word segmentation set.
  • a Count function, a text editor, or the R language can be used to deduplicate the sample word segmentation of each first sample answer data.
  • alternatively, the sample word segmentation of each first sample answer data can be imported directly into an Excel spreadsheet and deduplicated automatically with Excel's Advanced Filter feature.
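The aggregation, numbering, and deduplication steps can be sketched in a few lines of Python. This is an illustrative assumption rather than the embodiment's tooling (Count, R, or Excel): words receive identification numbers in ascending order of first appearance, and a repeated word keeps the number it received first.

```python
def build_word_set(segmented_answers):
    """Aggregate per-answer word lists into one deduplicated, numbered word set.

    Identification numbers are assigned in ascending order of first appearance.
    Python dicts preserve insertion order, so the result is already ordered
    from the smallest identification number to the largest.
    """
    word_ids = {}
    for words in segmented_answers:
        for word in words:
            if word not in word_ids:  # deduplication: skip words already numbered
                word_ids[word] = len(word_ids)
    return word_ids
```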
  • S14 Use the sample word segmentation set to perform feature transformation on the sample answer information of each first sample answer data to obtain sample training features.
  • the sample training feature refers to the result output after the feature conversion of the sample answer information of each first sample answer data.
  • specifically, the sample word segmentation set is used to build a bag-of-words model that converts the sample answer information of each first sample answer data into sample training features.
  • the bag-of-words model records which words of the sample word segmentation set appear in the sample answer information of each first sample answer data.
  • the bag-of-words model can be built with CountVectorizer in scikit-learn (sklearn).
  • CountVectorizer is a common feature-extraction method. For each training text, it considers only how often each vocabulary word occurs in that text. It converts documents into count vectors and learns the vocabulary from the training texts; the learned vocabulary defines the vector space (scikit-learn stores it in the fitted vectorizer's vocabulary_ attribute, whereas Spark MLlib's CountVectorizer produces a CountVectorizerModel).
  • using the sample word segmentation set to perform feature transformation on the sample answer information of each first sample answer data includes: first establishing a word vector whose length equals the number of words in the sample word segmentation set, and then using regular matching to match the sample answer information of each first sample answer data against every sample word in the sample word segmentation set. If the sample answer information matches a sample word, the corresponding element of the word vector is 1; if it does not match, the corresponding element is 0. The resulting vector of 1s and 0s is the sample training feature.
  • for example, suppose a sample word segmentation set containing five sample words B1, B2, C1, C2, and C3 and two pieces of sample answer information, B and C, are obtained, where sample answer information B contains the two words B1 and B2 and sample answer information C contains the three words C1, C2, and C3. Transforming sample answer information B with this sample word segmentation set yields the sample training feature [1,1,0,0,0], and transforming sample answer information C yields [0,0,1,1,1].
  • regular matching is the application of regular expressions to test whether text matches a pattern.
  • a regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses filtering logic over strings.
  • a regular expression is a text pattern that describes one or more strings to be matched when searching for text.
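A minimal Python sketch of the binary bag-of-words conversion, assuming the vocabulary is a list ordered by identification number and that matching is a plain regular-expression substring search. It reproduces the B/C example above. scikit-learn's CountVectorizer with binary=True and a fixed vocabulary gives a comparable 0/1 encoding, but it tokenizes with its own token_pattern instead of searching substrings, so its output can differ, for example on unsegmented Chinese text.

```python
import re


def to_binary_features(answer_text, vocabulary):
    """One 0/1 element per vocabulary word: 1 if the word occurs in the answer.

    vocabulary: the sample word segmentation set as a list ordered by
    identification number. re.escape keeps regex metacharacters literal.
    """
    return [1 if re.search(re.escape(word), answer_text) else 0 for word in vocabulary]
```

Because `re.search` matches substrings, a short word can also match inside a longer one; matching against the segmented word list instead of the raw text avoids this.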
  • S15 Train a decision tree model according to the sample training features and the corresponding first score values to obtain a decision tree sample model.
  • the decision tree sample model is the model generated by training the decision tree model on the bag-of-words sample training features and the corresponding first score values.
  • the decision tree sample model is built by inputting the sample training features and the corresponding first score values into the decision tree model and training it with the C4.5 algorithm to generate the trained decision tree sample model.
  • C4.5 is one of a family of algorithms used for classification problems in machine learning and data mining.
  • C4.5 performs supervised learning: it is given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes.
  • the C4.5 algorithm can find a mapping relationship from attribute values to categories through learning, and this mapping can be used to classify new entities with unknown categories.
  • the size of the decision tree sample model is determined by the depth of the decision tree and the number of samples at each node.
  • specifically, the maximum depth of the decision tree is set to 5, the minimum number of samples per leaf node is set to 50, and the split criterion is entropy.
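C4.5 chooses each split by gain ratio, an entropy-based criterion. The Python sketch below computes it for the 0/1 sample training features and picks the best feature subject to a minimum leaf size; it is a hypothetical illustration of the split step only, not a full C4.5 implementation (no recursion down to the maximum depth of 5, no pruning, no continuous attributes). If scikit-learn is used instead, `DecisionTreeClassifier(criterion="entropy", max_depth=5, min_samples_leaf=50)` matches the stated settings, but scikit-learn implements an optimized CART with binary splits rather than C4.5.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels such as score values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(column, labels):
    """C4.5-style gain ratio for splitting `labels` on one 0/1 feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus size-weighted child entropy.
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalises splits that fragment the data.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(features, labels, min_samples_leaf=50):
    """Index of the feature with the highest positive gain ratio.

    features: list of 0/1 vectors, one per sample answer. A feature is only
    considered if both branches keep at least `min_samples_leaf` samples;
    returns None when no feature qualifies.
    """
    best, best_score = None, 0.0
    for j in range(len(features[0])):
        column = [row[j] for row in features]
        ones = sum(column)
        if min(ones, len(column) - ones) < min_samples_leaf:
            continue
        score = gain_ratio(column, labels)
        if score > best_score:
            best, best_score = j, score
    return best
```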
  • S16 Extract sample keywords from the decision tree sample model.
  • the sample keywords are the feature attribute values corresponding to the output nodes of the decision tree sample model.
  • extracting sample keywords is also called feature value extraction from the decision tree sample model. Each feature of the decision tree sample model is a decision attribute of the model, so each feature value corresponds to a branch of that decision attribute; in other words, the output node of each branch has a corresponding sample keyword.
  • extracting sample keywords from the decision tree sample model can be achieved by first reading the decision tree sample model as a source-able object, then converting it to code with the tosource method, analyzing the structure of that code to find the sample keywords output by the model, and finally extracting them.
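How the sample keywords are read out depends on how the model is stored; the embodiment uses a tosource-based code analysis. As an alternative under stated assumptions, the Python sketch below represents a trained tree as nested dicts (a hypothetical format) and collects the vocabulary word tested at each internal node. With a scikit-learn tree, the fitted model's `tree_.feature` array holds the split feature index of each node (negative for leaves), and those indices can be mapped to words with the vectorizer's `get_feature_names_out()`.

```python
def extract_split_words(node, vocabulary, found=None):
    """Collect the vocabulary words tested at the internal nodes of a tree.

    Hypothetical node format: a leaf is {"score": value}; an internal node is
    {"feature": index, "absent": subtree, "present": subtree}, where `feature`
    indexes into `vocabulary` (the sample word segmentation set).
    """
    if found is None:
        found = []
    if "feature" in node:
        word = vocabulary[node["feature"]]
        if word not in found:  # each keyword is reported once
            found.append(word)
        extract_split_words(node["absent"], vocabulary, found)
        extract_split_words(node["present"], vocabulary, found)
    return found
```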
  • in this embodiment, N first sample answer data are obtained, each including sample answer information and a first score value; word segmentation is performed on the sample answer information of each first sample answer data; the resulting sample words are aggregated into a sample word segmentation set, which is used to perform feature transformation on the sample answer information to obtain sample training features; a decision tree model is trained on the sample training features and the corresponding first score values to obtain a decision tree sample model; and sample keywords are extracted from that model. This improves the generalization ability and accuracy of test-site keywords, makes the keywords more comprehensive, and improves the accuracy of subsequent scoring.
  • the method for determining the keywords further includes the following steps:
  • S17 Obtain scoring rule information, where the scoring rule information includes preset test sites and preset keywords corresponding to each preset test site.
  • the scoring rule information is the basic scoring criteria provided by the business party, including preset test sites and the preset keywords corresponding to each preset test site.
  • a preset test site is a knowledge point provided by the business party for judging whether a test taker's answer information is correct.
  • the preset test sites include wrong test sites, used to identify candidates' wrong answers, and correct test sites, used to identify candidates' correct answers.
  • the scoring rule information is only a preliminary scoring standard, so its keywords may be inaccurate or incomplete.
  • the preset test site can be a word, a sentence, or a paragraph.
  • each preset test site may be given a different identifier in advance.
  • each preset test site may be represented by at least one of Arabic numerals, English capital letters, or English lowercase letters.
  • Each preset test site contains corresponding preset keywords.
  • the preset keywords refer to words that are extracted from the preset test sites and can be directly used for rule quantification. Understandably, a preset test site contains at least one preset keyword.
  • for example, preset test site 1 is "Du Fu is a great realist poet of the Tang Dynasty"; the preset keywords corresponding to preset test site 1 can be "Du Fu", "Tang Dynasty", "realism", and "poet".
  • S18 Remove the keywords that repeat the preset keywords from the sample keywords to obtain target keywords.
  • the target keywords are the keywords among the sample keywords that differ from the preset keywords.
  • S19 Send the target keywords to the client, and obtain the test site labels returned by the client according to the target keywords.
  • a test site label assigns to an acquired target keyword the identification number of the corresponding preset test site.
  • the user can analyze the acquired target keywords and, based on the preset test sites, assign each target keyword the same identification number as its corresponding preset test site to obtain the test site labels, which are then sent to the server.
  • the test site labels of all target keywords may be generated together as a single test site label text and then sent to the server.
  • S20 Add each target keyword to the corresponding preset test site according to the test site labels to obtain target test sites.
  • a target test site is a preset test site after the target keywords have been added to it.
  • after receiving the test site label text from the client, the server adds each target keyword to the preset test site that has the same identification number, according to the identification number of each target keyword in the test site label text.
  • the keywords contained in the target test site are richer and more comprehensive than the keywords contained in the preset test site.
  • for example, preset test site 1 includes three preset keywords a1, a2, and a3, and preset test site 2 includes three preset keywords b1, b2, and b3. The target keywords obtained in step S18 are a4, a5, b4, and b5; target keywords a4 and a5 are assigned test site label 1, and target keywords b4 and b5 are assigned test site label 2.
  • adding a4 and a5 to preset test site 1 and b4 and b5 to preset test site 2 yields target test site 1, whose keywords are a1, a2, a3, a4, and a5, and target test site 2, whose keywords are b1, b2, b3, b4, and b5.
  • in this embodiment, scoring rule information is obtained, which includes preset test sites and the preset keywords corresponding to each preset test site. The keywords that repeat the preset keywords are removed from the sample keywords to obtain target keywords; the target keywords are sent to the client, and the test site labels returned by the client according to the target keywords are obtained; finally, each target keyword is added to the corresponding preset test site according to the test site labels to obtain the target test sites. This further enriches the keywords contained in the test sites determined from the scoring rule information.
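Steps S18 to S20 amount to a set difference followed by a grouped merge. The following Python sketch reproduces the a/b example above; the dictionary shapes used for test sites and client labels are assumptions made for illustration, not a data format defined by the embodiment.

```python
def build_target_test_sites(preset_sites, sample_keywords, labels):
    """Sketch of steps S18 to S20 (hypothetical data shapes).

    preset_sites: {site_id: [preset keywords]}
    sample_keywords: keywords extracted from the decision tree sample model
    labels: {target keyword: site_id}, as returned by the client in S19
    """
    preset = {kw for kws in preset_sites.values() for kw in kws}
    # S18: drop sample keywords that repeat a preset keyword.
    target_keywords = [kw for kw in sample_keywords if kw not in preset]
    # S20: append each labelled target keyword to the site with the same ID.
    sites = {site_id: list(kws) for site_id, kws in preset_sites.items()}
    for kw in target_keywords:
        site_id = labels.get(kw)
        if site_id in sites and kw not in sites[site_id]:
            sites[site_id].append(kw)
    return target_keywords, sites
```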
  • a keyword determining device is provided, which corresponds one-to-one to the keyword determination method in the foregoing embodiment.
  • the keyword determining device includes a first sample answer data acquisition module 11, a word segmentation processing module 12, a total vocabulary module 13, a sample feature conversion module 14, a decision tree sample model training module 15, and a sample keyword extraction module 16.
  • the detailed description of each functional module is as follows:
  • the first sample answer data acquisition module 11 is used to acquire N first sample answer data, each first sample answer data includes sample answer information and a first score value, and N is a positive integer;
  • the word segmentation processing module 12 is used to perform word segmentation processing on the sample answer information of each first sample answer data to obtain the sample word segmentation of each first sample answer data;
  • the total vocabulary module 13 is used to aggregate the sample word segmentation of each first sample answer data to obtain a sample word segmentation set;
  • the sample feature conversion module 14 is used to use the sample word segmentation set to perform feature conversion on the sample answer information of each first sample answer data to obtain sample training features;
  • the decision tree sample model training module 15 is used to train the decision tree model according to the sample training features and the corresponding first score values to obtain the decision tree sample model;
  • the sample keyword extraction module 16 is used to extract sample keywords from the decision tree sample model.
  • the keyword determining device further includes:
  • the scoring rule information obtaining module 17 is used to obtain scoring rule information, the scoring rule information includes preset test sites and preset keywords corresponding to each preset test site;
  • the repetitive keyword removal module 18 is used to remove keywords that are repeated with preset keywords from the sample keywords to obtain target keywords;
  • the test site label acquisition module 19 is used to send the target keywords to the client and obtain the test site labels returned by the client according to the target keywords;
  • the target keyword adding module 20 is used to add each target keyword to the corresponding preset test site according to the test site labels to obtain the target test sites.
  • the modules of the above keyword determining device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and execute the corresponding operations.
  • the embodiment of the present application also provides an automatic scoring method, which can be applied in the application environment shown in FIG. 1.
  • the automatic scoring method is applied in an automatic scoring system.
  • the automatic scoring system includes a client and a server as shown in FIG. 1.
  • the client and the server communicate over a network to solve the problem that test takers' answers cannot be scored efficiently and accurately.
  • the client, also called the user end, is the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablets, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • an automatic scoring method is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S21 Obtain answer information to be scored.
  • the answer information to be scored is the answer information obtained from a test taker's answer text.
  • the answer information of any examinee can be obtained directly from the answer text in the scoring system, or by scanning and recognizing the answer information the examinee wrote on a paper answer sheet.
  • S22 Perform keyword extraction on the answer information to be scored to obtain core keywords.
  • the core keywords are the individual keywords extracted from the obtained answer information to be scored.
  • performing keyword extraction on the answer information to be scored includes: first using a word segmentation algorithm to split the answer information to be scored into words.
  • the word segmentation algorithm may be based on string matching, on understanding, or on statistics.
  • the answer information to be scored can also be split automatically with the split function of the Java language, or by importing it into software with automatic character splitting, such as Excel or PowerPoint.
  • Java regular expressions are then used to filter the split answer information to be scored, removing specific words that carry no meaning, such as auxiliary words, modal particles, and conjunctions.
  • the words obtained after screening are extracted as core keywords.
  • the number of core keywords should be no less than one.
  • S23 Use the target test site to perform feature transformation on the core keywords to obtain the features of the test site to be scored; wherein the target test site is obtained by using the above keyword determination method.
  • the feature of the test site to be scored refers to a feature that measures the similarity between the core keywords and the keywords in the target test site.
  • the target test point is obtained by using the method for determining keywords in the above embodiment.
  • using the target test sites to perform feature transformation on the core keywords includes: first establishing a test site vector whose length equals the number of target test sites, and then matching each core keyword against the keywords in each target test site by regular matching to determine, from the matching results, whether the core keywords match that target test site. Specifically, the decision can be based on the degree of matching between the core keywords and the keywords contained in the target test site: the core keywords may be considered to match a target test site as soon as a core keyword matches any one of its keywords, or only when the core keywords match at least two of its keywords.
  • the specific settings can be customized according to the actual situation.
  • the core keyword matches any keyword in the target test site successfully it means that the core keyword matches the target test site, and the corresponding element value in the test site vector is 1, if the core keyword matches the target test site If none of the keywords in the test site match, it means that the core keyword fails to match the target test site, and the corresponding element value in the test site vector is 0.
  • a set of test site vectors consisting of a number of 1s and 0s is obtained, that is, the characteristics of test sites to be scored.
  • S24: Input the test site features to be scored into the preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • The accurate score is the score that the trained decision tree reference model produces for the answer information to be scored.
  • The decision tree reference model is built in advance and stored in the server's back-end database. Once step S23 has produced the test site features to be scored, the model can be retrieved directly from the server's database.
  • A decision tree, in general, is a method that builds a tree from the known probabilities of various outcomes in order to compute the probability that the expected net present value is greater than or equal to zero. Structurally it is a tree in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a category.
  • In this embodiment, the answer information to be scored is obtained and keywords are extracted from it to obtain core keywords; the target test sites, obtained by the keyword determination method above, are used to convert the core keywords into the test site features to be scored; finally, these features are input into the preset decision tree reference model to obtain the accurate score of the answer information to be scored. This enables efficient and accurate scoring of candidates' answers.
  • Performing feature conversion on the core keywords with the target test sites to obtain the test site features to be scored includes the following steps:
  • Effective keywords are all the keywords contained in a target test site.
  • Because the keywords of each target test site have already been determined, the effective keywords can be obtained directly from each target test site.
  • Matching the effective keywords with the core keywords one by one means defining the effective keywords as specific characters and combining these characters into a "rule string" that expresses a filtering logic for the core keywords; the core keywords corresponding to the effective keywords are thereby matched, yielding the keyword matching information.
  • The keyword matching information is the result of matching the effective keywords against the core keywords: either a successful match or a failed match.
  • The effective keywords are matched with the core keywords one by one, and the keyword matching information is obtained from the matching results. For example, suppose 10 core keywords and 5 effective keywords are obtained. A core keyword is taken and matched against each of the 5 effective keywords by regular matching.
  • During matching, if the core keyword matches any one of the 5 effective keywords, the match succeeds.
  • If the core keyword matches none of the 5 effective keywords, the match fails. The core keywords are taken one at a time and matched against the 5 effective keywords in this way until all 10 core keywords have been matched, which yields the keyword matching information.
  • S233: Assign a matching identifier to each core keyword according to the keyword matching information.
  • The matching identifier is an identifier assigned to each core keyword according to the keyword matching information; it can be an Arabic numeral, an uppercase letter, or a lowercase letter.
  • The matching identifier reflects whether the core keyword matched the effective keywords of the target test sites.
  • Because the test site corresponding to each effective keyword must be identified, a core keyword that successfully matches an effective keyword is assigned a matching identifier together with the identifier of the test site corresponding to that effective keyword. This scheme places no restriction on the specific form of the matching identifier.
  • For example, a core keyword that successfully matches an effective keyword is given an uppercase-letter identifier plus the corresponding test site identifier, such as A1, where the uppercase letter A indicates a successful match with an effective keyword and 1 is the identifier of the test site corresponding to that effective keyword.
  • A core keyword that fails to match any effective keyword is given only a lowercase-letter identifier, such as a, where the lowercase letter a indicates a failed match.
  • S234: Obtain the test site features to be scored according to the matching identifier of each core keyword.
  • The matching identifier of each core keyword shows whether the core keyword successfully matches the corresponding target test site. If it does, the corresponding element of the test site vector is 1; if it does not, the corresponding element is 0. The result is a test site vector of 1s and 0s, i.e., the test site features to be scored.
  • For example, suppose there are six target test sites (one vector element each), each containing at least one effective keyword, and five core keywords. After the five core keywords are matched one by one against the effective keywords of the target test sites by regular matching, only the first three target test sites are matched successfully, so the test site feature to be scored is [1,1,1,0,0,0].
  • In this embodiment, the effective keywords of the target test sites are obtained; the effective keywords are matched one by one with the core keywords by regular matching to obtain keyword matching information; each core keyword is assigned a matching identifier according to this information; and finally the test site features to be scored are obtained from the matching identifiers. This further ensures the accuracy and effectiveness of the newly added test site keywords.
  • The automatic scoring method further includes:
  • The second sample answer data are candidates' answer data.
  • Each second sample answer data item includes original answer information and a second score value, i.e., the score obtained by preliminarily grading the original answer information.
  • The second sample answer data can be obtained from a scoring system.
  • The scoring system can preliminarily score the original answer information to obtain the second score value.
  • The original answer information is a candidate's answer to a subjective question, obtained from the answer text in the scoring system.
  • The second score value is the score obtained in advance by preliminarily grading the original answer information, either manually or by computer.
  • the number of the second sample answer data obtained is M, where M is a positive integer.
  • The specific value of M can be set according to actual needs: a larger M makes the subsequent decision tree reference model more accurate but lowers extraction efficiency, so M should be chosen by weighing accuracy against efficiency.
  • S242: Perform feature conversion on the original answer information of each second sample answer data item using the target test sites to obtain test site training features.
  • The test site training feature is a feature that measures the similarity between the target test sites and the original answer information of each second sample answer data item.
  • The target test sites are obtained by the keyword determination method described above.
  • Feature conversion of the original answer information of each second sample answer data item with the target test sites works as follows. First, an empty test site vector with one element per target test site is established. Then the original answer information of each second sample answer data item is compared with the target test sites using the semantic codes of the synonym word forest (Tongyici Cilin). If the original answer information successfully matches a target test site, the corresponding element of the test site vector is 1; if it does not match that target test site, the element is 0. The result is a test site vector of 1s and 0s, i.e., the test site training feature.
  • the synonym word forest semantic code is a method used to calculate the similarity between words.
  • The test site sample set is the sample data to be input into the decision tree model for training; it includes the test site training features and the corresponding second score values.
  • the test site sample set is a data set composed of several test site samples, and the test site samples include test site training features and a second score value corresponding to the test site training features. Understandably, each test site training feature is associated with the corresponding second score value.
  • the decision tree reference model is a predictive model, which represents a mapping relationship between object attributes and object values.
  • Each node in the decision tree represents an object, and each bifurcation path represents a certain possibility.
  • Each leaf node corresponds to the value of the object represented by the path from the root node to the leaf node.
  • The decision tree model is trained on the test site sample set to obtain the decision tree reference model: after the test site training features and the corresponding second score values are input into the decision tree model, the model is trained with the C4.5 algorithm to generate the trained decision tree model.
  • the test site sample set is divided into a training set for modeling and a test set for verifying the effect of the model.
  • the training set refers to the data set used to build the decision tree sample model.
  • the test set refers to the data set used to verify the effect of the established decision tree sample model.
  • 75% of the acquired test site sample set is used as the training set, and 25% of the acquired test site sample set is used as the test set.
  • In this embodiment, M second sample answer data items are obtained, each including original answer information and a second score value; the original answer information of each item is converted into test site training features using the target test sites; and finally the decision tree model is trained on the test site training features and the corresponding second score values to obtain the decision tree reference model. This further ensures the accuracy of scoring candidates' answers with the decision tree reference model.
  • an automatic scoring device is provided, and the automatic scoring device corresponds to the automatic scoring method in the above-mentioned embodiment one-to-one.
  • The automatic scoring device includes a to-be-scored answer information acquisition module 21, a keyword extraction module 22, a to-be-scored test site feature conversion module 23, and an input module 24.
  • The functional modules are described in detail as follows:
  • The to-be-scored answer information acquisition module 21 is used to acquire the answer information to be scored;
  • The keyword extraction module 22 is used to perform keyword extraction on the answer information to be scored to obtain core keywords;
  • The to-be-scored test site feature conversion module 23 is used to perform feature conversion on the core keywords using the target test sites to obtain the test site features to be scored, where the target test sites are obtained by the keyword determination method;
  • The input module 24 is used to input the test site features to be scored into the preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • The to-be-scored test site feature conversion module 23 includes:
  • An effective keyword acquisition unit, used to obtain the effective keywords corresponding to the target test sites;
  • A matching unit, used to match the effective keywords with the core keywords one by one by regular matching to obtain keyword matching information;
  • An allocation unit, used to assign a matching identifier to each core keyword according to the keyword matching information;
  • An obtaining unit, used to obtain the test site features to be scored according to the matching identifier of each core keyword.
  • the input module 24 includes:
  • the second sample answer data acquisition unit is used to acquire M second sample answer data, each second sample answer data includes original answer information and a second score value, and M is a positive integer;
  • the test site feature transformation unit is used to use the target test site to perform feature transformation on the original answer information of each second sample answer data to obtain the test site training features;
  • The composition unit is used to combine the test site training features and the corresponding second score values into the test site sample set;
  • the decision tree reference model training unit is used to train the decision tree model according to the test site sample set to obtain the decision tree reference model.
  • Each module of the above automatic scoring device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, memory, network interface, and database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data used in the above-mentioned keyword determination method and the above-mentioned automatic scoring method.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer-readable instruction is executed by the processor to implement a method for determining keywords, or the computer-readable instruction is executed by the processor to implement an automatic scoring method.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the following steps:
  • acquiring N first sample answer data items, each of which includes sample answer information and a first score value, where N is a positive integer;
  • using the sample word segmentation set to perform feature conversion on the sample answer information of each first sample answer data item to obtain sample training features;
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the following steps:
  • performing feature conversion on the core keywords using the target test sites to obtain the test site features to be scored, where the target test sites are obtained by the keyword determination method of claim 2;
  • inputting the test site features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • One or more non-volatile readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring N first sample answer data items, each of which includes sample answer information and a first score value, where N is a positive integer;
  • using the sample word segmentation set to perform feature conversion on the sample answer information of each first sample answer data item to obtain sample training features;
  • One or more non-volatile readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • performing feature conversion on the core keywords using the target test sites to obtain the test site features to be scored, where the target test sites are obtained by the keyword determination method of claim 2;
  • inputting the test site features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchronous chain (Synchlink) DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM
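The core keyword extraction, regular matching, and identifier steps (S233 and S234) described in the bullets above can be illustrated with a minimal Java sketch. It is not the patented implementation. It assumes whitespace-delimited English text rather than Chinese (which would first need a word segmenter), a made-up stop-word list standing in for auxiliary words, modal particles, and conjunctions, and hypothetical names (`TestSiteFeatures`, `extractCoreKeywords`, `ruleString`, `matchIdentifiers`, `toFeatureVector`). Each target test site's effective keywords are joined into one regex "rule string". A core keyword that matches test site n is marked `A1`, `A2`, and so on, and one that matches nothing is marked `a`. Vector element n is 1 when any core keyword carries the mark for site n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical sketch of core keyword extraction and test-site feature conversion (S233-S234). */
class TestSiteFeatures {
    // Illustrative stop-word rule standing in for auxiliary words, modal particles and conjunctions.
    private static final Pattern STOP_WORDS =
            Pattern.compile("the|a|an|and|or|of|is|are|to", Pattern.CASE_INSENSITIVE);

    /** Splits the answer on whitespace and punctuation, then drops stop words and duplicates. */
    static List<String> extractCoreKeywords(String answer) {
        Set<String> kept = new LinkedHashSet<>();
        for (String token : answer.split("[\\s,.;:!?]+")) {
            if (!token.isEmpty() && !STOP_WORDS.matcher(token).matches()) {
                kept.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(kept);
    }

    /** Joins one test site's effective keywords into a single alternation pattern (the "rule string"). */
    static Pattern ruleString(List<String> effectiveKeywords) {
        StringJoiner alternatives = new StringJoiner("|");
        for (String keyword : effectiveKeywords) alternatives.add(Pattern.quote(keyword));
        return Pattern.compile(alternatives.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Marks each core keyword "A<site>" for every site it matches, or "a" if it matches none. */
    static Map<String, List<String>> matchIdentifiers(List<String> coreKeywords, List<List<String>> sites) {
        List<Pattern> rules = new ArrayList<>();
        for (List<String> site : sites) rules.add(ruleString(site));
        Map<String, List<String>> ids = new LinkedHashMap<>();
        for (String keyword : coreKeywords) {
            List<String> marks = new ArrayList<>();
            for (int s = 0; s < rules.size(); s++) {
                if (rules.get(s).matcher(keyword).matches()) marks.add("A" + (s + 1));
            }
            if (marks.isEmpty()) marks.add("a");
            ids.put(keyword, marks);
        }
        return ids;
    }

    /** Element s is 1 if some core keyword carries identifier A<s+1>, otherwise 0. */
    static int[] toFeatureVector(Map<String, List<String>> ids, int siteCount) {
        int[] vector = new int[siteCount];
        for (List<String> marks : ids.values()) {
            for (String mark : marks) {
                if (mark.startsWith("A")) vector[Integer.parseInt(mark.substring(1)) - 1] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> core = extractCoreKeywords("Inflation and the interest rate, of course!");
        List<List<String>> sites = Arrays.asList(
                Arrays.asList("inflation", "cpi"),
                Arrays.asList("interest"),
                Arrays.asList("rate", "yield"),
                Arrays.asList("gdp"));
        Map<String, List<String>> ids = matchIdentifiers(core, sites);
        System.out.println(ids); // {inflation=[A1], interest=[A2], rate=[A3], course=[a]}
        System.out.println(Arrays.toString(toFeatureVector(ids, sites.size()))); // [1, 1, 1, 0]
    }
}
```

Compiling one alternation pattern per test site, instead of testing each effective keyword separately, mirrors the "rule string" idea. The any-keyword criterion used here is only one of the options the description allows; a stricter variant would require at least two matching keywords per site.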

Abstract

A keyword determination method, an automatic scoring method, apparatus and device, and a medium. The keyword determination method comprises: acquiring first sample answer data, and performing word segmentation processing and summarization on sample answer information in the first sample answer data to obtain a set of segmented sample words; performing feature conversion on the sample answer information to obtain a sample training feature; training a decision tree model according to the sample training feature and a first score to obtain a decision tree sample model; and extracting a sample keyword from the decision tree sample model. The automatic scoring method comprises: extracting a keyword from answer information to be scored to obtain a core keyword; and performing feature conversion on the core keyword by means of a target examination point to obtain an examination point feature to be scored, and then inputting same into a decision tree reference model to obtain an accurate score of the answer information to be scored. Thus, the keyword generalization capability and accuracy are improved; and efficient and accurate scoring of answer content of an examinee is realized.

Description

Keyword determination method, automatic scoring method, apparatus, device and medium
This application is based on, and claims priority to, Chinese invention patent application No. 201910049180.5, filed on January 18, 2019 and entitled "Keyword determination method, automatic scoring method, apparatus, device and storage medium".
Technical field
This application relates to the field of intelligent decision-making, and in particular to a keyword determination method, an automatic scoring method, an apparatus, a device, and a storage medium.
Background
As society develops and competition intensifies, examinations have become a routine means of measuring how much knowledge and how many skills a person has acquired, and systems for scoring candidates' answers have developed along with the prevalence of examinations. With advances in computer technology, candidates' answers to objective questions can already be marked online fully automatically and scored in real time. Subjective questions, however, involve a degree of randomness and recall, so if a computer scores candidates' answers to subjective questions with the same scoring method, misjudgments and errors occur easily. Manual marking, on the other hand, becomes extremely laborious and difficult to carry out when the number of candidates is large. At present, answers to subjective questions are usually scored by manually establishing, in advance, the test site content and related keywords from the scoring rule information, then identifying the answer content by regular matching against the test site content and related keywords, and scoring the candidate's answer accordingly.
However, test sites and related keywords determined only from the scoring rule information, without considering how other candidates answered the same subjective question, generalize poorly and are not very accurate. As a result, the final scores produced when candidates' answers are subsequently graded are biased and fail to reflect the candidates' true ability.
Summary of the invention
The embodiments of the present application provide a keyword determination method, apparatus, device, and storage medium to solve the problem that keywords have low generalization ability and low accuracy.
The embodiments of the present application provide an automatic scoring method, apparatus, device, and storage medium to solve the problem that candidates' answers cannot be scored efficiently and accurately.
A keyword determination method includes:
acquiring N first sample answer data items, each including sample answer information and a first score value, where N is a positive integer;
performing word segmentation on the sample answer information of each first sample answer data item to obtain the sample word segments of that item;
aggregating the sample word segments of all first sample answer data items to obtain a sample word segmentation set;
using the sample word segmentation set to perform feature conversion on the sample answer information of each first sample answer data item to obtain sample training features;
training a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
extracting sample keywords from the decision tree sample model.
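The keyword determination steps above can be sketched in Java as follows. This is a simplification under stated assumptions, not the patented procedure. Whitespace splitting stands in for Chinese word segmentation, and the first score values are treated as class labels. Instead of training a full tree and reading keywords off it, the sketch ranks each word of the sample word segmentation set by information gain, the ID3 split criterion (C4.5 normalizes it into the gain ratio). This relies on the assumption that the highest-gain words are the ones a tree would place at its upper split nodes. All class and method names (`KeywordDetermination`, `segment`, `vocabulary`, `features`, `infoGain`, `topKeywords`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of keyword determination: ranks segmented words by information gain. */
class KeywordDetermination {
    /** Word segmentation stand-in: splits on whitespace (a real system would use a Chinese segmenter). */
    static List<String> segment(String answer) {
        List<String> words = new ArrayList<>();
        for (String w : answer.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    /** Aggregates the segments of all sample answers into a sorted, duplicate-free sample word segmentation set. */
    static List<String> vocabulary(List<String> answers) {
        Set<String> vocab = new TreeSet<>();
        for (String answer : answers) vocab.addAll(segment(answer));
        return new ArrayList<>(vocab);
    }

    /** Sample training feature: element j is 1 if vocabulary word j occurs in the answer, else 0. */
    static int[] features(String answer, List<String> vocab) {
        Set<String> words = new HashSet<>(segment(answer));
        int[] x = new int[vocab.size()];
        for (int j = 0; j < x.length; j++) x[j] = words.contains(vocab.get(j)) ? 1 : 0;
        return x;
    }

    /** Shannon entropy, in bits, of a list of score labels. */
    static double entropy(List<Integer> labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int y : labels) counts.merge(y, 1, Integer::sum);
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Information gain of splitting the samples on feature column j (the ID3 criterion). */
    static double infoGain(int[][] x, int[] scores, int j) {
        List<Integer> all = new ArrayList<>(), with = new ArrayList<>(), without = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            all.add(scores[i]);
            if (x[i][j] == 1) with.add(scores[i]); else without.add(scores[i]);
        }
        double remainder = 0;
        if (!with.isEmpty()) remainder += (double) with.size() / scores.length * entropy(with);
        if (!without.isEmpty()) remainder += (double) without.size() / scores.length * entropy(without);
        return entropy(all) - remainder;
    }

    /** Returns the k vocabulary words with the highest information gain as candidate sample keywords. */
    static List<String> topKeywords(List<String> answers, int[] scores, int k) {
        List<String> vocab = vocabulary(answers);
        int[][] x = new int[answers.size()][];
        for (int i = 0; i < x.length; i++) x[i] = features(answers.get(i), vocab);
        double[] gain = new double[vocab.size()];
        Integer[] order = new Integer[vocab.size()];
        for (int j = 0; j < gain.length; j++) { gain[j] = infoGain(x, scores, j); order[j] = j; }
        Arrays.sort(order, (a, b) -> Double.compare(gain[b], gain[a])); // stable: ties keep vocabulary order
        List<String> keywords = new ArrayList<>();
        for (int r = 0; r < Math.min(k, order.length); r++) keywords.add(vocab.get(order[r]));
        return keywords;
    }

    public static void main(String[] args) {
        List<String> answers = Arrays.asList(
                "inflation raises prices", "inflation erodes savings", "prices rise", "savings fall");
        int[] firstScores = {2, 2, 0, 0};
        System.out.println(topKeywords(answers, firstScores, 1)); // [inflation]
    }
}
```

Treating the first score values as class labels suits discrete scores; continuous scores would call for a regression criterion such as variance reduction instead.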
An automatic scoring method includes:
acquiring answer information to be scored;
performing keyword extraction on the answer information to be scored to obtain core keywords;
performing feature conversion on the core keywords using target test sites to obtain test site features to be scored, where the target test sites are obtained by the keyword determination method of claim 2; and
inputting the test site features to be scored into a preset decision tree reference model to obtain an accurate score for the answer information to be scored.
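The final scoring step amounts to walking a trained decision tree from the root to a leaf. The minimal Java sketch below assumes binary test site features like those produced by the feature conversion step and a hypothetical `ScoreTreeNode` class. The tree shape and leaf scores are invented for illustration, whereas the method described here trains the tree on the test site sample set with C4.5.

```java
/** Hypothetical decision tree reference model over 0/1 test-site features; leaves hold scores. */
final class ScoreTreeNode {
    final int feature;            // index of the test site tested at this node, or -1 for a leaf
    final double score;           // score returned at a leaf (unused at internal nodes)
    final ScoreTreeNode absent;   // branch followed when the test-site feature is 0
    final ScoreTreeNode present;  // branch followed when the test-site feature is 1

    private ScoreTreeNode(int feature, double score, ScoreTreeNode absent, ScoreTreeNode present) {
        this.feature = feature;
        this.score = score;
        this.absent = absent;
        this.present = present;
    }

    static ScoreTreeNode leaf(double score) {
        return new ScoreTreeNode(-1, score, null, null);
    }

    static ScoreTreeNode split(int feature, ScoreTreeNode absent, ScoreTreeNode present) {
        return new ScoreTreeNode(feature, Double.NaN, absent, present);
    }

    /** Follows one root-to-leaf path: each internal node tests one test site, each leaf gives a score. */
    double predict(int[] testSiteFeatures) {
        ScoreTreeNode node = this;
        while (node.feature >= 0) {
            node = testSiteFeatures[node.feature] == 1 ? node.present : node.absent;
        }
        return node.score;
    }

    public static void main(String[] args) {
        // Made-up tree: test sites 1, 2 and 3 are checked in turn; each hit raises the score.
        ScoreTreeNode model = split(0, leaf(0),
                split(1, leaf(2),
                        split(2, leaf(4), leaf(5))));
        System.out.println(model.predict(new int[] {1, 1, 1, 0, 0, 0})); // 5.0
        System.out.println(model.predict(new int[] {1, 0, 1, 0, 0, 0})); // 2.0
    }
}
```

Because each internal node tests a single 0/1 test site element, scoring costs at most one comparison per tree level.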
A keyword determination apparatus includes:
a first sample answer data acquisition module, configured to acquire N first sample answer data items, each including sample answer information and a first score value, where N is a positive integer;
a word segmentation module, configured to perform word segmentation on the sample answer information of each first sample answer data item to obtain the sample word segments of that item;
a word segment aggregation module, configured to aggregate the sample word segments of all first sample answer data items to obtain a sample word segmentation set;
a sample feature conversion module, configured to use the sample word segmentation set to perform feature conversion on the sample answer information of each first sample answer data item to obtain sample training features;
a decision tree sample model training module, configured to train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
a sample keyword extraction module, configured to extract sample keywords from the decision tree sample model.
An automatic scoring apparatus includes:
a to-be-scored answer information acquisition module, configured to acquire answer information to be scored;
a keyword extraction module, configured to perform keyword extraction on the answer information to be scored to obtain core keywords;
a to-be-scored test site feature conversion module, configured to perform feature conversion on the core keywords using target test sites to obtain test site features to be scored, where the target test sites are obtained by the keyword determination method of claim 2; and
an input module, configured to input the test site features to be scored into a preset decision tree reference model to obtain an accurate score for the answer information to be scored.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the steps of the above keyword determination method or the steps of the above automatic scoring method.
A computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the above keyword determination method or the steps of the above automatic scoring method.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
附图说明BRIEF DESCRIPTION
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly explain the technical solutions of the embodiments of the present application, the following will briefly introduce the drawings required in the description of the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application For those of ordinary skill in the art, without paying creative labor, other drawings can also be obtained based on these drawings.
图1是本申请一实施例中关键词的确定方法或自动评分方法的一应用环境示意图;FIG. 1 is a schematic diagram of an application environment of a method for determining keywords or an automatic scoring method in an embodiment of the present application;
图2是本申请一实施例中关键词的确定方法的一示例图;2 is an example diagram of a method for determining keywords in an embodiment of the present application;
图3是本申请一实施例中关键词的确定方法的另一示例图;FIG. 3 is another example diagram of a method for determining keywords in an embodiment of the present application;
图4是本申请一实施例中关键词的确定装置的一原理框图;FIG. 4 is a functional block diagram of a keyword determining device in an embodiment of the present application;
图5是本申请一实施例中关键词的确定装置的另一原理框图;FIG. 5 is another principle block diagram of an apparatus for determining keywords in an embodiment of the present application;
图6是本申请一实施例中自动评分方法的一示例图;Fig. 6 is an example diagram of an automatic scoring method in an embodiment of the present application;
图7是本申请一实施例中自动评分方法的另一示例图;FIG. 7 is another example diagram of an automatic scoring method in an embodiment of the present application;
图8是本申请一实施例中自动评分方法的另一示例图;FIG. 8 is another example diagram of an automatic scoring method in an embodiment of the present application;
图9是本申请一实施例中自动评分装置的一原理框图;Fig. 9 is a functional block diagram of an automatic scoring device in an embodiment of the present application;
图10是本申请一实施例中计算机设备的一示意图。Fig. 10 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
An embodiment of the present application provides a keyword determination method, which can be applied in the application environment shown in FIG. 1. Specifically, the method runs in a keyword determination system comprising the client and the server shown in FIG. 1, which communicate over a network. The method addresses the problem that test-point keywords derived solely from scoring rule information generalize poorly and are not accurate enough. The client (also called the user side) is a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablets, and portable wearable devices. The server can be implemented as a standalone server or as a cluster of multiple servers.
In an embodiment, as shown in FIG. 2, a keyword determination method is provided. Taking its application on the server in FIG. 1 as an example, the method includes the following steps:
S11: Acquire N pieces of first sample answer data, each piece including sample answer information and a first score value, where N is a positive integer.
Here, first sample answer data refers to test takers' answer data. Each piece of first sample answer data includes sample answer information and the corresponding first score value, i.e. the score obtained by preliminarily grading that sample answer information. Sample answer information is a test taker's answer to a particular subjective question, taken from the answer text in a scoring system. Optionally, the first sample answer data can be obtained from a scoring system, which can preliminarily grade the sample answer information to produce the first score value. Preferably, the answer a test taker wrote on a paper answer sheet can also be obtained in advance, scanned and recognized, and the resulting answer text submitted to the scoring system to obtain the sample answer information. The first score value is the score assigned in advance to the sample answer information by manual or computer grading. The first sample answer data can also be obtained by scanning and recognizing both the answer written on the paper answer sheet and its manual score.
In addition, a piece of first sample answer data may contain a single piece of sample answer information together with its first score value, or multiple pieces of sample answer information together with a first score value for each.
N pieces of first sample answer data are acquired, where N is a positive integer whose value can be set according to actual needs. A larger N generally makes the subsequently extracted sample keywords more accurate but lowers extraction efficiency, so N should be chosen by weighing accuracy against efficiency.
S12: Perform word segmentation on the sample answer information of each piece of first sample answer data to obtain its sample segmented words.
Here, sample segmented words are the individual words obtained by segmenting the sample answer information of each piece of first sample answer data. The segmentation proceeds as follows. First, a word segmentation algorithm splits the sample answer information into words. Optionally, this can be a string-matching-based, understanding-based, or statistics-based segmentation algorithm. Preferably, the splitting can also be done automatically with a Java split function, or by importing the sample answer information into software with automatic character-splitting functions, such as EXCEL or PPT. Next, Java regular expressions filter the split result to remove words that carry no meaning on their own, such as auxiliary words, modal particles, and conjunctions. What remains are the sample segmented words of each piece of first sample answer data.
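The following Python sketch illustrates the string-matching option. It segments an answer by forward maximum matching, which takes the longest dictionary word at each position, and then drops stop words with a regular expression. It is a minimal sketch and not the embodiment's implementation. The toy lexicon `LEXICON`, the stop-word pattern `STOP_WORDS`, and the function names are hypothetical, and Python stands in for the Java tooling named above. The sample sentence is preset test point 1 from step S17 ("Du Fu is a great realist poet of the Tang Dynasty").

```python
import re

# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"杜甫", "唐代", "现实主义", "诗人", "伟大"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

# Hypothetical stop-word pattern: auxiliary words, modal particles, conjunctions.
STOP_WORDS = re.compile(r"^(的|了|是|和|与|吗|呢|啊)$")


def forward_max_match(text: str) -> list[str]:
    """Split text greedily, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when no lexicon word matches.
            if length == 1 or candidate in LEXICON:
                words.append(candidate)
                i += length
                break
    return words


def segment_answer(text: str) -> list[str]:
    """Segment an answer, then drop punctuation and stop words."""
    text = re.sub(r"[^\w]", "", text)  # strip punctuation such as "。"
    return [w for w in forward_max_match(text) if not STOP_WORDS.match(w)]


print(segment_answer("杜甫是唐代伟大的现实主义诗人。"))
# ['杜甫', '唐代', '伟大', '现实主义', '诗人']
```

Characters that are not in the lexicon fall back to single-character tokens. This is the usual weakness of pure dictionary matching; statistics-based segmenters handle out-of-vocabulary words better.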
S13: Aggregate the sample segmented words of all pieces of first sample answer data to obtain a sample word set.
Here, the sample word set is the set obtained by pooling the sample segmented words of every piece of first sample answer data. Specifically, the sample segmented words of each piece of first sample answer data are collected and then aggregated into the sample word set. Preferably, if a piece of first sample answer data contains multiple pieces of sample answer information, aggregation is performed per piece of sample answer information, so that the resulting sample word set corresponds to each piece of sample answer information.
Specifically, the aggregation proceeds in three steps. It collects the sample segmented words of the sample answer information in each piece of first sample answer data. It then assigns each collected word an identification number in ascending order. The result is a sample word set ordered by those numbers. For example, the sample word set may be $E=\{e_1, e_2, e_3, \dots, e_r\}$, where $e_1, e_2, e_3, \dots, e_r$ are the sample segmented words in the set and the subscripts $1, 2, 3, \dots, r$ are their identification numbers.
Preferably, if the collected sample segmented words contain duplicates, they are deduplicated before aggregation, and the deduplicated words of each piece of first sample answer data are then aggregated into the sample word set. Specifically, deduplication can be performed with the Count function, an Editor, or the R language. Preferably, the sample segmented words can also be imported directly into an EXCEL spreadsheet and deduplicated automatically with EXCEL's advanced filter feature.
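Below is a minimal Python sketch of step S13 with the optional deduplication. It assumes that identification numbers are assigned in order of first occurrence, because the text above says only that they ascend. The function name `build_word_set` is hypothetical. A dictionary keyed by identification number plays the role of $E=\{e_1,\dots,e_r\}$.

```python
def build_word_set(segmented_answers: list[list[str]]) -> dict[int, str]:
    """Pool segmented words from all answers, drop duplicates, number them 1..r."""
    seen: set[str] = set()
    ordered: list[str] = []
    for words in segmented_answers:
        for word in words:
            if word not in seen:  # deduplicate while keeping first-seen order
                seen.add(word)
                ordered.append(word)
    # Identification numbers start at 1, matching e_1, ..., e_r.
    return {idx: word for idx, word in enumerate(ordered, start=1)}


answers = [["B1", "B2"], ["C1", "C2", "C3"], ["B1", "C2"]]
print(build_word_set(answers))
# {1: 'B1', 2: 'B2', 3: 'C1', 4: 'C2', 5: 'C3'}
```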
S14: Use the sample word set to convert the sample answer information of each piece of first sample answer data into features, obtaining sample training features.
Here, the sample training features are the output of this feature conversion. Specifically, a bag-of-words model is built, and the sample word set is used to convert the sample answer information of each piece of first sample answer data into sample training features. In this embodiment, the bag-of-words model records which words of the sample word set occur in the sample answer information of each piece of first sample answer data. The bag-of-words model can be built with CountVectorizer in scikit-learn (SKLearn), a common tool for computing feature values from text. For each training text, CountVectorizer considers only how often each vocabulary word occurs in that text. It learns a vocabulary from the training texts and converts each document into a vector of word counts, and the fitted vectorizer retains the vocabulary that defines the vector space. Its `binary=True` option yields the 0/1 presence vectors described below.
Specifically, the feature conversion works as follows. First, a word vector is created whose length equals the number of words in the sample word set. Then regular-expression matching compares the sample answer information of each piece of first sample answer data with every word in the sample word set. If the answer contains a word, the corresponding vector element is set to 1; otherwise it is set to 0. The result is a vector of 1s and 0s, which is the sample training feature.
For example, suppose a sample word set contains five sample segmented words $B_1, B_2, C_1, C_2, C_3$, and there are two pieces of sample answer information, B and C. B contains the words $B_1, B_2$, and C contains the words $C_1, C_2, C_3$. Converting B with this sample word set gives the sample training feature [1,1,0,0,0], and converting C gives [0,0,1,1,1].
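The following sketch reproduces the example above. It tests each word of the sample word set against the answer with a regular expression, which yields a 0/1 vector. The function name is hypothetical, and Python's `re` module stands in for the Java regular expressions mentioned earlier.

```python
import re


def to_binary_features(answer: str, word_set: list[str]) -> list[int]:
    """Return 1 for each word of word_set that occurs in the answer, else 0."""
    # re.escape makes each word match literally, so characters like '+' are not special.
    return [1 if re.search(re.escape(word), answer) else 0 for word in word_set]


word_set = ["B1", "B2", "C1", "C2", "C3"]
print(to_binary_features("B1 B2", word_set))     # [1, 1, 0, 0, 0]
print(to_binary_features("C1 C2 C3", word_set))  # [0, 0, 1, 1, 1]
```

Plain substring search would also match B1 inside a longer token such as B10. Matching against the answer's segmented words, or adding boundary conditions to the pattern, avoids such false positives.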
Regular-expression matching applies a regular expression to the text being tested. A regular expression is a logical formula for operating on strings. Predefined specific characters, and combinations of them, form a "rule string" that expresses a filtering logic over strings. Put another way, a regular expression is a text pattern that describes one or more strings to be matched when searching text.
S15: Train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model.
Here, the decision tree sample model is the model obtained by training a decision tree model on the bag-of-words sample training features and the corresponding first score values. Specifically, the sample training features and corresponding first score values are fed into the decision tree model, which is trained with the C4.5 algorithm to produce the trained decision tree sample model. C4.5 is a decision-tree induction algorithm widely used for classification in machine learning and data mining. It performs supervised learning on a data set in which each tuple is described by a set of attribute values and belongs to exactly one of several mutually exclusive classes. From this data, C4.5 learns a mapping from attribute values to classes, and the mapping can then classify new entities whose class is unknown.
Further, the size of the decision tree sample model must be fixed before the model is built; the size is determined by the depth of the tree and the number of samples per node. Optionally, in this embodiment, the maximum tree depth is set to 5, the minimum number of samples per leaf node is set to 50, and the split criterion is entropy. These settings prevent the decision tree sample model from overfitting while preserving its accuracy.
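C4.5 chooses splits by gain ratio. The gain ratio is the information gain of a split (entropy before the split minus weighted entropy after it) divided by the split information. The sketch below computes this value for one 0/1 bag-of-words feature against score labels. It shows only the split criterion, not full tree growth or pruning, and the function names are hypothetical. For comparison, scikit-learn's `DecisionTreeClassifier(criterion="entropy", max_depth=5, min_samples_leaf=50)` expresses the size limits above, though that class builds CART-style binary trees rather than C4.5 trees.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a list of class labels (e.g. score values)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature: list[int], labels: list) -> float:
    """C4.5 gain ratio of splitting `labels` on a 0/1 feature column."""
    total = len(labels)
    groups: dict[int, list] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weights = [len(g) / total for g in groups.values()]
    # Weighted entropy of the child nodes after the split.
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    info_gain = entropy(labels) - conditional
    # Split information penalizes features that fragment the data.
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info if split_info > 0 else 0.0


scores = [2, 2, 0, 0]
print(gain_ratio([1, 1, 0, 0], scores))  # 1.0  (word perfectly separates scores)
print(gain_ratio([1, 0, 1, 0], scores))  # 0.0  (word carries no information)
```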
S16: Extract sample keywords from the decision tree sample model.
Here, a sample keyword is the feature attribute value associated with each output node of the decision tree sample model, so extracting sample keywords is also called feature-value extraction from the decision tree sample model. Each feature of the decision tree sample model is a decision attribute of the model. Each feature value therefore corresponds to a branch on that decision attribute. It follows that the output node of every branch in the decision tree sample model has a corresponding sample keyword.
Specifically, sample keywords can be extracted in four steps. First, the decision tree sample model is read as a sourcable object. Next, the tosource method converts the model into code. The code structure is then analyzed to identify the sample keywords the model outputs, and finally those keywords are extracted.
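The sourcable/tosource route above is specific to the embodiment. A hedged alternative sketch follows. It treats the sample keywords as the vocabulary words the trained tree actually splits on. It assumes the tree is available as a preorder list of split-feature indices, with a hypothetical `LEAF` marker for leaf nodes, similar in spirit to the per-node `feature` array of a fitted scikit-learn tree. The function name and the example tree are illustrative.

```python
LEAF = -2  # marker for "no split feature" at a leaf node (assumption of this sketch)


def extract_keywords(split_features: list[int], vocabulary: list[str]) -> list[str]:
    """Collect, without repeats, the words used as split attributes by internal nodes."""
    keywords: list[str] = []
    for feature_index in split_features:
        if feature_index != LEAF and vocabulary[feature_index] not in keywords:
            keywords.append(vocabulary[feature_index])
    return keywords


vocabulary = ["杜甫", "唐代", "现实主义", "诗人", "浪漫主义"]
# Preorder split features of a small hypothetical tree: root, left subtree, right subtree.
split_features = [2, 0, LEAF, LEAF, 3, LEAF, LEAF]
print(extract_keywords(split_features, vocabulary))  # ['现实主义', '杜甫', '诗人']
```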
In this embodiment, the method runs as a pipeline:
1. N pieces of first sample answer data are acquired, each including sample answer information and a first score value.
2. The sample answer information is segmented, and the segmented words are aggregated into a sample word set.
3. The sample word set is used to convert each piece of sample answer information into sample training features.
4. A decision tree model is trained on these features and the corresponding first score values to obtain a decision tree sample model.
5. Sample keywords are extracted from that model.
This improves the generalization ability and accuracy of the test-point keywords, makes the keyword coverage more comprehensive, and improves the accuracy of subsequent scoring.
In an embodiment, as shown in FIG. 3, after the sample keywords are extracted from the decision tree sample model, the keyword determination method further includes the following steps:
S17: Obtain scoring rule information, which includes preset test points and the preset keywords corresponding to each preset test point.
Here, scoring rule information is the basic scoring basis supplied by the business party. It comprises preset test points and the preset keywords of each preset test point. A preset test point is a knowledge point, supplied by the business party, that is used to judge whether a test taker's answer is correct. Preset test points include incorrect test points, which identify wrong answers, and correct test points, which identify right answers. The scoring rule information is only a preliminary scoring standard, so its keywords may be inaccurate or incomplete. Optionally, a preset test point can be a word, a sentence, or a paragraph. In addition, in this embodiment, each preset test point may be assigned a distinct identifier in advance to tell the test points apart. The identifier can consist of Arabic numerals, uppercase English letters, lowercase English letters, or any combination of these. Each preset test point contains corresponding preset keywords, which are words extracted from the preset test point that can be used directly to quantify scoring rules. Each preset test point contains at least one preset keyword. For example, if preset test point 1 is "Du Fu is a great realist poet of the Tang Dynasty", its preset keywords can be "Du Fu", "Tang Dynasty", "realism", and "poet".
S18: Remove from the sample keywords those that duplicate preset keywords, obtaining target keywords.
Here, target keywords are the sample keywords that differ from the preset keywords. Specifically, a character comparison function in C++ compares each sample keyword with each preset keyword. Sample keywords identical to a preset keyword are discarded, and the remaining sample keywords are taken as the target keywords.
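Step S18 amounts to an exact-match set difference that keeps the sample keywords in their original order. The sketch below uses a Python set in place of the pairwise C++ character comparison described above. The function name and the keywords are illustrative.

```python
def remove_preset(sample_keywords: list[str], preset_keywords: list[str]) -> list[str]:
    """Keep the sample keywords that do not exactly match any preset keyword."""
    preset = set(preset_keywords)  # O(1) lookups instead of pairwise string comparison
    return [kw for kw in sample_keywords if kw not in preset]


print(remove_preset(["杜甫", "盛唐", "诗圣", "诗人"], ["杜甫", "唐代", "现实主义", "诗人"]))
# ['盛唐', '诗圣']
```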
S19: Send the target keywords to the client, and obtain the test-point labels that the client returns for the target keywords.
Here, a test-point label assigns a target keyword the identification number of a preset test point. After the target keywords are sent to the client, the user analyzes them. Based on the preset test points, the user assigns each target keyword the same identification number as its corresponding preset test point, and the resulting test-point labels are sent to the server. Preferably, the test-point labels of all target keywords can be compiled into a single test-point label text before being sent to the server.
S20: Add each target keyword to its corresponding preset test point according to the test-point labels, obtaining target test points.
Here, a target test point is a preset test point after the target keywords have been added to it. Specifically, the server first receives the test-point label text from the client. It then adds each target keyword to the preset test point whose identification number matches the one given for that keyword in the label text. The target test points therefore contain a richer and more complete set of keywords than the preset test points.
For example, suppose preset test point 1 has three preset keywords $a_1, a_2, a_3$, and preset test point 2 has three preset keywords $b_1, b_2, b_3$. Step S18 yields the target keywords $a_4, a_5, b_4, b_5$. Keywords $a_4$ and $a_5$ receive test-point label 1, and $b_4$ and $b_5$ receive test-point label 2. According to these labels, $a_4, a_5$ are added to preset test point 1 and $b_4, b_5$ are added to preset test point 2. Finally, target test point 1 contains the keywords $a_1, a_2, a_3, a_4, a_5$, and target test point 2 contains the keywords $b_1, b_2, b_3, b_4, b_5$.
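The worked example above can be reproduced with a short Python sketch. The sketch assumes the test-point label text has already been parsed into a mapping from each target keyword to a test-point identification number. That representation and the function name are hypothetical.

```python
def merge_by_label(preset_points: dict[int, list[str]],
                   labels: dict[str, int]) -> dict[int, list[str]]:
    """Append each labelled target keyword to the preset test point with the same id."""
    # Copy the keyword lists so the preset test points are left unchanged.
    target_points = {pid: list(words) for pid, words in preset_points.items()}
    for keyword, point_id in labels.items():
        if point_id not in target_points:
            raise KeyError(f"no preset test point with id {point_id}")
        target_points[point_id].append(keyword)
    return target_points


preset = {1: ["a1", "a2", "a3"], 2: ["b1", "b2", "b3"]}
labels = {"a4": 1, "a5": 1, "b4": 2, "b5": 2}
print(merge_by_label(preset, labels))
# {1: ['a1', 'a2', 'a3', 'a4', 'a5'], 2: ['b1', 'b2', 'b3', 'b4', 'b5']}
```

Raising an error for an unknown identification number is a design choice of this sketch. The embodiment does not say how mislabelled keywords are handled.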
In this embodiment, the method first obtains scoring rule information, which comprises preset test points and their preset keywords. Sample keywords that duplicate preset keywords are removed to obtain target keywords. The target keywords are sent to the client, which returns test-point labels. Finally, each target keyword is added to its corresponding preset test point according to its label, yielding target test points. This further enriches the keywords of the test points defined by the scoring rule information.
It should be understood that the step numbers in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and does not limit the implementation of the embodiments of the present application in any way.
In an embodiment, a keyword determination apparatus is provided that corresponds one-to-one to the keyword determination method in the above embodiments. As shown in FIG. 4, the keyword determination apparatus includes a first sample answer data acquisition module 11, a word segmentation module 12, a word aggregation module 13, a sample feature conversion module 14, a decision tree sample model training module 15, and a sample keyword extraction module 16. The functional modules are described in detail as follows:
The first sample answer data acquisition module 11 is configured to acquire N pieces of first sample answer data, each piece including sample answer information and a first score value, where N is a positive integer;
The word segmentation module 12 is configured to perform word segmentation on the sample answer information of each piece of first sample answer data to obtain its sample segmented words;
The word aggregation module 13 is configured to aggregate the sample segmented words of all pieces of first sample answer data to obtain a sample word set;
The sample feature conversion module 14 is configured to use the sample word set to convert the sample answer information of each piece of first sample answer data into sample training features;
The decision tree sample model training module 15 is configured to train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model;
The sample keyword extraction module 16 is configured to extract sample keywords from the decision tree sample model.
Preferably, as shown in FIG. 5, the keyword determination apparatus further includes:
The scoring rule information acquisition module 17 is configured to obtain scoring rule information, which includes preset test points and the preset keywords corresponding to each preset test point;
The duplicate keyword removal module 18 is configured to remove from the sample keywords those that duplicate preset keywords, obtaining target keywords;
The test-point label acquisition module 19 is configured to send the target keywords to the client and obtain the test-point labels that the client returns for the target keywords;
The target keyword adding module 20 is configured to add each target keyword to its corresponding preset test point according to the test-point labels, obtaining target test points.
For specific limitations of the keyword determination apparatus, refer to the limitations of the keyword determination method above, which are not repeated here. Each module of the apparatus can be implemented wholly or partly in software, hardware, or a combination of both. The modules can be embedded in hardware form in, or independent of, a processor of a computer device. They can also be stored in software form in a memory of the computer device, so that the processor can invoke them to perform the corresponding operations.
An embodiment of the present application further provides an automatic scoring method, which can be applied in the application environment shown in FIG. 1. Specifically, the method runs in an automatic scoring system comprising the client and the server shown in FIG. 1, which communicate over a network. The method addresses the problem that test takers' answers cannot be scored efficiently and accurately. The client (also called the user side) is a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, notebook computers, smartphones, tablets, and portable wearable devices. The server can be implemented as a standalone server or as a cluster of multiple servers.
In an embodiment, as shown in FIG. 6, an automatic scoring method is provided. Taking its application on the server in FIG. 1 as an example, the method includes the following steps:
S21:获取待评分答题信息。S21: Obtain the answer information to be graded.
其中,待评分答题信息指从考生的答题文本上获取的答题信息。具体地,获取待评分答题信息可以直接从评分系统的答题文本上获取任意一考生的答题信息,或者将任意一考生在纸质答卷上写入的答题信息进行扫描识别后获取得到。Among them, the answer information to be graded refers to the answer information obtained from the test taker's answer text. Specifically, to obtain the answer information to be graded, the answer information of any examinee can be obtained directly from the answer text of the grading system, or the answer information written by any examinee on the paper answer sheet is scanned and recognized.
S22:对待评分答题信息进行关键词提取,得到核心关键词。S22: Perform keyword extraction on the response information to be scored to obtain core keywords.
Here, the core keywords are the individual keywords extracted from the obtained answer information to be scored. Specifically, keyword extraction proceeds as follows. First, a word segmentation algorithm splits the answer information to be scored into words. Optionally, the segmentation algorithm may be based on string matching, on understanding, or on statistics. Preferably, the answer information may also be split automatically with a split function in Java, or by importing it into software with an automatic character-splitting function, such as EXCEL or PPT. Next, Java regular expressions are used to filter the split answer information, removing words that carry no specific meaning, such as auxiliary words, modal particles, and conjunctions. Finally, the words remaining after filtering are extracted as the core keywords. In this embodiment, there is at least one core keyword.
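The extraction step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes the answer is already space-separated text. A real system would use a Chinese word segmenter in place of the `re.split` call, because `\w` treats a run of Chinese characters as a single token. The stop-word pattern `STOP_WORDS` and the function name are hypothetical stand-ins for the auxiliary words, modal particles, and conjunctions mentioned above.

```python
import re

# Hypothetical stop-word pattern; stands in for the auxiliary words,
# modal particles, and conjunctions that the method filters out.
STOP_WORDS = re.compile(r"(?:the|a|an|and|or|but|of|is|are|to)")


def extract_core_keywords(answer: str) -> list:
    """Split an answer into words and keep the meaningful ones, in order."""
    # Simplified segmentation: split on runs of non-word characters.
    tokens = [t.lower() for t in re.split(r"\W+", answer) if t]
    core = []
    for token in tokens:
        # Drop stop words and repeated words; keep everything else.
        if not STOP_WORDS.fullmatch(token) and token not in core:
            core.append(token)
    return core


print(extract_core_keywords(
    "Inflation is a rise of the price level, and inflation erodes savings."))
# ['inflation', 'rise', 'price', 'level', 'erodes', 'savings']
```

Filtering with a single compiled pattern mirrors the "regular expression" filtering described above. A plain set lookup would work equally well.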
S23: Perform feature transformation on the core keywords using the target test sites to obtain the test-site features to be scored, where the target test sites are obtained by the keyword determination method described above.
Here, the test-site feature to be scored is a feature that measures the similarity between the core keywords and the keywords of the target test sites. In this step, the target test sites are those obtained by the keyword determination method of the embodiments above.
Performing feature transformation on the core keywords using the target test sites proceeds as follows. First, a test-site vector is created with one element per target test site. Then each core keyword is matched against the keywords of the target test sites by regular-expression matching, and the matching results determine whether the core keywords match each target test site. Whether the core keywords match a target test site can be judged by how well they match the keywords contained in that test site. Two rules are possible. Under the first, a match with any single keyword of a target test site counts as a match for that test site. Under the second, a match is counted only when the core keywords match at least two keywords of that test site. The rule can be customized according to actual needs. Preferably, if a core keyword successfully matches any keyword of a target test site, the core keyword is deemed to match that test site and the corresponding element of the test-site vector is set to 1. If the core keywords match none of the keywords of a test site, the match fails and the corresponding element is 0.
Finally, a test-site vector consisting of 1s and 0s is obtained; this vector is the test-site feature to be scored.
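The feature transformation above can be sketched as follows. The sketch rests on three assumptions:

- Each target test site is given as a list of its keywords.
- A site keyword "matches" a core keyword when it occurs inside it, checked with `re.search`.
- The hypothetical parameter `min_matches` selects the "any one keyword" rule (1) or the "at least two keywords" rule (2) described above.

```python
import re


def site_feature_vector(core_keywords, test_sites, min_matches=1):
    """Return a 0/1 vector with one element per target test site.

    core_keywords: words extracted from the answer being scored.
    test_sites: one list of keywords per target test site.
    min_matches: how many of a site's keywords must be matched (assumed rule).
    """
    vector = []
    for site_keywords in test_sites:
        # Count this site's keywords that appear in at least one core keyword.
        matched = sum(
            1
            for kw in site_keywords
            if any(re.search(re.escape(kw), ck) for ck in core_keywords)
        )
        vector.append(1 if matched >= min_matches else 0)
    return vector


sites = [["inflation"], ["price", "level"], ["unemployment"]]
core = ["inflation", "rise", "price", "level"]
print(site_feature_vector(core, sites))                 # [1, 1, 0]
print(site_feature_vector(core, sites, min_matches=2))  # [0, 1, 0]
```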
S24: Input the test-site features to be scored into the preset decision tree reference model to obtain the accurate score of the answer information to be scored.
Here, the accurate score is the score that the trained decision tree reference model outputs for the answer information to be scored. In this embodiment, the decision tree reference model is built in advance and stored in the back-end database of the server. Once step S23 has produced the test-site features to be scored, the model can be retrieved directly from the server's database.
Here, a decision tree, in the general sense used in decision analysis, is built on the known probabilities of various outcomes to obtain the probability that the expected net present value is greater than or equal to zero. Structurally, it is a tree in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a category.
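To make the tree structure concrete, the following sketch scores a test-site feature vector by walking a small, hand-built tree from the root to a leaf. The `Node` class and the tree's scores are hypothetical illustrations, not a model from the application. Each internal node tests one element of the feature vector, each branch corresponds to a test outcome (0 or 1), and each leaf holds a score.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    feature: Optional[int] = None  # index of the test site tested at this node
    children: Dict[int, "Node"] = field(default_factory=dict)  # outcome -> child
    score: Optional[float] = None  # set only on leaf nodes


def predict(node: Node, features: list) -> float:
    """Follow the branch matching each tested feature until a leaf is reached."""
    while node.score is None:
        node = node.children[features[node.feature]]
    return node.score


# Hypothetical tree: no credit without test site 0; test site 1 adds 2 points.
tree = Node(feature=0, children={
    0: Node(score=0.0),
    1: Node(feature=1, children={0: Node(score=2.0), 1: Node(score=4.0)}),
})
print(predict(tree, [1, 1, 0]))  # 4.0
```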
In this embodiment, the answer information to be scored is obtained, and keyword extraction is performed on it to obtain core keywords. The core keywords are then transformed using the target test sites, which are obtained by the keyword determination method above, to produce the test-site features to be scored. Finally, these features are input into the preset decision tree reference model to obtain the accurate score of the answer information. This enables test takers' answers to be scored efficiently and accurately.
In one embodiment, as shown in FIG. 7, performing feature transformation on the core keywords using the target test sites to obtain the test-site features to be scored includes the following steps:
S231: Obtain the valid keywords corresponding to the target test sites.
Here, the valid keywords are all the keywords contained in a target test site. Under the keyword determination method above, the keywords of each target test site have already been determined. The valid keywords can therefore be read directly from each target test site.
S232: Match the valid keywords against the core keywords one by one using regular-expression matching to obtain keyword matching information.
Specifically, matching the valid keywords against the core keywords one by one with regular expressions works as follows. The valid keywords are treated as specific characters and combined into a "rule string", that is, a regular expression expressing a filtering logic over the core keywords. Applying this pattern picks out the core keywords that correspond to valid keywords, which yields the keyword matching information.
Here, the keyword matching information is the result of matching the valid keywords against the core keywords, either a successful match or a failed match. For example, suppose 10 core keywords and 5 valid keywords are obtained. Take any one core keyword and match it against each of the 5 valid keywords in turn using regular-expression matching. If the core keyword matches any one of the 5 valid keywords, the match succeeds; if it matches none of them, the match fails. Repeat this for each core keyword in turn until all 10 core keywords have been matched against the 5 valid keywords. The results form the keyword matching information.
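A minimal sketch of the "rule string" idea: the valid keywords are escaped and joined into a single regular-expression alternation, and each core keyword is tested against it. Treating a match as an exact whole-word match (`fullmatch`) is an assumption, since the application does not fix the exact matching rule. The function name and sample words are hypothetical.

```python
import re


def keyword_matching_info(core_keywords, valid_keywords):
    """Map each core keyword to True (match) or False (no match)."""
    # The "rule string": all valid keywords joined into one alternation,
    # escaped so that characters such as "." or "+" are taken literally.
    rule = re.compile("(?:" + "|".join(re.escape(k) for k in valid_keywords) + ")")
    # A core keyword matches when it is exactly one of the valid keywords.
    return {ck: rule.fullmatch(ck) is not None for ck in core_keywords}


info = keyword_matching_info(
    ["inflation", "rise", "price", "savings"],
    ["inflation", "price", "unemployment", "interest rate", "gdp"],
)
print(info)
# {'inflation': True, 'rise': False, 'price': True, 'savings': False}
```

Compiling one alternation means each core keyword is checked against all valid keywords in a single regex call, rather than in an explicit inner loop.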
S233: Assign a matching identifier to each core keyword according to the keyword matching information.
Here, a matching identifier is a custom label assigned to each core keyword according to the keyword matching information. It may consist of Arabic numerals, uppercase letters, lowercase letters, and so on, and it records whether the core keyword matched a valid keyword. When a core keyword matches a valid keyword, the test site to which that valid keyword belongs must also be recorded. The identifier of a successfully matched core keyword therefore also carries the test-site identifier of that valid keyword. This scheme places no restriction on the specific form of the matching identifiers. Preferably, to make them easy to distinguish when the test-site features are derived later, the following convention is used. A core keyword that matches a valid keyword is assigned an uppercase letter plus the corresponding test-site identifier, for example A1, where the uppercase A indicates a successful match and 1 identifies the test site of the matched valid keyword. A core keyword that fails to match is assigned only a lowercase letter, for example a, which indicates a failed match.
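The identifier scheme can be sketched as follows, using the "A<site id>" / "a" convention from the example above. Several details are assumptions:

- Membership is tested with plain `in` rather than a regular expression, for brevity.
- The same letters A/a are used for every keyword.
- A keyword that belongs to several test sites keeps only the first site found.

The function name is hypothetical.

```python
def assign_match_ids(core_keywords, site_keywords):
    """Return one matching identifier per core keyword.

    site_keywords maps a test-site identifier (1, 2, ...) to its valid keywords.
    """
    ids = []
    for ck in core_keywords:
        # Find the first test site that contains this core keyword, if any.
        site = next((sid for sid, kws in site_keywords.items() if ck in kws), None)
        # "A<site id>" records a successful match and its test site;
        # a bare lowercase "a" records a failed match.
        ids.append(f"A{site}" if site is not None else "a")
    return ids


print(assign_match_ids(
    ["inflation", "rise", "savings"],
    {1: ["inflation", "price"], 2: ["savings", "interest"]},
))  # ['A1', 'a', 'A2']
```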
S234: Obtain the test-site features to be scored according to the matching identifier of each core keyword.
Specifically, the matching identifier of each core keyword shows whether the core keyword successfully matched the corresponding target test site. If it did, the corresponding element of the test-site vector is 1; if it did not, the element is 0. The result is a test-site vector of 1s and 0s, which is the test-site feature to be scored.
For example, suppose there are 6 target test sites, each containing at least 1 valid keyword, and 5 core keywords. The 5 core keywords are matched one by one against the valid keywords of the target test sites using regular-expression matching. Suppose only the first three core keywords match, and they match the first three target test sites respectively. The test-site feature to be scored is then [1,1,1,0,0,0].
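The worked example can be reproduced with a short sketch that turns matching identifiers into the test-site vector. It assumes identifiers of the form "A<site id>" with 1-based site ids and a bare "a" for a failed match. The function name is hypothetical.

```python
def features_from_match_ids(match_ids, num_sites):
    """Build the 0/1 test-site vector from identifiers such as "A1" or "a"."""
    vector = [0] * num_sites
    for mid in match_ids:
        if mid.startswith("A"):
            # The digits after "A" name the matched test site (assumed 1-based).
            vector[int(mid[1:]) - 1] = 1
    return vector


# Six target test sites; the first three core keywords match sites 1 to 3.
print(features_from_match_ids(["A1", "A2", "A3", "a", "a"], 6))
# [1, 1, 1, 0, 0, 0]
```

The vector has one element per target test site, not per core keyword. Several core keywords matching the same site still set only that one element to 1.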
In this embodiment, the valid keywords corresponding to the target test sites are obtained and matched one by one against the core keywords using regular-expression matching, which yields keyword matching information. A matching identifier is then assigned to each core keyword according to that information, and the test-site features to be scored are obtained from the matching identifiers. This further ensures the accuracy and effectiveness of matching against the newly added test-site keywords.
In one embodiment, as shown in FIG. 8, the automatic scoring method further includes the following steps. They are performed before the test-site features to be scored are input into the preset decision tree reference model to obtain the output score of the answer information to be scored:
S241: Obtain M pieces of second sample answer data, each including original answer information and a second score value, where M is a positive integer.
Here, the second sample answer data are test takers' answer data. Each piece includes original answer information and a second score value, which is the score obtained by preliminarily scoring that original answer information. Optionally, the second sample answer data may be obtained from a scoring system that performs this preliminary scoring. The original answer information is a test taker's answer to a given subjective question, taken from the answer text in the scoring system. The second score value is the score assigned to the original answer information in advance, by manual or computer scoring.
The number of pieces of second sample answer data obtained is M, where M is a positive integer whose value can be set as needed. A larger M makes the subsequent decision tree reference model more accurate but lowers processing efficiency. M should therefore be chosen by balancing accuracy against efficiency.
S242: Perform feature transformation on the original answer information of each piece of second sample answer data using the target test sites to obtain test-site training features.
Here, a test-site training feature measures the similarity between the target test sites and the original answer information of each piece of second sample answer data. The target test sites are obtained by the keyword determination method above.
Specifically, the feature transformation proceeds as follows. First, an empty test-site vector is created with one element per target test site. Then the original answer information of each piece of second sample answer data is compared in meaning with each target test site using the semantic codes of the Tongyici Cilin synonym thesaurus. If the original answer information matches a target test site, the corresponding element of the test-site vector is 1; if it does not match that test site, the element is 0. The result is a test-site vector of 1s and 0s, which is the test-site training feature. Tongyici Cilin semantic codes provide a way to compute the similarity between words.
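A hedged sketch of code-based comparison follows.

In the extended edition of Tongyici Cilin, each entry has an 8-character code such as "Aa01A01=". The code encodes five levels: an uppercase letter, a lowercase letter, two digits, an uppercase letter, and two digits. These are followed by a flag character.

The similarity used below is a simple share-of-matching-levels measure, not a published Cilin similarity formula. The lexicon codes, the `threshold` parameter, and the function names are all hypothetical.

```python
# Cumulative end positions of the five levels in a code such as "Aa01A01=":
# "A" | "a" | "01" | "A" | "01" (the 8th character is a flag and is ignored).
LEVEL_ENDS = (1, 2, 4, 5, 7)


def cilin_similarity(code1: str, code2: str) -> float:
    """Share of leading Cilin levels that two codes have in common (0 to 1)."""
    shared = 0
    for end in LEVEL_ENDS:
        if code1[:end] != code2[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def training_features(answer_words, test_sites, lexicon, threshold=1.0):
    """0/1 vector: 1 if some answer word is close enough to a site keyword."""
    vector = []
    for site_keywords in test_sites:
        hit = any(
            w in lexicon and k in lexicon
            and cilin_similarity(lexicon[w], lexicon[k]) >= threshold
            for w in answer_words
            for k in site_keywords
        )
        vector.append(1 if hit else 0)
    return vector


# Hypothetical word -> code lexicon (codes invented for illustration only).
lexicon = {"rise": "Ie01A01=", "increase": "Ie01A01=", "fall": "Ie01B01="}
print(training_features(["increase", "cost"], [["rise"], ["fall"]], lexicon))
# [1, 0]
```

With `threshold=1.0`, only words in the same atomic synonym group count as a match. Lowering the threshold also accepts words that share fewer leading levels.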
S243: Combine the test-site training features and the corresponding second score values into a test-site sample set.
Here, the test-site sample set is the sample data to be input into the decision tree model for training. It comprises the test-site training features and their corresponding second score values. Specifically, it is a data set made up of test-site samples, each consisting of a test-site training feature and its corresponding second score value. Every test-site training feature is thus associated with its second score value.
S244: Train the decision tree model on the test-site sample set to obtain the decision tree reference model.
Here, the decision tree reference model is a predictive model representing a mapping between object attributes and object values. Each node in the tree represents an object, and each branch represents a possible attribute value. Each leaf node holds the value of the object represented by the path from the root node to that leaf. Specifically, the test-site training features and their corresponding second score values are input into the decision tree model. The model is then trained with the C4.5 algorithm to generate the trained decision tree model, which is the decision tree reference model.
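The core of C4.5 is choosing, at each node, the attribute with the highest information gain ratio. The sketch below implements that criterion and a bare recursive tree builder for 0/1 test-site features, treating each second score value as a class label. It is a simplified illustration, not a full C4.5 implementation: it omits handling of continuous attributes, missing values, and pruning. The function names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    if split_info == 0:  # the feature has a single value here: no split possible
        return 0.0
    return (entropy(labels) - conditional) / split_info


def majority(labels):
    """Most frequent label, used as the score of a leaf."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features):
    """Grow a tree as nested dicts; leaves are predicted scores."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return majority(labels)
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
        )
    return {"feature": best, "branches": branches}


# Hypothetical test-site training features and their second score values.
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
scores = [4, 2, 0, 0]
print(build_tree(rows, scores, [0, 1]))
# {'feature': 0, 'branches': {0: 0, 1: {'feature': 1, 'branches': {0: 2, 1: 4}}}}
```

Feature 0 is chosen at the root because it separates the zero scores completely (gain ratio 1.0, versus 0.5 for feature 1).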
Preferably, to further verify the accuracy of the decision tree reference model, the test-site sample set is also divided into a training set for modeling and a test set for validating the model. The training set is the data used to build the decision tree model, and the test set is the data used to evaluate the model once it is built. The division may use random splitting or cross-validation. The resulting training:test ratio may be, for example, 6:4, 7:3, or 75:25. Preferably, to improve the accuracy of the model, 75% of the test-site sample set is used as the training set and 25% as the test set in this step.
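A random 75:25 split of the test-site sample set can be sketched as below. The function name, the fixed `seed`, and the sample data are illustrative assumptions. Cross-validation, the other option mentioned, is not shown.

```python
import random


def split_samples(samples, train_ratio=0.75, seed=0):
    """Randomly split a sample set into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample set: (test-site training feature, second score value).
sample_set = [([i % 2, (i // 2) % 2], i % 5) for i in range(20)]
train, test = split_samples(sample_set)
print(len(train), len(test))  # 15 5
```

Shuffling a copy leaves the original sample set untouched. Passing `train_ratio=0.6` or `0.7` gives the other ratios listed above.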
In this embodiment, M pieces of second sample answer data are obtained, each comprising original answer information and a second score value. The original answer information of each piece is transformed into test-site training features using the target test sites. The decision tree model is then trained on these features and their corresponding second score values to obtain the decision tree reference model. This further ensures the accuracy with which the decision tree reference model scores test takers' answers.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering does not limit the implementation of the embodiments of the present application in any way.
In one embodiment, an automatic scoring device is provided that corresponds one-to-one with the automatic scoring method of the above embodiments. As shown in FIG. 9, the device includes an answer information acquisition module 21, a keyword extraction module 22, a test-site feature transformation module 23, and an input module 24. The functional modules are described in detail as follows:
The answer information acquisition module 21 is configured to obtain the answer information to be scored;
The keyword extraction module 22 is configured to perform keyword extraction on the answer information to be scored to obtain core keywords;
The test-site feature transformation module 23 is configured to perform feature transformation on the core keywords using the target test sites to obtain the test-site features to be scored, where the target test sites are obtained by the keyword determination method;
The input module 24 is configured to input the test-site features to be scored into the preset decision tree reference model to obtain the accurate score of the answer information to be scored.
Preferably, the test-site feature transformation module 23 includes:
A valid keyword acquisition unit, configured to obtain the valid keywords corresponding to the target test sites;
A matching unit, configured to match the valid keywords against the core keywords one by one using regular-expression matching to obtain keyword matching information;
An assignment unit, configured to assign a matching identifier to each core keyword according to the keyword matching information;
An obtaining unit, configured to obtain the test-site features to be scored according to the matching identifier of each core keyword.
Preferably, the input module 24 includes:
A second sample answer data acquisition unit, configured to obtain M pieces of second sample answer data, each including original answer information and a second score value, where M is a positive integer;
A test-site feature transformation unit, configured to perform feature transformation on the original answer information of each piece of second sample answer data using the target test sites to obtain test-site training features;
A combining unit, configured to combine the test-site training features and the corresponding second score values into a test-site sample set;
A decision tree reference model training unit, configured to train the decision tree model on the test-site sample set to obtain the decision tree reference model.
For specific limitations on the automatic scoring device, refer to the limitations on the automatic scoring method above; they are not repeated here. Each module of the automatic scoring device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server whose internal structure may be as shown in FIG. 10. It includes a processor, a memory, a network interface, and a database connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores the data used in the keyword determination method and the automatic scoring method described above. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a keyword determination method or an automatic scoring method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
Obtain N pieces of first sample answer data, each including sample answer information and a first score value, where N is a positive integer;
Perform word segmentation on the sample answer information of each piece of first sample answer data to obtain its sample word segments;
Merge the sample word segments of all pieces of first sample answer data to obtain a sample word segment set;
Use the sample word segment set to perform feature transformation on the sample answer information of each piece of first sample answer data to obtain sample training features;
Train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model;
Extract sample keywords from the decision tree sample model.
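Three of the steps above can be sketched briefly: building the sample word segment set, converting an answer into a 0/1 sample training feature, and reading keywords back out of a trained tree. The model training step is not reproduced; a hand-written tree stands in for it. This passage does not say exactly how keywords are read from the model. The sketch assumes they are the vocabulary words that the tree uses as split attributes. The nested-dict tree format and all function names are hypothetical.

```python
def build_vocabulary(segmented_answers):
    """Merge the sample word segments of all answers into one sorted word set."""
    return sorted({word for words in segmented_answers for word in words})


def to_features(words, vocabulary):
    """0/1 sample training feature: 1 where the vocabulary word occurs."""
    present = set(words)
    return [1 if v in present else 0 for v in vocabulary]


def extract_sample_keywords(tree, vocabulary):
    """Collect the vocabulary words used as split attributes in a trained tree.

    The tree is assumed to be nested dicts {"feature": i, "branches": {...}}
    whose leaves are scores.
    """
    keywords = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):  # internal node: record its split word
            keywords.add(vocabulary[node["feature"]])
            stack.extend(node["branches"].values())
    return sorted(keywords)


answers = [["inflation", "price", "rise"], ["savings", "price"], ["rise"]]
vocab = build_vocabulary(answers)  # ['inflation', 'price', 'rise', 'savings']
print(to_features(answers[1], vocab))  # [0, 1, 0, 1]
# Hypothetical trained tree splitting on "inflation" (0) and then "price" (1).
tree = {"feature": 0, "branches": {0: 0, 1: {"feature": 1, "branches": {0: 2, 1: 4}}}}
print(extract_sample_keywords(tree, vocab))  # ['inflation', 'price']
```

Words that never appear at a split node, such as "rise" and "savings" here, do not affect the tree's score. Under this assumption they are therefore not kept as sample keywords.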
In one embodiment, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
Obtain the answer information to be scored;
Perform keyword extraction on the answer information to be scored to obtain core keywords;
Perform feature transformation on the core keywords using target test sites to obtain test-site features to be scored, where the target test sites are obtained by the keyword determination method according to claim 2;
Input the test-site features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
In one embodiment, one or more non-volatile readable storage media store computer-readable instructions. When executed by one or more processors, the instructions cause the one or more processors to perform the following steps:
Obtain N pieces of first sample answer data, each including sample answer information and a first score value, where N is a positive integer;
Perform word segmentation on the sample answer information of each piece of first sample answer data to obtain its sample word segments;
Merge the sample word segments of all pieces of first sample answer data to obtain a sample word segment set;
Use the sample word segment set to perform feature transformation on the sample answer information of each piece of first sample answer data to obtain sample training features;
Train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model;
Extract sample keywords from the decision tree sample model.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
Acquire the answer information to be scored;
Perform keyword extraction on the answer information to be scored to obtain core keywords;
Perform feature conversion on the core keywords using the target test points to obtain the test-point features to be scored, where the target test points are obtained by the keyword determination method of claim 2;
Input the test-point features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is given only as an example. In practical applications, the functions above may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments above are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions recorded in those embodiments or make equivalent replacements of some of their technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A keyword determination method, comprising:
    acquiring N items of first sample answer data, each item including sample answer information and a first score value, where N is a positive integer;
    performing word segmentation on the sample answer information of each item of first sample answer data to obtain the sample word segments of that item;
    aggregating the sample word segments of all items of first sample answer data to obtain a sample word-segment set;
    using the sample word-segment set to convert the sample answer information of each item of first sample answer data into features, to obtain sample training features;
    training a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
    extracting sample keywords from the decision tree sample model.
  2. The keyword determination method according to claim 1, wherein after the extracting of sample keywords from the decision tree sample model, the method further comprises:
    acquiring scoring rule information, the scoring rule information including preset test points and preset keywords corresponding to each preset test point;
    removing, from the sample keywords, keywords that duplicate the preset keywords, to obtain target keywords;
    sending the target keywords to a client, and acquiring test-point labels returned by the client according to the target keywords; and
    adding each target keyword to the corresponding preset test point according to the test-point labels, to obtain target test points.
  3. An automatic scoring method, comprising:
    acquiring answer information to be scored;
    performing keyword extraction on the answer information to be scored to obtain core keywords;
    performing feature conversion on the core keywords using target test points to obtain test-point features to be scored, wherein the target test points are obtained by the keyword determination method according to claim 2; and
    inputting the test-point features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  4. The automatic scoring method according to claim 3, wherein the performing of feature conversion on the core keywords using target test points to obtain test-point features to be scored comprises:
    acquiring valid keywords corresponding to the target test points;
    matching the valid keywords against the core keywords one by one by regular-expression matching, to obtain keyword matching information;
    assigning a corresponding matching identifier to each core keyword according to the keyword matching information; and
    obtaining the test-point features to be scored according to the matching identifier of each core keyword.
  5. The automatic scoring method according to claim 3, wherein before the inputting of the test-point features to be scored into the preset decision tree reference model to obtain the output score of the answer information to be scored, the automatic scoring method further comprises:
    acquiring M items of second sample answer data, each item including original answer information and a second score value, where M is a positive integer;
    performing feature conversion on the original answer information of each item of second sample answer data using the target test points, to obtain test-point training features;
    forming a test-point sample set from the test-point training features and the corresponding second score values; and
    training a decision tree model on the test-point sample set to obtain the decision tree reference model.
  6. A keyword determination apparatus, comprising:
    a first sample answer data acquisition module, configured to acquire N items of first sample answer data, each item including sample answer information and a first score value, where N is a positive integer;
    a word segmentation module, configured to perform word segmentation on the sample answer information of each item of first sample answer data to obtain the sample word segments of that item;
    a word segment aggregation module, configured to aggregate the sample word segments of all items of first sample answer data to obtain a sample word-segment set;
    a sample feature conversion module, configured to use the sample word-segment set to convert the sample answer information of each item of first sample answer data into features, to obtain sample training features;
    a decision tree sample model training module, configured to train a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
    a sample keyword extraction module, configured to extract sample keywords from the decision tree sample model.
  7. A keyword determination apparatus, further comprising:
    a scoring rule information acquisition module, configured to acquire scoring rule information, the scoring rule information including preset test points and preset keywords corresponding to each preset test point;
    a duplicate keyword removal module, configured to remove, from the sample keywords, keywords that duplicate the preset keywords, to obtain target keywords;
    a test-point label acquisition module, configured to send the target keywords to a client and acquire test-point labels returned by the client according to the target keywords; and
    a target keyword adding module, configured to add each target keyword to the corresponding preset test point according to the test-point labels, to obtain target test points.
  8. An automatic scoring apparatus, comprising:
    an answer information acquisition module, configured to acquire answer information to be scored;
    a keyword extraction module, configured to perform keyword extraction on the answer information to be scored to obtain core keywords;
    a test-point feature conversion module, configured to perform feature conversion on the core keywords using target test points to obtain test-point features to be scored, wherein the target test points are obtained by the keyword determination method according to claim 2; and
    an input module, configured to input the test-point features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  9. The automatic scoring apparatus according to claim 8, wherein the test-point feature conversion module comprises:
    a valid keyword acquisition unit, configured to acquire valid keywords corresponding to the target test points;
    a matching unit, configured to match the valid keywords against the core keywords one by one by regular-expression matching, to obtain keyword matching information;
    an assignment unit, configured to assign a corresponding matching identifier to each core keyword according to the keyword matching information; and
    an obtaining unit, configured to obtain the test-point features to be scored according to the matching identifier of each core keyword.
  10. The automatic scoring apparatus according to claim 9, wherein the input module comprises:
    a second sample answer data acquisition unit, configured to acquire M items of second sample answer data, each item including original answer information and a second score value, where M is a positive integer;
    a test-point feature conversion unit, configured to perform feature conversion on the original answer information of each item of second sample answer data using the target test points, to obtain test-point training features;
    a composition unit, configured to form a test-point sample set from the test-point training features and the corresponding second score values; and
    a decision tree reference model training unit, configured to train a decision tree model on the test-point sample set to obtain the decision tree reference model.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring N items of first sample answer data, each item including sample answer information and a first score value, where N is a positive integer;
    performing word segmentation on the sample answer information of each item of first sample answer data to obtain the sample word segments of that item;
    aggregating the sample word segments of all items of first sample answer data to obtain a sample word-segment set;
    using the sample word-segment set to convert the sample answer information of each item of first sample answer data into features, to obtain sample training features;
    training a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
    extracting sample keywords from the decision tree sample model.
  12. The computer device according to claim 11, wherein after the sample keywords are extracted from the decision tree sample model, the processor, when executing the computer-readable instructions, further implements the following steps:
    acquiring scoring rule information, the scoring rule information including preset test points and preset keywords corresponding to each preset test point;
    removing, from the sample keywords, keywords that duplicate the preset keywords, to obtain target keywords;
    sending the target keywords to a client, and acquiring test-point labels returned by the client according to the target keywords; and
    adding each target keyword to the corresponding preset test point according to the test-point labels, to obtain target test points.
  13. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring answer information to be scored;
    performing keyword extraction on the answer information to be scored to obtain core keywords;
    performing feature conversion on the core keywords using target test points to obtain test-point features to be scored, wherein the target test points are obtained by the keyword determination method according to claim 2; and
    inputting the test-point features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  14. The computer device according to claim 13, wherein the performing of feature conversion on the core keywords using target test points to obtain test-point features to be scored comprises:
    acquiring valid keywords corresponding to the target test points;
    matching the valid keywords against the core keywords one by one by regular-expression matching, to obtain keyword matching information;
    assigning a corresponding matching identifier to each core keyword according to the keyword matching information; and
    obtaining the test-point features to be scored according to the matching identifier of each core keyword.
  15. The computer device according to claim 14, wherein before the test-point features to be scored are input into the preset decision tree reference model to obtain the output score of the answer information to be scored, the processor, when executing the computer-readable instructions, further implements the following steps:
    acquiring M items of second sample answer data, each item including original answer information and a second score value, where M is a positive integer;
    performing feature conversion on the original answer information of each item of second sample answer data using the target test points, to obtain test-point training features;
    forming a test-point sample set from the test-point training features and the corresponding second score values; and
    training a decision tree model on the test-point sample set to obtain the decision tree reference model.
  16. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring N items of first sample answer data, each item including sample answer information and a first score value, where N is a positive integer;
    performing word segmentation on the sample answer information of each item of first sample answer data to obtain the sample word segments of that item;
    aggregating the sample word segments of all items of first sample answer data to obtain a sample word-segment set;
    using the sample word-segment set to convert the sample answer information of each item of first sample answer data into features, to obtain sample training features;
    training a decision tree model on the sample training features and the corresponding first score values to obtain a decision tree sample model; and
    extracting sample keywords from the decision tree sample model.
  17. The non-volatile readable storage media according to claim 16, wherein after the sample keywords are extracted from the decision tree sample model, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    acquiring scoring rule information, the scoring rule information including preset test points and preset keywords corresponding to each preset test point;
    removing, from the sample keywords, keywords that duplicate the preset keywords, to obtain target keywords;
    sending the target keywords to a client, and acquiring test-point labels returned by the client according to the target keywords; and
    adding each target keyword to the corresponding preset test point according to the test-point labels, to obtain target test points.
  18. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring answer information to be scored;
    performing keyword extraction on the answer information to be scored to obtain core keywords;
    performing feature conversion on the core keywords using target test points to obtain test-point features to be scored, wherein the target test points are obtained by the keyword determination method according to claim 2; and
    inputting the test-point features to be scored into a preset decision tree reference model to obtain the accurate score of the answer information to be scored.
  19. The non-volatile readable storage media according to claim 18, wherein the performing of feature conversion on the core keywords using target test points to obtain test-point features to be scored comprises:
    acquiring valid keywords corresponding to the target test points;
    matching the valid keywords against the core keywords one by one by regular-expression matching, to obtain keyword matching information;
    assigning a corresponding matching identifier to each core keyword according to the keyword matching information; and
    obtaining the test-point features to be scored according to the matching identifier of each core keyword.
  20. The non-volatile readable storage media according to claim 19, wherein before the test-point features to be scored are input into the preset decision tree reference model to obtain the output score of the answer information to be scored, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring M items of second sample answer data, each item including original answer information and a second score value, where M is a positive integer;
    performing feature conversion on the original answer information of each item of second sample answer data using the target test points, to obtain test-point training features;
    forming a test-point sample set from the test-point training features and the corresponding second score values; and
    training a decision tree model on the test-point sample set to obtain the decision tree reference model.
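The two auxiliary steps recited in claims 2 and 4 (and mirrored in claims 7, 9, 12, 14, 17, and 19) can be sketched as follows: merging new sample keywords into the preset test points, and turning core keywords into test-point features by regular-expression matching. The claims do not say how the client assigns labels or how a matching identifier is encoded, so `label_client` is a hypothetical callback and the identifier is assumed to be the name of the matched test point. All names and data are illustrative, not the applicant's implementation.

```python
import re


def refine_test_points(sample_keywords, preset_points, label_client):
    # preset_points: test point -> preset keywords (from the scoring rules).
    # label_client: hypothetical stand-in for the client round trip; it takes
    # the target keywords and returns {keyword: test point label}.
    preset = {kw for kws in preset_points.values() for kw in kws}
    targets = [kw for kw in sample_keywords if kw not in preset]
    labels = label_client(targets)
    points = {name: list(kws) for name, kws in preset_points.items()}
    for kw in targets:
        name = labels.get(kw)
        # Only labels naming an existing preset test point are merged.
        if name in points and kw not in points[name]:
            points[name].append(kw)
    return points


def match_core_keywords(core_keywords, target_points):
    # Assumed matching identifier: the first test point one of whose valid
    # keywords is found in the core keyword by regex search, else None.
    patterns = {
        name: [re.compile(re.escape(kw)) for kw in kws]
        for name, kws in target_points.items()
    }
    match_ids = {
        core: next(
            (name for name, pats in patterns.items() if any(p.search(core) for p in pats)),
            None,
        )
        for core in core_keywords
    }
    # Test-point feature: 1 for each test point matched by at least one core keyword.
    hit = set(match_ids.values())
    features = [1 if name in hit else 0 for name in target_points]
    return match_ids, features


def fake_client(keywords):
    # Hypothetical reviewer labels for the target keywords.
    return {"money supply": "monetary policy", "demand": "price level"}


preset_points = {
    "monetary policy": ["interest rate"],
    "price level": ["inflation"],
    "trade": ["tariff"],
}
target_points = refine_test_points(
    ["inflation", "demand", "money supply"], preset_points, fake_client
)
print(target_points["price level"])  # ['inflation', 'demand']
ids, feats = match_core_keywords(["interest rates", "demand", "weather"], target_points)
print(feats)  # [1, 1, 0]
```

Escaping each keyword with `re.escape` keeps the matching literal; a deployment could instead store hand-written patterns per test point, which the claims' "regular matching" wording would also cover.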
PCT/CN2019/088544 2019-01-18 2019-05-27 Keyword determination method, automatic scoring method, apparatus and device, and medium WO2020147238A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910049180.5A CN109829155B (en) 2019-01-18 2019-01-18 Keyword determination method, automatic scoring method, device, equipment and medium
CN201910049180.5 2019-01-18

Publications (1)

Publication Number Publication Date
WO2020147238A1 (en) 2020-07-23

Family

ID=66861731

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088544 WO2020147238A1 (en) 2019-01-18 2019-05-27 Keyword determination method, automatic scoring method, apparatus and device, and medium

Country Status (2)

Country Link
CN (1) CN109829155B (en)
WO (1) WO2020147238A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036572A (en) * 2020-08-28 2020-12-04 上海冰鉴信息科技有限公司 Text list-based user feature extraction method and device
CN112115236A (en) * 2020-10-09 2020-12-22 湖北中烟工业有限责任公司 Method and device for constructing tobacco scientific and technical literature data deduplication model
CN112307133A (en) * 2020-10-29 2021-02-02 平安普惠企业管理有限公司 Security protection method and device, computer equipment and storage medium
CN112508405A (en) * 2020-12-07 2021-03-16 云南电网有限责任公司普洱供电局 Evaluation method and device for power distribution network operation control, computer equipment and medium
CN113065132A (en) * 2021-03-25 2021-07-02 深信服科技股份有限公司 Confusion detection method and device for macro program, electronic equipment and storage medium
CN113344125A (en) * 2021-06-29 2021-09-03 中国平安人寿保险股份有限公司 Long text matching identification method and device, electronic equipment and storage medium
CN113823326A (en) * 2021-08-16 2021-12-21 华南理工大学 Method for using training sample of efficient voice keyword detector
CN114281928A (en) * 2020-09-28 2022-04-05 中国移动通信集团广西有限公司 Model generation method, device and equipment based on text data
CN114329051A (en) * 2021-12-31 2022-04-12 腾讯科技(深圳)有限公司 Data information identification method, device, equipment, storage medium and program product
CN115936530A (en) * 2022-12-29 2023-04-07 北京三星九千认证中心有限公司 Keyword-based job performance assessment method and device
CN116072274A (en) * 2023-03-06 2023-05-05 四川互慧软件有限公司 Automatic dispatch system for medical care of ambulance
CN116304277A (en) * 2023-03-01 2023-06-23 深圳一资源网络平台有限公司 Intelligent matching method, system and storage medium based on AI

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
CN110414819B (en) * 2019-07-19 2023-05-26 中国电信集团工会上海市委员会 Work order scoring method
CN110609953A (en) * 2019-08-28 2019-12-24 苏州承儒信息科技有限公司 Reading and amending method and system for internet education
CN111414456A (en) * 2020-03-20 2020-07-14 北京师范大学 Method and system for automatically scoring open type short answer questions
CN111523322A (en) * 2020-04-25 2020-08-11 中信银行股份有限公司 Requirement document quality evaluation model training method and requirement document quality evaluation method
CN112395855A (en) * 2020-12-03 2021-02-23 中国联合网络通信集团有限公司 Comment-based evaluation method and device
CN112634689A (en) * 2020-12-24 2021-04-09 广州奇大教育科技有限公司 Application method of regular expression in automatic subjective question changing in computer teaching
CN112613585A (en) * 2021-01-07 2021-04-06 绿湾网络科技有限公司 Method and device for determining article category
CN113724738B (en) * 2021-08-31 2024-04-23 硅基(昆山)智能科技有限公司 Speech processing method, decision tree model training method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101859338A (en) * 2009-05-14 2010-10-13 深圳市海云天科技股份有限公司 Examination paper reading system and marking implementation method thereof
US9342491B2 (en) * 2012-07-31 2016-05-17 International Business Machines Corporation Enriching website content with extracted feature multi-dimensional vector comparison
CN105989040A (en) * 2015-02-03 2016-10-05 阿里巴巴集团控股有限公司 Intelligent question-answer method, device and system
CN108363687A (en) * 2018-01-16 2018-08-03 深圳市脑洞科技有限公司 Subjective item scores and its construction method, electronic equipment and the storage medium of model

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN103151042B (en) * 2013-01-23 2016-02-24 中国科学院深圳先进技术研究院 Full-automatic oral evaluation management and points-scoring system and methods of marking thereof
US10579940B2 (en) * 2016-08-18 2020-03-03 International Business Machines Corporation Joint embedding of corpus pairs for domain mapping
CN106897384B (en) * 2017-01-23 2020-09-11 科大讯飞股份有限公司 Method and device for automatically evaluating key points
EP3392780A3 (en) * 2017-04-19 2018-11-07 Tata Consultancy Services Limited Systems and methods for classification of software defect reports
CN107273861A (en) * 2017-06-20 2017-10-20 广东小天才科技有限公司 A kind of subjective question marking methods of marking, device and terminal device
CN108197098B (en) * 2017-11-22 2021-12-24 创新先进技术有限公司 Method, device and equipment for generating keyword combination strategy and expanding keywords
CN108959261A (en) * 2018-07-06 2018-12-07 京工博创(北京)科技有限公司 Paper subjective item based on natural language sentences topic device and method
CN108846138B (en) * 2018-07-10 2022-06-07 苏州大学 Question classification model construction method, device and medium fusing answer information
CN109213847A (en) * 2018-09-14 2019-01-15 广州神马移动信息科技有限公司 Layered approach and its device, electronic equipment, the computer-readable medium of answer

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036572A (en) * 2020-08-28 2020-12-04 上海冰鉴信息科技有限公司 Text list-based user feature extraction method and device
CN112036572B (en) * 2020-08-28 2024-03-12 上海冰鉴信息科技有限公司 Text list-based user feature extraction method and device
CN114281928A (en) * 2020-09-28 2022-04-05 中国移动通信集团广西有限公司 Model generation method, device and equipment based on text data
CN112115236A (en) * 2020-10-09 2020-12-22 湖北中烟工业有限责任公司 Method and device for constructing tobacco scientific and technical literature data deduplication model
CN112115236B (en) * 2020-10-09 2024-02-02 湖北中烟工业有限责任公司 Construction method and device of tobacco science and technology literature data deduplication model
CN112307133A (en) * 2020-10-29 2021-02-02 平安普惠企业管理有限公司 Security protection method and device, computer equipment and storage medium
CN112508405A (en) * 2020-12-07 2021-03-16 云南电网有限责任公司普洱供电局 Evaluation method and device for power distribution network operation control, computer equipment and medium
CN113065132A (en) * 2021-03-25 2021-07-02 深信服科技股份有限公司 Confusion detection method and device for macro program, electronic equipment and storage medium
CN113065132B (en) * 2021-03-25 2023-11-03 深信服科技股份有限公司 Method and device for detecting confusion of macro program, electronic equipment and storage medium
CN113344125A (en) * 2021-06-29 2021-09-03 中国平安人寿保险股份有限公司 Long text matching identification method and device, electronic equipment and storage medium
CN113344125B (en) * 2021-06-29 2024-04-05 中国平安人寿保险股份有限公司 Long text matching recognition method and device, electronic equipment and storage medium
CN113823326B (en) * 2021-08-16 2023-09-19 华南理工大学 Method for using training sample of high-efficiency voice keyword detector
CN113823326A (en) * 2021-08-16 2021-12-21 华南理工大学 Method for using training sample of efficient voice keyword detector
CN114329051B (en) * 2021-12-31 2024-03-05 腾讯科技(深圳)有限公司 Data information identification method, device, apparatus, storage medium and program product
CN114329051A (en) * 2021-12-31 2022-04-12 腾讯科技(深圳)有限公司 Data information identification method, device, equipment, storage medium and program product
CN115936530A (en) * 2022-12-29 2023-04-07 北京三星九千认证中心有限公司 Keyword-based job performance assessment method and device
CN116304277A (en) * 2023-03-01 2023-06-23 深圳一资源网络平台有限公司 Intelligent matching method, system and storage medium based on AI
CN116304277B (en) * 2023-03-01 2023-12-15 张素愿 Intelligent matching method, system and storage medium based on AI
CN116072274A (en) * 2023-03-06 2023-05-05 四川互慧软件有限公司 Automatic dispatch system for medical care of ambulance

Also Published As

Publication number Publication date
CN109829155B (en) 2024-03-22
CN109829155A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
WO2020147238A1 (en) Keyword determination method, automatic scoring method, apparatus and device, and medium
US20230205610A1 (en) Systems and methods for removing identifiable information
US11327975B2 (en) Methods and systems for improved entity recognition and insights
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
WO2020057022A1 (en) Associative recommendation method and apparatus, computer device, and storage medium
WO2022142613A1 (en) Training corpus expansion method and apparatus, and intent recognition model training method and apparatus
US10713306B2 (en) Content pattern based automatic document classification
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
US11914968B2 (en) Official document processing method, device, computer equipment and storage medium
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
WO2022088671A1 (en) Automated question answering method and apparatus, device, and storage medium
CN110555206A (en) named entity identification method, device, equipment and storage medium
WO2022134805A1 (en) Document classification prediction method and apparatus, and computer device and storage medium
CN111723870B (en) Artificial intelligence-based data set acquisition method, apparatus, device and medium
CN113656547B (en) Text matching method, device, equipment and storage medium
CN111368096A (en) Knowledge graph-based information analysis method, device, equipment and storage medium
JP2022119207A (en) Utilizing machine learning and natural language processing to extract and verify vaccination data
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
US11604923B2 (en) High volume message classification and distribution
CN113486664A (en) Text data visualization analysis method, device, equipment and storage medium
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN112087473A (en) Document downloading method and device, computer readable storage medium and computer equipment
CN116822491A (en) Log analysis method and device, equipment and storage medium
CN113254612A (en) Knowledge question-answering processing method, device, equipment and storage medium
CN112667809A (en) Text processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase Ref country code: DE
122 EP: PCT application non-entry in European phase Ref document number: 19910144; Country of ref document: EP; Kind code of ref document: A1