CN110489743A - Information processing method, electronic device and storage medium
- Publication number
- CN110489743A (application CN201910662902.4A)
- Authority
- CN
- China
- Prior art keywords
- document
- subset
- obtains
- scoring
- attribute information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
Embodiments of the present application disclose an information processing method, comprising: obtaining a first document to be scored, wherein the first document includes multiple words; obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document; and predicting the score of the first document based on the score of the second document. Embodiments of the present application also disclose an electronic device and a storage medium.
Description
Technical field
The present application relates to, but is not limited to, the field of computer technology, and in particular to an information processing method, an electronic device and a storage medium.
Background
In the related art, scoring a document requires a grader to read through the full content of the document, which makes scoring inefficient.
Summary
To solve the above technical problem, embodiments of the present application aim to provide an information processing method, an electronic device and a storage medium. They address the problem in the related art that scoring a document requires a grader to read through its full content, which makes scoring inefficient; they realize automatic scoring and improve scoring efficiency.
The technical solution of the present application is implemented as follows:
An information processing method, the method comprising:
Obtaining a first document to be scored, wherein the first document includes multiple words;
Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;
Predicting the score of the first document based on the score of the second document.
Optionally, the method further comprises:
Dividing the preset document set to obtain multiple different first subsets associated with different document types;
Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, determining, based on the attribute information, the second document that matches the first document from the preset document set comprises:
Determining the second document from the second subset based on the attribute information.
Optionally, determining, from the multiple first subsets, the subset to which the first document belongs, to obtain the second subset comprises:
Obtaining a first keyword of each document in each first subset and first semantic information of each document;
Obtaining a second keyword of the first document and second semantic information of the first document;
Obtaining a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information;
Determining the second subset from the multiple first subsets based on the first matching result and the second matching result.
Optionally, obtaining the first matching result between the first keywords and the second keyword, and the second matching result between the first semantic information and the second semantic information, comprises:
Obtaining multiple first subvectors corresponding to the multiple first keywords, and determining the weighted average of the multiple first subvectors to obtain a first vector;
Obtaining multiple second subvectors corresponding to the multiple pieces of first semantic information, and determining the weighted average of the multiple second subvectors to obtain a second vector;
Obtaining a third vector corresponding to the second keyword;
Obtaining a fourth vector corresponding to the second semantic information;
Determining the similarity between the first vector and the third vector to obtain the first matching result;
Determining the similarity between the second vector and the fourth vector to obtain the second matching result.
Optionally, determining the second document from the second subset based on the attribute information comprises:
Performing syntactic dependency parsing on the first document based on the attribute information to obtain a first analysis result;
Performing semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result;
Determining the second document from the second subset based on the first analysis result and the second analysis result.
Optionally, predicting the score of the first document based on the score of the second document comprises:
Analyzing the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyzing the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document;
Predicting the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
An electronic device, comprising: a processor, a memory and a communication bus;
The communication bus is configured to implement a communication connection between the processor and the memory;
The processor is configured to execute an information processing program stored in the memory to implement the following steps:
Obtaining a first document to be scored, wherein the first document includes multiple words;
Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;
Predicting the score of the first document based on the score of the second document.
Optionally, the processor is further configured to implement the following steps:
Dividing the preset document set to obtain multiple different first subsets associated with different document types;
Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, determining, based on the attribute information, the second document that matches the first document from the preset document set comprises:
Determining the second document from the second subset based on the attribute information.
Optionally, the processor is further configured to implement the following steps:
Analyzing the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyzing the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document;
Predicting the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
A storage medium, characterized in that the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the information processing method described above.
With the information processing method, electronic device and storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained and, based on the attribute information, a second document that matches the first document is selected from a preset document set containing multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are realized, a grader does not need to read through the full content of the first document, and scoring efficiency is improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of another information processing method provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.
An embodiment of the present application provides an information processing method applied to an electronic device. Referring to Fig. 1, the method includes the following steps:
Step 101: obtain a first document to be scored.
The first document includes multiple words. For example, in this embodiment, the words form sentences, the sentences form paragraphs, and the paragraphs form the first document. Documents in the embodiments of the present application include, but are not limited to, compositions, papers and articles.
In practical applications, the first document is the object to be scored. For example, in teaching, such as when marking examination papers, the first document may be a composition to be scored; when updating the papers or articles in a database, the first document may be a paper or article newly uploaded to the database that is to be scored.
Step 102: obtain attribute information of the multiple words and, based on the attribute information, determine from a preset document set a second document that matches the first document.
In this embodiment, the attribute information of the multiple words can be understood as the attribute information that the words have when combined. For example, it may include the attribute information of the first document, that is, of the whole formed by combining the words; it may also include the attribute information that each word has within that whole.
For example, if the first document is a composition to be scored, the attribute information of the multiple words may characterize the type of the composition, such as narrative, expository, practical or argumentative writing. The attribute information may also characterize the sentence component that each word corresponds to in the sentence formed by the words. For example, in a Chinese sentence, the attribute information of the words includes subject, predicate, object, verbal element, attributive, adverbial, complement and head word; in an English sentence, it includes subject, predicate, object, predicative, attributive, adverbial, complement and appositive.
In practical applications, after obtaining the first document to be scored and the attribute information of the multiple words in it, the electronic device uses the attribute information to find, in the preset document set, the second document that matches the first document. For example, if the first document is a composition to be scored and the attribute information characterizes the type of the composition, the second document may be a document of the same type as the first document. As another example, if the first document is a composition to be scored and the attribute information characterizes the sentence component that each word corresponds to in the sentence formed by the words, the second document may be the document whose sentence components are most nearly identical or similar to those of the first document.
Step 103: predict the score of the first document based on the score of the second document.
In this embodiment, after finding the second document that matches the first document in the preset document set, the electronic device obtains the score of the second document and predicts the score of the first document based on it, thereby obtaining a prediction result. Note that the score predicted by the electronic device may serve as a reference for the final score of the first document, or may be used directly as the final score of the first document.
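Steps 101 to 103 can be sketched as a nearest-neighbour lookup. This is only a minimal illustration under stated assumptions: the `ScoredDocument` type, the `match_degree` function and `predict_score` are hypothetical names that do not come from the application, and plain word overlap stands in for the attribute-based matching that the application describes.

```python
from dataclasses import dataclass


@dataclass
class ScoredDocument:
    """A reference document from the preset document set (hypothetical type)."""
    words: list[str]
    score: float


def match_degree(words_a: list[str], words_b: list[str]) -> float:
    """Placeholder matching measure: Jaccard overlap of the two word sets.

    The application matches on attribute information of the words; word
    overlap is an assumption used only to keep this sketch self-contained.
    """
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def predict_score(first_doc: list[str], preset: list[ScoredDocument]) -> float:
    """Steps 102-103: find the best-matching second document and reuse its score."""
    second_doc = max(preset, key=lambda d: match_degree(first_doc, d.words))
    return second_doc.score
```

In this sketch the predicted score is simply the score of the second document; as the text notes, such a value could serve either as a reference or as the final score.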
With the information processing method, electronic device and storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained and, based on the attribute information, a second document that matches the first document is selected from a preset document set containing multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are realized, a grader does not need to read through the full content of the first document, and scoring efficiency is improved.
Based on the foregoing embodiment, an embodiment of the present application provides an information processing method applied to an electronic device. Referring to Fig. 2, the method includes the following steps:
Step 201: obtain a first document to be scored.
The first document includes multiple words.
Step 202: divide the preset document set to obtain multiple different first subsets associated with different document types.
In this embodiment, the preset document set includes multiple reference documents used for scoring the first document. The reference documents have multiple document types. For example, different document types correspond to different document scores and/or different themes. Here, a theme can be understood as the central idea that a document is meant to express.
In this embodiment, the electronic device may divide the preset document set, and the division is performed based on document type. Note that the division of the preset document set into multiple first subsets only needs to be completed before the electronic device performs the corresponding operations on the first subsets; that is, this embodiment does not specifically limit the order in which the step of dividing the preset document set and the step of obtaining the first document are executed.
In practical applications, the electronic device divides the preset document set based on document type to obtain multiple different first subsets associated with different document types. Different first subsets then correspond to different document types; for example, different first subsets have different score ranges and/or different themes.
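The division in step 202 amounts to grouping the reference documents by a type key. The sketch below assumes each reference document is a dictionary with a `score` field and uses a fixed-width score range as the document type; both the data layout and the `score_band` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def divide_by_type(documents, type_of):
    """Group reference documents into first subsets keyed by document type."""
    subsets = defaultdict(list)
    for doc in documents:
        subsets[type_of(doc)].append(doc)
    return dict(subsets)


def score_band(doc, width=10):
    """Hypothetical document type: the score range the document falls in."""
    low = int(doc["score"] // width) * width
    return (low, low + width)
```

A theme-based division would follow the same pattern with a different `type_of` function, for example one that returns a theme label stored with each document.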
Step 203: determine, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset.
In this embodiment, once the electronic device has divided the preset document set into multiple first subsets, it can find among them the subset to which the first document belongs, namely the second subset, thereby classifying the first document.
In this embodiment, step 203, in which the electronic device classifies the first document by determining the subset to which it belongs, can be implemented by the following steps:
Step 203a: obtain a first keyword of each document in each first subset and first semantic information of each document.
Keywords in the embodiments of the present application include, but are not limited to, words selected from the title, abstract and body of a document that are essential to expressing its central point. For example, the electronic device may extract keywords from a document using the term frequency-inverse document frequency (TF-IDF) algorithm from natural language processing (NLP).
In this embodiment, the electronic device may obtain the semantic information of a document based on a deep learning model.
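The TF-IDF keyword extraction mentioned in step 203a can be sketched with the standard library alone. The function name `tfidf_keywords`, the `top_k` parameter and the assumption that documents are already tokenised are illustrative choices, not details from the application; the weighting used is the plain form tf × log(N / df).

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords
```

Terms that occur in every document get a weight of zero, so common words drop out of the keyword list without a separate stop-word list.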
Step 203b: obtain a second keyword of the first document and second semantic information of the first document.
In practical applications, the electronic device performs keyword extraction and semantic information extraction on the first document and on each document in each first subset obtained by the division. This yields the first keyword and first semantic information of each document, as well as the second keyword and second semantic information of the first document.
Step 203c: obtain a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information.
In this embodiment, the electronic device classifies the first document based on two key factors: keywords and semantic information. Having obtained the multiple first keywords, the second keyword, the multiple pieces of first semantic information and the second semantic information, the electronic device matches the first keywords against the second keyword to obtain the first matching result, and matches the first semantic information against the second semantic information to obtain the second matching result. The electronic device can then classify the first document based on the first matching result and the second matching result.
Based on the foregoing embodiment, the electronic device may perform the above matching by means of vector matching. Specifically, step 203c, obtaining the first matching result between the multiple first keywords and the second keyword and the second matching result between the multiple pieces of first semantic information and the second semantic information, may include the following steps:
Step1: obtain multiple first subvectors corresponding to the multiple first keywords, and determine the weighted average of the first subvectors to obtain a first vector.
In this embodiment, each subset includes multiple documents, and the documents correspond to multiple first keywords. Having obtained the first keywords, the electronic device obtains the corresponding first subvectors and determines their weighted average to obtain the first vector.
Step2: obtain multiple second subvectors corresponding to the multiple pieces of first semantic information, and determine the weighted average of the second subvectors to obtain a second vector.
In this embodiment, each subset includes multiple documents, and the documents correspond to multiple pieces of first semantic information. Having obtained the first semantic information, the electronic device obtains the corresponding second subvectors and determines their weighted average to obtain the second vector.
Step3: obtain a third vector corresponding to the second keyword.
In this embodiment, having obtained the second keyword, the electronic device obtains the third vector corresponding to the second keyword.
Step4: obtain a fourth vector corresponding to the second semantic information.
In this embodiment, having obtained the second semantic information, the electronic device obtains the fourth vector corresponding to the second semantic information.
Step5: determine the similarity between the first vector and the third vector to obtain the first matching result.
In this embodiment, having obtained the first vector and the third vector, the electronic device determines the similarity between them to obtain the first matching result.
Step6: determine the similarity between the second vector and the fourth vector to obtain the second matching result.
In this embodiment, having obtained the second vector and the fourth vector, the electronic device determines the similarity between them to obtain the second matching result.
As can be seen from the above, when classifying the first document, matching may be performed between the first subvector corresponding to the keyword of each document in each subset and the third vector corresponding to the keyword of the first document; alternatively, matching may be performed between the first vector, that is, the weighted average of the first subvectors corresponding to the keywords of all documents in each subset, and the third vector. Likewise, matching may be performed between the second subvector corresponding to the first semantic information of each document in each subset and the fourth vector corresponding to the second semantic information of the first document; alternatively, matching may be performed between the second vector, that is, the weighted average of the second subvectors corresponding to the first semantic information of all documents in each subset, and the fourth vector. Note that matching the first vector against the third vector and the second vector against the fourth vector reduces the number of matching operations and improves classification efficiency.
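Step1 to Step6 can be sketched with two small helpers. The application does not say how the weights are chosen or which similarity measure is used, so the caller-supplied `weights` and the use of cosine similarity are assumptions made for illustration.

```python
import math


def weighted_average(vectors, weights):
    """Step1/Step2: collapse a subset's subvectors into a single vector."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Step5/Step6: similarity of two vectors (cosine is an assumed choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Averaging first means each subset is compared with the first document once per factor, rather than once per document in the subset, which is the reduction in matching operations that the text describes.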
Step 203d: determine the second subset from the multiple first subsets based on the first matching result and the second matching result.
In this embodiment, having obtained the first matching result and the second matching result, the electronic device can determine the second subset from the multiple first subsets based on them. Specifically, the electronic device may take as the subset to which the first document belongs the first subset for which the distance between the first vector and the third vector (the first matching result) is smallest and the distance between the second vector and the fourth vector (the second matching result) is smallest.
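Step 203d can be sketched as picking the subset with the best combined matching result. The application does not state how the two results are combined when no single subset is best on both, so adding the two similarities is an assumed rule here, and `select_second_subset` is a hypothetical name.

```python
def select_second_subset(match_results):
    """Pick the first subset whose combined matching result is highest.

    match_results maps a subset name to (first_result, second_result), the
    keyword similarity and the semantic similarity for that subset. Summing
    the two values is an assumption, not a rule taken from the application.
    """
    return max(match_results, key=lambda name: sum(match_results[name]))
```

For example, with results `{"narrative": (0.9, 0.8), "argumentative": (0.4, 0.95)}` the narrative subset is selected, because its combined similarity is higher even though the argumentative subset has the better semantic match.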
Step 204: obtain attribute information of the multiple words and, based on the attribute information, determine the second document from the second subset.
In this embodiment, having determined the subset to which the first document belongs, namely the second subset, the electronic device further determines the second document in the second subset that has the highest matching degree with the first document.
In practical applications, the electronic device obtains the attribute information of the multiple words and, based on it, finds in the determined second subset the second document that best matches the first document.
Based on the foregoing embodiment, when searching the second subset for the second document that matches the first document, the electronic device may consider two aspects: syntactic dependency parsing and semantic dependency analysis. That is, determining the second document from the second subset based on the attribute information in step 204 can be implemented by the following steps:
Step 204a: perform syntactic dependency parsing on the first document based on the attribute information to obtain a first analysis result.
In this embodiment, the attribute information includes the attribute information that each word has within the first document, that is, within the whole formed by combining the words.
Here, the electronic device may use a deep learning algorithm to analyze the syntactic structure based on the attribute information, build a syntax tree to obtain the sentence structure, and then derive from the sentence structure the syntactic dependency structure of the first document, namely the first analysis result.
Step 204b: perform semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result.
In this embodiment, by performing semantic dependency analysis on the first document based on the attribute information, the electronic device can understand and parse the semantic associations between the linguistic units of a sentence and present those associations as a dependency structure, thereby obtaining the second analysis result.
Step 204c: determine the second document from the second subset based on the first analysis result and the second analysis result.
In this embodiment, having obtained the first analysis result and the second analysis result, the electronic device determines, based on them, the second document in the second subset that matches the first document.
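Steps 204a to 204c can be illustrated by comparing dependency structures directly. The sketch assumes that each analysis result is available as a set of (head, relation, dependent) triples produced by some parser, and that candidates are ranked by the sum of two Jaccard overlaps; the data layout, the relation labels and the ranking rule are all hypothetical and are not specified by the application.

```python
def overlap(a, b):
    """Jaccard overlap between two sets of dependency triples."""
    return len(a & b) / len(a | b) if a | b else 0.0


def find_second_document(first_analysis, candidates):
    """Return the id of the candidate whose analyses best match the first document.

    first_analysis is (syntactic_triples, semantic_triples); candidates maps a
    document id to the same kind of pair for a document in the second subset.
    """
    syn, sem = first_analysis
    return max(
        candidates,
        key=lambda doc_id: overlap(syn, candidates[doc_id][0])
        + overlap(sem, candidates[doc_id][1]),
    )
```

In practice the triples would come from a dependency parser run over every sentence of each document; that parsing step is outside the scope of this sketch.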
Step 205: predict the score of the first document based on the score of the second document.
In this embodiment, having determined the second document that matches the first document, the electronic device obtains the score of the second document and predicts the score of the first document based on it.
In this embodiment, step 205, predicting the score of the first document based on the score of the second document, includes:
Step 205a: analyze the multiple words based on character recognition technology to obtain the alignment degree of the first document.
In this embodiment, character recognition technology includes optical character recognition (OCR). The alignment degree characterizes how neat the written characters in the document are.
In practical applications, the electronic device analyzes the multiple words based on character recognition technology to obtain the alignment degree of the first document, and uses the alignment degree as one of the reference factors for predicting the score of the first document.
Step 205b: analyze the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document.
In this embodiment, the electronic device analyzes the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document, and uses the syntactic structure as one of the reference factors for predicting the score of the first document. Here, from the syntactic structure the electronic device can obtain how fluent the sentences in the first document are.
Step 205c: predict the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
In this embodiment, having obtained the alignment degree, the syntactic structure and the score of the second document, the electronic device can predict the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure. This realizes automatic scoring and accurate prediction, a grader does not need to read through the full content of the first document, and scoring efficiency is improved.
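One way to picture step 205c is as a weighted blend of the second document's score with the optional quality factors. The application does not give a formula, so everything in the sketch is an assumption: the function name, the default weights, the 100-point scale, and the representation of the alignment degree and of the sentence fluency derived from the syntactic structure as numbers between 0 and 1.

```python
def predict_first_score(second_score, alignment=None, fluency=None,
                        weights=(0.8, 0.1, 0.1), max_score=100.0):
    """Blend the second document's score with optional quality factors.

    alignment and fluency are assumed to lie in [0, 1]. When a factor is
    missing, its weight is redistributed over the factors that are present.
    """
    parts = [(weights[0], second_score)]
    if alignment is not None:
        parts.append((weights[1], alignment * max_score))
    if fluency is not None:
        parts.append((weights[2], fluency * max_score))
    total_weight = sum(w for w, _ in parts)
    return sum(w * value for w, value in parts) / total_weight
```

Because both extra factors are optional, the same function covers the "at least one of" wording: it can be called with the alignment degree only, the fluency only, or both.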
Note that for steps and content in this embodiment that are the same as in other embodiments, reference may be made to the descriptions in those embodiments; details are not repeated here.
Based on the foregoing embodiments, an embodiment of the present application provides an electronic device that can be applied to the information processing method provided by the embodiments corresponding to Fig. 1 and Fig. 2. Referring to Fig. 3, the electronic device 3 includes a processor 31, a memory 32 and a communication bus 33, wherein:
The communication bus 33 is configured to implement a communication connection between the processor 31 and the memory 32.
The processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Obtaining a first document to be scored, wherein the first document includes multiple words;
Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;
Predicting the score of the first document based on the score of the second document.
In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Dividing the preset document set to obtain multiple different first subsets associated with different document types;
Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, determining, based on the attribute information, the second document that matches the first document from the preset document set includes:
Determining the second document from the second subset based on the attribute information.
In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Obtaining a first keyword of each document in each first subset and first semantic information of each document;
Obtaining a second keyword of the first document and second semantic information of the first document;
Obtaining a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information;
Determining the second subset from the multiple first subsets based on the first matching result and the second matching result.
In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Obtain multiple first sub-vectors corresponding to the multiple first keywords, and determine the weighted average of the multiple first sub-vectors to obtain a first vector;
Obtain multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determine the weighted average of the multiple second sub-vectors to obtain a second vector;
Obtain a third vector corresponding to the second keyword;
Obtain a fourth vector corresponding to the second semantic information;
Determine the similarity between the first vector and the third vector to obtain the first matching result;
Determine the similarity between the second vector and the fourth vector to obtain the second matching result.
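The vector steps above can be sketched as follows. The text only says "similarity"; cosine similarity is used here as an assumed choice, and the two-dimensional vectors and the weights are made-up values, since the application does not state how sub-vectors or weights are produced.

```python
import math


def weighted_average(vectors, weights):
    """Weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical first sub-vectors (one per first keyword) and their weights.
first_subvectors = [[1.0, 0.0], [0.0, 1.0]]
weights = [3.0, 1.0]
first_vector = weighted_average(first_subvectors, weights)

# Hypothetical third vector for the second keyword.
third_vector = [1.0, 0.0]
first_matching_result = cosine_similarity(first_vector, third_vector)
```

The second matching result would be computed the same way from the second vector and the fourth vector.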
In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Perform syntactic dependency analysis on the first document based on the attribute information to obtain a first analysis result;
Perform semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result;
Determine the second document from the second subset based on the first analysis result and the second analysis result.
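As a rough sketch of how two analysis results might be used to pick the second document, the example below assumes each result is a set of (head, relation, dependent) triples and compares documents by Jaccard overlap. The triples are hand-written stand-ins for parser output, the relation labels are illustrative, and the equal weighting of the two overlaps is an assumption rather than something the application specifies.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_second_document(first_syntax, first_semantics, second_subset):
    """Return the candidate whose dependency triples best overlap the first document's."""
    def match(candidate):
        return (0.5 * jaccard(first_syntax, candidate["syntax"])
                + 0.5 * jaccard(first_semantics, candidate["semantics"]))

    return max(second_subset, key=match)


# Hand-written stand-ins for the first and second analysis results.
first_syntax = {("rained", "nsubj", "it"), ("rained", "advmod", "heavily")}
first_semantics = {("rained", "Manner", "heavily")}

# Hypothetical second subset with precomputed analysis results and scores.
second_subset = [
    {"id": "d1", "score": 80,
     "syntax": {("rained", "nsubj", "it")},
     "semantics": {("rained", "Manner", "heavily")}},
    {"id": "d2", "score": 60,
     "syntax": {("ran", "nsubj", "he")},
     "semantics": set()},
]

second_document = pick_second_document(first_syntax, first_semantics, second_subset)
```

In practice the triples would come from a dependency parser rather than being written by hand.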
In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to implement the following steps:
Analyze the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyze the first document based on the analytic hierarchy process (AHP) to obtain the syntactic structure of the first document;
Predict the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
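The application does not say how the alignment degree and the syntactic structure are combined with the score of the second document. A minimal sketch, assuming that both have already been converted to values on the same 0 to 100 scale as the score and that a simple weighted blend is acceptable, could look like this; the function name, parameters, and the 0.7 weight are all hypothetical.

```python
def predict_score(second_doc_score, alignment=None, structure=None, base_weight=0.7):
    """Blend the second document's score with whichever extra signals are available.

    alignment and structure are optional, reflecting "at least one of"
    in the text; if neither is given, the second document's score is returned.
    """
    extras = [value for value in (alignment, structure) if value is not None]
    if not extras:
        return float(second_doc_score)
    extra_mean = sum(extras) / len(extras)
    return base_weight * second_doc_score + (1.0 - base_weight) * extra_mean


# Hypothetical values on a 0 to 100 scale.
predicted_both = predict_score(80, alignment=90, structure=70)
predicted_alignment_only = predict_score(80, alignment=100)
```

A larger `base_weight` keeps the prediction closer to the score of the matched second document.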
The electronic device provided by the embodiments of the present application obtains a first document to be scored, the first document including multiple words; it then obtains attribute information of the multiple words and, based on the attribute information, selects a second document that matches the first document from a preset document set containing multiple documents; finally, it predicts the score of the first document based on the score of the second document. In this way, automatic scoring and accurate prediction are achieved without a grader having to read the full content of the first document, which improves scoring efficiency.
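The overall flow summarized above can be sketched end to end. This example makes strong simplifying assumptions: the attribute information is reduced to per-word frequency, matching uses a multiset overlap ratio, and the predicted score is simply the score of the best-matching document. None of these choices is stated in the application, and all names are hypothetical.

```python
from collections import Counter


def attribute_info(words):
    """Assumed attribute information: how often each word occurs."""
    return Counter(words)


def overlap(a, b):
    """Multiset overlap ratio: shared occurrences over the multiset union."""
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 0.0


def predict_from_match(first_words, scored_documents):
    """Find the best-matching scored document and reuse its score."""
    first_attrs = attribute_info(first_words)
    second = max(
        scored_documents,
        key=lambda doc: overlap(first_attrs, attribute_info(doc["words"])),
    )
    return second["id"], second["score"]


# Hypothetical preset document set with existing scores.
scored_documents = [
    {"id": "s1", "words": ["spring", "rain", "falls", "softly"], "score": 88},
    {"id": "s2", "words": ["market", "prices", "rise", "sharply"], "score": 71},
]

first_document_words = ["rain", "falls", "softly", "today"]
matched_id, predicted = predict_from_match(first_document_words, scored_documents)
```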
It should be noted that, for the specific implementation of the steps performed by the processor in this embodiment, reference may be made to the implementation of the information processing method provided by the embodiments corresponding to Figs. 1 and 2, which is not repeated here.
Based on the foregoing embodiments, an embodiment of the present application provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the following steps:
Obtain a first document to be scored, wherein the first document includes multiple words;
Obtain attribute information of the multiple words and, based on the attribute information, determine from a preset document set a second document that matches the first document;
Predict the score of the first document based on the score of the second document.
In other embodiments of the present application, the one or more programs are executable by the one or more processors to further implement the following steps:
Divide the preset document set to obtain multiple different first subsets, each associated with a different document type;
Determine, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, determining, based on the attribute information, the second document that matches the first document from the preset document set includes:
Determining the second document from the second subset based on the attribute information.
In other embodiments of the present application, the one or more programs are executable by the one or more processors to further implement the following steps:
Obtain a first keyword and first semantic information of each document in each first subset;
Obtain a second keyword and second semantic information of the first document;
Obtain a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information;
Determine the second subset from the multiple first subsets based on the first matching result and the second matching result.
In other embodiments of the present application, the one or more programs are executable by the one or more processors to further implement the following steps:
Obtain multiple first sub-vectors corresponding to the multiple first keywords, and determine the weighted average of the multiple first sub-vectors to obtain a first vector;
Obtain multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determine the weighted average of the multiple second sub-vectors to obtain a second vector;
Obtain a third vector corresponding to the second keyword;
Obtain a fourth vector corresponding to the second semantic information;
Determine the similarity between the first vector and the third vector to obtain the first matching result;
Determine the similarity between the second vector and the fourth vector to obtain the second matching result.
In other embodiments of the present application, the one or more programs are executable by the one or more processors to further implement the following steps:
Perform syntactic dependency analysis on the first document based on the attribute information to obtain a first analysis result;
Perform semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result;
Determine the second document from the second subset based on the first analysis result and the second analysis result.
In other embodiments of the present application, the one or more programs are executable by the one or more processors to further implement the following steps:
Analyze the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyze the first document based on the analytic hierarchy process (AHP) to obtain the syntactic structure of the first document;
Predict the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
With the computer-readable storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained and, based on the attribute information, a second document that matches the first document is selected from a preset document set containing multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are achieved without a grader having to read the full content of the first document, which improves scoring efficiency.
It should be noted that, for the specific implementation of the steps performed by the processor in this embodiment, reference may be made to the implementation of the information processing method provided by the embodiments corresponding to Figs. 1 and 2, which is not repeated here.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present application and are not intended to limit the scope of protection of the present application.
Claims (10)
1. An information processing method, the method comprising:
Obtaining a first document to be scored, wherein the first document includes multiple words;
Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;
Predicting the score of the first document based on the score of the second document.
2. The method according to claim 1, characterized in that the method further comprises:
Dividing the preset document set to obtain multiple different first subsets, each associated with a different document type;
Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, the determining, based on the attribute information, of the second document that matches the first document from the preset document set comprises:
Determining the second document from the second subset based on the attribute information.
3. The method according to claim 2, characterized in that the determining, from the multiple first subsets, of the subset to which the first document belongs, to obtain the second subset, comprises:
Obtaining a first keyword and first semantic information of each document in each first subset;
Obtaining a second keyword and second semantic information of the first document;
Obtaining a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information;
Determining the second subset from the multiple first subsets based on the first matching result and the second matching result.
4. The method according to claim 3, characterized in that the obtaining of the first matching result between the first keywords and the second keyword and the second matching result between the first semantic information and the second semantic information comprises:
Obtaining multiple first sub-vectors corresponding to the multiple first keywords, and determining the weighted average of the multiple first sub-vectors to obtain a first vector;
Obtaining multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determining the weighted average of the multiple second sub-vectors to obtain a second vector;
Obtaining a third vector corresponding to the second keyword;
Obtaining a fourth vector corresponding to the second semantic information;
Determining the similarity between the first vector and the third vector to obtain the first matching result;
Determining the similarity between the second vector and the fourth vector to obtain the second matching result.
5. The method according to any one of claims 2 to 4, characterized in that the determining of the second document from the second subset based on the attribute information comprises:
Performing syntactic dependency analysis on the first document based on the attribute information to obtain a first analysis result;
Performing semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result;
Determining the second document from the second subset based on the first analysis result and the second analysis result.
6. The method according to claim 1, characterized in that the predicting of the score of the first document based on the score of the second document comprises:
Analyzing the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyzing the first document based on the analytic hierarchy process (AHP) to obtain the syntactic structure of the first document;
Predicting the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
7. An electronic device, characterized in that the electronic device comprises a processor, a memory, and a communication bus;
The communication bus is configured to implement a communication connection between the processor and the memory;
The processor is configured to execute the information processing program stored in the memory to implement the following steps:
Obtaining a first document to be scored, wherein the first document includes multiple words;
Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;
Predicting the score of the first document based on the score of the second document.
8. The electronic device according to claim 7, characterized in that the processor is further configured to implement the following steps:
Dividing the preset document set to obtain multiple different first subsets, each associated with a different document type;
Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;
Correspondingly, the determining, based on the attribute information, of the second document that matches the first document from the preset document set comprises:
Determining the second document from the second subset based on the attribute information.
9. The electronic device according to claim 7, characterized in that the processor is further configured to implement the following steps:
Analyzing the multiple words based on character recognition technology to obtain the alignment degree of the first document;
Analyzing the first document based on the analytic hierarchy process (AHP) to obtain the syntactic structure of the first document;
Predicting the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.
10. A storage medium, characterized in that the storage medium stores one or more programs, the one or more programs being executable by one or more processors to implement the steps of the information processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910662902.4A CN110489743A (en) | 2019-07-22 | 2019-07-22 | A kind of information processing method, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910662902.4A CN110489743A (en) | 2019-07-22 | 2019-07-22 | A kind of information processing method, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489743A true CN110489743A (en) | 2019-11-22 |
Family
ID=68547882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910662902.4A Pending CN110489743A (en) | 2019-07-22 | 2019-07-22 | A kind of information processing method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489743A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832278A (en) * | 2020-06-15 | 2020-10-27 | 北京百度网讯科技有限公司 | Document fluency detection method and device, electronic equipment and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101238459A (en) * | 2005-05-13 | 2008-08-06 | 柯廷技术大学 | Comparing text based documents |
CN102279844A (en) * | 2011-08-31 | 2011-12-14 | 中国科学院自动化研究所 | Method and system for automatically testing Chinese composition |
US20150199913A1 (en) * | 2014-01-10 | 2015-07-16 | LightSide Labs, LLC | Method and system for automated essay scoring using nominal classification |
CN107506360A (en) * | 2016-06-14 | 2017-12-22 | 科大讯飞股份有限公司 | A kind of essay grade method and system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832278A (en) * | 2020-06-15 | 2020-10-27 | 北京百度网讯科技有限公司 | Document fluency detection method and device, electronic equipment and medium |
CN111832278B (en) * | 2020-06-15 | 2024-02-09 | 北京百度网讯科技有限公司 | Document fluency detection method and device, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191122 | |