CN110489743A

CN110489743A - Information processing method, electronic device, and storage medium

Info

Publication number: CN110489743A
Application number: CN201910662902.4A
Authority: CN
Inventors: 常新峰; 张晓平; 臧晨迪; 李凯
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2019-07-22
Filing date: 2019-07-22
Publication date: 2019-11-22

Abstract

An embodiment of the present application discloses an information processing method. The method comprises: obtaining a first document to be scored, wherein the first document includes multiple words; obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document; and predicting the score of the first document based on the score of the second document. Embodiments of the present application also disclose an electronic device and a storage medium.

Description

Information processing method, electronic device, and storage medium

Technical field

The present application relates to, but is not limited to, the field of computer technology, and in particular to an information processing method, an electronic device, and a storage medium.

Background

In the related art, scoring a document requires a rater to read the entire content of the document, which makes scoring inefficient.

Summary of the application

To solve the above technical problem, embodiments of the present application provide an information processing method, an electronic device, and a storage medium. They address the problem in the related art that scoring a document requires a rater to read its entire content, which makes scoring inefficient; they realize automatic scoring and improve scoring efficiency.

The technical solution of the present application is implemented as follows:

An information processing method, the method comprising:

Obtaining a first document to be scored, wherein the first document includes multiple words;

Obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;

Predicting the score of the first document based on the score of the second document.

Optionally, the method further comprises:

Dividing the preset document set to obtain multiple different first subsets associated with different document types;

Determining, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset;

Correspondingly, determining from the preset document set, based on the attribute information, the second document that matches the first document comprises:

Determining the second document from the second subset based on the attribute information.

Optionally, determining, from the multiple first subsets, the subset to which the first document belongs, to obtain the second subset, comprises:

Obtaining a first keyword and first semantic information of each document in each first subset;

Obtaining a second keyword and second semantic information of the first document;

Obtaining a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information;

Determining the second subset from the multiple first subsets based on the first matching result and the second matching result.

Optionally, obtaining the first matching result between the first keywords and the second keyword, and the second matching result between the first semantic information and the second semantic information, comprises:

Obtaining multiple first sub-vectors corresponding to the multiple first keywords, and determining the weighted average of the multiple first sub-vectors to obtain a first vector;

Obtaining multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determining the weighted average of the multiple second sub-vectors to obtain a second vector;

Obtaining a third vector corresponding to the second keyword;

Obtaining a fourth vector corresponding to the second semantic information;

Determining the similarity between the first vector and the third vector to obtain the first matching result;

Determining the similarity between the second vector and the fourth vector to obtain the second matching result.

Optionally, determining the second document from the second subset based on the attribute information comprises:

Performing syntactic dependency analysis on the first document based on the attribute information to obtain a first analysis result;

Performing semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result;

Determining the second document from the second subset based on the first analysis result and the second analysis result.

Optionally, predicting the score of the first document based on the score of the second document comprises:

Analyzing the multiple words based on character recognition technology to obtain the neatness of the first document;

Analyzing the first document based on the analytic hierarchy process (AHP) to obtain the syntactic structure of the first document;

Predicting the score of the first document based on the score of the second document and at least one of the neatness and the syntactic structure.

An electronic device, comprising: a processor, a memory, and a communication bus;

The communication bus is configured to implement a communication connection between the processor and the memory;

The processor is configured to execute an information processing program stored in the memory to perform the following steps:

Optionally, the processor is further configured to perform the following steps:

A storage medium, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the information processing method described above.

With the information processing method, electronic device, and storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained, and a second document that matches the first document is selected, based on the attribute information, from a preset document set that includes multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are realized, a rater does not need to read the entire content of the first document, and scoring efficiency is improved.

Brief description of the drawings

Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of another information processing method provided by an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a terminal provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.

An embodiment of the present application provides an information processing method applied to an electronic device. Referring to Fig. 1, the method includes the following steps:

Step 101: Obtain the first document to be scored.

The first document includes multiple words. For example, in the embodiments of the present application, multiple words form sentences, sentences form paragraphs, and paragraphs form the first document. Documents in the embodiments of the present application include, but are not limited to, compositions, papers, and articles.

In practical applications, the first document is the object to be scored. For example, in teaching, such as when marking examination papers, the first document may be a composition to be scored; when papers or articles in a database are updated, the first document may be a paper or article that has been newly uploaded to the database and is awaiting scoring.

Step 102: Obtain attribute information of the multiple words and, based on the attribute information, determine from a preset document set a second document that matches the first document.

In the embodiments of the present application, the attribute information of the multiple words can be understood as the attribute information that the multiple words have when combined. For example, it may include the attribute information of the first document, that is, of the multiple words taken together as a whole; it may also include the attribute information that each word has within that whole.

For example, if the first document is a composition to be scored, the attribute information of the multiple words may characterize the type of the composition, such as narrative, expository, practical, or argumentative writing. The attribute information may also characterize the sentence element that each word corresponds to in the sentence the words form. For example, in a Chinese sentence, the attribute information of the multiple words includes subject, predicate, object, verbal element, attributive, adverbial, complement, and head word; in an English sentence, it includes subject, predicate, object, predicative, attributive, adverbial, complement, and appositive.

In practical applications, after obtaining the first document to be scored and the attribute information of the multiple words in it, the electronic device finds, based on the attribute information, the second document that matches the first document in the preset document set. For example, if the first document is a composition to be scored and the attribute information characterizes the type of the composition, the second document may be a document of the same type as the first document. As another example, if the attribute information characterizes the sentence elements that the words correspond to in the sentences they form, the second document may be the document whose sentence elements are most similar or identical to those of the first document.

Step 103: Predict the score of the first document based on the score of the second document.

In the embodiments of the present application, after finding the second document that matches the first document in the preset document set, the electronic device obtains the score of the second document and predicts the score of the first document based on it, thereby obtaining a prediction result. It should be noted that the score predicted by the electronic device may serve as a reference for the final score of the first document, or may be used as the final score itself.
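Steps 101 to 103 can be sketched end to end as follows. This is only an illustrative sketch: the `ReferenceDocument` class, the `doc_type` field, and the word-overlap measure are assumptions introduced here as a stand-in for attribute-based matching, and are not specified by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceDocument:
    """A document in the preset document set (hypothetical structure)."""
    text: str
    doc_type: str  # e.g. "narrative"; stands in for document-level attribute information
    score: float   # score already assigned to this reference document


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; an assumed placeholder for attribute matching."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def predict_score(first_text: str, first_type: str,
                  collection: List[ReferenceDocument]) -> float:
    if not collection:
        raise ValueError("the preset document set is empty")
    # Step 102: prefer reference documents of the same type, then take the best match.
    candidates = [d for d in collection if d.doc_type == first_type] or collection
    second = max(candidates, key=lambda d: word_overlap(first_text, d.text))
    # Step 103: here the matched document's score is used directly as the prediction.
    return second.score
```

Using the second document's score unchanged is the simplest possible choice; the later embodiment refines the prediction with further factors.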

With the information processing method, electronic device, and storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained, and a second document that matches the first document is selected, based on the attribute information, from a preset document set that includes multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are realized, a rater does not need to read the entire content of the first document, and scoring efficiency is improved.

Based on the foregoing embodiment, an embodiment of the present application provides an information processing method applied to an electronic device. Referring to Fig. 2, the method includes the following steps:

Step 201: Obtain the first document to be scored.

The first document includes multiple words.

Step 202: Divide the preset document set to obtain multiple different first subsets associated with different document types.

In the embodiments of the present application, the preset document set includes multiple reference documents used for scoring the first document. The reference documents have multiple document types. For example, different document types correspond to different document scores and/or different themes. Here, a theme can be understood as the central idea that a document is intended to express.

In the embodiments of the present application, the electronic device may divide the preset document set, and the division is performed based on document type. It should be noted that the operation of dividing the preset document set to obtain the multiple first subsets only needs to be completed before the electronic device performs the corresponding operations on the first subsets; that is, the embodiments of the present application do not limit the order in which the step of dividing the preset document set and the step of obtaining the first document are performed.

In practical applications, the electronic device divides the preset document set based on document type and obtains multiple different first subsets associated with different document types. Different first subsets then correspond to different document types; for example, different first subsets have different score ranges and/or different themes.
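The division in step 202 can be illustrated with a short sketch. It assumes each reference document is a dictionary with hypothetical `theme` and `score` fields and that score ranges are 10-point bands; none of these details come from the application.

```python
from collections import defaultdict


def partition_by_type(collection):
    """Group reference documents into first subsets keyed by (theme, score band)."""
    subsets = defaultdict(list)
    for doc in collection:
        # Assumed banding: 80-89 -> 80, 70-79 -> 70, and so on.
        band = int(doc["score"] // 10) * 10
        subsets[(doc["theme"], band)].append(doc)
    return dict(subsets)
```

Because the grouping depends only on the reference documents, it can be computed once, before or after the first document is obtained, which is consistent with the unrestricted step order described above.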

Step 203: Determine, from the multiple first subsets, the subset to which the first document belongs, to obtain a second subset.

In the embodiments of the present application, once the electronic device has divided the preset document set into multiple first subsets, it can find among them the subset to which the first document belongs, that is, the second subset, thereby classifying the first document.

In the embodiments of the present application, step 203, in which the subset to which the first document belongs is determined from the multiple first subsets to obtain the second subset, that is, the process by which the electronic device classifies the first document, can be implemented by the following steps:

Step 203a: Obtain the first keyword and the first semantic information of each document in each first subset.

Keywords in the embodiments of the present application include, but are not limited to, words selected from the title, abstract, and body of a document that are essential to expressing the central idea of the document. For example, the electronic device may extract keywords from a document using the term frequency-inverse document frequency (TF-IDF) algorithm, a natural language processing (NLP) technique.
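As an illustration of TF-IDF keyword extraction, the following minimal sketch ranks the terms of one document against a small corpus. It assumes whitespace tokenization, which would need to be replaced by word segmentation for Chinese text, and uses the plain `tf * log(N / df)` weighting; the function name and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF weight."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    weights = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

Terms that occur in every document, such as function words, receive a weight of zero and therefore fall to the bottom of the ranking.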

In the embodiments of the present application, the electronic device may obtain the semantic information of a document based on a deep learning model.

Step 203b: Obtain the second keyword and the second semantic information of the first document.

In practical applications, the electronic device performs keyword extraction and semantic feature extraction on the first document and on each document in each first subset obtained by the division. It thereby obtains the first keyword and the first semantic information of each document, as well as the second keyword and the second semantic information of the first document.

Step 203c: Obtain a first matching result between the multiple first keywords and the second keyword, and a second matching result between the multiple pieces of first semantic information and the second semantic information.

In the embodiments of the present application, the electronic device classifies the first document based on two key factors: keywords and semantic information. Having obtained the multiple first keywords, the second keyword, the multiple pieces of first semantic information, and the second semantic information, the electronic device matches the multiple first keywords against the second keyword to obtain the first matching result, and matches the multiple pieces of first semantic information against the second semantic information to obtain the second matching result. The electronic device can then classify the first document based on the first matching result and the second matching result.

Based on the foregoing embodiment, in the present application the electronic device may perform the above matching operations by vector matching. It can be understood that step 203c, obtaining the first matching result between the multiple first keywords and the second keyword and the second matching result between the multiple pieces of first semantic information and the second semantic information, may include the following steps:

Step 1: Obtain multiple first sub-vectors corresponding to the multiple first keywords, and determine the weighted average of the multiple first sub-vectors to obtain a first vector.

In the embodiments of the present application, each subset includes multiple documents, and the multiple documents correspond to multiple first keywords. Having obtained the multiple first keywords, the electronic device obtains the multiple first sub-vectors corresponding to them and determines the weighted average of the multiple first sub-vectors to obtain the first vector.

Step 2: Obtain multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determine the weighted average of the multiple second sub-vectors to obtain a second vector.

In the embodiments of the present application, each subset includes multiple documents, and the multiple documents correspond to multiple pieces of first semantic information. Having obtained the multiple pieces of first semantic information, the electronic device obtains the multiple second sub-vectors corresponding to them and determines the weighted average of the multiple second sub-vectors to obtain the second vector.

Step 3: Obtain a third vector corresponding to the second keyword.

In the embodiments of the present application, having obtained the second keyword, the electronic device obtains the third vector corresponding to the second keyword.

Step 4: Obtain a fourth vector corresponding to the second semantic information.

In the embodiments of the present application, having obtained the second semantic information, the electronic device obtains the fourth vector corresponding to the second semantic information.

Step 5: Determine the similarity between the first vector and the third vector to obtain the first matching result.

In the embodiments of the present application, having obtained the first vector and the third vector, the electronic device determines the similarity between them to obtain the first matching result.

Step 6: Determine the similarity between the second vector and the fourth vector to obtain the second matching result.

In the embodiments of the present application, having obtained the second vector and the fourth vector, the electronic device determines the similarity between them to obtain the second matching result.

As can be seen from the above, when classifying the first document, matching may be performed between the first sub-vector corresponding to the keyword of each document in each subset and the third vector corresponding to the keyword of the first document; matching may also be performed between the weighted average of the first sub-vectors corresponding to the keywords of all documents in each subset, that is, the first vector, and the third vector. Likewise, matching may be performed between the second sub-vector corresponding to the first semantic information of each document in each subset and the fourth vector corresponding to the second semantic information of the first document; matching may also be performed between the weighted average of the second sub-vectors of all documents in each subset, that is, the second vector, and the fourth vector. It should be noted that matching the first vector against the third vector, and the second vector against the fourth vector, reduces the number of matching operations and improves classification efficiency.
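The weighted averaging and similarity computation described in these steps can be sketched as follows. The application does not name a similarity measure or say how the weights are chosen, so cosine similarity and caller-supplied weights are assumptions of this sketch.

```python
import math


def weighted_average(vectors, weights):
    """Combine per-document sub-vectors of one subset into a single vector."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For one subset, `weighted_average` would be applied once to the keyword sub-vectors and once to the semantic sub-vectors, and each result compared with the corresponding vector of the first document. With N documents in a subset, this replaces N comparisons per factor with a single comparison, which is the efficiency gain noted above.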

Step 203d: Determine the second subset from the multiple first subsets based on the first matching result and the second matching result.

In the embodiments of the present application, once the electronic device has obtained the first matching result and the second matching result, it can determine the second subset from the multiple first subsets based on them. It can be understood that the electronic device may determine, among the multiple first subsets, the first subset for which the distance between the first vector and the third vector (corresponding to the first matching result) is smallest and the distance between the second vector and the fourth vector (corresponding to the second matching result) is smallest, as the subset to which the first document belongs.
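A minimal sketch of this selection is shown below. It assumes the two matching results for each first subset are similarity values, where higher means closer. The application does not say what happens when no single subset is best on both measures, so this sketch combines them with hypothetical weights.

```python
def choose_second_subset(match_results, keyword_weight=0.5, semantic_weight=0.5):
    """Pick the first subset whose combined matching result is highest.

    match_results maps a subset name to a (first_match, second_match) pair,
    i.e. the keyword similarity and the semantic similarity for that subset.
    """
    if not match_results:
        raise ValueError("no first subsets to choose from")

    def combined(name):
        first_match, second_match = match_results[name]
        return keyword_weight * first_match + semantic_weight * second_match

    return max(match_results, key=combined)
```

When one subset is best on both measures, any positive weights select it; the weights matter only when the two measures disagree.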

Step 204: Obtain the attribute information of the multiple words and, based on the attribute information, determine the second document from the second subset.

In the embodiments of the present application, after determining the subset to which the first document belongs, that is, the second subset, the electronic device further determines the second document in the second subset that best matches the first document.

In practical applications, the electronic device obtains the attribute information of the multiple words and, based on the attribute information, finds in the determined second subset the second document that best matches the first document.

Based on the foregoing embodiment, when searching the second subset for the second document that matches the first document, the electronic device may consider two aspects: syntactic dependency analysis and semantic dependency analysis. That is, determining the second document from the second subset based on the attribute information in step 204 can be implemented by the following steps:

Step 204a: Perform syntactic dependency analysis on the first document based on the attribute information to obtain a first analysis result.

In the embodiments of the present application, it can be understood that the attribute information here includes the attribute information that each word has within the first document, that is, within the multiple words taken together as a whole.

Here, the electronic device may use a deep learning algorithm to analyze the syntactic structure based on the attribute information, build a syntax tree to obtain the sentence structure, and then obtain the syntactic dependency structure of the first document, that is, the first analysis result, from the sentence structure.

Step 204b: Perform semantic dependency analysis on the first document based on the attribute information to obtain a second analysis result.

In the embodiments of the present application, the electronic device performs semantic dependency analysis on the first document based on the attribute information. This can be understood as parsing the semantic associations between the linguistic units of a sentence and presenting those associations as a dependency structure, thereby obtaining the second analysis result.

Step 204c: Determine the second document from the second subset based on the first analysis result and the second analysis result.

In the embodiments of the present application, having obtained the first analysis result and the second analysis result, the electronic device determines, based on them, the second document in the second subset that matches the first document.
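One way to picture step 204c is to treat each analysis result as a list of dependency relation labels and compare label distributions. The sketch below does this with a multiset Jaccard measure. The label strings, the dictionary layout, and the equal weighting of the two analyses are all illustrative assumptions, and producing the labels would require a dependency parser that is not shown.

```python
from collections import Counter


def multiset_jaccard(a, b):
    """Jaccard similarity over label multisets (counts are respected)."""
    count_a, count_b = Counter(a), Counter(b)
    intersection = sum((count_a & count_b).values())
    union = sum((count_a | count_b).values())
    return intersection / union if union else 0.0


def find_second_document(first, candidates):
    """Return the candidate whose analysis results are closest to those of `first`.

    `first` and every candidate are dicts with hypothetical keys "syntax" and
    "semantic", each holding a list of relation labels from the two analyses.
    """
    def match(candidate):
        return (0.5 * multiset_jaccard(first["syntax"], candidate["syntax"])
                + 0.5 * multiset_jaccard(first["semantic"], candidate["semantic"]))

    return max(candidates, key=match)
```

Comparing label distributions ignores which words are connected, so it is only a coarse proxy for structural similarity between two documents.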

Step 205: Predict the score of the first document based on the score of the second document.

In the embodiments of the present application, after determining the second document that matches the first document, the electronic device obtains the score of the second document and predicts the score of the first document based on it.

In the embodiments of the present application, step 205, predicting the score of the first document based on the score of the second document, includes:

Step 205a: Analyze the multiple words based on character recognition technology to obtain the neatness of the first document.

In the embodiments of the present application, character recognition technology includes optical character recognition (OCR). Neatness characterizes how neat the lettering in the document is.

In practical applications, the electronic device analyzes the multiple words based on character recognition technology, obtains the neatness of the first document, and uses the neatness as one of the reference factors for predicting the score of the first document.

Step 205b: Analyze the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document.

In the embodiments of the present application, the electronic device analyzes the first document based on the analytic hierarchy process, obtains the syntactic structure of the first document, and uses the syntactic structure as one of the reference factors for predicting the score of the first document. Here, from the syntactic structure the electronic device can derive how fluent the sentences in the first document are.

Step 205c: Predict the score of the first document based on the score of the second document and at least one of the neatness and the syntactic structure.

In the embodiments of the present application, having obtained the neatness, the syntactic structure, and the score of the second document, the electronic device can predict the score of the first document based on the score of the second document and at least one of the neatness and the syntactic structure. This realizes automatic scoring and accurate prediction, removes the need for a rater to read the entire content of the first document, and improves scoring efficiency.
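The application does not give a formula for combining these factors. The sketch below shows one possible combination under stated assumptions: scores lie on a 0 to 100 scale, neatness (the alignment degree) and sentence fluency derived from the syntactic structure are each normalized to the range 0 to 1, and each factor shifts the second document's score by at most a hypothetical `max_adjust` points.

```python
def predict_first_score(second_score, neatness=None, fluency=None, max_adjust=5.0):
    """Adjust the matched document's score by the optional reference factors.

    neatness and fluency are assumed to lie in [0, 1]; a value of 0.5 is
    neutral, 1.0 adds max_adjust points and 0.0 subtracts max_adjust points.
    """
    score = second_score
    for factor in (neatness, fluency):
        if factor is not None:  # "at least one of" the factors may be supplied
            score += (factor - 0.5) * 2 * max_adjust
    # Clamp to the assumed 0-100 scoring scale.
    return max(0.0, min(100.0, score))
```

With both factors omitted the function simply returns the second document's score, which corresponds to the basic prediction of the first embodiment.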

It should be noted that, for steps and content in this embodiment that are the same as in other embodiments, reference may be made to the descriptions in those embodiments; details are not repeated here.

Based on the foregoing embodiments, an embodiment of the present application provides an electronic device that can be applied to the information processing method provided by the embodiments corresponding to Figs. 1 and 2. Referring to Fig. 3, the electronic device 3 includes a processor 31, a memory 32, and a communication bus 33, wherein:

The communication bus 33 is configured to implement a communication connection between the processor 31 and the memory 32.

The processor 31 is configured to execute the information processing program stored in the memory 32 to perform the following steps:

Obtain the first document to be scored, wherein the first document includes multiple words;

Obtain the attribute information of the multiple words and, based on the attribute information, determine from the preset document set the second document that matches the first document;

Predict the score of the first document based on the score of the second document.

In other embodiments of the present application, the processor 31 is configured to execute the information processing program stored in the memory 32 to perform the following steps:

Divide the preset document set to obtain multiple different first subsets associated with different document types;

Determine, from the multiple first subsets, the subset to which the first document belongs, to obtain the second subset;

Correspondingly, determining from the preset document set, based on the attribute information, the second document that matches the first document includes:

Determining the second document from the second subset based on the attribute information.

Obtain the first keyword and the first semantic information of each document in each first subset;

Obtain the second keyword and the second semantic information of the first document;

Obtain the first matching result between the multiple first keywords and the second keyword, and the second matching result between the multiple pieces of first semantic information and the second semantic information;

Determine the second subset from the multiple first subsets based on the first matching result and the second matching result.

Obtain the multiple first sub-vectors corresponding to the multiple first keywords, and determine the weighted average of the multiple first sub-vectors to obtain the first vector;

Obtain the multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determine the weighted average of the multiple second sub-vectors to obtain the second vector;

Obtain the third vector corresponding to the second keyword;

Obtain the fourth vector corresponding to the second semantic information;

Determine the similarity between the first vector and the third vector to obtain the first matching result;

Determine the similarity between the second vector and the fourth vector to obtain the second matching result.

Perform syntactic dependency analysis on the first document based on the attribute information to obtain the first analysis result;

Perform semantic dependency analysis on the first document based on the attribute information to obtain the second analysis result;

Determine the second document from the second subset based on the first analysis result and the second analysis result.

Analyze the multiple words based on character recognition technology to obtain the neatness of the first document;

Analyze the first document based on the analytic hierarchy process to obtain the syntactic structure of the first document;

Predict the score of the first document based on the score of the second document and at least one of the neatness and the syntactic structure.

Electronic equipment provided by the embodiment of the present application, obtains the first document to be scored, and the first document includes multiple words； And then the attribute information of multiple words is obtained, and screen from the preset collection of document including multiple documents based on the attribute information Out with the second document of the first document matches；It is based ultimately upon the scoring of the second document, the scoring of the first document is predicted；Such as This, realizes automatic scoring and accurate prediction, the full content of the first document is readed over without rating staff, improve scoring effect Rate.

It should be noted that, for the specific implementation of the steps performed by the processor in this embodiment, reference may be made to the implementation of the information processing method provided by the embodiments corresponding to Figs. 1 to 2; details are not repeated here.

Based on the foregoing embodiments, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the following steps:

In other embodiments of the present application, the one or more programs can be executed by one or more processors to further implement the following steps:

Obtain a third vector corresponding to the second keyword;

Obtain a fourth vector corresponding to the second semantic information;

With the computer-readable storage medium provided by the embodiments of the present application, a first document to be scored is obtained, the first document including multiple words; attribute information of the multiple words is then obtained and, based on the attribute information, a second document that matches the first document is screened out from a preset document set containing multiple documents; finally, the score of the first document is predicted based on the score of the second document. In this way, automatic scoring and accurate prediction are achieved without scoring personnel having to read the full content of the first document, which improves scoring efficiency.

Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above is only a preferred embodiment of the present application and is not intended to limit the protection scope of the present application.

Claims

1. An information processing method, the method comprising:

obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;

2. The method according to claim 1, wherein the method further comprises:

correspondingly, the determining, based on the attribute information, of a second document that matches the first document from a preset document set comprises:

3. The method according to claim 2, wherein the determining, from multiple first subsets, of the subset to which the first document belongs to obtain a second subset comprises:

obtaining a first matching result between multiple first keywords and the second keyword, and a second matching result between multiple pieces of first semantic information and the second semantic information;

determining the second subset from the multiple first subsets based on the first matching result and the second matching result.

4. The method according to claim 3, wherein the obtaining of the first matching result between the first keywords and the second keyword and the second matching result between the first semantic information and the second semantic information comprises:

obtaining multiple first sub-vectors corresponding to the multiple first keywords, and determining the weighted average of the multiple first sub-vectors to obtain a first vector;

obtaining multiple second sub-vectors corresponding to the multiple pieces of first semantic information, and determining the weighted average of the multiple second sub-vectors to obtain a second vector;

obtaining a third vector corresponding to the second keyword;

obtaining a fourth vector corresponding to the second semantic information;

5. The method according to any one of claims 2 to 4, wherein the determining of the second document from the second subset based on the attribute information comprises:

determining the second document from the second subset based on the first analysis result and the second analysis result.

6. The method according to claim 1, wherein the predicting of the score of the first document based on the score of the second document comprises:

predicting the score of the first document based on the score of the second document and at least one of the alignment degree and the syntactic structure.

7. An electronic device, wherein the electronic device comprises: a processor, a memory, and a communication bus;

obtaining attribute information of the multiple words and, based on the attribute information, determining from a preset document set a second document that matches the first document;

8. The electronic device according to claim 7, wherein the processor is further configured to perform the following steps:

9. The electronic device according to claim 7, wherein the processor is further configured to perform the following steps:

10. A storage medium, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the information processing method according to any one of claims 1 to 6.