CN110263171A - Document Classification Method, device and terminal - Google Patents


Info

Publication number
CN110263171A
Authority
CN
China
Prior art keywords
document
vector
text
content
option information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910554455.0A
Other languages
Chinese (zh)
Other versions
CN110263171B (en)
Inventor
邱昭鹏
吴贤
范伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910554455.0A
Publication of CN110263171A
Application granted
Publication of CN110263171B
Legal status: active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a document classification method, apparatus, and terminal, and belongs to the field of deep learning technology. The method includes: determining a first document to be classified; determining a plurality of second documents corresponding to first option information among a plurality of pieces of option information, and a plurality of third documents corresponding to second option information; determining, according to description information of the first document, the plurality of second documents, and the plurality of third documents, a first prediction vector indicating a similarity relation between the first option information and the second option information, and a second prediction vector indicating a derivation relationship between the description information and the first option information; determining a third prediction vector of the first document according to the first prediction vector and the second prediction vector; and determining a category of the first document according to the third prediction vector. Because the category of the first document is determined from the similarity relation and the derivation relationship of its related documents, the subjective bias of manual document classification is avoided, and labor consumption and cost are reduced.

Description

Document Classification Method, device and terminal
Technical field
The present disclosure relates to the field of deep learning technology, and in particular to a document classification method, apparatus, and terminal.
Background
Students and e-education vendors have an increasingly strong demand for algorithms that formulate personalized learning strategies. When a personalized learning strategy is formulated, questions of different difficulty categories are often recommended to different students according to each student's progress or ability. Therefore, before a personalized learning strategy is formulated, the questions must first be classified by difficulty.
In the related art, questions are classified in one of two ways. In the expert-judgment approach, the difficulty of the first document corresponding to a question is determined from an expert's subjective judgment. Alternatively, a sample survey is conducted among students, and the difficulty of the first document corresponding to the question is determined from student feedback. The first document is then classified according to its difficulty.
In the related art described above, whether the first document corresponding to a question is classified by expert judgment or by student feedback, the judgment is made manually. As a result, the prediction results are subject to subjective bias, and the prediction process consumes a large amount of manpower, so labor costs are high.
Summary of the invention
Embodiments of the present disclosure provide a document classification method, apparatus, and terminal to solve the following problem: current methods classify the first document corresponding to a question by expert judgment or student feedback, which relies on manual judgment, so the prediction results are subject to subjective bias and the prediction process consumes a large amount of manpower at high labor cost. The technical solutions are as follows:
In one aspect, a document classification method is provided. The method includes:
determining a first document to be classified, where the first document includes description information and a plurality of pieces of option information, and the plurality of pieces of option information include at least one piece of first option information corresponding to the description information;
determining a plurality of second documents corresponding to the first option information and a plurality of third documents corresponding to second option information, where the second option information is the option information, among the plurality of pieces of option information, other than the first option information;
determining, according to the plurality of second documents and the plurality of third documents, a first prediction vector indicating a similarity relation between the first option information and the second option information;
determining, according to the plurality of second documents and the description information of the first document, a second prediction vector indicating a derivation relationship between the description information and the first option information;
determining a third prediction vector of the first document according to the first prediction vector and the second prediction vector; and
determining a category of the first document according to the third prediction vector.
In one possible implementation, the determining a plurality of second documents corresponding to the first option information and a plurality of third documents corresponding to second option information includes:
determining first text content of the description information, second text content of the first option information, and third text content of the second option information;
determining, according to the first text content and the second text content, a first keyword corresponding to the first text content and the second text content;
determining, from a document database according to the first keyword, the plurality of second documents corresponding to the first keyword;
determining, according to the first text content and the third text content, a second keyword corresponding to the first text content and the third text content; and
determining, from the document database according to the second keyword, the plurality of third documents corresponding to the second keyword.
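The text does not say how keywords are extracted or how documents are matched to them. The following sketch assumes the simplest reading: a keyword set is the non-stop-word tokens of the description plus one option, and database documents are ranked by how many keywords they contain. `STOP_WORDS`, `tokenize`, `extract_keywords`, and `retrieve` are hypothetical names, and a production system would more likely use an inverted index or a ranking function such as BM25.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop-word list; a real system would use a curated, language-specific one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "which"}


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens (Chinese text would need a word segmenter instead)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(*texts: str) -> Set[str]:
    """Union of the non-stop-word tokens of all given texts."""
    return {tok for text in texts for tok in tokenize(text) if tok not in STOP_WORDS}


def retrieve(keywords: Set[str], database: Dict[str, str], k: int) -> List[str]:
    """Return up to k document ids ranked by keyword hits (ties broken by id)."""
    scored = []
    for doc_id, body in database.items():
        hits = len(keywords & set(tokenize(body)))
        if hits > 0:
            scored.append((-hits, doc_id))
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]


database = {
    "d1": "Insulin lowers blood glucose",
    "d2": "Glucagon raises blood glucose",
    "d3": "The liver stores glycogen",
}
description = "Which hormone lowers blood glucose?"
first_keywords = extract_keywords(description, "Insulin")    # first + second text content
second_keywords = extract_keywords(description, "Glucagon")  # first + third text content
second_documents = retrieve(first_keywords, database, k=2)
third_documents = retrieve(second_keywords, database, k=2)
```

Because both keyword sets share the description's tokens, the two retrieved lists can overlap heavily; that overlap is exactly what the later similarity step is meant to measure.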
In another possible implementation, the determining, according to the plurality of second documents and the plurality of third documents, a first prediction vector indicating a similarity relation between the first option information and the second option information includes:
determining the similarity relation between the first option information and the second option information according to the plurality of second documents and the plurality of third documents, and determining the first prediction vector according to the similarity relation between the second documents and the third documents; or
inputting the second documents and the third documents into a first prediction model to obtain the first prediction vector.
In another possible implementation, the determining the similarity relation between the first option information and the second option information according to the plurality of second documents and the plurality of third documents includes:
determining, according to a plurality of first words in the plurality of second documents and a plurality of second words in the plurality of third documents respectively, a first matrix formed by the plurality of second documents and a second matrix formed by the plurality of third documents;
performing a dot product between each column of the first matrix and each column of the second matrix to obtain a first matching matrix indicating the similarity relation between the second documents and the third documents;
determining a first weight according to the first matching matrix, where the first weight is the weight of the context vectors of the plurality of second documents and the plurality of third documents;
determining, according to the first matrix and the first weight, a first context vector of each second word of the plurality of third documents in the plurality of second documents; and determining, according to the second matrix and the first weight, a second context vector of each first word of the plurality of second documents in the plurality of third documents; and
comparing the first context vector with the second context vector to obtain the similarity relation between the first option information and the second option information.
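The steps above resemble the soft-alignment (attention) layer of sentence-matching models such as ESIM. The sketch below is one possible reading, not the patented implementation: each matrix is stored as a list of word vectors (its columns), the first weight is taken to be a softmax over the rows or columns of the matching matrix, and the comparison uses ESIM-style features [x; c; x - c; x * c], which is an assumption. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def soft_align(first: List[Vector], second: List[Vector]
               ) -> Tuple[List[List[float]], List[Vector], List[Vector]]:
    """first: word vectors of the second documents (columns of the first matrix);
    second: word vectors of the third documents (columns of the second matrix)."""
    # First matching matrix: match[i][j] = first[i] . second[j]
    match = [[dot(a, b) for b in second] for a in first]
    # First context vectors: each second word attends over the first words (column softmax).
    first_ctx = [weighted_sum(softmax([row[j] for row in match]), first)
                 for j in range(len(second))]
    # Second context vectors: each first word attends over the second words (row softmax).
    second_ctx = [weighted_sum(softmax(row), second) for row in match]
    return match, first_ctx, second_ctx


def compare(words: List[Vector], contexts: List[Vector]) -> List[Vector]:
    """ESIM-style comparison features [x; c; x - c; x * c] per word (an assumption)."""
    return [x + c + [xi - ci for xi, ci in zip(x, c)] + [xi * ci for xi, ci in zip(x, c)]
            for x, c in zip(words, contexts)]


match, first_ctx, second_ctx = soft_align([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
features = compare([[1.0, 0.0], [0.0, 1.0]], second_ctx)
```

In a full model the per-word comparison features would still have to be encoded and pooled into the first prediction vector; that part is not sketched here.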
In another possible implementation, the determining, according to the plurality of second documents and the description information of the first document, a second prediction vector indicating a derivation relationship between the description information and the first option information includes:
combining the description information of the first document and the first option information into fourth text content; determining a derivation relationship between the second documents and the fourth text content according to the plurality of second documents and the fourth text content; and determining the second prediction vector according to the derivation relationship between the second documents and the fourth text content; or
inputting the second documents, the description information of the first document, and the first document into a second prediction model to obtain the second prediction vector.
In another possible implementation, the determining a derivation relationship between the second documents and the fourth text content according to the plurality of second documents and the fourth text content includes:
determining, according to a plurality of third words in the plurality of second documents and fourth words in the fourth text content respectively, a third matrix formed by the plurality of third words of each second document in the plurality of second documents, and a fourth matrix formed by the plurality of fourth words in the fourth text content;
performing a dot product between each column of the fourth matrix and each column of each of the plurality of third matrices corresponding to the plurality of second documents, to obtain a second matching matrix indicating the derivation relationship between the fourth text content and the plurality of second documents;
determining a second weight according to the second matching matrix, where the second weight is the weight of the context vectors of the plurality of second documents and the fourth text content;
determining, according to the fourth text content and the second weight, a third context vector, in the fourth text content, of each third word of each second document in the plurality of second documents;
determining, according to the third context vector, each second document in the plurality of second documents, and the second weight, a fourth context vector, in each second document, of each fourth word in the fourth text content; and
determining the derivation relationship between the second documents and the fourth text content according to the fourth context vector.
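Applied once per retrieved second document, the same soft alignment yields the second matching matrix and the third and fourth context vectors. In this sketch (an assumption, like the previous one) the fourth text content is the token concatenation of the description and the first option, word vectors are supplied by the caller, and the second weight is a softmax over the rows or columns of each matching matrix. `fourth_text_content` and `align_with_documents` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def fourth_text_content(description: List[str], first_option: List[str]) -> List[str]:
    """Hypothetical: description tokens followed by the first option's tokens."""
    return description + first_option


def align_with_documents(query: List[Vector], documents: List[List[Vector]]) -> List[Dict]:
    """query: word vectors of the fourth text content (columns of the fourth matrix);
    documents: one third matrix (list of word vectors) per second document."""
    results = []
    for doc in documents:
        # Second matching matrix for this document: match[i][j] = query[i] . doc[j]
        match = [[dot(q, x) for x in doc] for q in query]
        # Third context vectors: each document word attends over the fourth text content.
        third_ctx = [weighted_sum(softmax([row[j] for row in match]), query)
                     for j in range(len(doc))]
        # Fourth context vectors: each fourth-text word attends over this document.
        fourth_ctx = [weighted_sum(softmax(row), doc) for row in match]
        results.append({"match": match, "third_ctx": third_ctx, "fourth_ctx": fourth_ctx})
    return results


tokens = fourth_text_content(["which", "hormone"], ["insulin"])
results = align_with_documents([[1.0, 0.0]],
                               [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]])
```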
In another possible implementation, the determining, according to the third context vector, each second document in the plurality of second documents, and the second weight, a fourth context vector, in each second document, of each fourth word in the fourth text content includes:
fusing the third context vector into the third matrices corresponding to the plurality of second documents to obtain a plurality of fifth matrices that merge the second documents with the fourth text content;
performing a dot product between each fifth matrix of the plurality of fifth matrices and a fifth context vector to obtain a third weight, where the third weight is the weight of the derivation relationship between the plurality of fourth documents, and the fifth context vector is the context vector of a fifth word of the fourth document in each fourth document; and
determining, according to the third weight, the fourth context vector of the fifth word in each fourth document.
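The fusion step is described only loosely. The sketch below assumes that "fusing" means adding each document word vector to its third context vector, and that the third weight is a softmax of dot-product scores taken jointly over the words of all fifth matrices, so that documents compete for attention. `fuse` and `cross_document_context` are hypothetical names, and other fusions (concatenation, gating) would fit the text equally well.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(third_matrix: List[Vector], third_ctx: List[Vector]) -> List[Vector]:
    """Fifth matrix for one document: word vector plus its third context vector (assumed fusion)."""
    return [[x + c for x, c in zip(xv, cv)] for xv, cv in zip(third_matrix, third_ctx)]


def cross_document_context(fifth_matrices: List[List[Vector]],
                           fifth_ctx: Vector) -> Tuple[List[List[float]], Vector]:
    """Score every fused word against the fifth context vector, softmax jointly over all
    documents (the third weight), and return per-document weights plus the pooled vector."""
    flat = [vec for matrix in fifth_matrices for vec in matrix]
    weights = softmax([sum(a * b for a, b in zip(vec, fifth_ctx)) for vec in flat])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, flat)) for d in range(len(fifth_ctx))]
    # Regroup the flat weights so each document keeps its own slice.
    per_doc, start = [], 0
    for matrix in fifth_matrices:
        per_doc.append(weights[start:start + len(matrix)])
        start += len(matrix)
    return per_doc, pooled


per_doc, pooled = cross_document_context([[[1.0, 0.0]], [[0.0, 1.0]]], [1.0, 0.0])
```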
In another possible implementation, the determining a third prediction vector of the first document according to the first prediction vector and the second prediction vector includes:
determining, according to the first prediction vector, a first mean vector and a first maximum vector of the first prediction vector;
concatenating the first mean vector and the first maximum vector to obtain a fourth prediction vector;
determining, according to the second prediction vector, a second mean vector and a second maximum vector of the second prediction vector;
concatenating the second mean vector and the second maximum vector to obtain a fifth prediction vector; and
concatenating the fifth prediction vector and the fourth prediction vector to obtain the third prediction vector.
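The mean and maximum vectors read naturally as mean pooling and max pooling. Assuming each prediction "vector" is actually a sequence of per-word feature vectors, the pooling and concatenation steps, kept in the order the text gives (fifth before fourth), could look like this minimal sketch; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_max_pool(rows: Sequence[Vector]) -> Vector:
    """Concatenate the element-wise mean and the element-wise maximum of a sequence of vectors."""
    dim = len(rows[0])
    mean = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    maximum = [max(r[d] for r in rows) for d in range(dim)]
    return mean + maximum


def third_prediction_vector(first_pred: Sequence[Vector],
                            second_pred: Sequence[Vector]) -> Vector:
    fourth = mean_max_pool(first_pred)   # first mean vector followed by first maximum vector
    fifth = mean_max_pool(second_pred)   # second mean vector followed by second maximum vector
    return fifth + fourth                # fifth before fourth, as in the text


third = third_prediction_vector([[1.0, 2.0], [3.0, 0.0]], [[0.0, 4.0]])
```

Pooling turns variable-length sequences into fixed-size vectors, so the third prediction vector has length 4 × d regardless of how many words the documents contain.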
In another aspect, a document classification apparatus is provided. The apparatus includes:
a first determining module, configured to determine a first document to be classified, where the first document includes description information and a plurality of pieces of option information, and the plurality of pieces of option information include at least one piece of first option information corresponding to the description information;
a second determining module, configured to determine a plurality of second documents corresponding to the first option information and a plurality of third documents corresponding to second option information, where the second option information is the option information, among the plurality of pieces of option information, other than the first option information;
a third determining module, configured to determine, according to the plurality of second documents and the plurality of third documents, a first prediction vector indicating a similarity relation between the first option information and the second option information;
a fourth determining module, configured to determine, according to the plurality of second documents and the description information of the first document, a second prediction vector indicating a derivation relationship between the description information and the first option information;
a fifth determining module, configured to determine a third prediction vector of the first document according to the first prediction vector and the second prediction vector; and
a sixth determining module, configured to determine a category of the first document according to the third prediction vector.
In one possible implementation,
the second determining module is further configured to: determine first text content of the description information, second text content of the first option information, and third text content of the second option information; determine, according to the first text content and the second text content, a first keyword corresponding to the first text content and the second text content; determine, from a document database according to the first keyword, the plurality of second documents corresponding to the first keyword; determine, according to the first text content and the third text content, a second keyword corresponding to the first text content and the third text content; and determine, from the document database according to the second keyword, the plurality of third documents corresponding to the second keyword.
In another possible implementation, the third determining module is further configured to: determine the similarity relation between the first option information and the second option information according to the plurality of second documents and the plurality of third documents, and determine the first prediction vector according to the similarity relation between the second documents and the third documents; or input the second documents and the third documents into a first prediction model to obtain the first prediction vector.
In another possible implementation, the third determining module is further configured to: determine, according to a plurality of first words in the plurality of second documents and a plurality of second words in the plurality of third documents respectively, a first matrix formed by the plurality of second documents and a second matrix formed by the plurality of third documents; perform a dot product between each column of the first matrix and each column of the second matrix to obtain a first matching matrix indicating the similarity relation between the second documents and the third documents; determine a first weight according to the first matching matrix, where the first weight is the weight of the context vectors of the plurality of second documents and the plurality of third documents; determine, according to the first matrix and the first weight, a first context vector of each second word of the plurality of third documents in the plurality of second documents, and determine, according to the second matrix and the first weight, a second context vector of each first word of the plurality of second documents in the plurality of third documents; and compare the first context vector with the second context vector to obtain the similarity relation between the first option information and the second option information.
In another possible implementation, the fourth determining module is further configured to: combine the description information of the first document and the first option information into fourth text content; determine a derivation relationship between the second documents and the fourth text content according to the plurality of second documents and the fourth text content; and determine the second prediction vector according to that derivation relationship; or input the second documents, the description information of the first document, and the first document into a second prediction model to obtain the second prediction vector.
In another possible implementation, the fourth determining module is further configured to: determine, according to a plurality of third words in the plurality of second documents and fourth words in the fourth text content respectively, a third matrix formed by the plurality of third words of each second document and a fourth matrix formed by the plurality of fourth words in the fourth text content; perform a dot product between each column of the fourth matrix and each column of each of the third matrices corresponding to the plurality of second documents, to obtain a second matching matrix indicating the derivation relationship between the fourth text content and the plurality of second documents; determine a second weight according to the second matching matrix, where the second weight is the weight of the context vectors of the plurality of second documents and the fourth text content; determine, according to the fourth text content and the second weight, a third context vector, in the fourth text content, of each third word of each second document; determine, according to the third context vector, each second document, and the second weight, a fourth context vector, in each second document, of each fourth word in the fourth text content; and determine the derivation relationship between the second documents and the fourth text content according to the fourth context vector.
In another possible implementation, the fourth determining module is further configured to: fuse the third context vector into the third matrices corresponding to the plurality of second documents to obtain a plurality of fifth matrices that merge the second documents with the fourth text content; perform a dot product between each of the plurality of fifth matrices and a fifth context vector to obtain a third weight, where the third weight is the weight of the derivation relationship between the plurality of fourth documents, and the fifth context vector is the context vector of a fifth word of the fourth document in each fourth document; and determine, according to the third weight, the fourth context vector of the fifth word in each fourth document.
In another possible implementation, the fifth determining module is further configured to: determine, according to the first prediction vector, a first mean vector and a first maximum vector of the first prediction vector; concatenate them to obtain a fourth prediction vector; determine, according to the second prediction vector, a second mean vector and a second maximum vector of the second prediction vector; concatenate them to obtain a fifth prediction vector; and concatenate the fifth prediction vector and the fourth prediction vector to obtain the third prediction vector.
In another aspect, a terminal is provided. The terminal includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the operations performed in the document classification method in the embodiments of the present disclosure.
In another aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the operations performed in the document classification method in the embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure have the following beneficial effects:
In the embodiments of the present disclosure, for a first document to be classified, a plurality of second documents and a plurality of third documents related to the first document are determined. A first prediction vector is determined from the second documents and the third documents according to the first option information and the second option information in the first document, and a second prediction vector is determined from the first document and the second documents. The first prediction vector and the second prediction vector are concatenated to obtain a third prediction vector indicating the document category of the first document. Because the category of the first document is determined from the similarity relation and the derivation relationship of its related documents, the subjective bias of manual document classification is avoided, and labor consumption and cost are reduced.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Evidently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a flowchart of a document classification method according to an exemplary embodiment;
Fig. 2 is a schematic diagram of a document classification method according to an exemplary embodiment;
Fig. 3 is a schematic diagram of a document classification method according to an exemplary embodiment;
Fig. 4 is a schematic diagram of a document classification method according to an exemplary embodiment;
Fig. 5 is a block diagram of a document classification apparatus according to an exemplary embodiment;
Fig. 6 is a schematic structural diagram of a terminal according to an exemplary embodiment.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the following further describes the implementations of the present disclosure in detail with reference to the accompanying drawings.
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The embodiments of the present disclosure are applied to scenarios in which documents are classified. When a first document is classified by the method provided in the embodiments, a plurality of second documents and a plurality of third documents related to the first document may first be determined according to the first option information and the second option information in the first document. A first prediction vector of the first document is determined from the second documents and the third documents, a second prediction vector of the first document is determined from the description information of the first document and the second documents, and the category of the first document is determined according to the first prediction vector and the second prediction vector.
It should be noted that the numbers of pieces of first option information and second option information can be set and changed as needed, and the two numbers may be the same or different; they are not specifically limited in the embodiments of the present disclosure. For example, the number of pieces of first option information or second option information may be 1, 2, 4, 5, or 6.
It should further be noted that the numbers of second documents and third documents can also be set and changed as needed, and the two numbers may be the same or different; they are not specifically limited in the embodiments of the present disclosure. For example, the number of second documents or third documents may be 10, 15, or 20.
The document classification method can be applied to determining the difficulty of each question when a personalized learning strategy is formulated. In that case, the first document may be the document corresponding to a question to be classified, and the embodiments of the present disclosure can classify questions by difficulty. When questions are classified by difficulty using this method, the first document corresponding to the question to be classified is determined first. The first document includes description information and a plurality of pieces of option information: the description information may be the stem of the question, and the option information may be the options of the question. Correspondingly, the first option information may be the correct answer to the question, and the second option information may be the distractor answers. The second documents are then documents related to the correct answer, and the third documents are documents related to the distractor answers. The first prediction vector may be the prediction vector corresponding to the memory difficulty of the question, and the second prediction vector may be the prediction vector corresponding to its reasoning difficulty. Using this method, the terminal determines how difficult the question to be classified is, and thereby determines its difficulty category.
From the second documents related to the correct answer and the third documents related to the distractor answers, the terminal determines how similar the correct answer and the distractor answers are, and thus how easily the distractors can be confused with the correct answer. From this degree of confusion, it determines the memory difficulty of the question. The terminal can also determine, from the second documents related to the correct answer and the description information of the question, the reasoning difficulty of inferring the correct answer from the question. It determines the predicted difficulty of the question from the memory difficulty and the reasoning difficulty, classifies the question according to that difficulty, and obtains the learning strategy corresponding to the question.
It should be noted that questions of any subject can be classified by this method; the subject is not specifically limited in the embodiments of the present disclosure. For example, the question may be a medicine, English, or politics question.
The document classification method can also be applied to other classification scenarios, such as classifying articles according to the content of their titles.
Fig. 1 is a kind of Document Classification Method flow chart provided according to an exemplary embodiment, as shown in Figure 1, the document Classification method the following steps are included:
Step 101: The terminal determines a first document to be classified. The first document includes description information and multiple pieces of option information, and the option information includes at least one piece of first option information corresponding to the description information.
In this step, the terminal can directly acquire a first document input by the user, that is, the terminal receives the first document entered by the user. The terminal can also retrieve a stored first document to be classified from a database, that is, the terminal retrieves the first document from a first document database. It should be noted that the first document database can be a database in the terminal or a database stored on a first server; the embodiments of the present disclosure do not specifically limit this. When the first document database is in the terminal, the terminal can retrieve a document from it through a data interface and use that document as the first document. When the first document database is on the first server, the terminal can send a first acquisition request to the first server. The first server receives the request, determines the first document according to it, and sends the first document to the terminal, which receives it.
It should be noted that the first document can be a document that the terminal acquires directly, or a picture containing the first document; the embodiments of the present disclosure do not specifically limit its form. When the terminal acquires a picture containing the first document, the terminal can recognize the text in the picture and convert the picture into text to obtain the first document.
The first document includes description information and multiple pieces of option information. The description information can include the problem statement of the first document, and the option information can include options related to the description information. For example, when the first document is a multiple-choice question, the description information can be the question stem and the option information can be the options corresponding to the stem. The options can include the correct answer and incorrect options.
In addition, the first option information can be determined from the option information at the time the terminal acquires the first document. Alternatively, after acquiring the first document, the terminal can determine the first option information from the option information according to the description information and the option information in the first document. The embodiments of the present disclosure do not specifically limit this.
For example, when the terminal classifies the difficulty level of a question, the terminal can directly receive the stem Q and the answer A of the question, using stem Q as the description information and answer A as the first option information. The terminal can also directly receive the stem Q and the multiple options C_i corresponding to it, then select the correct answer A from the options C_i according to Q and the C_i, and use A as the first option information.
The terminal can be a mobile phone, a computer, a wearable device, or the like; the embodiments of the present disclosure do not specifically limit it. A document classification application can be installed on the terminal, and the terminal can classify the first document through that application. The terminal can also exchange data with the first server through a network connection or a data interface.
Step 102: The terminal determines multiple second documents corresponding to the first option information and multiple third documents corresponding to the second option information. The second option information is the option information other than the first option information.
In a first implementation, the terminal determines the second documents corresponding to the first option information from the first option information, and the third documents corresponding to the second option information from the second option information. In a second implementation, the terminal determines the second documents from the first option information together with the description information, and the third documents from the second option information together with the description information.
As shown in Fig. 2, when this step uses the first implementation, the terminal determines the relevance between each document in the second document database and the first option information. It then selects from the database the documents whose relevance exceeds a first preset threshold and uses them as the second documents corresponding to the first option information. Likewise, the terminal determines the relevance between each document in the database and the second option information. It selects the documents whose relevance exceeds a second preset threshold and uses them as the third documents corresponding to the second option information.
The terminal can use the BM25 (Best Match 25) ranking algorithm to determine both the relevance between the first option information and a document and the relevance between the second option information and a document. For the first option information, the terminal can first identify first keywords in the first option information and third keywords in the document. It then determines the relevance between the first and third keywords and uses that as the relevance between the first option information and the document. Likewise, for the second option information, the terminal can first identify second keywords in the second option information and the third keywords in the document. It then determines the relevance between the second and third keywords and uses that as the relevance between the second option information and the document.
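The relevance scoring and threshold selection of the first implementation can be sketched as follows. This is a minimal Python illustration, not the patent's own code. It assumes that the option and the documents are already segmented into words. It uses the common Okapi BM25 formula with the Lucene-style `+1` inside the IDF logarithm. The defaults `k1=1.5` and `b=0.75` and the threshold value are illustrative assumptions, and the function names are hypothetical. The sketch also scores the option's words against each document directly, without a separate keyword-extraction step.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document in `docs` for `query_terms`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_related_documents(option_terms, docs, threshold):
    """Indices of documents whose BM25 relevance to the option exceeds `threshold`."""
    return [i for i, s in enumerate(bm25_scores(option_terms, docs)) if s > threshold]


corpus = [["insulin", "lowers", "blood", "glucose"],
          ["aspirin", "inhibits", "platelets"],
          ["insulin", "is", "a", "peptide", "hormone"]]
print(select_related_documents(["insulin"], corpus, threshold=0.0))  # [0, 2]
```

The same function serves both the first preset threshold (correct answer, giving second documents) and the second preset threshold (distractors, giving third documents).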
When this step uses the second implementation, it can be carried out through the following steps 1021 to 1025:
1021: The terminal determines the first text content of the description information, the second text content of the first option information, and the third text content of the second option information.
The first, second, and third text content can be the textual content of the description information, the first option information, and the second option information respectively. The terminal can obtain the first text content by recognizing the description information, the second text content by recognizing the first option information, and the third text content by recognizing the second option information. For example, when the first document is in picture form, the terminal can recognize the picture to obtain the first text content of the description information, the second text content of the first option information, and the third text content of the second option information.
1022: Based on the first text content and the second text content, the terminal determines the first keywords corresponding to them.
In this step, the terminal can segment the recognized first and second text content into words to obtain the keywords in each. The terminal can use all keywords of both the first and the second text content as first keywords. Alternatively, it can first find the keywords that the two texts share and use only those shared keywords as first keywords.
It should be noted that the number of first keywords can be set and changed as needed; the embodiments of the present disclosure do not specifically limit it. For example, the number of first keywords can be 5, 10, or 15.
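The two ways of forming the first keywords (all keywords of both texts, or only the shared ones), capped at a configurable count, could look like the sketch below. It is a sketch only. It assumes that word segmentation and stop-word removal are already available, and it ranks keywords by raw frequency, which the source does not specify. `first_keywords` is a hypothetical name.

```python
from collections import Counter


def first_keywords(desc_tokens, option_tokens, mode="union", top_k=10, stopwords=frozenset()):
    """Combine segmented description text and option text into at most `top_k` keywords.

    mode="union": keywords from both texts; mode="intersection": only keywords both texts share.
    """
    desc = [t for t in desc_tokens if t not in stopwords]
    option = [t for t in option_tokens if t not in stopwords]
    if mode == "union":
        counts = Counter(desc) + Counter(option)
    elif mode == "intersection":
        shared = set(desc) & set(option)
        counts = Counter(t for t in desc + option if t in shared)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [t for t, _ in counts.most_common(top_k)]  # most frequent first


stem = ["which", "hormone", "lowers", "blood", "glucose"]
answer = ["insulin", "hormone"]
print(first_keywords(stem, answer, mode="intersection", stopwords={"which"}))  # ['hormone']
```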
1023: According to the first keywords, the terminal determines, from the document database, the multiple second documents corresponding to the first keywords.
In this step, the terminal can retrieve the second documents corresponding to the first keywords from the second document database. For this purpose, a second document database of relevant documents can be built before this step, for example with the help of a search server such as the Elasticsearch index engine. The relevant second documents are then retrieved through the Elasticsearch index of the second document database.
The second document database can be stored in the terminal or on a second server; the embodiments of the present disclosure do not specifically limit this. When the second document database is stored in the terminal, the terminal can determine the second documents corresponding to the first keywords from it through a data interface. When the second document database is on the second server, the terminal sends the second server a second acquisition request that carries the first keywords. The server receives the request, retrieves the second documents related to the first keywords from the second document database, and sends them to the terminal, which receives them.
It should be noted that the second server and the first server can be the same server or different servers; the embodiments of the present disclosure do not specifically limit this.
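Step 1023 relies on a search engine such as Elasticsearch. As a self-contained stand-in, and explicitly not the Elasticsearch API, the sketch below builds an in-memory inverted index over hypothetical segmented documents. It ranks documents by how many of the keywords they contain. `max_docs` is an assumed cap on the number of returned documents.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            index[token].add(doc_id)
    return index


def retrieve(index, keywords, max_docs=5):
    """Return up to `max_docs` document ids, most keyword matches first (ties by id)."""
    hits = defaultdict(int)
    for kw in keywords:
        for doc_id in index.get(kw, ()):
            hits[doc_id] += 1
    ranked = sorted(hits, key=lambda d: (-hits[d], d))
    return ranked[:max_docs]


second_db = [["insulin", "peptide", "hormone"],
             ["aspirin", "platelets"],
             ["insulin", "glucose", "hormone", "liver"]]
index = build_inverted_index(second_db)
print(retrieve(index, ["glucose", "insulin"]))  # [2, 0]
```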
1024: Based on the first text content and the third text content, the terminal determines the second keywords corresponding to them.
This step is similar to step 1022, and details are not described herein.
1025: According to the second keywords, the terminal determines, from the second document database, the multiple third documents corresponding to the second keywords.
This step is similar to step 1023, and details are not described herein.
It should be noted that the terminal can obtain the second documents first and then the third documents, obtain the third documents first and then the second documents, or obtain both at the same time. The embodiments of the present disclosure do not specifically limit the order. For example, the terminal can perform steps 1022-1023 before steps 1024-1025, perform steps 1024-1025 before steps 1022-1023, or perform both pairs of steps simultaneously.
For example, when the terminal grades the difficulty of a question, the terminal can use the answer A and the distractors C_i of the question to determine the second documents corresponding to A and the third documents corresponding to each C_i. Alternatively, the terminal can determine the second documents corresponding to the text formed by answer A and question Q, and the third documents corresponding to the text formed by each distractor C_i and question Q.
Step 103: Based on the second documents and the third documents, the terminal determines a first predicted vector indicating the similarity relationship between the first option information and the second option information.
Continuing with Fig. 2, in this step the terminal uses the second documents corresponding to the first option information and the third documents corresponding to the second option information to determine the first predicted vector. The first predicted vector represents the similarity relationship between the first and second option information. In one possible implementation, the terminal determines the similarity relationship between the first and second option information from the second and third documents. It then determines the first predicted vector from the similarity relationship between the second and third documents. In another possible implementation, the terminal stores a first prediction model in advance. The first prediction model predicts the similarity between the second and third documents, which yields the similarity relationship between the first and second option information and, from it, the first predicted vector. In that case, this step can be: the terminal inputs the second documents and the third documents into the first prediction model and obtains the first predicted vector.
In the embodiments of the present disclosure, the terminal can process each sequence in the second and third documents with an attention mechanism. When the terminal determines the first predicted vector by the first implementation, the process of determining the similarity relationship between the first and second option information from the second and third documents can include the following steps:
1031: From the multiple first words in the second documents and the multiple second words in the third documents, the terminal determines a first matrix formed from the second documents and a second matrix formed from the third documents.
In this step, the terminal first segments the second documents and the third documents into words, obtaining the first words and the second words. In one possible implementation, the terminal forms the first matrix directly from the first words and the second matrix from the second words. In another possible implementation, shown in Fig. 3, the terminal converts the first and second words into vector form and builds the first and second matrices from those vectors. To do so, the terminal determines a vector representation for each first and second word, for example by looking up a d-dimensional vector for each word in a table.
Continuing with Fig. 3, after determining the vector of each word, the terminal encodes the vectors. The encoding can be recurrent, using a Bi-GRU (bidirectional gated recurrent unit network) or a Bi-LSTM (bidirectional long short-term memory network). The two networks differ in the structure of their RNN (recurrent neural network) cells. After obtaining the vector representations of the first and second words, the terminal concatenates the vector representations of the first words in each second document to obtain the first matrix. It concatenates the vector representations of the second words in each third document to obtain the second matrix.
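The lookup-then-encode pipeline of step 1031 can be illustrated with NumPy. This sketch assumes a random embedding table in place of a trained lookup table and untrained GRU weights. It uses the Bi-GRU variant; a Bi-LSTM would change only the cell. All names are hypothetical. Each document is encoded separately and the word states are then spliced, so that each column of the result is one word vector.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with untrained random weights (illustration only)."""

    def __init__(self, d_in, d_h, rng):
        # Stacked parameters for the update gate (0), reset gate (1) and candidate state (2).
        self.W = rng.normal(0.0, 0.1, (3, d_h, d_in))
        self.U = rng.normal(0.0, 0.1, (3, d_h, d_h))
        self.b = np.zeros((3, d_h))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])            # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])            # reset gate
        h_new = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])  # candidate state
        return (1.0 - z) * h + z * h_new


def bi_gru_encode(X, fwd, bwd):
    """Encode word vectors X of shape (n_words, d) into states of shape (n_words, 2 * d_h)."""
    d_h = fwd.b.shape[1]
    n = X.shape[0]
    h_f, h_b = np.zeros(d_h), np.zeros(d_h)
    forward_states, backward_states = [], [None] * n
    for t in range(n):              # left-to-right pass
        h_f = fwd.step(X[t], h_f)
        forward_states.append(h_f)
    for t in reversed(range(n)):    # right-to-left pass
        h_b = bwd.step(X[t], h_b)
        backward_states[t] = h_b
    return np.hstack([np.array(forward_states), np.array(backward_states)])


def documents_to_matrix(docs, vocab, emb, fwd, bwd):
    """Look up each word's vector, encode each document, then splice all words column-wise."""
    encoded = [bi_gru_encode(emb[[vocab[w] for w in doc]], fwd, bwd) for doc in docs]
    return np.vstack(encoded).T


rng = np.random.default_rng(0)
vocab = {"insulin": 0, "hormone": 1, "glucose": 2}
emb = rng.normal(size=(len(vocab), 8))        # hypothetical d = 8 lookup table
fwd, bwd = GRUCell(8, 4, rng), GRUCell(8, 4, rng)
D_A = documents_to_matrix([["insulin", "hormone"], ["glucose"]], vocab, emb, fwd, bwd)
print(D_A.shape)  # (8, 3)
```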
1032: The terminal takes the dot product of each column of the first matrix with each column of the second matrix to obtain a first matching matrix, which indicates the similarity relationship between the second documents and the third documents.
Continuing with Fig. 3, in this step the terminal takes the dot products between the columns of the concatenated first and second matrices to obtain the first matching matrix (matching matrix M). For example, if the first matrix is D_A and the second matrix is D_C, the dot product of each column of D_A with each column of D_C gives the first matching matrix M.
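When word vectors are stored as columns, step 1032 reduces to a single matrix product: entry `M[i, j]` is the dot product of column `i` of `D_A` and column `j` of `D_C`. A minimal NumPy sketch follows; the function name and the toy matrices are hypothetical.

```python
import numpy as np


def matching_matrix(D_A, D_C):
    """D_A: (h, n_A) first matrix, D_C: (h, n_C) second matrix; columns are word vectors.
    Returns M with M[i, j] = dot(D_A[:, i], D_C[:, j]), shape (n_A, n_C)."""
    if D_A.shape[0] != D_C.shape[0]:
        raise ValueError("word vectors in both matrices must have the same dimension")
    return D_A.T @ D_C


D_A = np.array([[1.0, 0.0], [0.0, 2.0]])            # two words from the second documents
D_C = np.array([[1.0, 3.0, 0.0], [1.0, 0.0, 1.0]])  # three words from the third documents
M = matching_matrix(D_A, D_C)
```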
1033: The terminal determines a first weight from the first matching matrix. The first weight is the weight of the context vectors of the second and third documents.
The terminal can select one column sequence of the first matching matrix and normalize it to obtain a first weight. It can also normalize every column sequence of the first matching matrix separately to obtain multiple first weights. The normalization can be done with the softmax function.
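The normalization in step 1033 can be sketched as a column-wise softmax. Under the column convention used here, the rows of `M` index the words of the second documents. Column `j` of the result is then a first-weight distribution over the second-document words for the `j`-th word of the third documents. Shifting by the column maximum is a standard numerical-stability trick that the source does not specify. The function name is hypothetical.

```python
import numpy as np


def column_softmax(M):
    """Turn every column of M into a probability distribution (first weights)."""
    shifted = M - M.max(axis=0, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


M = np.array([[1.0, 3.0, 0.0], [2.0, 0.0, 2.0]])
W = column_softmax(M)
```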
1034: From the first matrix and the first weight, the terminal determines the first context vector, within the second documents, of each second word in the third documents. From the second matrix and the first weight, it determines the second context vector, within the third documents, of each first word in the second documents.
Continuing with Fig. 3, in this step the terminal multiplies the first weight by the vector of each first word in the first matrix and sums the products. The result represents the second matrix in terms of the first matrix; that is, it is the first context vector, within the second documents, of each second word in the third documents. For example, if a first weight is determined from the first row of the first matching matrix, that first weight and the first matrix give the context vector, within the second documents, of the first second word in the third documents.
Multiplying the first weight by the vector of each second word in the second matrix and summing the products gives the second context vector, within the third documents, of each first word in the second documents. This process is similar to determining the first context vector and is not described again here.
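Taken together, steps 1033 and 1034 amount to cross attention in both directions. In this sketch, the softmax over each column of `M` weights the columns of `D_A`, giving a first context vector for every third-document word. The softmax over each row of `M` weights the columns of `D_C`, giving a second context vector for every second-document word. The function names are hypothetical, and the column convention for word vectors is an assumption.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_context(D_A, D_C):
    """D_A: (h, n_A) first matrix, D_C: (h, n_C) second matrix (columns are word vectors)."""
    M = D_A.T @ D_C               # (n_A, n_C) first matching matrix
    alpha = softmax(M, axis=0)    # each column: weights over second-document words
    beta = softmax(M, axis=1)     # each row: weights over third-document words
    first_ctx = D_A @ alpha       # (h, n_C): first context vector of each third-document word
    second_ctx = D_C @ beta.T     # (h, n_A): second context vector of each second-document word
    return first_ctx, second_ctx


D_A = np.array([[1.0, 1.0], [2.0, 2.0]])             # two identical word vectors
D_C = np.array([[0.5, -1.0, 3.0], [2.0, 0.0, 1.0]])
first_ctx, second_ctx = cross_context(D_A, D_C)
```

Every context vector is a weighted average of the other document's word vectors, so with two identical columns in `D_A` every first context vector equals that column.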
1035: The terminal compares the first context vectors with the second context vectors to obtain the similarity relationship between the first option information and the second option information.
The first context vectors are concatenated to obtain the contextual representation of the third documents within the second documents. The second context vectors are concatenated to obtain the contextual representation of the second documents within the third documents. The similarity between the second and third documents is determined from the concatenated first and second context vectors. The similarity relationship between the first and second option information is then determined from the similarity between the second and third documents.
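The source does not say how the spliced context vectors are compared in step 1035. As one possible reading, which is an assumption and not the patent's stated method, the sketch below mean-pools each set of context vectors and returns their cosine similarity, a score in [-1, 1]. The function name is hypothetical.

```python
import numpy as np


def similarity_from_contexts(first_ctx, second_ctx):
    """first_ctx: (h, n_C) spliced first context vectors; second_ctx: (h, n_A) spliced
    second context vectors. Returns the cosine similarity of their mean-pooled summaries."""
    u = first_ctx.mean(axis=1)
    v = second_ctx.mean(axis=1)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


same = similarity_from_contexts(np.array([[1.0, 1.0], [0.0, 0.0]]), np.array([[2.0], [0.0]]))
orthogonal = similarity_from_contexts(np.array([[1.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [3.0]]))
```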
It should be noted that, when the terminal inputs the second and third documents into the first prediction model to obtain the first predicted vector, the first prediction model can include a first text conversion layer (EmbedLayer), a first encoding layer (EncodeLayer), and a first prediction layer (AttendLayer). The output of the text conversion layer is connected to the input of the encoding layer, and the output of the encoding layer is connected to the input of the first prediction layer. The text conversion layer segments the received first document, second documents, and third documents into words. This is similar to the segmentation of the second and third documents in step 1031 and is not described again here. The encoding layer encodes the first and second words produced by the text conversion layer to obtain a contextual representation of each word. This is similar to the encoding of the first and second words in step 1031 and is not described again here. The first prediction layer determines the similarity relationship between the first and second option information from the first and second words. It thereby determines the similarity relationship between the second and third documents and, from that, the first predicted vector of the first and second option information. This is similar to steps 1032-1035 and is not described again here.
Step 104: Based on the second documents and the description information of the first document, the terminal determines a second predicted vector indicating the reasoning relationship between the description information and the first option information.
Continuing with Fig. 2, in this step the terminal determines the description information and the first option information in the first document, together with the second documents. From these, it determines the reasoning relationship by which the first option information is inferred from the second documents and the description information, and from that the second predicted vector. In one possible implementation, the terminal combines the description information and the first option information of the first document into a fourth text content. It determines the reasoning relationship between the second documents and the fourth text content, and then determines the second predicted vector from that relationship. In another possible implementation, the terminal inputs the second documents, the description information of the first document, and the first document into a second prediction model to obtain the second predicted vector.
When the terminal determines the second predicted vector by the first implementation, the process of determining the reasoning relationship between the second documents and the fourth text content can include the following steps:
1401: From the multiple third words in the second documents and the fourth words in the fourth text content, the terminal determines, for each second document, a third matrix formed from its third words. It also determines a fourth matrix formed from the fourth words in the fourth text content.
Before this step, the terminal combines the description information and the first option information in the first document into the fourth text content. For example, when the terminal classifies the difficulty of a question, it combines the stem Q and the correct answer A of the question into the fourth text content.
Similar to step 103, in this step the terminal segments the second documents and the fourth text content into words, obtaining third words and fourth words. In one possible implementation, the terminal forms the third matrix directly from the third words and the fourth matrix from the fourth words. In another possible implementation, the terminal converts the third and fourth words into vector form and determines the third and fourth matrices from those vectors. This is similar to forming the first and second matrices from the vector forms of the first and second words and is not described again here.
It should be noted that the first words and the third words are both words of the second documents determined by the terminal, so they can be the same words. The terminal can then segment only once to determine both at the same time. The embodiments of the present disclosure do not specifically limit this.
1402: The terminal takes the dot product of each column of the fourth matrix with each column of each third matrix corresponding to a second document. This gives second matching matrices that indicate the reasoning relationship between the fourth text content and the second documents.
This step is similar to step 1032, and details are not described herein.
It should be noted that a second matching matrix must be determined through this process for each second document; that is, in this step the terminal determines multiple second matching matrices.
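Because step 1402 produces one matching matrix per second document, the computation is a loop over the third matrices. The sketch below uses hypothetical names and stores word vectors as columns, so the rows of each result index the words of the corresponding second document.

```python
import numpy as np


def second_matching_matrices(D4, third_matrices):
    """D4: (h, n_Q) fourth matrix; third_matrices: list of (h, n_i) third matrices,
    one per second document. Returns one (n_i, n_Q) matching matrix per document."""
    for D_i in third_matrices:
        if D_i.shape[0] != D4.shape[0]:
            raise ValueError("all word vectors must share the same dimension h")
    return [D_i.T @ D4 for D_i in third_matrices]


D4 = np.ones((3, 4))                            # fourth text content: 4 words
third = [np.ones((3, 2)), 2.0 * np.ones((3, 5))]  # two second documents
second_M = second_matching_matrices(D4, third)
```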
1403: The terminal determines a second weight from the second matching matrices. The second weight is the weight of the context vectors of the second documents and the fourth text content.
This step is similar to step 1033, and details are not described herein.
1404: From the fourth text content and the second weight, the terminal determines the third context vector, within the fourth text content, of each third word of each second document.
This step is similar to step 1034, and details are not described herein.
For example, let D be the fourth matrix of the fourth text content, and let D_i ∈ D_A be the third matrix of each second document in the set D_A of second documents. For each second document D_i, the terminal determines the context vector, within the fourth text content D, of each word w_ij ∈ D_i.
1405: From the third context vectors, each second document, and the second weight, the terminal determines the fourth context vector, within each second document, of each fourth word in the fourth text content.
In this step, the terminal fuses the third context vectors with the second documents. From the fused vectors, it determines the fourth context vector, within each second document, of each fourth word in the fourth text content.
The terminal can determine these fourth context vectors from the third context vectors, each second document, and the second weight through the following steps:
(1) The terminal fuses the third context vectors into the third matrices of the second documents, obtaining multiple fifth matrices that each fuse a second document with the fourth text content.
As shown in Fig. 4, in this step the terminal fuses the third context vectors into the third matrix of each second document. The terminal can fuse the third context vectors directly into the third matrix, for example by element-wise addition, to obtain the fifth matrices. The third context vectors are the context vectors, within the fourth text content, of each third word of each second document. Splicing them row by row into the third matrix therefore fuses the second document and the fourth text content into a single vector matrix, so the fifth matrix contains both the second document and the fourth text content.
It should be noted that, after fusing the third context vectors into the third matrix of a second document, the terminal can use the fused third matrix directly as the fifth matrix. The terminal can also perform context extraction and encoding again on the fused matrix, for example with a Bi-GRU or Bi-LSTM. The embodiments of the present disclosure do not specifically limit this.
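Step 1404 and sub-step (1) of step 1405 can be sketched together. The sketch first computes the context vector, within the fourth text content, of each word of one second document. It then fuses those context vectors into that document's third matrix to form the fifth matrix. It shows both fusion options named in the text: appending the context vectors under the word vectors, or adding them element-wise. The optional re-encoding with a Bi-GRU or Bi-LSTM is omitted, and all names are hypothetical.

```python
import numpy as np


def row_softmax(M):
    M = M - M.max(axis=1, keepdims=True)
    e = np.exp(M)
    return e / e.sum(axis=1, keepdims=True)


def third_context_vectors(D_i, D4):
    """Context vector, in the fourth text content D4 (h, n_Q), of every word of one
    second document D_i (h, n_i). Returns (h, n_i)."""
    weights = row_softmax(D_i.T @ D4)   # (n_i, n_Q): second weight, one row per document word
    return D4 @ weights.T               # weighted sum of fourth-text word vectors


def fifth_matrix(D_i, D4, mode="concat"):
    """Fuse each word vector of D_i with its context vector from D4."""
    ctx = third_context_vectors(D_i, D4)
    if mode == "concat":
        return np.vstack([D_i, ctx])    # (2h, n_i): context rows appended below word vectors
    if mode == "add":
        return D_i + ctx                # (h, n_i): element-wise addition
    raise ValueError(f"unknown mode: {mode!r}")


D_i = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
D4 = np.array([[1.0, 1.0], [3.0, 3.0]])  # two identical word vectors in the fourth text
F = fifth_matrix(D_i, D4)
```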
(2) The terminal takes the dot product of each of the fifth matrices with a fifth context vector to obtain a third weight. The third weight is the weight of the reasoning relationship among the multiple fourth documents. The fifth context vector is the context vector, within each fourth document, of the fifth word in that fourth document.
This step is similar to step 1033, and details are not described herein.
(3) From the third weight, the terminal determines the fourth context vector of the fifth word in each fourth document.
This step is similar to step 1034, and details are not described herein.
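The translated text of sub-steps (2) and (3) is hard to follow, so the following is only one plausible reading, not the patent's confirmed method. The sketch takes dot products between a query vector and every column of a fifth matrix. It normalizes them with softmax into the third weight and uses that weight to form a weighted sum of the columns. The query `q` is a hypothetical stand-in for the fifth context vector.

```python
import numpy as np


def attention_pool(F, q):
    """F: (h, n) fifth matrix of one second document; q: (h,) query vector.
    Returns the third weight (n,) and the pooled vector (h,)."""
    scores = q @ F                    # dot product of the query with every column
    scores = scores - scores.max()    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, F @ weights       # weighted sum of the columns


weights, pooled = attention_pool(np.eye(2), np.zeros(2))
```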
1406: From the fourth context vectors, the terminal determines the reasoning relationship between the second documents and the fourth text content.
In this step, the fourth context vectors incorporate both the second documents and the fourth text content. From their characteristic features, the terminal can determine the reasoning relationship by which the first option information corresponding to the description information is inferred from the second documents.
It should be noted that the terminal can input the second documents, the description information of the first document, and the first document into the second prediction model to obtain the second predicted vector. In that case, the first prediction model and the second prediction model can be the same model or different models; the embodiments of the present disclosure do not specifically limit this. When they are different models, the second prediction model includes a second text conversion layer, a second encoding layer, and a second prediction layer. Its structure is similar to that of the first prediction model and is not described again here. When they are the same model, the second prediction model includes the first text conversion layer, the first encoding layer, the first prediction layer, and a third prediction layer, and the input of the third prediction layer is connected to the output of the first encoding layer.
It should also be noted that the terminal can determine the first predicted vector before the second, the second before the first, or both at the same time; the present disclosure does not specifically limit the order. That is, the terminal can perform step 103 before step 104, step 104 before step 103, or both steps simultaneously.
Step 105: From the first predicted vector and the second predicted vector, the terminal determines a third predicted vector of the first document.
In this step, the output first and second predicted vectors are concatenated to obtain a third predicted vector that fully expresses the information of the first document. The first and second predicted vectors can have the same length or different lengths; the embodiments of the present disclosure do not specifically limit this.
The terminal can concatenate the first and second predicted vectors directly to obtain the third predicted vector. Alternatively, it can process each of them from different perspectives and then concatenate the results. The perspectives can be, for example, the mean and the maximum; the embodiments of the present disclosure do not specifically limit this. When the first and second predicted vectors are concatenated from the mean and maximum perspectives, this step can be carried out through the following steps:
1051: The terminal determines a first mean vector and a first max vector of the first predicted vector.
With continued reference to Fig. 3, the terminal can obtain the first mean vector and the first max vector of the first predicted vector by pooling. Specifically, the terminal determines the first mean vector of the first predicted vector by mean pooling and the first max vector of the first predicted vector by max pooling.
1052: The terminal concatenates the first mean vector and the first max vector to obtain a fourth predicted vector.
In this step, the terminal concatenates the first mean vector and the first max vector. The embodiments of the present disclosure do not specifically limit how the two vectors are concatenated; for example, they may be joined end to end, or placed side by side.
1053: The terminal determines a second mean vector and a second max vector of the second predicted vector.
This step is similar to step 1051 and is not described again here.
1054: The terminal concatenates the second mean vector and the second max vector to obtain a fifth predicted vector.
This step is similar to step 1052 and is not described again here.
1055: The terminal concatenates the fifth predicted vector and the fourth predicted vector to obtain the third predicted vector.
It should be noted that the terminal may also apply only mean pooling, or only max pooling, to the first predicted vector and the second predicted vector. Correspondingly, in this step the third predicted vector may be obtained by concatenating the max-pooled first predicted vector and second predicted vector, or by concatenating the mean-pooled first predicted vector and second predicted vector.
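Steps 1051 to 1055, together with the single-pooling variant just described, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each predicted vector is a sequence of per-position hidden vectors (which is what makes the pooling in Fig. 3 meaningful), treats concatenation as end-to-end joining of lists, and the names `mean_pool`, `max_pool` and `third_predicted_vector` are hypothetical.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]  # assumed: one hidden vector per token position


def mean_pool(seq: Sequence) -> Vector:
    """Mean pooling: average each feature dimension over all positions."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]


def max_pool(seq: Sequence) -> Vector:
    """Max pooling: take the largest value of each feature dimension."""
    return [max(col) for col in zip(*seq)]


def third_predicted_vector(first: Sequence, second: Sequence, mode: str = "both") -> Vector:
    """Combine the first and second predicted vectors.

    mode="both": fifth + fourth, i.e. [mean2, max2] followed by [mean1, max1].
    mode="mean" or mode="max": only one kind of pooling is applied to each input.
    """
    if mode == "both":
        fourth = mean_pool(first) + max_pool(first)    # steps 1051-1052
        fifth = mean_pool(second) + max_pool(second)   # steps 1053-1054
        return fifth + fourth                          # step 1055
    if mode == "mean":
        return mean_pool(first) + mean_pool(second)
    if mode == "max":
        return max_pool(first) + max_pool(second)
    raise ValueError(f"unknown mode: {mode}")
```

For example, with `first = [[1.0, 2.0], [3.0, 0.0]]` and `second = [[0.0, 4.0]]`, the default mode yields `[0.0, 4.0, 0.0, 4.0, 2.0, 1.0, 3.0, 2.0]`. Pooling over positions makes the result length independent of the input sequence lengths, which is one reason the two predicted vectors need not be the same length.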
Step 106: The terminal determines the category of the first document according to the third predicted vector.
With continued reference to Fig. 2, in this step the terminal may store in advance a correspondence between characteristic features of predicted vectors and document categories. After determining the third predicted vector, the terminal looks up, according to the characteristic features of the third predicted vector, the category of the first document corresponding to the third predicted vector in this correspondence.
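The patent does not say what the stored characteristic features are. One hypothetical way to realize the lookup is to keep one reference vector per category and choose the category whose vector is closest to the third predicted vector by cosine similarity; the sketch below assumes exactly that, and the names `cosine`, `classify` and the category labels used in the example are invented for illustration. A trained classification layer would be an equally plausible realization.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(third: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the category whose stored reference vector best matches `third`."""
    return max(prototypes, key=lambda category: cosine(third, prototypes[category]))
```

With `prototypes = {"easy": [1.0, 0.0], "hard": [0.0, 1.0]}`, `classify([0.9, 0.1], prototypes)` returns `"easy"`.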
It should be noted that, after determining the category of the first document, the terminal may also recommend first documents of different categories to different users. Correspondingly, the process may be as follows: the terminal determines the category of document that a user needs, and recommends first documents of that category to the user.
For example, the terminal may recommend exercises of different difficulty levels to different students according to their needs. A student's need can be determined from the student's current learning progress or the student's own ability, which gives the difficulty of the exercises the student requires; the terminal then selects exercises matching that difficulty and recommends them to the student, enabling personalized learning.
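The recommendation flow above can be sketched as a filter over already-classified exercises. This is a minimal sketch under loud assumptions: the patent does not define how ability maps to difficulty, so `needed_difficulty` uses invented accuracy thresholds, and both function names are hypothetical.

```python
from typing import Dict, List


def needed_difficulty(accuracy: float) -> str:
    """Hypothetical mapping from a student's recent accuracy (0..1) to a difficulty category."""
    if accuracy < 0.5:
        return "easy"
    if accuracy < 0.8:
        return "medium"
    return "hard"


def recommend(classified: Dict[str, str], category: str, limit: int = 5) -> List[str]:
    """Return up to `limit` exercise IDs whose predicted category matches the needed one."""
    matches = sorted(doc_id for doc_id, cat in classified.items() if cat == category)
    return matches[:limit]
```

Here `classified` maps exercise IDs to the categories produced in step 106, so a student with 30% recent accuracy would be offered the exercises classified as `"easy"`.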
In the embodiments of the present disclosure, multiple second documents and multiple third documents related to a first document to be classified are determined. The first predicted vector is determined from the second documents and the third documents for the first option information and second option information in the first document; the second predicted vector is determined from the first document and the second documents; and the two vectors are concatenated into a third predicted vector that indicates the category of the first document. Because the category of the first document is determined from the similarity relationship and the derivation relationship of its related documents, the subjective bias of manual classification is avoided, and labor consumption and cost are reduced.
Fig. 5 is a block diagram of a document classification apparatus according to an exemplary embodiment. The apparatus is configured to perform the steps of the document classification method described above. Referring to Fig. 5, the apparatus includes:
A first determining module 501, configured to determine a first document to be classified, where the first document includes description information and multiple pieces of option information, and the multiple pieces of option information include at least first option information corresponding to the description information;
A second determining module 502, configured to determine multiple second documents corresponding to the first option information and multiple third documents corresponding to second option information, where the second option information is the option information other than the first option information among the multiple pieces of option information;
A third determining module 503, configured to determine, according to the multiple second documents and the multiple third documents, a first predicted vector indicating the similarity relationship between the first option information and the second option information;
A fourth determining module 504, configured to determine, according to the multiple second documents and the description information of the first document, a second predicted vector indicating the derivation relationship between the description information and the first option information;
A fifth determining module 505, configured to determine a third predicted vector of the first document according to the first predicted vector and the second predicted vector;
A sixth determining module 506, configured to determine the category of the first document according to the third predicted vector.
In one possible implementation,
the second determining module 502 is further configured to: determine a first text content of the description information, a second text content of the first option information and a third text content of the second option information; determine, according to the first text content and the second text content, a first keyword corresponding to the first text content and the second text content; determine, according to the first keyword, the multiple second documents corresponding to the first keyword from a document database; determine, according to the first text content and the third text content, a second keyword corresponding to the first text content and the third text content; and determine, according to the second keyword, the multiple third documents corresponding to the second keyword from the document database.
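The patent does not specify how keywords are extracted or how the document database is queried. The sketch below assumes a deliberately simple scheme: keywords are the lowercase tokens of the combined texts minus a small stopword list, and documents are ranked through an inverted index by how many keywords they share. All names (`keywords`, `build_index`, `retrieve`, `STOPWORDS`) and the sample data are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "is", "which", "to", "in", "and", "do"}


def keywords(*texts: str) -> Set[str]:
    """Hypothetical keyword extractor: lowercase alphanumeric tokens minus stopwords."""
    tokens: Set[str] = set()
    for text in texts:
        tokens.update(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens - STOPWORDS


def build_index(database: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: keyword -> IDs of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in database.items():
        for kw in keywords(text):
            index[kw].add(doc_id)
    return index


def retrieve(index: Dict[str, Set[str]], kws: Set[str], top_k: int = 3) -> List[str]:
    """Rank documents by the number of query keywords they contain (ties by ID)."""
    hits: Dict[str, int] = defaultdict(int)
    for kw in kws:
        for doc_id in index.get(kw, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))[:top_k]
```

Calling `retrieve` once with `keywords(description, first_option)` and once with `keywords(description, second_option)` yields candidate second documents and third documents respectively.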
In another possible implementation, the third determining module 503 is further configured to: determine, according to the multiple second documents and the multiple third documents, the similarity relationship between the first option information and the second option information, and determine the first predicted vector according to the similarity relationship between the second documents and the third documents; or input the second documents and the third documents into a first prediction model to obtain the first predicted vector.
In another possible implementation, the third determining module 503 is further configured to: determine, according to multiple first words in the multiple second documents and multiple second words in the multiple third documents respectively, a first matrix formed by the multiple second documents and a second matrix formed by the multiple third documents; compute the dot product of each column of the first matrix with each column of the second matrix to obtain a first matching matrix indicating the similarity relationship between the second documents and the third documents; determine a first weight according to the first matching matrix, the first weight being the weight of the context vectors of the multiple second documents and the multiple third documents; determine, according to the first matrix and the first weight, a first context vector of each second word of the multiple third documents in the multiple second documents, and determine, according to the second matrix and the first weight, a second context vector of each first word of the multiple second documents in the multiple third documents; and compare the first context vector with the second context vector to obtain the similarity relationship between the first option information and the second option information.
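The matching-matrix procedure above resembles ESIM-style soft alignment, and can be sketched in pure Python as follows. This is an illustration under assumptions, not the patent's model: each matrix is represented as a list of word vectors (its columns), the first weight is taken to be a softmax over the matching scores, and the comparison step uses the common ESIM feature set `[x, c, x - c, x * c]`, which the patent does not state. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # list of word vectors, i.e. the columns of the matrix


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: Matrix) -> Vector:
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(vectors[0]))]


def soft_align(first: Matrix, second: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (first_context, second_context).

    first_context[j]: context of second-matrix word j within the first matrix.
    second_context[i]: context of first-matrix word i within the second matrix.
    """
    # First matching matrix: match[i][j] = a_i . b_j
    match = [[dot(a, b) for b in second] for a in first]
    first_context = []
    for j in range(len(second)):
        weights = softmax([match[i][j] for i in range(len(first))])
        first_context.append(weighted_sum(weights, first))
    second_context = []
    for i in range(len(first)):
        weights = softmax(match[i])
        second_context.append(weighted_sum(weights, second))
    return first_context, second_context


def compare(words: Matrix, contexts: Matrix) -> Matrix:
    """ESIM-style comparison features [x, c, x - c, x * c] for each word."""
    return [x + c + [p - q for p, q in zip(x, c)] + [p * q for p, q in zip(x, c)]
            for x, c in zip(words, contexts)]
```

In this sketch, `compare(first, second_context)` and `compare(second, first_context)` produce per-word features that a downstream encoder could pool into the first predicted vector.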
In another possible implementation, the fourth determining module 504 is further configured to: combine the description information of the first document and the first option information into a fourth text content; determine, according to the multiple second documents and the fourth text content, the derivation relationship between the second documents and the fourth text content; and determine the second predicted vector according to the derivation relationship between the second documents and the fourth text content; or input the second documents, the description information of the first document and the first document into a second prediction model to obtain the second predicted vector.
In another possible implementation, the fourth determining module 504 is further configured to: determine, according to multiple third words in the multiple second documents and multiple fourth words in the fourth text content respectively, a third matrix formed by the third words of each second document and a fourth matrix formed by the fourth words of the fourth text content; compute the dot product of each column of the fourth matrix with each column of each of the third matrices corresponding to the multiple second documents, to obtain a second matching matrix indicating the derivation relationship between the fourth text content and the multiple second documents; determine a second weight according to the second matching matrix, the second weight being the weight of the context vectors of the multiple second documents and the fourth text content; determine, according to the fourth text content and the second weight, a third context vector of each third word of each second document in the fourth text content; determine, according to the third context vector, each second document and the second weight, a fourth context vector of each fourth word of the fourth text content in each second document; and determine the derivation relationship between the second documents and the fourth text content according to the fourth context vector.
In another possible implementation, the fourth determining module 504 is further configured to: fuse the third context vectors into the third matrices corresponding to the multiple second documents to obtain multiple fifth matrices that combine the second documents with the fourth text content; compute the dot product of each of the multiple fifth matrices with a fifth context vector to obtain a third weight, where the third weight is the weight of the derivation relationship among the multiple fourth documents, and the fifth context vector is the context vector of the fifth word of each fourth document within that fourth document; and determine, according to the third weight, the fourth context vector of the fifth word in each fourth document.
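The multi-document attention described above can be sketched as follows. The patent leaves several details open, so this is a hypothetical reading, not the patented method: the fifth matrix is formed by adding each document word vector to its context in the fourth text content (addition is an assumed fusion operator), each fourth-text word attends within each fused document, and the third weight across documents is a softmax of the dot product between the word and its per-document context. Function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # list of word vectors


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: Matrix) -> Vector:
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(vectors[0]))]


def fuse_document(doc: Matrix, hyp: Matrix) -> Matrix:
    """Fifth matrix for one second document: each document word plus its context in `hyp`."""
    fused = []
    for d in doc:
        weights = softmax([dot(d, h) for h in hyp])   # one row of the second matching matrix
        context = weighted_sum(weights, hyp)           # third context vector of word d
        fused.append([x + y for x, y in zip(d, context)])  # fusion by addition (assumed)
    return fused


def aggregate(docs: List[Matrix], hyp: Matrix) -> Matrix:
    """For each word of the fourth text content, attend within each document, then across documents."""
    fused_docs = [fuse_document(doc, hyp) for doc in docs]
    result = []
    for h in hyp:
        per_doc = []
        for fused in fused_docs:
            weights = softmax([dot(h, row) for row in fused])
            per_doc.append(weighted_sum(weights, fused))  # context of h in this document
        doc_weights = softmax([dot(h, c) for c in per_doc])  # third weight across documents
        result.append(weighted_sum(doc_weights, per_doc))    # aggregated context vector of h
    return result
```

Here `hyp` stands for the word vectors of the fourth text content (description plus first option) and `docs` for the word vectors of the retrieved second documents; the output has one aggregated vector per word of the fourth text content.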
In another possible implementation, the fifth determining module 505 is further configured to: determine a first mean vector and a first max vector of the first predicted vector; concatenate the first mean vector and the first max vector to obtain a fourth predicted vector; determine a second mean vector and a second max vector of the second predicted vector; concatenate the second mean vector and the second max vector to obtain a fifth predicted vector; and concatenate the fifth predicted vector and the fourth predicted vector to obtain the third predicted vector.
In the embodiments of the present disclosure, the apparatus determines multiple second documents and multiple third documents related to a first document to be classified. It determines the first predicted vector from the second documents and the third documents for the first option information and second option information in the first document, determines the second predicted vector from the first document and the second documents, and concatenates the two vectors into a third predicted vector that indicates the category of the first document. Because the category is determined from the similarity relationship and the derivation relationship of the related documents, the subjective bias of manual classification is avoided, and labor consumption and cost are reduced.
It should be noted that the document classification apparatus provided in the above embodiment is illustrated only by the above division into functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the document classification apparatus provided in the above embodiment and the document classification method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Fig. 6 is a structural block diagram of a terminal 600 according to an exemplary embodiment of the present disclosure. The terminal 600 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal or desktop terminal.
Generally, the terminal 600 includes a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 601 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor 601 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may further include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 602 stores at least one instruction, which is executed by the processor 601 to implement the document classification method provided by the method embodiments of the present disclosure.
In some embodiments, the terminal 600 optionally further includes a peripheral device interface 603 and at least one peripheral device. The processor 601, the memory 602 and the peripheral device interface 603 may be connected through a bus or signal line, and each peripheral device may be connected to the peripheral device interface 603 through a bus, signal line or circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 604, a display screen 605, a camera assembly 606, an audio circuit 607, a positioning component 608 and a power supply 609.
The peripheral device interface 603 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602 and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of them may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 receives and transmits RF (Radio Frequency) signals, also called electromagnetic signals. It communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 604 can communicate with other terminals through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication)-related circuits, which is not limited in the present disclosure.
The display screen 605 displays a UI (User Interface), which may include graphics, text, icons, video and any combination thereof. When the display screen 605 is a touch display screen, it can also capture touch signals on or above its surface; these touch signals may be input to the processor 601 as control signals for processing. In this case, the display screen 605 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 605, arranged on the front panel of the terminal 600; in other embodiments, there are at least two display screens 605, arranged on different surfaces of the terminal 600 or in a folding design; in still other embodiments, the display screen 605 may be a flexible display screen arranged on a curved or folded surface of the terminal 600. The display screen 605 may even be given a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 captures images or video. Optionally, it includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused for background blurring, the main camera and the wide-angle camera can be fused for panoramic and VR (Virtual Reality) shooting, or other fused shooting functions can be realized. In some embodiments, the camera assembly 606 may further include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone captures sound waves from the user and the environment and converts them into electrical signals, which are input to the processor 601 for processing or to the radio frequency circuit 604 for voice communication. For stereo capture or noise reduction, multiple microphones may be arranged at different parts of the terminal 600. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves, and may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into inaudible sound waves for purposes such as ranging. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 locates the current geographic position of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
The power supply 609 supplies power to the components of the terminal 600. It may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal 600 further includes one or more sensors 610, including but not limited to an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615 and a proximity sensor 616.
The acceleration sensor 611 detects the magnitude of acceleration along the three axes of the coordinate system established with respect to the terminal 600. For example, it can detect the components of gravitational acceleration along the three axes. The processor 601 can control the display screen 605 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 can also be used to collect motion data for games or for the user.
The gyroscope sensor 612 detects the body orientation and rotation angle of the terminal 600 and can work with the acceleration sensor 611 to capture the user's 3D motions on the terminal 600. Based on the data collected by the gyroscope sensor 612, the processor 601 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 613 may be arranged on the side frame of the terminal 600 and/or beneath the display screen 605. When it is arranged on the side frame, it can detect the user's grip on the terminal 600, and the processor 601 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 613. When it is arranged beneath the display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operations on the display screen 605. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 614 collects the user's fingerprint, and either the processor 601 or the fingerprint sensor 614 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 614 may be arranged on the front, back or side of the terminal 600. When a physical button or manufacturer logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with it.
The optical sensor 615 collects ambient light intensity. In one embodiment, the processor 601 controls the display brightness of the display screen 605 according to the ambient light intensity collected by the optical sensor 615: when the ambient light is strong, the brightness is increased; when it is weak, the brightness is decreased. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also called a distance sensor, is generally arranged on the front panel of the terminal 600 and measures the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects that this distance is gradually decreasing, the processor 601 switches the display screen 605 from the screen-on state to the screen-off state; when it detects that the distance is gradually increasing, the processor 601 switches the display screen 605 from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Fig. 6 does not limit the terminal 600, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
An embodiment of the present disclosure further provides a computer-readable storage medium applied to a terminal. The storage medium stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the operations performed by the terminal in the document classification method of the above embodiments.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims (10)

1. A document classification method, characterized in that the method comprises:
Determining a first document to be classified, the first document comprising description information and multiple pieces of option information, the multiple pieces of option information comprising at least first option information corresponding to the description information;
Determining multiple second documents corresponding to the first option information and multiple third documents corresponding to second option information, the second option information being the option information other than the first option information among the multiple pieces of option information;
Determining, according to the multiple second documents and the multiple third documents, a first predicted vector indicating a similarity relationship between the first option information and the second option information;
Determining, according to the multiple second documents and the description information of the first document, a second predicted vector indicating a derivation relationship between the description information and the first option information;
Determining a third predicted vector of the first document according to the first predicted vector and the second predicted vector; and
Determining a category of the first document according to the third predicted vector.
2. The method according to claim 1, wherein determining the multiple second documents corresponding to the first option information and the multiple third documents corresponding to the second option information comprises:
Determining a first text content of the description information, a second text content of the first option information and a third text content of the second option information;
Determining, according to the first text content and the second text content, a first keyword corresponding to the first text content and the second text content;
Determining, according to the first keyword, the multiple second documents corresponding to the first keyword from a document database;
Determining, according to the first text content and the third text content, a second keyword corresponding to the first text content and the third text content; and
Determining, according to the second keyword, the multiple third documents corresponding to the second keyword from the document database.
3. The method according to claim 1, wherein determining, according to the multiple second documents and the multiple third documents, the first predicted vector indicating the similarity relationship between the first option information and the second option information comprises:
Determining, according to the multiple second documents and the multiple third documents, the similarity relationship between the first option information and the second option information, and determining the first predicted vector according to the similarity relationship between the second documents and the third documents; or,
Inputting the second documents and the third documents into a first prediction model to obtain the first predicted vector.
4. The method according to claim 3, wherein determining, according to the multiple second documents and the multiple third documents, the similarity relation between the first option information and the second option information comprises:
Determining, respectively according to multiple first words in the multiple second documents and multiple second words in the multiple third documents, a first matrix composed of the multiple second documents and a second matrix composed of the multiple third documents;
Computing the dot product of each column of the first matrix with each column of the second matrix to obtain a first matching matrix used to indicate the similarity relation between the second documents and the third documents;
Determining a first weight according to the first matching matrix, the first weight being the weight of the context vectors of the multiple second documents and the multiple third documents;
Determining, according to the first matrix and the first weight, a first context vector, in the multiple second documents, of each second word in the multiple third documents; and determining, according to the second matrix and the first weight, a second context vector, in the multiple third documents, of each first word in the multiple second documents;
Comparing the first context vector with the second context vector to obtain the similarity relation between the first option information and the second option information.
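Claim 4 describes a soft alignment between two word matrices: column dot products give a matching matrix, normalised weights are derived from it, and each word receives a context vector built from the other side. A minimal sketch follows, assuming each matrix is stored as a list of word vectors (one per column), that the first weight is a softmax over the matching scores, and that the final comparison is the cosine similarity of the mean-pooled context vectors. The patent does not fix these choices, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one word vector per column of the claimed matrix


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: Matrix) -> Vector:
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def mean_vector(vectors: Matrix) -> Vector:
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def align(first: Matrix, second: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """first: words of the second documents; second: words of the third documents."""
    # First matching matrix: match[i][j] = first[i] . second[j]
    match = [[dot(a, b) for b in second] for a in first]
    # First context vectors: each third-document word attends over the first matrix
    first_ctx = [weighted_sum(softmax([row[j] for row in match]), first)
                 for j in range(len(second))]
    # Second context vectors: each second-document word attends over the second matrix
    second_ctx = [weighted_sum(softmax(row), second) for row in match]
    return match, first_ctx, second_ctx


def option_similarity(first: Matrix, second: Matrix) -> float:
    """Hypothetical comparison: cosine of the mean-pooled context vectors."""
    _, first_ctx, second_ctx = align(first, second)
    return cosine(mean_vector(first_ctx), mean_vector(second_ctx))
```

Identical inputs yield a similarity of 1.0 because the two sets of context vectors then mirror each other.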
5. The method according to claim 1, wherein determining, according to the multiple second documents and the description information of the first document, the second predicted vector used to indicate the derivation relationship between the description information and the first option information comprises:
Combining the description information of the first document and the first option information into a fourth text content; determining, according to the multiple second documents and the fourth text content, the derivation relationship between the second documents and the fourth text content; and determining the second predicted vector according to the derivation relationship between the second documents and the fourth text content; or,
Inputting the second documents, the description information of the first document, and the first document into a second prediction model to obtain the second predicted vector.
6. The method according to claim 5, wherein determining, according to the multiple second documents and the fourth text content, the derivation relationship between the second documents and the fourth text content comprises:
Determining, respectively according to multiple third words in the multiple second documents and fourth words in the fourth text content, a third matrix composed of the multiple third words of each second document in the multiple second documents, and a fourth matrix composed of the multiple fourth words in the fourth text content;
Computing the dot product of each column of the fourth matrix with each column of each of the multiple third matrices corresponding to the multiple second documents, to obtain a second matching matrix used to indicate the derivation relationship between the fourth text content and the multiple second documents;
Determining a second weight according to the second matching matrix, the second weight being the weight of the context vectors of the multiple second documents and the fourth text content;
Determining, according to the fourth text content and the second weight, a third context vector, in the fourth text content, of each third word of each second document in the multiple second documents;
Determining, according to the third context vector, each second document in the multiple second documents, and the second weight, a fourth context vector, in each second document, of each fourth word in the fourth text content;
Determining, according to the fourth context vector, the derivation relationship between the second documents and the fourth text content.
7. The method according to claim 6, wherein determining, according to the third context vector, each second document in the multiple second documents, and the second weight, the fourth context vector, in each second document, of each fourth word in the fourth text content comprises:
Fusing the third context vector into the third matrices corresponding to the multiple second documents, to obtain multiple fifth matrices in which the second documents and the fourth text content are fused;
Computing the dot product of each fifth matrix in the multiple fifth matrices with a fifth context vector to obtain a third weight, the third weight being the weight of the derivation relationship between the multiple fourth documents, and the fifth context vector being the context vector, in the fourth document, of the fifth word in each fourth document;
Determining, according to the third weight, the fourth context vector of the fifth word in each fourth document.
8. The method according to claim 1, wherein determining the third predicted vector of the first document according to the first predicted vector and the second predicted vector comprises:
Determining, according to the first predicted vector, a first averaged vector and a first maximized vector of the first predicted vector;
Concatenating the first averaged vector and the first maximized vector to obtain a fourth predicted vector;
Determining, according to the second predicted vector, a second averaged vector and a second maximized vector of the second predicted vector;
Concatenating the second averaged vector and the second maximized vector to obtain a fifth predicted vector;
Concatenating the fifth predicted vector and the fourth predicted vector to obtain the third predicted vector.
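The averaged and maximized vectors of claim 8 read naturally as mean pooling and max pooling followed by concatenation. Assuming each predicted vector is a sequence of per-word vectors and that pooling runs over the sequence dimension (an assumption; the claim does not state the shapes), the step can be sketched as follows. The function names are hypothetical. As in the claim, the fifth predicted vector is placed before the fourth.

```python
from typing import List

Vector = List[float]


def mean_max_pool(sequence: List[Vector]) -> Vector:
    """Concatenate the element-wise mean and element-wise max over a sequence."""
    dim = len(sequence[0])
    averaged = [sum(v[d] for v in sequence) / len(sequence) for d in range(dim)]
    maximized = [max(v[d] for v in sequence) for d in range(dim)]
    return averaged + maximized


def third_predicted_vector(first_pred: List[Vector],
                           second_pred: List[Vector]) -> Vector:
    fourth = mean_max_pool(first_pred)   # first averaged + first maximized
    fifth = mean_max_pool(second_pred)   # second averaged + second maximized
    # Claim 8 concatenates the fifth predicted vector before the fourth
    return fifth + fourth
```

Pooling makes the output length independent of how many words each input sequence contains, so both predicted vectors can feed a fixed-size classifier.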
9. A document classification apparatus, wherein the apparatus comprises:
A first determining module, configured to determine a first document to be classified, the first document comprising description information and multiple pieces of option information, the multiple pieces of option information including first option information corresponding to the at least one piece of description information;
A second determining module, configured to determine multiple second documents corresponding to the first option information and multiple third documents corresponding to second option information, the second option information being the option information, other than the first option information, in the multiple pieces of option information;
A third determining module, configured to determine, according to the multiple second documents and the multiple third documents, a first predicted vector used to indicate the similarity relation between the first option information and the second option information;
A fourth determining module, configured to determine, according to the multiple second documents and the description information of the first document, a second predicted vector used to indicate the derivation relationship between the description information and the first option information;
A fifth determining module, configured to determine a third predicted vector of the first document according to the first predicted vector and the second predicted vector;
A sixth determining module, configured to determine the classification of the first document according to the third predicted vector.
10. A terminal, wherein the terminal comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, and the instruction, the program, the code set, or the instruction set is loaded and executed by the processor to implement the operations performed in the document classification method according to any one of claims 1 to 8.
CN201910554455.0A 2019-06-25 2019-06-25 Document classification method, device and terminal Active CN110263171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910554455.0A CN110263171B (en) 2019-06-25 2019-06-25 Document classification method, device and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910554455.0A CN110263171B (en) 2019-06-25 2019-06-25 Document classification method, device and terminal

Publications (2)

Publication Number Publication Date
CN110263171A true CN110263171A (en) 2019-09-20
CN110263171B CN110263171B (en) 2023-07-18

Family

ID=67921273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910554455.0A Active CN110263171B (en) 2019-06-25 2019-06-25 Document classification method, device and terminal

Country Status (1)

Country Link
CN (1) CN110263171B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078899A1 (en) * 2001-08-13 2003-04-24 Xerox Corporation Fuzzy text categorizer
US20140365410A1 (en) * 2013-06-05 2014-12-11 MultiModel Research, LLC Apparatus and method for building and using inference engines based on representations of data that preserve relationships between objects
WO2017101342A1 (en) * 2015-12-15 2017-06-22 乐视控股(北京)有限公司 Sentiment classification method and apparatus
CN109241284A (en) * 2018-08-27 2019-01-18 中国人民解放军国防科技大学 Document classification method and device
CN109325120A (en) * 2018-09-14 2019-02-12 江苏师范大学 A text sentiment classification method with separate user and product attention mechanisms
CN109492110A (en) * 2018-11-28 2019-03-19 南京中孚信息技术有限公司 Document Classification Method and device
CN109766423A (en) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 Neural-network-based question answering method and device, storage medium, and terminal
WO2019112223A1 (en) * 2017-12-08 2019-06-13 빈닷컴 주식회사 Electronic document retrieval method and server therefor


Also Published As

Publication number Publication date
CN110263171B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN109740068B (en) Media data recommendation method, device and storage medium
CN110121118A (en) Video clip localization method, device, computer equipment and storage medium
CN110020140A (en) Recommendation display method, apparatus and system
CN111737573A (en) Resource recommendation method, device, equipment and storage medium
CN109284445A (en) Recommendation method and device for network resources, server, and storage medium
CN109918669A (en) Entity determination method, apparatus and storage medium
WO2022057435A1 (en) Search-based question answering method, and storage medium
CN110263213A (en) Video pushing method, device, computer equipment and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN111611490A (en) Resource searching method, device, equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN111581958A (en) Conversation state determining method and device, computer equipment and storage medium
CN110096525A (en) Method, apparatus, device and storage medium for calibrating point-of-interest information
CN108320756A (en) Method and apparatus for detecting whether audio is pure music audio
CN108922531A (en) Slot recognition method and device, electronic equipment and storage medium
CN110020880A (en) Advertisement placement method, device and equipment
CN110942046A (en) Image retrieval method, device, equipment and storage medium
CN110555102A (en) media title recognition method, device and storage medium
CN107656794A (en) Interface display method and device
CN110244999A (en) Method, apparatus, device and storage medium for controlling the running of a target application
CN113269612A (en) Article recommendation method and device, electronic equipment and storage medium
CN109992685A (en) Image retrieval method and device
CN110166275A (en) Information processing method, device and storage medium
CN108717849A (en) Method, apparatus and storage medium for splicing multimedia data
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant