CN110008312A

CN110008312A - Document writing assistant implementation method, system and electronic device

Info

Publication number: CN110008312A
Application number: CN201910284378.1A
Authority: CN
Inventors: 许林
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2019-04-10
Filing date: 2019-04-10
Publication date: 2019-07-12

Abstract

The invention discloses a document writing assistant implementation method, system and electronic device. The method comprises: in a document editing interface, inputting a search item that the information to be searched should contain, the search item comprising at least a keyword, a word or a sentence; converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, wherein each sentence vector is stored in its own independent data unit of the database, and the data unit comprises at least the sentence text, the sentence vector, the sentence source and the reference information carried by the sentence; and, in the document editing interface, returning the sentence text, sentence vector, sentence source and reference information in the corresponding data unit for the editor to choose from. Using a word vector model, the invention converts both sentences and words into real-valued vectors for storage and matching. Compared with prior-art matching by dictionary or regular expression, the search results are more accurate.

Description

Document writing assistant implementation method, system and electronic device

Technical field

The present invention relates to document editing methods in the field of natural language processing, and specifically to a document writing assistant implementation method, system and electronic device.

Background

When writing papers and professional technical documents, we often do not know which words or sentences to use for an accurate description. Especially when writing papers in English, the language barrier keeps us from expressing what we really mean. At present there is no effective technical solution that offers prompts during writing. For example, the grammar checker built into Microsoft Office can perform a certain amount of grammar checking, but it mainly stores common words and, after segmenting a sentence into words, checks whether each word in the sentence can be found in its dictionary.

Similar sentences can only be found through keyword search in Baidu Scholar or Google Scholar. These search sites retrieve by regular-expression matching: a search for "mobile phone", for example, returns only documents that contain the words "mobile phone", and a document that says "mobile terminal" instead is not retrieved. Moreover, what a web search returns is the URL of the whole document and a brief abstract, so the user has to click through to the site to see the detailed result. In summary, the prior art can only detect wrongly written words and offers little help with how a sentence is organized, and search sites cannot return detailed results directly.

Summary of the invention

In view of this, to overcome the above deficiencies, a document writing assistant implementation method and system are proposed that effectively solve the technical problems mentioned in the background. During document writing, similar sentence expressions are retrieved intelligently and offered to the writer for reference, helping the writer complete the document faster and more accurately.

A document writing assistant implementation method, characterized by comprising:

S1: in a document editing interface, inputting a search item that the information to be searched should contain, the search item comprising at least a keyword, a word or a sentence;

S2: converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, wherein each sentence vector is stored in its own independent data unit of the database, and the data unit comprises at least the sentence text, the sentence vector, the sentence source and the reference information carried by the sentence;

the search result comprises at least the sentence matching the word vector and its auxiliary information, the auxiliary information comprising at least the sentence source and the reference information carried by the sentence;

S3: in the document editing interface, returning the sentence text, sentence vector, sentence source and reference information carried by the sentence in the corresponding data unit for the editor to choose from.

Optionally, in one embodiment, the database in S2 is established as follows: documents are searched for and collected from network databases in advance, and text information is extracted from them; extracting the text information comprises extracting the abstract, body text and reference information of each document and then splitting the abstract or body text into sentences one by one; using a pre-trained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; and the real-valued vector corresponding to the current sentence, that is, its sentence vector representation, is obtained based on the tagged parts of speech.
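The sentence-splitting step of the database construction can be sketched as follows. This is only a minimal illustration, assuming that sentences end at a period, question mark or exclamation mark (Western or full-width Chinese forms); the function name `split_sentences` is hypothetical, and abbreviations or decimal numbers such as "3.14" would be split incorrectly by this simple rule.

```python
import re
from typing import List

# A sentence is a run of non-terminator characters followed by any
# terminators (period, question mark, exclamation mark; Western or Chinese).
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at sentence-ending punctuation."""
    pieces = _SENTENCE.findall(text)
    # Drop surrounding whitespace and discard pieces that are empty.
    return [p.strip() for p in pieces if p.strip()]
```

For example, `split_sentences("Hello world. How are you?")` yields one list entry per sentence, each keeping its closing punctuation.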

Optionally, in one embodiment, obtaining the sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence.

Optionally, in one embodiment, performing the weighted summation over the words using the tagged parts of speech to obtain the sentence vector representation comprises:

performing a weighted summation over the words based on the tagged parts of speech; the weighted summation formula is [formula not reproduced in the source text],

where s denotes the sentence vector, N the number of words in the sentence, v a word vector and α the corresponding weight;

the weight α is calculated as: [formula not reproduced in the source text], where f is the word frequency of the word, that is, the number of times the word occurs in the sentence.
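The weighted summation can be illustrated with a short sketch. Because the formulas for s and α are not reproduced in this text, everything specific here is an assumption: the sum is taken as a plain weighted sum of the word vectors without normalization, and `hypothetical_weight` is a placeholder that merely favours content words and damps words repeated within the sentence. It is not the weight formula of the invention.

```python
from typing import List, Sequence


def hypothetical_weight(is_content_word: bool, frequency: int) -> float:
    """Placeholder weight (assumption): content words count more,
    and a word repeated within the sentence counts less."""
    base = 1.0 if is_content_word else 0.2
    return base / max(frequency, 1)


def sentence_vector(word_vectors: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Return the weighted sum of the word vectors, one weight per word."""
    if len(word_vectors) != len(weights):
        raise ValueError("need exactly one weight per word vector")
    if not word_vectors:
        return []
    dim = len(word_vectors[0])
    s = [0.0] * dim
    for v, alpha in zip(word_vectors, weights):
        for k in range(dim):
            s[k] += alpha * v[k]
    return s
```

Keeping the weight function separate from the summation means the placeholder can be swapped for the actual weight formula without touching the rest.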

Optionally, in one embodiment, searching the pre-established database for sentence vectors that match the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets the similarity evaluation criterion; if it does, the sentence vector is confirmed as a match.

Optionally, in one embodiment, judging whether a sentence vector found meets the similarity evaluation criterion and, if so, confirming it as a match comprises: computing the vector inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that satisfy the similarity evaluation value, and retrieving all the information corresponding to those sentence vectors.

Optionally, in one embodiment, selecting the sentence vectors that satisfy the similarity evaluation value and retrieving their information comprises: if the vector inner product of the sentence vector currently found and the word vector corresponding to the search item is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, once the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search item, and selecting several sentence vectors.
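The threshold-and-sort procedure just described can be sketched in a few lines. The function names and the in-memory list standing in for the database are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))


def match_sentences(query_vec: Sequence[float],
                    sentence_vecs: Sequence[Sequence[float]],
                    threshold: float,
                    top_k: int) -> List[Tuple[float, int]]:
    """Return (score, index) pairs above the threshold, best first."""
    temporary = []  # plays the role of the temporary array
    for index, vec in enumerate(sentence_vecs):
        score = inner_product(query_vec, vec)
        if score > threshold:
            temporary.append((score, index))
    # Sort in descending order of inner product, then keep the first top_k.
    temporary.sort(key=lambda pair: pair[0], reverse=True)
    return temporary[:top_k]
```

The returned indices identify the data units whose full information (text, source, references) would then be fetched.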

A document writing assistant implementation system, characterized by comprising:

a receiving module, for receiving the content information input in the document editing interface;

an information input module, for inputting, in the document editing interface, a search item that the information to be searched should contain, the search item comprising at least a keyword, a word or a sentence;

an information search module, for converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, wherein each sentence vector is stored in its own independent data unit of the database, and the data unit comprises at least the sentence text, the sentence vector, the sentence source and the reference information carried by the sentence;

an information feedback module, for returning, in the document editing interface, the sentence text, sentence vector, sentence source and reference information carried by the sentence in the corresponding data unit for the editor to choose from.

Optionally, in one embodiment, the database in the information search module is established as follows: documents are searched for and collected from network databases in advance, and text information is extracted from them; extracting the text information comprises extracting the abstract, body text and reference information of each document and then splitting the abstract or body text into sentences one by one; using a pre-trained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; the real-valued vector corresponding to the current sentence, that is, its sentence vector representation, is obtained based on the tagged parts of speech; and obtaining the sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence.

Optionally, in one embodiment, searching the pre-established database for sentence vectors that match the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets the similarity evaluation criterion; if it does, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the similarity evaluation criterion comprises: computing the vector inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that satisfy the similarity evaluation value, and retrieving all the information corresponding to those sentence vectors. This selection comprises: if the vector inner product of the sentence vector currently found and the word vector corresponding to the search item is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, once the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search item, and selecting several sentence vectors.

An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor being configured to execute the implementation method described above.

Compared with the prior art, the beneficial effects of the present invention are as follows:

Using a word vector model, the present invention converts both sentences and words into real-valued vectors for storage and matching. Compared with prior-art matching by dictionary or regular expression, the search results are more accurate. In addition, general information is stored directly, so the user obtains the information actually wanted directly after retrieval. The present invention can therefore supply the necessary reference information for document writing and reduce the user's search time, thereby speeding up writing.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

In the drawings:

Fig. 1 is a flow diagram of a document writing assistant implementation method;

Fig. 2 is a structural block diagram of a document writing assistant implementation system;

Fig. 3 is the core flow chart of the intelligent server in an embodiment of the present invention;

Fig. 4 is the core flow chart of the smart client in an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it.

Unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by those skilled in the technical field of the present invention. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the present invention. The terms "first", "second" and the like used in the present invention may describe various elements herein, but the elements are not limited by these terms, which serve only to distinguish one element from another. For example, without departing from the scope of the present application, a first element could be called a second element and, similarly, a second element could be called a first element. The first element and the second element are both elements, but they are not the same element.

To solve the technical problems of the conventional technology, this embodiment proposes a document writing assistant implementation method that, during document writing, intelligently retrieves similar sentence expressions and offers them to the writer for reference, helping the writer complete the document faster and more accurately. Fig. 1 is a flow diagram of the document writing assistant implementation method, which comprises the following steps.

S1: in a document editing interface, a search item that the information to be searched should contain is input; the search item comprises at least a keyword, a word or a sentence (a short sentence).

S2: the search item is converted into a word vector, and a pre-established database is searched for sentence vectors that match the word vector. Each sentence vector is stored in its own independent data unit of the database, and the data unit comprises at least the sentence text, the sentence vector, the sentence source and the reference information carried by the sentence. Specifically, each sentence occupies one data unit, that is, one row, of the database. The row contains several columns: the first column is the sentence text, the second column is the sentence vector, and the following columns may hold information such as the sentence source and references.
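The one-row-per-sentence layout can be sketched with an in-memory SQLite table. The table name, column names and the choice to store the vector and references as JSON text are all assumptions made for this illustration; the text does not name a particular database product or schema.

```python
import json
import sqlite3
from typing import Dict, List, Optional


def create_store() -> sqlite3.Connection:
    """Create an in-memory table with one row (data unit) per sentence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentence_unit ("
        " id INTEGER PRIMARY KEY,"
        " sentence_text TEXT NOT NULL,"
        " sentence_vector TEXT NOT NULL,"  # JSON-encoded list of floats
        " source TEXT,"
        " reference_info TEXT)"            # JSON-encoded list of citations
    )
    return conn


def add_unit(conn: sqlite3.Connection, text: str, vector: List[float],
             source: str, references: List[str]) -> int:
    """Insert one sentence with its vector, source and references."""
    cur = conn.execute(
        "INSERT INTO sentence_unit"
        " (sentence_text, sentence_vector, source, reference_info)"
        " VALUES (?, ?, ?, ?)",
        (text, json.dumps(vector), source, json.dumps(references)),
    )
    conn.commit()
    return cur.lastrowid


def get_unit(conn: sqlite3.Connection, unit_id: int) -> Optional[Dict[str, object]]:
    """Fetch every field of one data unit, or None if it does not exist."""
    row = conn.execute(
        "SELECT sentence_text, sentence_vector, source, reference_info"
        " FROM sentence_unit WHERE id = ?",
        (unit_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "sentence_text": row[0],
        "sentence_vector": json.loads(row[1]),
        "source": row[2],
        "references": json.loads(row[3]),
    }
```

Because each row is self-contained, a matched sentence vector leads directly to the text, source and references that S3 returns to the editor.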

In some specific embodiments, the database in S2 is established as follows. Documents are searched for and collected from network databases in advance, and text information is extracted from them. Extracting the text information comprises extracting the abstract, body text and reference information of each document and then splitting the abstract or body text into sentences one by one; the splitting uses punctuation marks such as the period, question mark and exclamation mark to divide the abstract and body text into sentences.

Using a pre-trained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed by a Bi-LSTM model and the CRF algorithm. Specifically, because the words of an English sentence are naturally separated by spaces, no segmentation is needed; in a Chinese sentence there is no separator between words, so a segmentation operation must first divide the sentence into words, where a word may consist of one character or of several. For an English document, the pre-trained word vector model first represents all words as word vectors, and the Bi-LSTM model and the CRF algorithm then tag the part of speech of each word. For a Chinese document, the pre-trained word vector model first represents all words as word vectors, and the Bi-LSTM model and the CRF algorithm then perform segmentation and part-of-speech tagging on the sentence.

The real-valued vector corresponding to the current sentence, that is, its sentence vector representation, is obtained based on the tagged parts of speech. Obtaining this sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence. This sentence vector is a high-dimensional real-valued vector; specifically, in this embodiment a sentence vector is expressed as a 256-dimensional real-valued vector. In some specific embodiments, performing the weighted summation over the words using the tagged parts of speech to obtain the sentence vector representation comprises:

The weight α is calculated as: [formula not reproduced in the source text], where f is the word frequency of the word, that is, the number of times the word occurs in the sentence.

In some specific embodiments, searching the pre-established database for sentence vectors that match the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets the similarity evaluation criterion; if it does, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the similarity evaluation criterion comprises: computing the vector inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that satisfy the similarity evaluation value, and retrieving all the information corresponding to those sentence vectors. This selection comprises: if the vector inner product (the sum of the products of corresponding elements) of the sentence vector currently found and the word vector corresponding to the search item is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, once the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search item, and selecting several sentence vectors.

In some specific embodiments, the similarity evaluation value may also be obtained with one or more common distance measures such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance.
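Three of the alternative measures named above can be sketched briefly. This is an illustrative sketch using their standard definitions; the function names are hypothetical, and the Jaccard coefficient is shown on sets of tokens because it is defined on sets rather than on real-valued vectors. Note that Euclidean and Manhattan values are distances, so smaller means more similar, the opposite of the inner product.

```python
import math
from typing import Sequence, Set


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)
```

When a distance replaces the inner product, the comparison against the similarity evaluation value and the sort order have to be reversed accordingly.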

S3: in the document editing interface, the sentence text, sentence vector, sentence source and reference information carried by the sentence in the corresponding data unit are returned for the editor to choose from.

Based on the above principles, a document writing assistant implementation system is also provided. As shown in Fig. 2, it comprises:

an information search module, for converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, wherein each sentence vector is stored in its own independent data unit of the database, and the data unit comprises at least the sentence text, the sentence vector, the sentence source and the reference information carried by the sentence. In one embodiment, the database in the information search module is established as follows: documents are searched for and collected from network databases in advance, and text information is extracted from them; extracting the text information comprises extracting the abstract, body text and reference information of each document and then splitting the abstract or body text into sentences one by one; using a pre-trained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; the real-valued vector corresponding to the current sentence, that is, its sentence vector representation, is obtained based on the tagged parts of speech; and obtaining the sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence.

Performing the weighted summation over the words using the tagged parts of speech to obtain the sentence vector representation comprises:

The weight α is calculated as: [formula not reproduced in the source text], where f is the word frequency of the word, that is, the number of times the word occurs in the sentence. Finally, the text corresponding to every sentence vector, the source of the text and the references involved in the text are stored in the database.

Searching the pre-established database for sentence vectors that match the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets the similarity evaluation criterion; if it does, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the similarity evaluation criterion comprises: computing the vector inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that satisfy the similarity evaluation value, and retrieving all the information corresponding to those sentence vectors. This selection comprises: if the vector inner product of the sentence vector currently found and the word vector corresponding to the search item is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, once the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search item, and selecting several sentence vectors.

an information feedback module, for returning, in the document editing interface, the sentence text, sentence vector, sentence source and reference information carried by the sentence in the corresponding data unit for the editor to choose from.

Based on the above, the scheme is illustrated below with specific examples:

Embodiment 1: paper writing

The information search module is arranged on the intelligent server, as shown in Fig. 3. In the information search module, all papers of one or more fields are downloaded in advance; for example, the full texts of papers published in IEEE journals in the electronics field are downloaded, and the abstract, body text and references of each paper are extracted. The abstract and body text are cut into sentences at punctuation marks such as the period, question mark and exclamation mark. For English papers, the information search module first obtains the word vector of each word with an existing, publicly available pre-trained word vector model; in this embodiment, Google's BERT is used to obtain the word vectors. Then a Bi-LSTM model and the CRF algorithm (GMM-CRF, CNN or RNN algorithms may also be used) tag the part of speech of each word: nouns and verbs, for example, are marked as content words, while auxiliary words and pronouns, for example, are marked as function words. A weighted summation then yields the high-dimensional real-valued sentence vector, turning the sentence into a real-valued vector; in this embodiment each sentence is converted into a 256-dimensional real-valued vector. Alternatively, instead of weighted summation, the sentence vector may be obtained with other existing published techniques such as a statistics-based bag-of-words (BoW) model, an RNN or a CNN; this example places no specific limitation on the choice.
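The grouping of part-of-speech tags into content words and function words can be sketched as a simple lookup. The tag names below follow the Universal POS tag set purely as an assumption, and the mapping covers only the four categories the text names; tags the text does not mention are reported as unspecified rather than guessed.

```python
from typing import Dict

# Hypothetical mapping (assumption): only the categories named in the text.
# Nouns and verbs are content words; auxiliary words and pronouns are
# function words.
WORD_CLASS_BY_TAG: Dict[str, str] = {
    "NOUN": "content",
    "VERB": "content",
    "AUX": "function",
    "PRON": "function",
}


def word_class(pos_tag: str) -> str:
    """Map a part-of-speech tag to 'content', 'function' or 'unspecified'."""
    return WORD_CLASS_BY_TAG.get(pos_tag.upper(), "unspecified")
```

The resulting class is what a part-of-speech-dependent weight would consume when the word vectors are summed into a sentence vector.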

Finally, all sentences are converted into real-valued vectors and, sentence by sentence, all the information of each sentence is stored in one data unit of the database. The data unit has the following form:

Sentence text | Sentence vector | Sentence source | Sentence reference 1 | Sentence reference 2 | Sentence reference 3

Here, the sentence source indicates where the sentence was taken from and is given in citation form. Many sentences in a paper also cite other references, so if the sentence carries citations, the corresponding references are listed. In this embodiment, for example, a single sentence is assumed to cite at most three other documents, hence the fields reference 1, reference 2 and reference 3. All citations are provided in three formats: GB/T 7714, MLA and APA.

Smart client (provided with the receiving module, the information input module and the information feedback module), as shown in Fig. 4: when writing a paper, the user need only enter a few keywords through the information input module for an expression he or she is unsure of. The smart client sends the keywords over the network to the intelligent server, where the information search module converts the keywords into a word vector and then retrieves similar sentences from the database. Specifically, similarity is judged by the ratio of the inner product of the two vectors (the sum of the products of corresponding elements) to the product of the two vector norms. Optionally, one or more common distance measures such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance may be used instead. When the vector inner product is used to judge similarity, for example, 1 indicates the closest match and 0 the least close. The vector of the query is first compared in turn with each sentence vector in the database by computing the inner product; a sentence vector scoring below 0.6 is discarded, and one scoring above 0.6 is stored in a temporary array. Finally the array is sorted in descending order of inner product, the top three to five sentences are taken as the most similar sentences, and all information of these similar sentences is returned to the client. If the array of scores above 0.6 is empty, an empty result is returned, indicating that there is no similar sentence. The information feedback module of the smart client displays the returned result to the user, who can draw on the expressions to write the corresponding sentence and can also copy the references.
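The server-side retrieval of this embodiment, with the inner product normalized by the two vector norms, the 0.6 cut-off and the top three to five results, can be sketched as follows. The record layout (a dict with a "vector" key) and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve(query_vec: Sequence[float],
             units: List[Dict[str, object]],
             threshold: float = 0.6,
             top_k: int = 5) -> List[Dict[str, object]]:
    """Return up to top_k data units scoring above the threshold, best first.

    An empty list means that no similar sentence was found.
    """
    kept = []
    for unit in units:
        score = cosine_similarity(query_vec, unit["vector"])
        if score > threshold:
            kept.append((score, unit))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [unit for _, unit in kept[:top_k]]
```

Returning whole data units rather than bare scores matches the description above, in which all information of the similar sentences is sent back to the client.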

Embodiment 2: patent drafting

The information search module is arranged on the intelligent server. In the information search module, all granted patents of a given field are downloaded; for example, after the granted patents of the electronics field are downloaded, the abstract, claims and description of each are extracted. The abstract and description are cut into sentences at punctuation marks such as the period, question mark and exclamation mark. The claims are divided with one claim as the unit.
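Dividing the claims with one claim as the unit can be sketched with a regular expression. This sketch assumes that every claim begins on its own line with a number followed by a period, as in "1. ..." and "2. ..."; the function name is hypothetical, and any text before the first numbered claim is dropped.

```python
import re
from typing import List

# A claim starts at the beginning of a line with digits, a period and a space.
_CLAIM_START = re.compile(r"^\s*\d+\.\s+", re.MULTILINE)


def split_claims(claims_text: str) -> List[str]:
    """Split a claims section into one string per numbered claim."""
    starts = [m.start() for m in _CLAIM_START.finditer(claims_text)]
    if not starts:
        stripped = claims_text.strip()
        return [stripped] if stripped else []
    # Each claim runs from its own start to the start of the next claim.
    bounds = starts + [len(claims_text)]
    return [claims_text[b:e].strip() for b, e in zip(bounds, bounds[1:])]
```

Keeping each claim whole, instead of cutting it at every period, preserves the full scope of a claim when it is later returned as a similar passage.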

All sentences are converted into real-valued vectors by the same method as in Embodiment 1 and then stored in the database in the following form:

Sentence text | Sentence vector | Sentence source

Here, the sentence source indicates where the sentence was taken from and is given as a patent number.

Smart client (provided with the receiving module, the information input module and the information feedback module): when drafting a patent, the user need only enter a few keywords for an expression he or she is unsure of. Through the information input module, the client sends the keywords over the network to the intelligent server, where the information search module converts the keywords into a word vector and then retrieves similar sentences from the database. Specifically, similarity is judged by the ratio of the inner product of the two vectors (the sum of the products of corresponding elements) to the product of the two vector norms. Optionally, one or more common distance measures such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance may be used instead. The three to five most similar sentences are then selected, and all information of these similar sentences is returned to the client. The information feedback module of the client displays the returned result to the user, who can draw on the expressions to write the corresponding sentence and can at the same time avoid, as far as possible, overlap or conflict with the claims of existing granted patents. In summary, the present invention assists writing through the semantic similarity of sentences and constructs sentence vectors by weighting according to part of speech; the sentence vector, its citations and its source are stored together.

Implementing the embodiments of the present invention yields the following beneficial effects:

The embodiments above express only several implementations of the present application. Their description is relatively specific and detailed, but it should not therefore be understood as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. a kind of document writing assistant implementation method characterized by comprising

S1, in documents editing interface, input the information to be searched in should include search terms, described search item is at least Including keyword or word or sentence；

S2, described search item be converted into after term vector from the database pre-established search and the matched sentence of term vector to Amount, each sentence vector are arranged in an independent data cell of database, which includes at least sentence The included reference information of sub- text information, sentence vector, sentence source, sentence；

S3: in the document editing interface, returning the sentence text information, the sentence vector, the sentence source and the citation information carried by the sentence in the corresponding data unit, for the editor to select.
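By way of illustration only, the independent data unit recited above could be modelled as a small record type. The sketch below is an assumption made for readability, not the storage format of the invention: the class name `SentenceRecord`, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SentenceRecord:
    """One independent data unit: a sentence plus everything returned with it.

    All names here are hypothetical; the claims only list the four kinds of
    information a data unit must hold.
    """
    text: str                 # sentence text information
    vector: List[float]       # sentence vector (real-valued)
    source: str               # where the sentence comes from
    citations: List[str] = field(default_factory=list)  # citation information


# Hypothetical sample unit; the values are made up for the example.
record = SentenceRecord(
    text="Word vectors map words to real-valued vectors.",
    vector=[0.12, -0.40, 0.33],
    source="hypothetical-paper-001, abstract",
    citations=["[3]"],
)
```

Keeping the text, vector, source and citations in one unit means that a single vector match is enough to return everything the editor needs for the selected sentence.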

2. The method according to claim 1, characterized in that the process of establishing the database in S2 comprises: searching for and collecting documents from network databases in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, the body text and the citation information from each document, and then segmenting the abstract or the body text into sentences one by one; using a pre-trained word vector model, representing all words in each segmented sentence as word vectors, and then tagging each word with its part of speech; and obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. the sentence vector representation.
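The sentence segmentation step of the database construction can be sketched, under simplifying assumptions, with a regular expression that cuts the text at sentence-final punctuation. The function name `split_sentences` is hypothetical, and the rule is deliberately naive: abbreviations such as "e.g." would be split incorrectly, so a practical system would likely rely on a dedicated segmenter.

```python
import re
from typing import List

# A run of non-terminal characters followed by at most one sentence-final
# mark (Western or Chinese). Hypothetical, simplified rule.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def split_sentences(text: str) -> List[str]:
    """Split an abstract or body text into sentences, dropping empty pieces."""
    return [piece.strip() for piece in _SENTENCE.findall(text) if piece.strip()]


sentences = split_sentences(
    "Word vectors are useful. They capture meaning! Do they?"
)
```

Each resulting sentence would then be passed to the word vector model and the part-of-speech tagger described in the claim.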

3. The method according to claim 2, characterized in that obtaining the sentence vector representation comprises performing a weighted summation over the words, based on the tagged parts of speech, to obtain the sentence vector corresponding to the current sentence.

4. The method according to claim 3, characterized in that performing the weighted summation over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence comprises:

The weight α is calculated as follows: where F is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
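The part-of-speech weighted summation can be sketched as follows. This is a minimal illustration under stated assumptions: the weight table `POS_WEIGHTS`, the fallback `DEFAULT_WEIGHT` and the function name `sentence_vector` are hypothetical, and the fixed weights below merely stand in for the weight α, whose actual calculation (which involves the word frequency) is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical part-of-speech weights, chosen only for illustration.
POS_WEIGHTS: Dict[str, float] = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.1  # hypothetical weight for any other part of speech


def sentence_vector(tagged_words: List[Tuple[List[float], str]]) -> List[float]:
    """Weighted sum of word vectors; each item is (word_vector, pos_tag)."""
    if not tagged_words:
        return []
    dim = len(tagged_words[0][0])
    total = [0.0] * dim
    for vec, pos in tagged_words:
        alpha = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)  # stand-in for the weight α
        for i in range(dim):
            total[i] += alpha * vec[i]
    return total


# Three two-dimensional toy word vectors with their tags.
tagged = [
    ([1.0, 0.0], "NOUN"),
    ([0.0, 1.0], "VERB"),
    ([1.0, 1.0], "DET"),
]
vec = sentence_vector(tagged)
```

Because the result has the same dimension as the word vectors, a sentence vector can be compared directly with the word vector of a search term.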

5. The method according to claim 1, characterized in that searching the pre-established database for sentence vectors matching the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term, judging whether each retrieved sentence vector meets a similarity evaluation criterion, and, if so, confirming that the sentence vector matches.

6. The method according to claim 5, characterized in that judging whether a retrieved sentence vector meets the similarity evaluation criterion and, if so, confirming that the sentence vector matches comprises: obtaining the vector inner product of the retrieved sentence vector and the word vector corresponding to the search term, and, after selecting the sentence vectors that meet the similarity evaluation value, selecting all information corresponding to those sentence vectors.

7. The method according to claim 6, characterized in that selecting the sentence vectors that meet the similarity evaluation value and then all information corresponding to those sentence vectors comprises: if the vector inner product of the currently retrieved sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term, and selecting a plurality of sentence vectors.
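The threshold, temporary array and descending sort recited above can be sketched as follows. The function names `inner_product` and `select_matches`, the parameter `top_k` and the sample data are assumptions made for illustration; an in-memory list stands in for the temporary array, which the claim places in the database.

```python
from typing import List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))


def select_matches(
    query: List[float],
    candidates: List[Tuple[str, List[float]]],
    threshold: float,
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Keep candidates whose inner product exceeds the threshold, best first."""
    temp: List[Tuple[float, str]] = []  # stands in for the temporary array
    for text, vec in candidates:
        score = inner_product(query, vec)
        if score > threshold:
            temp.append((score, text))
    # Sort from the largest inner product to the smallest, then cut to top_k.
    temp.sort(key=lambda item: item[0], reverse=True)
    return temp[:top_k]


# Hypothetical two-dimensional sample data.
matches = select_matches(
    query=[1.0, 0.0],
    candidates=[("a", [0.9, 0.1]), ("b", [0.2, 0.9]), ("c", [0.5, 0.5])],
    threshold=0.4,
)
```

Setting `top_k` between three and five would correspond to the number of similar sentences returned to the client in the description.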

8. A system for implementing a document writing assistant, characterized by comprising:

an information input module, configured to input, in a document editing interface, a search term that the information to be searched for should contain, the search term including at least a keyword, a word or a sentence;

an information search module, configured to convert the search term into a word vector and then search a pre-established database for sentence vectors matching the word vector, wherein each sentence vector is placed in an independent data unit of the database, and the data unit includes at least the sentence text information, the sentence vector, the sentence source and the citation information carried by the sentence; the process of establishing the database in the information search module comprises: searching for and collecting documents from network databases in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, the body text and the citation information from each document, and then segmenting the abstract or the body text into sentences one by one; using a pre-trained word vector model, representing all words in each segmented sentence as word vectors, and then performing word segmentation and part-of-speech tagging on each word; and obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. the sentence vector representation; obtaining the sentence vector representation comprises performing a weighted summation over the words, based on the tagged parts of speech, to obtain the sentence vector corresponding to the current sentence;

an information feedback module, configured to return, in the document editing interface, the sentence text information, the sentence vector, the sentence source and the citation information carried by the sentence in the corresponding data unit, for the editor to select.

9. The system according to claim 8, characterized in that searching the pre-established database for sentence vectors matching the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term, judging whether each retrieved sentence vector meets a similarity evaluation criterion, and, if so, confirming that the sentence vector matches; judging whether a retrieved sentence vector meets the similarity evaluation criterion and, if so, confirming that the sentence vector matches comprises: obtaining the vector inner product of the retrieved sentence vector and the word vector corresponding to the search term, and, after selecting the sentence vectors that meet the similarity evaluation value, selecting all information corresponding to those sentence vectors; selecting the sentence vectors that meet the similarity evaluation value and then all information corresponding to those sentence vectors comprises: if the vector inner product of the currently retrieved sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term, and selecting a plurality of sentence vectors.

10. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to execute the implementation method according to claims 1-7 above.