CN110008312A - Method, system, and electronic device for implementing a document writing assistant - Google Patents

Method, system, and electronic device for implementing a document writing assistant

Info

Publication number
CN110008312A
Authority
CN
China
Prior art keywords
sentence
vector
word
information
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910284378.1A
Other languages
Chinese (zh)
Inventor
许林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201910284378.1A
Publication of CN110008312A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method, a system, and an electronic device for implementing a document writing assistant. The method comprises: in a document editing interface, inputting a search item that the information to be searched should contain, the search item including at least a keyword, a word, or a sentence; converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, where each sentence vector is stored in its own data unit of the database and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence; and, in the document editing interface, returning the sentence text, sentence vector, sentence source, and citation information of each corresponding data unit for the editor to choose from. The invention uses a word vector model to convert both sentences and words into real-valued vectors for storage and matching. Compared with prior-art matching by dictionary or regular expression, the search results are more accurate.

Description

Method, system, and electronic device for implementing a document writing assistant
Technical field
The present invention relates to document editing methods in the field of natural language processing, and specifically to a method, a system, and an electronic device for implementing a document writing assistant.
Background
When writing papers and professional technical documents, authors often do not know which words or sentences to use for an accurate description. This is especially true when writing papers in English, where the language barrier prevents authors from expressing what they actually mean. At present there is no effective technical solution that offers suggestions during writing. For example, the grammar checker built into Microsoft Office can perform some grammar checking, but it mainly stores common words, segments the sentence into words, and checks whether each word in the sentence can be found in its dictionary.
To find similar sentences, authors can only run keyword searches on Baidu Scholar or Google Scholar. These search sites retrieve by regular-expression matching: a search for "mobile phone" returns only documents that contain the words "mobile phone", and a document that says "mobile terminal" instead is not retrieved. In addition, a web search returns only the URL of the whole document and a brief abstract, so the user must click through to the site to see detailed results. In summary, the built-in checkers of the prior art can only detect misspelled words and offer little help with composing sentences, and search sites cannot return detailed results directly.
Summary of the invention
In view of the above deficiencies, a method and a system for implementing a document writing assistant are proposed to solve the technical problems described in the background. During document writing, the invention intelligently retrieves similar sentence expressions and provides them to the author for reference, helping the author complete the document faster and more accurately.
A method for implementing a document writing assistant, characterized by comprising:
S1, in a document editing interface, inputting a search item that the information to be searched should contain, the search item including at least a keyword, a word, or a sentence;
S2, converting the search item into a word vector and searching a pre-established database for sentence vectors that match the word vector, where each sentence vector is stored in its own data unit of the database, and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence;
the search result includes at least the sentence matching the word vector and its accompanying information, the accompanying information including at least the sentence source and the citation information carried by the sentence;
S3, in the document editing interface, returning the sentence text, sentence vector, sentence source, and citation information in each corresponding data unit for the editor to choose from.
Optionally, in one embodiment, the database in S2 is built as follows: documents are searched for and collected from online databases in advance, and text information is extracted from them; extracting the text information includes extracting the abstract, body text, and citation information of each document and then splitting the abstract or body text into sentences one by one; a pre-trained word vector model is used to represent all words in each sentence as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; and a real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained based on the tagged parts of speech.
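The sentence-splitting step of the database build can be illustrated with a minimal Python sketch. It assumes that sentence boundaries are the period, question mark, and exclamation mark named in the detailed description, in both Western and full-width Chinese forms; the function name `split_sentences` is hypothetical, and the sketch makes no attempt to handle abbreviations or decimal numbers.

```python
import re

# Boundary marks: period, question mark, exclamation mark (Western and
# full-width Chinese forms). ASSUMPTION: no handling of "e.g." or "3.14".
_SENTENCE_END = re.compile(r"(?<=[.?!。？！])\s*")


def split_sentences(text: str) -> list[str]:
    """Split text right after each boundary mark and drop empty pieces."""
    pieces = _SENTENCE_END.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

Each returned sentence would then become one candidate row in the database.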
Optionally, in one embodiment, the sentence vector representation is obtained by computing a weighted sum over the words, based on the tagged parts of speech, to produce the sentence vector of the current sentence.
Optionally, in one embodiment, computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence includes:
computing a weighted sum over the words based on the tagged parts of speech; the weighted sum formula is (formula image not reproduced in this text),
where s denotes the sentence vector, N denotes the number of words in the sentence, v denotes a word vector, and α denotes the corresponding weight;
the weight α is calculated as (formula image not reproduced in this text), where f is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
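Because the two formulas are not reproduced in this text, the following Python sketch only illustrates the general shape of a part-of-speech weighted sum. It assumes the form s = Σ α_i · v_i over the N words of the sentence, and it assumes a weight equal to a per-category base value divided by the word frequency f; both the base values and the division by f are hypothetical stand-ins, not the patent's actual weight rule.

```python
from collections import Counter

# ASSUMPTION: illustrative base weights per word category; the source's real
# weight formula is not available in this text.
BASE_WEIGHT = {"content": 1.0, "function": 0.5}


def sentence_vector(words, categories, word_vectors):
    """Weighted sum s = sum_i alpha_i * v_i over the words of one sentence."""
    freq = Counter(words)  # f: number of times each word occurs in the sentence
    dim = len(next(iter(word_vectors.values())))
    s = [0.0] * dim
    for word, category in zip(words, categories):
        alpha = BASE_WEIGHT[category] / freq[word]  # hypothetical weight rule
        for k, component in enumerate(word_vectors[word]):
            s[k] += alpha * component
    return s


if __name__ == "__main__":
    vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}
    print(sentence_vector(["the", "cat", "sat"],
                          ["function", "content", "content"], vectors))
```

Under this assumed rule, function words and words repeated within the sentence contribute less to the sentence vector than content words that occur once.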
Optionally, in one embodiment, searching the pre-established database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets a similarity evaluation criterion; if so, the sentence vector is confirmed as a match.
Optionally, in one embodiment, judging whether a sentence vector found meets the similarity evaluation criterion and, if so, confirming the match includes: computing the inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that meet the similarity evaluation value, and retrieving all information associated with them.
Optionally, in one embodiment, selecting the sentence vectors that meet the similarity evaluation value and retrieving their associated information includes: if the inner product of the current sentence vector and the word vector corresponding to the search item is greater than the similarity evaluation value, storing the sentence vector in a temporary array in the database; after the search finishes, sorting all sentence vectors in the temporary array in descending order of their inner product with the word vector corresponding to the search item, and selecting multiple sentence vectors.
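The filter, sort, and select steps described above can be sketched in a few lines of Python. The record layout (dictionaries with `text` and `vector` keys), the function names, and the in-memory list standing in for the temporary array are assumptions made for illustration.

```python
def inner_product(a, b):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, records, threshold, top_k):
    """Return up to top_k records whose inner product exceeds threshold.

    records: list of dicts with 'text' and 'vector' keys (hypothetical layout).
    """
    temp = []  # plays the role of the temporary array
    for rec in records:
        score = inner_product(query_vec, rec["vector"])
        if score > threshold:
            temp.append((score, rec))
    # Sort from the largest inner product to the smallest.
    temp.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in temp[:top_k]]


if __name__ == "__main__":
    data = [
        {"text": "a", "vector": [1.0, 0.0]},
        {"text": "b", "vector": [0.5, 0.5]},
        {"text": "c", "vector": [0.0, 1.0]},
    ]
    print([r["text"] for r in search([1.0, 0.0], data, threshold=0.4, top_k=2)])
```

A linear scan like this is the simplest reading of the description; a production system would likely need an index, which the source does not discuss.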
A system for implementing a document writing assistant, characterized by comprising:
a receiving module, configured to receive content entered in the document editing interface;
an information input module, configured to input, in the document editing interface, a search item that the information to be searched should contain, the search item including at least a keyword, a word, or a sentence;
an information search module, configured to convert the search item into a word vector and search a pre-established database for sentence vectors that match the word vector, where each sentence vector is stored in its own data unit of the database, and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence;
an information feedback module, configured to return, in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in each corresponding data unit for the editor to choose from.
Optionally, in one embodiment, the database in the information search module is built as follows: documents are searched for and collected from online databases in advance, and text information is extracted from them; extracting the text information includes extracting the abstract, body text, and citation information of each document and then splitting the abstract or body text into sentences one by one; a pre-trained word vector model is used to represent all words in each sentence as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; a real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained based on the tagged parts of speech; and the sentence vector representation is obtained by computing a weighted sum over the words, based on the tagged parts of speech, to produce the sentence vector of the current sentence.
Optionally, in one embodiment, computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence includes:
computing a weighted sum over the words based on the tagged parts of speech; the weighted sum formula is (formula image not reproduced in this text),
where s denotes the sentence vector, N denotes the number of words in the sentence, v denotes a word vector, and α denotes the corresponding weight;
the weight α is calculated as (formula image not reproduced in this text), where f is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
Optionally, in one embodiment, searching the pre-established database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets a similarity evaluation criterion; if so, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the criterion includes computing the inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that meet the similarity evaluation value, and retrieving all information associated with them. Specifically, if the inner product of the current sentence vector and the word vector corresponding to the search item is greater than the similarity evaluation value, the sentence vector is stored in a temporary array in the database; after the search finishes, all sentence vectors in the temporary array are sorted in descending order of their inner product with the word vector corresponding to the search item, and multiple sentence vectors are selected.
An electronic device, comprising a processor, a memory, and a computer program that is stored in the memory and can run on the processor, the processor being configured to execute the method described above.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention uses a word vector model to convert both sentences and words into real-valued vectors for storage and matching. Compared with prior-art matching by dictionary or regular expression, the search results are more accurate. In addition, the relevant information is stored directly, so the user obtains the information actually wanted immediately after retrieval. The invention can therefore provide the necessary reference information for document writing and reduce the user's search time, thereby speeding up document writing.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
In the drawings:
Fig. 1 is a flow diagram of a method for implementing a document writing assistant;
Fig. 2 is a structural block diagram of a system for implementing a document writing assistant;
Fig. 3 is the core flow chart of the intelligent server in an embodiment of the invention;
Fig. 4 is the core flow chart of the smart client in an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by those skilled in the technical field of the invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the invention. The terms "first", "second", and so on may be used herein to describe various elements, but the elements are not limited by these terms, which serve only to distinguish one element from another. For example, without departing from the scope of the present application, a first element could be termed a second element and, similarly, a second element could be termed a first element. The first element and the second element are both elements, but they are not the same element.
To solve the technical problems of the conventional technology, this embodiment proposes a method for implementing a document writing assistant that, during document writing, intelligently retrieves similar sentence expressions and provides them to the author for reference, helping the author complete the document faster and more accurately. Fig. 1 is a flow diagram of the method, which comprises the following steps.
S1, in a document editing interface, a search item that the information to be searched should contain is input, the search item including at least a keyword, a word, or a sentence (short sentence);
S2, the search item is converted into a word vector, and a pre-established database is searched for sentence vectors that match the word vector, where each sentence vector is stored in its own data unit of the database, and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence. Specifically, the database stores one sentence per data unit, i.e. one row; the row contains multiple columns, of which the first column is the sentence text, the second column is the sentence vector, and subsequent columns may further include the sentence source, citations, and other information;
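The one-sentence-per-row layout can be sketched with Python's built-in `sqlite3` module. The table name, column names, and the choice to store the vector and the citation list as JSON text are assumptions for illustration; the source specifies only which pieces of information a data unit holds, not a storage engine or schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a table holding one sentence per row."""
    conn = sqlite3.connect(path)
    # ASSUMPTION: vector and citations are stored as JSON-encoded text.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentence_units (
               id INTEGER PRIMARY KEY,
               sentence_text TEXT NOT NULL,
               sentence_vector TEXT NOT NULL,
               source TEXT,
               citations TEXT
           )"""
    )
    return conn


def add_unit(conn, text, vector, source, citations=()):
    """Insert one data unit: text, vector, source, and any citations."""
    conn.execute(
        "INSERT INTO sentence_units"
        " (sentence_text, sentence_vector, source, citations)"
        " VALUES (?, ?, ?, ?)",
        (text, json.dumps(list(vector)), source, json.dumps(list(citations))),
    )
    conn.commit()


def load_units(conn):
    """Read all data units back as dictionaries."""
    rows = conn.execute(
        "SELECT sentence_text, sentence_vector, source, citations"
        " FROM sentence_units ORDER BY id"
    )
    return [
        {"text": t, "vector": json.loads(v), "source": s, "citations": json.loads(c)}
        for t, v, s, c in rows
    ]
```

Keeping the sentence text, source, and citations in the same row as the vector means a match can be returned to the editor without a second lookup, which is the benefit the description emphasizes.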
In some specific embodiments, the database in S2 is built as follows. Documents are searched for and collected from online databases in advance, and text information is extracted from them. Extracting the text information includes extracting the abstract, body text, and citation information of each document and then splitting the abstract or body text into sentences one by one; the splitting is performed on punctuation marks such as the period, question mark, and exclamation mark. A pre-trained word vector model is used to represent all words in each sentence as word vectors, after which a Bi-LSTM model and the CRF algorithm perform word segmentation and part-of-speech tagging. Specifically, the words in an English sentence are naturally delimited by spaces, so no segmentation is needed; in a Chinese sentence there is no delimiter between words, so the sentence must first be segmented into words, where one word may consist of one character or of several characters. For an English document, the pre-trained word vector model first represents all words as word vectors, and the Bi-LSTM model and CRF algorithm then tag the part of speech of each word. For a Chinese document, the pre-trained word vector model first produces the vector representations, and the Bi-LSTM model and CRF algorithm then segment the sentence and tag parts of speech. A real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained based on the tagged parts of speech. The sentence vector representation is obtained by computing a weighted sum over the words, based on the tagged parts of speech, to produce the sentence vector of the current sentence. The sentence vector is a high-dimensional real-valued vector; in this embodiment it is a 256-dimensional real-valued vector.
In some specific embodiments, computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence includes:
computing a weighted sum over the words based on the tagged parts of speech; the weighted sum formula is (formula image not reproduced in this text),
where s denotes the sentence vector, N denotes the number of words in the sentence, v denotes a word vector, and α denotes the corresponding weight;
the weight α is calculated as (formula image not reproduced in this text), where f is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
In some specific embodiments, searching the pre-established database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets a similarity evaluation criterion; if so, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the criterion includes computing the inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that meet the similarity evaluation value, and retrieving all information associated with them. Specifically, if the inner product (the products of corresponding elements, summed) of the current sentence vector and the word vector corresponding to the search item is greater than the similarity evaluation value, the sentence vector is stored in a temporary array in the database; after the search finishes, all sentence vectors in the temporary array are sorted in descending order of their inner product with the word vector corresponding to the search item, and multiple sentence vectors are selected.
In some specific embodiments, the similarity evaluation value may instead be obtained using one or more other common distance measures, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance.
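Several of the alternative measures listed above have short standard definitions, sketched here in plain Python. The function names are illustrative, and the source does not say how thresholds would be adapted when a distance (smaller is closer) replaces a similarity (larger is closer). Note also that the Jaccard coefficient is defined on sets, so this sketch applies it to token sets rather than to real-valued vectors.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; 0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences; 0 means identical."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Linear correlation in [-1, 1]; undefined for constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Shared tokens divided by all distinct tokens; inputs must not both be empty."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    return len(set_a & set_b) / len(set_a | set_b)
```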
S3, in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in each corresponding data unit are returned for the editor to choose from.
Based on the above principles, a system for implementing a document writing assistant is also provided, as shown in Fig. 2, characterized by comprising:
a receiving module, configured to receive content entered in the document editing interface;
an information input module, configured to input, in the document editing interface, a search item that the information to be searched should contain, the search item including at least a keyword, a word, or a sentence;
an information search module, configured to convert the search item into a word vector and search a pre-established database for sentence vectors that match the word vector, where each sentence vector is stored in its own data unit of the database, and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence. In one embodiment, the database in the information search module is built as follows: documents are searched for and collected from online databases in advance, and text information is extracted from them; extracting the text information includes extracting the abstract, body text, and citation information of each document and then splitting the abstract or body text into sentences one by one; a pre-trained word vector model is used to represent all words in each sentence as word vectors, after which word segmentation and part-of-speech tagging are performed on each word; a real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained based on the tagged parts of speech; and the sentence vector representation is obtained by computing a weighted sum over the words, based on the tagged parts of speech, to produce the sentence vector of the current sentence.
Computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence includes:
computing a weighted sum over the words based on the tagged parts of speech; the weighted sum formula is (formula image not reproduced in this text),
where s denotes the sentence vector, N denotes the number of words in the sentence, v denotes a word vector, and α denotes the corresponding weight;
the weight α is calculated as (formula image not reproduced in this text), where f is the word frequency of the word, i.e. the number of times the word occurs in the sentence. Finally, the text corresponding to each sentence vector, the source of the text, the citations involved in the text, and other such information are stored in the database.
Searching the pre-established database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search item, and judging whether each sentence vector found meets a similarity evaluation criterion; if so, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the criterion includes computing the inner product of the sentence vector found and the word vector corresponding to the search item, selecting the sentence vectors that meet the similarity evaluation value, and retrieving all information associated with them. Specifically, if the inner product of the current sentence vector and the word vector corresponding to the search item is greater than the similarity evaluation value, the sentence vector is stored in a temporary array in the database; after the search finishes, all sentence vectors in the temporary array are sorted in descending order of their inner product with the word vector corresponding to the search item, and multiple sentence vectors are selected.
an information feedback module, configured to return, in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in each corresponding data unit for the editor to choose from.
An electronic device, comprising a processor, a memory, and a computer program that is stored in the memory and can run on the processor, the processor being configured to execute the method described above.
Based on the above, the solution is illustrated below with specific examples:
Embodiment 1: thesis writing
The information search module is deployed on the intelligent server, as shown in Fig. 3. In the information search module, all papers in one or more fields are downloaded in advance; for example, after the full text of papers published in IEEE journals in the electronics field is downloaded, the abstract, body text, and citations of each paper are extracted. The abstract and body text are cut into sentences at punctuation marks such as the period, question mark, and exclamation mark. For English papers, the information search module first obtains the word vector of each word using an existing, publicly available pre-trained word vector model; in this embodiment, Google's BERT is used to obtain the word vectors. Then a Bi-LSTM model and the CRF algorithm (GMM-CRF, CNN, or RNN algorithms may also be used) tag the part of speech of each word: nouns and verbs, for example, are marked as content words, while auxiliary words and pronouns, for example, are marked as function words. A weighted sum then yields the high-dimensional real-valued sentence vector of the sentence, converting the sentence into a real-valued vector; in this embodiment, each sentence is converted into a 256-dimensional real-valued vector. Alternatively, instead of a weighted sum, the sentence vector may be obtained using existing published techniques such as a statistics-based bag-of-words (BoW) model, an RNN, or a CNN; this example places no specific limitation on the technique.
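The mapping from part-of-speech tags to the two word categories can be sketched as a simple lookup. This Python sketch assumes the tagger emits Universal POS tag names such as `NOUN` and `PRON`; the source states only that nouns and verbs count as content words and that auxiliary words and pronouns count as function words, so the treatment of every other tag here is an assumption.

```python
# Stated in the description: nouns and verbs are content words.
CONTENT_TAGS = {"NOUN", "VERB"}
# ASSUMPTION: these tags are also treated as content words in this sketch.
ASSUMED_CONTENT_TAGS = {"PROPN", "ADJ", "ADV", "NUM"}


def word_category(pos_tag: str) -> str:
    """Map a part-of-speech tag to 'content' or 'function'."""
    if pos_tag in CONTENT_TAGS or pos_tag in ASSUMED_CONTENT_TAGS:
        return "content"
    # Pronouns and auxiliary words (stated), plus all remaining tags (assumed).
    return "function"


def categorize(tagged_words):
    """tagged_words: iterable of (word, pos_tag) pairs from any tagger."""
    return [(word, word_category(tag)) for word, tag in tagged_words]


if __name__ == "__main__":
    print(categorize([("model", "NOUN"), ("it", "PRON"), ("converges", "VERB")]))
```

The resulting categories are what a part-of-speech weighted sum would consume when choosing a weight for each word.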
Finally, all sentences are converted into real-valued vectors and, sentence by sentence, all information for each sentence is stored in one data unit of the database. The data unit has the following format:
Sentence text | Sentence vector | Sentence source | Sentence citation 1 | Sentence citation 2 | Sentence citation 3
Here, the sentence source indicates where the sentence was taken from and is given in the form of a literature citation. Many sentences in a paper also cite other references, so if a sentence carries citations, the corresponding references are listed. In this embodiment, a single sentence is assumed to cite at most three other documents, hence citation 1, citation 2, and citation 3. Every citation is provided in three formats: GB/T 7714, MLA, and APA.
The smart client hosts the receiving module, the information input module, and the information feedback module, as shown in Fig. 4. When writing a paper, the user can enter just a few keywords for an unfamiliar expression through the information input module. The smart client sends the keywords over the network to the intelligent server, where the information search module converts the keywords into a word vector and then searches the database for similar sentences. Specifically, similarity is judged by the inner product of the two vectors (the products of corresponding elements, summed) divided by the product of their norms. Optionally, one or more other common distance measures may be used, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance. When the normalized inner product is used as the similarity, 1 indicates the closest match and 0 the least close. The inner product of the query vector with each sentence vector in the database is computed in turn; results below 0.6 are discarded, and results above 0.6 are stored in a temporary array. The array is then sorted in descending order of inner product, the top three to five sentences are chosen as the most similar sentences, and all information for those sentences is returned to the client. If no result exceeds 0.6 and the array is empty, an empty result is returned, indicating that there is no similar sentence. The information feedback module of the smart client displays the returned results to the user, who can draw on the expressions to write the corresponding sentence and can also copy the references.
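The normalized similarity used in this embodiment, the inner product divided by the product of the two vector norms, is the cosine similarity. A minimal Python sketch follows; the function names are hypothetical, the 0.6 cutoff is the value given in the embodiment, and returning 0.0 for a zero-length vector is an assumption the source does not address.

```python
import math


def cosine_similarity(a, b):
    """Inner product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_product = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # ASSUMPTION: treat a zero vector as having no similarity to anything.
    return dot / norm_product if norm_product else 0.0


def is_match(query_vec, sentence_vec, threshold=0.6):
    """Keep a candidate only when its similarity exceeds the cutoff."""
    return cosine_similarity(query_vec, sentence_vec) > threshold


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
    print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```

Dividing by the norms removes the effect of vector length, which is why a fixed cutoff such as 0.6 can be applied to every sentence regardless of how many words were summed into its vector.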
Embodiment 2: Patent drafting
The information search module is set up on the intelligent server: in the information search module, all granted patents in a given field are downloaded. For example, after the granted patents in the electronics field are downloaded, the abstract, claims and description of each are extracted. The abstract and description are cut into sentences at punctuation marks such as full stops, question marks and exclamation marks. The claims are divided with each claim as a unit.
All sentences are converted into real-valued vectors by the same method as in Embodiment 1 and then stored in the database with the following fields:
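The segmentation described above can be sketched as follows. This is a minimal illustration that assumes sentences end at Chinese or ASCII full stops, question marks and exclamation marks, and that each claim starts on its own line with a number followed by a full stop; the names `split_sentences` and `split_claims` are hypothetical, and decimals or abbreviations containing "." are not handled.

```python
import re

# A run of non-terminator characters followed by any terminators that close it.
_SENTENCE = re.compile(r"[^。？！?!.]+[。？！?!.]*")

# Start of a line that begins with a claim number such as "1." or "2、".
_CLAIM_START = re.compile(r"^(?=[ \t]*\d+[ \t]*[.．、])", re.MULTILINE)


def split_sentences(text):
    """Cut abstract or description text into sentences at end punctuation."""
    pieces = (m.strip() for m in _SENTENCE.findall(text))
    return [p for p in pieces if p]


def split_claims(text):
    """Divide a claims section into one string per numbered claim."""
    # Splitting on an empty match requires Python 3.7 or later.
    pieces = (c.strip() for c in _CLAIM_START.split(text))
    return [p for p in pieces if p]
```

Keeping each claim whole, rather than cutting it at internal punctuation, matches the statement that the claims are divided with the claim as the unit.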
Sentence text | Sentence vector | Sentence source
Here, the sentence source indicates where the sentence was taken from and is given as the patent number.
Smart client (provided with a receiving module, an information input module and an information feedback module): when drafting a patent, the user can simply enter a few keywords for an expression he or she is unsure of. The client transmits the keywords over the network to the intelligent server through the information input module. The information search module on the intelligent server converts the keywords into word vectors and then retrieves similar sentences from the database. Specifically, similarity is judged by the inner product of the two vectors (the sum of the products of corresponding elements) divided by the product of the norms of the two vectors. Optionally, one or more other common distance measures may be used, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance. The three to five most similar sentences are then selected, and all information of those sentences is returned and transmitted to the client. The information feedback module of the client displays the returned result to the user, who can draw on the expressions to write the corresponding sentence and, at the same time, avoid as far as possible overlap or conflict with the claims of existing granted patents. In summary, the present invention assists writing through semantic similarity between sentences, constructs sentence vectors by part-of-speech weighting, and stores each sentence vector together with its citations and source.
Implementing the embodiments of the present invention has the following beneficial effects:
The present invention converts both sentences and words into real-valued vectors through a word vector model and stores and matches them in that form. Compared with the prior art, which matches by dictionary or regular expression, the search results are more accurate. Moreover, because general information is stored directly, the user obtains the information actually wanted directly after retrieval. The present invention can therefore provide the necessary reference information for document writing and reduce the user's search time, thereby speeding up the writing of documents.
The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the application. The scope of protection of this patent shall therefore be determined by the appended claims.
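One possible way to hold the three fields above is sketched below, using an in-memory SQLite table with the vector serialised as JSON. The class `SentenceRecord`, the table name `sentences` and the helper functions are assumptions for illustration; the source does not name a database engine or schema.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceRecord:
    text: str            # sentence text
    vector: List[float]  # sentence vector (real-valued)
    source: str          # sentence source, here a patent number


def create_store():
    """Create an in-memory database with one row per sentence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentences ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, "
        "vector TEXT NOT NULL, source TEXT NOT NULL)"
    )
    return conn


def add_record(conn, rec):
    """Store one record; the vector is kept as a JSON array string."""
    conn.execute(
        "INSERT INTO sentences (text, vector, source) VALUES (?, ?, ?)",
        (rec.text, json.dumps(rec.vector), rec.source),
    )
    conn.commit()


def load_records(conn):
    """Read all records back in insertion order."""
    rows = conn.execute(
        "SELECT text, vector, source FROM sentences ORDER BY id"
    ).fetchall()
    return [SentenceRecord(t, json.loads(v), s) for t, v, s in rows]
```

Keeping text, vector and source in the same row means a single lookup returns everything the client needs to display.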

Claims (10)

1. A method for implementing a document writing assistant, characterized by comprising:
S1, in a document editing interface, entering a search term to be included in the information to be searched, the search term comprising at least a keyword, a word or a sentence;
S2, converting the search term into a word vector and then searching a pre-established database for sentence vectors matching the word vector, each sentence vector being arranged in a separate data unit of the database, the data unit comprising at least the sentence text information, the sentence vector, the sentence source and the reference information carried by the sentence;
S3, in the document editing interface, returning the sentence text information, the sentence vector, the sentence source and the reference information carried by the sentence in the corresponding data unit for the editor to select.
2. The method according to claim 1, characterized in that the process of establishing the database in S2 comprises: searching for and collecting documents from network databases in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, body text and reference information from each document and then segmenting the abstract or body text into sentences one by one; using a pre-trained word vector model, representing every word in each sentence as a word vector and then tagging each word with its part of speech; and obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation.
3. The method according to claim 2, characterized in that obtaining the sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence.
4. The method according to claim 3, characterized in that performing the weighted summation over the words using the tagged parts of speech to obtain the sentence vector representation comprises:
performing a weighted summation over the words based on the tagged parts of speech, the weighted summation formula being (formula missing from this text)
where s denotes the sentence vector, N the number of words in the sentence, v a word vector and α the corresponding weight;
the weight α is calculated as (formula missing from this text), where f is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
5. The method according to claim 1, characterized in that searching the pre-established database for sentence vectors matching the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term and judging whether each sentence vector found meets the similarity evaluation criterion; if so, the sentence vector is confirmed as a match.
6. The method according to claim 5, characterized in that judging whether a sentence vector found meets the similarity evaluation criterion and, if so, confirming the sentence vector as a match comprises: obtaining the vector inner product of the sentence vector found and the word vector corresponding to the search term, selecting the sentence vectors that meet the similarity evaluation value, and then retrieving all information corresponding to those sentence vectors.
7. The method according to claim 6, characterized in that selecting the sentence vectors that meet the similarity evaluation value and then retrieving all information corresponding to them comprises: if the vector inner product of the currently found sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing the sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term and selecting a plurality of sentence vectors.
8. A system for implementing a document writing assistant, characterized by comprising:
a receiving module, for receiving the content information entered in a document editing interface;
an information input module, for entering, in the document editing interface, a search term to be included in the information to be searched, the search term comprising at least a keyword, a word or a sentence;
an information search module, for converting the search term into a word vector and then searching a pre-established database for sentence vectors matching the word vector, each sentence vector being arranged in a separate data unit of the database, the data unit comprising at least the sentence text information, the sentence vector, the sentence source and the reference information carried by the sentence; the process of establishing the database in the information search module comprises: searching for and collecting documents from network databases in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, body text and reference information from each document and then segmenting the abstract or body text into sentences one by one; using a pre-trained word vector model, representing every word in each sentence as a word vector and then performing word segmentation and part-of-speech tagging on each word; obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation; obtaining the sentence vector representation comprises performing a weighted summation over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence;
an information feedback module, for returning, in the document editing interface, the sentence text information, the sentence vector, the sentence source and the reference information carried by the sentence in the corresponding data unit for the editor to select.
9. The system according to claim 8, characterized in that searching the pre-established database for sentence vectors matching the word vector comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term and judging whether each sentence vector found meets the similarity evaluation criterion; if so, the sentence vector is confirmed as a match; judging whether a sentence vector found meets the similarity evaluation criterion and, if so, confirming the sentence vector as a match comprises: obtaining the vector inner product of the sentence vector found and the word vector corresponding to the search term, selecting the sentence vectors that meet the similarity evaluation value, and then retrieving all information corresponding to those sentence vectors; selecting the sentence vectors that meet the similarity evaluation value and then retrieving all information corresponding to them comprises: if the vector inner product of the currently found sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing the sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term and selecting a plurality of sentence vectors.
10. An electronic device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor being configured to execute the implementation method of claims 1-7.
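The part-of-speech weighted summation recited in claims 3 and 4 can be illustrated with a minimal Python sketch. Because the formulas for the sum and for the weight α are not reproduced in this text, the weights in `POS_WEIGHTS` are purely hypothetical, the sketch uses a plain sum of weighted word vectors without any normalisation, and the dependence of α on the word frequency f is not modelled. The function name `sentence_vector` and the tag names are likewise assumptions.

```python
# Hypothetical part-of-speech weights; not taken from the source.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.1  # hypothetical weight for any other part of speech


def sentence_vector(tagged_words, word_vectors,
                    pos_weights=POS_WEIGHTS, default_weight=DEFAULT_WEIGHT):
    """Sum alpha_i * v_i over the words of one sentence.

    tagged_words: list of (word, part_of_speech) pairs for the sentence.
    word_vectors: dict mapping each word to its vector (list of floats),
                  standing in for a pre-trained word vector model.
    """
    if not word_vectors:
        return []
    dim = len(next(iter(word_vectors.values())))
    s = [0.0] * dim
    for word, pos in tagged_words:
        v = word_vectors.get(word)
        if v is None:
            continue  # assumption: out-of-vocabulary words are skipped
        alpha = pos_weights.get(pos, default_weight)
        for k in range(dim):
            s[k] += alpha * v[k]
    return s
```

With weights of this shape, content words such as nouns and verbs dominate the sentence vector, which is the stated purpose of weighting by part of speech.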

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910284378.1A CN110008312A (en) 2019-04-10 2019-04-10 A kind of document writing assistant implementation method, system and electronic equipment

Publications (1)

Publication Number Publication Date
CN110008312A true CN110008312A (en) 2019-07-12

Family

ID=67170706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910284378.1A Pending CN110008312A (en) 2019-04-10 2019-04-10 A kind of document writing assistant implementation method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN110008312A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1490744A (en) * 2002-09-19 2004-04-21 Method and system for searching confirmatory sentence
CN104462357A (en) * 2014-12-08 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for realizing personalized search
CN106095771A (en) * 2016-05-07 2016-11-09 深圳职业技术学院 Writing householder method and device
CN108304390A (en) * 2017-12-15 2018-07-20 腾讯科技(深圳)有限公司 Training method, interpretation method, device based on translation model and storage medium
JP2018129016A (en) * 2017-02-09 2018-08-16 章光 森 System for generating sentence from words entered by user using document data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SANJEEV ARORA, ET AL: "A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE EMBEDDINGS", 《ICLR 2017》 *
赵红红: "汉语阅读理解问答题解答研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309866A (en) * 2020-02-15 2020-06-19 深圳前海黑顿科技有限公司 System and method for intelligently retrieving written materials by utilizing semantic fuzzy search
CN111309866B (en) * 2020-02-15 2023-09-15 深圳前海黑顿科技有限公司 System and method for intelligently searching authoring materials by utilizing semantic fuzzy search
CN113254574A (en) * 2021-03-15 2021-08-13 河北地质大学 Method, device and system for auxiliary generation of customs official documents
CN114780690A (en) * 2022-06-20 2022-07-22 成都信息工程大学 Patent text retrieval method and device based on multi-mode matrix vector representation

Similar Documents

Publication Publication Date Title
CN108717406B (en) Text emotion analysis method and device and storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
US8108204B2 (en) Text categorization using external knowledge
Gupta et al. A survey of text question answering techniques
US8275600B2 (en) Machine learning for transliteration
CN104679728B (en) A kind of text similarity detection method
Ahmed et al. Language identification from text using n-gram based cumulative frequency addition
CN103106287B (en) A kind of processing method and system of user search sentence
US20070219986A1 (en) Method and apparatus for extracting terms based on a displayed text
Jha et al. Homs: Hindi opinion mining system
CN103399901A (en) Keyword extraction method
CN109002473A (en) A kind of sentiment analysis method based on term vector and part of speech
CN108549723B (en) Text concept classification method and device and server
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN110008312A (en) A kind of document writing assistant implementation method, system and electronic equipment
CN111694927A (en) Automatic document review method based on improved word-shifting distance algorithm
CN111027306A (en) Intellectual property matching technology based on keyword extraction and word shifting distance
Wang et al. Chinese subjectivity detection using a sentiment density-based naive Bayesian classifier
CN114139537A (en) Word vector generation method and device
CN111160007B (en) Search method and device based on BERT language model, computer equipment and storage medium
Ahmed et al. Question analysis for Arabic question answering systems
CN112559711A (en) Synonymous text prompting method and device and electronic equipment
Mohnot et al. Hybrid approach for Part of Speech Tagger for Hindi language
CN111259661A (en) New emotion word extraction method based on commodity comments
Maynard et al. Automatic language-independent induction of gazetteer lists

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190712
