CN110008312A - Method, system, and electronic device for implementing a document writing assistant
- Publication number
- CN110008312A (application CN201910284378.1A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- vector
- word
- information
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a method, system, and electronic device for implementing a document writing assistant. The method comprises: in a document editing interface, inputting the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence; converting the search terms into word vectors and then searching a pre-built database for sentence vectors that match the word vectors, where each sentence vector is stored in a separate data unit of the database and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence; and, in the document editing interface, returning the sentence text, sentence vector, sentence source, and citation information in the corresponding data unit for the editor to choose from. The invention uses a word vector model to convert both sentences and words into real-valued vectors for storage and matching. Compared with the prior art, which matches by dictionary or regular expression, the search results are more accurate.
Description
Technical field
The present invention relates to document editing methods in the field of natural language processing, and specifically to a method, system, and electronic device for implementing a document writing assistant.
Background
When writing papers and professional technical documents, authors often do not know which words or sentences to use for an accurate description. When writing papers in English in particular, the language barrier can prevent them from expressing what they actually mean. At present there is no effective technical solution that offers prompts during writing. The grammar checker bundled with Microsoft Office, for example, can perform some grammar checking, but it mainly stores common words and, after segmenting a sentence into words, checks whether each word can be found in its dictionary.
To find similar sentences, an author can only run keyword searches on Baidu Scholar or Google Scholar. These search sites retrieve by regular-expression matching: a search for "mobile phone" returns only documents that contain the words "mobile phone", and a document that writes "mobile terminal" instead is not retrieved. Moreover, the search returns only the URL of the source document and a brief abstract, so the user has to click through to the website to see the detailed result. In summary, the built-in functions of the prior art can only detect misspelled words and offer little help with composing sentences, and search sites cannot return detailed results directly.
Summary of the invention
Based on this, to address the above deficiencies, a method and system for implementing a document writing assistant are proposed that effectively solve the technical problems mentioned in the background. During document writing, the assistant intelligently retrieves similar sentence expressions and provides them to the author for reference, helping the author complete the document faster and more accurately.
A method for implementing a document writing assistant, characterized by comprising:
S1: in a document editing interface, inputting the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence;
S2: converting the search terms into word vectors and then searching a pre-built database for sentence vectors that match the word vectors, where each sentence vector is stored in a separate data unit of the database and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence;
the search result including at least the sentences that match the word vectors and their auxiliary information, the auxiliary information including at least the sentence source and the citation information carried by the sentence;
S3: in the document editing interface, returning the sentence text, sentence vector, sentence source, and citation information in the corresponding data unit for the editor to choose from.
Optionally, in one embodiment, the process of building the database in S2 includes: searching for and collecting documents from online databases in advance and extracting text information from them. Extracting the text information includes extracting the abstract, body text, and references of each document and then splitting the abstract or body text into sentences one by one. Using a pretrained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word. Based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained.
Optionally, in one embodiment, obtaining the sentence vector representation includes computing a weighted sum over the words, based on their tagged parts of speech, to obtain the sentence vector of the current sentence.
Optionally, in one embodiment, computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation includes:
computing a weighted sum over the words based on the tagged parts of speech, the weighted-sum formula being
$s = \sum_{i=1}^{N} \alpha_i v_i$
where $s$ denotes the sentence vector, $N$ the number of words in the sentence, $v_i$ the word vector of the $i$-th word, and $\alpha_i$ the corresponding weight;
the weight $\alpha$ is calculated as: [formula missing from the source text], where $f$ is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
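The weighted sum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the formula for the weight is not reproduced in the source text, so the `alpha` function below, the `POS_BASE_WEIGHT` table, and the "content"/"function" labels are hypothetical placeholders, not the patent's weighting.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical base weights per part-of-speech class (an assumption, not from the source).
POS_BASE_WEIGHT: Dict[str, float] = {"content": 1.0, "function": 0.2}


def alpha(pos_class: str, f: int) -> float:
    """Placeholder weight that depends on part of speech and word frequency f.

    The real formula is missing from the source text; this stand-in simply
    divides a class-specific base weight by the in-sentence frequency.
    """
    return POS_BASE_WEIGHT[pos_class] / f


def sentence_vector(
    tagged_words: Sequence[Tuple[str, str]],
    word_vectors: Dict[str, List[float]],
) -> List[float]:
    """Compute s = sum_i alpha_i * v_i over the words of one sentence."""
    freq = Counter(word for word, _ in tagged_words)
    dim = len(next(iter(word_vectors.values())))
    s = [0.0] * dim
    for word, pos_class in tagged_words:
        a = alpha(pos_class, freq[word])
        v = word_vectors[word]
        for k in range(dim):
            s[k] += a * v[k]
    return s
```

With two-dimensional toy vectors, `sentence_vector([("phone", "content"), ("the", "function")], {"phone": [1.0, 0.0], "the": [0.0, 1.0]})` weights the content word at 1.0 and the function word at 0.2.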
Optionally, in one embodiment, searching the pre-built database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search terms, and judging whether each sentence vector found meets the similarity criterion; if it does, the sentence vector is confirmed as a match.
Optionally, in one embodiment, judging whether a sentence vector found meets the similarity criterion and, if so, confirming the match includes: computing the inner product of the sentence vector found and the word vector corresponding to the search terms, selecting the sentence vectors that satisfy the similarity threshold, and retrieving all information associated with them.
Optionally, in one embodiment, selecting the sentence vectors that satisfy the similarity threshold and retrieving their information includes: if the inner product of the sentence vector currently being examined and the word vector corresponding to the search terms is greater than the similarity threshold, storing that sentence vector in a temporary array in the database; once the search is complete, sorting all sentence vectors in the temporary array in descending order of their inner product with the word vector corresponding to the search terms, and selecting several sentence vectors.
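The matching procedure described above (inner product, threshold, temporary array, descending sort, selection) can be sketched as follows. This is a minimal illustration in plain Python; the function names, the `(text, vector)` row layout, and the `threshold` and `top_k` parameters are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Multiply corresponding elements and sum the products."""
    return sum(x * y for x, y in zip(a, b))


def search(
    query_vec: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    threshold: float,
    top_k: int,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, sentence_text) pairs above the threshold."""
    temp: List[Tuple[float, str]] = []  # plays the role of the temporary array
    for text, vec in rows:
        score = inner_product(vec, query_vec)
        if score > threshold:
            temp.append((score, text))
    # Sort from largest to smallest inner product, then keep the best few.
    temp.sort(key=lambda item: item[0], reverse=True)
    return temp[:top_k]
```

A linear scan is used here for clarity; nothing in the source says how the database is indexed.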
A system for implementing a document writing assistant, characterized by comprising:
a receiving module, configured to receive content entered in the document editing interface;
an information input module, configured to input, in the document editing interface, the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence;
an information search module, configured to convert the search terms into word vectors and then search a pre-built database for sentence vectors that match the word vectors, where each sentence vector is stored in a separate data unit of the database and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence;
an information feedback module, configured to return, in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in the corresponding data unit for the editor to choose from.
Optionally, in one embodiment, the process of building the database in the information search module includes: searching for and collecting documents from online databases in advance and extracting text information from them. Extracting the text information includes extracting the abstract, body text, and references of each document and then splitting the abstract or body text into sentences one by one. Using a pretrained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word. Based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained. Obtaining the sentence vector representation includes computing a weighted sum over the words, based on their tagged parts of speech, to obtain the sentence vector of the current sentence.
Optionally, in one embodiment, computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation includes:
computing a weighted sum over the words based on the tagged parts of speech, the weighted-sum formula being
$s = \sum_{i=1}^{N} \alpha_i v_i$
where $s$ denotes the sentence vector, $N$ the number of words in the sentence, $v_i$ the word vector of the $i$-th word, and $\alpha_i$ the corresponding weight;
the weight $\alpha$ is calculated as: [formula missing from the source text], where $f$ is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
Optionally, in one embodiment, searching the pre-built database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search terms, and judging whether each sentence vector found meets the similarity criterion; if it does, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the similarity criterion includes: computing the inner product of the sentence vector found and the word vector corresponding to the search terms, selecting the sentence vectors that satisfy the similarity threshold, and retrieving all information associated with them. Selecting those sentence vectors includes: if the inner product of the sentence vector currently being examined and the word vector corresponding to the search terms is greater than the similarity threshold, storing that sentence vector in a temporary array in the database; once the search is complete, sorting all sentence vectors in the temporary array in descending order of their inner product with the word vector corresponding to the search terms, and selecting several sentence vectors.
An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor being configured to execute the implementation method described above.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention uses a word vector model to convert both sentences and words into real-valued vectors for storage and matching. Compared with the prior art, which matches by dictionary or regular expression, the search results are more accurate. In addition, because the information itself is stored directly, the user obtains the information actually wanted immediately after retrieval. The invention can therefore supply the reference information needed for document writing and reduce the time the user spends searching, which speeds up writing.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Wherein:
Fig. 1 is a flow diagram of a method for implementing a document writing assistant;
Fig. 2 is a structural block diagram of a system for implementing a document writing assistant;
Fig. 3 is the core flow chart of the intelligent server in an embodiment of the invention;
Fig. 4 is the core flow chart of the intelligent client in an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the invention and are not intended to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the invention. The terms "first", "second", and so on may be used herein to describe various elements, but the elements are not limited by these terms; the terms serve only to distinguish one element from another. For example, without departing from the scope of the application, a first element could be called a second element and, similarly, a second element could be called a first element. The first element and the second element are both elements, but they are not the same element.
To solve the technical problems of the conventional technology, this embodiment proposes a method for implementing a document writing assistant that, during document writing, intelligently retrieves similar sentence expressions and provides them to the author for reference, helping the author complete the document faster and more accurately. Fig. 1 is a flow diagram of the method, which comprises the following steps.
S1: in the document editing interface, the search terms that the information to be searched should contain are input; the search terms include at least a keyword, a word, or a sentence (a short sentence).
S2: the search terms are converted into word vectors, and the pre-built database is searched for sentence vectors that match them. Each sentence vector is stored in a separate data unit of the database, and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence. Specifically, each sentence occupies one data unit, i.e. one row, in the database. The row has multiple columns: the first holds the sentence text, the second holds the sentence vector, and the following columns may hold the sentence source, citations, and other information.
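The one-row-per-sentence layout described above can be sketched with Python's built-in `sqlite3` module. The source does not name a database engine or a schema, so the table name `sentence_units`, the column names, and the choice to store the vector and citations as JSON text are all assumptions made for illustration.

```python
import json
import sqlite3
from typing import Dict, List, Sequence


def create_store() -> sqlite3.Connection:
    """Create an in-memory table with one row (data unit) per sentence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sentence_units (
               sentence_text   TEXT NOT NULL,
               sentence_vector TEXT NOT NULL,
               source          TEXT,
               citations       TEXT
           )"""
    )
    return conn


def add_unit(conn: sqlite3.Connection, text: str, vector: Sequence[float],
             source: str = "", citations: Sequence[str] = ()) -> None:
    """Store one sentence; vector and citations are serialized as JSON text."""
    conn.execute(
        "INSERT INTO sentence_units VALUES (?, ?, ?, ?)",
        (text, json.dumps(list(vector)), source, json.dumps(list(citations))),
    )


def load_units(conn: sqlite3.Connection) -> List[Dict[str, object]]:
    """Read every data unit back as a dictionary."""
    rows = conn.execute(
        "SELECT sentence_text, sentence_vector, source, citations FROM sentence_units"
    )
    return [
        {"text": t, "vector": json.loads(v), "source": s, "citations": json.loads(c)}
        for t, v, s, c in rows
    ]
```

Keeping the text, vector, source, and citations in the same row is what lets a single lookup return everything the editor needs.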
In some specific embodiments, the database in S2 is built as follows. Documents are searched for and collected from online databases in advance, and text information is extracted from them. Extracting the text information includes extracting the abstract, body text, and references of each document and then splitting the abstract and body text into sentences at punctuation marks such as periods, question marks, and exclamation marks.
Using a pretrained word vector model, all words in each sentence are represented as word vectors, and a Bi-LSTM model with a CRF algorithm then performs word segmentation and part-of-speech tagging. Words in an English sentence are naturally separated by spaces, so no segmentation is needed. A Chinese sentence has no separators between characters or words, so it must first be segmented into words; a word may consist of one character or of several. For an English document, the pretrained word vector model first represents every word as a word vector, and the Bi-LSTM model and CRF algorithm then tag the part of speech of each word. For a Chinese document, the pretrained model first represents every character as a character vector, and the Bi-LSTM model and CRF algorithm then segment the sentence and tag parts of speech.
Based on the tagged parts of speech, the real-valued vector of the current sentence, i.e. its sentence vector representation, is obtained: the words are combined in a weighted sum, based on their tagged parts of speech, to give the sentence vector. The sentence vector is a high-dimensional real-valued vector; in this embodiment it has 256 dimensions.
In some specific embodiments, computing the weighted sum over the words using the tagged parts of speech includes the following. The weighted-sum formula is
$s = \sum_{i=1}^{N} \alpha_i v_i$
where $s$ denotes the sentence vector, $N$ the number of words in the sentence, $v_i$ the word vector of the $i$-th word, and $\alpha_i$ the corresponding weight. The weight $\alpha$ is calculated as: [formula missing from the source text], where $f$ is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
In some specific embodiments, searching the pre-built database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search terms, and judging whether each sentence vector found meets the similarity criterion; if it does, the sentence vector is confirmed as a match. To make this judgment, the inner product of the sentence vector and the word vector corresponding to the search terms is computed (corresponding elements are multiplied and the products summed). If the inner product is greater than the similarity threshold, the sentence vector is stored in a temporary array in the database. Once the search is complete, all sentence vectors in the temporary array are sorted in descending order of their inner product with the word vector corresponding to the search terms, and several sentence vectors are selected together with all information associated with them.
In some specific embodiments, the similarity value may instead be obtained with one or more other common distance measures, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance.
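Some of the alternative measures listed above are easy to state in code. The sketch below gives plain-Python versions of Euclidean distance, Manhattan distance, the Pearson correlation coefficient, and the Jaccard similarity coefficient; the function names are illustrative, and applying Jaccard to token sets rather than to real-valued vectors is an assumption, since the source does not say how that measure would be applied.

```python
import math
from typing import Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation in [-1, 1]; larger means more similar."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Note the direction of each measure: distances must fall below a threshold to count as a match, whereas correlations and similarity coefficients must exceed one, so the comparison and the sort order have to be flipped accordingly.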
S3: in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in the corresponding data unit are returned for the editor to choose from.
Based on the above principles, a system for implementing a document writing assistant is also provided, as shown in Fig. 2, characterized by comprising:
a receiving module, configured to receive content entered in the document editing interface;
an information input module, configured to input, in the document editing interface, the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence;
an information search module, configured to convert the search terms into word vectors and then search a pre-built database for sentence vectors that match the word vectors, where each sentence vector is stored in a separate data unit of the database and the data unit includes at least the sentence text, the sentence vector, the sentence source, and the citation information carried by the sentence. In one embodiment, the process of building the database in the information search module includes: searching for and collecting documents from online databases in advance and extracting text information from them. Extracting the text information includes extracting the abstract, body text, and references of each document and then splitting the abstract or body text into sentences one by one. Using a pretrained word vector model, all words in each sentence are represented as word vectors, after which word segmentation and part-of-speech tagging are performed on each word. Based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation, is obtained by computing a weighted sum over the words.
Computing the weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation includes:
computing a weighted sum over the words based on the tagged parts of speech, the weighted-sum formula being
$s = \sum_{i=1}^{N} \alpha_i v_i$
where $s$ denotes the sentence vector, $N$ the number of words in the sentence, $v_i$ the word vector of the $i$-th word, and $\alpha_i$ the corresponding weight;
the weight $\alpha$ is calculated as: [formula missing from the source text], where $f$ is the word frequency of the word, i.e. the number of times the word occurs in the sentence. Finally, the text corresponding to every sentence vector, the source of the text, and the references cited in the text are stored in the database.
Searching the pre-built database for sentence vectors that match the word vector includes: searching the database for sentence vectors that contain the word vector corresponding to the search terms, and judging whether each sentence vector found meets the similarity criterion; if it does, the sentence vector is confirmed as a match. Judging whether a sentence vector meets the similarity criterion includes: computing the inner product of the sentence vector found and the word vector corresponding to the search terms, selecting the sentence vectors that satisfy the similarity threshold, and retrieving all information associated with them. Selecting those sentence vectors includes: if the inner product of the sentence vector currently being examined and the word vector corresponding to the search terms is greater than the similarity threshold, storing that sentence vector in a temporary array in the database; once the search is complete, sorting all sentence vectors in the temporary array in descending order of their inner product with the word vector corresponding to the search terms, and selecting several sentence vectors.
an information feedback module, configured to return, in the document editing interface, the sentence text, sentence vector, sentence source, and citation information in the corresponding data unit for the editor to choose from.
An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor being configured to execute the implementation method described above.
Based on the above, the invention is illustrated below with specific examples.
Embodiment 1: Paper writing
The information search module is deployed on the intelligent server, as shown in Fig. 3. In the information search module, all papers in one or more fields are downloaded in advance; for example, after the full text of papers published in IEEE journals in the electronics field is downloaded, the abstract, body text, and references of each paper are extracted. The abstract and body text are split into sentences at punctuation marks such as periods, question marks, and exclamation marks. For English papers, the information search module first obtains the word vector of each word using an existing, publicly available pretrained word vector model; this embodiment uses Google's BERT. A Bi-LSTM model with a CRF algorithm (GMM-CRF, CNN, or RNN algorithms may also be used) then tags the part of speech of each word: nouns and verbs, for example, are marked as content words, while particles and pronouns are marked as function words. A high-dimensional real-valued sentence vector is then obtained by weighted summation, converting the sentence into a real-valued vector; in this embodiment each sentence is converted into a 256-dimensional real-valued vector. Alternatively, instead of weighted summation, the sentence vector may be obtained with existing published techniques such as a statistical bag-of-words (BoW) model, an RNN, or a CNN; this example places no specific limitation on the choice.
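The sentence-splitting step described above can be sketched with a single regular expression. This is a deliberately naive illustration: the function name `split_sentences` is hypothetical, the set of terminators (ASCII and full-width period, question mark, exclamation mark) is an assumption, and abbreviations or decimal numbers such as "e.g." or "3.14" would be split incorrectly.

```python
import re
from typing import List

# A run of non-terminator characters followed by any terminators that close it.
_SENTENCE = re.compile(r"[^.!?。？！]+[.!?。？！]*")


def split_sentences(text: str) -> List[str]:
    """Cut text into sentences at periods, question marks, and exclamation marks."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]
```

For example, `split_sentences("Hello world. How are you? Fine!")` yields three sentences, each keeping its closing punctuation mark.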
Finally, every sentence is converted into a real-valued vector and, sentence by sentence, all of its information is stored in one data unit of the database. The data unit has the following form:
Sentence text | Sentence vector | Sentence source | Sentence citation 1 | Sentence citation 2 | Sentence citation 3
Here the sentence source indicates where the sentence was taken from and is given in the form of a literature citation. Many sentences in a paper also cite other references, so if a sentence contains citations, the corresponding references are listed. This embodiment assumes that a single sentence cites at most three other documents, hence the fields citation 1, citation 2, and citation 3. All citations are provided in three formats: GB/T 7714, MLA, and APA.
The intelligent client, which hosts the receiving module, the information input module, and the information feedback module, is shown in Fig. 4. When writing a paper, the user can enter just a few keywords through the information input module for an expression they are unsure of. The client sends the keywords over the network to the intelligent server, where the information search module converts them into word vectors and retrieves similar sentences from the database. Specifically, similarity is judged by the inner product of the two vectors (corresponding elements multiplied and the products summed) divided by the product of their norms. Optionally, one or more other common distance measures may be used, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance. When the inner product is used as the similarity measure, 1 indicates the closest match and 0 the least close. The inner product of the query vector and each sentence vector in the database is computed in turn; a result below 0.6 is discarded, and a result above 0.6 is stored in a temporary array. The array is then sorted in descending order of inner product, the top three to five sentences are taken as the most similar, and all information for those sentences is returned to the client. If no entry exceeds 0.6, the array is empty and an empty result is returned, indicating that there is no similar sentence. The information feedback module of the client displays the returned results to the user, who can draw on the expressions to write the corresponding sentence and can also copy the references.
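The server-side retrieval in this embodiment can be sketched as follows, using the inner product divided by the product of the two norms (cosine similarity), the 0.6 cutoff, and a cap of five results. The dictionary layout of each data unit and the names `cosine_similarity` and `most_similar` are assumptions made for the example; how a score of exactly 0.6 is treated is not stated in the source, so the strict comparison here is a choice.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def most_similar(
    query_vec: Sequence[float],
    units: Sequence[Dict[str, object]],
    threshold: float = 0.6,
    top_k: int = 5,
) -> List[Dict[str, object]]:
    """Return the full data units of the best matches, or [] if none qualify."""
    hits = []
    for unit in units:
        score = cosine_similarity(query_vec, unit["vector"])
        if score > threshold:  # scores at or below the cutoff are discarded
            hits.append((score, unit))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [unit for _, unit in hits[:top_k]]
```

Returning whole data units rather than bare scores mirrors the description: the client receives the sentence text together with its source and citations, and an empty list signals that no similar sentence was found.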
Two, patent drafting of embodiment
The information search module is deployed on the intelligent server. In the information search module, all granted patents in a given field are downloaded, for example the granted patents in the electronics field, and the abstract, claims, and description of each are extracted. The abstract and description are split into sentences at punctuation marks such as periods, question marks, and exclamation marks. The claims are split with each claim as a unit.
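The two splitting rules above can be illustrated with a short sketch. The function names `split_sentences` and `split_claims` are hypothetical, and the sketch assumes that each claim begins on its own line with a number followed by a period; neither detail comes from the patent text.

```python
import re
from typing import List

# A run of non-terminator characters followed by any sentence-ending marks.
# Covers ASCII and full-width Chinese punctuation. Limitation: an ASCII period
# inside a decimal number such as "3.5" is also treated as a sentence end.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")

# A claim runs from a line starting with "<number>. " up to the next such line
# or the end of the text (assumed layout, see the lead-in).
_CLAIM = re.compile(r"^\d+\.\s.*?(?=^\d+\.\s|\Z)", re.MULTILINE | re.DOTALL)


def split_sentences(text: str) -> List[str]:
    """Split abstract or description text into sentences at . ! ? 。 ！ ？"""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def split_claims(text: str) -> List[str]:
    """Split a claims section into one string per numbered claim."""
    return [c.strip() for c in _CLAIM.findall(text)]
```

Keeping a whole claim as one unit, rather than splitting it into sentences, matches the description: a claim is normally a single long sentence whose parts are not meaningful on their own.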
All sentences are converted into real-valued vectors and then stored in the database in the same way as in Embodiment 1:
Sentence text | Sentence vector | Sentence source |
Here, the sentence source indicates where the sentence was taken from and is given by the patent number.
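One possible shape for such a data unit is sketched below. The class name `SentenceRecord`, the helper `add_sentence`, the plain list used as a stand-in for the database, and the placeholder patent number in the usage note are illustrative assumptions; the `references` field reflects the reference information mentioned in the claims and is left empty when a sentence carries none.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SentenceRecord:
    """One independent data unit: a sentence stored together with its metadata."""
    text: str             # sentence text
    vector: List[float]   # real-valued sentence vector
    source: str           # where the sentence came from (patent number here)
    references: List[str] = field(default_factory=list)  # citations carried by the sentence


def add_sentence(
    database: List[SentenceRecord],
    text: str,
    vector: List[float],
    source: str,
    references: Optional[List[str]] = None,
) -> SentenceRecord:
    """Append a new data unit to the in-memory stand-in for the database."""
    record = SentenceRecord(text, vector, source, list(references or []))
    database.append(record)
    return record
```

For example, `add_sentence(db, "A widget comprises a housing.", [0.1, 0.2], "CN000000000A")` stores one sentence under a made-up patent number. Keeping the text, vector, and source in one record is what lets a single vector match return everything the user needs.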
Intelligent client (provided with a receiving module, an information input module, and an information feedback module): when drafting a patent, the user can simply enter a few keywords for an unfamiliar expression. The client transmits the keywords over the network to the intelligent server through the information input module. The information search module on the intelligent server converts the keywords into word vectors and then retrieves similar sentences from the database; specifically, similarity is judged by the ratio of the inner product of the two vectors (the sum of the products of corresponding elements) to the product of their norms. Optionally, one or more other common distance measures may be used, such as Euclidean distance, Manhattan distance, the Pearson correlation coefficient, the Spearman (rank) correlation coefficient, the Jaccard similarity coefficient, or SimHash with Hamming distance. The three to five most similar sentences are then selected, and all information for those sentences is returned and transmitted to the client. The information feedback module of the client displays the returned results to the user, who can borrow their wording to write the corresponding sentence and, at the same time, avoid as far as possible overlapping or conflicting with the claims of existing granted patents.
In summary, the present invention assists writing through semantic similarity between sentences and constructs sentence vectors by weighting words according to their part of speech; sentence vectors, citations, and sources are stored together.
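A part-of-speech-weighted sentence vector can be sketched as below. The patent's actual weight formula, which also involves word frequency, is not reproduced in this text, so the weight table `POS_WEIGHTS`, the fallback `DEFAULT_WEIGHT`, and the choice not to normalize by the number of words are purely illustrative assumptions, as is the function name `sentence_vector`.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights per part-of-speech tag; the patent's own formula for the
# weight is not available here, so these numbers are placeholders only.
POS_WEIGHTS: Dict[str, float] = {"noun": 1.0, "verb": 0.8, "adjective": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for any other tag (particles, etc.)


def sentence_vector(tagged_words: Sequence[Tuple[Sequence[float], str]]) -> List[float]:
    """Weighted sum of word vectors.

    tagged_words holds one (word_vector, pos_tag) pair per word in the sentence.
    All word vectors are assumed to have the same dimension.
    """
    if not tagged_words:
        return []
    total = [0.0] * len(tagged_words[0][0])
    for vector, pos in tagged_words:
        alpha = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
        for i, value in enumerate(vector):
            total[i] += alpha * value
    return total
```

Weighting content words such as nouns and verbs more heavily than function words keeps frequent but uninformative words from dominating the sentence vector; because the similarity measure divides by the vector norms, the overall scale of the sum does not affect matching.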
Implementing the embodiments of the present invention has the following beneficial effects:
The present invention uses a word vector model to convert both sentences and words into real-valued vectors for storage and matching. Compared with prior-art matching by dictionaries or regular expressions, the search results are more accurate. In addition, because general information is stored directly, the user obtains the desired information directly after a search. The invention can therefore provide the reference information needed for document writing and reduce the user's search time, thereby speeding up document writing.
The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the application, and these all fall within the protection scope of the application. Therefore, the protection scope of this patent application shall be determined by the appended claims.
Claims (10)
1. A method for implementing a document writing assistant, characterized by comprising:
S1: in a document editing interface, inputting the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence;
S2: after converting the search terms into word vectors, searching a pre-established database for sentence vectors that match the word vectors, each sentence vector being stored in a separate data unit of the database, the data unit including at least the sentence text, the sentence vector, the sentence source, and the reference information carried by the sentence;
S3: in the document editing interface, returning the sentence text, the sentence vector, the sentence source, and the reference information carried by the sentence from the corresponding data unit for the editor to select.
2. The method according to claim 1, characterized in that the process of establishing the database in S2 comprises: searching a network database for documents and collating them in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, the body text, and the reference information from each document, and then splitting the abstract or body text into sentences one by one; using a pre-trained word vector model to represent every word in each sentence as a word vector, and then tagging each word with its part of speech; and obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation.
3. The method according to claim 2, characterized in that obtaining the sentence vector representation comprises computing a weighted sum over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence.
4. The method according to claim 3, characterized in that computing a weighted sum over the words using the tagged parts of speech to obtain the sentence vector representation of the sentence comprises:
computing a weighted sum over the words based on the tagged parts of speech, the weighted summation formula being [formula not reproduced],
where s denotes the sentence vector, N denotes the number of words in the sentence, v denotes a word vector, and α denotes the corresponding weight;
the weight α is calculated as [formula not reproduced], where F is the word frequency of the word, i.e. the number of times the word occurs in the sentence.
5. The method according to claim 1, characterized in that searching the pre-established database for sentence vectors that match the word vectors comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term, and judging whether each sentence vector found meets the similarity evaluation criterion; if so, the sentence vector is confirmed as a match.
6. The method according to claim 5, characterized in that judging whether a found sentence vector meets the similarity evaluation criterion and, if so, confirming the sentence vector as a match comprises: obtaining the vector inner product of the found sentence vector and the word vector corresponding to the search term, selecting the sentence vectors that meet the similarity evaluation value, and then retrieving all information corresponding to those sentence vectors.
7. The method according to claim 6, characterized in that selecting the sentence vectors that meet the similarity evaluation value and then retrieving all information corresponding to those sentence vectors comprises: if the vector inner product of the currently found sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term, and selecting a plurality of sentence vectors.
8. A system for implementing a document writing assistant, characterized by comprising:
a receiving module for receiving the content entered in a document editing interface;
an information input module for inputting, in the document editing interface, the search terms that the information to be searched should contain, the search terms including at least a keyword, a word, or a sentence;
an information search module for converting the search terms into word vectors and then searching a pre-established database for sentence vectors that match the word vectors, each sentence vector being stored in a separate data unit of the database, the data unit including at least the sentence text, the sentence vector, the sentence source, and the reference information carried by the sentence; the process of establishing the database in the information search module comprises: searching a network database for documents and collating them in advance, and extracting text information from the documents; the extraction of the text information comprises extracting the abstract, the body text, and the reference information from each document, and then splitting the abstract or body text into sentences one by one; using a pre-trained word vector model to represent every word in each sentence as a word vector, and then performing word segmentation and part-of-speech tagging on each word; and obtaining, based on the tagged parts of speech, the real-valued vector corresponding to the current sentence, i.e. its sentence vector representation; obtaining the sentence vector representation comprises computing a weighted sum over the words based on the tagged parts of speech to obtain the sentence vector corresponding to the current sentence;
an information feedback module for returning, in the document editing interface, the sentence text, the sentence vector, the sentence source, and the reference information carried by the sentence from the corresponding data unit for the editor to select.
9. The system according to claim 8, characterized in that searching the pre-established database for sentence vectors that match the word vectors comprises: searching the database for sentence vectors that contain the word vector corresponding to the search term, and judging whether each sentence vector found meets the similarity evaluation criterion; if so, the sentence vector is confirmed as a match; judging whether a found sentence vector meets the similarity evaluation criterion and, if so, confirming the sentence vector as a match comprises: obtaining the vector inner product of the found sentence vector and the word vector corresponding to the search term, selecting the sentence vectors that meet the similarity evaluation value, and then retrieving all information corresponding to those sentence vectors; selecting the sentence vectors that meet the similarity evaluation value and then retrieving all information corresponding to those sentence vectors comprises: if the vector inner product of the currently found sentence vector and the word vector corresponding to the search term is greater than the similarity evaluation value, storing that sentence vector in a temporary array in the database; and, after the search is complete, sorting all sentence vectors in the temporary array in descending order of their vector inner product with the word vector corresponding to the search term, and selecting a plurality of sentence vectors.
10. An electronic device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to execute the implementation method of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284378.1A CN110008312A (en) | 2019-04-10 | 2019-04-10 | A kind of document writing assistant implementation method, system and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284378.1A CN110008312A (en) | 2019-04-10 | 2019-04-10 | A kind of document writing assistant implementation method, system and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110008312A true CN110008312A (en) | 2019-07-12 |
Family
ID=67170706
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910284378.1A Pending CN110008312A (en) | 2019-04-10 | 2019-04-10 | A kind of document writing assistant implementation method, system and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008312A (en) |
- 2019-04-10 CN CN201910284378.1A (CN110008312A), active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1490744A (en) * | 2002-09-19 | 2004-04-21 | Method and system for searching confirmatory sentence | |
CN104462357A (en) * | 2014-12-08 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for realizing personalized search |
CN106095771A (en) * | 2016-05-07 | 2016-11-09 | 深圳职业技术学院 | Writing householder method and device |
JP2018129016A (en) * | 2017-02-09 | 2018-08-16 | 章光 森 | System for generating sentence from words entered by user using document data |
CN108304390A (en) * | 2017-12-15 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Training method, interpretation method, device based on translation model and storage medium |
Non-Patent Citations (2)
Title |
---|
SANJEEV ARORA, ET AL: ""A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE EMBEDDINGS"", 《ICLR 2017》 * |
赵红红: ""汉语阅读理解问答题解答研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111309866A (en) * | 2020-02-15 | 2020-06-19 | 深圳前海黑顿科技有限公司 | System and method for intelligently retrieving written materials by utilizing semantic fuzzy search |
CN111309866B (en) * | 2020-02-15 | 2023-09-15 | 深圳前海黑顿科技有限公司 | System and method for intelligently searching authoring materials by utilizing semantic fuzzy search |
CN113254574A (en) * | 2021-03-15 | 2021-08-13 | 河北地质大学 | Method, device and system for auxiliary generation of customs official documents |
CN114780690A (en) * | 2022-06-20 | 2022-07-22 | 成都信息工程大学 | Patent text retrieval method and device based on multi-mode matrix vector representation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108717406B (en) | Text emotion analysis method and device and storage medium | |
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
US8108204B2 (en) | Text categorization using external knowledge | |
Gupta et al. | A survey of text question answering techniques | |
US8275600B2 (en) | Machine learning for transliteration | |
CN104679728B (en) | A kind of text similarity detection method | |
Ahmed et al. | Language identification from text using n-gram based cumulative frequency addition | |
CN103106287B (en) | A kind of processing method and system of user search sentence | |
US20070219986A1 (en) | Method and apparatus for extracting terms based on a displayed text | |
Jha et al. | Homs: Hindi opinion mining system | |
CN103399901A (en) | Keyword extraction method | |
CN109002473A (en) | A kind of sentiment analysis method based on term vector and part of speech | |
CN108549723B (en) | Text concept classification method and device and server | |
CN112069312B (en) | Text classification method based on entity recognition and electronic device | |
CN110008312A (en) | A kind of document writing assistant implementation method, system and electronic equipment | |
CN111694927A (en) | Automatic document review method based on improved word-shifting distance algorithm | |
CN111027306A (en) | Intellectual property matching technology based on keyword extraction and word shifting distance | |
Wang et al. | Chinese subjectivity detection using a sentiment density-based naive Bayesian classifier | |
CN114139537A (en) | Word vector generation method and device | |
CN111160007B (en) | Search method and device based on BERT language model, computer equipment and storage medium | |
Ahmed et al. | Question analysis for Arabic question answering systems | |
CN112559711A (en) | Synonymous text prompting method and device and electronic equipment | |
Mohnot et al. | Hybrid approach for Part of Speech Tagger for Hindi language | |
CN111259661A (en) | New emotion word extraction method based on commodity comments | |
Maynard et al. | Automatic language-independent induction of gazetteer lists |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190712 |