CN107688621A

CN107688621A - Method and system for optimizing copy

Info

Publication number: CN107688621A
Application number: CN201710698292.4A
Authority: CN
Inventors: 刘月明; 梁岚; 李舰
Original assignee: Ai Mu (shanghai) Culture Media Co Ltd
Current assignee: Ai Mu (shanghai) Culture Media Co Ltd
Priority date: 2017-08-15
Filing date: 2017-08-15
Publication date: 2018-02-13
Anticipated expiration: 2037-08-15
Also published as: CN107688621B

Abstract

The invention discloses a method and system for optimizing copy. The method comprises the following steps: capturing the text in a piece of copy to obtain an original text; processing the original text to obtain a plurality of first target texts; receiving a second target text selected by a user from the plurality of first target texts; calculating the temperature of each hot word in each first target text according to Newton's law of cooling; calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model; and displaying the corresponding recommended hot-word texts to the user according to the similarity. Effect: the user can optimize his or her own copy through simple replacement operations, which reduces the user's workload and improves working efficiency.

Description

Method and system for optimizing copy

Technical field

The invention belongs to the field of computer text information technology, and in particular relates to a method and system for optimizing copy.

Background art

Good copy must take its tone and emotional expression into account: a passage, a sentence or even a single word can make the audience empathize or form a favorable impression, and only then is the copy good. New words and hot words emerge every day, and any of them may become the highlight of a piece of copy. Traditional copy optimization relies essentially on manual collection, or on searching for recent hot topics and popular words through a search engine, which usually means a heavy workload and low efficiency and cannot meet the needs of creators.

Summary of the invention

To solve the above problems, the present invention provides a method and system for optimizing copy, overcoming the heavy workload and low efficiency of the prior art.

One technical solution adopted by the present invention is a copy optimization method, comprising the following steps:

capturing the text in the copy to obtain an original text;

processing the original text to obtain a plurality of first target texts;

receiving a second target text selected by a user from the plurality of first target texts;

calculating the temperature of each hot word in each first target text according to Newton's law of cooling;

calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model;

displaying the corresponding recommended hot-word texts to the user according to the similarity.

Preferably, the text in the copy is captured using web crawler technology.

Preferably, word segmentation is performed on the original text using jieba.

Preferably, the temperature of each hot word in each target text is calculated using the formula:

$$T'(t) = -k\,\bigl(T(t) - H\bigr)$$

where $T'(t)$ is the rate of temperature change, the negative sign indicates cooling, $k > 0$ is the cooling coefficient, $T(t)$ is the temperature $T$ as a function of time $t$, and $H$ is the room (ambient) temperature.
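Solving this differential equation with the initial value $T(0) = T_0$ gives $T(t) = H + (T_0 - H)e^{-kt}$. The Python sketch below evaluates that closed form for a list of hot words and sorts them by current temperature. The source does not say how a hot word gets its initial temperature; here $T_0$ is assumed to be a hypothetical popularity score, $t$ the time since that score was recorded, and the values of $k$ and $H$ are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def hot_word_temperature(initial_temp: float, elapsed: float, k: float, room_temp: float = 0.0) -> float:
    """Closed-form solution of Newton's law of cooling, T'(t) = -k (T(t) - H).

    initial_temp: T(0), assumed here to be a hypothetical popularity score
    elapsed:      time t since T(0) was recorded (any consistent unit)
    k:            cooling coefficient, must be > 0
    room_temp:    ambient temperature H that the word cools towards
    """
    if k <= 0:
        raise ValueError("cooling coefficient k must be positive")
    return room_temp + (initial_temp - room_temp) * math.exp(-k * elapsed)


def rank_by_temperature(
    words: Sequence[Tuple[str, float, float]], k: float, room_temp: float = 0.0
) -> List[Tuple[str, float]]:
    """Sort (word, initial_temp, elapsed) tuples from hottest to coldest."""
    scored = [(w, hot_word_temperature(t0, dt, k, room_temp)) for w, t0, dt in words]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: (word, initial score, hours since the score was recorded)
    sample = [("word_a", 100.0, 48.0), ("word_b", 60.0, 2.0), ("word_c", 90.0, 24.0)]
    for word, temp in rank_by_temperature(sample, k=0.05):
        print(f"{word}: {temp:.2f}")
```

With k = 0.05 per hour and H = 0, the sample prints word_b (about 54.29) first, then word_c (about 27.11) and word_a (about 9.07): a recently measured word outranks an older one that started hotter, which is the effect the cooling model is meant to capture.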

Preferably, calculating the similarity between the remaining first target texts and the second target text according to the preset word2vec model specifically includes:

preprocessing the remaining first target texts and inputting them into the word2vec model for training to obtain multidimensional word vectors;

performing attribute (feature) selection on the multidimensional word vectors to obtain the corresponding feature data;

inputting the feature data and the second target text selected by the user into the word2vec model for similarity calculation.
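The similarity step can be sketched as cosine similarity between vectors. With gensim, for instance, a trained `Word2Vec` model exposes `model.wv.similarity(w1, w2)`, which returns the cosine similarity of two words. To stay runnable without gensim, the sketch below uses a hypothetical dictionary of toy vectors, represents a multi-word text by the average of its word vectors (an assumption; the source does not say how texts are combined), and omits the attribute-selection step, whose method the source does not describe.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_vector(tokens: List[str], vectors: Dict[str, Vector]) -> List[float]:
    """Average the vectors of the known tokens (assumption: averaging represents a multi-word text)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise KeyError("none of the tokens has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def similarities_to_target(
    candidates: List[List[str]], target: List[str], vectors: Dict[str, Vector]
) -> Dict[str, float]:
    """Similarity of each remaining first target text (a token list) to the second target text."""
    target_vec = text_vector(target, vectors)
    return {" ".join(c): cosine_similarity(text_vector(c, vectors), target_vec) for c in candidates}
```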

Preferably, the preprocessing specifically includes filtering of stop words, filtering of punctuation marks and filtering of emoticons.
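A minimal sketch of the three filters, assuming the text has already been segmented into tokens. The stop-word set is a hypothetical placeholder, and emoticon detection is approximated with Unicode categories (`So` and `Sk`, plus the zero-width joiner and variation selector that occur inside emoji sequences), which catches most but not every emoji.

```python
import unicodedata
from typing import Iterable, List, Set

# Hypothetical stop-word list for illustration; a real system would load a curated list.
DEFAULT_STOP_WORDS: Set[str] = {"的", "了", "啊", "the", "a", "of"}

# Zero-width joiner and variation selector-16 often appear inside emoji sequences.
_EMOJI_JOINERS = {"\u200d", "\ufe0f"}


def _is_punctuation(ch: str) -> bool:
    # Unicode punctuation categories all start with "P" (Po, Ps, Pe, Pd, ...).
    return unicodedata.category(ch).startswith("P")


def _is_emoji_like(ch: str) -> bool:
    # Approximation: most emoji are "So" (Symbol, other); skin-tone modifiers are "Sk".
    return unicodedata.category(ch) in ("So", "Sk") or ch in _EMOJI_JOINERS


def preprocess(tokens: Iterable[str], stop_words: Set[str] = DEFAULT_STOP_WORDS) -> List[str]:
    """Drop stop words, then strip punctuation and emoji characters from each remaining token."""
    cleaned = []
    for token in tokens:
        if token in stop_words:
            continue
        kept = "".join(ch for ch in token if not (_is_punctuation(ch) or _is_emoji_like(ch))).strip()
        if kept and kept not in stop_words:
            cleaned.append(kept)
    return cleaned
```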

Another technical solution adopted by the present invention is a copy optimization system, comprising an extraction unit, a preprocessing unit, a receiving unit, a processing unit and a display unit;

the extraction unit is configured to capture the text in the copy to obtain an original text;

the preprocessing unit is configured to process the original text to obtain a plurality of first target texts;

the receiving unit is configured to receive a second target text selected by a user from the plurality of first target texts;

the processing unit comprises a first computing unit and a second computing unit; the first computing unit is configured to calculate the temperature of each hot word in each first target text according to Newton's law of cooling, and the second computing unit is configured to calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model;

the display unit is configured to display the corresponding recommended hot-word texts to the user according to the similarity.
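The unit structure can be sketched as a small class whose units are injected as callables. All interfaces here are hypothetical. The source does not say how the hot-word temperature is combined with the similarity, so this sketch ranks by similarity alone, as the display unit description states, and only carries each temperature along for display.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (candidate text, similarity to the user's selection, hot-word temperature)
Row = Tuple[str, float, float]


@dataclass
class CopyOptimizationSystem:
    """Wires the five units together; each unit is a plain callable with a hypothetical interface."""

    extraction_unit: Callable[[str], str]            # source -> original text
    preprocessing_unit: Callable[[str], List[str]]   # original text -> first target texts
    receiving_unit: Callable[[List[str]], str]       # first target texts -> second target text
    temperature_unit: Callable[[str], float]         # first computing unit
    similarity_unit: Callable[[str, str], float]     # second computing unit: (candidate, selection)
    display_unit: Callable[[List[Row]], None]        # presents the ranked recommendations

    def run(self, source: str) -> List[Row]:
        original = self.extraction_unit(source)
        first_targets = self.preprocessing_unit(original)
        second_target = self.receiving_unit(first_targets)
        remaining = [t for t in first_targets if t != second_target]
        rows = [(t, self.similarity_unit(t, second_target), self.temperature_unit(t)) for t in remaining]
        rows.sort(key=lambda row: row[1], reverse=True)  # highest similarity first
        self.display_unit(rows)
        return rows
```

Injecting callables keeps each unit replaceable, for example swapping a stub similarity function for a trained word2vec model without touching the wiring.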

Preferably, the second computing unit specifically includes:

inputting the feature data and the second target text into the word2vec model for similarity calculation.

With the above technical solution, compared with the prior art, word segmentation is performed on the text in the copy to obtain a plurality of first target texts; the temperature of the hot words in each first target text is calculated with Newton's law of cooling; the similarity between the remaining first target texts and the second target text is calculated with the word2vec model; and the results are sorted by similarity so that the corresponding recommended hot-word texts are presented to the user. The user can optimize his or her own copy through simple replacement operations, which reduces the user's workload and improves working efficiency.

Brief description of the drawings

Fig. 1 is a flowchart of the method of the present invention;

Fig. 2 is a block diagram of the system of the present invention.

Embodiment

To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments. The description here does not imply that all subject matter corresponding to the specific examples in the embodiments is recited in the claims.

Referring to Fig. 1, a copy optimization method comprises the following steps:

S101: capture the text in the copy to obtain an original text;

Specifically, the text in the copy is captured using web crawler technology, which avoids obtaining the corresponding text by manual search and improves the user's efficiency. In practical applications, the text may also be obtained by purchase. The text may be a sentence, a paragraph or a chapter.
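The capture step can be approximated with the Python standard library alone: `urllib.request.urlopen` downloads a page, and an `html.parser.HTMLParser` subclass keeps the visible text nodes. This is a minimal sketch, not the crawler used in the source; it ignores robots.txt, rate limiting and JavaScript-rendered content, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_original_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and extract its text (performs network I/O)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_text(response.read().decode(charset, errors="replace"))
```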

S102: process the original text to obtain a plurality of first target texts;

Specifically, word segmentation is performed on the original text using jieba, which removes some meaningless words. This specifically includes: first performing word segmentation and part-of-speech tagging, and keeping the words with the specified parts of speech as candidate words; then calculating the TF-IDF value of each candidate word, sorting the candidates in descending order of TF-IDF value, and outputting a specified number of words as candidate keywords, which serve as the first target texts.

The term frequency (TF) is calculated as:

$$TF = \frac{N}{M}$$

where $N$ is the number of times the feature item (word) occurs and $M$ is the total number of words in the text;

The inverse document frequency (IDF) is calculated as:

$$IDF = \log\frac{D}{D_w}$$

where $D$ is the total number of texts and $D_w$ is the number of texts in which the keyword occurs.
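The keyword selection of S102 can be sketched directly from these two formulas. jieba itself offers `jieba.posseg.cut` for segmentation with part-of-speech tags (each result pair has `.word` and `.flag`) and `jieba.analyse.extract_tags(sentence, topK=..., allowPOS=...)`, which performs a comparable TF-IDF ranking against its own built-in IDF table. So that it runs without jieba, the stdlib sketch below instead takes already-tagged (word, tag) pairs and a hypothetical reference corpus; the natural logarithm is an assumption, since the source does not state the base.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def tf(count: int, total_words: int) -> float:
    """TF = N / M."""
    return count / total_words


def idf(total_docs: int, docs_with_word: int) -> float:
    """IDF = log(D / D_w); natural log here, the base only rescales the scores."""
    return math.log(total_docs / docs_with_word)


def extract_keywords(
    tagged_words: Sequence[Tuple[str, str]],
    corpus: Sequence[Iterable[str]],
    allowed_pos: Set[str],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Keep words with an allowed POS tag, score them by TF-IDF, return the top_n in descending order."""
    total_words = len(tagged_words)  # M counts every word in the text
    counts = Counter(word for word, pos in tagged_words if pos in allowed_pos)
    doc_sets = [set(doc) for doc in corpus]
    scores = []
    for word, n in counts.items():
        d_w = sum(1 for doc in doc_sets if word in doc)
        # Guard for words missing from the corpus (an assumption, not part of the source formula).
        scores.append((word, tf(n, total_words) * idf(len(doc_sets), max(d_w, 1))))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]
```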

S103: receive the second target text selected by the user from the plurality of first target texts;

Specifically, the user selects the word that needs to be optimized, which makes the operation convenient for the user.

S104: calculate the temperature of each hot word in each first target text according to Newton's law of cooling;

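Specifically, the temperature is calculated using the formula $T'(t) = -k\,\bigl(T(t) - H\bigr)$, with the symbols defined as above;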

S105: calculate the similarity between the remaining first target texts and the second target text according to the preset word2vec model;

Specifically, the remaining first target texts are preprocessed and then input into the word2vec model for training to obtain multidimensional word vectors;

the feature data and the second target text are input into the word2vec model for similarity calculation.

S106: display the corresponding recommended hot-word texts to the user according to the similarity.

Further, the preprocessing specifically includes filtering of stop words, filtering of punctuation marks and filtering of emoticons.

With this solution, word segmentation is performed on the text in the copy to obtain a plurality of first target texts; the temperature of the hot words in each first target text is calculated with Newton's law of cooling, and the similarity between the remaining first target texts and the second target text is calculated with the word2vec model; the candidates are sorted by similarity and the corresponding recommended hot-word texts are presented to the user. The user can optimize his or her own copy through simple replacement operations, which reduces the user's workload and improves working efficiency.

For example, word segmentation of the original text "I will learn to forget you" yields the first target texts "I will", "learn" and "forget". The user then selects "I will" as the text to be adjusted, so "I will" becomes the second target text. The calculation then produces recommended hot-word texts such as "we", "you", "you" (polite form) and "think", and the user can select one of them as a replacement. The original text is thus optimized according to the user's own needs, which reduces the user's workload and improves working efficiency.

Referring to Fig. 2, a copy optimization system comprises an extraction unit, a preprocessing unit, a receiving unit, a processing unit and a display unit;

Specifically, the corresponding recommended hot words can be displayed in descending order, and the user can select a recommended hot word as needed to replace the original wording, thereby optimizing the copy.
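Ranking and replacement can be sketched as follows. The similarity scores passed in would come from the second computing unit; the function names are hypothetical, and replacing only the first occurrence of the selected text is a simplifying assumption (an editor would normally replace at the position the user picked).

```python
from typing import Dict, List, Tuple


def recommend(similarities: Dict[str, float], top_n: int = 5) -> List[Tuple[str, float]]:
    """Return recommended hot words sorted by similarity, highest first."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)[:top_n]


def show(recommendations: List[Tuple[str, float]]) -> None:
    """Print the ranked list the user chooses from."""
    for rank, (word, score) in enumerate(recommendations, start=1):
        print(f"{rank}. {word} ({score:.3f})")


def apply_replacement(original_text: str, selected: str, replacement: str) -> str:
    """Replace the first occurrence of the user-selected text with the chosen hot word."""
    if selected not in original_text:
        raise ValueError(f"{selected!r} does not occur in the original text")
    return original_text.replace(selected, replacement, 1)
```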

Further, to calculate the similarity accurately and obtain recommended hot-word texts with high relevance, the second computing unit specifically includes:

Finally, it should be noted that the foregoing describes preferred embodiments of the present invention. Inspired by the present invention, a person of ordinary skill in the art can make many variations without departing from the inventive concept and the claims, and all such variations fall within the protection scope of the present invention.

Claims

1. A copy optimization method, characterized by comprising the following steps:

capturing the text in the copy to obtain an original text;

processing the original text to obtain a plurality of first target texts;

calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model;

2. The copy optimization method according to claim 1, characterized in that the text in the copy is captured using web crawler technology.

3. The copy optimization method according to claim 1, characterized in that word segmentation is performed on the original text using jieba.

4. The copy optimization method according to claim 1, characterized in that the temperature of each hot word in each target text is calculated using the formula:

$$T'(t) = -k\,\bigl(T(t) - H\bigr)$$

where $T'(t)$ is the rate of temperature change, the negative sign indicates cooling, $k > 0$ is the cooling coefficient, $T(t)$ is the temperature $T$ as a function of time $t$, and $H$ is the room (ambient) temperature.

5. The copy optimization method according to claim 1, characterized in that calculating the similarity between each target text and the second target text according to the preset word2vec model specifically includes:

preprocessing the remaining first target texts and inputting them into the word2vec model for training to obtain multidimensional word vectors;

inputting the feature data and the second target text selected by the user into the word2vec model for similarity calculation.

6. The copy optimization method according to claim 5, characterized in that the preprocessing specifically includes filtering of stop words, filtering of punctuation marks and filtering of emoticons.

7. A copy optimization system, characterized by comprising an extraction unit, a preprocessing unit, a receiving unit, a processing unit and a display unit;

the processing unit comprises a first computing unit and a second computing unit; the first computing unit is configured to calculate the temperature of each hot word in each first target text according to Newton's law of cooling, and the second computing unit is configured to calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model;

8. The copy optimization system according to claim 7, characterized in that the second computing unit specifically includes:

9. The copy optimization system according to claim 8, characterized in that the preprocessing specifically includes filtering of stop words, filtering of punctuation marks and filtering of emoticons.