CN107688621A - Method and system for optimizing copywriting - Google Patents

Info

Publication number
CN107688621A
Authority
CN
China
Prior art keywords
text
copywriting
user
similarity
Legal status
Granted
Application number
CN201710698292.4A
Other languages
Chinese (zh)
Other versions
CN107688621B (en)
Inventor
刘月明
梁岚
李舰
Current Assignee
Ai Mu (shanghai) Culture Media Co Ltd
Original Assignee
Ai Mu (shanghai) Culture Media Co Ltd
Priority date
2017-08-15
Filing date
2017-08-15
Publication date
2018-02-13
Application filed by Ai Mu (shanghai) Culture Media Co Ltd
Priority to CN201710698292.4A
Publication of CN107688621A
Application granted
Publication of CN107688621B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a copywriting optimization method and system. The method comprises the following steps: capturing the text in a piece of copy to obtain an original text; processing the original text to obtain a plurality of first target texts; receiving a second target text selected by a user from the plurality of first target texts; calculating the temperature (heat) of each hot word in each first target text according to Newton's law of cooling; calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model; and displaying corresponding hot-word recommendation texts to the user according to the level of similarity. Effect: the user can optimize their own copy through a simple replacement operation, which reduces the user's workload and improves the user's efficiency.

Description

Method and system for optimizing copywriting
Technical field
The invention belongs to the field of computer text information technology, and more particularly relates to a copywriting optimization method and system.
Background
A good piece of copy must consider both its literary quality and its emotional expression. A passage, a sentence, or even a single word can make the audience empathize or feel favorably, and that is what makes copy good. New words and hot words emerge every day, and any of them may become the highlight of a piece of copy. Traditional copy optimization relies essentially on manual collection, or on searching for recent hot topics and popular words with a search engine. This is labor-intensive and inefficient, and it cannot meet creators' needs.
Summary of the invention
To solve the above problems, the present invention provides a copywriting optimization method and system that overcome the heavy workload and low efficiency of the prior art.
One technical solution adopted by the present invention is a copywriting optimization method, comprising the following steps:
Capturing the text in the copy to obtain an original text;
Processing the original text to obtain a plurality of first target texts;
Receiving a second target text selected by the user from the plurality of first target texts;
Calculating the temperature (heat) of each hot word in each first target text according to Newton's law of cooling;
Calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model;
Displaying corresponding hot-word recommendation texts to the user according to the level of similarity.
Preferably, the text in the copy is captured using web-crawler technology.
Preferably, word segmentation of the original text is performed using jieba.
Preferably, the temperature of each hot word in each target text is calculated using the formula:
$$T'(t) = -k\,\bigl(T(t) - H\bigr)$$
where $T'(t)$ is the rate of temperature change, the negative sign indicates cooling, $k > 0$ is the cooling coefficient, $T(t)$ is the temperature $T$ as a function of time $t$, and $H$ is the room temperature.
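The cooling equation above is a first-order linear ODE, so it can be integrated in closed form. The derivation below is a worked sketch. It assumes an initial temperature $T_0 = T(0)$ at the moment a hot word is first observed. $T_0$ is not named in the original.

```latex
\begin{aligned}
\frac{d}{dt}\bigl(T(t) - H\bigr) &= -k\,\bigl(T(t) - H\bigr) \\
T(t) - H &= (T_0 - H)\,e^{-kt} \\
T(t) &= H + (T_0 - H)\,e^{-kt}
\end{aligned}
```

As $t \to \infty$, $T(t) \to H$. The excess $T(t) - H$ halves every $\ln 2 / k$ time units, so $k$ directly controls how quickly a hot word goes stale.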
Preferably, calculating the similarity between the remaining first target texts and the second target text according to the preset word2vec model specifically comprises:
Preprocessing the remaining first target texts and inputting them into the word2vec model for training, to obtain multidimensional word vectors;
Performing attribute selection on the multidimensional word vectors to obtain the corresponding feature data;
Inputting the feature data and the second target text selected by the user into the word2vec model to calculate the similarity.
Preferably, the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
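The preprocessing step is only named, not specified. A minimal Python sketch might filter the three categories as follows. It assumes the text has already been split into tokens. The stop-word set and the emoji code-point ranges are hypothetical placeholders, not the patent's own lists.

```python
import re
import unicodedata

# Hypothetical stop-word list; the patent does not publish its list.
STOP_WORDS = {"的", "了", "是", "the", "a", "of"}

# Assumed emoji/emoticon ranges (pictographs, dingbats); not exhaustive.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def is_punctuation(token: str) -> bool:
    """True if every character is in a Unicode punctuation category (P*)."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def preprocess(tokens: list[str]) -> list[str]:
    """Drop stop words, punctuation-only tokens, and emoji from a token list."""
    kept = []
    for tok in tokens:
        tok = EMOJI_PATTERN.sub("", tok).strip()  # strip emoji inside a token
        if not tok or tok in STOP_WORDS or is_punctuation(tok):
            continue
        kept.append(tok)
    return kept


print(preprocess(["我", "的", "，", "喜欢", "😀", "!", "copy"]))
# ['我', '喜欢', 'copy']
```

Full-width Chinese punctuation such as "，" falls in the Unicode `Po` category, so the same check covers both Chinese and ASCII punctuation.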
Another technical solution adopted by the present invention is a copywriting optimization system, comprising an extraction unit, a preprocessing unit, a receiving unit, a processing unit, and a display unit;
The extraction unit is configured to capture the text in the copy to obtain an original text;
The preprocessing unit is configured to process the original text to obtain a plurality of first target texts;
The receiving unit is configured to receive a second target text selected by the user from the plurality of first target texts;
The processing unit comprises a first computing unit and a second computing unit. The first computing unit is configured to calculate the temperature of each hot word in each first target text according to Newton's law of cooling, and the second computing unit is configured to calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model;
The display unit is configured to display corresponding hot-word recommendation texts to the user according to the level of similarity.
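The patent describes the system only at block-diagram level. The Python sketch below wires the five units together as injected callables. Several details are assumptions:

- the class name `CopyOptimizationSystem` and its field names;
- the ranking rule, which ranks by similarity first and uses temperature as the tie-breaker. The original does not say how the two scores are combined.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CopyOptimizationSystem:
    """Hypothetical wiring of the units described above (names are illustrative)."""

    extract: Callable[[str], str]            # extraction unit: source -> original text
    preprocess: Callable[[str], list[str]]   # preprocessing unit: text -> first target texts
    heat: Callable[[str], float]             # first computing unit: hot word -> temperature
    similarity: Callable[[str, str], float]  # second computing unit: (word, target) -> score

    def recommend(self, source: str, chosen: str,
                  hot_words: list[str], top_n: int = 3) -> list[str]:
        first_targets = self.preprocess(self.extract(source))
        # Receiving unit: the user's pick must be one of the first target texts.
        if chosen not in first_targets:
            raise ValueError(f"{chosen!r} is not a first target text")
        # Assumption: rank by similarity, break ties by temperature.
        ranked = sorted(
            (w for w in hot_words if w != chosen),
            key=lambda w: (self.similarity(w, chosen), self.heat(w)),
            reverse=True,
        )
        # Display unit: return the top-N recommendations, best first.
        return ranked[:top_n]


# Toy usage with made-up scores standing in for the real models.
system = CopyOptimizationSystem(
    extract=lambda src: src,
    preprocess=str.split,
    heat={"we": 5.0, "you": 9.0, "think": 2.0}.get,
    similarity=lambda w, t: {"we": 0.8, "you": 0.8, "think": 0.3}[w],
)
print(system.recommend("I will learn", "will", ["we", "you", "think"], top_n=2))
# ['you', 'we']
```

Injecting callables keeps each unit swappable. For example, a jieba-based segmenter or a trained word2vec model can replace the toy lambdas without changing the orchestration.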
Preferably, the second computing unit specifically performs:
Preprocessing the remaining first target texts and inputting them into the word2vec model for training, to obtain multidimensional word vectors;
Performing attribute selection on the multidimensional word vectors to obtain the corresponding feature data;
Inputting the feature data and the second target text into the word2vec model to calculate the similarity.
Preferably, the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
Compared with the prior art, the above technical solution works as follows:
- it performs word segmentation on the text in the copy to obtain a plurality of first target texts;
- it calculates the temperature of the hot words in each first target text using Newton's law of cooling;
- it calculates the similarity between the remaining first target texts and the second target text using a word2vec model;
- it sorts the results by similarity and recommends the corresponding hot-word recommendation texts to the user.

The user can then optimize their own copy through a simple replacement operation, which reduces the user's workload and improves the user's efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a block diagram of the system of the present invention.
Detailed description of embodiments
To make the technical problem, technical solution, and advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments. The description here does not imply that every subject corresponding to the specific examples in the embodiments is recited in the claims.
Referring to Fig. 1, a copywriting optimization method comprises the following steps:
S101: Capture the text in the copy to obtain the original text.
Specifically, the text in the copy is captured using web-crawler technology, which avoids having to find the text by manual searching and improves the user's efficiency. In practice, the text can also be obtained by purchase. The text can be a sentence, a paragraph, or a chapter.
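The crawler itself is not described. As a hedged sketch, Python's standard `html.parser` module can reduce a fetched page to its visible text. `TextExtractor`, `html_to_text`, and `fetch_original_text` are hypothetical names. A production crawler would also need rate limiting, robots.txt handling, and deduplication.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_original_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


page = "<html><body><p>Hello</p><script>var x;</script><p>world</p></body></html>"
print(html_to_text(page))  # prints "Hello" and "world" on separate lines
```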
S102: Process the original text to obtain a plurality of first target texts.
Specifically, the original text is segmented into words using jieba, and this processing removes some meaningless words. The procedure is as follows:
- Perform word segmentation and part-of-speech tagging, and take the words with the specified parts of speech as candidate words.
- Calculate the TF-IDF value of each candidate word and sort the candidates in descending order of TF-IDF value.
- Output a specified number of words as possible keywords. These keywords serve as the first target texts.
The term frequency TF (Term Frequency) is calculated as:
$$\mathrm{TF} = \frac{N}{M}$$
where $N$ is the number of times the feature term occurs and $M$ is the number of words in the text.
The inverse document frequency IDF (Inverse Document Frequency) is calculated as:
$$\mathrm{IDF} = \log\frac{D}{D_w}$$
where $D$ is the total number of texts and $D_w$ is the number of texts in which the keyword occurs.
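The keyword-extraction step in S102 can be sketched in pure Python using the TF and IDF definitions above. The sketch makes three assumptions:

- it uses the natural logarithm;
- the document being scored is part of the corpus, so $D_w \ge 1$;
- the part-of-speech filter is replaced by an optional allow-list.

In practice, jieba ships a comparable routine, `jieba.analyse.extract_tags(sentence, topK=..., allowPOS=...)`, which uses its own built-in IDF table.

```python
import math
from collections import Counter


def tf(term: str, doc: list[str]) -> float:
    """TF = N / M: occurrences of the term over the number of words in the text."""
    return Counter(doc)[term] / len(doc)


def idf(term: str, corpus: list[list[str]]) -> float:
    """IDF = log(D / Dw): D texts in total, Dw texts containing the term."""
    d_w = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / d_w)  # assumes term occurs in the corpus


def top_keywords(doc, corpus, k=3, allowed=None):
    """Rank candidate words of `doc` by TF-IDF (highest first) and keep k.

    `allowed` optionally restricts candidates; it stands in for the
    part-of-speech filter, which is not reproduced here.
    """
    candidates = {w for w in doc if allowed is None or w in allowed}
    scored = {w: tf(w, doc) * idf(w, corpus) for w in candidates}
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda w: (-scored[w], w))[:k]


corpus = [["hot", "word", "copy", "copy"], ["copy", "text"], ["word", "text"]]
print(top_keywords(corpus[0], corpus, k=2))  # ['hot', 'copy']
```

"hot" wins because it is unique to the first text (high IDF). "copy" edges out "word" because it occurs twice (higher TF) while both have the same IDF.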
S103: Receive the second target text selected by the user from the plurality of first target texts.
Specifically, the user selects the words that need to be optimized, which makes the operation convenient for the user.
S104: Calculate the temperature of each hot word in each first target text according to Newton's law of cooling.
Specifically, the temperature of each hot word in each target text is calculated using the formula:
$$T'(t) = -k\,\bigl(T(t) - H\bigr)$$
where $T'(t)$ is the rate of temperature change, the negative sign indicates cooling, $k > 0$ is the cooling coefficient, $T(t)$ is the temperature $T$ as a function of time $t$, and $H$ is the room temperature.
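Integrating the cooling equation gives $T(t) = H + (T_0 - H)\,e^{-kt}$, where $T_0$ is a word's temperature when it is first observed. The sketch below uses that closed form to rank hot words by their current temperature. The `HotWord` record, the initial temperatures, and the choice of hours as the time unit are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class HotWord:
    text: str
    initial_heat: float  # T0: temperature when first observed (hypothetical input)
    age_hours: float     # t: time elapsed since it was observed


def temperature(t0: float, t: float, k: float, room: float = 0.0) -> float:
    """Closed-form solution of T'(t) = -k (T(t) - H): T(t) = H + (T0 - H) e^(-k t)."""
    if k <= 0:
        raise ValueError("cooling coefficient k must be positive")
    return room + (t0 - room) * math.exp(-k * t)


def rank_by_heat(words: list[HotWord], k: float,
                 room: float = 0.0) -> list[tuple[str, float]]:
    """Return (word, current temperature) pairs, hottest first."""
    scored = [(w.text, temperature(w.initial_heat, w.age_hours, k, room))
              for w in words]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# k chosen so that the excess temperature halves every 24 hours.
k = math.log(2) / 24
words = [HotWord("we", 100, 48), HotWord("think", 60, 0), HotWord("you", 80, 24)]
for text, heat in rank_by_heat(words, k):
    print(f"{text}: {heat:.1f}")
# think: 60.0
# you: 40.0
# we: 25.0
```

A word that was very hot two days ago ("we") can rank below a moderately hot but fresh one ("think"). This is the recency behavior the cooling law is meant to capture.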
S105: Calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model.
Specifically, the remaining first target texts are preprocessed and then input into the word2vec model for training, yielding multidimensional word vectors;
Attribute selection is performed on the multidimensional word vectors to obtain the corresponding feature data;
The feature data and the second target text are input into the word2vec model to calculate the similarity.
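The patent does not define "attribute selection" or the similarity measure. A common choice with word2vec is cosine similarity. For example, gensim's `KeyedVectors.similarity` returns the cosine similarity of two trained word vectors. The standard-library sketch below makes two assumptions for illustration:

- toy vectors stand in for a trained model;
- attribute selection means keeping the highest-variance dimensions.

```python
import math
from statistics import pvariance


def select_dimensions(vectors: dict[str, list[float]], keep: int) -> list[int]:
    """Assumed 'attribute selection': indices of the `keep` highest-variance dimensions."""
    dims = len(next(iter(vectors.values())))
    variances = [pvariance([v[i] for v in vectors.values()]) for i in range(dims)]
    return sorted(range(dims), key=lambda i: variances[i], reverse=True)[:keep]


def project(vec: list[float], dims: list[int]) -> list[float]:
    """Keep only the selected dimensions of a vector."""
    return [vec[i] for i in dims]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarities(vectors: dict[str, list[float]], target: str,
                 keep: int) -> dict[str, float]:
    """Similarity of every other word to the user-selected target word."""
    dims = select_dimensions(vectors, keep)
    t = project(vectors[target], dims)
    return {w: cosine(project(v, dims), t)
            for w, v in vectors.items() if w != target}


# Toy 3-d vectors; the third dimension is constant, so it carries no information.
vectors = {
    "I will": [0.9, 0.1, 0.5],
    "we": [0.8, 0.2, 0.5],
    "forget": [0.1, 0.9, 0.5],
}
print(similarities(vectors, "I will", keep=2))  # "we" scores far higher than "forget"
```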
S106: Display the corresponding hot-word recommendation texts to the user according to the level of similarity.
Further, the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
With the above scheme, the text in the copy is segmented into words to obtain a plurality of first target texts. The temperature of the hot words in each first target text is calculated using Newton's law of cooling, and the similarity between the remaining first target texts and the second target text is calculated using a word2vec model. The results are sorted by similarity, and the corresponding hot-word recommendation texts are recommended to the user. The user can optimize their own copy through a simple replacement operation, which reduces the user's workload and improves the user's efficiency.
For example, word segmentation of the original text "I will learn to forget you" yields the first target texts "I will", "learn", and "forget". The user then selects "I will" as the text to be adjusted, so "I will" becomes the second target text. The calculation then yields corresponding hot-word recommendation texts such as "we", "you", "you" (polite form), and "think". The user can select one of these hot words as a replacement, optimizing the original text as needed, which reduces the user's workload and improves the user's efficiency.
Referring to Fig. 2, a copywriting optimization system comprises an extraction unit, a preprocessing unit, a receiving unit, a processing unit, and a display unit;
The extraction unit is configured to capture the text in the copy to obtain an original text;
The preprocessing unit is configured to process the original text to obtain a plurality of first target texts;
The receiving unit is configured to receive a second target text selected by the user from the plurality of first target texts;
The processing unit comprises a first computing unit and a second computing unit. The first computing unit is configured to calculate the temperature of each hot word in each first target text according to Newton's law of cooling, and the second computing unit is configured to calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model;
The display unit is configured to display corresponding hot-word recommendation texts to the user according to the level of similarity.
Specifically, the recommended hot words can be displayed in descending order. The user can select a recommended hot word as needed for replacement, thereby optimizing the copy.
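Display and replacement can be sketched as follows. Candidates are sorted by similarity in descending order and numbered for display, and the user's chosen hot word then replaces the selected text. The function names and the first-occurrence replacement rule are assumptions, not taken from the patent.

```python
def format_recommendations(scores: dict[str, float], top_n: int = 5) -> list[str]:
    """Sort hot-word candidates by similarity (highest first) and number them."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{rank}. {word} ({score:.2f})"
            for rank, (word, score) in enumerate(ranked[:top_n], start=1)]


def apply_replacement(text: str, target: str, replacement: str) -> str:
    """Replace the first occurrence of the user's selected text with the chosen hot word."""
    if target not in text:
        raise ValueError(f"{target!r} does not occur in the text")
    return text.replace(target, replacement, 1)


scores = {"we": 0.91, "you": 0.87, "think": 0.42}  # made-up similarity scores
for line in format_recommendations(scores, top_n=2):
    print(line)
# 1. we (0.91)
# 2. you (0.87)
print(apply_replacement("I will learn to forget you", "I will", "we"))
# we learn to forget you
```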
Further, to calculate the similarity accurately and obtain highly relevant hot-word recommendation texts, the second computing unit specifically performs:
Preprocessing the remaining first target texts and inputting them into the word2vec model for training, to obtain multidimensional word vectors;
Performing attribute selection on the multidimensional word vectors to obtain the corresponding feature data;
Inputting the feature data and the second target text selected by the user into the word2vec model to calculate the similarity.
Further, the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
Finally, it should be noted that the foregoing describes preferred embodiments of the present invention. Guided by the present invention, a person of ordinary skill in the art may make many variations without departing from the inventive concept and the claims, and all such variations fall within the scope of protection of the present invention.

Claims (9)

1. A copywriting optimization method, characterized by comprising the following steps:
Capturing the text in the copy to obtain an original text;
Processing the original text to obtain a plurality of first target texts;
Receiving a second target text selected by a user from the plurality of first target texts;
Calculating the temperature of each hot word in each first target text according to Newton's law of cooling;
Calculating the similarity between the remaining first target texts and the second target text according to a preset word2vec model;
Displaying corresponding hot-word recommendation texts to the user according to the level of similarity.
2. The copywriting optimization method according to claim 1, characterized in that the text in the copy is captured using web-crawler technology.
3. The copywriting optimization method according to claim 1, characterized in that word segmentation of the original text is performed using jieba.
4. The copywriting optimization method according to claim 1, characterized in that the temperature of each hot word in each target text is calculated using the formula:
$$T'(t) = -k\,\bigl(T(t) - H\bigr)$$
where $T'(t)$ is the rate of temperature change, the negative sign indicates cooling, $k > 0$ is the cooling coefficient, $T(t)$ is the temperature $T$ as a function of time $t$, and $H$ is the room temperature.
5. The copywriting optimization method according to claim 1, characterized in that calculating the similarity between each target text and the second target text according to the preset word2vec model specifically comprises:
Preprocessing the remaining first target texts and inputting them into the word2vec model for training, to obtain multidimensional word vectors;
Performing attribute selection on the multidimensional word vectors to obtain corresponding feature data;
Inputting the feature data and the second target text selected by the user into the word2vec model to calculate the similarity.
6. The copywriting optimization method according to claim 5, characterized in that the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
7. A copywriting optimization system, characterized by comprising an extraction unit, a preprocessing unit, a receiving unit, a processing unit, and a display unit;
The extraction unit is configured to capture the text in the copy to obtain an original text;
The preprocessing unit is configured to process the original text to obtain a plurality of first target texts;
The receiving unit is configured to receive a second target text selected by a user from the plurality of first target texts;
The processing unit comprises a first computing unit and a second computing unit. The first computing unit is configured to calculate the temperature of each hot word in each first target text according to Newton's law of cooling, and the second computing unit is configured to calculate the similarity between the remaining first target texts and the second target text according to a preset word2vec model;
The display unit is configured to display corresponding hot-word recommendation texts to the user according to the level of similarity.
8. The copywriting optimization system according to claim 7, characterized in that the second computing unit specifically performs:
Preprocessing the remaining first target texts and inputting them into the word2vec model for training, to obtain multidimensional word vectors;
Performing attribute selection on the multidimensional word vectors to obtain corresponding feature data;
Inputting the feature data and the second target text selected by the user into the word2vec model to calculate the similarity.
9. The copywriting optimization system according to claim 8, characterized in that the preprocessing specifically comprises filtering out stop words, punctuation marks, and emoticons.
CN201710698292.4A 2017-08-15 2017-08-15 Method and system for optimizing file Active CN107688621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710698292.4A CN107688621B (en) 2017-08-15 2017-08-15 Method and system for optimizing file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710698292.4A CN107688621B (en) 2017-08-15 2017-08-15 Method and system for optimizing file

Publications (2)

Publication Number Publication Date
CN107688621A true CN107688621A (en) 2018-02-13
CN107688621B CN107688621B (en) 2021-06-29

Family

ID=61153398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710698292.4A Active CN107688621B (en) 2017-08-15 2017-08-15 Method and system for optimizing file

Country Status (1)

Country Link
CN (1) CN107688621B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100217756A1 (en) * 2005-08-10 2010-08-26 Google Inc. Programmable Search Engine
US8312022B2 (en) * 2008-03-21 2012-11-13 Ramp Holdings, Inc. Search engine optimization
CN103377232A (en) * 2012-04-25 2013-10-30 阿里巴巴集团控股有限公司 Headline keyword recommendation method and system
CN104636334A (en) * 2013-11-06 2015-05-20 阿里巴巴集团控股有限公司 Keyword recommending method and device
CN106649536A (en) * 2016-11-01 2017-05-10 四川用联信息技术有限公司 Achievement of optimization of search engine keywords based on improved k Means algorithm

Non-Patent Citations (2)

Title
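王庆林 et al.: "Ontology Handbook for the Climate Change Domain" (《气候变化领域本体手册》), 31 May 2015, Beijing Institute of Technology Press *
耿升华: "Research on New Word Recognition and Hot Word Ranking Methods" (新词识别和热词排名方法研究), China Master's Theses Full-text Database, Information Science and Technology series *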

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN108509417A (en) * 2018-03-20 2018-09-07 腾讯科技(深圳)有限公司 Title generation method and equipment, storage medium, server
CN111340551A (en) * 2020-02-27 2020-06-26 广东博智林机器人有限公司 Method, device, terminal and storage medium for generating advertisement content
CN112015975A (en) * 2020-07-15 2020-12-01 北京淇瑀信息科技有限公司 Financial user-oriented information pushing method and device based on Newton's cooling law
CN112015975B (en) * 2020-07-15 2023-11-14 北京淇瑀信息科技有限公司 Information pushing method and device for financial users based on Newton's law of cooling
CN113572753A (en) * 2021-07-16 2021-10-29 北京淇瑀信息科技有限公司 User equipment authentication method and device based on Newton's cooling law
CN113572753B (en) * 2021-07-16 2023-03-14 北京淇瑀信息科技有限公司 User equipment authentication method and device based on Newton's cooling law

Also Published As

Publication number Publication date
CN107688621B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
WO2018205838A1 (en) Method and apparatus for retrieving similar video, and storage medium
CN108009228B (en) Method and device for setting content label and storage medium
CN110442777B (en) BERT-based pseudo-correlation feedback model information retrieval method and system
CN107180045B (en) Method for extracting geographic entity relation contained in internet text
CN102253982B (en) Query suggestion method based on query semantics and click-through data
WO2021218322A1 (en) Paragraph search method and apparatus, and electronic device and storage medium
WO2017024553A1 (en) Information emotion analysis method and system
CN107688621A (en) The optimization method and system of a kind of official documents and correspondence
CN105139211B (en) Product brief introduction generation method and system
US9818080B2 (en) Categorizing a use scenario of a product
CN110134792B (en) Text recognition method and device, electronic equipment and storage medium
CN103377258A (en) Method and device for classification display of microblog information
US20140379719A1 (en) System and method for tagging and searching documents
CN109710935A (en) A kind of museum guiding based on historical relic knowledge mapping and knowledge recommendation method
CN109325146A (en) A kind of video recommendation method, device, storage medium and server
US20090119283A1 (en) System and Method of Improving and Enhancing Electronic File Searching
US20150205860A1 (en) Information retrieval device, information retrieval method, and information retrieval program
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
CN106326210B (en) A kind of associated detecting method and device of text topic and emotion
US9928466B1 (en) Approaches for annotating phrases in search queries
CN115203421A (en) Method, device and equipment for generating label of long text and storage medium
TW202001621A (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN103927339A (en) System and method for reorganizing knowledge
CN109902286B (en) Entity identification method and device and electronic equipment
CN103020141A (en) Method and equipment for providing searching results

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant