CN107977472A - Method for automatically generating house property news articles - Google Patents

Method for automatically generating house property news articles

Publication number
CN107977472A
Authority
CN
China
Prior art keywords
character
data set
article
automatically generates
house property
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711443090.1A
Other languages
Chinese (zh)
Other versions
CN107977472B (en)
Inventor
Bai Junfeng (白峻峰)
Zhang Wenzhan (张文战)
Liu Ziyao (刘子曜)
Su Weijie (苏伟杰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuge Qihang Suzhou Technology Co ltd
Original Assignee
Beijing Zhuge Zhaofang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuge Zhaofang Information Technology Co Ltd
Priority to CN201711443090.1A
Publication of CN107977472A
Application granted
Publication of CN107977472B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for automatically generating real estate news articles, comprising the following steps. Step (1): prepare a data set; the collected content comprises three parts (article title, summary, and body text), which are assembled into sentences after collection. Step (2): perform model design and training on the data set from step (1). Step (3): once the model training of step (2) is complete, perform text generation. The advantages of the invention are: no manual intervention in the writing is required, which saves manpower; it works efficiently, with a writing efficiency many times that of manual writing; and generation is stable, with coherent context and a controllable topic, avoiding pornographic, reactionary, violent, or terrorist content.

Description

Method for automatically generating house property news articles
Technical field
The present invention relates to a method for automatically generating real estate (house property) news articles.
Background
Publishing news articles not only increases brand exposure but also brings traffic, especially long-tail search engine traffic. However, directly reposting articles gives poor originality, which hurts SEO (search engine optimization), while fully manual original writing is costly. We therefore used artificial intelligence techniques to train a robot that automatically generates news articles for the real estate field, and have successfully applied it to our content publishing system.
Existing news writing mostly takes one of two forms. One is reposting: crawler technology is used to scrape original articles from other sources as quickly as possible and publish them on one's own website. The other is manual original writing, created by human effort. In the field of automatic article generation, existing approaches mainly shuffle and splice text in simple ways; the generated content is not reader-friendly and serves only to attract search engine traffic.
Each of these existing ways of publishing articles has a significant drawback: manual creation consumes considerable manpower; crawler-based reposting has poor originality; and simple shuffling and splicing is unfriendly to readers.
Summary of the invention
To overcome the defects of the prior art, the present invention provides a method for automatically generating real estate news articles. The technical solution of the invention is as follows:
A method for automatically generating real estate news articles, comprising the following steps:
Step (1): Prepare a data set. The collected content comprises three parts (article title, summary, and body text), which are assembled into sentences after collection;
Step (2): Perform model design and training on the data set from step (1);
Step (3): Once the model training of step (2) is complete, perform text generation.
Step (1) specifically includes: word frequency statistics, Arabic numeral processing, and splicing of start and end characters. The word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character. The Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set. The start and end character splicing consists of adding a marker character at the very beginning of each article in the data set to indicate the start of the article, and adding a marker character at the end of each article to indicate the end of the article.
The specific method of step (2) is: build an encoder-decoder model, and train the encoder and the decoder of the model separately:
Encoder: the start character, or the whole character sequence generated up to the current moment, is used as the search condition to search the entire training set, yielding a set of related texts; the top several are taken according to an information retrieval algorithm; the retrieved texts are encoded, and the encoding result serves as the sentences suitable for generation;
Decoder: the sentence generated so far is encoded;
Attention mechanism, with the following specific steps: the encoder output matrix is multiplied by the decoder output matrix to obtain a weight vector; the weight vector is then multiplied by the encoder output matrix to obtain the information for the next character to be predicted; this information passes through one softmax activation layer to obtain the word vector of the finally predicted next character.
The specific method of step (3) is:
(3-1) Input a starting phrase, or randomly select a title word;
(3-2) Search the whole data set according to the starting phrase and take the top several results as the input information of the encoder;
(3-3) Encode the results obtained by the search;
(3-4) Encode the sentence generated so far;
(3-5) Obtain the next character through the attention mechanism;
(3-6) Generate a sentence by beam search until the end marker appears.
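The retrieval in step (3-2) is described only as "an information retrieval algorithm", so the ranking function is left open. The following is a minimal sketch under stated assumptions: the names `overlap_score` and `retrieve_top_k` are hypothetical, texts are tokenized by whitespace, and a plain token-overlap count stands in for whatever scoring the inventors actually used.

```python
from collections import Counter


def overlap_score(query_tokens, doc_tokens):
    """Count how many query tokens also occur in the document (with multiplicity)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(count, d[tok]) for tok, count in q.items())


def retrieve_top_k(query, corpus, k=3):
    """Return the k corpus texts sharing the most tokens with the query.

    Assumption: token overlap is only a stand-in for the unspecified
    information retrieval algorithm mentioned in the text.
    """
    q_tokens = query.split()
    ranked = sorted(
        corpus,
        key=lambda doc: overlap_score(q_tokens, doc.split()),
        reverse=True,
    )
    return ranked[:k]
```

The returned texts would then be passed to the encoder as its input information; a production system would more likely use an inverted index with a weighted ranking function rather than a full scan.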
The advantages of the invention are: no manual intervention in the writing is required, which saves manpower; it works efficiently, with a writing efficiency many times that of manual writing; and generation is stable, with coherent context and a controllable topic, avoiding pornographic, reactionary, violent, or terrorist content.
Detailed description of embodiments
The invention is further described below with reference to specific embodiments, and its advantages and features will become clearer from the description. These embodiments are merely exemplary and do not limit the scope of the invention in any way. Those skilled in the art should understand that the details and form of the technical solution may be modified or substituted without departing from the spirit and scope of the invention, and that such modifications and substitutions all fall within the scope of protection of the invention.
The present invention relates to a method for automatically generating real estate news articles, comprising the following steps:
Step (1): Prepare a data set. The collected content comprises three parts (article title, summary, and body text), which are assembled into sentences after collection;
Step (2): Perform model design and training on the data set from step (1);
Step (3): Once the model training of step (2) is complete, perform text generation.
Step (1) specifically includes: word frequency statistics, Arabic numeral processing, and splicing of start and end characters. The word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character; for example, words with a frequency below 10 can be replaced with <unk>. The Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set; for example, each run of consecutive digits is replaced with <numN>, where the N inside the angle brackets is the number of digits, so 1234 is replaced with <num4> and 342134 is replaced with <num6>. The start and end character splicing consists of adding a marker character at the very beginning of each article in the data set to indicate the start of the article, and adding a marker character at the end of each article to indicate the end of the article: the character "<s>" marks the start of an article, and the character "</s>" appended to the end of each article marks its end.
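The three preprocessing operations of step (1) can be sketched as follows. This is an illustration under assumptions, not the inventors' implementation: the function names are hypothetical, `tokenize` is a whitespace split (a Chinese corpus would need a word segmenter), and digits are marked before frequencies are counted, an ordering the text does not specify.

```python
import re
from collections import Counter

UNK, START, END = "<unk>", "<s>", "</s>"
MIN_FREQ = 10  # threshold from the text: words seen fewer than 10 times are replaced


def mark_numbers(text):
    """Replace each run of consecutive Arabic digits with <numN>, N = digit count."""
    return re.sub(r"[0-9]+", lambda m: "<num%d>" % len(m.group()), text)


def tokenize(text):
    """Hypothetical tokenizer: whitespace split, used here only for illustration."""
    return text.split()


def preprocess(articles, min_freq=MIN_FREQ):
    """Mark digit runs, replace rare words with <unk>, and add <s> ... </s>."""
    tokenized = [tokenize(mark_numbers(a)) for a in articles]
    # Word frequencies are counted over the whole data set, not per article.
    freq = Counter(tok for toks in tokenized for tok in toks)
    return [
        [START] + [t if freq[t] >= min_freq else UNK for t in toks] + [END]
        for toks in tokenized
    ]
```

Because digits are marked first in this sketch, a `<numN>` token that is itself rare is also replaced with `<unk>`; reversing the order, or exempting `<numN>` tokens from the frequency filter, would be an equally plausible reading of the text.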
The specific method of step (2) is: build an encoder-decoder model, and train the encoder and the decoder of the model separately:
Encoder: the start character, or the whole character sequence generated up to the current moment, is used as the search condition to search the entire training set, yielding a set of related texts; the top several are taken according to an information retrieval algorithm; the retrieved texts are encoded, and the encoding result serves as the sentences suitable for generation;
Decoder: the sentence generated so far is encoded;
Attention mechanism, with the following specific steps: the encoder output matrix is multiplied by the decoder output matrix to obtain a weight vector; the weight vector is then multiplied by the encoder output matrix to obtain the information for the next character to be predicted; this information then passes through one softmax activation layer to obtain the word vector of the finally predicted next character.
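The order of operations in the attention mechanism can be illustrated with a small pure-Python sketch. Several details are assumptions rather than statements from the text: the decoder output is taken to be a single state vector, the weights are used unnormalized because the text names only one softmax layer, and `out_proj` is a hypothetical projection from the context vector to vocabulary scores.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(encoder_out, decoder_state, out_proj):
    """One attention step following the order of operations in the text.

    encoder_out:   n x d matrix, one row per encoded retrieved position
    decoder_state: length-d vector encoding the sentence generated so far
    out_proj:      V x d matrix mapping context to vocabulary scores (assumed)
    """
    # Encoder output times decoder output gives one weight per encoder row.
    weights = matvec(encoder_out, decoder_state)
    # Weights times encoder output gives the context ("information") vector.
    context = [
        sum(w * row[j] for w, row in zip(weights, encoder_out))
        for j in range(len(decoder_state))
    ]
    # A softmax layer turns the projected context into next-token probabilities.
    return softmax(matvec(out_proj, context))
```

Common dot-product attention also normalizes the weights with a softmax before forming the context vector; the text does not mention that step, so it is omitted here.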
The specific method of step (3) is:
(3-1) Input a starting phrase, or randomly select a title word;
(3-2) Search the whole data set according to the starting phrase and take the top several results as the input information of the encoder;
(3-3) Encode the results obtained by the search;
(3-4) Encode the sentence generated so far;
(3-5) Obtain the next character through the attention mechanism;
(3-6) Generate a sentence by beam search until the end marker appears.
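Step (3-6) can be sketched as a generic beam search that stops when the end marker "</s>" is produced. The sketch assumes a hypothetical callable `next_token_probs` that stands in for steps (3-2) to (3-5), returning a probability for each candidate next token; the beam width and maximum length are illustrative parameters not given in the text.

```python
import math

END = "</s>"


def beam_search(next_token_probs, start_tokens, beam_width=3, max_len=50):
    """Generate a token sequence with beam search until END appears.

    next_token_probs is a hypothetical callable: given the tokens so far,
    it returns a dict mapping each candidate next token to its probability.
    """
    beams = [(0.0, list(start_tokens))]  # (log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            # Sequences that emit the end marker stop growing.
            (finished if tokens[-1] == END else beams).append((score, tokens))
        if not beams:
            break
    best = max(finished or beams, key=lambda item: item[0])
    return best[1]
```

With a beam width of 1 this reduces to greedy decoding; wider beams trade extra computation for a better chance of finding a higher-probability sentence.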

Claims (4)

1. A method for automatically generating real estate news articles, characterized by comprising the following steps:
Step (1): Prepare a data set. The collected content comprises three parts (article title, summary, and body text), which are assembled into sentences after collection;
Step (2): Perform model design and training on the data set from step (1);
Step (3): Once the model training of step (2) is complete, perform text generation.
2. The method for automatically generating real estate news articles according to claim 1, characterized in that step (1) specifically includes: word frequency statistics, Arabic numeral processing, and splicing of start and end characters; the word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character; the Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set; the start and end character splicing consists of adding a marker character at the very beginning of each article in the data set to indicate the start of the article, and adding a marker character at the end of each article to indicate the end of the article.
3. The method for automatically generating real estate news articles according to claim 1, characterized in that the specific method of step (2) is: build an encoder-decoder model, and train the encoder and the decoder of the model separately:
Encoder: the start character, or the whole character sequence generated up to the current moment, is used as the search condition to search the entire training set, yielding a set of related texts; the top several are taken according to an information retrieval algorithm; the retrieved texts are encoded, and the encoding result serves as the sentences suitable for generation;
Decoder: the sentence generated so far is encoded;
Attention mechanism, with the following specific steps: the encoder output matrix is multiplied by the decoder output matrix to obtain a weight vector; the weight vector is then multiplied by the encoder output matrix to obtain the information for the next character to be predicted; this information passes through one softmax activation layer to obtain the word vector of the finally predicted next character.
4. The method for automatically generating real estate news articles according to claim 1, characterized in that the specific method of step (3) is:
(3-1) Input a starting phrase, or randomly select a title word;
(3-2) Search the whole data set according to the starting phrase and take the top several results as the input information of the encoder;
(3-3) Encode the results obtained by the search;
(3-4) Encode the sentence generated so far;
(3-5) Obtain the next character through the attention mechanism;
(3-6) Generate a sentence by beam search until the end marker appears.
CN201711443090.1A 2017-12-27 2017-12-27 Method for automatically generating house property news articles Active CN107977472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711443090.1A CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711443090.1A CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Publications (2)

Publication Number Publication Date
CN107977472A CN107977472A (en) 2018-05-01
CN107977472B CN107977472B (en) 2021-11-05

Family

ID=62007995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711443090.1A Active CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Country Status (1)

Country Link
CN (1) CN107977472B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1489088A (en) * 2003-08-28 2004-04-14 北京英业科技开发有限公司 Computer article storing and displaying method
US20120053937A1 (en) * 2010-08-31 2012-03-01 International Business Machines Corporation Generalizing text content summary from speech content
CN105320642A (en) * 2014-06-30 2016-02-10 中国科学院声学研究所 Automatic abstract generation method based on concept semantic unit
US20170192956A1 (en) * 2015-12-31 2017-07-06 Google Inc. Generating parse trees of text segments using neural networks
CN105930314A (en) * 2016-04-14 2016-09-07 清华大学 Text summarization generation system and method based on coding-decoding deep neural networks
CN106126507A (en) * 2016-06-22 2016-11-16 哈尔滨工业大学深圳研究生院 A kind of based on character-coded degree of depth nerve interpretation method and system
CN106682387A (en) * 2016-10-26 2017-05-17 百度国际科技(深圳)有限公司 Method and device used for outputting information
CN106649223A (en) * 2016-12-23 2017-05-10 北京文因互联科技有限公司 Financial report automatic generation method based on natural language processing
CN107145482A (en) * 2017-03-28 2017-09-08 百度在线网络技术(北京)有限公司 Article generation method and device, equipment and computer-readable recording medium based on artificial intelligence
CN106980683A (en) * 2017-03-30 2017-07-25 中国科学技术大学苏州研究院 Blog text snippet generation method based on deep learning
CN107193792A (en) * 2017-05-18 2017-09-22 北京百度网讯科技有限公司 The method and apparatus of generation article based on artificial intelligence
CN107391609A (en) * 2017-07-01 2017-11-24 南京理工大学 A kind of Image Description Methods of two-way multi-modal Recursive Networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LOPYREV,K.: "Generating News Headlines with Recurrent Neural Networks", 《ARXIV》 *
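Li Baocheng (李宝程): "Research and Implementation of a Text Summarization Method Based on Shallow Semantic Analysis", 《China Masters' Theses Full-text Database (Electronic Journal)》 *
Li Qingfeng (李庆丰): "Research on Multi-Document Automatic Summarization Based on Topic Models", 《China Masters' Theses Full-text Database (Electronic Journal)》 *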

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555198A (en) * 2018-05-31 2019-12-10 北京百度网讯科技有限公司 method, apparatus, device and computer-readable storage medium for generating article
CN110555198B (en) * 2018-05-31 2023-05-23 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for generating articles
CN109492112A (en) * 2018-10-24 2019-03-19 北京百科康讯科技有限公司 A kind of method of the computer aided writing scientific popular article of knowledge based map

Also Published As

Publication number Publication date
CN107977472B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
Zhang et al. Deconvolutional paragraph representation learning
CN107908671A (en) Knowledge mapping construction method and system based on law data
CN108415977A (en) One is read understanding method based on the production machine of deep neural network and intensified learning
WO2021077974A1 (en) Personalized dialogue content generating method
CN108255805A (en) The analysis of public opinion method and device, storage medium, electronic equipment
CN112463424B (en) Graph-based end-to-end program repairing method
CN109189862A (en) A kind of construction of knowledge base method towards scientific and technological information analysis
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
AU2020244577A1 (en) Slot filling with contextual information
CN112084841A (en) Cross-modal image multi-style subtitle generation method and system
CN106557298A (en) Background towards intelligent robot matches somebody with somebody sound outputting method and device
CN104376842A (en) Neural network language model training method and device and voice recognition method
CN107977472A (en) The method that house property class news article automatically generates
CN115906815B (en) Error correction method and device for modifying one or more types of error sentences
CN111198966B (en) Natural language video clip retrieval method based on multi-agent boundary perception network
CN114118065A (en) Chinese text error correction method and device in electric power field, storage medium and computing equipment
CN114492407A (en) News comment generation method, system, equipment and storage medium
CN108549636A (en) A kind of race written broadcasting live critical sentence abstracting method
CN114490991A (en) Dialog structure perception dialog method and system based on fine-grained local information enhancement
Yuan et al. Controllable video captioning with an exemplar sentence
CN114238652A (en) Industrial fault knowledge map establishing method for end-to-end scene
CN115983274A (en) Noise event extraction method based on two-stage label correction
CN116737922A (en) Tourist online comment fine granularity emotion analysis method and system
CN114691858A (en) Improved UNILM abstract generation method
Tian et al. BEBERT: Efficient and robust binary ensemble BERT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Li Zuochao

Inventor after: Bai Junfeng

Inventor after: Zhang Wenzhan

Inventor after: Liu Ziyao

Inventor after: Su Weijie

Inventor before: Bai Junfeng

Inventor before: Zhang Wenzhan

Inventor before: Liu Ziyao

Inventor before: Su Weijie

GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 20, Building 6, Smart Valley Park, the Taihu Lake Software Industrial Park, No. 1421, Wuzhong Avenue, Yuexi Street, Suzhou Economic Development Zone, Jiangsu Province, 215000

Patentee after: Zhuge Qihang (Suzhou) Technology Co.,Ltd.

Address before: No. 506, Xingang center, No. 16, Jiuxianqiao Road, Jiangtai Township, Chaoyang District, Beijing 100015

Patentee before: BEIJING ZHUGE ZHAOFANG INFORMATION TECHNOLOGY Co.,Ltd.