CN107977472B - Method for automatically generating house property news articles - Google Patents

Info

Publication number
CN107977472B
Authority
CN
China
Prior art keywords
character
data set
article
initial
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711443090.1A
Other languages
Chinese (zh)
Other versions
CN107977472A (en)
Inventor
李作潮
白峻峰
张文战
刘子曜
苏伟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuge Qihang Suzhou Technology Co ltd
Original Assignee
Beijing Zhuge Zhaofang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuge Zhaofang Information Technology Co ltd
Priority to CN201711443090.1A
Publication of CN107977472A
Application granted
Publication of CN107977472B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/186: Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for automatically generating house property news articles, which comprises the following steps: step (1): preparing a data set by collecting three parts, namely article titles, abstracts and body texts, and aggregating the three parts into sentences; step (2): designing and training a model on the data set of step (1); step (3): generating text once the model training of step (2) is finished. The advantages of the invention are: no manual writing is needed, which saves manpower; the work is efficient, with writing efficiency many times that of manual writing; generation is stable, the context is generally controllable, and pornographic, violent and terrorism-related content is avoided.

Description

Method for automatically generating house property news articles
Technical Field
The invention relates to a method for automatically generating a house property news article.
Background
Publishing news articles can increase brand exposure and bring traffic, in particular long-tail traffic from search engines. However, directly reprinting news articles gives poor originality, which hurts SEO (search engine optimization), while relying entirely on manually written original articles is costly. A robot that automatically generates news articles in the real estate field was therefore trained using artificial intelligence technology and successfully applied to our content distribution system.
Existing news writing mostly falls into two types. One is reprinting: original articles from other sources are quickly crawled using crawler technology and published to one's own website. The other is manual creation of original articles. In addition, some articles are generated automatically by splicing text together; the generated content is not reader-friendly and serves only to attract search engines.
The existing ways of publishing articles have the following defects: manual creation consumes manpower; crawler-based reprinting has poor originality; and simple scrambling and splicing is not reader-friendly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for automatically generating a house property news article, and the technical scheme of the invention is as follows:
the method for automatically generating the house property news articles comprises the following steps:
step (1): preparing a data set, collecting three parts including article titles, abstracts and texts, and summarizing the three parts to form sentences;
step (2): carrying out model design and training on the data set in the step (1);
step (3): generating text after the model training of step (2) is finished.
Step (1) specifically comprises: word frequency statistics, Arabic numeral processing, and splicing of a start character and an end character. The word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character. The Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set. The splicing of the start character and the end character consists of adding a marker character at the very front of each article in the data set to indicate the beginning of the article, and adding a marker character after the end of each article to indicate the end of the article.
The specific method of step (2) is as follows: constructing an encoder-decoder model, and training the encoder and the decoder of the model separately:
an encoder: retrieving from the whole training set, using the initial characters or all characters generated up to the current moment as the query, to obtain a number of related texts, and selecting the top several texts according to an information retrieval algorithm; encoding the retrieved information, and taking the encoding result as the sentences suitable for generation;
a decoder: encoding the currently generated sentence;
the specific steps of the attention mechanism are as follows: multiplying the encoder output matrix by the decoder output matrix to obtain a set of weight vectors, multiplying the weight vectors by the encoder output matrix again to obtain the information for predicting the next character, and passing it through a softmax activation function to obtain the final word vector used to predict the next character.
The specific method of the step (3) is as follows:
(3-1) inputting an initial sentence or randomly selecting an initial character;
(3-2) retrieving from the whole data set according to the initial words or sentence, and taking the top several retrieved texts as the input information of the encoder;
(3-3) encoding the retrieval results;
(3-4) encoding the currently generated sentence;
(3-5) obtaining the next character through the attention mechanism;
(3-6) generating the sentence by the beam search method until the end character appears.
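Step (3-2) can be illustrated with a minimal retrieval sketch. The text does not name the information retrieval algorithm, so the token-overlap score below is only an assumed stand-in, and `overlap_score`, `retrieve_top_k` and the parameter `k` are hypothetical names introduced for illustration.

```python
from collections import Counter


def overlap_score(query_tokens, doc_tokens):
    """Count how many query tokens also occur in the document (with multiplicity)."""
    query, doc = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(count, doc[token]) for token, count in query.items())


def retrieve_top_k(query_tokens, corpus, k=3):
    """Return the k tokenized texts in corpus that best match the query.

    Assumption: a simple overlap count stands in for the unspecified
    information retrieval algorithm.
    """
    ranked = sorted(
        corpus,
        key=lambda doc: overlap_score(query_tokens, doc),
        reverse=True,
    )
    return ranked[:k]
```

The texts returned by `retrieve_top_k` would then be passed to the encoder as its input information.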
The advantages of the invention are: no manual writing is needed, which saves manpower; the work is efficient, with writing efficiency many times that of manual writing; generation is stable, the context is generally controllable, and pornographic, violent and terrorism-related content is avoided.
Detailed Description
The invention will be further described with reference to specific embodiments, and the advantages and features of the invention will become apparent as the description proceeds. These examples are illustrative only and do not limit the scope of the present invention in any way. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention, and that such changes and modifications all fall within the scope of protection of the invention.
The invention relates to a method for automatically generating a house property news article, which comprises the following steps:
step (1): preparing a data set, collecting three parts including article titles, abstracts and texts, and summarizing the three parts to form sentences;
step (2): carrying out model design and training on the data set in the step (1);
step (3): generating text after the model training of step (2) is finished.
Step (1) specifically comprises: word frequency statistics, Arabic numeral processing, and splicing of a start character and an end character. The word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character; for example, such words can be replaced with <unk>. The Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set; for example, consecutive Arabic numerals are replaced with <numN>, where N inside the angle brackets is the number of digits: 1234 is replaced with <num4>, and 342134 is replaced with <num6>. The splicing of the start character and the end character consists of adding a marker character at the very front of each article in the data set to indicate the beginning of the article and a marker character after the end of each article to indicate the end of the article; for example, a "<s>" character is added at the front of each article to indicate its beginning, and a "</s>" character is added after its end to indicate its end.
The specific method of step (2) is as follows: constructing an encoder-decoder model, and training the encoder and the decoder of the model separately:
an encoder: retrieving from the whole training set, using the initial characters or all characters generated up to the current moment as the query, to obtain a number of related texts, and selecting the top several texts according to an information retrieval algorithm; encoding the retrieved information, and taking the encoding result as the sentences suitable for generation;
a decoder: encoding the currently generated sentence;
the specific steps of the attention mechanism are as follows: multiplying the encoder output matrix by the decoder output matrix to obtain a set of weight vectors, multiplying the weight vectors by the encoder output matrix again to obtain the information for predicting the next character, and passing it through a softmax activation function to obtain the final word vector used to predict the next character.
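The three preprocessing operations of step (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the whitespace `tokenize` helper is a hypothetical placeholder (a real system for Chinese text would need a word segmenter), and marking numbers before counting word frequencies is an assumed ordering that the text does not specify.

```python
import re
from collections import Counter

UNK, BOS, EOS = "<unk>", "<s>", "</s>"
MIN_FREQ = 10  # words seen fewer than 10 times are replaced, as stated in the text


def mark_numbers(text):
    """Replace each run of Arabic digits with <numN>, where N is its digit count."""
    return re.sub(r"[0-9]+", lambda m: f"<num{len(m.group())}>", text)


def tokenize(text):
    """Hypothetical tokenizer: split on whitespace only."""
    return text.split()


def preprocess(articles, min_freq=MIN_FREQ):
    """Mark numbers, replace rare words with <unk>, and wrap each article in <s>...</s>."""
    tokenized = [tokenize(mark_numbers(article)) for article in articles]
    # Word frequencies are counted over the whole data set, not per article.
    freq = Counter(token for tokens in tokenized for token in tokens)
    return [
        [BOS] + [t if freq[t] >= min_freq else UNK for t in tokens] + [EOS]
        for tokens in tokenized
    ]
```

Note that in this sketch a `<numN>` marker is itself subject to the frequency threshold, so a digit length that is rare in the corpus also becomes `<unk>`; whether the original system exempts such markers is not stated.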
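The attention computation described above can be sketched in plain Python. This follows the wording of the text literally (raw products as weights, softmax only at the end); common dot-product attention additionally normalizes the weights with a softmax. The use of a single decoder state vector rather than a full matrix, and the `output_proj` matrix that maps the attended information onto the vocabulary, are assumptions added so that the sketch yields a distribution over next characters.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(encoder_out, decoder_state, output_proj):
    """Return a probability distribution over the vocabulary for the next character.

    encoder_out:   n x d matrix, one row per encoded source position
    decoder_state: length-d vector encoding the sentence generated so far
    output_proj:   vocab x d matrix (hypothetical; not named in the text)
    """
    # Encoder output times decoder output gives one weight per source position.
    weights = matvec(encoder_out, decoder_state)
    # Weights times the encoder output again gives the information for the next character.
    context = [
        sum(w * row[j] for w, row in zip(weights, encoder_out))
        for j in range(len(decoder_state))
    ]
    # Softmax turns the projected information into next-character probabilities.
    return softmax(matvec(output_proj, context))
```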
The specific method of the step (3) is as follows:
(3-1) inputting an initial sentence or randomly selecting an initial character;
(3-2) retrieving from the whole data set according to the initial words or sentence, and taking the top several retrieved texts as the input information of the encoder;
(3-3) encoding the retrieval results;
(3-4) encoding the currently generated sentence;
(3-5) obtaining the next character through the attention mechanism;
(3-6) generating the sentence by the beam search method until the end character appears.
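The generation loop of step (3) can be sketched as a generic beam search. The function `step_fn` is a hypothetical stand-in for steps (3-2) to (3-5): given the tokens generated so far, it returns the probability of each candidate next token. The names `beam_width` and `max_len` and their default values are assumptions; the text gives no beam size or length limit.

```python
import math


def beam_search(step_fn, start="<s>", end="</s>", beam_width=3, max_len=50):
    """Return the highest-scoring token sequence that begins with start.

    step_fn(tokens) must return a dict mapping each candidate next token to
    its probability; it stands in for retrieval, encoding and attention.
    """
    beams = [(0.0, [start])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            # Hypotheses that emit the end character stop growing.
            (finished if tokens[-1] == end else beams).append((score, tokens))
        if not beams:
            break
    # Fall back to unfinished hypotheses if max_len was reached first.
    best = max(finished or beams, key=lambda c: c[0])
    return best[1]
```

Keeping several hypotheses alive at each step lets the search recover from a locally attractive but globally poor next character, which a purely greedy choice cannot do.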

Claims (1)

1. The method for automatically generating the house property news articles is characterized by comprising the following steps of:
step (1): preparing a data set, collecting three parts including article titles, abstracts and texts, and summarizing the three parts to form sentences;
step (2): carrying out model design and training on the data set in the step (1);
step (3): generating text after the model training of step (2) is finished;
step (1) specifically comprises: word frequency statistics, Arabic numeral processing, and splicing of a start character and an end character; the word frequency statistics consist of counting word frequencies over the corpus of the whole data set and replacing words that occur fewer than 10 times in the data set with a set character; the Arabic numeral processing consists of marking the number of digits of each run of consecutive Arabic numerals in the data set; the splicing of the start character and the end character consists of adding a marker character at the very front of each article in the data set to indicate the beginning of the article, and adding a marker character after the end of each article to indicate the end of the article;
the specific method of step (2) is as follows: constructing an encoder-decoder model, and training the encoder and the decoder of the model separately:
an encoder: retrieving from the whole training set, using the initial characters or all characters generated up to the current moment as the query, to obtain a number of related texts, and selecting the top several texts according to an information retrieval algorithm; encoding the retrieved information, and taking the encoding result as the sentences suitable for generation;
a decoder: encoding the currently generated sentence;
the specific steps of the attention mechanism are as follows: multiplying the encoder output matrix by the decoder output matrix to obtain a set of weight vectors, multiplying the weight vectors by the encoder output matrix again to obtain the information for predicting the next character, and passing it through a softmax activation function to obtain the final word vector used to predict the next character;
the specific method of the step (3) is as follows:
(3-1) inputting an initial sentence or randomly selecting an initial character;
(3-2) retrieving from the whole data set according to the initial words or sentence, and taking the top several retrieved texts as the input information of the encoder;
(3-3) encoding the retrieval results;
(3-4) encoding the currently generated sentence;
(3-5) obtaining the next character through the attention mechanism;
(3-6) generating the sentence by the beam search method until the end character appears.
CN201711443090.1A 2017-12-27 2017-12-27 Method for automatically generating house property news articles Active CN107977472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711443090.1A CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711443090.1A CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Publications (2)

Publication Number Publication Date
CN107977472A (en) 2018-05-01
CN107977472B (en) 2021-11-05

Family

ID=62007995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711443090.1A Active CN107977472B (en) 2017-12-27 2017-12-27 Method for automatically generating house property news articles

Country Status (1)

Country Link
CN (1) CN107977472B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555198B (en) * 2018-05-31 2023-05-23 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for generating articles
CN109492112A (en) * 2018-10-24 2019-03-19 北京百科康讯科技有限公司 A kind of method of the computer aided writing scientific popular article of knowledge based map

Citations (6)

Publication number Priority date Publication date Assignee Title
CN105320642A (en) * 2014-06-30 2016-02-10 中国科学院声学研究所 Automatic abstract generation method based on concept semantic unit
CN105930314A (en) * 2016-04-14 2016-09-07 清华大学 Text summarization generation system and method based on coding-decoding deep neural networks
CN106649223A (en) * 2016-12-23 2017-05-10 北京文因互联科技有限公司 Financial report automatic generation method based on natural language processing
CN106980683A (en) * 2017-03-30 2017-07-25 中国科学技术大学苏州研究院 Blog text snippet generation method based on deep learning
CN107145482A (en) * 2017-03-28 2017-09-08 百度在线网络技术(北京)有限公司 Article generation method and device, equipment and computer-readable recording medium based on artificial intelligence
CN107193792A (en) * 2017-05-18 2017-09-22 北京百度网讯科技有限公司 The method and apparatus of generation article based on artificial intelligence

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN1489088A (en) * 2003-08-28 2004-04-14 北京英业科技开发有限公司 Computer article storing and displaying method
CN102385861B (en) * 2010-08-31 2013-07-31 国际商业机器公司 System and method for generating text content summary from speech content
US10268671B2 (en) * 2015-12-31 2019-04-23 Google Llc Generating parse trees of text segments using neural networks
CN106126507B (en) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 A kind of depth nerve interpretation method and system based on character code
CN106682387A (en) * 2016-10-26 2017-05-17 百度国际科技(深圳)有限公司 Method and device used for outputting information
CN107391609B (en) * 2017-07-01 2020-07-31 南京理工大学 Image description method of bidirectional multi-mode recursive network

Non-Patent Citations (1)

Title
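Research on Multi-Document Automatic Summarization Methods Based on Topic Models; Li Qingfeng (李庆丰); China Master's Theses Full-text Database (Electronic Journal); 20130915 (No. 9); full text *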

Also Published As

Publication number Publication date
CN107977472A (en) 2018-05-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Li Zuochao

Inventor after: Bai Junfeng

Inventor after: Zhang Wenzhan

Inventor after: Liu Ziyao

Inventor after: Su Weijie

Inventor before: Bai Junfeng

Inventor before: Zhang Wenzhan

Inventor before: Liu Ziyao

Inventor before: Su Weijie

GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 20, Building 6, Smart Valley Park, the Taihu Lake Software Industrial Park, No. 1421, Wuzhong Avenue, Yuexi Street, Suzhou Economic Development Zone, Jiangsu Province, 215000

Patentee after: Zhuge Qihang (Suzhou) Technology Co.,Ltd.

Address before: No. 506, Xingang center, No. 16, Jiuxianqiao Road, Jiangtai Township, Chaoyang District, Beijing 100015

Patentee before: BEIJING ZHUGE ZHAOFANG INFORMATION TECHNOLOGY Co.,Ltd.