JP2018147411A5 - Google Patents

Publication number
JP2018147411A5
Authority
JP
Japan
Prior art keywords
sentence
posted
noun
short
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2017044433A
Other languages
Japanese (ja)
Other versions
JP7078244B2 (en)
JP2018147411A (en)
Filing date
2017-03-08
Publication date
2020-02-06
Application filed
Priority to JP2017044433A (JP7078244B2)
Priority claimed from JP2017044433A (JP7078244B2)
Publication of JP2018147411A
Publication of JP2018147411A5
Application granted
Publication of JP7078244B2
Legal status: Active
Anticipated expiration

Links

Description

The data processing method according to the present invention automatically creates a short sentence using one or more data processing devices. The data processing device performs a sentence analysis step, which analyzes each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet and ranks the words contained in the posted sentences by frequency of appearance, and a short sentence creation step, which creates a short sentence about the group of posted sentences based on the rank data of the words.
In this data processing method, the sentence analysis step may comprise a step of decomposing each posted sentence into parts of speech, a step of extracting nouns from the decomposed words, and a step of ranking the extracted nouns by frequency of appearance. The short sentence creation step may then create the short sentence based on the rank data of the nouns.
The short sentence creation step may, for example, comprise a step of selecting, based on the rank data of the words, nouns representing specific content from among the nouns contained in the posted sentences of the group, and a step of creating a short sentence using the selected nouns. In that case, higher-ranked nouns are selected preferentially in the noun selection step.
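The noun selection and short sentence creation described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the category sets `REGION_NOUNS` and `EVENT_NOUNS`, the function names, and the "event in region" template are all assumptions made for illustration (the claims name region and event nouns as the specific content, but the source does not say how such nouns are recognized or how the sentence is assembled).

```python
# Hypothetical category dictionaries (assumption): the source does not say
# how a noun is recognized as expressing a region or an event.
REGION_NOUNS = {"shibuya", "osaka"}
EVENT_NOUNS = {"fire", "accident"}


def select_noun(ranked, category):
    """Return the highest-ranked noun that belongs to the category, or None.

    `ranked` is a list of (noun, count) pairs sorted by descending frequency,
    so the first match is the preferentially selected, higher-ranked noun.
    """
    for noun, _count in ranked:
        if noun in category:
            return noun
    return None


def create_short_sentence(ranked):
    """Build a short sentence from the selected nouns (hypothetical template)."""
    region = select_noun(ranked, REGION_NOUNS)
    event = select_noun(ranked, EVENT_NOUNS)
    if event and region:
        return f"{event.capitalize()} in {region.capitalize()}"
    # Fall back to whichever noun was found, or an empty string if neither.
    return (event or region or "").capitalize()


ranked = [("fire", 3), ("osaka", 2), ("smoke", 2), ("shibuya", 1)]
print(create_short_sentence(ranked))  # Fire in Osaka
```

Because the rank list is scanned from the top, "osaka" is chosen over the lower-ranked "shibuya", which mirrors the preference for higher-ranked nouns.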

[Operation]
Next, the operation of the data processing device 1 of the present embodiment is described, that is, a method of automatically creating a short sentence about a group of posted sentences using the data processing device 1. In the data processing method of the present embodiment, one or more data processing devices 1 perform a sentence analysis step and a short sentence creation step to automatically create a short sentence about the group of posted sentences. FIG. 2 is a flowchart showing the data processing method of the present embodiment, and FIG. 3 is an example of rank data.

[Sentence analysis step]
The sentence analysis step is performed mainly by the article analysis unit 2 of the data processing device 1. It analyzes each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet, and ranks the words contained in the posted sentences by frequency of appearance. The method of extracting and ranking words is not particularly limited. For example, as shown in FIG. 2, each posted sentence is decomposed into parts of speech (step S1), nouns are extracted from the decomposed words (step S2), and the extracted nouns are ranked by frequency of appearance (step S3).
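Steps S1 to S3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method of the embodiment: `TOY_LEXICON` is a hypothetical stand-in for a real morphological analyzer (which Japanese text would require), and the function names are invented for the example. Only `collections.Counter` from the standard library is used.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption) standing in for a real
# part-of-speech analyzer; unknown words are tagged "OTHER".
TOY_LEXICON = {
    "fire": "NOUN", "station": "NOUN", "shibuya": "NOUN", "smoke": "NOUN",
    "see": "VERB", "is": "VERB", "near": "ADP", "big": "ADJ",
}


def decompose(post):
    """Step S1: split one posted sentence into (word, part_of_speech) pairs."""
    words = post.lower().replace(",", " ").replace(".", " ").split()
    return [(word, TOY_LEXICON.get(word, "OTHER")) for word in words]


def extract_nouns(tagged):
    """Step S2: keep only the words tagged as nouns."""
    return [word for word, pos in tagged if pos == "NOUN"]


def rank_nouns(posts):
    """Step S3: count nouns over the whole group and sort by frequency."""
    counts = Counter()
    for post in posts:
        counts.update(extract_nouns(decompose(post)))
    # most_common() returns (noun, count) pairs in descending count order.
    return counts.most_common()


posts = [
    "Big fire near Shibuya station.",
    "Fire in Shibuya, smoke everywhere",
    "I see smoke, fire is big",
]
ranking = rank_nouns(posts)
print(ranking[0])  # ('fire', 3)
```

The resulting list of (noun, count) pairs plays the role of the rank data that the short sentence creation step consumes.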

Claims (13)

1. A data processing device comprising:
a sentence analysis unit that analyzes each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet, and ranks the words contained in the posted sentences by frequency of appearance; and
a short sentence creation unit that creates a short sentence about the group of posted sentences based on the rank data of the words.
2. The data processing device according to claim 1, wherein the sentence analysis unit comprises:
a part-of-speech decomposition unit that decomposes each posted sentence into parts of speech;
a noun extraction unit that extracts nouns from the decomposed words; and
a noun counting unit that ranks the extracted nouns by frequency of appearance,
and wherein the short sentence creation unit creates the short sentence based on the rank data of the nouns.
3. The data processing device according to claim 1 or 2, wherein the short sentence creation unit comprises:
a word selection unit that selects, based on the rank data of the words, nouns representing specific content from among the nouns contained in the posted sentences of the group of posted sentences; and
a short sentence generation unit that creates a short sentence using the nouns selected by the word selection unit,
and wherein the word selection unit preferentially selects higher-ranked nouns.
4. The data processing device according to claim 3, wherein the nouns representing the specific content are a noun representing a region and a noun representing an event.
5. The data processing device according to any one of claims 1 to 4, further comprising a rank data storage unit that stores the rank data of the words.
6. The data processing device according to any one of claims 1 to 5, wherein the short sentence is a title or a headline.
7. A data processing method for automatically creating a short sentence using one or more data processing devices, wherein the data processing device performs:
a sentence analysis step of analyzing each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet, and ranking the words contained in the posted sentences by frequency of appearance; and
a short sentence creation step of creating a short sentence about the group of posted sentences based on the rank data of the words.
8. The data processing method according to claim 7, wherein the sentence analysis step comprises:
a step of decomposing each posted sentence into parts of speech;
a step of extracting nouns from the decomposed words; and
a step of ranking the extracted nouns by frequency of appearance,
and wherein the short sentence creation step creates the short sentence based on the rank data of the nouns.
9. The data processing method according to claim 7 or 8, wherein the short sentence creation step comprises:
a step of selecting, based on the rank data of the words, nouns representing specific content from among the nouns contained in the posted sentences of the group of posted sentences; and
a step of creating a short sentence using the selected nouns,
and wherein, in the step of selecting the nouns, higher-ranked nouns are selected preferentially.
10. A data processing system comprising:
a sentence analysis device that analyzes each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet, and ranks the words contained in the posted sentences by frequency of appearance; and
a short sentence creation device that creates a short sentence about the group of posted sentences based on the rank data of the words.
11. The data processing system according to claim 10, further comprising a posted sentence classification device that classifies a plurality of posted sentences collected via the Internet and creates a group of posted sentences for each event,
wherein the sentence analysis device analyzes the group of posted sentences created by the posted sentence classification device.
12. The data processing system according to claim 10 or 11, further comprising a distribution device that adds the data of the short sentence created by the short sentence creation device to the data of the group of posted sentences and distributes it externally.
13. A program that causes a computer to execute:
a sentence analysis function of analyzing each posted sentence in a group of posted sentences about an arbitrary event collected via the Internet, and ranking the words contained in the posted sentences by frequency of appearance; and
a short sentence creation function of automatically creating a short sentence about the group of posted sentences based on the rank data of the words.
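The system-level flow in the system claims (classify posts by event, analyze each group, create a short sentence, attach it to the group data for distribution) can be sketched as below. This is only a sketch under assumptions: the keyword-matching rule in `classify_posts`, the dictionary layout of the deliverable, and all function names are hypothetical, and the analysis and creation stages are passed in as callables rather than implemented here.

```python
from collections import defaultdict


def classify_posts(posts, event_keywords):
    """Group posts by the first event keyword they contain.

    Keyword matching is a hypothetical classification rule (assumption);
    the source does not specify how posts are assigned to events.
    """
    groups = defaultdict(list)
    for post in posts:
        text = post.lower()
        for event in event_keywords:
            if event in text:
                groups[event].append(post)
                break  # a post joins at most one group in this sketch
    return dict(groups)


def build_deliverables(groups, analyze, create):
    """Attach a short sentence to each group's data, ready for distribution.

    `analyze` maps a group of posts to rank data, and `create` maps rank
    data to a short sentence; both are supplied by the caller.
    """
    return [
        {"short_sentence": create(analyze(group)), "posts": group}
        for group in groups.values()
    ]


posts = [
    "Fire near the station",
    "Traffic accident on route 1",
    "Huge fire downtown",
]
groups = classify_posts(posts, ["fire", "accident"])
# Trivial stand-ins for the analysis and creation stages.
deliverables = build_deliverables(groups, analyze=len, create=lambda n: f"{n} posts")
print(deliverables[0]["short_sentence"])  # 2 posts
```

Passing the stages as callables keeps the classification and distribution parts independent of any particular ranking or sentence-generation method.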

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
JP2017044433A, JP7078244B2 (en) | 2017-03-08 | 2017-03-08 | Data processing equipment, data processing methods, data processing systems and programs

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
JP2017044433A, JP7078244B2 (en) | 2017-03-08 | 2017-03-08 | Data processing equipment, data processing methods, data processing systems and programs

Publications (3)

Publication Number | Publication Date
JP2018147411A (en) | 2018-09-20
JP2018147411A5 (en) | 2020-02-06
JP7078244B2 (en) | 2022-05-31

Family

ID=63592194

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
JP2017044433A (Active), JP7078244B2 (en) | Data processing equipment, data processing methods, data processing systems and programs | 2017-03-08 | 2017-03-08

Country Status (1)

Country | Link
JP (1) | JP7078244B2 (en)
