KR101664711B1 - Keyword Extraction Method Using Period Weighting Value - Google Patents

Info

Publication number
KR101664711B1
Authority
KR
South Korea
Prior art keywords
period
words
extraction method
distribution curve
text data
Prior art date
Application number
KR1020150087212A
Other languages
Korean (ko)
Inventor
전채남
조인호
손기준
김찬우
김윤용
Original Assignee
(주) 더아이엠씨
Priority date
Filing date
Publication date
Application filed by (주) 더아이엠씨
Priority to KR1020150087212A
Application granted
Publication of KR101664711B1

Classifications

    • G06F17/277
    • G06F17/2795

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to the present invention, a keyword extraction method using a period weighting value includes: (a) a step of classifying text data constituting a corpus according to a detailed period; (b) a step of assigning a period weighting value to each of a plurality of words included in the text data; and (c) a step of extracting, from among the words to which the period weighting value has been assigned, the words whose value is at or above a reference value. The present invention can effectively analyze trends by selecting keywords in consideration of the flow of time.

Description

{Keyword Extraction Method Using Period Weighting Value}

The present invention relates to a method for extracting key words from text data, and more particularly, to a key word extraction method for extracting key words in consideration of period weight.

Since the era of Web 2.0, the Internet has become more open and interactive, and users are sharing and disseminating experiences richly in a variety of ways. Typically, blogs, Twitter, Facebook, etc. occupy user online time.

These services host many kinds of content, such as videos and photographs, but much of it is text that is not restricted to a particular format or range of topics. Such text provides important data for analyzing the image of companies' brands, products, services, and so on.

Therefore, there are several methods for extracting information from text. For example, there are methods of extracting the frequency of words (keywords) in text, and methods of grasping the importance of text in documents.

However, these methods do not sufficiently consider duration or time. In other words, the frequency of a word is extracted only at a certain point in time. In such a case, how the frequency of the word changes from period to period cannot be directly confirmed.

Therefore, there is a need for a method for selecting a keyword in consideration of the frequency of words used in accordance with the change of the period.

Korean Patent Publication No. 10-2009-0083747

The keyword extraction method using a period weight according to the present invention aims to analyze trends effectively by selecting keywords in consideration of the flow of time.

The solution of the present invention is not limited to those mentioned above, and other solutions not mentioned can be clearly understood by those skilled in the art from the following description.

A keyword extraction method using a period weight according to the present invention includes the steps of (a) classifying text data constituting a corpus according to a detailed period, (b) assigning a period weight to a plurality of words included in the text data, and (c) extracting, from among the plurality of words to which the period weight is assigned in step (b), a word whose value is at or above a reference value.

In the step (b), the degree of uneven distribution over the entire period may be given as a period weight for the plurality of words.

In step (b), the period weight may be assigned based on the absolute value of the skewness (rendered as "degree of distortion" in parts of this translation) of the distribution curve of the plurality of words.

The skewness of the distribution curve may be calculated by the following formula:

Figure 112015059385617-pat00001

In step (b), the period weight may be assigned based on the absolute value of the kurtosis of the distribution curve of the plurality of words.

The kurtosis of the distribution curve may be calculated by the following formula:

Figure 112015059385617-pat00002
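
The formula figures referenced in this passage are not reproduced in this text. For orientation only, the standard moment-based definitions of skewness and kurtosis are given below. They are textbook definitions and an assumption about what the figures contain; the patent's own formulas may differ, for example by applying a sample-size correction. Here $x_i$ is assumed to be the frequency of a word in the $i$-th period and $n$ the number of periods.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
m_k = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^{k}
\]
\[
\text{skewness} = \frac{m_3}{m_2^{3/2}} ,
\qquad
\text{kurtosis} = \frac{m_4}{m_2^{2}}
\]
```

Under these definitions a normal distribution has skewness 0 and kurtosis 3, which matches the reference values used in the description.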

The key word extraction method using period weighting according to the present invention has the following effects.

First, since keywords can be extracted in consideration of a time period rather than a single point in time, there is an advantage that trends within a predetermined period can be analyzed precisely and effectively.

Secondly, it has the advantage of presenting important data for analyzing images of brands, products, and services of various companies.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.

FIG. 1 is a flowchart illustrating a keyword extraction method according to an exemplary embodiment of the present invention.
FIG. 2 is a graph showing a normal distribution curve.
FIG. 3 is a graph showing a distribution curve whose skewness is less than zero.
FIG. 4 is a graph showing a distribution curve whose skewness is greater than zero.
FIG. 5 is a graph showing a distribution curve with a kurtosis greater than 3.
FIG. 6 is a graph showing a distribution curve with a kurtosis less than 3.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. In describing the present embodiment, the same designations and the same reference numerals are used for the same components, and further description thereof will be omitted.

FIG. 1 is a flowchart illustrating a keyword extraction method according to an exemplary embodiment of the present invention.

As shown in FIG. 1, a keyword extraction method according to an embodiment of the present invention includes: (a) classifying text data constituting a corpus according to a detailed period; (b) assigning a period weight to a plurality of words included in the text data; and (c) extracting, from among the plurality of words to which the period weight is assigned in step (b), a word whose value is at or above a reference value.

That is, the present invention can select key words in consideration of the passage of time, and thereby can effectively analyze trends, interests, and the like within the period.

In the conventional key word derivation method, the frequency of words is considered. However, since the frequency of use varies with time, words need to be weighted.

For example, the frequency of major words varies by day, month, or year, and some words disappear altogether. A buzzword or catchphrase may be used only during a particular time and rarely appear afterwards, so it is difficult for such a word or sentence to represent the whole period.

Therefore, it is very important to consider the period; that is, it is necessary to identify which words are changing over time and to select keywords according to those changes.

Further, rather than relying on manual selection of words, the present invention provides a criterion by which words can be selected automatically or on the basis of a calculated value. In other words, by providing an index value for each word, it suggests a criterion for selecting keywords.

Hereinafter, each of the above-described steps will be described in detail.

First, step (a) of classifying the text data constituting the corpus according to the detailed period is performed. The corpus refers to a set of data, and in the present embodiment refers to a set of text data of an online service such as the Web or SNS.

In this step, the text data existing on the online service is classified according to a predetermined criterion, such as a predetermined unit period. For example, the text data may be classified by month, or by a period running from one point in time to another.
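
Step (a) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the detailed period is one calendar month, that each document arrives as a (date, text) pair, and that words are obtained by lowercasing and splitting on whitespace. The function name `count_words_by_period` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def count_words_by_period(documents):
    """Group (date, text) pairs by calendar month and count words per month.

    Returns a dict mapping "YYYY-MM" to a Counter of word frequencies.
    """
    counts = defaultdict(Counter)
    for posted_on, text in documents:
        # Assumption: the detailed period is one calendar month.
        period = posted_on.strftime("%Y-%m")
        # Assumption: naive whitespace tokenization; real text, Korean in
        # particular, would need a proper tokenizer or morphological analyzer.
        counts[period].update(text.lower().split())
    return dict(counts)


# Hypothetical sample corpus.
docs = [
    (date(2015, 1, 3), "new phone launch"),
    (date(2015, 1, 20), "phone review"),
    (date(2015, 2, 11), "phone battery recall"),
]
by_period = count_words_by_period(docs)
```

Any other unit period (day, week, or an arbitrary date range) would only change how the `period` key is derived.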

Next, a period weight is assigned to a plurality of words included in the text data.

In this step, in order to extract a meaningful word among a plurality of words included in the text data classified according to the period by the step (a), a period weight is given to each word.

In other words, commonly used terms will generally show little change in frequency of use regardless of the period, whereas terms that deviate statistically from such a distribution can be judged to be significant within the period.

Therefore, in this step, the uneven distribution over the entire period is given as a period weight for a plurality of words.

More specifically, in the case of this embodiment, when the frequency of use of the word according to the period is graphed, the period weight can be calculated on the basis of how far the curve is from the normal distribution curve. For this purpose, the present embodiment utilizes the skewness and kurtosis of the curve shown in the graph.
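
To graph a word's frequency of use by period, the per-period counts first have to be arranged into one series per word. The sketch below shows one way to do that, assuming the counts are held in a dict keyed by a sortable period label; the function name `frequency_series` and the sample counts are hypothetical, and periods in which a word does not occur are filled with zero.

```python
def frequency_series(counts_by_period):
    """Turn {period: {word: count}} into (periods, {word: [count per period]})."""
    # "YYYY-MM" labels sort chronologically when sorted as strings.
    periods = sorted(counts_by_period)
    vocabulary = sorted(set().union(*counts_by_period.values()))
    series = {
        word: [counts_by_period[p].get(word, 0) for p in periods]
        for word in vocabulary
    }
    return periods, series


# Hypothetical per-period counts.
counts_by_period = {
    "2015-01": {"phone": 2, "launch": 1},
    "2015-02": {"phone": 1, "recall": 1},
    "2015-03": {"phone": 2},
}
periods, series = frequency_series(counts_by_period)
```

Each list in `series` is the sequence of values that a skewness or kurtosis calculation would then consume.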

FIG. 2 is a graph showing a normal distribution curve. As shown in FIG. 2, the normal distribution curve is symmetric about its center. In other words, the closer a word's distribution is to this curve, the more it can be judged to be a commonly used word regardless of the period.

FIG. 3 shows a distribution curve whose skewness is less than zero, and FIG. 4 shows a distribution curve whose skewness is greater than zero. In FIGS. 3 and 4, the distribution is shifted to one side of the central axis, from which it can be determined that the word is used in a specific period.

In other words, the larger the magnitude of the skewness, the more pronounced this tendency, and therefore the absolute value of the skewness of the distribution curve can be used as the period weight.

In the present embodiment, the skewness (degree of distortion) used as the period weight can be calculated by the following formula.

Figure 112015059385617-pat00003

That is, the absolute value of the skewness can be calculated by sequentially substituting the corresponding values for each of the classified periods.
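
The formula figure is not reproduced in this text, so the following is only a sketch under stated assumptions: it uses the standard population (moment-based) skewness, treats a word's per-period frequencies as the values being substituted, and returns 0.0 when all values are equal, where skewness is otherwise undefined. The patent's own formula may differ, for example by using a sample-size correction. The function names are hypothetical.

```python
def skewness(values):
    """Population skewness m3 / m2**1.5 of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        # Assumption: a perfectly flat series gets zero skewness.
        return 0.0
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def skewness_weight(values):
    """Period weight as the absolute value of the skewness."""
    return abs(skewness(values))


steady_word = [5, 5, 5, 5, 5]   # used evenly in every period
buzz_word = [0, 0, 0, 0, 10]    # used only in the last period

steady_weight = skewness_weight(steady_word)
buzz_weight = skewness_weight(buzz_word)
```

With these hypothetical series the evenly used word receives a weight of zero, while the word concentrated in a single period receives a clearly larger weight.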

On the other hand, FIG. 5 shows a distribution curve with a kurtosis greater than 3, and FIG. 6 shows a distribution curve with a kurtosis less than 3. In FIGS. 5 and 6, the peak of the distribution is higher or lower than that of the normal distribution curve.

That is, the kurtosis is an index of how sharply peaked the distribution curve is, and a high kurtosis indicates that the word is used intensively in a specific period. Therefore, a word with a high kurtosis can be regarded as an important word, or as one that reflects a trend, within that period.

The kurtosis of the normal distribution curve is 3, and the absolute value of the kurtosis of a word's distribution curve can be used as the period weight.

In the present embodiment, the kurtosis as the period weight can be calculated by the following formula.

Figure 112015059385617-pat00004
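
As the formula figure is not reproduced in this text, the sketch below assumes the standard population kurtosis (fourth central moment divided by the squared variance), which is the convention under which a normal distribution has a kurtosis of 3. It treats a word's per-period frequencies as the input values and returns 0.0 for a perfectly flat series, where kurtosis is otherwise undefined. The patent's own formula may differ, and the function names are hypothetical.

```python
def kurtosis(values):
    """Population kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        # Assumption: a perfectly flat series gets zero kurtosis.
        return 0.0
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def kurtosis_weight(values):
    """Period weight as the absolute value of the kurtosis."""
    return abs(kurtosis(values))


spread_word = [1, 2, 3, 4, 5]   # usage spread across periods
buzz_word = [0, 0, 0, 0, 10]    # usage concentrated in one period

spread_weight = kurtosis_weight(spread_word)
buzz_weight = kurtosis_weight(buzz_word)
```

In this hypothetical data the concentrated series yields the larger value, in line with the description of kurtosis as a measure of how sharply usage peaks.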

After step (b) is performed as described above, step (c) is performed, in which words at or above a reference value are extracted from among the plurality of words to which the period weight has been assigned.

In this step, among the plurality of words to which the period weight has been assigned, words whose weight is at or above a certain reference value are extracted and selected as keywords. The reference value may be a predetermined absolute value, or it may be a relative standard based on comparison with other words.
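
The two kinds of reference value mentioned for step (c) can be sketched as follows. This is an illustration only: the source does not say how a relative standard is computed, so ranking words by weight and keeping the top few is an assumption, and the function names and sample weights are hypothetical.

```python
def select_by_absolute(weights, reference):
    """Keep words whose period weight is at or above a fixed reference value."""
    kept = [word for word, weight in weights.items() if weight >= reference]
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda word: (-weights[word], word))


def select_by_rank(weights, top_k):
    """Keep the top_k words by period weight (one possible relative standard)."""
    ranked = sorted(weights, key=lambda word: (-weights[word], word))
    return ranked[:top_k]


# Hypothetical period weights for three words.
weights = {"recall": 1.5, "phone": 0.2, "launch": 0.9}

absolute_keywords = select_by_absolute(weights, reference=0.9)
relative_keywords = select_by_rank(weights, top_k=1)
```

An absolute threshold keeps the result comparable across corpora, whereas a rank-based cut always returns a fixed number of keywords regardless of how large the weights are.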

As described above, according to the present invention, keywords can be extracted in consideration of a time period rather than a single point in time. Therefore, trends within a predetermined period can be analyzed precisely and effectively, and important data can be presented for analyzing the image of companies' brands, products, and services.

The embodiments and the accompanying drawings described in the present specification are merely illustrative of some of the technical ideas included in the present invention. Accordingly, the embodiments disclosed herein are for the purpose of describing rather than limiting the technical spirit of the present invention, and it is apparent that the scope of the technical idea of the present invention is not limited by these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A keyword extraction method performed by a processor, comprising:
(a) classifying text data constituting a corpus according to a detailed period;
(b) assigning a period weight to a plurality of words included in the text data; and
(c) extracting, from among the plurality of words to which the period weight is assigned in step (b), a word whose value is at or above a reference value,
wherein step (b) assigns the period weight based on the absolute value of the skewness (degree of distortion) of the distribution curve of the plurality of words, so that the degree of uneven distribution over the entire period is given as the period weight for the plurality of words.

2. A keyword extraction method performed by a processor, comprising:
(a) classifying text data constituting a corpus according to a detailed period;
(b) assigning a period weight to a plurality of words included in the text data; and
(c) extracting, from among the plurality of words to which the period weight is assigned in step (b), a word whose value is at or above a reference value,
wherein step (b) assigns the period weight based on the absolute value of the kurtosis of the distribution curve of the plurality of words, so that the degree of uneven distribution over the entire period is given as the period weight for the plurality of words.

3. (Deleted)

4. The method according to claim 1, wherein the skewness of the distribution curve is calculated by the following formula:
Figure 112016048654772-pat00005

5. (Deleted)

6. The method according to claim 2, wherein the kurtosis of the distribution curve is calculated by the following formula:
Figure 112016048654772-pat00006

KR1020150087212A 2015-06-19 2015-06-19 Keyword Extraction Method Using Period Weighting Value KR101664711B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150087212A KR101664711B1 (en) 2015-06-19 2015-06-19 Keyword Extraction Method Using Period Weighting Value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150087212A KR101664711B1 (en) 2015-06-19 2015-06-19 Keyword Extraction Method Using Period Weighting Value

Publications (1)

Publication Number Publication Date
KR101664711B1 (en) 2016-10-10

Family

ID=57145634

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150087212A KR101664711B1 (en) 2015-06-19 2015-06-19 Keyword Extraction Method Using Period Weighting Value

Country Status (1)

Country Link
KR (1) KR101664711B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190050180A (en) 2017-11-02 2019-05-10 서강대학교산학협력단 keyword extraction method and apparatus for science document

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100626817B1 (en) * 2005-05-13 2006-09-20 한국과학기술정보연구원 System for extracting words, system and method for management of words life cycle and medium for storing for program carrying out method of management of words life cycle
KR20090083747A (en) 2008-01-30 2009-08-04 삼성전자주식회사 User terminal and method for providing summery of web page
KR20090125559A (en) * 2008-06-02 2009-12-07 엔에이치엔(주) Method and system for providing search service using timeliness query
KR101401175B1 (en) * 2012-12-28 2014-05-29 성균관대학교산학협력단 Method and system for text mining using weighted term frequency
KR20150071833A (en) * 2013-12-19 2015-06-29 한국전자통신연구원 Processing Method For Social Media Issue and Server Device supporting the same

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190050180A (en) 2017-11-02 2019-05-10 서강대학교산학협력단 keyword extraction method and apparatus for science document
KR102017227B1 (en) 2017-11-02 2019-09-02 서강대학교산학협력단 keyword extraction method and apparatus for science document

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190905

Year of fee payment: 4