KR20170034206A - Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis - Google Patents
- Publication number
- KR20170034206A (application KR1020150132590A)
- Authority
- KR
- South Korea
- Prior art keywords
- words
- word
- subject category
- word dictionary
- weight
- Prior art date
Classifications
- G06F17/30873
- G06F17/2735
- G06F17/277
- G06F17/30705
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
- G06Q50/30
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Computing Systems (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method and apparatus for automatically classifying the subject category of text included in a web page or in social media content created on the Internet.
To this end, the text subject category classification apparatus according to the present invention comprises: a data collection unit that receives a plurality of documents classified in advance by subject category, selects words from the sentences contained in the documents, and collects the words by subject category; a word dictionary generation unit that receives the words collected by subject category from the data collection unit, calculates a weight for each input word, selects, based on the calculated weights, the words to be included in the word dictionary of each subject category, and registers the selected words in the respective word dictionaries; and a subject category classification unit that receives a classification target sentence, selects from the words in that sentence those included in the word dictionary of each subject category, generates a feature vector according to the weights of the selected words, and determines the subject category of the classification target sentence based on the generated feature vector.
Description
The present invention relates to a method and apparatus for automatically classifying the subject category of text included in a web page or in social media content created on the Internet.
Due to the proliferation of mobile devices, the amount of web content transmitted over the Internet has been increasing rapidly. As the number of users of social network services such as Twitter and Facebook grows worldwide, the amount of data, such as text and images, that users enter on their mobile or computer devices and transmit over the Internet is also increasing rapidly.
Such web data is useful in that it contains information on the status or interests of a large number of people. In particular, web data transmitted through a social network service is generated and transmitted by individual users, so it is useful for understanding the status of each user, and it is likewise useful for understanding the status of groups of users.
Research has therefore been conducted on analyzing social network data and extracting information from it. For example, "Kwak H, Lee C, Park H, Moon S. What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web, 2010. ACM, pp 591-600" analyzes Twitter data to determine when and where users talk about which topics.
However, these existing studies mainly focus on specific keywords or topics that are handled at specific times or places, and do not provide a means to analyze the overall subject category in social media.
(Patent Document 0001) Korean Patent Publication No. 10-1480711
The present invention has been made to solve the above-mentioned problems. Its object is to provide a text subject category classification apparatus, system, and method that classify the subject category of sentences generated on the Internet, such as on social media, more reliably than existing sentence topic classification methods, by using a word dictionary for each subject category generated from the document data of a different kind of media.
In order to solve the above problems, a text subject category classification apparatus according to one aspect of the present invention comprises: a data collection unit that receives a plurality of documents classified in advance by subject category, selects words from the sentences included in the documents, and collects the words by subject category; a word dictionary generation unit that receives the words collected by subject category from the data collection unit, calculates a weight for each input word, selects, based on the calculated weights, the words to be included in the word dictionary of each subject category, and registers the selected words in the respective word dictionaries; and a subject category classification unit that receives a classification target sentence, selects from the words in that sentence those included in the word dictionary of each subject category, generates a feature vector according to the weights of the words selected for each subject category, and determines the subject category of the classification target sentence based on the generated feature vector.
Here, the data collection unit may select the words to be input to the word dictionary generation unit by removing from the sentence character strings of a predetermined number of characters or fewer, special characters, and numerals, and then performing a morphological analysis.
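The word selection performed by the data collection unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `select_words`, the length threshold of 2, and the whitespace split that stands in for a real morphological analyzer are all hypothetical.

```python
import re

MIN_LEN = 2  # assumed threshold: tokens of this many characters or fewer are dropped


def select_words(sentence, min_len=MIN_LEN):
    """Return candidate words from a sentence.

    Special characters and numerals are replaced by spaces, the result is
    split on whitespace (a stand-in for morphological analysis), and
    tokens of `min_len` characters or fewer are discarded.
    """
    # \W matches non-word characters, \d digits, and _ the underscore.
    cleaned = re.sub(r"[\W\d_]+", " ", sentence)
    return [tok.lower() for tok in cleaned.split() if len(tok) > min_len]


words = select_words("Budget vote at 10 am: parliament OKs #tax reform!!")
print(words)
```

For Korean text, the whitespace split would be replaced by a morphological analyzer of the implementer's choice; the filtering steps stay the same.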
Here, the data collection unit may receive news articles, newspaper articles, or magazine articles classified in advance by subject category as the plurality of documents classified in advance by subject category.
Here, the word dictionary generation unit may include a first word dictionary generation unit that calculates, for the words input from the data collection unit, a TF-IDF weight based on the sentences containing each input word and information on the subject category, and selects the words to be included in the word dictionary from among the input words based on the calculated TF-IDF weight.
Here, the first word dictionary generation unit may calculate the TF-IDF weight based on the number of times the input word appears in the document, the number of sentences in the document that contain the input word, and the number of subject categories that contain the input word.
Here, the word dictionary generation unit may include a second word dictionary generation unit that performs an LDA analysis on the words input from the data collection unit, calculates an LDA rank weight according to the analysis result, and selects the words to be included in the word dictionary from among the input words based on the calculated weight.
Here, the second word dictionary generation unit may calculate, for the words input from the data collection unit, a TF-IDF weight based on the sentences containing each input word and information on the subject category, remove some of the words based on the calculated TF-IDF weight, perform the LDA analysis on the words remaining after the removal, calculate the LDA rank weight according to the analysis result, and select the words to be included in the word dictionary from among the input words based on the LDA rank weight.
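The two steps around the LDA analysis, filtering by TF-IDF weight and turning a ranking into a weight, can be sketched as below. The text does not give the threshold or the rank-weight formula, so the cut-off value and the linear rank-to-weight mapping are assumptions, and the LDA run itself is replaced by a hypothetical ranked word list.

```python
def drop_low_tfidf(tfidf, threshold):
    """Keep only words whose TF-IDF weight is at least `threshold`."""
    return {w: s for w, s in tfidf.items() if s >= threshold}


def rank_weights(ranked_words):
    """Map a best-first word ranking to weights in (0, 1].

    Assumed form: the top-ranked word gets 1.0 and each later rank
    decreases linearly; the source text does not state this formula.
    """
    n = len(ranked_words)
    return {w: (n - r) / n for r, w in enumerate(ranked_words)}


# Hypothetical TF-IDF weights for words collected under one category.
tfidf = {"election": 0.8, "minister": 0.6, "the": 0.01, "vote": 0.5}
kept = drop_low_tfidf(tfidf, threshold=0.1)

# Hypothetical LDA output: words of the dominant topic, best first.
# A real pipeline would obtain this ranking from an LDA library.
lda_ranking = ["election", "vote", "minister"]
weights = rank_weights([w for w in lda_ranking if w in kept])
print(weights)
```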
Here, the word dictionary generation unit may further include a duplicate word elimination unit for eliminating words that are duplicated across the word dictionaries of the subject categories.
Here, for a duplicate word included in two or more word dictionaries, the duplicate word elimination unit may select the subject category from which the word is to be removed based on the TF-IDF weight of the duplicate word or its occurrence frequency in each word dictionary, and remove the duplicate word from the word dictionary of the selected subject category.
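A minimal sketch of duplicate word elimination follows. It assumes that a duplicated word is kept in the category where its weight is highest and removed from the others; the text only says the category is selected based on the weight or frequency, so this direction and the tie-breaking rule are assumptions, as is the function name.

```python
def remove_duplicates(dictionaries):
    """Keep each duplicated word only in the category where its weight is highest.

    `dictionaries` maps category -> {word: weight}. Ties go to the
    category that comes first in iteration order (an assumption).
    """
    best = {}  # word -> (category, weight) with the highest weight seen so far
    for category, words in dictionaries.items():
        for word, weight in words.items():
            if word not in best or weight > best[word][1]:
                best[word] = (category, weight)
    return {
        category: {w: s for w, s in words.items() if best[w][0] == category}
        for category, words in dictionaries.items()
    }


dictionaries = {
    "politics": {"election": 0.9, "market": 0.2},
    "economy": {"market": 0.7, "stock": 0.8},
}
deduplicated = remove_duplicates(dictionaries)
print(deduplicated)
```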
Here, the subject category classification unit may include: a feature vector extraction unit that selects, from the words in the classification target sentence, the words included in the word dictionary of each subject category, and generates the feature vector by setting each element to a value calculated from the weights of the words selected for the corresponding subject category; and a classification unit that determines the subject category of the classification target sentence based on the generated feature vector.
Here, the feature vector extraction unit may set each element of the feature vector to the sum of the weights of the words selected for the corresponding subject category.
Here, the classification unit may determine, according to a maximum weight technique, the subject category corresponding to the largest element of the feature vector as the subject category of the classification target sentence.
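The feature vector and the maximum weight technique can be illustrated with a short sketch. The dictionary contents and function names are hypothetical; the logic follows the description above, with one vector element per subject category equal to the summed weights of the matching words.

```python
def feature_vector(sentence_words, dictionaries, categories):
    """One element per category: the summed dictionary weights of matching words."""
    return [
        sum(dictionaries[c].get(w, 0.0) for w in sentence_words)
        for c in categories
    ]


def classify_max_weight(vector, categories):
    """Return the category whose element is largest (the first one on ties)."""
    best = max(range(len(vector)), key=vector.__getitem__)
    return categories[best]


categories = ["politics", "economy"]
dictionaries = {
    "politics": {"election": 0.9, "vote": 0.5},
    "economy": {"market": 0.7, "stock": 0.8},
}
vector = feature_vector(["vote", "election", "stock"], dictionaries, categories)
label = classify_max_weight(vector, categories)
print(vector, label)
```

A sentence containing no dictionary words yields an all-zero vector, in which case this sketch simply returns the first category; a real system would probably treat that case separately.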
Here, the classification unit may classify the subject category of the classification target sentence based on the feature vector using a pre-trained classifier based on a support vector machine (SVM).
Here, the word dictionary generation unit may include a non-related word elimination unit for selecting words not related to the subject category from among the words included in the word dictionary of each subject category, and removing the selected words from the word dictionary.
Here, the non-related word elimination unit may cluster the words included in each word dictionary into a plurality of subsets based on the number of occurrences of each word in the subject category, the number of documents in the subject category that contain the word, and the frequency of occurrence of the word in the documents; select at least one non-related cluster based on the frequency of the clustered subsets; and remove the words of the selected cluster from the word dictionary.
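The clustering-based removal can be sketched with a small k-means over the three per-word statistics named above. Several choices here are assumptions rather than details from the text: the use of k-means with two clusters, the seeding rule, the unscaled features, and the criterion that the cluster with the lowest mean frequency is the non-related one.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means; returns the cluster index assigned to each point."""
    centers = [list(c) for c in centers]  # copy so the caller's lists are untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def remove_unrelated(dictionary, stats):
    """Drop the words in the cluster with the lowest mean frequency.

    `stats` maps word -> (count in category, documents containing the word,
    frequency of occurrence in those documents).
    """
    words = list(dictionary)
    points = [list(stats[w]) for w in words]
    # Deterministic start: seed with the lowest- and highest-frequency words.
    seeds = [min(points, key=lambda p: p[2]), max(points, key=lambda p: p[2])]
    labels = kmeans(points, seeds)
    if len(set(labels)) < 2:
        return dict(dictionary)  # nothing to separate
    means = {}
    for j in set(labels):
        freqs = [p[2] for p, lab in zip(points, labels) if lab == j]
        means[j] = sum(freqs) / len(freqs)
    unrelated = min(means, key=means.get)
    return {w: dictionary[w] for w, lab in zip(words, labels) if lab != unrelated}


dictionary = {"election": 0.9, "vote": 0.7, "weather": 0.3}
stats = {
    "election": (120, 40, 0.30),
    "vote": (100, 35, 0.25),
    "weather": (3, 2, 0.01),
}
cleaned = remove_unrelated(dictionary, stats)
print(cleaned)
```

In practice the three features would likely be normalized before clustering so that raw counts do not dominate the distance.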
The text subject category classification apparatus may further include a word dictionary database for storing the word dictionary generated by the word dictionary generation unit.
In order to solve the above problems, a text subject category classification system according to one aspect of the present invention may include a service server.
Here, the service server may include: a data collection unit that receives a plurality of documents classified in advance by subject category, selects words from the sentences included in the documents, and collects the words by subject category; and a word dictionary generation unit that receives the words collected for each subject category, calculates a weight for each input word, selects, based on the calculated weights, the words to be included in the word dictionary of each subject category, and registers the selected words in the respective word dictionaries.
Here, the service server may further include a subject category classification unit that receives a classification target sentence, selects from the words in the classification target sentence those included in the word dictionary of each subject category, generates a feature vector according to the weights of the selected words, and determines the subject category of the classification target sentence based on the generated feature vector.
Here, the text subject category classification system may further include: a word dictionary database that stores the word dictionaries generated by the word dictionary generation unit; and a terminal.
Here, the terminal may include a subject category classification unit that receives a classification target sentence, connects to the word dictionary database to select, from the words in the classification target sentence, those included in the word dictionary of each subject category, generates a feature vector according to the weights of the selected words, and determines the subject category of the classification target sentence based on the generated feature vector.
In order to solve the above problems, a text subject category classification method according to one aspect of the present invention comprises: a data collection step in which a service server receives a plurality of documents classified in advance by subject category, selects words from the sentences included in the documents, and collects the words by subject category; a word dictionary generation step in which the service server calculates a weight for the words collected by subject category, selects, based on the calculated weights, the words to be included in the word dictionary of each subject category, and registers the selected words in the respective word dictionaries; and a subject category classification step of receiving a classification target sentence, selecting from the words in the classification target sentence those included in the word dictionary of each subject category, generating a feature vector according to the weights of the selected words, and determining the subject category of the classification target sentence based on the generated feature vector.
The apparatus and method for classifying a text subject category according to the present invention classify the subject category of sentences of a specific medium by using word dictionaries for each subject category generated from documents of a different kind of media, preferably newspapers, news, or magazines. Because media already classified by subject are used, the word dictionaries can be generated more efficiently than by manually classifying the data of the target medium and building the dictionaries by hand.
In particular, the apparatus and method according to the present invention provide a configuration that uses a clustering analysis method to remove words that do not semantically belong to any subject category, which arise from the heterogeneity of data between different media. This removes the heterogeneity introduced by generating the word dictionaries from the data of a different kind of media, so that the subject category of the classification target sentence can be classified more reliably.
In addition, the apparatus and method according to the present invention classify the subject categories of sentences generated on the Internet, such as on social network services, more reliably than existing sentence topic classification methods. The per-sentence subject category results can then be used to extract information on the interests or inclinations of a specific user, or of the users in a specific group or during a specific period.
FIG. 1 is a block diagram of a text subject category classification apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of a text subject category classification apparatus according to another embodiment of the present invention.
FIG. 3 is a block diagram of a text subject category classification apparatus according to another embodiment of the present invention.
FIG. 4 is a detailed block diagram of the word dictionary generation unit according to an embodiment of the present invention.
FIG. 5 is a detailed block diagram of the word dictionary generation unit according to another embodiment of the present invention.
FIG. 6 is a reference diagram for explaining the operation of non-related word removal.
FIG. 7 is a detailed block diagram of the subject category classification unit.
FIG. 8 is a block diagram of a text subject category classification system according to the present invention.
FIG. 9 is a block diagram of a text subject category classification system according to another embodiment of the present invention.
FIG. 10 is a flowchart of a text subject category classification method according to another embodiment of the present invention.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components throughout the drawings. In the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear. In addition, the preferred embodiments of the present invention will be described below, but it is needless to say that the technical idea of the present invention is not limited thereto and can be variously modified by those skilled in the art.
Accordingly, the present invention provides an apparatus, system, and method that analyze sentences included in web data, for example sentences generated on social media such as Twitter, and determine which of the predetermined subject categories each sentence belongs to.
In particular, when generating the word dictionary used to classify the subject category of a classification target sentence, the apparatus and method according to the present invention use, as the source of the word dictionary, documents already classified by subject category, such as newspaper, news, or magazine articles.
To classify the subject category of a received classification target sentence, it is necessary to analyze which subject category the words in the sentence correspond to, which requires a word dictionary storing the words of each subject category. However, constructing a word dictionary by manually labeling the words to be included in the word dictionary of each subject category is very time-consuming and labor-intensive.
Accordingly, taking into consideration that the documents in newspapers, news, or magazines are already classified by subject category, the apparatus and method according to the present invention receive newspaper, news, or magazine documents and generate a word dictionary for each subject category. The present invention then classifies the subject category of the classification target sentence using the word dictionaries thus generated. For example, newspaper or news articles classified by subject category may be input and analyzed to generate the word dictionaries, and a classification target sentence may then be received and its subject category classified.
As described above, to classify the subject category of sentences of a specific medium, the apparatus and method according to the present invention generate a word dictionary for each subject category from the document data of a different kind of media, for example newspapers, news, or magazines. By performing subject category classification with such cross-media data, the apparatus does not need to manually classify the data of the target medium to generate the word dictionaries, and can generate them more efficiently from media that are already classified by subject.
In addition, the apparatus and method according to the present invention analyze the words included in documents classified by subject category as described above, and use a word dictionary generated with the TF-IDF method or a word dictionary generated with the LDA method. To refine the words in the word dictionary of each subject category, words that appear repeatedly across several subject categories need to be removed; in particular, when the word dictionary is generated with the LDA method, words with a small TF-IDF weight are handled as stop words.
In particular, to remove the heterogeneity of data between different media, the present invention proposes using a clustering analysis method to remove from the word dictionary those words that were registered in the word dictionary of a specific subject category through the above process but are not actually related to that subject category. In this way, the apparatus and method can remove the heterogeneity that arises from generating word dictionaries from the data of a different kind of media, and can classify the subject category of the classification target sentence more reliably.
In addition, the apparatus and method according to the present invention provide a configuration for determining the subject category of a classification target sentence using a classifier based on the word dictionaries for each subject category generated through the above process. More specifically, the words in the classification target sentence are looked up in the word dictionary of each subject category, the weights of the words found are combined for each word dictionary, and the subject category with the highest calculated value is determined as the subject category of the classification target sentence.
According to an embodiment of the present invention, the apparatus and method receive a classification target sentence generated on the Internet, extract from it, based on the word dictionaries, a feature vector that enables more effective classification, and thereby reliably determine the subject category of the sentence. The subject category results determined for each sentence can then be used to extract information on the interests or inclinations of a specific user, or of the users in a specific group or during a specific period.
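As a small illustration of how per-sentence results might be aggregated into interest information for a user, group, or period, the sketch below computes the share of each subject category among a set of classified sentences. The function name and the use of simple proportions are assumptions; the text does not specify how interests are derived.

```python
from collections import Counter


def interest_profile(sentence_categories):
    """Return the share of each subject category among classified sentences."""
    counts = Counter(sentence_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical classification results for one user's sentences.
profile = interest_profile(["politics", "culture", "politics", "economy"])
print(profile)
```

The same function applies to a group or a time window by passing the categories of the sentences that fall within it.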
Hereinafter, a text subject category classification apparatus, a method thereof, and a system therefor according to the present invention will be described in detail.
First, a text subject category classification apparatus according to an embodiment of the present invention will be described below.
FIG. 1 is a block diagram of a text subject category classification apparatus according to an embodiment of the present invention.
The text subject category classification apparatus according to the present invention may include a
Here, the text subject category classification apparatus according to the present invention may be embodied as a computer program having program modules in which some or all of the constituent elements are selectively combined to perform some or all of the combined functions on one or more pieces of hardware. Each component may also be implemented as a single independent piece of hardware or included in other hardware as needed. In addition, the apparatus may be implemented as a software program running on a processor or signal processing module, or implemented in hardware and included in various processors, chips, semiconductors, devices, and the like. Further, the apparatus may be included as a hardware or software module in a computer or in various embedded systems or devices. Preferably, the apparatus may be implemented in, or included in, a server connected to a network. Here, the
The
The word
The subject
FIG. 2 is a block diagram of a text subject category classification apparatus according to another embodiment of the present invention.
As shown in FIG. 2, the text subject category classification apparatus according to the present invention can operate in connection with an external word dictionary database 50. At this time, the text subject category classifier may store the word dictionary generated by the
If necessary, the text subject category classification apparatus according to another embodiment of the present invention may include a word dictionary database 50 storing the word dictionary generated by the word
FIG. 3 is a block diagram of a text subject category classification apparatus according to another embodiment of the present invention.
Next, the operation of the
The
Here, the subject categories are a plurality of categories defined in advance to classify the subject of a document, for example 'politics', 'economy', 'culture', 'society', 'art', and 'science'. The number and types of subject categories can be set by the user as needed. Here, a document means a set of at least one sentence and may be, for example, a paragraph. The document is divided into sentences, which are analyzed as described in detail below; this is because the analysis targets the short sentences, within a certain length, that are typically used in social network services. Here, a sentence refers to a set of one or more words, that is, a string of one or more words regardless of whether it is syntactically complete or grammatically correct. Thus, a sentence may be a complete sentence such as 'I went to school', a grammatically incomplete set of words such as 'school attendance', or, in some cases, an ungrammatical character string such as 'school'. Here, a word means a set of at least one character defined as having a specific meaning in each language, and may be a set of specific characters defined by the user as needed. For example, a set of characters such as 'school' may be a word.
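The document, sentence, and word hierarchy defined above can be pictured with a short sketch that groups pre-labeled documents by subject category. The sentence-boundary rule, the letter-only word pattern, and the function name are assumptions made for illustration.

```python
from collections import defaultdict
import re


def collect_words(labeled_documents):
    """Group words by subject category.

    `labeled_documents` is an iterable of (category, document_text) pairs.
    Returns category -> list of documents, where each document is a list
    of sentences and each sentence is a list of words.
    """
    collected = defaultdict(list)
    for category, text in labeled_documents:
        # Assumed sentence boundary: '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        # Assumed word pattern: runs of letters (no digits or punctuation).
        document = [re.findall(r"[^\W\d_]+", s.lower()) for s in sentences]
        collected[category].append(document)
    return dict(collected)


corpus = collect_words([
    ("politics", "Parliament passed the bill. The vote was close!"),
    ("economy", "Stocks fell."),
])
print(corpus)
```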
At this time, the
At this time, the
As described above, manually labeling words as a source for generating a word dictionary for each subject category and registering them in the word dictionary is a very time-consuming and labor-intensive task. Accordingly, in the present invention, the documents included in the newspaper, the news, or the magazine are effectively classified by the experts according to the subject, so that the
Next, the operation of the word
The word
Here, the weight is a number indicating the degree to which a particular word is associated with the subject category. Therefore, the word
Then, the
According to the operation of the clue
FIG. 4 is a detailed block diagram according to an embodiment of the word
Here, the word
The first word dictionary generation unit 210 calculates a TF-IDF (Term Frequency-Inverse Document Frequency) weight, based on the sentences including the input word and information about the subject category, with respect to the words input from the
The TF-IDF weight can be calculated according to the TF-IDF algorithm proposed in "Joachims T, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, DTIC Document (1996)". Here, the TF-IDF weight may be a weight value set for each word of a specific document according to the extent to which it is more related to a particular document than to other documents.
In the present invention, the TF-IDF algorithm proposed by Joachims is modified so that the resulting modified TF-IDF weight is better suited to classifying text subject categories using heterogeneous media analysis, and the words to be included in the dictionary are selected based on this weight. The TF-IDF weight calculated according to the modified TF-IDF algorithm proposed by the present invention is described below.
Here, the first word dictionary generation unit 210 preferably calculates the TF-IDF weight so that a word receives a higher TF-IDF weight for a subject category the more it appears in that subject category relative to the other subject categories.
To this end, the first word dictionary generation unit 210 can calculate the TF-IDF weight based on the number of times the input word appears in the document, the number of sentences in the document that include the input word, and the number of subject categories that include the input word.
More specifically, assume that the word w_i is the i-th word included in document d, among the words collected for each document d in the
Here, the first word dictionary generation unit 210 preferably calculates the TF-IDF weight according to Equation (1).
Where S(i) is the TF-IDF weight, F_TF(w_i, d) is the TF weight, F_SF(w_i, d) is the SF weight, and F_IDF(w_i, d) is the IDF weight. Here w_i is the i-th word in document d, word_{w_i,d} is the number of occurrences of the word w_i in document d, and word_{total,d} is the total number of words in document d. sentence_{w_i,d} is the number of sentences containing the word w_i in document d, and sentence_{total,d} is the total number of sentences in document d. In addition, category_{w_i} is the number of categories containing the word w_i, and category_{total} is the total number of categories.
Here, F_TF, F_SF, and F_IDF can be functions as shown in Equation (2) below.
Where a, b, and c are parameters set to adjust the value of the weight. For example, a, b may be set to 1, and c may be set between 0 and 1.
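Because Equations (1) and (2) are not reproduced in this text, the following Python sketch only illustrates how the three quantities defined above could be combined. The product form, the linear scaling by a and b, and the log-based category term offset by c are assumptions made for illustration, not the formulas of the present invention; the function name modified_tf_idf is likewise hypothetical.

```python
import math


def modified_tf_idf(word_count, total_words, sent_count, total_sents,
                    cat_count, total_cats, a=1.0, b=1.0, c=0.5):
    """Illustrative weight built from the TF, SF, and IDF quantities.

    ASSUMPTION: the product form and the shape of each factor are
    hypothetical; the source text does not reproduce Equations (1) and (2).
    """
    # TF term: share of the document's words that are this word.
    tf = a * word_count / total_words
    # SF term: share of the document's sentences that contain this word.
    sf = b * sent_count / total_sents
    # IDF-like term: shrinks as the word appears in more subject categories.
    idf = c + math.log(total_cats / cat_count)
    return tf * sf * idf


# Same counts inside the document, but one word is specific to a single
# category while the other appears in all seven categories.
specific = modified_tf_idf(3, 100, 2, 10, cat_count=1, total_cats=7)
generic = modified_tf_idf(3, 100, 2, 10, cat_count=7, total_cats=7)
print(specific > generic)
```

Under these assumptions a word confined to one subject category receives a larger weight than an equally frequent word spread over all categories, which matches the stated goal of favoring category-specific words.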
The first word dictionary generation unit 210 selects words for each subject category based on the TF-IDF weight calculated as described above, and registers the selected words in the word dictionary corresponding to each subject category. At this time, some words may optionally be selected and removed using the redundant
Next, the second word
Here, the second word
When the LDA analysis is performed, the words included in the word dictionary of one subject category are treated as the words of a single document, the analysis is applied to the words included in the word dictionary of each subject category, and each word dictionary is thereby classified into a plurality of topics. Then, the second word
The second word
Where W_in is the weight of the n-th word for the i-th topic, that is, the ratio of the n-th word within the topic. In this case, W_in can be a value obtained by dividing the frequency of the word in the topic by the sum of the frequencies of the words appearing in the topic. Here, P_i represents the ratio of the i-th topic among the topics, and W_n is the LDA word weight of the n-th word.
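The equation these symbols belong to is not reproduced in this text. As a hedged illustration, the sketch below assumes that the LDA word weight is the topic-ratio-weighted sum W_n = sum over i of P_i * W_in; that summation is an assumption. The function takes per-topic word frequencies and topic ratios as input rather than running LDA itself, and all names are hypothetical.

```python
def lda_word_weights(topic_word_freq, topic_ratio):
    """Combine per-topic word ratios into one weight per word.

    topic_word_freq: list of dicts, one per topic, mapping word -> frequency.
    topic_ratio:     list of P_i values, one per topic.
    ASSUMPTION: W_n = sum_i P_i * W_in; the source equation is not shown.
    """
    weights = {}
    for freqs, p_i in zip(topic_word_freq, topic_ratio):
        total = sum(freqs.values())
        if total == 0:
            continue  # an empty topic contributes nothing
        for word, freq in freqs.items():
            w_in = freq / total  # ratio of the word within topic i
            weights[word] = weights.get(word, 0.0) + p_i * w_in
    return weights


# Two hypothetical topics found in one subject category's dictionary.
topics = [{"election": 3, "vote": 1}, {"vote": 2, "party": 2}]
ratios = [0.5, 0.5]
for word, weight in sorted(lda_word_weights(topics, ratios).items()):
    print(word, round(weight, 3))
```

When the topic ratios sum to 1, the resulting word weights also sum to 1, so they can be ranked directly against one another.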
Next, the second word
The second word
The second word
In the case where some words are removed according to the TF-IDF weight, the second
At this time, the
FIG. 5 is a detailed block diagram of the word
Here, the word
The redundant word remover 230 removes the redundant word among the words included in the word dictionary for each subject category. When the word
To this end, for a redundant word commonly included in two or more word dictionaries, the redundant word remover 230 selects the subject category from which to remove the redundant word based on the TF-IDF weight of the redundant word in each subject category, and removes the redundant word from the word dictionary of the selected subject category. Here, the redundant word remover 230 may leave the redundant word only in the word dictionary of the subject category in which it has the highest TF-IDF weight, and remove it from the word dictionaries of the remaining subject categories.
Alternatively, for a redundant word commonly included in two or more word dictionaries, the redundant word remover 230 may select the subject category from which to remove the redundant word based on the frequency of occurrence of the redundant word in each word dictionary, and remove the redundant word from the word dictionary of the selected subject category. Here, the redundant word remover 230 may leave the redundant word only in the word dictionary of the subject category in which it occurs most frequently, and remove it from the word dictionaries of the remaining subject categories.
Preferably, the redundant word remover 230 removes redundant words based on the TF-IDF weight as described above, and, when two or more subject categories share the highest TF-IDF weight, removes redundant words based on the frequency of occurrence of the word.
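The preferred rule above (highest TF-IDF weight wins, with frequency of occurrence as the tie-breaker) can be sketched as follows. The dictionary layout, in which each word maps to a (tfidf, frequency) pair, and the function name are assumptions made for illustration. If both values tie, this sketch keeps the word in whichever category is encountered first, which the text does not specify.

```python
def remove_redundant_words(dictionaries):
    """Keep each word only in the subject category where it ranks highest.

    dictionaries: {category: {word: (tfidf_weight, frequency)}}
    Ranking compares the TF-IDF weight first and the frequency second.
    """
    owners = {}  # word -> ((tfidf, freq), category) of the best entry so far
    for category, entries in dictionaries.items():
        for word, (tfidf, freq) in entries.items():
            key = (tfidf, freq)
            if word not in owners or key > owners[word][0]:
                owners[word] = (key, category)
    # Rebuild every dictionary, dropping words owned by another category.
    return {
        category: {w: v for w, v in entries.items()
                   if owners[w][1] == category}
        for category, entries in dictionaries.items()
    }


# Hypothetical dictionaries: 'market' differs in weight, 'budget' ties
# on weight and is decided by frequency.
dictionaries = {
    "politics": {"vote": (0.9, 10), "market": (0.2, 3), "budget": (0.5, 4)},
    "economy": {"market": (0.8, 12), "budget": (0.5, 9)},
}
cleaned = remove_redundant_words(dictionaries)
print(sorted(cleaned["politics"]), sorted(cleaned["economy"]))
```

Returning new dictionaries instead of mutating the input keeps the original weights available for later steps such as unrelated-word removal.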
Next, the non-related
Here, the non-related
Here, an EM (Expectation-Maximization) clustering algorithm can be used as an algorithm for clustering.
Next, the non-related
Preferably, the non-related word remover 240 may select a cluster including words having a TF weight value smaller than a predetermined reference value as the non-related cluster. For this purpose, representative values of TF weight values representing each cluster can be calculated for each cluster, and non-related clusters among the clusters can be selected based on the representative values. Here, the reference value may be a value that can be set as needed.
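As a minimal sketch of the selection rule above, the following Python code assumes that a clustering step (for example, EM clustering) has already assigned each word to a cluster, and uses the mean TF weight of a cluster as its representative value. Using the mean is an assumption, since the text does not specify how the representative value is computed, and the function and variable names are hypothetical.

```python
from statistics import mean


def remove_unrelated_words(word_tf, word_cluster, reference):
    """Drop words that belong to clusters with a low representative TF.

    word_tf:      {word: TF weight}
    word_cluster: {word: cluster id} from a prior clustering step
    reference:    threshold below which a cluster counts as unrelated
    ASSUMPTION: the representative value of a cluster is its mean TF weight.
    """
    clusters = {}
    for word, cluster_id in word_cluster.items():
        clusters.setdefault(cluster_id, []).append(word_tf[word])
    unrelated = {cid for cid, tfs in clusters.items() if mean(tfs) < reference}
    kept = {w: tf for w, tf in word_tf.items()
            if word_cluster[w] not in unrelated}
    return kept, unrelated


# Hypothetical 'politics' dictionary with two clusters.
word_tf = {"election": 0.08, "parliament": 0.06,
           "weather": 0.004, "recipe": 0.002}
word_cluster = {"election": 0, "parliament": 0, "weather": 1, "recipe": 1}
kept, unrelated = remove_unrelated_words(word_tf, word_cluster, reference=0.01)
print(sorted(kept), unrelated)
```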
Next, the non-related
FIG. 6 is a reference diagram for explaining the operation of the non-related
FIG. 6 is a graph showing a result of clustering the words included in the word dictionary corresponding to the 'political' subject category according to the above-described method, according to the non-related
In this way, the text subject category classification apparatus according to the present invention removes the heterogeneity of data that arises when the word dictionary is generated from data of heterogeneous media, and thereby has the effect of classifying subject categories more reliably.
Next, the operation of the subject
The subject
Here, the classification target sentence may be a sentence included in data transmitted over the Internet, and is preferably a sentence generated on social media. For example, the classification target sentence can be a sentence from a social network service such as Twitter or Facebook. Sentences in such social network services are usually shorter than the sentences in other general texts and therefore contain few words. Existing subject classification methods, which are mainly applied to long sentences containing many words, are therefore not well suited to this case.
Accordingly, in order to more reliably classify the subject category with respect to a short sentence, the
More specifically, the
FIG. 7 is a detailed block diagram of the subject
The characteristic
Here, the feature vector may be a vector having a number of elements corresponding to the number of subject categories, and the value of each element may be a value calculated from the weights of the words of the classification target sentence that are included in the word dictionary of the corresponding subject category.
In this case, the feature
For example, when there are seven subject categories, namely 'culture', 'economy', 'world', 'politics', 'science', 'society', and 'sports', the words in the classification target sentence can be looked up in the word dictionary of each subject category, and the weights of the words found in each word dictionary can be added per subject category to obtain the sum of weights for each subject category. For example, if the sums of the weights of the included words for the subject categories 'culture', 'economy', 'world', 'politics', 'science', 'society', and 'sports' are 4, 6, 8, 3, 0, 2, and 0, respectively, the feature vector (4, 6, 8, 3, 0, 2, 0) can be generated.
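The per-category summation described above can be sketched in Python as follows. The dictionary contents and the sentence words are made-up values chosen so that the result reproduces the example vector; they are not data from the present invention, and the function name is hypothetical.

```python
CATEGORIES = ["culture", "economy", "world", "politics",
              "science", "society", "sports"]

# Hypothetical word dictionaries: {category: {word: weight}}.
DICTIONARIES = {
    "culture": {"festival": 4},
    "economy": {"stock": 4, "trade": 2},
    "world": {"summit": 5, "trade": 3},
    "politics": {"summit": 3},
    "science": {},
    "society": {"festival": 2},
    "sports": {},
}


def feature_vector(sentence_words, dictionaries, categories):
    """Sum, per subject category, the weights of the sentence's words
    that are registered in that category's word dictionary."""
    return [
        sum(dictionaries.get(category, {}).get(word, 0)
            for word in sentence_words)
        for category in categories
    ]


words = ["festival", "stock", "summit", "trade"]
print(feature_vector(words, DICTIONARIES, CATEGORIES))
# -> [4, 6, 8, 3, 0, 2, 0]
```

Words that are not registered in a category's dictionary contribute 0, so a short sentence with few dictionary hits still yields a vector of the full length.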
Next, the classifying unit 320 may determine the subject category of the classification target sentence based on the generated characteristic vector.
The classification unit 320 may determine the subject category of the classification target sentence according to a maximum weight technique. That is, the subject category corresponding to the element having the largest value among the elements of the feature vector can be determined as the subject category of the classification target sentence. For example, when the feature vector extracted from a classification target sentence for the subject categories 'culture', 'economy', 'world', 'politics', 'science', 'society', and 'sports' is (4, 6, 8, 3, 0, 2, 0), the subject category 'world', which corresponds to the highest element value 8, can be determined as the subject category of the classification target sentence.
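The maximum weight technique amounts to taking the index of the largest element. The sketch below is illustrative; returning None for an all-zero vector and picking the first category on a tie are assumptions, since the text does not say how those cases are handled.

```python
CATEGORIES = ["culture", "economy", "world", "politics",
              "science", "society", "sports"]


def classify_max_weight(vector, categories):
    """Return the category whose feature-vector element is largest."""
    if len(vector) != len(categories):
        raise ValueError("vector and categories must have the same length")
    if not any(vector):
        return None  # ASSUMPTION: no dictionary word matched, so no decision
    # max() returns the first index on ties (ASSUMPTION: ties not specified).
    best_index = max(range(len(vector)), key=lambda i: vector[i])
    return categories[best_index]


print(classify_max_weight([4, 6, 8, 3, 0, 2, 0], CATEGORIES))
# -> world
```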
Alternatively, the classification unit 320 may classify the subject category of the classification target sentence based on the feature vector, using a pre-trained classifier based on a support vector machine (SVM).
Here, the classifier based on the support vector machine can be set up and applied according to the method proposed in "Suykens JA, Vandewalle J, Least squares support vector machine classifiers. Neural processing letters 9 (3): 293-300 (1999)". Here, the SVM-based classifier can be trained using training data in which a subject category is assigned in advance to each feature vector.
Here, in training the support vector machine-based classifier, the classification unit 320 may divide the training data into sentence units rather than document units, and perform training sentence by sentence. Here, the classification unit 320 preferably learns the parameters of the support vector machine-based classifier using training data generated by dividing a plurality of documents classified in advance by subject category into sentence units. Here, for example, news articles, newspaper articles, or magazine articles classified in advance by subject category can be used.
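The sentence-unit training data described above can be sketched as follows: each labeled document is split into sentences, and every sentence inherits the subject category of its document. The punctuation-based splitting rule and the function name are assumptions made for illustration; the SVM training itself, which would use the feature vectors of these sentences, is not shown.

```python
import re


def sentence_level_training_data(labeled_documents):
    """Turn (document text, category) pairs into (sentence, category) pairs.

    ASSUMPTION: sentences end with '.', '!' or '?' followed by whitespace;
    a real system would use a language-appropriate sentence splitter.
    """
    samples = []
    for text, category in labeled_documents:
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                samples.append((sentence, category))
    return samples


# Hypothetical pre-classified articles.
documents = [
    ("The budget passed. Markets rose!", "economy"),
    ("The team won.", "sports"),
]
for sentence, category in sentence_level_training_data(documents):
    print(category, "|", sentence)
```

Training on sentence units keeps the training samples closer in length to the short social media sentences that the classifier is meant to handle.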
Hereinafter, a text subject category classification system according to another embodiment of the present invention will be described.
The text subject category classification system according to the present invention may include a
Here, the
The
The word
The subject
In the text subject category classification system according to the present invention, the
FIG. 8 is a block diagram of a text subject category classification system according to another embodiment of the present invention in the case where the terminal 20 exists separately.
At this time, the
Here, the
In another embodiment, the
FIG. 9 is a block diagram of a text subject category classification system in the case of another embodiment including the word dictionary database 50 in the server device.
The
FIG. 10 is a flowchart of a text subject category classification method according to another embodiment of the present invention.
The text subject category classification method according to the present invention may include a data collection step S100, a word dictionary generation step S200, and a subject category classification step S300. The text subject category classification method according to the present invention can operate in the same manner as the text subject category classification apparatus described above with reference to FIGS. 1 to 7. Overlapping portions are omitted, and the method is described briefly.
In the data collection step S100, the
In the word dictionary creation step S200, the
In the subject category classification step S300, a classification target sentence is input, the words included in each word dictionary are selected for each subject category from among the words included in the classification target sentence, a feature vector is generated according to the weights of the selected words for each subject category, and the subject category of the classification target sentence is determined based on the generated feature vector. Here, the operation of the subject category classification step (S300) may be performed in the
Although all elements constituting the embodiments of the present invention are described above as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the present invention, all of the components may be selectively combined into one or more and operated.
In addition, although each of the components may be implemented as one independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or a plurality of pieces of hardware. Such a computer program may be stored in a computer-readable medium such as a USB memory, a CD, or a flash memory, and read and executed by a computer to implement an embodiment of the present invention. The recording medium of the computer program may include a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like.
Furthermore, all terms including technical or scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined in the Detailed Description. Commonly used terms, such as predefined terms, should be interpreted to be consistent with the contextual meanings of the related art, and are not to be construed as ideal or overly formal, unless expressly defined to the contrary.
It will be apparent to those skilled in the art that various modifications, changes, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims. Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to illustrate, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments and the accompanying drawings. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of equivalents should be construed as falling within the scope of the present invention.
10: Text subject category classification device
20:
50: Word dictionary database
100: Data collection unit
200: Word dictionary creation unit
210: First word dictionary generation unit
220: Second word dictionary generation unit
230: Redundant word remover
240: Non-related word remover
300: Subject category classification unit
310: Feature vector extraction unit
320:
S100: Data collection phase
S200: Word dictionary creation step
S300: Subject category classification step
Claims (17)
A word dictionary generation unit that receives, from the data collection unit, the words collected by the subject category, calculates a weight for the input words, selects, based on the calculated weight, the words to be included in the word dictionary existing for each subject category, and registers the selected words in each word dictionary; And
A subject category classification unit that receives a classification target sentence, selects, for each subject category, the words included in the corresponding word dictionary from among the words included in the classification target sentence, generates a feature vector according to the weights of the selected words for each subject category, and determines the subject category of the classification target sentence based on the generated feature vector.
Wherein the data collection unit removes, from the sentence, character strings composed of a predetermined number of characters or fewer, special characters, or numeric characters, and performs morphological analysis to select from the sentence the words to be input to the word dictionary generation unit.
Wherein the data collection unit receives a news article, a newspaper article, or a magazine article document classified in advance by the subject category, as a plurality of the documents classified in advance by the subject category.
A first word dictionary generation unit that calculates a TF-IDF weight, based on the sentence including the input word and the information about the subject category, for the words input from the data collection unit, and selects the words to be included in the word dictionary from among the input words.
Wherein the first word dictionary generation unit calculates the TF-IDF weight based on the number of times the input word appears in the document, the number of sentences in the document that include the input word, and the number of subject categories that include the input word.
A second word dictionary generation unit that calculates an LDA rank weight according to the result of the LDA analysis, and selects the words to be included in the word dictionary from among the input words based on the calculated LDA rank weight.
Wherein a TF-IDF weight is calculated, based on the sentence including the input word and the information about the subject category, for the words input from the data collection unit, and words having a small TF-IDF weight are removed from the input words,
And an LDA analysis is performed on the words remaining after the removal, an LDA rank weight is calculated according to the analysis result, and the words to be included in the word dictionary are selected from among the input words based on the calculated LDA rank weight.
Further comprising a duplicate word removing unit for removing duplicate words from words included in the word dictionary for each subject category.
Wherein, for a redundant word commonly included in two or more word dictionaries, the redundant word remover selects the subject category from which to remove the redundant word based on the TF-IDF weight of the redundant word or the frequency of occurrence of the redundant word in the word dictionary, and removes the redundant word from the word dictionary of the selected subject category.
A feature vector extraction unit that selects, for each subject category, the words included in the corresponding word dictionary from among the words included in the classification target sentence, and generates the feature vector by setting each element of the feature vector to a value calculated from the weights of the selected words for each subject category; And
A classification unit that determines the subject category of the classification target sentence based on the generated feature vector.
Wherein the classification unit determines the subject category corresponding to the element having the largest value among the elements of the feature vector as the subject category of the classification target sentence according to a maximum weight technique.
Wherein the classifier classifies the subject category of the classification subject sentence based on the feature vector using a pre-learned classifier based on a support vector machine (SVM).
Further comprising a non-related word elimination unit that selects words not related to the subject category from among the words included in the word dictionary for each subject category and removes the selected words from the word dictionary.
Wherein the non-related word elimination unit:
Clusters the words included in the word dictionary into a plurality of subsets based on the number of occurrences of the word in the subject category, the number of documents in the subject category that include the word, and the frequency of occurrence of the word in the documents that include the word,
Selects at least one non-related cluster from among the clustered subsets based on the frequency,
And removes the words included in the non-related cluster from the word dictionary.
And a word dictionary database for storing the word dictionary generated by the word dictionary generating unit.
The service server,
A data collecting unit that receives a plurality of documents classified in advance by theme category, selects words in sentences included in the document, and collects words by the subject category; And
And a word dictionary generation unit that receives, from the data collection unit, the words collected by the subject category, calculates a weight for the input words, selects, based on the calculated weight, the words to be included in the word dictionary existing for each subject category, and registers the selected words in each word dictionary.
A word dictionary generation step in which the service server calculates a weight for the words collected by the subject category, selects, based on the calculated weight, the words to be included in the word dictionary existing for each subject category from among the collected words, and registers the selected words in each word dictionary; And
A subject category classification step of receiving a classification target sentence, selecting, for each subject category, the words included in the corresponding word dictionary from among the words included in the classification target sentence, generating a feature vector according to the weights of the selected words for each subject category, and determining the subject category of the classification target sentence based on the generated feature vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150132590A KR101737887B1 (en) | 2015-09-18 | 2015-09-18 | Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150132590A KR101737887B1 (en) | 2015-09-18 | 2015-09-18 | Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170034206A true KR20170034206A (en) | 2017-03-28 |
KR101737887B1 KR101737887B1 (en) | 2017-05-19 |
Family
ID=58495957
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150132590A KR101737887B1 (en) | 2015-09-18 | 2015-09-18 | Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101737887B1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334610A (en) * | 2018-02-06 | 2018-07-27 | 北京神州泰岳软件股份有限公司 | A kind of newsletter archive sorting technique, device and server |
KR20180117458A (en) * | 2017-04-19 | 2018-10-29 | 아시아나아이디티 주식회사 | Method for automatic document classification using sentence classification and device thereof |
WO2019107646A1 (en) * | 2017-12-01 | 2019-06-06 | 상명대학교산학협력단 | Apparatus for analyzing web content consumption behavior, and method therefor |
CN110019782A (en) * | 2017-09-26 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for exporting text categories |
CN110209806A (en) * | 2018-06-05 | 2019-09-06 | 腾讯科技(深圳)有限公司 | File classification method, document sorting apparatus and computer readable storage medium |
KR102126911B1 (en) * | 2018-12-27 | 2020-07-07 | 서울대학교산학협력단 | Key player detection method in social media using KeyplayerRank |
CN111611379A (en) * | 2020-05-18 | 2020-09-01 | 深圳证券信息有限公司 | Text information classification method, device, equipment and readable storage medium |
KR20200109515A (en) | 2019-03-13 | 2020-09-23 | 주식회사 키즈브라운파트너스 | Education contents generating method using big data |
KR20200112353A (en) * | 2019-03-22 | 2020-10-05 | 주식회사 커넥트닷 | Method of analyzing relationships of words or documents by subject and device implementing the same |
CN111861596A (en) * | 2019-04-04 | 2020-10-30 | 北京京东尚科信息技术有限公司 | Text classification method and device |
KR102217213B1 (en) * | 2020-10-27 | 2021-02-18 | 장경애 | Service providing apparatus and method for managing contents based on deep learning |
KR20210056812A (en) | 2019-11-11 | 2021-05-20 | 한림대학교 산학협력단 | Apparatus, method and program for extracting research category of research literature using category feature lexicon each research category |
CN112836051A (en) * | 2021-02-19 | 2021-05-25 | 太极计算机股份有限公司 | Online self-learning court electronic file text classification method |
KR20210064620A (en) * | 2019-11-26 | 2021-06-03 | 주식회사 와이즈넛 | The informatization method for youtube video metadata for personal media production |
WO2021153321A1 (en) * | 2020-01-29 | 2021-08-05 | 株式会社インタラクティブソリューションズ | Conversation analysis system |
KR20210119041A (en) * | 2020-03-24 | 2021-10-05 | 경북대학교 산학협력단 | Device and Method for Cluster-based duplicate document removal |
KR102363958B1 (en) * | 2021-08-05 | 2022-02-16 | 재단법인차세대융합기술연구원 | Method, apparatus and program for analyzing customer perception based on double clustering |
KR102387665B1 (en) * | 2021-01-20 | 2022-04-15 | 연세대학교 산학협력단 | Disaster Information Screening System and Screen Metood to analyze disaster message information on social media using disaster weights |
KR20220096748A (en) * | 2020-12-31 | 2022-07-07 | 주식회사 포스코아이씨티 | System for Classifying Unstructured Contents Automatically |
WO2022150838A1 (en) * | 2021-01-08 | 2022-07-14 | Schlumberger Technology Corporation | Exploration and production document content and metadata scanner |
KR102472868B1 (en) * | 2022-08-10 | 2022-12-01 | 주식회사 플리더스 | Game information management server that can determine the genre and subject matter of a game based on review data collected from game testers and the operating method thereof |
KR20230053373A (en) * | 2021-10-14 | 2023-04-21 | 비큐리오 주식회사 | Deep neural network-based document analysis system and method, and computer program stored in recording media and media in which the program is stored |
CN117708324A (en) * | 2023-11-07 | 2024-03-15 | 山东睿芯半导体科技有限公司 | Text topic classification method, device, chip and terminal |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101480711B1 (en) | 2008-09-29 | 2015-01-09 | 에스케이플래닛 주식회사 | A detecting system and a method for subject, a storage means, an information offering system, an information offering service server and an information offering method |
-
2015
- 2015-09-18 KR KR1020150132590A patent/KR101737887B1/en active IP Right Grant
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180117458A (en) * | 2017-04-19 | 2018-10-29 | 아시아나아이디티 주식회사 | Method for automatic document classification using sentence classification and device thereof |
CN110019782A (en) * | 2017-09-26 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for exporting text categories |
WO2019107646A1 (en) * | 2017-12-01 | 2019-06-06 | 상명대학교산학협력단 | Apparatus for analyzing web content consumption behavior, and method therefor |
CN108334610A (en) * | 2018-02-06 | 2018-07-27 | 北京神州泰岳软件股份有限公司 | A kind of newsletter archive sorting technique, device and server |
CN110209806B (en) * | 2018-06-05 | 2023-09-12 | 腾讯科技(深圳)有限公司 | Text classification method, text classification device and computer readable storage medium |
CN110209806A (en) * | 2018-06-05 | 2019-09-06 | 腾讯科技(深圳)有限公司 | File classification method, document sorting apparatus and computer readable storage medium |
KR102126911B1 (en) * | 2018-12-27 | 2020-07-07 | 서울대학교산학협력단 | Key player detection method in social media using KeyplayerRank |
KR20200109515A (en) | 2019-03-13 | 2020-09-23 | 주식회사 키즈브라운파트너스 | Education contents generating method using big data |
KR20200112353A (en) * | 2019-03-22 | 2020-10-05 | 주식회사 커넥트닷 | Method of analyzing relationships of words or documents by subject and device implementing the same |
CN111861596B (en) * | 2019-04-04 | 2024-04-12 | 北京京东振世信息技术有限公司 | Text classification method and device |
CN111861596A (en) * | 2019-04-04 | 2020-10-30 | 北京京东尚科信息技术有限公司 | Text classification method and device |
KR20210056812A (en) | 2019-11-11 | 2021-05-20 | 한림대학교 산학협력단 | Apparatus, method and program for extracting research category of research literature using category feature lexicon each research category |
KR20210064620A (en) * | 2019-11-26 | 2021-06-03 | 주식회사 와이즈넛 | The informatization method for youtube video metadata for personal media production |
WO2021153321A1 (en) * | 2020-01-29 | 2021-08-05 | 株式会社インタラクティブソリューションズ | Conversation analysis system |
US11881212B2 (en) | 2020-01-29 | 2024-01-23 | Interactive Solutions Corp. | Conversation analysis system |
JP2021117475A (en) * | 2020-01-29 | 2021-08-10 | 株式会社インタラクティブソリューションズ | Conversation analysis system |
CN114080640B (en) * | 2020-01-29 | 2022-06-21 | 互动解决方案公司 | Dialogue analysis system |
CN114080640A (en) * | 2020-01-29 | 2022-02-22 | 互动解决方案公司 | Dialogue analysis system |
KR20210119041A (en) * | 2020-03-24 | 2021-10-05 | 경북대학교 산학협력단 | Device and Method for Cluster-based duplicate document removal |
CN111611379A (en) * | 2020-05-18 | 2020-09-01 | 深圳证券信息有限公司 | Text information classification method, device, equipment and readable storage medium |
KR102217213B1 (en) * | 2020-10-27 | 2021-02-18 | 장경애 | Service providing apparatus and method for managing contents based on deep learning |
KR20220096748A (en) * | 2020-12-31 | 2022-07-07 | 주식회사 포스코아이씨티 | System for Classifying Unstructured Contents Automatically |
WO2022150838A1 (en) * | 2021-01-08 | 2022-07-14 | Schlumberger Technology Corporation | Exploration and production document content and metadata scanner |
KR102387665B1 (en) * | 2021-01-20 | 2022-04-15 | 연세대학교 산학협력단 | Disaster Information Screening System and Screen Metood to analyze disaster message information on social media using disaster weights |
CN112836051A (en) * | 2021-02-19 | 2021-05-25 | 太极计算机股份有限公司 | Online self-learning court electronic file text classification method |
CN112836051B (en) * | 2021-02-19 | 2024-03-26 | 太极计算机股份有限公司 | Online self-learning court electronic file text classification method |
KR102363958B1 (en) * | 2021-08-05 | 2022-02-16 | 재단법인차세대융합기술연구원 | Method, apparatus and program for analyzing customer perception based on double clustering |
KR20230053373A (en) * | 2021-10-14 | 2023-04-21 | 비큐리오 주식회사 | Deep neural network-based document analysis system and method, and computer program stored in recording media and media in which the program is stored |
KR102472868B1 (en) * | 2022-08-10 | 2022-12-01 | 주식회사 플리더스 | Game information management server that can determine the genre and subject matter of a game based on review data collected from game testers and the operating method thereof |
CN117708324A (en) * | 2023-11-07 | 2024-03-15 | 山东睿芯半导体科技有限公司 | Text topic classification method, device, chip and terminal |
Also Published As
Publication number | Publication date |
---|---|
KR101737887B1 (en) | 2017-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101737887B1 (en) | Apparatus and Method for Topic Category Classification of Social Media Text based on Cross-Media Analysis | |
Al-Radaideh et al. | A hybrid approach for arabic text summarization using domain knowledge and genetic algorithms | |
US11514235B2 (en) | Information extraction from open-ended schema-less tables | |
Jha et al. | Homs: Hindi opinion mining system | |
Barnaghi et al. | Text analysis and sentiment polarity on FIFA world cup 2014 tweets | |
CN111309916A (en) | Abstract extraction method and device, storage medium and electronic device | |
JP6420268B2 (en) | Image evaluation learning device, image evaluation device, image search device, image evaluation learning method, image evaluation method, image search method, and program | |
KR102376489B1 (en) | Text document cluster and topic generation apparatus and method thereof | |
KR20160066216A (en) | Method of detecting issue pattern associated with user search word, server performing the same and storage medium storing the same | |
González et al. | Siamese hierarchical attention networks for extractive summarization | |
CN109446520B (en) | Data clustering method and device for constructing knowledge base | |
CN107092679B (en) | Feature word vector obtaining method and text classification method and device | |
CN115062135B (en) | Patent screening method and electronic equipment | |
Oliveira et al. | A concept-based ILP approach for multi-document summarization exploring centrality and position | |
Frick et al. | Fraunhofer SIT at CheckThat!-2023: Enhancing the Detection of Multimodal and Multigenre Check-Worthiness Using Optical Character Recognition and Model Souping. | |
Shin et al. | Content-based unsupervised fake news detection on Ukraine-Russia war | |
Kaur et al. | News classification using neural networks | |
Mesquita et al. | Extracting information networks from the blogosphere: State-of-the-art and challenges | |
Choudhury et al. | User sentiment detection: a YouTube use case | |
Hung et al. | Aafndl-an accurate fake information recognition model using deep learning for the vietnamese language | |
Smith et al. | Classification of text to subject using LDA | |
Frick et al. | Fraunhofer SIT at CheckThat! 2023: Mixing Single-Modal Classifiers to Estimate the Check-Worthiness of Multi-Modal Tweets | |
Galiotou et al. | On the effect of stemming algorithms on extractive summarization: a case study | |
Garcia et al. | Text Summarization and Temporal Learning Models Applied to Portuguese Fake News Detection in a Novel Brazilian Corpus Dataset | |
Touahri et al. | Opinion and sentiment polarity detection using supervised machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right |