KR20170073424A - Method of data analysis for reputation management system using web crawling - Google Patents

Method of data analysis for reputation management system using web crawling

Info

Publication number
KR20170073424A
Authority
KR
South Korea
Prior art keywords
negative
morpheme
reputation management
qualities
web
Prior art date
Application number
KR1020150182394A
Other languages
Korean (ko)
Inventor
김소라
Original Assignee
김소라
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 김소라
Priority to KR1020150182394A
Publication of KR20170073424A

Classifications

    • G06F17/30707
    • G06F17/2735
    • G06F17/2755
    • G06F17/30539
    • G06F17/30572
    • G06F17/30864
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data analysis method for a reputation management system using web crawling. The method comprises: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; extracting linguistic features by separating the data into sentences and performing language processing on each word in each separated sentence; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negative decision index as a probability, according to weights, from the number of linguistic features in a sentence that match a negative morpheme and the attribute priority scores of the matched negative morphemes; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB to the reputation management server to generate a report for the specific search word.

Description

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a data analysis method for a reputation management system using web crawling.

The present invention aims at constructing a reputation management system through data analysis and data mining using web crawling.

Reputation management technology analyzes data to determine the tendency of opinions about a specific name and to establish a strategy accordingly. Conventionally, the positive and negative elements in each sentence were counted, and the sentence or paragraph was judged positive or negative according to whichever was more numerous. However, a sentence such as "The camera is pretty, the color is good, and it has many functions, but it is heavy" contains three positive elements and one negative element, yet it can be read as a negative sentence. The counting approach judges it to be a positive sentence, and an error occurs.
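The failure mode of the counting approach can be illustrated with a minimal sketch. The word lists and the function name `majority_vote` are hypothetical, chosen only to reproduce the camera example; they do not come from the patent.

```python
# Hypothetical word lists, chosen only to reproduce the camera example.
POSITIVE = {"pretty", "good", "many"}
NEGATIVE = {"heavy"}


def majority_vote(sentence: str) -> str:
    """Label a sentence by comparing raw counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos > neg else "negative"


sentence = ("The camera is pretty, the color is good, "
            "and it has many functions, but it is heavy")
label = majority_vote(sentence)  # three positive hits outvote one negative hit
```

Because three positive words outvote the single negative word, the sentence is labeled positive even though the closing complaint dominates its meaning.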

Accordingly, to refine data analysis techniques, the present invention provides a method for more accurate data analysis that divides sentences into words and morphemes.

Heo Jeong and 6 others, KIPS Tr. Software and Data Eng., Vol. 3, No. 12, pp. 553-564, "Automatic Generation of Issue Analysis Report Based on Social Big Data Mining"
Chi-Hwan Choi et al., International Journal of Smart Home, Vol. 7, No. 5 (2013), pp. 291-304, "Voice of Customer Analysis for Internet Shopping Malls &

In sentiment analysis through conventional data analysis, a machine learning algorithm judges polarity only by the number of positive and negative attributes. The present invention makes the analysis more detailed by additionally calculating the priority of each attribute.

The present invention provides a data analysis method for a reputation management system using web crawling, the method comprising: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; extracting linguistic features by separating the data into sentences and performing language processing on each word in each separated sentence; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negative decision index as a probability, according to weights, from the number of linguistic features in a sentence that match a negative morpheme and the attribute priority scores of the matched negative morphemes; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB to the reputation management server to generate a report for the specific search word.

The present invention uses a negative morpheme dictionary to calculate a negativity probability and perform accurate data analysis, thereby systematizing reputation management services.

FIG. 1 is a flowchart of a data analysis method for a reputation management system using web crawling according to the present invention.
FIG. 2 is a configuration diagram of a reputation management system according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, a data analysis method for a reputation management system using web crawling according to the present invention will be described in detail with reference to the accompanying drawings.

According to an aspect of the present invention, there is provided a data analysis method for a reputation management system using web crawling, the method comprising: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; extracting linguistic features by separating the data into sentences and performing language processing on each word in each separated sentence; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negative decision index as a probability, according to weights, from the number of linguistic features in a sentence that match a negative morpheme and the attribute priority scores of the matched negative morphemes; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB to the reputation management server to generate a report for the specific search word.

FIG. 1 is a flowchart of a data analysis method for a reputation management system using web crawling according to the present invention, and FIG. 2 shows a reputation management system according to the present invention.

First, referring to FIG. 1, when a reputation management request for a specific search word is received from a client (S102), the data analysis apparatus receives, from the web server, data crawled from the web using the specific search word (S103).

More specifically, the web crawling used in step S103 is the operation of copying and fetching content from a predetermined website using specific software.
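As a rough illustration of the "copying and fetching content" part of crawling, the sketch below extracts visible text from an HTML page with Python's standard `html.parser` module. The class name `TextExtractor`, the helper `extract_text`, and the sample page are assumptions made for illustration; the patent does not specify any crawler implementation, and a static string stands in for a downloaded page so the sketch runs offline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A static page stands in for a crawled document (hypothetical content).
sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Review</h1><p>It seems slow.</p>"
    "<script>var x = 1;</script></body></html>"
)
page_text = extract_text(sample_html)
```

In a real deployment the HTML would come from an HTTP client and the crawler would also need to respect each site's access policies; both are outside the scope of this sketch.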

After step S103, the data analysis apparatus separates the data sentence by sentence and performs language processing on each word in each separated sentence to extract linguistic features (S104).

For example, from the sentence "I bought a Samsung computer last month, but it already seems to be slowing down", linguistic features such as "computer" and "speed" are extracted.
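Step S104 can be sketched as follows. This is a simplified stand-in: a regular-expression tokenizer and a hypothetical stop-word list replace the morphological analysis that a Korean-language system would actually need, and the function names `split_sentences` and `extract_features` are assumptions, not names from the patent.

```python
import re

# Hypothetical stop-word list; a real system would rely on a morphological analyzer.
STOP_WORDS = {"the", "a", "an", "it", "is", "to", "be", "but", "and",
              "i", "last", "already"}


def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_features(sentence: str) -> list[str]:
    """Lowercase the sentence, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


text = "I bought a computer last month. It is slow already!"
sentences = split_sentences(text)
features = [extract_features(s) for s in sentences]
```

Each sentence yields its own feature list, which is the unit the later dictionary lookup operates on.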

Then, by referring to the negative morpheme dictionary for each morpheme, the number of linguistic features that match a negative morpheme and the attribute priority scores of the matched negative morphemes are combined into a probability according to the weights, and a negative decision index is calculated (S105).

The negative morpheme dictionary is a database that stores negative morphemes and is updated in real time so that the set of morphemes can grow continuously. Negative morphemes are classified into priority levels, for example expressions such as 'bad' and 'worst' at one level and expressions such as 'not bad' and 'just so-so' at another, and a priority score is assigned to each level.
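One possible shape for such a dictionary is sketched below. The four levels follow the levels 1 to 4 named in the claims; everything else is an assumption made for illustration, including the class name `NegativeMorphemeDictionary`, the in-memory storage, the example entries, and the rule that a level maps to a score of level divided by 4.

```python
from typing import Optional


class NegativeMorphemeDictionary:
    """Maps a negative morpheme to a priority level (1 = mildest, 4 = strongest)."""

    MAX_LEVEL = 4  # levels 1-4 as in the claims; the score mapping below is assumed

    def __init__(self) -> None:
        self._levels: dict[str, int] = {}

    def update(self, morpheme: str, level: int) -> None:
        """Add a morpheme or change its level, so the dictionary can keep growing."""
        if not 1 <= level <= self.MAX_LEVEL:
            raise ValueError(f"level must be between 1 and {self.MAX_LEVEL}")
        self._levels[morpheme] = level

    def level(self, morpheme: str) -> Optional[int]:
        """Return the stored level, or None if the morpheme is not negative."""
        return self._levels.get(morpheme)

    def priority_score(self, morpheme: str) -> float:
        """Return the level scaled to (0, 1], or 0.0 for unknown morphemes."""
        lvl = self._levels.get(morpheme)
        return 0.0 if lvl is None else lvl / self.MAX_LEVEL


# Hypothetical entries for illustration.
dictionary = NegativeMorphemeDictionary()
dictionary.update("so-so", 1)
dictionary.update("bad", 3)
dictionary.update("worst", 4)
```

A production system would back `update` with the database the description mentions; the in-memory dict only keeps the sketch self-contained.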

In addition, the weights can be changed dynamically by determining, according to the client's request, the relative importance of the number of linguistic features and of their level.
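The patent does not give a formula for the negative decision index, so the following is only one plausible reading: a weighted average of the share of features that match a negative morpheme and the strongest matched priority level. The function name, the use of the maximum level, the normalization, and the example values are all assumptions.

```python
def negative_decision_index(features, negative_levels,
                            count_weight=0.5, level_weight=0.5, max_level=4):
    """Combine match count and priority level into an index in [0, 1].

    features: linguistic features extracted from one sentence.
    negative_levels: mapping from negative morpheme to level (1..max_level).
    The weighting scheme is an assumed formula, not the patent's.
    """
    total_weight = count_weight + level_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = [negative_levels[f] for f in features if f in negative_levels]
    if not matched:
        return 0.0
    count_ratio = len(matched) / len(features)   # how many features are negative
    level_ratio = max(matched) / max_level       # how strong the worst one is
    return (count_weight * count_ratio + level_weight * level_ratio) / total_weight


# Hypothetical camera example: one negative feature out of four, at level 3.
features = ["pretty", "good", "many", "heavy"]
levels = {"heavy": 3}
balanced = negative_decision_index(features, levels)
level_heavy = negative_decision_index(features, levels,
                                      count_weight=0.2, level_weight=0.8)
```

With equal weights the example scores 0.5; shifting weight toward the level raises it to about 0.65, which shows how a client's preference can push a sentence with a single strong negative over a threshold.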

In another embodiment, the weights can be set differently according to whether the type of a specific search word is a company name, an individual, or a product name.
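A lookup table is one simple way to realize per-type weights. The numeric values, the type labels, and the names `Weights` and `weights_for` below are hypothetical; the patent only states that the weights may differ by search-word type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    count: float  # importance of the number of negative features
    level: float  # importance of the priority level of those features


# Hypothetical values; the patent does not specify any numbers.
WEIGHTS_BY_TYPE = {
    "company": Weights(count=0.6, level=0.4),
    "individual": Weights(count=0.3, level=0.7),
    "product": Weights(count=0.5, level=0.5),
}
DEFAULT_WEIGHTS = Weights(count=0.5, level=0.5)


def weights_for(term_type: str) -> Weights:
    """Return the weights for a search-word type, falling back to a default."""
    return WEIGHTS_BY_TYPE.get(term_type, DEFAULT_WEIGHTS)


individual_weights = weights_for("individual")
```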

If the index is equal to or greater than a predetermined value, the sentence is determined to be a negative opinion and stored in the storage DB (S106). The stored opinions are sorted by domain address and transmitted to the reputation management server in report form for use (S107).
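Steps S106 and S107 can be sketched as a filter followed by grouping on the domain of the source URL. The function name `build_report`, the tuple layout, the example URLs, and the use of a plain dict in place of the storage DB are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_report(scored, threshold):
    """Keep sentences at or above the threshold and group them by domain.

    scored: iterable of (url, sentence, index) tuples.
    Returns a dict keyed by domain in alphabetical order; within each
    domain the sentences are ordered from highest to lowest index.
    """
    by_domain = defaultdict(list)
    for url, sentence, index in scored:
        if index >= threshold:  # negative opinion
            by_domain[urlparse(url).netloc].append((sentence, index))
    return {
        domain: sorted(items, key=lambda item: item[1], reverse=True)
        for domain, items in sorted(by_domain.items())
    }


# Hypothetical scored sentences.
scored = [
    ("https://shop.example.com/r/1", "It is heavy.", 0.65),
    ("https://blog.example.org/post", "It is the worst.", 0.9),
    ("https://shop.example.com/r/2", "The color is good.", 0.0),
    ("https://shop.example.com/r/3", "Bad battery.", 0.7),
]
report = build_report(scored, threshold=0.5)
```

Grouping by domain lets the report show where negative opinions concentrate, which is presumably why the description sorts by domain address before delivery.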

In one embodiment, the predetermined reference index is set by the client, or is determined by the reputation management server according to a reference value set for the specific search word.
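The choice between a client-supplied and a server-determined reference value might look like the sketch below. The precedence (client value first), the table of server-side values, and the name `resolve_threshold` are assumptions; the patent only says either party may set the value.

```python
from typing import Optional

# Hypothetical server-side reference values per search-word type.
SERVER_THRESHOLDS = {"company": 0.6, "product": 0.5}
FALLBACK_THRESHOLD = 0.5


def resolve_threshold(client_value: Optional[float], term_type: str) -> float:
    """Prefer the client's threshold; otherwise use the server's reference value."""
    if client_value is not None:
        if not 0.0 <= client_value <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        return client_value
    return SERVER_THRESHOLDS.get(term_type, FALLBACK_THRESHOLD)


client_choice = resolve_threshold(0.8, "company")
server_choice = resolve_threshold(None, "company")
```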

The report can be used in various ways, not shown, to establish reputation management strategies.

FIG. 2 is a configuration diagram showing the reputation management system according to the present invention.

The components include a client 11, a web server 12, a data analysis device 13, and a reputation management server 14.

The client 11 requests a reputation management service for a specific search word or event from the reputation management server 14 and provides, along with the request, information for determining the relative importance of the number and the level of linguistic features.

Upon receiving the request from the client, the data analysis apparatus 13 receives from the web server 12 the content of web pages crawled using the specific search word. The data are then separated into sentences, and a positive/negative determination is performed for each separated sentence. Specifically, the apparatus extracts linguistic features by performing language processing on each word in a separated sentence, compares each morpheme against the negative morpheme dictionary, and counts the linguistic features that match a negative morpheme. At the same time, the attribute priority scores of the matched negative morphemes are combined into a probability according to the weights, and a negative decision index is calculated. If the decision index is equal to or greater than a predetermined value, the sentence is determined to be a negative opinion and stored in the storage DB, sorted by domain address, and transmitted to the reputation management server.

The negative morpheme dictionary is a database that stores negative morphemes and is updated in real time so that the set of morphemes can grow continuously. Negative morphemes are classified into priority levels, for example expressions such as 'bad' and 'worst' at one level and expressions such as 'not bad' and 'just so-so' at another, and a priority score is assigned to each level.

In addition, the relative importance of the number of linguistic features and of their level is determined from the information included in the client's request, so that the weights can be changed dynamically.

The data analysis apparatus 13 can set the weights differently according to whether the specific search word is a company name, an individual, or a product name. When the index is equal to or greater than a predetermined value, the opinion is determined to be negative, and the result can be transmitted to the reputation management server in the form of a report for use.

In one embodiment, the predetermined reference index is set by the client, or is determined by the reputation management server according to a reference value set for the specific search word.

The reputation management server 14 provides the report information to the client company, uses it to establish a reputation management strategy, or feeds it as an input value into the reputation management system to generate a strategy.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. The present invention is not limited to the drawings.

Claims (7)

A data analysis method for a reputation management system using web crawling, the method comprising:
<A> receiving a reputation management request for a specific search word from a client;
<B> receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server;
<C> extracting linguistic features by separating the data into sentences and performing language processing on each word in each separated sentence;
<D> determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negative decision index as a probability, according to a weight, from the number of linguistic features in a sentence that match a negative morpheme and the attribute priority scores of the matched negative morphemes;
<E> determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB;
<F> in the case of a negative opinion, sorting and classifying it according to the domain address and delivering it to a reputation management server; and
<G> inputting the results stored in the storage DB to the reputation management server to generate a report for the specific search word.
The method according to claim 1, wherein the web crawling is an operation of copying and fetching the contents of a predetermined website when specific software is run.
The method according to claim 1, wherein the linguistic features are elements containing subjective opinions to be used for sentiment analysis.
The method according to claim 1, wherein the negative morpheme dictionary is a database storing morphemes used with a negative meaning.
The method according to claim 1, wherein the attribute priority of a negative morpheme is classified into levels 1, 2, 3, and 4 and is specified in advance for each morpheme together with the negative morpheme dictionary.
The method according to claim 1, wherein the weight is set according to the type of the specific search word and differs depending on whether the specific search word is a company name, a product name, or an individual.
The method according to claim 1, wherein the predetermined index is set by the requesting client.
KR1020150182394A 2015-12-19 2015-12-19 Method of data analysis for reputation management system using web crawling KR20170073424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150182394A KR20170073424A (en) 2015-12-19 2015-12-19 Method of data analysis for reputation management system using web crawling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150182394A KR20170073424A (en) 2015-12-19 2015-12-19 Method of data analysis for reputation management system using web crawling

Publications (1)

Publication Number Publication Date
KR20170073424A true KR20170073424A (en) 2017-06-28

Family

ID=59280936

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150182394A KR20170073424A (en) 2015-12-19 2015-12-19 Method of data analysis for reputation management system using web crawling

Country Status (1)

Country Link
KR (1) KR20170073424A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019164119A1 (en) * 2018-02-22 2019-08-29 삼성전자주식회사 Electronic device and control method therefor
US11544469B2 (en) 2018-02-22 2023-01-03 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof
KR102218358B1 (en) * 2020-08-19 2021-02-22 포항공과대학교 산학협력단 System, method of verifying malicious comment, and computer readable medium

Similar Documents

Publication Publication Date Title
US10452694B2 (en) Information extraction from question and answer websites
US9070088B1 (en) Determining trustworthiness and compatibility of a person
CN105893533B (en) Text matching method and device
Silva et al. Building a sentiment lexicon for social judgement mining
US8983977B2 (en) Question answering device, question answering method, and question answering program
US8977573B2 (en) System and method for identifying customers in social media
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
US10860565B2 (en) Database update and analytics system
WO2019037258A1 (en) Information recommendation method, device and system, and computer-readable storage medium
US10152478B2 (en) Apparatus, system and method for string disambiguation and entity ranking
US20200134264A1 (en) Method for Updating a Knowledge Base of a Sentiment Analysis System
CN111079029B (en) Sensitive account detection method, storage medium and computer equipment
US20210151038A1 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
WO2015084757A1 (en) Systems and methods for processing data stored in a database
US9811592B1 (en) Query modification based on textual resource context
Mangal et al. A Framework for Detection and Validation of Fake News via authorize source matching
CN110209804B (en) Target corpus determining method and device, storage medium and electronic device
JP4879775B2 (en) Dictionary creation method
KR20170073424A (en) Method of data analysis for reputation management system using web crawling
US20230186212A1 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
US20170147679A1 (en) Query expansion system and method using language and language variants
US20230090601A1 (en) System and method for polarity analysis
Yin et al. Research of integrated algorithm establishment of a spam detection system
KR102180329B1 (en) System for determining fake news
KR101821777B1 (en) Automatic answering system for on-line bulletin board and method of the same