KR20170073424A - Method of data analysis for reputation management system using web crawling - Google Patents
- Publication number
- KR20170073424A
- Authority
- KR
- South Korea
- Prior art keywords
- negative
- morpheme
- reputation management
- qualities
- web
Classifications
- G06F17/30707
- G06F17/2735
- G06F17/2755
- G06F17/30539
- G06F17/30572
- G06F17/30864
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06F: ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03: Data mining
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data analysis method for a reputation management system using web crawling. The method comprises: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; separating the data into sentences and performing language processing on each word in the separated sentences to extract linguistic features; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negativity decision index by converting the number of linguistic features in a sentence that match negative morphemes, together with the attribute priority scores of the matched negative morphemes, into a probability according to weights; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB into the reputation management server to generate a report for the specific search word.
Description
The present invention aims at constructing a reputation management system through data analysis and data mining using web crawling.
In reputation management, data analysis is used to determine the tendency of opinions about a specific name and to establish a strategy. Conventionally, however, the numbers of positive and negative elements in each sentence were counted, and a sentence or paragraph was judged positive or negative according to whichever kind of element was more numerous. For example, the sentence "The camera is pretty, the color is good, and the functions are good, but it is heavy" contains three positive elements and one negative element; although the sentence can be read as negative overall, it is judged to be positive, which is an error.
Accordingly, to refine data analysis techniques, the present invention provides a method that analyzes data more accurately by dividing sentences into words and morphemes.
Whereas conventional sentiment analysis by machine learning algorithms judges a sentence only by the number of positive and negative attributes, the present invention makes the analysis more detailed by calculating a weighted probability that also takes the priority of each matched attribute into account.
The present invention provides a data analysis method for a reputation management system using web crawling, comprising: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; separating the data into sentences and performing language processing on each word in the separated sentences to extract linguistic features; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negativity decision index by converting the number of linguistic features in a sentence that match negative morphemes, together with the attribute priority scores of the matched negative morphemes, into a probability according to weights; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB into the reputation management server to generate a report for the specific search word.
By using a negative morpheme dictionary to calculate a negativity probability, the present invention performs more accurate data analysis and thereby systematizes reputation management services.
FIG. 1 is a flowchart of a data analysis method for a reputation management system using web crawling according to the present invention.
FIG. 2 is a configuration diagram of a reputation management system according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, the data analysis method for a reputation management system using web crawling according to the present invention will be described in detail with reference to the accompanying drawings.
According to an aspect of the present invention, there is provided a data analysis method for a reputation management system using web crawling, the method comprising: receiving a reputation management request for a specific search word from a client; receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server; separating the data into sentences and performing language processing on each word in the separated sentences to extract linguistic features; determining whether the extracted linguistic features are positive or negative by referring to a negative morpheme dictionary for each morpheme, and calculating a negativity decision index by converting the number of linguistic features in a sentence that match negative morphemes, together with the attribute priority scores of the matched negative morphemes, into a probability according to weights; determining that the sentence is a negative opinion if the index is equal to or greater than a predetermined value, and storing it in a storage DB; sorting and classifying negative opinions by domain address and delivering them to a reputation management server; and inputting the results stored in the storage DB into the reputation management server for the specific search word to generate a report.
FIG. 1 is a flowchart of the data analysis method for a reputation management system using web crawling according to the present invention, and FIG. 2 shows the reputation management system according to the present invention.
First, referring to FIG. 1, when a reputation management request for a specific search word is received from a client (S102), the data analysis apparatus receives, from the web server, data crawled from the web using that search word (S103).
Here, the web crawling used in step S103 is a process of copying and retrieving content from predetermined websites using dedicated software.
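As a rough illustration of the crawling in step S103, the following Python sketch (an assumption, not the patent's implementation) downloads one page with the standard library and keeps only its visible text. The names `fetch_page_text` and `extract_text` and the User-Agent string are hypothetical; a real crawler would also follow links, honor robots.txt, and throttle its requests.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its visible text (needs network access)."""
    req = Request(url, headers={"User-Agent": "reputation-crawler-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_text(resp.read().decode(charset, errors="replace"))
```

Separating parsing from fetching keeps `extract_text` testable offline on any HTML string.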
After step S103, the data analysis apparatus separates the data into sentences and performs language processing on each word in the separated sentences to extract linguistic features (S104).
For example, from the sentence "The Samsung computer I bought last month already seems to be slowing down," linguistic features such as "computer", "speed", … are extracted.
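Step S104 can be approximated as below. This is a simplified sketch: the regex tokenizer and the `STOP_WORDS` set are hypothetical stand-ins for the morphological analyzer a Korean-language system would actually need, so unlike the patent's example it does not reduce "slowing" to a feature such as "speed".

```python
import re

# Hypothetical stop-word list; a real system would rely on a morphological
# analyzer (e.g. for Korean) rather than this regex tokenizer.
STOP_WORDS = {"i", "it", "the", "a", "an", "but", "to", "be", "last", "already", "seems"}

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after . ! or ?
_WORD = re.compile(r"[^\W\d_]+")              # runs of letters (any script)


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def extract_features(sentence: str) -> list[str]:
    """Lower-cased word tokens minus stop words (a stand-in for morphemes)."""
    tokens = (t.lower() for t in _WORD.findall(sentence))
    return [t for t in tokens if t not in STOP_WORDS]
```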
Next, the negative morpheme dictionary is consulted for each morpheme: the number of linguistic features that match negative morphemes and the attribute priority scores of the matched negative morphemes are converted into a probability according to weights, yielding a negativity decision index (S105).
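The patent does not give a formula for the negativity decision index. One possible reading, offered only as an assumption, is a weighted mean of (a) the fraction of a sentence's features that match negative morphemes and (b) the mean priority score of those matches, normalized to [0, 1]. The function name `negativity_index` and its default weights are hypothetical.

```python
def negativity_index(
    features: list[str],
    priority_scores: dict[str, float],
    w_count: float = 0.5,
    w_priority: float = 0.5,
    max_score: float = 1.0,
) -> float:
    """Hypothetical negativity decision index in [0, 1] for one sentence.

    Weighted mean of:
      - count_ratio: share of features that are negative morphemes, and
      - priority_ratio: mean priority score of those matches / max_score.
    """
    if not features or w_count + w_priority <= 0:
        return 0.0
    matched = [priority_scores[f] for f in features if f in priority_scores]
    if not matched:
        return 0.0  # no negative morpheme in the sentence
    count_ratio = len(matched) / len(features)
    priority_ratio = sum(matched) / (len(matched) * max_score)
    return (w_count * count_ratio + w_priority * priority_ratio) / (w_count + w_priority)


features = ["camera", "pretty", "color", "good", "function", "good", "heavy"]
print(negativity_index(features, {"heavy": 1.0}))
```

Applied to the camera example from the background section, with "heavy" given the top priority score, the index is 4/7 (about 0.57), so a threshold of 0.5 flags the sentence as negative even though its positive terms outnumber its negative one.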
The negative morpheme dictionary is updated in real time so that its set of morphemes can grow continuously, and the negative morphemes are stored in a database. The morphemes can be classified into priority levels, for example a strongly negative level such as 'bad', 'poor', and 'worst' and a mildly negative level such as 'not bad' and 'just so-so', and each level is given a priority score.
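A negative morpheme dictionary that can be updated at run time and persisted in a database might look like the following sketch. The SQLite schema, the class name `NegativeMorphemeDictionary`, and the scores assigned to the "strong" and "mild" levels are all assumptions; the patent names the levels only by example.

```python
import sqlite3
from typing import Optional

# Hypothetical priority score per level; the patent gives no numeric values.
LEVEL_SCORES = {"strong": 1.0, "mild": 0.5}


class NegativeMorphemeDictionary:
    """Negative morphemes with priority levels, persisted in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS negative_morpheme ("
            " morpheme TEXT PRIMARY KEY, level TEXT NOT NULL)"
        )

    def add(self, morpheme: str, level: str) -> None:
        """Insert or overwrite an entry; usable at any time (real-time update)."""
        if level not in LEVEL_SCORES:
            raise ValueError(f"unknown level: {level}")
        with self.conn:  # commits the write
            self.conn.execute(
                "INSERT OR REPLACE INTO negative_morpheme (morpheme, level) VALUES (?, ?)",
                (morpheme, level),
            )

    def priority_score(self, morpheme: str) -> Optional[float]:
        """Priority score of a negative morpheme, or None if it is not listed."""
        row = self.conn.execute(
            "SELECT level FROM negative_morpheme WHERE morpheme = ?", (morpheme,)
        ).fetchone()
        return LEVEL_SCORES[row[0]] if row else None
```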
In addition, the weights can be changed dynamically by determining, according to the client's request, the relative importance of the number of matched linguistic features and of their priority level.
In another embodiment, the weights can be set differently depending on whether the specific search word is a company name, a person's name, or a product name.
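The weight selection described in the two preceding paragraphs could be sketched as a lookup keyed by the type of search word, with weights supplied in the client's request taking precedence. The `Weights` class, the `select_weights` function, and every numeric value are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Weights:
    count: float     # weight on the share of matched negative features
    priority: float  # weight on the priority scores of the matches


# Hypothetical defaults per search-word type; the patent gives no values.
DEFAULT_WEIGHTS = {
    "company": Weights(count=0.6, priority=0.4),
    "individual": Weights(count=0.3, priority=0.7),
    "product": Weights(count=0.5, priority=0.5),
}
FALLBACK = Weights(count=0.5, priority=0.5)


def select_weights(word_type: str, client_override: Optional[Weights] = None) -> Weights:
    """Weights from the client's request win; else use the per-type default."""
    if client_override is not None:
        return client_override
    return DEFAULT_WEIGHTS.get(word_type, FALLBACK)
```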
If the index is equal to or greater than a predetermined value, the sentence is determined to be a negative opinion and stored in the storage DB (S106); the stored opinions are then sorted by domain address and transmitted to the reputation management server in report form for use (S107).
In one embodiment, the reference decision index is either set by the client or determined by the reputation management server according to a reference value set for the specific search word.
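Steps S106 and S107, together with the threshold choice just described, can be sketched as follows: the threshold comes from the client if one was given, otherwise from a per-search-word reference value held by the server, and sentences at or above it are stored and grouped by domain. The table layout and the function names `resolve_threshold` and `store_negative_opinions` are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict
from typing import Iterable, Optional
from urllib.parse import urlparse


def resolve_threshold(client_value: Optional[float], keyword: str,
                      server_defaults: dict[str, float], fallback: float = 0.5) -> float:
    """Client's value if given, else the server's reference value for the keyword."""
    if client_value is not None:
        return client_value
    return server_defaults.get(keyword, fallback)


def store_negative_opinions(conn: sqlite3.Connection, keyword: str,
                            scored: Iterable[tuple[str, str, float]],
                            threshold: float) -> dict[str, list[str]]:
    """Store sentences with index >= threshold (S106); group them by domain (S107).

    `scored` yields (url, sentence, index) triples. The result is keyed by
    domain in sorted order, ready to hand to the reputation management server.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS negative_opinion ("
        " keyword TEXT, url TEXT, domain TEXT, sentence TEXT, idx REAL)"
    )
    by_domain = defaultdict(list)
    with conn:  # commit all inserts as one transaction
        for url, sentence, index in scored:
            if index < threshold:
                continue  # not a negative opinion
            domain = urlparse(url).netloc.lower()
            conn.execute(
                "INSERT INTO negative_opinion VALUES (?, ?, ?, ?, ?)",
                (keyword, url, domain, sentence, index),
            )
            by_domain[domain].append(sentence)
    return dict(sorted(by_domain.items()))
```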
The report can be used in a variety of ways, not shown, to build reputation management strategies.
FIG. 2 is a configuration diagram showing a reputation management system according to the present invention.
The components include a client 11, a web server 12, a data analysis device 13, and a reputation management server 14.
The client 11 requests a reputation management service for a particular search word or event from the reputation management system and, together with the request, provides information from which the relative importance of the number of matched linguistic features and of their priority level can be determined.
Upon receiving the request from the client, the data analysis apparatus 13 crawls the web for the specific search word via the web server 12 and receives the contents of the web pages. It then separates the data into sentences and determines the positive and negative elements of each separated sentence. Specifically, it performs linguistic processing on each word in a separated sentence to extract linguistic features, compares them morpheme by morpheme with the entries of the negative morpheme dictionary, and counts the matching linguistic features. At the same time, it converts the attribute priority scores of the matched negative morphemes into a probability according to weights to calculate a negativity decision index. If the decision index is greater than or equal to a fixed value, the sentence is determined to be a negative opinion, stored in the storage DB, sorted by domain address, and transmitted to the reputation management server.
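The flow of the data analysis apparatus 13 can be condensed into one hypothetical function, `analyze_pages`, shown below. It uses an assumed index formula (weighted mean of match share and mean priority score) and a crude regex tokenizer in place of real morphological analysis; the input is assumed to be already-crawled page text keyed by URL.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"[^\W\d_]+")


def analyze_pages(pages: dict[str, str], negative_dict: dict[str, float],
                  w_count: float = 0.5, w_priority: float = 0.5,
                  threshold: float = 0.5) -> dict[str, list[tuple[str, float]]]:
    """Hypothetical end-to-end flow of data analysis apparatus 13.

    pages: URL -> crawled page text.
    negative_dict: negative morpheme -> priority score in [0, 1].
    Returns {domain: [(sentence, index), ...]} for sentences judged negative,
    with domains in sorted order.
    """
    found = defaultdict(list)
    for url, text in pages.items():
        domain = urlparse(url).netloc.lower()
        sentences = (s.strip() for s in _SENTENCE_END.split(text))
        for sentence in filter(None, sentences):
            features = [w.lower() for w in _WORD.findall(sentence)]
            matched = [negative_dict[f] for f in features if f in negative_dict]
            if not matched:
                continue  # no negative morpheme, so not a candidate
            index = (w_count * len(matched) / len(features)
                     + w_priority * sum(matched) / len(matched)) / (w_count + w_priority)
            if index >= threshold:
                found[domain].append((sentence, round(index, 3)))
    return dict(sorted(found.items()))
```

On the camera sentence from the background section, with "heavy" as the only dictionary entry, this returns the sentence with an index of about 0.529 under the default weights, grouped under its domain.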
The negative morpheme dictionary is updated in real time so that its set of morphemes can grow continuously, and the negative morphemes are stored in a database. The morphemes can be classified into priority levels, for example a strongly negative level such as 'bad', 'poor', and 'worst' and a mildly negative level such as 'not bad' and 'just so-so', and each level is given a priority score.
In addition, the relative importance of the number of linguistic features and of their level is determined from the information included in the client's request, so that the weights can be changed dynamically.
The data analysis apparatus 13 can set the weights differently depending on whether the specific search word is a company name, a person's name, or a product name. When the index is equal to or greater than a fixed value, the apparatus determines that the opinion is negative and can transfer it to the reputation management server in the form of a report for use.
In one embodiment, the reference decision index is either set by the client or determined by the reputation management server according to a reference value set for the specific search word.
The reputation management server 14 provides the report information to the client company, uses it to establish a reputation management strategy, or feeds it as an input value into the reputation management system to generate a strategy.
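The report that the reputation management server 14 hands to the client could be as simple as the plain-text summary below. The layout, the `build_report` name, and the ordering (domains with the most negative opinions first) are assumptions; the patent does not specify a report format.

```python
from datetime import date


def build_report(keyword: str, by_domain: dict[str, list[str]], today: date) -> str:
    """Plain-text report for one search word: negative opinions per domain."""
    total = sum(len(v) for v in by_domain.values())
    lines = [f"Reputation report: {keyword} ({today.isoformat()})",
             f"Negative opinions: {total}"]
    # Most-affected domains first; ties broken alphabetically.
    for domain, sentences in sorted(by_domain.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        lines.append(f"- {domain}: {len(sentences)}")
        lines.extend(f"    {s}" for s in sentences)
    return "\n".join(lines)
```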
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. The present invention is not limited to the drawings.
Claims (7)
A method for analyzing data for a reputation management system using web crawling, comprising:
<A> receiving a reputation management request for a specific search word from a client;
<B> receiving, by a data analysis apparatus, data crawled from the web using the specific search word from a web server;
<C> separating the data into sentences and performing language processing on each word in the separated sentences to extract linguistic features;
<D> determining whether the extracted linguistic features are positive or negative by referring to the negative morpheme dictionary for each morpheme, and calculating a negativity decision index by converting the number of linguistic features in a sentence that match negative morphemes and the attribute priority scores of the matched negative morphemes into a probability according to weights;
<E> if the index is greater than or equal to a fixed value, determining that the sentence is a negative opinion and storing it in the storage DB;
<F> for a negative opinion, sorting and classifying it according to its domain address and delivering it to the reputation management server; and
<G> inputting the results stored in the storage DB into the reputation management server for the specific search word to generate a report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150182394A (KR20170073424A) | 2015-12-19 | 2015-12-19 | Method of data analysis for reputation management system using web crawling |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20170073424A | 2017-06-28 |
Family ID: 59280936
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20170073424A |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019164119A1 * | 2018-02-22 | 2019-08-29 | Samsung Electronics Co., Ltd. | Electronic device and control method therefor |
US11544469B2 | 2018-02-22 | 2023-01-03 | Samsung Electronics Co., Ltd. | Electronic apparatus and control method thereof |
KR102218358B1 * | 2020-08-19 | 2021-02-22 | POSTECH Academy-Industry Foundation | System, method of verifying malicious comment, and computer readable medium |
Similar Documents
Publication | Title |
---|---|
US10452694B2 | Information extraction from question and answer websites |
US9070088B1 | Determining trustworthiness and compatibility of a person |
CN105893533B | Text matching method and device |
Silva et al. | Building a sentiment lexicon for social judgement mining |
US8983977B2 | Question answering device, question answering method, and question answering program |
US8977573B2 | System and method for identifying customers in social media |
CN109145216A | Network public-opinion monitoring method, device and storage medium |
US10860565B2 | Database update and analytics system |
WO2019037258A1 | Information recommendation method, device and system, and computer-readable storage medium |
US10152478B2 | Apparatus, system and method for string disambiguation and entity ranking |
US20200134264A1 | Method for Updating a Knowledge Base of a Sentiment Analysis System |
CN111079029B | Sensitive account detection method, storage medium and computer equipment |
US20210151038A1 | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media |
WO2015084757A1 | Systems and methods for processing data stored in a database |
US9811592B1 | Query modification based on textual resource context |
Mangal et al. | A Framework for Detection and Validation of Fake News via authorize source matching |
CN110209804B | Target corpus determining method and device, storage medium and electronic device |
JP4879775B2 | Dictionary creation method |
KR20170073424A | Method of data analysis for reputation management system using web crawling |
US20230186212A1 | System, method, electronic device, and storage medium for identifying risk event based on social information |
US20170147679A1 | Query expansion system and method using language and language variants |
US20230090601A1 | System and method for polarity analysis |
Yin et al. | Research of integrated algorithm establishment of a spam detection system |
KR102180329B1 | System for determining fake news |
KR101821777B1 | Automatic answering system for on-line bulletin board and method of the same |