KR20090003853A - System and method for automatically detecting information in real-time using rule - Google Patents
- Publication number
- KR20090003853A
- Authority
- KR
- South Korea
- Prior art keywords
- information
- rule
- wired
- wireless internet
- search
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/235—Update request formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The present invention relates to a system and method for real-time automatic information extraction using rules. The real-time automatic information extraction system of the present invention comprises: a URL database that stores the URLs of a plurality of wired and wireless Internet sites; a rule information database that stores, for each wired/wireless Internet site, rule information for extracting only the information on specific data items from a wired/wireless Internet page made up of a template and data; a search term receiver that receives a search term from a pre-established information search site; a search term input unit that inputs the received search term into each wired/wireless Internet site according to its URL in the URL database; a page receiving unit that receives, from each wired/wireless Internet site, a search result page for the search term; a page analyzer that applies the rule information stored in the rule information database for each site to the search result page received from that site and extracts, from each search result page, the information on each data item to be provided as a search result; and an information providing unit that provides the information extracted by the page analyzer to a user through the automatic information search site.
Description
The present invention relates to a system and method for automatically extracting information in real time using rules. More specifically, it provides a system and method that, when a specific keyword is input, extracts and collects information related to that keyword from various wired and wireless Internet sites by analyzing their pages in real time rather than by sharing databases with them.
With the construction of online virtual marketplaces known as online shopping malls, Internet users can purchase products over the Internet without visiting a store that sells the desired product. Because users buy directly online instead of visiting in person or placing reservations, online shopping malls save both time and money. Owing to these advantages, many users now shop online, and a wide variety of online shopping malls have appeared.
However, because the selling price of the same item differs from one online shopping mall to another, a buyer had to search each shopping mall, compare the prices offered, and only then purchase the product.
Buyers therefore want to purchase from the online shopping mall that offers the lowest price. To serve this need, specialized price comparison sites compare the prices at which the same product is sold in each shopping mall; the buyer uses such a site to identify the mall with the lowest price and then purchases the item there.
However, because typical price comparison sites obtain product price information through partnerships with existing wired and wireless Internet sites, users cannot see product information offered by unaffiliated sites; they can only check the information provided by affiliated ones.
In addition, existing price comparison sites do not reflect, in real time, the prices of products listed on auction sites, where prices fluctuate continuously, so users cannot check the current price of an auction item.
The present invention has been made to solve the above disadvantages and problems of the prior art, and an object of the present invention is to extract and collect the information a user wants from various wired and wireless Internet sites.
In particular, an object of the present invention is to collect and provide the information of each site by directly analyzing its wired or wireless Internet pages, rather than through database sharing or content partnerships.
A further object of the present invention is to generate rule information for extracting information when analyzing a wired or wireless Internet page, and to set new rule information automatically when information can no longer be extracted from a wired or wireless Internet site.
To achieve the above objects, the present invention provides a real-time automatic information extraction system using rules, comprising: a URL database that stores the URLs of a plurality of wired and wireless Internet sites; a rule information database that stores, for each wired/wireless Internet site, rule information for extracting only the information on specific data items from a wired/wireless Internet page made up of a template and data; a search term receiver that receives a search term from a pre-established information search site; a search term input unit that inputs the received search term into each wired/wireless Internet site according to its URL in the URL database; a page receiving unit that receives, from each wired/wireless Internet site, a search result page for the search term; a page analyzer that applies the rule information stored in the rule information database for each site to the search result page received from that site and extracts, from each search result page, the information on each data item to be provided as a search result; and an information providing unit that provides the information extracted by the page analyzer to the user through the automatic information search site.
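As a rough illustration of how the rule information database and the page analyzer fit together, the sketch below assumes a rule format the patent does not specify: each rule is a pair of literal left/right delimiter strings that surround a value in the page HTML (an "LR wrapper" in wrapper-induction terms). The site name `shop-a.example`, the item names, and the markup are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical LR-style rule: each value sits between `left` and `right`."""
    left: str
    right: str

    def apply(self, page: str) -> list[str]:
        # Non-greedy match between the two literal delimiters; findall returns
        # every occurrence, i.e. one value per product listed on the page.
        pattern = re.escape(self.left) + r"(.*?)" + re.escape(self.right)
        return [value.strip() for value in re.findall(pattern, page, flags=re.DOTALL)]


# Rule information database: site -> data item -> rule (entries are hypothetical).
RULE_DB: dict[str, dict[str, Rule]] = {
    "shop-a.example": {
        "name": Rule('<td class="name">', "</td>"),
        "price": Rule('<td class="price">', "</td>"),
    },
}


def analyze_page(site: str, page: str) -> list[dict[str, str]]:
    """Page analyzer: apply the site's rules and zip the values into records."""
    columns = {item: rule.apply(page) for item, rule in RULE_DB[site].items()}
    # zip() stops at the shortest column, so an item whose rule finds fewer
    # values than the others silently truncates the record list.
    return [dict(zip(columns, row)) for row in zip(*columns.values())]
```

For a page containing `<td class="name">Notebook X</td><td class="price">1,290,000</td>`, `analyze_page("shop-a.example", page)` returns `[{'name': 'Notebook X', 'price': '1,290,000'}]`.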
The present invention also provides a method for automatically extracting information in real time, in which a system that includes a rule information database storing, for each wired/wireless Internet site, rule information for extracting only the information on specific data items from a wired/wireless Internet page made up of a template and data uses that rule information to automatically collect and provide information from a plurality of wired and wireless Internet sites based on a search term entered by the user. The method comprises: A) when a search term is received from a pre-established information search site, inputting the search term into each wired or wireless Internet site; B) receiving a search result page for the search term from each wired or wireless Internet site; and C) applying the rule information stored in the rule information database for each wired/wireless Internet site to the search result page received from that site, extracting from each search result page the information on each data item to be provided as a search result, and providing the extracted information to the user through the automatic information search site.
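The three method steps can be sketched as a concurrent fan-out over the URL database. The URL templates with a `{q}` placeholder, the `fetch` callable (standing in for an HTTP client), and the `analyze` callable (standing in for the page analyzer) are all assumptions for illustration; none of them is taken from the patent's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.parse import quote_plus

# URL database: site -> search URL template. Both the URLs and the "{q}"
# placeholder convention are hypothetical.
URL_DB = {
    "shop-a.example": "https://shop-a.example/search?q={q}",
    "shop-b.example": "https://shop-b.example/find?keyword={q}",
}


def search_all(term: str,
               fetch: Callable[[str], str],
               analyze: Callable[[str, str], list]) -> dict[str, list]:
    """Fan one search term out to every site and collect the extracted records."""

    def one_site(site: str, template: str) -> tuple[str, list]:
        url = template.format(q=quote_plus(term))  # step A: input the term to the site
        page = fetch(url)                          # step B: receive the result page
        return site, analyze(site, page)           # step C: apply that site's rules

    # Query all sites concurrently so the total wait is roughly that of the
    # slowest site rather than the sum over all sites.
    with ThreadPoolExecutor(max_workers=max(1, len(URL_DB))) as pool:
        futures = [pool.submit(one_site, s, t) for s, t in URL_DB.items()]
        return dict(f.result() for f in futures)
```

Passing the fetcher and analyzer in as parameters keeps the sketch testable offline; a real deployment would supply an HTTP client and the rule-based page analyzer.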
The present invention further provides an automatic rule correction method for a system that includes a rule information database storing, for each wired/wireless Internet site, rule information for extracting only the information on specific data items from a wired/wireless Internet page. When applying a rule from the rule information database fails to extract the information on a data item to be provided as a search result, the method automatically corrects the rule information in the rule information database. It comprises: a first step of determining whether the information on a specific data item to be provided as a search result is being extracted normally from the wired/wireless Internet site, thereby determining whether the rule information for that site contains an error, and outputting the result; a second step of inputting an arbitrary query word to the wired/wireless Internet site whose rule information contains the error and receiving the result page for that query; a third step of analyzing the result page and classifying it into a template area and a data area; and a fourth step of obtaining, from the classified data area, data containing the information on the data item that was not extracted normally, and modifying and updating the rule based on that data.
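The patent does not say how the first step tells a broken rule apart from a query that simply matched nothing. One plausible approach, assumed here rather than taken from the source, re-applies the rules to a result page for a probe query known to return products. Rules are modeled as hypothetical (left, right) delimiter pairs, and `search` stands in for submitting a query to one site.

```python
import re
from typing import Callable

# Hypothetical rule format: data item -> (left delimiter, right delimiter).
Rules = dict[str, tuple[str, str]]


def extract(page: str, left: str, right: str) -> list[str]:
    """Every value found between the two literal delimiters."""
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, page, flags=re.DOTALL)


def failed_items(page: str, rules: Rules) -> list[str]:
    """Data items whose rule extracts nothing from the page."""
    return [item for item, (left, right) in rules.items()
            if not extract(page, left, right)]


def diagnose(rules: Rules, search: Callable[[str], str],
             user_query: str, probe_query: str) -> tuple[str, list[str]]:
    """Return ("ok" | "no_match" | "rule_error", items needing correction).

    `search` maps a query to the site's result-page HTML, and `probe_query`
    is assumed to return at least one product on this site.
    """
    if not failed_items(search(user_query), rules):
        return "ok", []
    # At least one item was not extracted for the user's query: re-test the
    # rules on a page known to contain products to separate the two causes.
    broken = failed_items(search(probe_query), rules)
    if broken:
        return "rule_error", broken  # continue with the third and fourth steps
    return "no_match", []            # rules work; the site has no such product
```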
According to the present invention, the information a user wants is extracted and collected from various wired and wireless Internet sites, so the user can see at a glance the information matching the current keyword across many sites.
In particular, because the present invention collects and provides each site's information by directly analyzing its wired and wireless Internet pages, information from sites with no content partnership can also be viewed.
In addition, the present invention generates rule information for extracting information when analyzing a wired/wireless Internet page, and automatically sets new rule information when information is no longer extracted from a wired/wireless Internet site, so an administrator does not need to set rule information manually.
The objects and technical configuration of the present invention, and the resulting effects, will be more clearly understood from the following detailed description with reference to the accompanying drawings.
FIG. 1 is a block diagram of a real-time automatic information extraction system using a rule according to an embodiment of the present invention, the real-time automatic information extraction system, the price in a wired and
FIG. 2 is a configuration diagram of the automatic information extraction server according to an embodiment of the present invention, and FIG. 3 is a configuration diagram of the
As shown in FIG. 2, the automatic
When the user of the wired /
The search term inputter 202 inputs a search term received by the
The
The
The
The
As illustrated in FIG. 3, the
In addition, the
The error determiner 216 of FIG. 3 receives the information extraction result of the
In other words, the
For example, if there is an item in the data area that has an information property of 'price', and the information on the data item is not normally output due to a change in the wired / wireless internet site, the
Then, the
The
The
Next, the service process of the present invention will be described with reference to FIGS. 4 to 6.
First, FIG. 4 is a flowchart of a real-time automatic information extraction method using rules according to an embodiment of the present invention. Rule information for extracting only information on a specific data item from a wired or wireless Internet page consisting of a template and data is shown in FIG. Has a
When the user enters, on the information search site, a search term for the target to be searched (S101), the search term is transmitted to the automatic information extraction server 200 (S102).
When the
On the other hand, when receiving a search result page for the corresponding search word from each wired or wireless Internet site (S104), the rule result information for each wired and wireless Internet site stored in the
For example, when a search word input from the corresponding information search site is a notebook and the wired / wireless Internet site is a shopping mall site, the
Then, the extracted information is provided to the user in real time through the automatic information search site (S106).
FIG. 5 is a flowchart of a method for automatically generating rules according to an embodiment of the present invention. When a URL of a new site is registered in the
The
In addition, the
On the other hand, if some or all of the displayed data portion is blocked (S205), the
FIG. 6 is a flowchart of a method for automatically correcting rules according to an embodiment of the present invention. When product information cannot be extracted from rule information stored in the
First, suppose that product information such as price information is no longer extracted when a search term arrives from the automatic information search site, or when a query word is input to a specific wired/wireless Internet site at a predetermined interval (S301). It is then determined whether the information on the specific data item to be provided as a search result is being extracted normally from that wired/wireless Internet site, and thus whether the rule information for that site contains an error (S302).
Next, an arbitrary query is input to the wired/wireless Internet site whose rule information contains the error, the result page for that query is received, and the result page is analyzed and classified into a template area and a data area (S303).
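The source does not describe how the result page is split into template and data areas. A common heuristic, used here purely as an assumption, compares result pages for two different queries from the same site: lines shared by both belong to the fixed template, and lines unique to one page carry data. This sketch uses `difflib.SequenceMatcher` from the Python standard library and assumes the HTML is already broken into lines; real pages would usually need to be tokenized into tags first.

```python
import difflib


def split_template_data(page_a: str, page_b: str) -> tuple[list[str], list[str]]:
    """Classify the lines of page_a into template lines and data lines.

    Assumption (not from the patent): page_a and page_b are result pages from
    the same site for two different queries, so the lines they share are the
    site's fixed template and the lines unique to page_a carry data.
    """
    lines_a = [ln.strip() for ln in page_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in page_b.splitlines() if ln.strip()]
    # autojunk=False keeps frequently repeated lines (e.g. "</tr>") matchable.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    shared = set()
    for block in matcher.get_matching_blocks():
        shared.update(range(block.a, block.a + block.size))
    template = [ln for i, ln in enumerate(lines_a) if i in shared]
    data = [ln for i, ln in enumerate(lines_a) if i not in shared]
    return template, data
```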
Data containing the information on the data item that was not extracted normally is obtained from the classified data area, and the rule is modified based on that data (S304). The rule information previously generated in the rule information database is then deleted and the new rule is stored, updating the rule (S305).
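Steps S304 and S305 can be illustrated with a minimal LR-wrapper induction sketch, again an assumption rather than the patent's method. A value with the target property is located in the data area with a hypothetical pattern (here, a comma-grouped number for `price`), and the characters immediately around it become the new left/right delimiters, which replace the old rule.

```python
import re
from typing import Optional

# Hypothetical recognizer for the "price" property: a comma-grouped number such
# as "1,290,000", or a plain run of digits. Other properties need their own.
PROPERTY_PATTERNS = {"price": re.compile(r"\d{1,3}(?:,\d{3})+|\d+")}


def induce_rule(data_lines: list[str], item: str,
                width: int = 12) -> Optional[tuple[str, str]]:
    """S304: find a value with the item's property and derive new delimiters.

    The rule is the `width` characters immediately left and right of the first
    matching value, taken from the first line where that value has context on
    both sides.
    """
    pattern = PROPERTY_PATTERNS[item]
    for line in data_lines:
        match = pattern.search(line)
        if match:
            left = line[max(0, match.start() - width):match.start()]
            right = line[match.end():match.end() + width]
            if left and right:
                return left, right
    return None


def update_rule(rule_db: dict, site: str, item: str,
                data_lines: list[str]) -> bool:
    """S305: delete the site's stale rule for `item` and store the new one."""
    new_rule = induce_rule(data_lines, item)
    if new_rule is None:
        return False  # no recognizable value; leave the old rule untouched
    rules = rule_db.setdefault(site, {})
    rules.pop(item, None)   # delete the previously generated rule
    rules[item] = new_rule  # store the newly induced rule
    return True
```

If a site changes its markup to `<em class="sale">1,290,000</em>`, the induced price rule is `('lass="sale">', '</em>')`. A fixed character window is the simplest choice; a production system would more likely anchor delimiters on tag boundaries.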
On the other hand, if the error determination unit 216 determines that there is no error, the user is notified that the wired/wireless Internet site has no product corresponding to the input query (S306).
Those skilled in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the following claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.
The real-time automatic information extraction system and method of the present invention described above can be applied to an integrated shopping-mall product management service that integrates and manages the product information registered on shopping mall sites, and to server systems that integrate, manage, and provide information registered on a plurality of Internet sites. In particular, it can be used to present the product information of various shopping malls in a consistent form on a single site.
FIG. 1 is a block diagram of a real-time automatic information extraction system using rules according to an embodiment of the present invention;
FIG. 2 is a block diagram of an automatic information extraction server according to an embodiment of the present invention;
FIG. 3 is a block diagram of the rule extraction module of FIG.
FIG. 4 is an operational flowchart of a real-time automatic information extraction method using rules according to an embodiment of the present invention;
FIG. 5 is an operational flowchart of an automatic rule generation method according to an embodiment of the present invention; and
FIG. 6 is a flowchart illustrating an automatic rule correction method according to an embodiment of the present invention.
<Description of the symbols for the main parts of the drawings>
100: wired/wireless terminal
200: automatic information extraction server
201: search term receiver
202: search term input unit
203: page receiving unit
204: page analyzing unit
205: information providing unit
206: URL database
207: query database
208: rule information database
210: rule extraction module
211: query input unit
212: data receiving unit
213: data classification unit
214: display unit
215: rule generation unit
216: error determination unit
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20070067574A KR100888329B1 (en) | 2007-07-05 | 2007-07-05 | System and method for automatically detecting information in real-time using rule |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20070067574A KR100888329B1 (en) | 2007-07-05 | 2007-07-05 | System and method for automatically detecting information in real-time using rule |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20090003853A (en) | 2009-01-12 |
KR100888329B1 (en) | 2009-03-12 |
Family ID: 40486415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR20070067574A KR100888329B1 (en) | 2007-07-05 | 2007-07-05 | System and method for automatically detecting information in real-time using rule |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR100888329B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101872826B1 (en) * | 2017-03-29 | 2018-06-29 | 안동과학대학교 산학협력단 | Apparatus for supporting rejoining through machine learning and the method by using the same |
KR102354731B1 (en) * | 2020-08-06 | 2022-02-08 | 쿠팡 주식회사 | Computerized systems and methods for managing and monitoring services and modules on an online platform |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20240014845A (en) * | 2022-07-26 | 2024-02-02 | 쿠팡 주식회사 | Electronic apparatus and information provision method thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20000072817A (en) * | 2000-09-29 | 2000-12-05 | 김상진 | Method and apparatus for aggregation and re-organization of information distributed arbitrarily thought internet according to user`s needs |
KR20030004653A (en) * | 2001-07-06 | 2003-01-15 | (주)아이퀵 | Information support system and the method using real-time web search |
KR20040017008A (en) * | 2002-08-20 | 2004-02-26 | 주식회사 케이랩 | System and method for offering information using a search engine |
- 2007-07-05: application KR20070067574A filed in KR (granted as KR100888329B1; status: not active, IP right cessation)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101872826B1 (en) * | 2017-03-29 | 2018-06-29 | 안동과학대학교 산학협력단 | Apparatus for supporting rejoining through machine learning and the method by using the same |
KR102354731B1 (en) * | 2020-08-06 | 2022-02-08 | 쿠팡 주식회사 | Computerized systems and methods for managing and monitoring services and modules on an online platform |
US11768886B2 (en) | 2020-08-06 | 2023-09-26 | Coupang Corp. | Computerized systems and methods for managing and monitoring services and modules on an online platform |
Also Published As
Publication number | Publication date |
---|---|
KR100888329B1 (en) | 2009-03-12 |
Legal Events
Code | Title | Description |
---|---|---|
A201 | Request for examination | |
E902 | Notification of reason for refusal | |
E701 | Decision to grant or registration of patent right | |
GRNT | Written decision to grant | |
FPAY | Annual fee payment | Payment date: 20130304; year of fee payment: 5 |
FPAY | Annual fee payment | Payment date: 20140304; year of fee payment: 6 |
FPAY | Annual fee payment | Payment date: 20150302; year of fee payment: 7 |
LAPS | Lapse due to unpaid annual fee | |