KR20140107814A - System and method for providing integrated project information - Google Patents
System and method for providing integrated project information
- Publication number: KR20140107814A
- Authority: KR (South Korea)
- Prior art keywords: information, research, wrapper, specific information, announcement
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
Abstract
A research project information integration providing server according to the present invention includes: a wrapper storage unit that stores a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information within the web documents provided by each site; a collecting unit that collects research project announcement data from the research management institution sites; an extracting unit that extracts the specific information, based on the wrapper, from the research project announcement data collected by the collecting unit; an integrated information storage unit that inserts the specific information extracted by the extracting unit into integrated information data and stores it; and a transmitting unit that transmits the integrated information to a terminal requesting the integrated information. The extracting unit can extract the specific information based on the wrapper corresponding to the research management institution site from which the data was collected.
Description
The present invention relates to a system and method for integrating and providing research project information.
To find the research project information of each research management institution, a user must visit each institution's website and search its announcement board. The more websites related to research projects there are, the more time and effort this process requires. Nevertheless, few systems that provide integrated research project information have been developed.
Meanwhile, focused crawling and topical crawling are used as techniques for extracting specific data from multiple websites. However, these techniques have the disadvantage that extracting hyperlinks from the seed URL is slow and the accuracy of the document classifier is low.
Therefore, to manage and provide the research project information of the individual research management institutions in an integrated manner, a system and method that can extract this information accurately and quickly from each institution's website is needed.
Meanwhile, Korean Patent Laid-Open Publication No. 2011-0029205 (entitled "Internet shopping mall retrieval system and method") proposes an Internet shopping mall search system and method that extracts the pages containing the Internet content requested by a user and that can extract only specific information by means of data extraction technology, data integration processing technology, DB linking technology, and the like.
SUMMARY OF THE INVENTION The present invention has been made to solve the above problems of the prior art, and an object of the present invention is to provide a method and system for automatically collecting and integrating research project information scattered across the websites of major research management institutions using a wrapper.
According to an aspect of the present invention, there is provided a research project information integration providing server comprising: a wrapper storage unit that stores a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information within the web documents provided by each site; a collecting unit that collects research project announcement data from the research management institution sites; an extracting unit that extracts the specific information, based on the wrapper, from the research project announcement data collected by the collecting unit; an integrated information storage unit that inserts the specific information extracted by the extracting unit into integrated information data and stores it; and a transmitting unit that transmits the integrated information to a terminal requesting the integrated information. The extracting unit can extract the specific information based on the wrapper corresponding to the research management institution site from which the research project announcement data was collected.
According to the above means for solving the problem, the updated announcements of the individual research projects can be checked at once through the integrated announcement site and through individual mailing, both provided by a single integrated announcement system, which shortens the time and effort needed to access and search each site.
FIG. 1 is a block diagram illustrating a research project information integration providing system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a research project information integration server according to an embodiment of the present invention.
FIG. 3 is an example for explaining the result of collecting a research project announcement by the collecting unit according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a schema of a wrapper storage unit according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a wrapper configured based on the schema and an example of string extraction using the wrapper according to an embodiment of the present invention.
FIG. 6 is a view for explaining a research project information integration providing system according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for providing integrated research project information through a research project information integration server according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can readily practice them. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly illustrate the present invention, parts not related to the description are omitted, and similar parts are denoted by like reference characters throughout the specification.
Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another part in between. Also, when a part is referred to as "comprising" an element, this means that it can include other elements as well, rather than excluding them, unless specifically stated otherwise.
FIG. 1 is a block diagram illustrating a research project information integration providing system according to an embodiment of the present invention.
The research project information integration providing system 400 according to an embodiment of the present invention includes a research project information
As shown in FIG. 1, the research project
The research project
Here, the research project can include the various projects and tasks announced by each research management institution. In addition, research project announcement data may include various announcements, notices, and notifications related to the research project.
The research management institution site 200 is the website of each of a plurality of research management institutions and stores research project announcement data.
The
In other words, the
FIG. 2 is a block diagram of a research project information integration server according to an embodiment of the present invention.
The research project information
The
Also, the
The
The extracting
At this time, the specific information can be extracted based on the wrapper corresponding to the research
The integrated
The
Meanwhile, the research project information
The
FIG. 3 is an example for explaining collecting results of a research task announcement by the collecting unit according to an embodiment of the present invention.
As shown in FIG. 3, the collected research project announcement may be in Hyper Text Markup Language (HTML) format. HTML is the basic markup language used to create web documents.
FIG. 4 is a diagram for explaining a schema of a wrapper storage unit according to an embodiment of the present invention.
As shown in FIG. 4, the structure and contents of data existing in the
First, no is a consecutive serial number, and url is the URL of the web page that lists the announcements of a research project. link_start is the string that commonly occurs around the start position of the detailed link and is used to find that position, and link_end is the string that commonly occurs around the end position of the detailed link and is used to find that position. That is, the address of the detailed link can be extracted by finding the start position and the end position through the surrounding strings and then taking the string between them. This is described in more detail below.
Next, title_start is the surrounding string used to find the start position of the business announcement name, and title_end is the surrounding string used to find the end position of the business announcement name. Likewise, date_start is the surrounding string used to find the start position of the business announcement publication date, and date_end is the surrounding string used to find its end position. Lastly, there is a site_code which means code information of the research
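As an illustration only, the schema fields named above could be modeled as a simple record. The following Python sketch is not taken from the patent: the class name `Wrapper`, the example URL, the example site code, and the choice to hold each start marker as a tuple of strings (so that a start position defined by two successive strings can be represented) are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Wrapper:
    """One wrapper record; field names follow the schema described in the text."""
    no: int                        # consecutive serial number
    url: str                       # page listing the announcements
    link_start: Tuple[str, ...]    # string(s) found just before the detailed link
    link_end: str                  # string found just after the detailed link
    title_start: Tuple[str, ...]   # string(s) found just before the announcement name
    title_end: str                 # string found just after the announcement name
    date_start: Tuple[str, ...]    # string(s) found just before the publication date
    date_end: str                  # string found just after the publication date
    site_code: str                 # code identifying the institution site


# Hypothetical wrapper; every value here is invented for illustration.
example_wrapper = Wrapper(
    no=1,
    url="https://example.org/announcements",
    link_start=('<a href="',),
    link_end='"',
    title_start=('<a href="', ">"),
    title_end="</a>",
    date_start=('<a href="', "<td>"),
    date_end="</td>",
    site_code="EX01",
)
```

Keeping one such record per site means that a change in a site's page layout only requires updating that site's marker strings, not the extraction logic.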
FIG. 5 is a diagram for explaining a wrapper configured based on the schema and an example of string extraction using the wrapper according to an embodiment of the present invention.
As shown in FIG. 5 (a), a wrapper for each
First, the detailed link is extracted as follows. As shown in FIG. 5(a), the string defined in link_start for the url is <a href=\". As shown in FIG. 5(b), this string is located at position 4 in the source code. Adding the length of <a href=, which is 8, and 1 gives the starting
Next, the business announcement name is extracted as follows. To extract the name, at least two strings must be found in succession. As shown in FIG. 5(a), the strings indicating the start position of the business announcement name are <a href=\" and >. First find the string <a href=\", and then find the first > that appears after this position. That is, as shown in FIG. 5(b), the start position of the business announcement name may be 35, which is 34
Finally, the business announcement publication date is extracted as follows. To extract the publication date, at least two strings must again be found in succession. As shown in FIG. 5(a), the strings indicating the start position of the publication date are <a href=\" and <td>. First find the string <a href=\", and then find the first occurrence of <td> after this position. As shown in FIG. 5(b), the start position of the publication date is 52, which is the position of the first <td> appearing after the string <a href=\" plus 4, the length of <td>. Next, the string indicating the end position of the publication date is </td>, and its position is 66. That is, the business announcement publication date can be extracted as the string starting from the
The specific information to be extracted is not limited to these items; any data for which a wrapper can be configured can be extracted.
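The marker-based extraction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `extract_between`, the sample HTML row, and the end markers used for the link and the name (`"` and `</a>`) are hypothetical, and the positions in the sample do not correspond to those in FIG. 5.

```python
from typing import Optional, Sequence


def extract_between(source: str,
                    start_markers: Sequence[str],
                    end_marker: str) -> Optional[str]:
    """Find each start marker in succession, then return the text up to end_marker.

    Returns None if any marker is missing from the source.
    """
    pos = 0
    for marker in start_markers:
        idx = source.find(marker, pos)
        if idx == -1:
            return None
        # The value starts right after the marker: marker position + marker length.
        pos = idx + len(marker)
    end = source.find(end_marker, pos)
    if end == -1:
        return None
    return source[pos:end]


# Hypothetical announcement row, invented for this example.
ROW = ('<tr><td><a href="/notice/view?id=17">2013 Call for Proposals</a></td>'
       '<td>2013-02-28</td></tr>')

link = extract_between(ROW, ['<a href="'], '"')
title = extract_between(ROW, ['<a href="', '>'], '</a>')
date = extract_between(ROW, ['<a href="', '<td>'], '</td>')
```

Searching for the start markers in succession is what lets the name and the date be located relative to the link, as in the two-string cases described above.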
FIG. 6 is a view for explaining a research project information integration providing system according to an embodiment of the present invention.
As shown in FIG. 6, the research project information in the
FIG. 7 is a flowchart illustrating a method for providing integrated research project information through a research project information integration server according to an embodiment of the present invention.
In step S110, a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information within the web documents provided by each site may be stored.
In step S120, the research project announcement data in the research management institution sites can be collected.
In step S130, specific information can be extracted, based on the wrapper, from the research project announcement data collected in the preceding step. That is, at least one of the detailed link of the business announcement, the business announcement name, and the business announcement publication date can be extracted from the collected research project announcement data. Here, the wrapper is used to extract the specific information by means of the strings that commonly occur around the start and end positions of that information in the source code of the research project announcement data.
In step S140, the specific information extracted in the preceding step may be inserted into the integrated information data and stored. Here, the extracted specific information is checked for duplication, and only information that is not a duplicate is inserted and stored.
In step S150, the integrated information may be transmitted to the requesting terminal. For example, it may be sent to the email account of the user requesting the integrated information or to the integrated information providing website.
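The duplicate check of step S140 could be realized in many ways. The following Python sketch shows one possibility, assuming the integrated information is kept in an SQLite table and that an announcement is identified by its site code and detailed link. The table name `announcement`, the column names, and the choice of key are assumptions for illustration and are not specified in the text.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the integrated store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS announcement (
               site_code TEXT NOT NULL,
               link      TEXT NOT NULL,
               title     TEXT NOT NULL,
               pub_date  TEXT NOT NULL,
               UNIQUE (site_code, link)
           )"""
    )
    return conn


def insert_if_new(conn: sqlite3.Connection, site_code: str, link: str,
                  title: str, pub_date: str) -> bool:
    """Insert one extracted announcement; return False if it was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO announcement (site_code, link, title, pub_date) "
        "VALUES (?, ?, ?, ?)",
        (site_code, link, title, pub_date),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the UNIQUE key already existed.
    return cur.rowcount == 1


conn = open_store()
first = insert_if_new(conn, "EX01", "/notice/view?id=17",
                      "2013 Call for Proposals", "2013-02-28")
second = insert_if_new(conn, "EX01", "/notice/view?id=17",
                       "2013 Call for Proposals", "2013-02-28")
```

Delegating the check to a uniqueness constraint keeps repeated collection runs idempotent: re-collecting an unchanged announcement page adds no rows.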
One embodiment of the present invention may also be embodied in the form of a recording medium including instructions executable by a computer, such as program modules executed by a computer. Computer-readable media can be any available media that can be accessed by a computer and include both volatile and nonvolatile media, removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transport mechanism, and include any information delivery media.
It will be understood by those of ordinary skill in the art that the foregoing description of the embodiments is for illustrative purposes and that those skilled in the art can easily modify the invention without departing from the spirit or essential characteristics thereof. It is therefore to be understood that the above-described embodiments are illustrative in all aspects and not restrictive. For example, each component described as a single entity may be distributed and implemented, and components described as being distributed may also be implemented in a combined form.
The scope of the present invention is defined by the appended claims rather than the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included within the scope of the present invention.
100: Research project information integration providing server 110: Wrapper storage unit
120: collecting unit 130:
140: Integrated information storage unit 150:
160: Visit management unit 200: Research management institution site
300: terminal
Claims (11)
1. A wrapper storage unit for storing a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information among web documents provided by each site,
A collection unit for collecting research project announcement data in the research management institution site,
An extracting unit for extracting specific information based on the wrapper from the research business announcement data collected by the collecting unit,
An integrated information storage unit for inserting and storing the specific information extracted by the extracting unit into the integrated information data,
And a transmitting unit for transmitting the integrated information to the requesting terminal,
Wherein the extracting unit extracts the specific information based on a wrapper corresponding to a research management institution site where the research business announcement data is collected.
2. And a visit management unit for periodically accessing the research management institution site based on the url information stored in the wrapper storage unit.
3. The visit management unit periodically reconnects to a web site to be visited and confirms whether the research business announcement data is updated.
4. Wherein the specific information includes at least one of a detailed link of a business announcement, a business announcement name, and a business announcement publication date.
5. Wherein the wrapper uses a string that commonly generates a start position and an end position of specific information to be extracted from the source code of the research project announcement data.
6. Wherein the integrated information storage unit determines whether the specific information is duplicated information, and inserts and stores only non-duplicated information.
7. Storing a wrapper including URL information on a plurality of research management institution sites and storage location information of specific information among web documents provided by each site;
Collecting research project announcement data in the research management institution site;
Extracting specific information based on the wrapper from the research project announcement data collected in the collecting of the research project announcement data;
Inserting and storing the extracted specific information into the integrated information data, and
Transmitting the stored integrated information to a requesting terminal;
A method for providing integrated research project information.
8. Wherein the specific information includes at least one of a detailed link of a business announcement, a business announcement name, and a business announcement publication date.
9. Wherein the wrapper uses a character string that commonly generates a start position and an end position of specific information to be extracted from the source code of the research project announcement data.
10. Wherein the step of inserting and storing the integration notice includes inserting and storing only the non-duplicated information by determining whether the specific information is duplicated information.
11. Wherein the step of accessing the website comprises periodically reconnecting to a website to be visited and confirming whether the research project announcement data is updated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130021844A KR20140107814A (en) | 2013-02-28 | 2013-02-28 | System and method for providing integrated project information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130021844A KR20140107814A (en) | 2013-02-28 | 2013-02-28 | System and method for providing integrated project information |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20140107814A (en) | 2014-09-05 |
Family
ID=51755269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130021844A KR20140107814A (en) | 2013-02-28 | 2013-02-28 | System and method for providing integrated project information |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20140107814A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102425811B1 (en) * | 2021-05-06 | 2022-07-27 | 주식회사 우주하나소유 | Method, server and computer program for uploading product sale contents using website crawling based on artificial intelligence |
-
2013
- 2013-02-28 KR KR1020130021844A patent/KR20140107814A/en not_active Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E601 | Decision to refuse application |