KR20140107814A - System and method for providing integrated project information - Google Patents

System and method for providing integrated project information

Publication number
KR20140107814A
Authority
KR
South Korea
Prior art keywords
information
research
wrapper
specific information
announcement
Prior art date
Application number
KR1020130021844A
Other languages
Korean (ko)
Inventor
유성준
강한훈
Original Assignee
세종대학교산학협력단
Priority date
2013-02-28
Filing date
2013-02-28
Publication date
2014-09-05
Application filed by 세종대학교산학협력단
Priority to KR1020130021844A
Publication of KR20140107814A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The research project information integration providing server according to the present invention includes: a wrapper storage unit that stores a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information among the web documents provided by each site; a collecting unit that collects research project announcement data from the research management institution sites; an extracting unit that extracts specific information, based on the wrapper, from the research project announcement data collected by the collecting unit; an integrated information storage unit that inserts the specific information extracted by the extracting unit into integrated information data and stores it; and a transmitting unit that transmits the integrated information to a terminal requesting it. The extracting unit extracts the specific information based on the wrapper corresponding to the research management institution site from which the research project announcement data was collected.

Description

TECHNICAL FIELD The present invention relates to a system and method for providing integrated research project information.

To find the research project information of each research management institution, a user must visit each institution's website and search its bulletin board. The more research-related websites are involved, the more time and effort this process takes. Nevertheless, few systems that provide integrated research project information have been developed.

Meanwhile, focused crawling and topical crawling are used as techniques for extracting specific data from multiple websites. However, these techniques have the disadvantage that extracting hyperlinks from the seed URL is slow and the accuracy of the document classifier is low.

Therefore, a system and method that can extract research project information accurately and quickly from each institution's website is needed in order to manage and provide, in an integrated manner, the research project information of the individual research management institutions.

In this connection, Korean Patent Laid-Open Publication No. 2011-0029205 (entitled "Internet shopping mall retrieval system and method") proposes an Internet shopping mall search system and method that can extract only specific information by means of a data extraction technique for extracting the pages on which the Internet content requested by a user exists, a data integration processing technique, a DB linking technique, and the like.

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems of the prior art, and its object is to provide a system and method for automatically collecting and integrating, using a wrapper, the research project information scattered across the websites of major research management institutions.

According to an aspect of the present invention, there is provided a research project information integration providing server comprising: a wrapper storage unit for storing a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information among the web documents provided by each site; a collecting unit for collecting research project announcement data from the research management institution sites; an extracting unit for extracting specific information, based on the wrapper, from the research project announcement data collected by the collecting unit; an integrated information storage unit for inserting the specific information extracted by the extracting unit into integrated information data and storing it; and a transmitting unit for transmitting the integrated information to a terminal requesting it. The extracting unit can extract the specific information based on the wrapper corresponding to the research management institution site from which the research project announcement data was collected.

According to the above means for solving the problem, the updated announcements of the individual research projects can be checked at once through the integrated announcement site and the individual mailings of a single integrated announcement providing system, so the time and effort needed to access and search each site can be reduced.

FIG. 1 is a block diagram illustrating a research project information integration providing system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a research project information integration providing server according to an embodiment of the present invention.
FIG. 3 is an example for explaining the result of collecting a research project announcement by the collecting unit according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the schema of the wrapper storage unit according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining an example of a wrapper configured based on the schema and of character string extraction using the wrapper according to an embodiment of the present invention.
FIG. 6 is a view for explaining a research project information integration providing system according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for providing integrated research project information through the research project information integration providing server according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, which will be readily apparent to those skilled in the art. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly illustrate the present invention, parts not related to the description are omitted, and similar parts are denoted by like reference characters throughout the specification.

Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Also, when a part is said to "comprise" an element, this means that it may further include other elements, not that other elements are excluded, unless specifically stated otherwise.

FIG. 1 is a block diagram illustrating a research project information integration providing system according to an embodiment of the present invention.

The research project information integration providing system 400 according to an embodiment of the present invention includes a research project information integration providing server 100, a research management institution site 200, and a terminal 300.

As shown in FIG. 1, the research project information integration providing server 100 collects research project announcement data from a plurality of research management institution sites 200, extracts specific information from the collected data, integrates it, and transmits the integrated information to the terminal 300 requesting it.

The research project information integration providing server 100 can extract, from the research project announcement data collected from an institution site 200, the specific information to be inserted into the integrated information. Because the location of the specific information is known from the wrapper, the information can be collected and extracted accurately and quickly. The integrated information is transmitted at the request of the terminal and can be updated automatically and periodically.

Here, the research projects may include the various tasks and projects announced by each research management institution. In addition, the research project announcement data may include the various notices, announcements, and notifications related to those research projects.

The research management institution sites 200 are the websites of a plurality of research management institutions that store research project announcement data.

The terminal 300 receives the integrated information from the research project information integration providing server 100. That is, the terminal 300 may be a smartphone, a tablet PC, a desktop computer, or the like that accesses a site providing the integrated research project information or receives the information through an e-mail account set by the user.

In other words, the terminal 300 may be implemented as a computer or a portable terminal that can access the research project information integration providing server 100 through a network. Here, the computer includes, for example, a notebook computer, a desktop computer, a laptop computer, a tablet PC, and the like equipped with a web browser. The portable terminal includes, for example, a Personal Digital Assistant (PDA); International Mobile Telecommunication (IMT)-2000, Code Division Multiple Access (CDMA)-2000, Wideband CDMA (W-CDMA), and Wireless Broadband Internet (WiBro) terminals; smartphones; and other such wireless communication devices.

FIG. 2 is a block diagram of a research project information integration providing server according to an embodiment of the present invention.

The research project information integration providing server 100 according to an embodiment of the present invention includes a wrapper storage unit 110, a collecting unit 120, an extracting unit 130, an integrated information storage unit 140, a transmitting unit 150, and a visit management unit 160.

The wrapper storage unit 110 may store a wrapper including URL information of a plurality of research management institution sites 200 and storage location information of specific information among the web documents provided by the respective sites. The wrapper can be called by the extracting unit 130 to extract specific information. That is, the wrapper makes it possible to extract specific information by using the character strings that commonly occur at the start position and the end position of that information in the source code of the research project announcement data. Here, the specific information may include at least one of the detailed link of the project announcement, the project announcement name, and the project announcement publication date.

The wrapper storage unit 110 may also provide the visit management unit 160 with the URL of the research management institution site 200 to be visited.

The collecting unit 120 may collect the research project announcement data from the research management institution sites 200. That is, it may import the web data containing the research project announcements of the plurality of research management institution sites 200. The research management institution sites 200 may include, for example, the sites of the Information and Communication Industry Promotion Agency (http://www.nipa.kr), the Small Business Administration (http://www.smtech.go.kr), the Korea Research Foundation (http://www.nrf.re.kr), the Korea Ocean Science and Technology Promotion Agency (http://www.kimst.re.kr), the Agriculture, Forestry and Fisheries Quarantine Inspection Headquarters (http://www.qia.go.kr), the site at http://www.hpeb.re.kr, the Ministry of Food, Agriculture, Forestry and Fisheries (http://www.fris.go.kr), the Disease Management Headquarters (http://www.cdc.go.kr), and the Korea Contents Promotion Agency (http://www.kocca.kr).

The extracting unit 130 can extract specific information, based on the wrapper, from the research project announcement data collected by the collecting unit 120. That is, at least one of the detailed link of the project announcement, the project announcement name, and the project announcement publication date can be extracted from the collected data.

The specific information is extracted based on the wrapper corresponding to the research management institution site 200 from which the research project announcement data was collected. That is, the wrapper is called from the wrapper storage unit 110 so that only the specific information is extracted from the research project announcement data, using the character strings that commonly occur at the start position and the end position of that information in the source code of the data. A more detailed description is given below with reference to FIGS. 4, 5 and 7.

The integrated information storage unit 140 may insert and store the specific information extracted by the extraction unit 130 in the integrated information data. At this time, it is possible to insert and store only information that is not duplicated by discriminating whether the extracted specific information is duplicated information.
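The duplicate check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the table name `announcements`, the column names, and the choice of `(site_code, link)` as the key that identifies a duplicate are all hypothetical, since the source does not say how duplicates are recognized.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the integrated-information store (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS announcements (
               site_code TEXT NOT NULL,
               link      TEXT NOT NULL,
               name      TEXT NOT NULL,
               published TEXT NOT NULL,
               UNIQUE (site_code, link)  -- assumed duplicate key
           )"""
    )
    return conn


def insert_if_new(conn: sqlite3.Connection, site_code: str, link: str,
                  name: str, published: str) -> bool:
    """Insert one extracted record; return False if it was a duplicate."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO announcements (site_code, link, name, published) "
        "VALUES (?, ?, ?, ?)",
        (site_code, link, name, published),
    )
    conn.commit()
    # total_changes only grows when a row was actually inserted.
    return conn.total_changes > before
```

Letting the database enforce uniqueness keeps the check and the insert in one statement, so a record collected twice on successive visits is simply ignored.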

The transmission unit 150 may transmit the integrated information to the requesting terminal. For example, the aggregated information can be sent to the email account of the requesting user or to the aggregated information providing website.

Meanwhile, the research project information integration providing server 100 and the terminal 300 are connected through a network. The network may be implemented as a wired network such as a local area network (LAN), a wide area network (WAN), or a value added network (VAN), or as a wireless network such as a mobile radio communication network or a satellite communication network.

The visit management unit 160 may periodically access the research management institution sites 200 based on the URL information stored in the wrapper storage unit 110. It may also periodically reconnect to a site 200 to be visited and check whether the research project announcement data has been updated. If there is newly updated data, the data can be collected through the collecting unit 120.
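One way the update check on each revisit could work is sketched below in Python. The source does not state how an update is detected; comparing a SHA-256 digest of the page source between visits is an assumption made here for illustration, and the class name `VisitManager` and the method `is_updated` are hypothetical.

```python
import hashlib
from typing import Dict


class VisitManager:
    """Remembers a digest of each visited page to detect changes between visits."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def is_updated(self, url: str, page_source: str) -> bool:
        """Return True on the first visit or when the page differs from the last visit."""
        digest = hashlib.sha256(page_source.encode("utf-8")).hexdigest()
        changed = self._digests.get(url) != digest
        self._digests[url] = digest
        return changed
```

A scheduler would call `is_updated` after each periodic fetch and hand the page to the collecting unit only when it returns `True`. A whole-page digest also reacts to changes unrelated to announcements, so a real system might compare only the extracted records instead.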

FIG. 3 is an example for explaining collecting results of a research task announcement by the collecting unit according to an embodiment of the present invention.

As shown in FIG. 3, the collected research project announcement data may be in Hyper Text Markup Language (HTML) format. HTML is the basic markup language used to create web documents.

FIG. 4 is a diagram for explaining the schema of the wrapper storage unit according to an embodiment of the present invention.

As shown in FIG. 4, the structure and contents of data existing in the wrapper storage unit 110 are as follows.

First, no is a sequential serial number, and url is the URL of the web page that shows the announcements of a research project. link_start is the string that commonly occurs in the surrounding source and is used to find the start position of the detailed link, and link_end is the string that commonly occurs in the surrounding source and is used to find the end position of the detailed link. That is, the address of the detailed link is obtained by locating the start position and the end position through the surrounding strings and then extracting the string between them. This is described in more detail below.

Next, title_start is the surrounding string used to find the start position of the project announcement name, and title_end is the surrounding string used to find its end position. Likewise, date_start is the surrounding string used to find the start position of the project announcement publication date, and date_end is the surrounding string used to find its end position. Lastly, site_code is the code information of the research management institution site 200.
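The schema fields listed above map naturally onto a small record type. The Python sketch below is only an illustration: the field names follow the schema in the text, but the field types, the class name `Wrapper`, and every value in the example row (including the site and its markers) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wrapper:
    """One row of the wrapper storage unit, using the field names of the schema."""
    no: int           # sequential serial number
    url: str          # page that shows the institution's announcements
    link_start: str   # string preceding the detailed link
    link_end: str     # string following the detailed link
    title_start: str  # string preceding the project announcement name
    title_end: str    # string following the project announcement name
    date_start: str   # string preceding the publication date
    date_end: str     # string following the publication date
    site_code: str    # code of the research management institution site


# Hypothetical row for an imaginary site; none of these values come from the patent.
example = Wrapper(
    no=1,
    url="http://www.example.org/notice",
    link_start='<a href="',
    link_end='">',
    title_start='<td class="title">',
    title_end="</td>",
    date_start='<td class="date">',
    date_end="</td>",
    site_code="EX01",
)
```

Keeping one such row per institution site means that supporting a new site only requires adding a row, not changing the extraction code.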

FIG. 5 is a diagram for explaining an example of a wrapper configured based on the schema and of character string extraction using the wrapper according to an embodiment of the present invention.

As shown in FIG. 5(a), a wrapper can be configured for each institution site 200. FIG. 5(b) shows an example of source code by character position. For example, the process of extracting the specific information (the detailed link, the project announcement name, and the project announcement publication date) using the wrapper of FIG. 5(a) whose URL is http://www.test.com/notice.php can be explained as follows.

First, the detailed link is extracted as follows. As shown in FIG. 5(a), the string defined in link_start for this URL is <a href=\". As shown in FIG. 5(b), the position of this string in the source code is 4. Adding the length of the string, given as 8, and 1 to this position yields 13, the start position of the detailed link to be extracted. Next, the string defined in link_end is ">, and its position is 33. The detailed link is therefore the string extracted from start position 13 to end position 33.

Next, the project announcement name is extracted as follows. To extract the name, at least two strings must be found in succession. As shown in FIG. 5(a), the strings indicating the start position of the project announcement name are <a href=\" and >. First the string <a href=\" is found, and then the first > that appears after it. As shown in FIG. 5(b), that first > is at position 34, so adding 1 gives 35 as the start position of the project announcement name. Next, the string indicating the end position of the project announcement name is </td>, and its position is 47. Thus the string located from start position 35 to end position 47, 'technology development project selection guide', is extracted as the project announcement name.

Finally, the project announcement publication date is extracted as follows. Here too, at least two strings must be found in succession. As shown in FIG. 5(a), the strings indicating the start position of the publication date are <a href=\" and <td>. First the string <a href=\" is found, and then the first <td> that appears after it. As shown in FIG. 5(b), that first <td> is at position 52, so adding 4, the length of <td>, gives 56 as the start position of the publication date. Next, the string indicating the end position of the publication date is </td>, and its position is 66. Thus the string located from start position 56 to end position 66, '2012-12-20', is extracted as the project announcement publication date.
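The marker-based extraction walked through above can be sketched in Python with `str.find`. This is a hedged illustration rather than the patented code: the function name `extract_between`, the sample `row`, and the choice of `</a>` as the end marker for the name are assumptions made so that the sample is valid HTML, and the character positions of the sample do not match those of FIG. 5(b).

```python
from typing import Optional, Sequence


def extract_between(source: str, start_markers: Sequence[str],
                    end_marker: str, offset: int = 0) -> Optional[str]:
    """Find each start marker in succession, then return the text up to end_marker."""
    pos = offset
    for marker in start_markers:
        found = source.find(marker, pos)
        if found == -1:
            return None
        # Start position = position of the marker + length of the marker.
        pos = found + len(marker)
    end = source.find(end_marker, pos)
    if end == -1:
        return None
    return source[pos:end]


# Hypothetical announcement row.
row = ('<tr><td><a href="/notice/view.php?id=17">'
       'Technology development project selection guide</a></td>'
       '<td>2012-12-20</td></tr>')

link = extract_between(row, ['<a href="'], '">')
name = extract_between(row, ['<a href="', '>'], '</a>')
published = extract_between(row, ['<a href="', '<td>'], '</td>')

print(link)       # /notice/view.php?id=17
print(name)       # Technology development project selection guide
print(published)  # 2012-12-20
```

Passing the start markers as a sequence covers both the single-string case (the detailed link) and the two-string cases (the name and the date), where the second marker is searched for only after the first has been found.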

The specific information to be extracted is not limited to these items; any data for which a wrapper can be configured can be extracted.

FIG. 6 is a view for explaining a research project information integration providing system according to an embodiment of the present invention.

As shown in FIG. 6, the research project information in the research management institution sites 200 is collected, the specific information is extracted and integrated through the research project information integration providing server 100, and the result is transmitted to the requesting terminal 300. The transmitted integrated information can be posted on the bulletin board of the integrated research project information website.

FIG. 7 is a flowchart illustrating a method for providing integrated research project information through the research project information integration providing server according to an embodiment of the present invention.

In step S110, a wrapper including URL information on a plurality of research management institution sites and storage location information of specific information among the web documents provided by the respective sites may be stored.

In step S120, the research project announcement data in the research management institution site can be collected.

In step S130, specific information can be extracted, based on the wrapper, from the research project announcement data collected in step S120. That is, at least one of the detailed link of the project announcement, the project announcement name, and the project announcement publication date can be extracted from the collected data. Here, the wrapper is used to extract the specific information by means of the character strings that commonly occur at the start and end positions of that information in the source code of the research project announcement data.

In step S140, the specific information extracted in step S130 may be inserted into the integrated information data and stored. Here, whether the extracted specific information duplicates existing information is determined, and only non-duplicated information is inserted and stored.

In step S150, the integrated information may be transmitted to the requesting terminal. For example, it may be sent to the email account of the user requesting the aggregated information or to the aggregated information providing web site.
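Steps S110 to S150 can be tied together in a short Python sketch. Everything below is illustrative: the function `run_once`, the `Record` tuple layout, and the injected `fetch`, `extract`, and `send` callables are hypothetical stand-ins for the collecting, extracting, and transmitting units, and the in-memory `set` stands in for the integrated information storage.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str, str]  # (site_code, link, name, published)


def run_once(site_urls: Dict[str, str],
             fetch: Callable[[str], str],
             extract: Callable[[str, str], Iterable[Record]],
             store: Set[Record],
             send: Callable[[List[Record]], None]) -> List[Record]:
    """Run one collection cycle and return the newly stored records."""
    new_records: List[Record] = []
    for site_code, url in site_urls.items():     # S110: URLs come from stored wrappers
        page = fetch(url)                        # S120: collect announcement data
        for record in extract(site_code, page):  # S130: wrapper-based extraction
            if record not in store:              # S140: store only non-duplicates
                store.add(record)
                new_records.append(record)
    send(sorted(store))                          # S150: transmit integrated information
    return new_records


# Stub collaborators so the sketch runs without network access.
pages = {"http://www.test.com/notice.php": "selection guide|2012-12-20"}


def fake_extract(site_code: str, page: str) -> List[Record]:
    name, published = page.split("|")
    return [(site_code, "/view/1", name, published)]


sent: List[List[Record]] = []
store: Set[Record] = set()
sites = {"T01": "http://www.test.com/notice.php"}
first = run_once(sites, pages.__getitem__, fake_extract, store, sent.append)
second = run_once(sites, pages.__getitem__, fake_extract, store, sent.append)
```

Injecting the collaborators keeps the cycle testable: the second call finds nothing new because the record from the first call is already in the store.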

One embodiment of the present invention may also be embodied in the form of a recording medium including instructions executable by a computer, such as program modules, being executed by a computer. Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. In addition, the computer-readable medium may include both computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Communication media typically includes any information delivery media, including computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transport mechanism.

It will be understood by those of ordinary skill in the art that the foregoing description of the embodiments is for illustrative purposes and that those skilled in the art can easily modify the invention without departing from the spirit or essential characteristics thereof. It is therefore to be understood that the above-described embodiments are illustrative in all aspects and not restrictive. For example, each component described as a single entity may be distributed and implemented, and components described as being distributed may also be implemented in a combined form.

The scope of the present invention is defined by the appended claims rather than the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included within the scope of the present invention.

100: Research project information integration providing server
110: Wrapper storage unit
120: Collecting unit
130: Extracting unit
140: Integrated information storage unit
150: Transmitting unit
160: Visit management unit
200: Research management institution site
300: Terminal

Claims (11)

1. A research project information integration providing server, comprising:
a wrapper storage unit for storing a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information among the web documents provided by each site;
a collecting unit for collecting research project announcement data from the research management institution sites;
an extracting unit for extracting specific information, based on the wrapper, from the research project announcement data collected by the collecting unit;
an integrated information storage unit for inserting the specific information extracted by the extracting unit into integrated information data and storing it; and
a transmitting unit for transmitting the integrated information to a requesting terminal,
wherein the extracting unit extracts the specific information based on the wrapper corresponding to the research management institution site from which the research project announcement data was collected.
2. The server according to claim 1,
further comprising a visit management unit for periodically accessing the research management institution sites based on the URL information stored in the wrapper storage unit.
3. The server according to claim 2,
wherein the visit management unit periodically reconnects to a website to be visited and checks whether the research project announcement data has been updated.
4. The server according to claim 1,
wherein the specific information includes at least one of a detailed link of a project announcement, a project announcement name, and a project announcement publication date.
5. The server according to claim 1,
wherein the wrapper uses character strings that commonly occur at the start position and the end position of the specific information to be extracted in the source code of the research project announcement data.
6. The server according to claim 1,
wherein the integrated information storage unit determines whether the specific information is duplicated information, and inserts and stores only non-duplicated information.
7. A method for providing integrated research project information using a research project information integration providing server, the method comprising:
storing a wrapper including URL information of a plurality of research management institution sites and storage location information of specific information among the web documents provided by each site;
collecting research project announcement data from the research management institution sites;
extracting specific information, based on the wrapper, from the research project announcement data collected in the collecting step;
inserting the extracted specific information into integrated information data and storing it; and
transmitting the stored integrated information to a requesting terminal.
8. The method according to claim 6,
wherein the specific information includes at least one of a detailed link of a project announcement, a project announcement name, and a project announcement publication date.
9. The method according to claim 6,
wherein the wrapper uses character strings that commonly occur at the start position and the end position of the specific information to be extracted in the source code of the research project announcement data.
10. The method according to claim 6,
wherein the step of inserting and storing includes determining whether the specific information is duplicated information, and inserting and storing only non-duplicated information.
11. The method according to claim 6,
wherein the step of accessing the website comprises periodically reconnecting to a website to be visited and checking whether the research project announcement data has been updated.
KR1020130021844A 2013-02-28 2013-02-28 System and method for providing integrated project information KR20140107814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130021844A KR20140107814A (en) 2013-02-28 2013-02-28 System and method for providing integrated project information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130021844A KR20140107814A (en) 2013-02-28 2013-02-28 System and method for providing integrated project information

Publications (1)

Publication Number Publication Date
KR20140107814A true KR20140107814A (en) 2014-09-05

Family

ID=51755269

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130021844A KR20140107814A (en) 2013-02-28 2013-02-28 System and method for providing integrated project information

Country Status (1)

Country Link
KR (1) KR20140107814A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102425811B1 (en) * 2021-05-06 2022-07-27 주식회사 우주하나소유 Method, server and computer program for uploading product sale contents using website crawling based on artificial intelligence

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application