WO2021235617A1 - System for recommending scientific and technical knowledge information, and method therefor - Google Patents

System for recommending scientific and technical knowledge information, and method therefor

Info

Publication number
WO2021235617A1
Authority
WO
WIPO (PCT)
Prior art keywords
science
knowledge information
technology
information
scientific
Prior art date
Application number
PCT/KR2020/014373
Other languages
French (fr)
Korean (ko)
Inventor
김인수
Original Assignee
위인터랙트 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 위인터랙트 주식회사
Publication of WO2021235617A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present invention relates to a scientific and technological knowledge information recommendation system and a method therefor. More specifically, in order to recommend user-customized science and technology knowledge information based on user information, and to recommend such information efficiently across various fields, the invention establishes a top-level science and technology R&D classification system that can link science and technology knowledge information among patent information, thesis information, and user information.
  • It then builds a similarity network of science and technology metadata (science and technology knowledge information) based on the established top-level science and technology R&D classification system, and provides a science and technology knowledge information recommendation system and method that can recommend the patents and papers located around a user in that similarity network in a customized manner.
  • The invention disclosed in the prior art is a method of discovering a subsequent development item customized for a company, comprising the steps of: (a) constructing a patent database that excludes patents whose applicants are large corporations, universities, or public research institutes; (b) calculating, in the patent database, a preference for each IPC for each applicant using that applicant's patent frequency per IPC; (c) setting a reference applicant in the user system and extracting similar companies by calculating the similarity between the reference applicant and an arbitrary applicant using their preferences for each IPC; (d) calculating a correlation index for each IPC using the degree of similarity between the reference applicant and each similar company and the patent frequency of the similar company for a specific IPC; and (e) extracting an IPC having a high correlation index and recommending the technical field corresponding to the extracted IPC as a subsequent item.
  • The preference is either the patent frequency itself or a fuzzy application value, that is, the patent frequency adjusted to within a certain range by applying fuzzy processing.
  • The fuzzy application value is a value converted to a certain scale by applying fuzzy processing to the patent frequency.
  • The degree of similarity can be calculated using the preference value of the reference applicant for a specific IPC and the preference value of the arbitrary applicant for that IPC. It is a value obtained by normalizing the sum of values multiplied by the similarity of [...]. The prior-art invention further comprises the step of presenting an R&D direction for each patent classification code after calculating at least one of the heterogeneity, the degree of competition of the patent classification code, and the growth potential indicating the self-growth degree of the corresponding patent classification code.
  • The heterogeneity is expressed as the reciprocal of the similarity between a specific patent classification code with a high correlation index and a patent classification code possessed by the reference applicant; the degree of competition refers to the total number of patents applied for under a specific patent classification code with a high correlation index; and the growth potential means the average rate of increase of patents applied for under a specific patent classification code with a high correlation index.
  • In other words, a patent database is constructed that excludes patents whose applicants are large corporations, universities, or public research institutes, and the preference for each IPC is calculated for each applicant in that database using the applicant's patent frequency per IPC.
  • A reference applicant is then set in the user system, the degree of similarity between the reference applicant and an arbitrary applicant is calculated using their preferences for each IPC, and an R&D direction is recommended for each calculated IPC.
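  The prior-art steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the applicant names and IPC lists are hypothetical toy data, the preference is taken as the plain share of an applicant's patents per IPC (no fuzzy adjustment), and cosine similarity is swapped in because the text does not give the exact similarity or normalization formula.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: applicant -> IPC codes of its patents.
patents = {
    "ref_co": ["G06F", "G06F", "G06N", "H04L"],
    "co_a": ["G06F", "G06N", "G06N", "G06Q"],
    "co_b": ["A61K", "A61K", "C07D"],
}


def preferences(ipc_list):
    """Preference per IPC, assumed here to be the share of the applicant's patents."""
    counts = Counter(ipc_list)
    total = sum(counts.values())
    return {ipc: n / total for ipc, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse preference vectors (an assumed measure)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def correlation_index(similarity, patents):
    """Similarity-weighted patent frequency per IPC, normalized by total similarity."""
    total_sim = sum(similarity.values())
    if not total_sim:
        return {}
    index = Counter()
    for name, sim in similarity.items():
        for ipc, n in Counter(patents[name]).items():
            index[ipc] += sim * n
    return {ipc: value / total_sim for ipc, value in index.items()}


reference = "ref_co"
prefs = {name: preferences(ipcs) for name, ipcs in patents.items()}
similarity = {
    name: cosine(prefs[reference], p) for name, p in prefs.items() if name != reference
}
index = correlation_index(similarity, patents)

# Recommend IPCs the reference applicant does not hold yet, highest index first.
owned = set(patents[reference])
candidates = sorted(
    (ipc for ipc, value in index.items() if ipc not in owned and value > 0),
    key=index.get,
    reverse=True,
)
print(candidates)
```

  In this toy data only `co_a` overlaps with the reference applicant, so the single IPC it holds that the reference applicant lacks is the one recommended.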
  • The prior-art invention calculates the preference of the reference applicant and of other applicants for each IPC based on the International Patent Classification (IPC) of the patent database, together with a similarity derived from those preferences, and makes recommendations on that basis. Because this configuration cannot cover the many kinds of scientific and technological knowledge information related to research and development, it cannot provide systematic, customized scientific and technological knowledge information drawn from such varied sources.
  • The present invention is intended to solve the problems of the prior art. Its object is to recommend user-customized scientific and technological knowledge information based on user information and, for efficient recommendation of such information in various fields, to establish a top-level science and technology R&D classification system that can link science and technology knowledge information among patent information, thesis information, and user information, and to build a similarity network of science and technology metadata (science and technology knowledge information).
  • A further object of the present invention is to provide a science and technology knowledge information recommendation system and method capable of recommending, in a customized way, the patent and thesis information located around a user within the constructed science and technology knowledge information similarity network.
  • As a technical solution for achieving the object of the present invention, a first aspect of the present invention provides a science and technology knowledge information recommendation system comprising: an operating computer that collects and manages scientific and technological knowledge information, builds a scientific and technological knowledge information similarity network based on the collected information, and provides users with customized scientific and technological knowledge information; a member information data storage unit that is connected to the operating computer and stores and manages information on members who have joined through the operating computer, the members' science and technology related information, and the members' use of scientific and technological knowledge information; a science and technology knowledge information data storage unit that is communicatively connected to the operating computer and stores and manages the patent information, thesis information, and information collected from social networks by the operating computer; a science and technology knowledge information utilization data storage unit that is connected to the operating computer and stores and manages the science and technology word similarity model information, science and technology related R&D classification information, similarity network information of science and technology knowledge information, and use information of science and technology knowledge information built by the operating computer;
  • at least one user terminal that is communicatively connected to the operating computer to provide membership registration, user science and technology related information, and the like, and to receive customized scientific and technological knowledge information from the operating computer; a science and technology information providing computer that communicates with the operating computer and provides scientific and technological document information in response to a request from the operating computer; a patent information providing computer that communicates with the operating computer and provides patent information in response to a request from the operating computer; a thesis information providing computer that is connected to the operating computer and provides thesis information in response to a request from the operating computer; and a social network medium, such as the Internet and social networks, through which the operating computer collects various science and technology related information via communication access;
  • wherein the operating computer builds a word similarity model of science and technology information through text data preprocessing and artificial neural network learning on a large amount of science and technology documents; builds and stores a top-level science and technology R&D classification system based on international or domestic science and technology classification systems; builds and stores a reference similarity network based on the similarity between the classifications of the constructed top-level science and technology R&D classification system; constructs and stores a science and technology knowledge information similarity network by adding science and technology knowledge information, including patent and thesis information, to the reference similarity network; and recommends to users the patents and papers located around members in the constructed science and technology knowledge information similarity network.
  • The operating computer performs the steps of: building a word similarity model of scientific and technological information through text data preprocessing and artificial neural network learning on a large amount of scientific and technological documents; establishing a top-level scientific and technological research and development (R&D) classification system capable of linking heterogeneous scientific and technological knowledge information; constructing a reference scientific and technological knowledge information similarity network by calculating the similarity between the classifications of the top-level scientific and technological R&D classification system using the word similarity model; constructing a science and technology knowledge information similarity network by adding science and technology knowledge information to the constructed reference similarity network; and calculating the initial similarity of members in the constructed science and technology knowledge information similarity network using member information entered at membership registration, including industry classification, field of interest, field of specialization, and university major (S140); and
  • a science and technology knowledge information recommendation method is provided, including the step of the operating computer
  • According to the present invention, a top-level scientific and technological R&D classification system that can link scientific and technological knowledge information among patent information, thesis information, and user information is constructed; a similarity network of science and technology metadata (science and technology knowledge information) is built on that classification system; and science and technology knowledge information, including the patents and papers located around a user in the constructed similarity network, can be recommended to the user in a customized way.
  • FIG. 1 is a schematic configuration diagram of an embodiment of the science and technology knowledge information recommendation system of the present invention.
  • FIG. 2 is a schematic configuration diagram of an embodiment of an operating computer, which is a main part of the scientific and technological knowledge information recommendation system of the present invention.
  • FIG. 3 is a flowchart for explaining an embodiment of a method for recommending scientific and technological knowledge information of the present invention.
  • FIG. 4 is a flowchart for explaining the main part of the scientific and technological knowledge information recommendation method of the present invention.
  • FIG. 5 is a flowchart for explaining the main part of the method for recommending scientific and technological knowledge information of the present invention.
  • The present invention presents a science and technology knowledge information recommendation system and method including: a word similarity model building means for constructing a word similarity model of scientific and technological information through text data preprocessing and artificial neural network learning on a large amount of scientific and technological documents; a top-level science and technology R&D classification establishment means for establishing a top-level science and technology R&D classification system based on international or domestic science and technology classification systems; a reference similarity network construction means for constructing a reference similarity network based on the similarity between the classifications of the constructed top-level classification system; a science and technology knowledge information similarity network construction means for constructing a similarity network to which science and technology knowledge information, including patent and thesis information, is added using the reference similarity network; and a science and technology knowledge information recommendation means for recommending science and technology knowledge information, including the patents and theses located around members in the constructed similarity network.
  • Various computers and terminals used in the present invention may consist of hardware itself, or may be composed of a computer program or a web program utilizing the hardware resources.
  • the operating computer of the present invention may consist of each component of hardware included in the computer, and may consist of a computer program or web program executed by utilizing the hardware resources of the computer.
  • the 'user interface' described in the embodiment of the present invention may be a web program or an application program that is output to a user terminal or installed and executed.
  • '~ part' described in the embodiment of the present invention may be used instead of '~ means'.
  • '~ part' or '~ means' may be a component of hardware itself, and preferably may be composed of a component of software or a program.
  • FIG. 1 is a schematic configuration diagram of an embodiment of the science and technology knowledge information recommendation system of the present invention.
  • The science and technology knowledge information recommendation system of the present invention includes: an operating computer 100 that collects and manages science and technology knowledge information, builds a science and technology knowledge information similarity network based on the collected information, and provides customized scientific and technological knowledge information to users;
  • a science and technology knowledge information utilization data storage unit 400 that is communicatively connected to the operating computer 100 and stores and manages the science and technology word similarity model information, science and technology related R&D classification information, similarity network information of science and technology knowledge information, and use information of scientific and technological knowledge information built by the operating computer 100; and
  • at least one user terminal 500 that is communicatively connected to the operating computer 100 to provide membership registration, user science and technology related information, and the like, and to receive customized scientific and technological knowledge information from the operating computer 100.
  • The operating computer 100, the scientific and technological information providing computer 600, the patent information providing computer 700, and the thesis information providing computer 800 each have their own data storage means or are communicatively connected to an external data storage means, and may each be composed of at least one server computer equipped with the means needed to operate and use the scientific and technological knowledge information recommendation system of the present invention.
  • the science and technology information providing computer 600 may be configured as a server that provides a national science and technology document database having a large amount of science and technology document files or an integrated science and technology document database of each country.
  • The patent information providing computer 700 may be configured as the patent information database server of each country's patent office or as a server providing an integrated patent information database covering multiple countries.
  • The patent information database of each country may include, for example in the case of the Republic of Korea, the database provided by 'www.kipris.or.kr', a website that provides users with intellectual property information including domestic patent information.
  • The integrated patent information database may include, for example, the database of 'www.escape.net', a website that provides users with patent information from around the world.
  • the thesis information providing computer 800 may be configured as a thesis information database server in each country or a server providing an integrated thesis information database in each country.
  • the member information data storage unit 200, science and technology knowledge information data storage unit 300 and science and technology knowledge information utilization data storage unit 400 may be configured as data storage means provided by the operating computer 100, preferably a database management server system (DBMS). In addition, it may be configured as one server system or may be configured as separate server systems.
  • The member information data storage unit 200 includes: a member information storage unit 210 that stores and manages basic member information and score information for each member node of users who have signed up as members to use the science and technology knowledge information recommendation system of the present invention; an industry classification information storage unit 220 that stores and manages the industry classification information selected or input in the user interface when a user signs up; a field of interest information storage unit 230 that stores and manages the user's field of interest information selected or input in the user interface at sign-up; a specialization information storage unit 240 that stores and manages the user's current specialization information selected or input in the user interface at sign-up; a college major information storage unit 250 that stores and manages the user's college major information selected or input in the user interface at sign-up; and a member use information storage unit 260 that stores and manages information on members' use of science and technology knowledge information recommendations through the user interface in the science and technology knowledge information recommendation system of the present invention.
  • The science and technology knowledge information data storage unit 300 includes: a patent information data storage unit 310 that stores and manages the collected patent information (registered patents and published patents) of each country around the world; a thesis information data storage unit 320 that stores and manages the collected thesis information from around the world; and
  • a collection information data storage unit 330 that stores and manages the science and technology related information collected by the operating computer 100 through the social network medium 900.
  • The science and technology knowledge information utilization data storage unit 400 includes: a science and technology word similarity model storage unit 410 that stores and manages the science and technology word similarity model obtained through analysis of and learning on a large number of scientific and technological documents; a top-level science and technology R&D classification information storage unit 420 that stores and manages the top-level science and technology R&D classification system constructed by organizing the international and domestic science and technology classification systems so that science and technology knowledge information can be linked; a reference science and technology knowledge information network information storage unit 430 that stores and manages the reference similarity network information constructed by calculating the similarity between the classifications of the top-level science and technology R&D classification system using the science and technology word similarity model; a science and technology knowledge information similarity network information storage unit 440 that stores the similarity network information constructed by adding science and technology knowledge information to the reference similarity network; and a science and technology knowledge information use information storage unit 450 that stores and manages use information of science and technology knowledge information.
  • Although the member information data storage unit 200, the science and technology knowledge information data storage unit 300, and the science and technology knowledge information utilization data storage unit 400 have been described separately, the invention is not limited thereto; they can be configured using an integrated storage and management means. It goes without saying that the arrangement of the individual storage units included in the storage units 200, 300, and 400 may also be changed as needed in terms of use and function.
  • The user terminal 500 may be a mobile phone, smart phone, tablet computer, notebook computer, or personal computer (PC) provided with a means for outputting a user interface composed of a website or web program provided by the operating computer 100, for downloading and executing a user interface provided by the operating computer 100 or an application program download computer, or for outputting a user interface by accessing a cloud computing system.
  • FIG. 2 is a schematic configuration diagram of an embodiment of an operating computer, which is a main part of the scientific and technological knowledge information recommendation system of the present invention.
  • the operating computer 100 of the present invention includes: a user interface management unit 101 that manages identification information and update information of a user interface to be provided to a user terminal; a science and technology information collection and management unit 102 that collects scientific and technological information, patent information, and thesis information from around the world; a member information management unit 103 for storing and managing basic member information and score information for each member node, which users who use the scientific and technological knowledge information recommendation system of the present invention have signed up as members; a member science and technology knowledge information management unit 104 for storing and managing industry classification, field of interest, field of specialization, and university major information selected or input by the users when signing up for membership; a patent information management unit 105 for storing and extracting patent information from around the world collected by the science and technology information collection and management unit 102; a thesis information management unit 106 for storing and extracting science and technology-related thesis information from around the world collected by the science and technology information collection and management unit 102; a science and technology information collection management unit 107 that stores and
  • The operating computer 100 receives membership registrations from users who want to use the science and technology knowledge information recommendation system of the present invention, and receives and manages the basic member information and the user science and technology knowledge information provided at sign-up, such as industry classification, fields of interest, specialties, and university majors.
  • The operating computer 100 extracts the main text, excluding unnecessary paragraphs, from each scientific and technical document in a large collection of document files that it has collected itself or received from outside. It then extracts only noun words from the extracted body text using a morphological analysis algorithm, and performs stopword processing to delete words that are unnecessary for expressing the characteristics of a sentence or document, such as frequently occurring prepositions and articles.
  • a morpheme in a morpheme analysis algorithm refers to a "minimum semantic unit" in a language.
  • meaning includes both lexical and grammatical meanings.
  • Morphological analysis refers to the process of segmenting a word or sentence, which is a language unit with a larger unit than a morpheme, into a morpheme, which is the smallest unit of meaning.
  • An unsupervised learning algorithm finds out how the data is structured without a target value for the input data. It is a fast machine learning method because no target value needs to be set and no prior learning is required.
  • In other words, the main text is extracted from each scientific and technological document in the large scientific and technological document database, with unnecessary paragraphs excluded in the process.
  • Words composed of nouns are extracted, stopword processing is performed on the extracted words, and based on these words a science and technology word similarity model is built through artificial neural network or machine learning.
  • Specifically, the operating computer may be configured to perform stopword processing, represent the meaning of the stopword-processed words as specific vector values, build training data for model learning, and train a model to which an artificial neural network or an unsupervised learning algorithm (a type of machine learning) is applied, thereby building a similarity model between the extracted words.
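  The preprocessing and word-vector idea described above can be illustrated with a small sketch. It is not the patent's model: in place of artificial neural network training it uses simple co-occurrence counts as word vectors, in place of a morphological analyzer it splits on whitespace, and the stopword list, sample sentences, and function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical stopword list; a real system would use a curated one.
STOPWORDS = {"the", "a", "of", "and", "in", "is", "for", "to"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stopwords.

    A real pipeline would use a morphological analyzer to keep only nouns.
    """
    return [w for w in text.lower().split() if w not in STOPWORDS]


def cooccurrence_vectors(docs, window=2):
    """Represent each word by counts of the words seen within `window` positions."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        tokens = preprocess(doc)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def word_similarity(vectors, w1, w2):
    """Cosine similarity between the context vectors of two words."""
    v1, v2 = vectors.get(w1, {}), vectors.get(w2, {})
    dot = sum(c * v2.get(k, 0) for k, c in v1.items())
    n1 = sqrt(sum(c * c for c in v1.values()))
    n2 = sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


docs = [
    "the matrix of a linear system",
    "the vector of a linear system",
    "protein folding in the cell",
]
vectors = cooccurrence_vectors(docs)
print(word_similarity(vectors, "matrix", "vector"))
print(word_similarity(vectors, "matrix", "protein"))
```

  Words that appear in the same contexts ("matrix" and "vector") score higher than words that never share a context ("matrix" and "protein"), which is the property the similarity model is meant to capture.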
  • To establish an effective recommendation system across heterogeneous science and technology knowledge information, for example science and technology related papers, patents, and information on users who are science and technology experts, the operating computer 100 establishes a top-level science and technology R&D classification system that can link such information.
  • The top-level scientific and technological R&D classification system can be established using domestic as well as international information.
  • For example, it can be built by organizing and integrating the OECD's FORD system and Korea's national science and technology classification system.
  • For example, in the field of mathematics, a classification system can be constructed with mathematics as the large classification, algebra as the medium classification, and linear algebra as the small classification.
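  A three-level classification entry of this kind can be represented as a small record type. The sketch below is only one possible layout; apart from the mathematics, algebra, and linear algebra example from the text, the entries and the `shared_levels` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """One entry of a large / medium / small classification hierarchy."""

    large: str
    medium: str
    small: str

    def path(self):
        return f"{self.large} > {self.medium} > {self.small}"


taxonomy = [
    Classification("mathematics", "algebra", "linear algebra"),  # from the text
    Classification("mathematics", "algebra", "group theory"),  # hypothetical
    Classification("mathematics", "analysis", "real analysis"),  # hypothetical
]


def shared_levels(a, b):
    """Number of leading levels two entries share (0 to 3)."""
    if a.large != b.large:
        return 0
    if a.medium != b.medium:
        return 1
    return 3 if a.small == b.small else 2


print(taxonomy[0].path())
print(shared_levels(taxonomy[0], taxonomy[1]))
```

  Keeping the large and medium levels on every entry makes it cheap to check how closely two small classifications are related in the hierarchy.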
  • The operating computer 100 constructs a reference science and technology knowledge information similarity network by calculating the similarity between the classifications of the top-level scientific and technological R&D classification system using the constructed science and technology word similarity model.
  • The similarity between the sub-classifications of the top-level science and technology R&D classification system is calculated by means of the science and technology word similarity model, and the degree of similarity is fine-tuned using the large and medium classifications of the classification system to construct the reference science and technology knowledge information similarity network.
  • In this network, each sub-classification of the top-level scientific and technological R&D classification system can be a node, and the degree of similarity can be a relationship between nodes.
  • The established reference science and technology knowledge information similarity network can be managed in the form shown in Table 1 below.
  • The operating computer 100 builds a science and technology knowledge information similarity network by adding science and technology knowledge information from various science and technology fields to the constructed reference science and technology knowledge information similarity network.
  • Examples of the science and technology knowledge information to be utilized include patent information, thesis information, and members' industry classifications, fields of interest, specialized fields, and university majors.
  • The International Patent Classification (IPC) and the keywords of the invention are used to add patent information to the reference science and technology knowledge information similarity network. Using the science and technology word similarity model, the similarity between each node and the set of sentences or words describing the IPC information is calculated to determine which node a specific patented invention belongs to. One patented invention can naturally belong to a plurality of nodes.
  • TF-IDF Term Frequency - Inverse Document Frequency
  • TF-IDF is a weight used in information retrieval and text mining: a statistical value indicating how important a word is within a specific document, given a document group consisting of several documents.
  • The TextRank technique is based on the PageRank algorithm and is known as a technique for summarizing a single document by weighting sentences and words through similarity comparison. It is an algorithm with a strong tendency to extract high-frequency words and sentences.
  • The thesis topic classification and keywords are used to add thesis information to the reference science and technology knowledge information similarity network. By calculating the similarity between the thesis topic classification and each node using the science and technology word similarity model, it is possible to determine which node a specific paper belongs to. One paper can naturally belong to multiple nodes.
  • TF-IDF Term Frequency - Inverse Document Frequency
  • The industry classification code is used to add industry classifications to the reference science and technology knowledge information similarity network: the similarity between each node and the set of sentences or words describing the industry classification code is calculated using the science and technology word similarity model. One industry classification may naturally belong to a plurality of nodes.
  • Fields of interest and specialization are large amounts of data in noun form. They can be added to the reference science and technology knowledge information similarity network by calculating the similarity between each node and the words related to those interests and specialties, using the science and technology word similarity model. One field of interest or specialization may naturally belong to a plurality of nodes.
  • The operating computer 100 manages the nodes of members in the established scientific and technological knowledge information similarity network.
  • A member has a score for each node and may belong to the node with the highest score.
  • The per-node scores of a member are managed by first setting an initial node based on the basic member information that the user enters when signing up for the science and technology knowledge information recommendation system of the present invention, such as the university major, industry classification, and field of interest selected by the user.
  • When a member uses the service of the science and technology knowledge information recommendation system of the present invention, the system can be configured to calculate a score from weights and from data collected using cookies, and to add the score to the node corresponding to the usage history.
  • When a member's information (university major, industry, interests, and specialization) is modified, the system may be configured to recalculate the member's node scores by adding to or subtracting from the score of the corresponding node.
  • The operating computer recommends science and technology knowledge information based on the constructed science and technology knowledge information similarity network and the node scores of the managed members.
  • The process of recommending scientific and technological knowledge information calculates the difference in scores between the node to which the member belongs and the other nodes, and determines the number and depth of the other nodes to be used according to that difference. The greater the difference between the scores, the smaller the number of nodes and the greater the depth may be.
  • The system can be configured to filter the science and technology knowledge information extracted according to the similarity network and the member's node scores, using conditions such as year, number of citations, and whether the member has already viewed the item, and to recommend the filtered information.
  • The score difference between the member's node and the other nodes is calculated from the member information storage unit, where the member's per-node score information is stored, and the number of nodes to be extracted is determined according to the calculated score difference.
  • The science and technology knowledge information recommended here may include patent information and thesis information.
  • The method for recommending scientific and technological knowledge information of the present invention comprises: constructing, by an operating computer, a word similarity model of scientific and technological information through text data preprocessing and artificial neural network learning on a large number of scientific and technological documents (S100); establishing, by the operating computer, a top-level scientific and technological research and development (R&D) classification system that can connect heterogeneous scientific and technological knowledge information (S110); constructing, by the operating computer, a reference science and technology knowledge information similarity network by calculating the similarity between the classes of the top-level scientific and technological R&D classification system using the word similarity model (S120); constructing, by the operating computer, a science and technology knowledge information similarity network in which science and technology knowledge information is added to the constructed reference network (S130); calculating, by the operating computer, the initial similarity of members in the science and technology knowledge information similarity network (S140); and recommending, by the operating computer, the science and technology knowledge information around a member in that network.
  • The method may further include a step in which the operating computer resets the member's similarity in the scientific and technological knowledge information similarity network, using information on the member's use of scientific and technological knowledge information.
  • The step of constructing the word similarity model of the present invention includes: extracting the main text, excluding unnecessary paragraphs, from a large amount of scientific and technological documents (S101); extracting only the nouns from the extracted main text using a morphological analysis technique (S102); performing stopword processing on the extracted words (S103); and constructing a science and technology word similarity model through artificial neural network or machine learning training based on the stopword-processed words (S104).
  • The step of recommending the scientific and technological knowledge information of the present invention includes: calculating the difference in score between the member's node and the other nodes, using the member information storage unit in which the member's per-node score information is stored (S151); determining the number of nodes to be extracted according to the calculated score difference (S152); determining the node depth to be used according to the calculated score difference (S153); filtering the extracted scientific and technological knowledge information using various conditions (S154); and recommending scientific and technological knowledge information including the filtered patent and thesis information (S155).
  • The embodiments of the present invention described above are only some of the various possible embodiments.
  • The operating computer of the present invention builds a word similarity model of science and technology information through text data preprocessing and artificial neural network learning on a large amount of science and technology documents; builds a top-level science and technology R&D classification system based on international or domestic science and technology classification systems; builds a reference similarity network based on the similarity between the classes of that classification system; uses the reference similarity network to build a science and technology knowledge information similarity network to which science and technology knowledge information, including patent and thesis information, is added; and recommends to users the science and technology knowledge information, including patents and papers, around members in that network. Various embodiments that fall within this technical idea are naturally included in the protection scope of the present invention.
  • The present invention can be applied to the science and technology knowledge information data industry.
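The TF-IDF weight described in the list above can be illustrated with a short sketch. This is not the patent's implementation: the formula used here (term frequency as a share of the document length, inverse document frequency as the natural log of the document count over the document frequency) is one common variant chosen as an assumption, and the function names `tf_idf` and `top_keywords` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(doc_weights, k=2):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical noun lists, as they might look after stopword processing.
docs = [
    ["battery", "electrode", "lithium", "battery"],
    ["battery", "sensor", "signal"],
    ["sensor", "signal", "antenna"],
]
weights = tf_idf(docs)
keywords = top_keywords(weights[0])
print(keywords)
```

In the first document, "battery" is the most frequent word but also appears in another document, so the rarer terms receive higher weights. This contrasts with the high-frequency bias attributed to TextRank in the list above.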

Abstract

The present invention relates to a system for recommending scientific and technical knowledge information, and a method therefor, and the present invention provides a system for recommending scientific and technical knowledge information and a method therefor, the system comprising: a word-similarity model construction means for constructing a word-similarity model of scientific and technical information through text data pre-processing and artificial neural network learning from a large number of scientific and technical documents; a top-level science and technology R&D construction means for constructing top-level science and technology R&D classification systems on the basis of international or domestic science and technology classification systems; a reference-similarity network construction means for constructing a reference-similarity network on the basis of the similarity between the constructed top-level science and technology R&D classification systems; a scientific and technical knowledge information similarity network construction means using the constructed reference-similarity network to construct a scientific and technical knowledge information similarity network to which scientific and technical knowledge information, including patent and thesis information, is added; and a scientific and technical knowledge information recommendation means for recommending scientific and technical knowledge information including patents and theses that are adjacent to a member in the constructed scientific and technical knowledge information similarity network.

Description

Science and technology knowledge information recommendation system and method therefor
The present invention relates to a science and technology knowledge information recommendation system and a method therefor. More specifically, it relates to a system and method for user-customized recommendation of science and technology knowledge information based on user information. For efficient recommendation of science and technology knowledge information across various fields, a top-level science and technology R&D classification system is established that can connect the science and technology knowledge information among patent information, thesis information, and user information; a science and technology metadata (science and technology knowledge information) similarity network is built on the basis of that classification system; and the patents and papers around a user in the network are recommended to that user in a customized manner.
Researchers and companies continuously conduct research and development in science and technology and produce results. Researchers mainly continue research in the fields they have been studying, chiefly by referring to papers in the related technical field. In companies, developers in each technology development department continue development collaboratively, through discussion of the company's follow-up items or new business items.
However, researchers and companies currently cannot receive or access patent information, thesis information, and similar material in the relevant technical fields in a user-centered, systematic, and customized manner, so researchers must invest their own time and effort to secure the science and technology knowledge information they need.
Research to solve this problem has continued. As a related invention, Republic of Korea Patent Publication No. 10-2019-0115505 (published October 14, 2019) discloses a method for discovering company-customized follow-up development items.
The disclosed invention is a method for discovering company-customized follow-up development items, comprising the steps of: (a) constructing a patent database that excludes patents whose applicants are large corporations, universities, or public research institutes; (b) calculating and storing, in the patent database, a preference for each IPC for each applicant using the applicant's patent frequency for that IPC; (c) after the database is constructed, setting a reference applicant in the user system and extracting similar companies by calculating the similarity between the reference applicant and arbitrary applicants using their per-IPC preferences; (d) calculating a correlation index for each IPC using the similarity between the reference applicant and the similar companies and the similar companies' patent frequency for a specific IPC; and (e) extracting IPCs with a high correlation index and recommending the technical fields corresponding to the extracted IPCs as follow-up items.
Here, the preference is either the patent frequency or a fuzzy-applied value, that is, the patent frequency converted to a fixed scale and adjusted to a certain range by applying fuzzy processing. The similarity can be calculated using the reference applicant's preference value for a specific IPC and an arbitrary applicant's preference value for that IPC. The correlation index is the normalized sum, over arbitrary applicants, of the frequency with which each arbitrary applicant has filed patents under the given patent classification code multiplied by the similarity between the reference applicant and that applicant. The disclosed invention further includes, after step (d), analyzing the technical characteristics of the patent classification codes recommended as follow-up items, calculating at least one of the heterogeneity with respect to the technology area held by the reference applicant, the degree of competition of the patent classification code, and the growth potential indicating the degree of growth of the patent classification code itself, and then presenting an R&D direction for each patent classification code.
The heterogeneity is expressed as the reciprocal of the similarity between a specific patent classification code with a high correlation index and the patent classification codes held by the reference applicant; the degree of competition is the total number of patents filed under a specific patent classification code with a high correlation index; and the growth potential is the average growth rate of patents filed under that code.
In summary, the disclosed invention constructs a patent database that excludes patents whose applicants are large corporations, universities, or public research institutes; calculates a per-IPC preference for each applicant from the patent frequency; sets a reference applicant in the user system; calculates the similarity between the reference applicant and arbitrary applicants using those per-IPC preferences; and recommends an R&D direction for each resulting IPC.
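The correlation index of the prior-art method described above can be sketched as follows. The source states only that the index is a normalized sum of patent frequency multiplied by applicant similarity; the use of cosine similarity over raw patent frequencies and the normalization by the largest value are assumptions made for this illustration, and the names `cosine`, `correlation_index`, and the sample applicants are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {ipc: preference} vectors."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def correlation_index(freq, reference):
    """Per-IPC sum of (other applicant's frequency x similarity to reference).

    Normalization is assumed to be division by the largest raw value.
    """
    ref_pref = freq[reference]
    raw = {}
    for applicant, counts in freq.items():
        if applicant == reference:
            continue
        sim = cosine(ref_pref, counts)
        for ipc, n in counts.items():
            raw[ipc] = raw.get(ipc, 0.0) + n * sim
    peak = max(raw.values(), default=0.0)
    return {ipc: (v / peak if peak else 0.0) for ipc, v in raw.items()}


# Hypothetical patent counts per applicant and IPC code.
freq = {
    "A": {"G06F": 3, "G06Q": 1},
    "B": {"G06F": 2, "H04L": 1},
    "C": {"A61K": 5},
}
index = correlation_index(freq, "A")
print(index)
```

In this toy data, applicant C shares no IPC with reference applicant A, so its code contributes nothing, while H04L (held by the similar applicant B but not by A) surfaces as a candidate follow-up field.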
However, the disclosed invention calculates the per-IPC preferences of the reference applicant and arbitrary applicants, and the similarity derived from them, on the basis of the International Patent Classification (IPC) of a patent database, and makes its recommendations from that alone. Because it cannot include the various elements of science and technology knowledge information related to R&D, it cannot provide systematic, customized science and technology knowledge information based on diverse sources.
Therefore, an invention is desired that provides user-customized recommendation of science and technology knowledge information based on user information: one that establishes a top-level science and technology R&D classification system capable of connecting the science and technology knowledge information among patent information, thesis information, and user information for efficient recommendation across various fields, builds a science and technology metadata (science and technology knowledge information) similarity network on the basis of that classification system, and recommends the patents and papers around a user in the network to that user in a customized manner.
The present invention is intended to solve the problems of the prior art described above. Its object is to provide a science and technology knowledge information recommendation system and method for user-customized recommendation based on user information, which establishes a top-level science and technology R&D classification system capable of connecting the science and technology knowledge information among patent information, thesis information, and user information for efficient recommendation across various fields, builds a similarity network of science and technology metadata (science and technology knowledge information) on the basis of that classification system, and recommends the patent and thesis information around a user in the network to that user in a customized manner.
As a technical means for achieving the object of the present invention, a first aspect of the present invention presents a science and technology knowledge information recommendation system comprising: an operating computer that collects and manages science and technology knowledge information, builds a science and technology knowledge information similarity network based on the collected information, and provides customized science and technology knowledge information to users; a member information data storage unit that is communicatively connected to the operating computer and stores and manages the information of members registered with the operating computer, the members' science and technology related information, and the members' usage information for science and technology knowledge information; a science and technology knowledge information data storage unit that is communicatively connected to the operating computer and stores and manages the patent information and thesis information collected by the operating computer and the information collected from social networks; a constructed science and technology knowledge information data storage unit that is communicatively connected to the operating computer and stores and manages science and technology word-based similarity model information, science and technology R&D classification information, similarity network information of science and technology knowledge information, and usage information of the science and technology knowledge information built by the operating computer; at least one user terminal that is communicatively connected to the operating computer, provides membership registration and user science and technology related information, and receives customized science and technology knowledge information from the operating computer; a science and technology information providing computer that is communicatively connected to the operating computer and provides science and technology document information at the request of the operating computer; a patent information providing computer that is communicatively connected to the operating computer and provides patent information at the request of the operating computer; a thesis information providing computer that is communicatively connected to the operating computer and provides thesis information at the request of the operating computer; and social network media, such as the Internet and social networks, to which the operating computer connects to collect various science and technology related information.
In this system, the operating computer builds a word similarity model of science and technology information through text data preprocessing and artificial neural network learning on a large amount of science and technology documents; builds and stores a top-level science and technology R&D classification system based on international or domestic science and technology classification systems; builds and stores a reference similarity network based on the similarity between the classes of that classification system; uses the reference similarity network to build and store a science and technology knowledge information similarity network to which science and technology knowledge information, including patent and thesis information, is added; and recommends to the user the patents and papers around the member in the constructed science and technology knowledge information similarity network.
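The network construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the two-dimensional `WORD_VECTORS` stand in for a trained word similarity model, a node vector is taken as the mean of its descriptive words, cosine similarity is used as the edge weight, and the 0.85 assignment threshold is arbitrary. Only the "linear algebra" sub-classification comes from the source's own example; the other node names and all vectors are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical 2-D word vectors standing in for a trained word similarity model.
WORD_VECTORS = {
    "matrix": (1.0, 0.1),
    "vector": (0.9, 0.2),
    "integral": (0.6, 0.7),
    "derivative": (0.5, 0.8),
    "protein": (0.0, 1.0),
}


def mean_vector(words):
    """Average the vectors of the known words (zero vector if none are known)."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def build_reference_network(nodes):
    """Nodes are sub-classifications; edges carry their pairwise similarity."""
    vectors = {name: mean_vector(words) for name, words in nodes.items()}
    edges = {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
    return vectors, edges


def assign_nodes(keywords, vectors, threshold=0.85):
    """Attach a document to every node whose similarity reaches the threshold."""
    doc = mean_vector(keywords)
    return sorted(n for n, v in vectors.items() if cosine(doc, v) >= threshold)


nodes = {
    "linear algebra": ["matrix", "vector"],
    "analysis": ["integral", "derivative"],
    "biochemistry": ["protein"],
}
vectors, edges = build_reference_network(nodes)
patent_nodes = assign_nodes(["matrix", "integral"], vectors)
print(patent_nodes)
```

The sample patent's keywords sit between two sub-classifications, so it is attached to both, which matches the source's statement that one patented invention can belong to a plurality of nodes.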
In addition, a second aspect of the present invention presents a science and technology knowledge information recommendation method comprising the steps of: constructing, by an operating computer, a word similarity model of science and technology information through text data preprocessing and artificial neural network learning on a large amount of science and technology documents; establishing, by the operating computer, a top-level science and technology research and development (R&D) classification system capable of connecting heterogeneous science and technology knowledge information; constructing, by the operating computer, a reference science and technology knowledge information similarity network by calculating the similarity between the classes of the top-level science and technology R&D classification system using the word similarity model; constructing, by the operating computer, a science and technology knowledge information similarity network in which science and technology knowledge information is added to the constructed reference science and technology knowledge information similarity network; calculating, by the operating computer, the initial similarity of a member in the constructed science and technology knowledge information similarity network, using member information entered by the user at membership registration, including industry classification, field of interest, field of specialization, and university major (S140); and recommending, by the operating computer, the science and technology knowledge information around the member in the science and technology knowledge information similarity network.
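The final recommending step is broken down elsewhere in this document into sub-steps S151 to S155 (score difference, number of nodes, node depth, filtering, recommendation). The sketch below shows one way those sub-steps could fit together. The source does not specify how a score difference maps to a node count or depth, so the `gap_threshold` rule, the interpretation of depth as hops in the similarity network, the ordering by citations, and all names and sample data are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    node: str
    year: int
    citations: int


def plan(scores, gap_threshold=10.0):
    """S151-S153: derive (start nodes, depth) from the member's node scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    home = ranked[0]  # the member belongs to the highest-scoring node
    # S151: score difference between the member's node and each other node.
    gaps = {n: scores[home] - scores[n] for n in ranked[1:]}
    smallest_gap = min(gaps.values(), default=0.0)
    # S152/S153 (assumed mapping): a large gap means fewer nodes, deeper search.
    if smallest_gap >= gap_threshold:
        return [home], 2
    close = [n for n in ranked[1:] if gaps[n] < gap_threshold]
    return [home] + close, 1


def expand(start_nodes, neighbors, depth):
    """Collect every node within `depth` hops of the start nodes."""
    seen = set(start_nodes)
    frontier = list(start_nodes)
    for _ in range(depth):
        frontier = [m for n in frontier for m in neighbors.get(n, []) if m not in seen]
        seen.update(frontier)
    return seen


def recommend(scores, neighbors, items, viewed, min_year, min_citations):
    nodes, depth = plan(scores)
    reachable = expand(nodes, neighbors, depth)
    # S154: filter by year, citation count, and whether the member viewed the item.
    hits = [
        i for i in items
        if i.node in reachable
        and i.year >= min_year
        and i.citations >= min_citations
        and i.title not in viewed
    ]
    # S155: return the filtered items, most cited first (assumed ordering).
    return sorted(hits, key=lambda i: -i.citations)


scores = {"linear algebra": 30, "analysis": 12, "statistics": 5}
neighbors = {
    "linear algebra": ["analysis"],
    "analysis": ["linear algebra", "statistics"],
    "statistics": ["analysis"],
}
items = [
    Item("Sparse matrix patent", "linear algebra", 2019, 12),
    Item("Old integral paper", "analysis", 2001, 90),
    Item("Bayesian survey", "statistics", 2020, 40),
    Item("Tensor paper", "linear algebra", 2021, 3),
    Item("Fourier patent", "analysis", 2018, 25),
]
titles = [i.title for i in recommend(scores, neighbors, items,
                                     viewed={"Fourier patent"},
                                     min_year=2015, min_citations=5)]
print(titles)
```

Here the member's top score is far ahead of the others, so the sketch starts from a single node and searches two hops deep; a member with closely spaced scores would instead start from several nodes at depth one.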
According to the present invention, for efficient recommendation of science and technology knowledge information across various fields, a top-level science and technology R&D classification system is established that can connect the science and technology knowledge information among patent information, thesis information, and user information; a similarity network of science and technology metadata (science and technology knowledge information) is built on the basis of that classification system; and science and technology knowledge information, including the patents and papers around a user in the network, can be recommended to that user in a customized manner.
FIG. 1 is a schematic configuration diagram of an embodiment of the science and technology knowledge information recommendation system of the present invention.
FIG. 2 is a schematic configuration diagram of an embodiment of the operating computer, which is the main part of the science and technology knowledge information recommendation system of the present invention.
FIG. 3 is a flowchart for explaining an embodiment of the science and technology knowledge information recommendation method of the present invention.
FIG. 4 is a flowchart for explaining a main part of the science and technology knowledge information recommendation method of the present invention.
FIG. 5 is a flowchart for explaining a main part of the science and technology knowledge information recommendation method of the present invention.
The present invention provides a system and method for recommending science and technology knowledge information, comprising: word similarity model construction means for building a word similarity model of science and technology information from a large corpus of science and technology documents through text data preprocessing and artificial neural network training; top-level science and technology R&D construction means for building a top-level science and technology R&D classification system based on international or domestic science and technology classification systems; reference similarity network construction means for building a reference similarity network based on the similarities between the classes of the constructed top-level science and technology R&D classification system; science and technology knowledge information similarity network construction means for building a science and technology knowledge information similarity network by adding science and technology knowledge information, including patent and thesis information, to the constructed reference similarity network; and science and technology knowledge information recommendation means for recommending science and technology knowledge information, including patents and theses, located near a member in the constructed similarity network.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.
The terms used in describing the embodiments of the present invention are defined first. The various computers and terminals used in the present invention may be implemented as hardware itself, or as a computer program or web program that uses the resources of that hardware. For example, the operating computer of the present invention may consist of the hardware components of a computer, or of a computer program or web program executed using that computer's hardware resources.
In addition, the 'user interface' described in the embodiments may be a web program displayed on a user terminal, or an application program installed and executed on the user terminal.
In addition, the term '~ unit' used in the embodiments may be used interchangeably with '~ means'. Here, a '~ unit' or '~ means' may be a hardware component, and is preferably implemented as a component of software or a program.
FIG. 1 is a schematic configuration diagram of an embodiment of the science and technology knowledge information recommendation system of the present invention.
As shown in FIG. 1, the science and technology knowledge information recommendation system of the present invention comprises:
an operating computer 100 that collects and manages science and technology knowledge information, builds a science and technology knowledge information similarity network from the collected information, and provides science and technology knowledge information tailored to each user;
a member information data storage unit 200 that is communicatively connected to the operating computer 100 and stores and manages the member information of users registered with the operating computer 100, the members' science and technology related information, and the members' usage information for science and technology knowledge information;
a science and technology knowledge information data storage unit 300 that is communicatively connected to the operating computer 100 and stores and manages the patent information, thesis information, and information gathered from social networks that the operating computer 100 collects;
a science and technology knowledge information utilization data storage unit 400 that is communicatively connected to the operating computer 100 and stores and manages the similarity model information for science and technology words, the science and technology R&D classification information, the similarity network information for science and technology knowledge information, and the usage information for the science and technology knowledge information built by the operating computer 100;
at least one user terminal 500 that is communicatively connected to the operating computer 100, through which a user registers as a member and provides the user's science and technology related information, and which receives customized science and technology knowledge information from the operating computer 100;
a science and technology information providing computer 600 that is communicatively connected to the operating computer 100 and provides science and technology document information at the request of the operating computer 100;
a patent information providing computer 700 that is communicatively connected to the operating computer 100 and provides patent information at the request of the operating computer 100;
a thesis information providing computer 800 that is communicatively connected to the operating computer 100 and provides thesis information at the request of the operating computer 100; and
social network media 900, such as Internet websites, blogs, and social networks, to which the operating computer 100 connects in order to collect various science and technology related information.
The operating computer 100, the science and technology information providing computer 600, the patent information providing computer 700, and the thesis information providing computer 800 may each consist of at least one server computer that has its own data storage means or is communicatively connected to external data storage means, and that is equipped with the means needed to operate and use the science and technology knowledge information recommendation system of the present invention.
The science and technology information providing computer 600 may be a server that provides a national science and technology document database holding a large number of science and technology document files, or an integrated science and technology document database covering multiple countries. The patent information providing computer 700 may be a patent information database server of a national patent office, or a server that provides an integrated patent information database covering multiple countries. An example of a national patent information database is, in the case of the Republic of Korea, the database of the website 'www.kipris.or.kr', which provides users with intellectual property information including domestic patent information. An example of an integrated patent information database is the database of the website 'www.escape.net', which provides users with patent information from countries around the world.
The thesis information providing computer 800 may be a national thesis information database server, or a server that provides an integrated thesis information database covering multiple countries. An example of such a thesis information database is, in the case of the Republic of Korea, the database of the website 'www.ndsl.kr', which provides users with thesis information including both domestic thesis information and integrated thesis information from countries around the world.
The member information data storage unit 200, the science and technology knowledge information data storage unit 300, and the science and technology knowledge information utilization data storage unit 400 may be implemented as data storage means provided in the operating computer 100, and are preferably implemented as a database management server system (DBMS). They may be implemented as a single server system or as separate server systems.
The member information data storage unit 200 comprises:
a member information storage unit 210 that stores and manages the basic member information of users who have registered as members in order to use the science and technology knowledge information recommendation system of the present invention, together with score information for each member node;
an industry classification information storage unit 220 that stores and manages the industry classification information a user selects or enters in the user interface when registering;
a field of interest information storage unit 230 that stores and manages the information on fields of interest a user selects or enters in the user interface when registering;
a specialization information storage unit 240 that stores and manages the information on the current field of specialization a user selects or enters in the user interface when registering;
a university major information storage unit 250 that stores and manages the university major information a user selects or enters in the user interface when registering; and
a member usage information storage unit 260 that stores and manages information on how a member has used science and technology knowledge information recommendations through the user interface of the recommendation system of the present invention.
The science and technology knowledge information data storage unit 300 comprises: a patent information data storage unit 310 that stores and manages the collected patent information (granted patents and published applications) from countries around the world; a thesis information data storage unit 320 that stores and manages the collected thesis information from countries around the world; and a collected information data storage unit 330 that stores and manages the science and technology related information that the operating computer 100 collects through the social network media 900.
The science and technology knowledge information utilization data storage unit 400 comprises:
a science and technology word similarity model storage unit 410 that stores and manages the science and technology word similarity model obtained by analyzing and learning from a large corpus of science and technology documents;
a top-level science and technology R&D classification information storage unit 420 that stores and manages the top-level science and technology R&D classification system, which is constructed by consolidating international and domestic science and technology classification systems so that science and technology knowledge information can be linked;
a reference science and technology knowledge information network information storage unit 430 that stores and manages the reference science and technology knowledge information similarity network, which is constructed by using the word similarity model to calculate the similarities between the classes of the top-level science and technology R&D classification system;
a science and technology knowledge information similarity network information storage unit 440 that stores the science and technology knowledge information similarity network, which is constructed by adding science and technology knowledge information to the reference similarity network; and
a science and technology knowledge information usage information storage unit 450 that stores and manages usage information for the science and technology knowledge information recommended from the constructed similarity network.
Although the member information data storage unit 200, the science and technology knowledge information data storage unit 300, and the science and technology knowledge information utilization data storage unit 400 have been described separately, the invention is not limited to this arrangement. They may be implemented using integrated storage and management means, and the individual storage units described as belonging to each of the storage units 200, 300, and 400 may of course be rearranged as needed in view of their use and function.
The user terminal 500 may be a mobile phone, smartphone, tablet computer, notebook computer, personal computer (PC), or similar device equipped with means for displaying a user interface consisting of a website or web program provided by the operating computer 100, for downloading and running a user interface provided by the operating computer 100 or by an application download computer, or for displaying a user interface by connecting to a cloud computing system.
FIG. 2 is a schematic configuration diagram of an embodiment of the operating computer, which is the main part of the science and technology knowledge information recommendation system of the present invention.
As shown in FIG. 2, the operating computer 100 of the present invention comprises:
a user interface management unit 101 that manages the identification information, update information, and the like of the user interface to be provided to user terminals;
a science and technology information collection management unit 102 that collects science and technology information, patent information, thesis information, and the like from countries around the world;
a member information management unit 103 that stores and manages the basic member information of users registered to use the recommendation system of the present invention, together with score information for each member node;
a member science and technology knowledge information management unit 104 that stores and manages the industry classification, field of interest, field of specialization, and university major information that users select or enter when registering;
a patent information management unit 105 that stores, extracts, and otherwise manages the worldwide patent information collected by the science and technology information collection management unit 102;
a thesis information management unit 106 that stores, extracts, and otherwise manages the worldwide science and technology thesis information collected by the science and technology information collection management unit 102;
a collected science and technology information management unit 107 that stores and manages the worldwide science and technology information that the science and technology information collection management unit 102 collects through social networks such as the Internet and SNS;
a science and technology word similarity model information management unit 108 that builds and manages a word similarity model by analyzing and learning from a large corpus of science and technology documents;
a top-level science and technology R&D classification information management unit 109 that builds and manages the top-level science and technology R&D classification system, which is constructed by consolidating international and domestic science and technology classification systems so that science and technology knowledge information can be linked;
a reference science and technology knowledge information similarity network information management unit 110 that builds and manages the reference science and technology knowledge information similarity network by using the word similarity model to calculate the similarities between the classes of the top-level science and technology R&D classification system;
a science and technology knowledge information similarity network information management unit 111 that builds and manages the science and technology knowledge information similarity network by adding science and technology knowledge information to the reference similarity network;
a user science and technology knowledge information similarity calculation information management unit 112 that calculates and manages the similarity of users within the science and technology knowledge information similarity network;
a science and technology knowledge information recommendation information management unit 113 that generates and manages the list of science and technology knowledge information to be recommended to a user;
a science and technology knowledge information usage information management unit 114 that manages usage information for the recommended science and technology knowledge information generated within the constructed similarity network; and
a user science and technology knowledge information usage information management unit 115 that manages information on each member's usage of science and technology knowledge information.
The operation of the science and technology knowledge information recommendation system of the present invention is now described in detail with reference to FIGS. 1 and 2.
The operating computer 100 accepts membership registrations from users who wish to use the recommendation system of the present invention, and receives and manages the basic member information that users provide when registering, together with user science and technology knowledge information such as industry classification, field of interest, field of specialization, and university major.
In addition, the operating computer 100 takes a large set of science and technology document files, either collected by itself or transmitted from outside, and extracts the main body text of each document while excluding unnecessary paragraphs. It then applies a morphological analysis algorithm to the extracted body text to extract only the nouns, and performs stopword processing, which deletes words that are unnecessary for expressing the characteristics of a sentence or document, such as prepositions, articles, and other very frequent words.
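The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the stopword list, the regular-expression tokenizer, and the `is_noun` callback are all hypothetical stand-ins. In particular, `is_noun` stands in for a real morphological analyzer, which the description does not name.

```python
import re

# Hypothetical stopword list for illustration; a real system would use a curated one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "and", "to", "is"}

def preprocess(text, is_noun=lambda word: True, stopwords=STOPWORDS):
    """Tokenize the text, keep tokens accepted by is_noun, and drop stopwords.

    is_noun is an assumed hook where a morphological analyzer would decide
    whether a token is a noun; by default every token is accepted.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if is_noun(t) and t not in stopwords]

# Stopwords only:
print(preprocess("The network of a similarity model"))
# Toy noun filter standing in for morphological analysis:
toy_nouns = {"learning", "recognition"}
print(preprocess("Deep learning improves recognition", is_noun=lambda w: w in toy_nouns))
```

Passing the noun filter as a function keeps the sketch independent of any particular analyzer, so a language-specific tool could be substituted without changing the rest of the pipeline.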
Here, a morpheme, as used in a morphological analysis algorithm, is the smallest unit of meaning in a language, where meaning covers both lexical and grammatical meaning. Morphological analysis is the process of segmenting a larger linguistic unit, such as a word phrase or a sentence, into morphemes.
The meanings of the nouns that were extracted from the large set of science and technology document files and passed through stopword processing are computed as vector values, and a similarity model between the extracted words is built by applying artificial neural network training or an unsupervised learning algorithm, which is a type of machine learning.
Here, an unsupervised learning algorithm discovers how data is structured without target values for the input data. Unrefined data is input, and feature summarization and clustering are performed without training data. Because no target values need to be specified and no prior training is required, it is a fast machine learning method.
To summarize the construction of the word similarity model from a large corpus of science and technology documents: the body text of each document in a large science and technology document database is extracted, with unnecessary paragraphs excluded in the process; words consisting of nouns are extracted from the main body text; stopword processing is applied to the extracted words; and a science and technology word similarity model is built from the remaining words through artificial neural network or machine learning training. In other words, text data preprocessing is performed on a large set of science and technology document files, and a science and technology word similarity model is then built through training with a neural network or machine learning.
To restate this in more detail, the system may be configured to: calculate the similarity of sentences and words within each document, based on a large set of science and technology document files collected by itself or transmitted from outside; weight the sentences and words through similarity comparison and extract the main body text while excluding unnecessary paragraphs; extract only the nouns from the extracted body text using a morphological analysis technique; perform stopword processing that deletes words unnecessary for expressing the characteristics of a sentence or document, such as infrequent words and short words; compute the meanings of the remaining words as vector values to build training data for model learning; and train a model that applies an artificial neural network or an unsupervised learning algorithm, a type of machine learning, to build a similarity model between the extracted words.
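The idea of representing words as vectors and comparing them can be illustrated with a small sketch. The description does not specify the model architecture, so the code below deliberately swaps in a simpler technique: co-occurrence count vectors compared by cosine similarity, in place of a trained neural network. The function names, the window size, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(docs, window=2):
    """Build one sparse context-count vector per word.

    docs is a list of token lists (already preprocessed). Each word's vector
    counts the words that appear within `window` positions of it.
    """
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus (hypothetical): "vector" and "tensor" share the same contexts.
docs = [
    ["matrix", "vector", "algebra"],
    ["matrix", "tensor", "algebra"],
    ["cell", "protein", "gene"],
]
vecs = cooccurrence_vectors(docs)
print(cosine(vecs["vector"], vecs["tensor"]))  # close to 1: identical contexts
print(cosine(vecs["vector"], vecs["gene"]))    # 0.0: no shared contexts
```

Words that occur in similar contexts end up with similar vectors, which is the property that the later steps rely on when they compare classification labels, keywords, and nodes.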
In addition, the operating computer 100 constructs a top-level science and technology R&D classification system that can link heterogeneous kinds of science and technology knowledge information, for example science and technology theses, patents, and the profile information of users who are science and technology experts, so that an effective recommendation system can be built across them. Both domestic and international information can be used to construct this classification system. For example, it can be built by consolidating and integrating the OECD's FORD (Fields of Research and Development) classification and the Republic of Korea's National Science and Technology Classification System. As an example of the resulting structure, the field of mathematics can be organized with mathematics as the major class, algebra as the middle class, and linear algebra as the minor class.
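A three-level classification of this kind can be held in a simple nested structure. The sketch below is illustrative only: `build_hierarchy` is a hypothetical helper, and the only record used is the mathematics example given above, listed twice to mimic the same class arriving from two source classification systems.

```python
def build_hierarchy(records):
    """Nest (major, middle, minor) tuples into {major: {middle: [minor, ...]}}.

    Duplicate minor classes, such as the same class arriving from two source
    classification systems, are kept only once.
    """
    tree = {}
    for major, middle, minor in records:
        minors = tree.setdefault(major, {}).setdefault(middle, [])
        if minor not in minors:
            minors.append(minor)
    return tree

records = [
    ("Mathematics", "Algebra", "Linear algebra"),  # e.g. from one source system
    ("Mathematics", "Algebra", "Linear algebra"),  # same class from another source
]
print(build_hierarchy(records))
# {'Mathematics': {'Algebra': ['Linear algebra']}}
```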
The operating computer 100 uses the constructed science and technology word similarity model to calculate the similarities between the classes of the top-level science and technology R&D classification system, and thereby builds the reference science and technology knowledge information similarity network. As a first step, the similarities between the minor classes of the classification system are calculated by way of the word similarity model; the similarities are then fine-tuned using the major and middle classes to complete the network. Here, the minor classes of the classification system serve as the nodes of the network, and the similarities serve as the relationships between them.
In summary, the similarity at each level of the top-level science and technology R&D classification system is calculated by way of the word similarity model, and the similarities are then recalculated according to weights to build the knowledge information similarity network.
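One way to picture the fine-tuning step is as a bonus added to the word-model similarity when two minor classes share a major or middle class. The description does not give the adjustment formula or the weights, so the additive form and the values `w_major` and `w_middle` below are purely assumptions for illustration.

```python
def adjust_similarity(base, a, b, w_major=0.1, w_middle=0.2):
    """Adjust a word-model similarity using shared higher-level classes.

    a and b are (major, middle, minor) tuples and base is a similarity in
    [0, 1]. The additive bonus and its weights are hypothetical.
    """
    if a == b:
        return 1.0
    bonus = 0.0
    if a[0] == b[0]:              # same major class
        bonus += w_major
        if a[1] == b[1]:          # same middle class as well
            bonus += w_middle
    return min(1.0, base + bonus)  # keep the result within [0, 1]

linear_algebra = ("Mathematics", "Algebra", "Linear algebra")
group_theory = ("Mathematics", "Algebra", "Group theory")  # hypothetical entry
genetics = ("Biology", "Genetics", "Genomics")             # hypothetical entry

print(adjust_similarity(0.5, linear_algebra, group_theory))  # raised by both bonuses
print(adjust_similarity(0.5, linear_algebra, genetics))      # unchanged: 0.5
```

Clipping at 1.0 keeps the adjusted value comparable to the diagonal entries of the similarity table, where a class is fully similar to itself.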
The constructed reference science and technology knowledge information similarity network can be managed in the form shown in Table 1 below.
Table 1
           Class A   Class B   Class C   Class D
Class A    1         0.2       0.7       0.5
Class B    0.2       1         0.4       0.1
Class C    0.7       0.4       1         0.9
Class D    0.5       0.1       0.9       1
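The values of Table 1 form a symmetric matrix, which can be read as a weighted graph whose nodes are the classes. The sketch below stores the table and lists the classes near a given class; the `neighbors` helper and the threshold of 0.5 are assumptions for illustration, not part of the description.

```python
LABELS = ["A", "B", "C", "D"]
SIM = [  # values taken from Table 1
    [1.0, 0.2, 0.7, 0.5],
    [0.2, 1.0, 0.4, 0.1],
    [0.7, 0.4, 1.0, 0.9],
    [0.5, 0.1, 0.9, 1.0],
]

def neighbors(label, threshold=0.5):
    """Return (class, similarity) pairs at or above the threshold, most similar first."""
    i = LABELS.index(label)
    pairs = [
        (LABELS[j], SIM[i][j])
        for j in range(len(LABELS))
        if j != i and SIM[i][j] >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(neighbors("C"))  # [('D', 0.9), ('A', 0.7)]
print(neighbors("B"))  # []
```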
The operating computer 100 builds the science and technology knowledge information similarity network by adding science and technology knowledge information from various fields to the constructed reference similarity network.
In an embodiment of this construction, the science and technology knowledge information to be used includes patent information, thesis information, and member information such as industry classification, field of interest, field of specialization, and university major.
With regard to the patent information used to build the science and technology knowledge information similarity network, a patent is added to the reference similarity network using its International Patent Classification (IPC) and the keywords of the invention. The node to which a particular patented invention belongs can be determined by using the science-and-technology word similarity model to calculate the similarity between each node and the sentences or word set that describe the IPC information. Naturally, one patented invention may belong to a plurality of nodes.
Specifically, TF-IDF (Term Frequency - Inverse Document Frequency) is used to filter out overly frequent, meaningless words, and stopwords are processed; keywords are then extracted using an in-house enhanced Text-Rank technique; the similarity between each keyword and each node is calculated using the science-and-technology word similarity model; and the calculated similarity is normalized, so that it determines the depth of the keyword within the node.
TF-IDF is a weight used in information retrieval and text mining. It is a statistical value that indicates how important a word is within a particular document, given a corpus made up of many documents.
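A minimal sketch of the TF-IDF weight described above, assuming the common definition tf = (term count / document length) and idf = log(N / document frequency). The text does not state which TF-IDF variant is used, and the sample documents and the zero-weight cut-off are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute a TF-IDF weight for every term of every tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["battery", "electrode", "system", "method"],
    ["antenna", "signal", "system", "method"],
    ["protein", "enzyme", "system", "method"],
]
weights = tf_idf(docs)
# Terms that occur in every document get idf = log(1) = 0 and are dropped.
keep = [[t for t in doc if weights[i][t] > 0] for i, doc in enumerate(docs)]
print(keep[0])  # ['battery', 'electrode']
```

Words such as "system" and "method" appear in every sample document, so their weight is zero and they are filtered out, which matches the stated purpose of removing overly frequent, meaningless words.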
The Text-Rank technique (algorithm) is derived from the PageRank algorithm. It is known as a technique that summarizes a single document by comparing similarities and assigning weights to sentences and words, but it has a strong tendency to extract words and sentences that occur with high frequency.
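The following is a sketch of a plain, textbook-style TextRank for keyword ranking: words that co-occur within a small window are linked, and PageRank-style updates are run on the resulting undirected graph. It is not the in-house enhanced variant mentioned in the text, whose details are not disclosed; the window size, damping factor, iteration count, and sample tokens are assumptions.

```python
def text_rank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words with PageRank-style updates on a co-occurrence graph."""
    graph = {token: set() for token in tokens}
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words that follow it.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    rank = {word: 1.0 for word in graph}
    for _ in range(iterations):
        rank = {
            word: (1 - damping)
            + damping * sum(rank[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return sorted(rank, key=rank.get, reverse=True)


tokens = ["graph", "neural", "network", "graph", "embedding", "graph", "similarity"]
print(text_rank(tokens)[0])  # graph
```

The word "graph" co-occurs with every other word in the sample, so it receives the highest rank; this also illustrates the remark above that the technique tends to favor frequent words.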
With regard to the paper information used to build the science and technology knowledge information similarity network, a paper is added to the reference similarity network using its subject classification and keywords. The node to which a particular paper belongs can be determined by using the science-and-technology word similarity model to calculate the similarity between the paper's subject classification and each node. Naturally, one paper may belong to a plurality of nodes.
Specifically, TF-IDF (Term Frequency - Inverse Document Frequency) is used to filter out overly frequent, meaningless words, and stopwords are processed; keywords are then extracted using an in-house enhanced Text-Rank technique; the similarity between each keyword and each node is calculated using the science-and-technology word similarity model; and the calculated similarity is normalized, so that it determines the depth of the keyword within the node.
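The assignment of a patent or paper to one or more nodes can be sketched as follows. The two-dimensional word vectors are invented stand-ins for the trained word similarity model, and the averaging, sum-to-one normalization, and `threshold` are assumptions; the text does not specify how the normalized similarity is mapped to a depth within a node.

```python
import math

# Hypothetical 2-D word vectors standing in for the trained similarity model.
WORD_VECTORS = {
    "battery": (0.9, 0.1),
    "electrode": (0.8, 0.3),
    "energy": (1.0, 0.0),
    "biology": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def node_scores(keywords, nodes):
    """Average keyword-to-node similarity per node, normalized to sum to 1."""
    raw = {
        node: sum(cosine(WORD_VECTORS[k], WORD_VECTORS[node]) for k in keywords)
        / len(keywords)
        for node in nodes
    }
    total = sum(raw.values())
    return {node: score / total for node, score in raw.items()} if total else raw


def assign_nodes(keywords, nodes, threshold=0.3):
    """An item may belong to several nodes: keep each node at or above the threshold."""
    return [n for n, s in node_scores(keywords, nodes).items() if s >= threshold]


scores = node_scores(["battery", "electrode"], ["energy", "biology"])
print(assign_nodes(["battery", "electrode"], ["energy", "biology"]))  # ['energy']
```

Lowering `threshold` lets the same item fall under both nodes, which corresponds to the statement that one patent or paper may belong to a plurality of nodes.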
With regard to the users' university major information used to build the science and technology knowledge information similarity network, the information may be added to the reference similarity network through a process of simplifying and reclassifying university majors into representative classes, together with department classification data.
The process of simplifying and reclassifying university majors into representative classes takes into account that majors with the same content may have different names because universities word them differently. The majors are simplified and reclassified into representative classes to obtain consistency, and the node to which each major belongs can be determined using the department (major) classification sourcebook provided by the Ministry of Education.
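One simple way to picture the reclassification is a lookup table from name variants to a representative major, followed by a second lookup from the representative major to a node. Both tables below are hypothetical examples; real entries would come from the Ministry of Education classification data mentioned above, and the string clean-up rule is an assumption.

```python
# Hypothetical alias table mapping name variants to a representative major.
REPRESENTATIVE_MAJOR = {
    "computer engineering": "computer science",
    "computer science and engineering": "computer science",
    "software": "computer science",
    "electronic engineering": "electrical engineering",
}

# Hypothetical mapping from representative majors to network nodes.
MAJOR_TO_NODE = {
    "computer science": "Class A",
    "electrical engineering": "Class B",
}


def normalize_major(name):
    """Lower-case, strip a 'department of' prefix, and map to a representative major."""
    key = " ".join(name.lower().replace("department of", "").split())
    return REPRESENTATIVE_MAJOR.get(key, key)


def node_for_major(name):
    """Return the node for a major, or None when the major is not in the table."""
    return MAJOR_TO_NODE.get(normalize_major(name))


print(node_for_major("Department of Computer Engineering"))  # Class A
```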
With regard to the industry classification information selected by the user and used to build the science and technology knowledge information similarity network, the information is added to the reference similarity network using the industry classification code, and the similarity between each node and the sentences or word set describing the industry classification code can be calculated using the science-and-technology word similarity model. Naturally, one industry classification may belong to a plurality of nodes.
With regard to the users' fields of interest and fields of expertise used to build the science and technology knowledge information similarity network, interests and fields of expertise are a large volume of data in noun form. They are added to the reference similarity network using the science-and-technology word similarity model, and the similarity between each node and the words related to the interest or field of expertise can be calculated using the same model. Naturally, one interest or field of expertise may belong to a plurality of nodes.
In addition, the operating computer (100) manages the nodes of members within the constructed science and technology knowledge information similarity network. A member has a score for each node and may belong to the node with the highest score. The per-node score management may be configured as follows: an initial node is set based on the basic member information entered when the user signs up for the science and technology knowledge information recommendation system of the present invention; points are added to the nodes corresponding to the university major, industry classification, fields of interest, and fields of expertise selected by the user; weights are assigned to member activities in the intellectual property recommendation system of the present invention, such as viewing, searching, and scrapping; data such as viewing time are additionally collected using cookies; and, when the member uses the service of the recommendation system, a score is calculated from the weights and the data collected through cookies and is added to the node corresponding to the usage history.
In addition, when a member's information (university major, industry, interests, and fields of expertise) is modified, the member's node scores may be calculated by adding to or subtracting from the scores of the corresponding nodes.
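The score bookkeeping described above might look like the sketch below. The point values, activity weights, and the conversion of viewing time into points are all assumed for illustration, since the text names the inputs but not their magnitudes.

```python
from collections import defaultdict

# Hypothetical values; the text names these inputs but not their magnitudes.
PROFILE_POINTS = 10.0
ACTIVITY_WEIGHTS = {"view": 1.0, "search": 2.0, "scrap": 3.0}
SECONDS_PER_POINT = 60.0


class MemberScores:
    """Per-node scores for one member."""

    def __init__(self, profile_nodes):
        # Initial scores come from the nodes matched at sign-up.
        self.scores = defaultdict(float)
        for node in profile_nodes:
            self.scores[node] += PROFILE_POINTS

    def record_activity(self, node, activity, view_seconds=0.0):
        """Add the activity weight plus a bonus for cookie-measured viewing time."""
        self.scores[node] += ACTIVITY_WEIGHTS[activity] + view_seconds / SECONDS_PER_POINT

    def update_profile(self, removed_nodes, added_nodes):
        """Subtract points for nodes dropped from the profile and add them for new ones."""
        for node in removed_nodes:
            self.scores[node] -= PROFILE_POINTS
        for node in added_nodes:
            self.scores[node] += PROFILE_POINTS

    def home_node(self):
        """The member belongs to the node with the highest score."""
        return max(self.scores, key=self.scores.get)


member = MemberScores(["A", "B"])
member.record_activity("B", "scrap", view_seconds=120)
print(member.home_node())  # B
```

Calling `member.update_profile(removed_nodes=["B"], added_nodes=["C"])` afterwards would lower the score of node B and raise that of node C, mirroring the adjustment on profile changes.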
The operating computer recommends science and technology knowledge information based on the constructed science and technology knowledge information similarity network and the member node scores it manages.
In the recommendation process, the score differences between the node to which the member belongs and the other nodes are calculated, and the number and depth of the other nodes to be used are determined according to those differences. In this case, the larger the score difference, the fewer nodes may be selected, and the larger the difference, the greater the depth may be. The science and technology knowledge information extracted according to the similarity network and the member's node scores is then filtered using conditions such as year, number of citations, and whether the member has already viewed the item, and the filtered information is recommended.
In summary, the recommendation may proceed as follows: the score differences between the member's node and the other nodes are calculated from the member information storage unit, which stores the member's per-node scores; the number of nodes to extract and the node depth to use are both determined according to the calculated differences; the extracted science and technology knowledge information is filtered using various conditions; and the filtered information is recommended. The recommended science and technology knowledge information may include patent information and paper information.
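The recommendation flow can be sketched as below. The text fixes only the direction of the rules (a larger score difference means fewer nodes and greater depth); the `step` size, the limits, the use of the gap to the runner-up node as "the difference", the choice of other nodes by member score, and the concrete filter thresholds are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    node: str
    depth: int       # position of the item inside its node (1 = shallowest)
    year: int
    citations: int


def plan(scores, max_nodes=3, max_depth=3, step=5.0):
    """Map the score gap to (home node, other nodes to use, depth to use)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    home = ranked[0]
    gap = scores[home] - scores[ranked[1]] if len(ranked) > 1 else 0.0
    level = int(gap // step)
    n_others = max(1, max_nodes - level)   # larger gap -> fewer other nodes
    depth = min(max_depth, 1 + level)      # larger gap -> greater depth
    return home, ranked[1 : 1 + n_others], depth


def recommend(items, scores, viewed, min_year=2015, min_citations=1):
    """Extract items from the planned nodes and depth, then filter them."""
    home, others, depth = plan(scores)
    chosen = {home, *others}
    return [
        item.title
        for item in items
        if item.node in chosen
        and item.depth <= depth
        and item.year >= min_year
        and item.citations >= min_citations
        and item.title not in viewed
    ]


scores = {"A": 20.0, "B": 8.0, "C": 6.0, "D": 1.0}
items = [
    Item("Patent P1", "A", 1, 2020, 5),
    Item("Paper Q1", "B", 3, 2019, 2),
    Item("Paper Q2", "C", 1, 2021, 9),
    Item("Patent P2", "A", 2, 2010, 7),
    Item("Patent P3", "A", 1, 2022, 4),
]
print(recommend(items, scores, viewed={"Patent P3"}))  # ['Patent P1', 'Paper Q1']
```

In the sample, the member's lead on node A is large, so only one other node (B) is consulted but at full depth; Paper Q2 is dropped because node C is not selected, Patent P2 because of its year, and Patent P3 because the member has already viewed it.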
FIG. 3 is a flowchart illustrating an embodiment of the method for recommending science and technology knowledge information according to the present invention. As shown in FIG. 3, the method includes: a step (S100) in which the operating computer builds a word similarity model of science and technology information from a large volume of science and technology documents through text data preprocessing and artificial neural network training; a step (S110) in which the operating computer builds a top-level science and technology research and development (R&D) classification system capable of linking science and technology knowledge information from heterogeneous fields; a step (S120) in which the operating computer builds a reference science and technology knowledge information similarity network by calculating the similarity between the classes of the top-level science and technology R&D classification system using the word similarity model; a step (S130) in which the operating computer uses the constructed reference similarity network to build a science and technology knowledge information similarity network to which science and technology knowledge information has been added; a step (S140) in which the operating computer calculates the initial similarity of a member within the constructed similarity network using member information entered when the user signs up, including industry classification, field of interest, field of expertise, and university major; and a step (S150) in which the operating computer recommends science and technology knowledge information around the member within the similarity network.
The method may further include a step in which the operating computer resets the member's similarity within the science and technology knowledge information similarity network using the member's usage information for science and technology knowledge information.
FIG. 4 is a flowchart illustrating a main part of the method for recommending science and technology knowledge information according to the present invention. As shown in FIG. 4, the step of building the word similarity model (S100) includes: a step (S101) of extracting the main body text, excluding unnecessary paragraphs, from a large volume of science and technology documents; a step (S102) of extracting only nouns from the extracted body text using a morphological analysis technique; a step (S103) of performing stopword processing on the extracted words; and a step (S104) of building a science-and-technology word similarity model through artificial neural network or machine learning training based on the stopword-processed words.
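Steps S103 and S104 can be pictured with the sketch below. It assumes that steps S101 and S102 have already produced lists of nouns, and it substitutes simple co-occurrence count vectors with cosine similarity for the artificial neural network training named in the text, purely to keep the example self-contained. The frequency and length thresholds and the sample sentences are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


def remove_stopwords(sentences, min_count=2, min_length=2):
    """S103 (sketch): drop rare words and very short words."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return [
        [w for w in sentence if counts[w] >= min_count and len(w) >= min_length]
        for sentence in sentences
    ]


def cooccurrence_vectors(sentences):
    """S104 (stand-in): represent each word by the words it shares sentences with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for word in sentence:
            for other in sentence:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def similarity(vectors, a, b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = vectors[a], vectors[b]
    dot = sum(va[k] * vb[k] for k in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


# Noun lists as they might look after S101 and S102 (invented sample data).
nouns = [
    ["lithium", "battery", "electrode", "x"],
    ["sodium", "battery", "electrode"],
    ["lithium", "battery", "cathode"],
    ["sodium", "battery", "cathode"],
    ["protein", "enzyme", "cell"],
    ["protein", "enzyme", "cell"],
]
vectors = cooccurrence_vectors(remove_stopwords(nouns))
print(similarity(vectors, "lithium", "sodium") > similarity(vectors, "lithium", "protein"))  # True
```

In the sample, "lithium" and "sodium" appear in the same contexts and come out as highly similar, while "lithium" and "protein" share no context; a trained embedding model would be expected to capture the same kind of relationship with far better coverage.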
FIG. 5 is a flowchart illustrating a main part of the method for recommending science and technology knowledge information according to the present invention. As shown in FIG. 5, the step of recommending science and technology knowledge information (S150) includes: a step (S151) of calculating the score differences between the member's node and the other nodes from the member information storage unit, which stores the member's per-node scores; a step (S152) of determining the number of nodes to extract according to the calculated score differences; a step (S153) of determining the node depth to use according to the calculated score differences; a step (S154) of filtering the extracted science and technology knowledge information using various conditions; and a step (S155) of recommending science and technology knowledge information that includes the filtered patent and paper information.
The embodiments of the present invention described above are only some of many possible embodiments. It goes without saying that the scope of protection of the present invention covers the various embodiments that fall within the technical idea in which the operating computer builds a word similarity model of science and technology information from a large volume of science and technology documents through text data preprocessing and artificial neural network training, builds a top-level science and technology R&D classification system based on international or domestic science and technology classification systems, builds a reference similarity network based on the similarity between the classes of the constructed top-level classification system, uses the constructed reference similarity network to build a science and technology knowledge information similarity network to which science and technology knowledge information including patent and paper information has been added, and recommends to the user science and technology knowledge information, including patents and papers, around the member within the constructed similarity network.
The present invention can be used in the science-and-technology knowledge information data industry.

Claims (10)

  1. A system comprising an operating computer that collects science and technology knowledge information and recommends it to users,
    wherein the operating computer comprises at least one hardware processor and a memory storing a program, and the at least one hardware processor controls execution of the program stored in the memory so as to:
    build a word similarity model of science and technology information from a large volume of science and technology documents through text data preprocessing and artificial neural network training;
    build a top-level science and technology R&D classification system based on international or domestic science and technology classification systems;
    build a reference similarity network based on the similarity between the classes of the constructed top-level science and technology R&D classification system;
    build, using the constructed reference similarity network, a science and technology knowledge information similarity network to which science and technology knowledge information including patent and paper information has been added; and
    recommend science and technology knowledge information, including patents and papers, around a member within the constructed science and technology knowledge information similarity network.
  2. The system according to claim 1,
    wherein building the word similarity model of science and technology information comprises:
    calculating the similarity of sentences and words within each science and technology document, based on a large volume of science and technology document files collected by the system itself or transmitted from outside; weighting the sentences and words through similarity comparison to extract the main body text while excluding unnecessary paragraphs; extracting only nouns from the extracted body text using a morphological analysis technique; performing stopword processing that deletes words unnecessary for expressing the characteristics of a sentence or document, such as words with a low frequency of occurrence and short words; constructing training data for model training by computing the meaning between the stopword-processed words as specific vector values; and building a similarity model between the extracted words by training a model to which an artificial neural network or an unsupervised learning algorithm, a type of machine learning, is applied.
  3. The system according to claim 1,
    wherein building the top-level science and technology R&D classification system comprises:
    calculating the similarity at each level of the top-level science and technology R&D classification system via the science-and-technology word similarity model, and then recalculating the similarity according to weights to build the knowledge information similarity network.
  4. The system according to claim 1,
    wherein building the reference similarity network comprises:
    calculating the similarity at each level of the top-level science and technology R&D classification system via the science-and-technology word similarity model, and then recalculating the similarity according to weights to build the knowledge information similarity network.
  5. The system according to claim 1,
    wherein building the science and technology knowledge information similarity network comprises:
    using science and technology knowledge information such as patent information, paper information, and members' industry classifications, fields of interest, fields of expertise, and university majors.
  6. The system according to claim 1,
    wherein recommending the science and technology knowledge information comprises:
    calculating the score differences between the member's node and the other nodes from a member information storage unit that stores the member's per-node scores; determining the number of nodes to extract according to the calculated score differences; determining the node depth to use according to the calculated score differences; filtering the extracted science and technology knowledge information using various conditions; and recommending science and technology knowledge information that includes the filtered patent and paper information.
  7. A method for recommending science and technology knowledge information, comprising: a step in which an operating computer builds a word similarity model of science and technology information from a large volume of science and technology documents through text data preprocessing and artificial neural network training; a step in which the operating computer builds a top-level science and technology research and development (R&D) classification system capable of linking science and technology knowledge information from heterogeneous fields; a step in which the operating computer builds a reference science and technology knowledge information similarity network by calculating the similarity between the classes of the top-level science and technology R&D classification system using the word similarity model; a step in which the operating computer uses the constructed reference similarity network to build a science and technology knowledge information similarity network to which science and technology knowledge information has been added; a step in which the operating computer calculates the initial similarity of a member within the constructed similarity network using member information entered when the user signs up, including industry classification, field of interest, field of expertise, and university major; and a step in which the operating computer recommends science and technology knowledge information around the member within the similarity network.
  8. The method according to claim 7,
    further comprising a step in which the operating computer resets the member's similarity within the science and technology knowledge information similarity network using the member's usage information for science and technology knowledge information.
  9. The method according to claim 7,
    wherein the step of building the word similarity model comprises:
    a step of extracting the main body text, excluding unnecessary paragraphs, from a large volume of science and technology documents; a step of extracting only nouns from the extracted body text using a morphological analysis technique; a step of performing stopword processing on the extracted words; and a step of building a science-and-technology word similarity model through artificial neural network or machine learning training based on the stopword-processed words.
  10. The method according to claim 7,
    wherein the step of recommending the science and technology knowledge information comprises:
    a step of calculating the score differences between the member's node and the other nodes from a member information storage unit that stores the member's per-node scores; a step of determining the number of nodes to extract according to the calculated score differences; a step of determining the node depth to use according to the calculated score differences; a step of filtering the extracted science and technology knowledge information using various conditions; and a step of recommending science and technology knowledge information that includes the filtered patent and paper information.
PCT/KR2020/014373 2020-05-20 2020-10-21 System for recommending scientific and technical knowledge information, and method therefor WO2021235617A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0060170 2020-05-20
KR1020200060170A KR102371329B1 (en) 2020-05-20 2020-05-20 Operating computer for recommendation of scientific and technological knowledge information, scientific and technological information recommendation system and method thereof

Publications (1)

Publication Number Publication Date
WO2021235617A1 true WO2021235617A1 (en) 2021-11-25

Family

ID=78697943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/014373 WO2021235617A1 (en) 2020-05-20 2020-10-21 System for recommending scientific and technical knowledge information, and method therefor

Country Status (2)

Country Link
KR (1) KR102371329B1 (en)
WO (1) WO2021235617A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186002A (en) * 2021-12-14 2022-03-15 智博天宫(苏州)人工智能产业研究院有限公司 Scientific and technological achievement data processing and analyzing method and system
CN117114105A (en) * 2023-10-25 2023-11-24 中国科学技术信息研究所 Target object recommendation method and system based on scientific research big data information

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102540417B1 (en) * 2022-03-02 2023-06-05 고려대학교 산학협력단 System and method for integrated recommendation of learning activities based on keywords of interest using academic domain embedding and recording medium for performing the same
KR102543343B1 (en) * 2023-03-07 2023-06-16 주식회사 로이드케이 Method and device for generating search word dictionary and searching based on artificial neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100125682A (en) * 2009-05-21 2010-12-01 주식회사 아이네크 Semantic search method and system for associating with plurality of classifications
KR101122436B1 (en) * 2010-09-30 2012-03-09 엔에이치엔(주) Method and apparatus for extracting keywords from a page based on relevance scores of terms and graph structure
KR102059309B1 (en) * 2019-11-04 2020-02-11 윤선희 Method and server for adaptive paper search using machine learning
KR20200017575A (en) * 2018-07-24 2020-02-19 배재대학교 산학협력단 Similar patent search service system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190115505A (en) 2018-03-15 2019-10-14 특허법인 해담 Method for deriving a follow-up items considering firms' existing technologies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONGHUN LEE, KWANHO KIM: "Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec.", THE JOURNAL OF SOCIETY FOR E-BUSINESS STUDIES, vol. 23, no. 2, 1 May 2018 (2018-05-01), pages 83-96, XP055870824, ISSN: 2288-3908, DOI: 10.7838/jsebs.2018.23.2.083 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186002A (en) * 2021-12-14 2022-03-15 智博天宫(苏州)人工智能产业研究院有限公司 Scientific and technological achievement data processing and analyzing method and system
CN117114105A (en) * 2023-10-25 2023-11-24 中国科学技术信息研究所 Target object recommendation method and system based on scientific research big data information
CN117114105B (en) * 2023-10-25 2024-01-30 中国科学技术信息研究所 Target object recommendation method and system based on scientific research big data information

Also Published As

Publication number Publication date
KR20210143431A (en) 2021-11-29
KR102371329B1 (en) 2022-03-07

Similar Documents

Publication Publication Date Title
WO2021235617A1 (en) System for recommending scientific and technical knowledge information, and method therefor
Sigurbergsson et al. Offensive language and hate speech detection for Danish
US10846274B2 (en) Ontological subjects of a universe and knowledge representations thereof
Bharti et al. Sarcastic sentiment detection in tweets streamed in real time: a big data approach
Lambrix et al. SAMBO—a system for aligning and merging biomedical ontologies
US8429167B2 (en) User-context-based search engine
US20170262783A1 (en) Team Formation
Tsui et al. A concept–relationship acquisition and inference approach for hierarchical taxonomy construction from tags
WO2010134752A2 (en) Semantic search method and system in which a plurality of classification systems are linked
CN109214454B (en) Microblog-oriented emotion community classification method
WO2011122730A1 (en) System and method for a related search service based on an rdf network
Pasupa et al. Hybrid deep learning models for Thai sentiment analysis
CA2789052A1 (en) Methods and systems for investigation of compositions of ontological subjects
Lytvyn et al. Analysis of statistical methods for stable combinations determination of keywords identification
Dorji et al. Extraction, selection and ranking of Field Association (FA) Terms from domain-specific corpora for building a comprehensive FA terms dictionary
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
Gharavi et al. Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase
Nasser et al. n-Gram based language processing using Twitter dataset to identify COVID-19 patients
EP2613275B1 (en) Search device, search method, search program, and computer-readable memory medium for recording search program
WO2012057563A2 (en) Emotion-based community-forming system, communication terminal capable of forming a community, and community-forming method therefor
Hashemzadeh et al. Improving keyword extraction in multilingual texts.
Tan et al. Alignment of biomedical ontologies using life science literature
KR102454261B1 (en) Collaborative partner recommendation system and method based on user information
KR102540944B1 (en) Digital content system supporting document management using meta data and integrated search based on artificial intelligent
KR20210071501A (en) Method for providing internet search service sorted by correlation based priority specialized in professional areas

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20936531

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20936531

Country of ref document: EP

Kind code of ref document: A1