WO2021107447A1 - Document classification method for marketing knowledge graph, and apparatus therefor - Google Patents


Info

Publication number
WO2021107447A1
WO2021107447A1 PCT/KR2020/015583 KR2020015583W WO2021107447A1 WO 2021107447 A1 WO2021107447 A1 WO 2021107447A1 KR 2020015583 W KR2020015583 W KR 2020015583W WO 2021107447 A1 WO2021107447 A1 WO 2021107447A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
marketing
classification
data
classification system
Prior art date
Application number
PCT/KR2020/015583
Other languages
English (en)
Korean (ko)
Inventor
이진형
장원홍
윤동준
Original Assignee
주식회사 데이터마케팅코리아
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020190152515A (KR20210063880A)
Priority claimed from KR1020190152516A (KR20210063881A)
Application filed by 주식회사 데이터마케팅코리아
Publication of WO2021107447A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention relates to a document classification processing method and an apparatus therefor. More specifically, the present invention relates to a document classification processing method and apparatus for a marketing knowledge graph.
  • the present invention has been devised to solve the above problems. An object of the present invention is to provide a method and apparatus for an efficient marketing analysis service that supports low-cost, high-efficiency marketing decision-making by providing a marketing information analysis service based on a marketing-specialized knowledge graph model, which is built through artificial intelligence technology from analysis information for each marketing channel.
  • in order to provide such marketing decision-making and analysis results effectively, another object of the present invention is to provide a document classification processing method and apparatus that classify documents effectively and accurately through a hybrid document classification processing method for a marketing knowledge graph, and that improve performance by filtering so that only necessary documents remain.
  • an apparatus for solving the above-described problems is a document classification processing apparatus including: a classification system definition unit that defines classification system information by analyzing a plurality of marketing document data for constructing a marketing-specialized knowledge graph; a rule-based filtering unit that performs primary, rule-based filtering on the plurality of marketing document data according to the defined classification system information, thereby removing garbage data a first time; and a machine learning filtering unit that removes garbage data a second time by applying machine learning filtering, based on a pre-trained model, to the marketing document data remaining after the primary filtering.
  • a document classification processing method for solving the above-described problems includes: defining classification system information by analyzing a plurality of marketing document data for constructing a marketing-specialized knowledge graph; a rule-based filtering step of removing garbage data a first time by performing rule-based primary filtering on the plurality of marketing document data according to the defined classification system information; and a machine learning filtering step of removing garbage data a second time by applying machine learning filtering, based on a pre-trained model, to the marketing document data remaining after the primary filtering.
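The two-stage filtering described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patent text does not specify the rules, the model type, or any names, so `GARBAGE_PATTERNS`, `KeywordScoreModel`, and `hybrid_filter` are hypothetical, and the token-counting model is only a toy stand-in for a pre-trained classifier.

```python
import re
from collections import Counter

# Hypothetical rules: patterns that mark a document as garbage (e.g. spam, ads).
GARBAGE_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"click here", r"free coupon")]


def rule_based_filter(documents):
    """Primary filtering: drop every document that matches a garbage rule."""
    return [d for d in documents if not any(p.search(d) for p in GARBAGE_PATTERNS)]


class KeywordScoreModel:
    """Toy stand-in for a pre-trained model: counts tokens seen in labeled examples."""

    def __init__(self, labeled_examples):
        self.garbage_counts = Counter()
        self.valid_counts = Counter()
        for text, is_garbage in labeled_examples:
            target = self.garbage_counts if is_garbage else self.valid_counts
            target.update(text.lower().split())

    def is_garbage(self, text):
        tokens = text.lower().split()
        garbage_score = sum(self.garbage_counts[t] for t in tokens)
        valid_score = sum(self.valid_counts[t] for t in tokens)
        return garbage_score > valid_score


def hybrid_filter(documents, model):
    """Rule-based primary filtering, then model-based secondary filtering."""
    remaining = rule_based_filter(documents)
    return [d for d in remaining if not model.is_garbage(d)]
```

Running the cheap rule-based stage first means the model only has to score the documents that survive it, which is one plausible reason for ordering the stages this way.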
  • the method according to an embodiment of the present invention for solving the above problems may be implemented as a program for executing the method in a computer and a recording medium in which the program is recorded.
  • documents can be classified effectively and accurately through a hybrid document classification processing method for a marketing knowledge graph, and it is possible to provide a document classification processing method and apparatus capable of improving performance by filtering so that only necessary documents remain.
  • FIG. 1 is a block diagram schematically showing an entire system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating in more detail an apparatus for providing a marketing service according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating an operation of an apparatus for providing a marketing service according to an embodiment of the present invention.
  • FIG. 4 is a block diagram for explaining in more detail a knowledge graph construction module according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating an operation of a knowledge graph building module according to an embodiment of the present invention.
  • FIG. 6 is a relationship diagram for explaining a knowledge graph construction and semantic mapping process according to an embodiment of the present invention.
  • FIG. 7 is a block diagram illustrating a document classification module according to an embodiment of the present invention in more detail.
  • FIG. 8 is a flowchart illustrating a document classification processing process according to an embodiment of the present invention.
  • FIG. 9 is a diagram illustrating a classification system definition according to an embodiment of the present invention.
  • block diagrams herein are to be understood as representing conceptual views of illustrative circuitry embodying the principles of the present invention.
  • all flowcharts, state transition diagrams, pseudo code, etc. may be tangibly embodied on a computer-readable medium and be understood to represent various processes performed by a computer or processor, whether or not a computer or processor is explicitly shown.
  • processors may be provided by the use of dedicated hardware as well as hardware having the ability to execute software in association with appropriate software.
  • the functionality may be provided by a single dedicated processor, a single shared processor, or a plurality of separate processors, some of which may be shared.
  • such hardware may include, without limitation, a digital signal processor (DSP), read-only memory (ROM), random access memory (RAM), and non-volatile memory. Other common hardware may also be included.
  • a component expressed as a means for performing a function described in the detailed description is intended to include every way of performing that function, including, for example, a combination of circuit elements that performs the function, or software in any form, including firmware/microcode, combined with suitable circuitry for executing that software to perform the function. Since the present invention defined by these claims combines the functions provided by the various enumerated means in the manner required by the claims, any means capable of providing those functions should be understood as equivalent to those contemplated in the present specification.
  • FIG. 1 is a conceptual diagram schematically illustrating an entire system according to an embodiment of the present invention.
  • the entire system includes a marketing information analysis service providing apparatus 100, and a marketing platform 200 and one or more user terminals 300 connected to it through one or more mutually distinct channels. The marketing information analysis service providing apparatus 100 may be connected to the machine learning module 400 or may include the machine learning module 400.
  • the marketing information analysis service providing apparatus 100 may be connected to each platform 200 and the user terminal 300 through a wired/wireless network to provide a marketing information analysis service. To analyze marketing information based on learning and artificial intelligence, it may be connected to the machine learning module 400 or include the machine learning module 400. Devices or terminals connected to each network may communicate with one another through a preset network channel.
  • each network may be implemented as any type of wired/wireless network, such as a local area network (LAN), a wide area network (WAN), a value added network (VAN), a personal area network (PAN), a mobile radiocommunication network, or a satellite communication network.
  • the user terminal 300 may include various server devices, network devices, or terminal devices that access the marketing information analysis service providing apparatus 100 for the purpose of receiving a marketing analysis service for marketing decision making.
  • the user terminals 300 may be connected to the marketing information analysis service providing apparatus 100 through an individual security network, and the marketing information analysis service providing apparatus 100 may be connected to each user terminal 300 through each security network.
  • the security network may be an encryption network.
  • the service-registered user terminal 300 may store in advance decryption key information issued according to company authentication, and may decrypt the marketing analysis result information received from the marketing information analysis service providing apparatus 100 with that decryption key information and output it.
  • the user terminals 300 may have completed the basic information registration process corresponding to the marketing information analysis service providing apparatus 100 .
  • the user terminal 300 may be a terminal that is provided with a marketing information analysis service as a member of each company.
  • it may be a terminal of a company that directly makes marketing decisions, a terminal of a company that provides marketing services in partnership with a plurality of companies, or a terminal of a network service company that mediates data between a plurality of networks.
  • the marketing information analysis service providing apparatus 100 receives company information from each user terminal 300 and collects marketing document data through marketing network channels classified in advance based on the received company information. It has the machine learning module 400 learn the unstructured data in the document data, and creates a marketing-specialized knowledge graph model using the resulting learning information, knowledge graph information and ontology information built in advance, and structured data collected and analyzed in advance.
  • the marketing information analysis service providing apparatus 100 may perform marketing market trend and demand prediction analysis using the marketing-specialized knowledge graph model, and may provide marketing analysis information based on the result to the user terminal 300.
  • the marketing-specialized knowledge graph may be constructed by semantic mapping processing of pre-established knowledge graph model information according to a natural language analysis result of company information and marketing document collection information. For such semantic mapping processing, the marketing information analysis service providing apparatus 100 can collect, store and manage in advance the dictionary information required for natural language processing and text analysis, and the ontology information for constructing a classification system.
  • the marketing information analysis service providing apparatus 100 collects and sets a dictionary and a classification system specialized for marketing in advance, and may perform semantic mapping onto the pre-established knowledge graph through natural language analysis-based learning of the marketing document information collected for each marketing channel in response to the company information. The semantically mapped marketing-specialized knowledge graph is therefore specialized for marketing, includes the latest information and rich synonym information, and can include rich context and association (relationship) information.
  • Such a marketing-specialized knowledge graph can include relationship information between keywords, can be used for various solutions such as marketing trend analysis and future predictive analysis, and can be used to individually create a subdivided dictionary and classification system for each marketing field.
  • the marketing-specialized knowledge graph is a graph-based data model including relationship information between knowledge keywords by setting keyword information, which is a marketing entity, as a node, and representing the relationship between each node as an edge.
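As a minimal sketch of the node-and-edge model just described, keywords can be stored as nodes and labeled relationships as edges. The class name `KnowledgeGraph`, its methods, and the relation labels are assumptions for illustration only; the source does not specify a storage structure.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Keywords are nodes; each edge is a (relation, target keyword) pair."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        self.edges[source].add((relation, target))
        self.edges[target]  # touch the key so the target is registered as a node

    def neighbors(self, keyword, relation=None):
        """Return related keywords, optionally restricted to one relation label."""
        return sorted(
            target
            for rel, target in self.edges.get(keyword, ())
            if relation is None or rel == relation
        )


graph = KnowledgeGraph()
graph.add_relation("sneakers", "synonym", "trainers")         # hypothetical data
graph.add_relation("sneakers", "associated_with", "running")  # hypothetical data
print(graph.neighbors("sneakers", relation="synonym"))
```

Labeling each edge keeps synonym information and association information in the same structure while still letting a query ask for one kind of relationship.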
  • a relational data model may be given as an example of a data model, but the marketing information analysis service providing apparatus 100 according to an embodiment of the present invention is based on the recently proposed Semantic Web technology in order to overcome the complexity and performance limitations of the relational data model. On this basis, it can achieve higher efficiency, extend the ways knowledge can be expressed, and address the scalability of data models and interoperability between systems.
  • the dictionary and classification system for text analysis are manually created by experts in a specific field, and as described above, there is a problem of cost increase due to the increase in the amount of data. There is a problem that this falls, and the technology itself, such as the typical Web Ontology Language (OWL), has problems with low model complexity and reusability.
  • the marketing information analysis service providing apparatus 100 provides learning information obtained through machine learning from unstructured data analysis information, together with knowledge extracted from structured data, in order to construct a marketing-specialized knowledge graph. The diversified marketing knowledge data is semantically mapped and processed efficiently, which enables automation while improving accuracy and performance.
  • through the semantic mapping processing of the diversified marketing knowledge data, the marketing information analysis service providing apparatus 100 may provide keyword classification and system information based on a marketing-specialized knowledge graph. This makes it easier to reflect recent issue keywords or new words for marketing purposes, and makes it possible to quickly build and process information on compatibility between languages for marketing purposes (e.g., foreign-language data corresponding to a Korean transliteration).
  • the platform 200 may be a marketing target network platform, and may be connected to the marketing information analysis service providing apparatus 100 through each access channel.
  • Each channel may be, for example, site address information corresponding to a specific platform. The marketing information analysis service providing apparatus 100 collects marketing document data for each platform channel determined from the site address information, and may store and analyze the collected results.
  • the machine learning module 400 used in the analysis may process parallel analysis of structured and unstructured data, and hybrid-type document classification processing for this may be performed in advance.
  • Hybrid document classification processing may include a machine learning-based primary document classification process, and a secondary classification process that classifies the marketing document data using secondary classification information derived, with an ontology dictionary and linguistic rules, from the classification information obtained in the primary document classification process. The classification information from the primary and secondary classification may then be used as re-training information for the machine learning module 400.
  • the marketing information analysis service providing apparatus 100 may provide an analysis information service for effective marketing to the user terminal 300 .
  • the marketing information analysis service providing apparatus 100 may provide a keyword dictionary construction service for market trend analysis, a digital influence quantification service for each keyword, a trend prediction information providing service according to a prediction model, etc. to the user terminal 300 .
  • the marketing information analysis service providing apparatus 100 may analyze text or voice-based request data received from the user terminal 300, and may provide a marketing analysis information service using an artificial intelligence chatbot function.
  • FIG. 2 is a block diagram illustrating in more detail an apparatus for providing a marketing service according to an embodiment of the present invention.
  • the apparatus 100 for providing a marketing information analysis service includes a control unit 110, a communication unit 120, a user management unit 130, a channel-based information collection unit 140, an analysis data processing unit 150, a dashboard configuration unit 160, a service providing unit 170, and a storage unit 190.
  • the control unit 110 generally controls the execution of the operation and function of each component including the marketing document data information collection, analysis data processing, dashboard configuration, and marketing information analysis service provision of the marketing information analysis service providing device 100 .
  • the controller 110 may be implemented as a processor for controlling all or a part of a function of providing an analysis result of information collected from the platforms 200 to the user terminals 300 or a program for executing the same.
  • the communication unit 120 may include one or more communication modules that enable wired/wireless communication between the marketing information analysis service providing apparatus 100 and a wireless communication system including a mobile communication network or Internet network, or between the service providing apparatus 100 and the network in which the platform 200 or the user terminal 300 is located.
  • the communication unit 120 may include a modem that encodes and modulates a transmitted signal and demodulates and decodes a received signal, or an RF front end that processes an RF signal.
  • the user management unit 130 performs user registration and account management for one or more user terminals 300 using the service providing apparatus 100.
  • the user management unit 130 receives, from the user terminal 300, authentication information including at least one of account identification information and terminal identification information of a logged-in person in charge at a company or a marketing service provider, and may process user information registration using the authentication information. Accordingly, the user management unit 130 may register and manage, for each marketing channel, information on the platform 200 for which a marketing service is to be provided or analyzed and information on the corresponding user terminal 300.
  • the channel-based information collection unit 140 collects marketing document data through the data channels connected from the platform 200 corresponding to each user terminal 300 managed by the user management unit 130, and outputs the marketing document data collected for each channel to the analysis data processing unit 150.
  • the marketing document data may form basic analysis information processed by the analysis data processing unit 150 according to an embodiment of the present invention.
  • the marketing document data may include, for example, web page document data collected for each channel from the platform 200 , keyword data collected corresponding to a preset format, or site source code information.
  • the channel-based information collection unit 140 may include: a keyword crawler that collects and stores keywords classified by industry/subject/brand for each platform 200; a collection process manager that stores user request collection processes and allocates a collection process for each channel; a channel-specific collector that accesses the platform 200, performs collection for each channel, and stores the collection result in the storage unit 190; and a collection site source manager that, to guard against collection stopping because of a site source change, periodically compares the site source and reports newly updated information.
  • the channel-based information collection unit 140 may access the platform 200 through a specific channel according to channel information requested from the user terminal 300 or preset in response to the user terminal 300 .
  • the channel-based information collection unit 140 may collect, through the channel-specific collector and over a data channel connected to the platform 200, the marketing document data to be collected according to keyword information received from the user terminal 300 or preset for the user terminal 300.
  • the channel-specific collector of the channel-based information collection unit 140 may store the collected marketing document data in the collection result database of the storage unit 190 .
  • the channel-based information collection unit 140 may identify the channel information of the platforms 200 for each industry/subject/brand corresponding to the classification information requested from the user terminal 300, determine through the channel a collection site suitable for the user terminal 300, and collect and store marketing document data corresponding to preset keyword information from the determined site.
  • the preset keyword information may be obtained from a marketing ontology-based knowledge graph processed by the analysis data processing unit 150 , which will be described in more detail later.
  • the channel-based information collection unit 140 may register and periodically monitor site information of the platform 200 from which marketing document data is collected. When source code update information is generated, it may send an alarm to the user terminal 300 and collect and store the updated data.
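One simple way to detect a site source change between periodic checks is to compare content fingerprints. The sketch below assumes this approach; the patent does not state how changes are detected, and `SiteSourceMonitor` and its `check` method are hypothetical names. Fetching the page and sending the alarm are left out.

```python
import hashlib


class SiteSourceMonitor:
    """Remembers a SHA-256 fingerprint of each site's source between checks."""

    def __init__(self):
        self._fingerprints = {}

    def check(self, site, source_code):
        """Return True if the source differs from the previously seen version."""
        digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
        previous = self._fingerprints.get(site)
        self._fingerprints[site] = digest
        # The first observation only registers the site; it is not a change.
        return previous is not None and previous != digest
```

A caller would invoke `check` on a schedule and, when it returns True, notify the user terminal and trigger re-collection.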
  • the analysis data processing unit 150 may perform document classification processing of the marketing document data collected by the channel-based information collection unit 140, and may generate or construct a marketing-specialized knowledge graph model using the classified document data.
  • the marketing-specialized knowledge graph model includes pre-built keyword-based knowledge graph information, pre-collected ontology information, machine learning-based learning information of the collected and classified document data, and structured data information.
  • the marketing-specialized knowledge graph model may be modular ontology model data. The ontology model data can be designed in layers: a core ontology built from key concepts, relationship information, everyday keywords and emotional keyword information, and a domain ontology built from data obtained through real-time machine learning-based document classification so as to reflect the latest keywords. Interoperability can be secured through semantic web standard technology.
  • the semantic web standard technology may include, for example, a technology for conversion into a standard protocol language corresponding to an ontology description query. Examples of the converted ontology description query format include the RDF (Resource Description Framework) format, the OWL (Web Ontology Language) format, and the SPARQL (SPARQL Protocol and RDF Query Language) format.
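To make the RDF format concrete, the sketch below serializes (subject, predicate, object) triples as N-Triples, one of the standard RDF text serializations. The function name `to_ntriples`, the `http://example.org/` identifiers, and the rule that any object starting with `http://` or `https://` is treated as an IRI are assumptions for illustration; the source does not say which serialization is used.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Subjects and predicates are assumed to be IRIs. An object is written as an
    IRI if it starts with http:// or https://, otherwise as a plain literal.
    """
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            obj_term = f"<{obj}>"
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj_term = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)


# Hypothetical keyword node with a label and one relationship.
print(to_ntriples([
    ("http://example.org/kw/sneakers",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "sneakers"),
    ("http://example.org/kw/sneakers",
     "http://example.org/rel/associatedWith",
     "http://example.org/kw/running"),
]))
```

In practice an RDF library would handle escaping and datatypes; the hand-written version here only shows the shape of the data.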
  • the analysis data processing unit 150 may include a knowledge graph construction module 151 that processes knowledge graph construction, a dictionary construction module 152 corresponding to the domain ontology, and a document classification module 153 that filters and classifies structured and unstructured documents, respectively. Accordingly, the analysis data processing unit 150 may provide various service information based on the marketing ontology by using the generated or constructed marketing-specialized knowledge graph model.
  • the knowledge graph construction module 151 may acquire machine learning-based marketing learning information, and the acquired marketing learning information may be used to build a marketing-specific knowledge graph model.
  • the dashboard configuration unit 160 may configure a marketing analysis dashboard interface to be provided to the user terminal 300. The dashboard may take the form of a graphical user interface (GUI) such as a web interface, and may be output visually or aurally through the user terminal 300.
  • the dashboard configuration unit 160 may configure an artificial intelligence chatbot-based marketing interface dashboard for a user-friendly marketing information analysis service. Through this marketing interface dashboard, it can provide various services requested from the user terminal 300, such as market trend analysis, demand prediction analysis, keyword influence analysis, a new-word keyword dictionary, and product competitiveness analysis.
  • the service providing unit 170 may include a service manager that receives the service request of the user terminal 300 and provides the marketing information analysis service result corresponding to the service request to the user terminal 300 through the dashboard interface configured by the dashboard configuration unit 160.
  • the storage unit 190 includes one or more storage media for storing program information for the operation of the above-described control unit 110 and the operation of the above-described components, and may include one or more databases according to each purpose.
  • FIG. 3 is a flowchart illustrating an operation of an apparatus for providing a marketing service according to an embodiment of the present invention.
  • the apparatus 100 for providing a marketing information analysis service first collects platform channel-based marketing document data according to a service request of the user terminal 300 ( S101 ).
  • the marketing information analysis service providing apparatus 100 performs hybrid document classification processing according to primary filtering of marketing document data and secondary filtering based on machine learning (S105).
  • the marketing information analysis service providing apparatus 100 extracts unstructured data from the marketing document data (S105), and obtains machine learning-based marketing learning information corresponding to the unstructured data (S107).
  • the marketing information analysis service providing apparatus 100 generates a specialized marketing knowledge graph model using pre-built knowledge graph information and pre-collected ontology information, and the marketing learning information and structured data (S109).
  • the marketing information analysis service providing apparatus 100 performs marketing market trend and demand prediction analysis based on the marketing specialized knowledge graph model (S111).
  • the marketing information analysis service providing apparatus 100 may perform an analysis corresponding to the service requested by the user terminal 300, and may further perform not only the market trend and demand prediction analysis but also the construction of a neologism dictionary, keyword influence analysis, and the like.
  • the marketing information analysis service providing apparatus 100 may provide marketing analysis information based on natural language processing according to the analysis result by using the dashboard interface (S113).
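The flow of FIG. 3 described above can be sketched as a chain of stages. This is only an outline under stated assumptions: the function name `run_marketing_analysis`, the stage callables passed in, and the dictionary shape of a document (a `"text"` key holding unstructured content) are all hypothetical, and each stage is supplied by the caller rather than implemented here.

```python
def run_marketing_analysis(request, collect, classify, learn, build_graph, analyze, render):
    """Chain the stages of the described flow; every stage is a caller-supplied callable."""
    documents = collect(request)              # S101: channel-based document collection
    kept = classify(documents)                # hybrid document classification (filtering)
    unstructured = [d["text"] for d in kept if d.get("text")]  # S105: extract unstructured data
    learned = learn(unstructured)             # S107: machine learning-based learning information
    graph = build_graph(learned, kept)        # S109: marketing-specialized knowledge graph model
    result = analyze(graph, request)          # S111: trend and demand prediction analysis
    return render(result)                     # S113: present through the dashboard interface
```

Passing the stages in as callables keeps the ordering visible while leaving each stage free to be replaced, for example by a different classifier or analysis.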
  • FIG. 4 is a block diagram for explaining in more detail a knowledge graph construction module according to an embodiment of the present invention.
  • the unstructured data for building a marketing-specialized knowledge graph model may be the original text of a marketing web page collected by the channel-based information collection unit 140, and the structured data may include data in a general-purpose file format or structured data that can be collected through an open API.
  • the open knowledge graph data may be domestic and foreign data published in RDF format, and may be obtained by receiving an RDF file or a query response targeting a SPARQL endpoint.
  • the knowledge graph construction module 151 processes the data step by step through a two-stage pipeline module as shown in FIG. 4, and can thereby build the marketing-specialized knowledge graph model effectively.
  • the knowledge graph construction module 151 may include an unstructured data processing unit 1511, a structured data processing unit 1512, an open knowledge graph management unit 1515, and a relational database 1517 as the first pipeline module, and may include a natural language analysis unit 1513, a knowledge graph information conversion unit 1514, a large-capacity knowledge graph processing unit 1516, and an ontology information processing unit 1518 as the second pipeline module.
  • the data output from the second pipeline may be transmitted to the marketing specialized knowledge graph construction unit 1519 and used to generate marketing specialized knowledge graph model data or keyword analysis information.
  • the unstructured data processing unit 1511 may identify the unstructured data from the marketing document data collected in the first pipeline stage, and transmit it to the natural language analysis unit 1513.
  • the unstructured data may include, for example, text data identified from marketing document data.
  • the natural language analysis unit 1513 may extract main keywords from the unstructured data using natural language processing technology.
  • examples of the natural language processing technology include morpheme analysis and named entity recognition, and the natural language analysis unit 1513 may use classification information from the document classification module 153 for more accurate keyword extraction.
  • the knowledge graph information conversion unit 1514 may convert the extracted information into marketing knowledge graph information in a preset format, mapping and integrating it by a mapping technique such as rule-based marketing keyword mapping or machine-learning-based mapping.
  • the open knowledge graph management unit 1515 may collect and store pre-built open knowledge graph information using an openAPI or the like.
  • the large-capacity knowledge graph processing unit 1516 may build, in advance, large-capacity knowledge graph information from the collected open knowledge graph information so that it can be mapped to the marketing knowledge graph information format-converted from the natural language analysis information described above, and may transmit the large-capacity knowledge graph information to the marketing-specialized knowledge graph model construction unit 1519.
  • the relational database 1517 may collect and store ontology information for semantic mapping between the knowledge graph information converted by the knowledge graph information conversion unit 1514 and the knowledge graph information processed by the large-capacity knowledge graph processing unit 1516, and mutually compatible ontology information among the stored ontology information may be transmitted to the marketing-specialized knowledge graph model construction unit 1519.
  • the marketing-specialized knowledge graph model construction unit 1519 may use the open knowledge graph information collected from an RDF file or a SPARQL endpoint as the knowledge graph model information for large-capacity knowledge graph processing, and may build the marketing-specialized knowledge graph model by building a mapping table between the converted marketing knowledge graph information and the large-capacity knowledge graph information.
  • the marketing-specialized knowledge graph model construction unit 1519 performs mapping based on the unique identifier assigned to each data item. For identical data whose identifiers do not match, it may calculate a matching probability from the relationship information and attribute information based on the pre-collected ontology information, and preferentially map the pairs with the highest probability.
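The two-step mapping described above (identifier match first, then probability-ranked matching for the rest) can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify how the matching probability is computed, so a Jaccard overlap of attribute and relation pairs is used as a stand-in score, and the names `jaccard`, `map_items`, the `threshold` value, and the sample identifiers are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets, used here as a stand-in matching score in [0, 1]."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def map_items(converted: dict, large: dict, threshold: float = 0.5) -> dict:
    """Map converted items to large-graph items.

    Both arguments map an identifier to a set of (property, value) pairs.
    """
    mapping = {}
    # Step 1: identifiers that match exactly are mapped directly.
    for item_id in converted:
        if item_id in large:
            mapping[item_id] = item_id
    # Step 2: score every remaining pair, then assign the best scores first.
    used = set(mapping.values())
    candidates = [
        (jaccard(props, large[target]), source, target)
        for source, props in converted.items() if source not in mapping
        for target in large if target not in used
    ]
    for score, source, target in sorted(candidates, reverse=True):
        if score >= threshold and source not in mapping and target not in used:
            mapping[source] = target
            used.add(target)
    return mapping


# Hypothetical sample data: one identifier matches, one does not.
converted = {
    "ex:brandA": {("type", "Brand"), ("name", "A")},
    "ex:item42": {("type", "Product"), ("name", "cushion"), ("brand", "A")},
}
large = {
    "ex:brandA": {("type", "Brand"), ("name", "A"), ("country", "KR")},
    "kg:p-901": {("type", "Product"), ("name", "cushion"), ("brand", "A")},
    "kg:p-902": {("type", "Product"), ("name", "lipstick"), ("brand", "B")},
}
result = map_items(converted, large)
print(result)
```

Sorting the candidate pairs by score before assigning them is one simple way to realize "map the high probability preferentially"; a production system might instead solve a global assignment problem.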
  • FIG. 5 is a flowchart illustrating an operation of a knowledge graph construction module according to an embodiment of the present invention
  • FIG. 6 is a relationship diagram illustrating a knowledge graph construction and semantic mapping process according to an embodiment of the present invention.
  • the knowledge graph construction module 151 may obtain knowledge graph conversion rule information from OpenAPI or structured file data (S201).
  • the conversion rule information may be obtained from a conversion rule file described in R2RML (RDB to RDF Mapping Language), which is a W3C international standard.
  • the conversion rule information may also be obtained as knowledge graph transformation rule data by applying transformation rules described in RML (RDF Mapping Language) to the OpenAPI or structured file data.
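To illustrate what a conversion rule does, the sketch below turns one structured record into subject-predicate-object triples. The rule format is a deliberately simplified, hypothetical stand-in and is not R2RML or RML syntax; the URIs, the `RULE` dictionary, and the function `row_to_triples` are assumptions made for the example.

```python
# Hypothetical, simplified stand-in for a conversion rule. It is NOT R2RML or
# RML syntax; it only shows "subject template + column-to-predicate map".
RULE = {
    "subject_template": "http://example.org/product/{id}",
    "predicates": {
        "name": "http://example.org/vocab/name",
        "category": "http://example.org/vocab/category",
    },
}


def row_to_triples(row: dict, rule: dict) -> list:
    """Convert one structured record into (subject, predicate, object) triples."""
    subject = rule["subject_template"].format(**row)
    triples = []
    for column, predicate in rule["predicates"].items():
        if row.get(column) is not None:  # skip missing values
            triples.append((subject, predicate, str(row[column])))
    return triples


# Hypothetical record, e.g. one row of a structured file or an OpenAPI response.
row = {"id": 7, "name": "sun cream", "category": "cosmetics"}
triples = row_to_triples(row, RULE)
for triple in triples:
    print(triple)
```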
  • the knowledge graph construction module 151 obtains ontology transformation rule information from the relational database (S203), and transforms the natural language analysis information of the unstructured data according to the knowledge graph transformation rule information (S205).
  • the knowledge graph construction module 151 maps the transformed knowledge graph information to a pre-built large-capacity knowledge graph according to the ontology transformation rule information to build a marketing-specialized knowledge graph model (S207).
  • the knowledge graph construction module 151 may include a marketing-specific knowledge graph model construction unit 1519 for generating marketing-specific knowledge graph model data.
  • the marketing-specialized knowledge graph model construction unit 1519 may include a semantic mapping processing unit so that the above-described mapping is performed efficiently and with high accuracy.
  • the semantic mapping processing unit may further include a fuzzy algorithm processing unit and a URI identifier processing unit.
  • the semantic mapping processing unit may process semantic mapping between items of the data converted into a knowledge graph format (e.g., RDF) and items of the pre-built large-capacity knowledge graph.
  • the semantic mapping processing unit may include a URI identifier processing unit for processing primary mapping by comparing URI identifiers assigned to all data items.
  • the semantic mapping processing unit may then process automated semantic mapping on the primary-mapped data by applying a word-to-word semantic mapping tool implemented on the basis of a Levenshtein fuzzy metric algorithm developed according to the linguistic characteristics of Korean.
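A minimal sketch of Levenshtein-based fuzzy matching between labels is given below. The patent does not disclose how the algorithm was adapted to Korean; decomposing Hangul syllables into jamo with Unicode NFD normalization before comparing is only one plausible assumption, and the names `to_jamo`, `similarity`, `best_match`, and the `threshold` value are hypothetical.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def to_jamo(text: str) -> str:
    """Assumed Korean-specific step: NFD splits Hangul syllables into jamo."""
    return unicodedata.normalize("NFD", text)


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], computed on the jamo sequences."""
    a, b = to_jamo(a), to_jamo(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(label: str, candidates: list, threshold: float = 0.7):
    """Return the most similar candidate, or None if it scores below threshold."""
    best = max(candidates, key=lambda c: similarity(label, c))
    return best if similarity(label, best) >= threshold else None


print(best_match("마켓팅", ["마케팅", "브랜드"]))
```

Comparing at the jamo level makes a one-consonant spelling variant count as a small edit rather than a whole-syllable substitution, which is why the variant spelling above still matches.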
  • the data for which the automatic mapping is completed may be subjected to sampling processing, and the processed sampling data may be used for subsequent mapping inspection and correction processing.
  • the knowledge graph construction module 151 may acquire knowledge graph model data on which semantic mapping is completed as marketing-specific knowledge graph model data.
  • the knowledge graph construction module 151 may generate an integrated knowledge graph model by importing the mapped knowledge graph data into a triplestore-type database in which the previously built large-capacity knowledge graph data is stored.
  • the final knowledge graph model may be described using the RDF (Resource Description Framework) data model, which improves compatibility and analysis efficiency.
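The idea of importing mapped data into a store that already holds the large-capacity graph can be illustrated with a toy in-memory triple store. This is a sketch only: a real deployment would use an actual triplestore database, and the class `TinyTripleStore`, its methods, and the sample identifiers are hypothetical.

```python
class TinyTripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add_all(self, triples):
        """Import triples; duplicates are merged because a set is used."""
        self.triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
# Previously built large-capacity graph (hypothetical content).
store.add_all([("kg:p-901", "rdf:type", "kg:Product")])
# Mapped marketing data imported into the same store (hypothetical content).
store.add_all([("kg:p-901", "ex:mentionedIn", "ex:doc-1")])

hits = store.match(s="kg:p-901")
print(hits)
```

Because both data sets share the subject identifier after mapping, a single pattern query returns the integrated view of the item.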
  • the classification system for each item of the established large-capacity knowledge graph may be a marketing-specialized system created by a domain expert in the marketing field.
  • the open knowledge graph management unit 1515 may manage the classification system for each field based on the public interest corresponding to each classification system keyword (which can be calculated, for example, as the number of searches on major portal services for each period), and may decide whether to keep or archive the classification system for each field.
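A keep-or-archive decision based on search counts could look like the sketch below. The patent only states that public interest can be calculated from the number of searches per period; averaging the counts and comparing against a fixed cutoff is an assumption, as are the function name `decide`, the `min_average` parameter, and the sample numbers.

```python
from statistics import mean


def decide(search_counts_by_period: dict, min_average: float) -> str:
    """Return 'keep' or 'archive' for one classification-system keyword."""
    # Assumed interest measure: average number of searches per period.
    interest = mean(search_counts_by_period.values())
    return "keep" if interest >= min_average else "archive"


# Hypothetical weekly search counts for two classification-system keywords.
popular = {"2020-W01": 1200, "2020-W02": 900, "2020-W03": 1500}
fading = {"2020-W01": 40, "2020-W02": 10, "2020-W03": 25}

print(decide(popular, min_average=1000))
print(decide(fading, min_average=1000))
```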
  • the apparatus 100 for providing a marketing information analysis service according to an embodiment of the present invention addresses two disadvantages of general knowledge graphs: the failure to reflect the latest keywords, and the difficulty of building Korean-based knowledge graphs and dictionary data for analysis. By building a marketing-specialized knowledge graph model, it facilitates marketing trend and keyword analysis, and in particular makes it easier to reflect new words and analyze Korean keywords, so that an accurate marketing information analysis service can be provided at a lower cost.
  • FIG. 7 is a block diagram illustrating a document classification module according to an embodiment of the present invention in more detail
  • FIG. 8 is a flowchart illustrating a document classification processing process according to an embodiment of the present invention.
  • the quality of data to be analyzed is a very important factor.
  • data collected by online crawling, etc. includes a large amount of various types of garbage data, causing problems such as distortion of data analysis values and waste of system resources.
  • the document classification module 153 builds a hybrid classification system that combines a machine learning algorithm with a rule-based algorithm for building the marketing-specialized knowledge graph model, making it possible to reach a high level of garbage filtering accuracy.
  • the document classification module 153 includes a classification system definition unit 1531 , a rule-based filtering unit 1532 , a model learning unit 1533 , and a machine learning filtering unit 1534 .
  • the classification system definition unit 1531 performs linear discriminant analysis (LDA) on the original marketing document data collected and transmitted from the channel-based information collection unit 140, and defines classification system information according to the result.
  • the linear discriminant analysis is a method of classifying data into classes by defining arbitrary groups through machine learning.
  • the classification system definition unit 1531 receives, as LDA analysis inputs, the original collected marketing document data, topic number information, and a relevance metric value, and may output, as the LDA analysis result, topic group information for the original data and a certain number of frequently appearing keywords for each group.
  • the classification system definition unit 1531 may adjust the number of topics and the relevance metric within a meaningful predetermined range, and may repeat the LDA analysis.
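The per-group keyword output described above can be illustrated with a small sketch. It assumes the documents are already tokenized and already assigned to topic groups, so the grouping analysis itself is not reproduced; the function name `top_keywords_by_group`, the parameter `k`, and the sample tokens are hypothetical.

```python
from collections import Counter, defaultdict


def top_keywords_by_group(docs: list, groups: list, k: int = 3) -> dict:
    """Return the k most frequent tokens for each topic group.

    docs   -- list of token lists, one per document
    groups -- list of group labels, aligned with docs
    """
    counts = defaultdict(Counter)
    for tokens, group in zip(docs, groups):
        counts[group].update(tokens)
    return {g: [word for word, _ in c.most_common(k)] for g, c in counts.items()}


# Hypothetical tokenized documents and their assumed topic groups.
docs = [
    ["rent", "room", "rent"],
    ["rent", "deposit"],
    ["cream", "review", "cream"],
]
groups = [0, 0, 1]

keywords = top_keywords_by_group(docs, groups, k=2)
print(keywords)
```

Repeating the analysis with a different number of groups, as the text describes, would amount to re-running the grouping step and calling this function again on the new labels.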
  • the classification system definition unit 1531 generates classification system information from the analysis result information and transmits it to the rule-based filtering unit 1532 .
  • the classification system information may include, for example, garbage topic selection information having a specific pattern for each step.
  • level 1 may be advertisement and non-advertisement,
  • level 2 may be public relations, real estate, stock, transaction, etc., and
  • level 3 may be event, experience group, product repair and installation, rental, rental sale, stock, transaction sales, and others; the classification system information may also include list information of the garbage keywords that frequently appear for each classification system.
  • the rule-based filtering unit 1532 may perform a rule-based primary filtering process of the marketing document data using the classification system information received from the classification system definition unit 1531 as a rule. Accordingly, data primarily classified as a garbage document based on a rule may be removed from the analysis target classification data. Here, the primary filtered remaining data is transferred to the machine learning filtering unit 1534 .
  • the model learning unit 1533 trains a document classification model using a pre-built training set, and the machine learning filtering unit 1534 performs secondary filtering by classifying the primary-filtered data using the trained model information.
  • data classified as garbage data again by the machine learning filtering unit 1534 may be included in the removal data, and only the remaining data may be classified and processed as normal collection data.
  • the rule-based filtering unit 1532 sets a rule module according to classification system definition information defined from the collected documents (S501).
  • the rule-based filtering unit 1532 may filter out sentences containing garbage keywords that frequently appear for each classification system. To this end, the rule-based filtering unit 1532 may set a rule module corresponding to the classification system definition information. The rule-based filtering unit 1532 may then perform garbage classification analysis on the original marketing document data collected by the channel-based information collection unit 140, according to whether the data matches the garbage classification system and the linguistic rule conditions defined in the rule module. Accordingly, the primary filtering classification is processed (S503).
  • a sentence containing the keyword 'one room' may be determined to be garbage classified as 'real estate'.
  • the data may thus be divided into data classified as garbage and residual data, and the residual data may be transmitted to the machine learning filtering unit 1534.
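The rule-based primary filtering can be sketched as a keyword lookup per garbage category. Only the 'one room' keyword and the 'real estate' category come from the text; the other keywords, the dictionary name `CLASSIFICATION_RULES`, and the function names are hypothetical, and plain substring matching is an assumed simplification of the linguistic rule conditions.

```python
# Garbage keywords per classification-system category. Only 'one room' under
# 'real estate' comes from the text; the rest are hypothetical examples.
CLASSIFICATION_RULES = {
    "real estate": ["one room", "monthly rent"],
    "stock": ["stock tip", "listed shares"],
}


def classify_garbage(sentence: str, rules: dict):
    """Return the garbage category of a sentence, or None if no rule fires."""
    text = sentence.lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def primary_filter(sentences: list, rules: dict):
    """Split sentences into (garbage with category, residual)."""
    garbage, residual = [], []
    for sentence in sentences:
        category = classify_garbage(sentence, rules)
        if category is None:
            residual.append(sentence)
        else:
            garbage.append((sentence, category))
    return garbage, residual


sentences = [
    "Sunny one room near the station, available now",
    "This sun cream feels light on the skin",
]
garbage, residual = primary_filter(sentences, CLASSIFICATION_RULES)
print(garbage)
print(residual)
```

The residual list is what would be handed on to the machine learning filtering stage.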
  • the machine learning filtering unit 1534 performs secondary filtering processing according to the document classification based on the learning model processed by the model learning unit 1533 .
  • the model learning unit 1533 pre-learns machine learning-based classification information (S505), and the machine learning filtering unit 1534 may perform corpus formation for learning and a prediction process that predicts classification labels using the learning model.
  • secondary filtering classification of the residual data from the primary analysis is then processed based on the machine learning training information (S507), and the data classified as garbage in this step may be output as garbage data.
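The secondary, machine-learning stage can be sketched with a small multinomial naive Bayes classifier. The patent does not name the model it uses, so naive Bayes is an assumed stand-in; the class `NaiveBayes`, the function `secondary_filter`, the labels, and the tiny training corpus are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes with Laplace smoothing."""

    def fit(self, docs: list, labels: list):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens: list) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in tokens:
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def secondary_filter(model: NaiveBayes, residual_docs: list):
    """Split residual documents into (garbage, normal) by predicted label."""
    garbage = [d for d in residual_docs if model.predict(d) == "garbage"]
    normal = [d for d in residual_docs if model.predict(d) != "garbage"]
    return garbage, normal


# Hypothetical pre-built training set (tokenized) with labels.
train_docs = [
    ["discount", "event", "apply", "now"],
    ["rental", "sale", "event"],
    ["skin", "cream", "review"],
    ["new", "cream", "launch", "review"],
]
train_labels = ["garbage", "garbage", "normal", "normal"]
model = NaiveBayes().fit(train_docs, train_labels)

residual = [["event", "discount"], ["cream", "review"]]
garbage, normal = secondary_filter(model, residual)
print(garbage, normal)
```

Running this stage only on what survives the rule-based filter mirrors the hybrid order described in the text: cheap rules first, then the learned model.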
  • FIG. 9 is a diagram illustrating a classification system definition according to an embodiment of the present invention.
  • classification system information may be defined by spam topic clustering through a preset number of rounds of LDA analysis and adjustment, and the final classification system modeled by the rule-based filtering unit 1532 may be layered and processed for each step.
  • the rule-based primary filtering keywords are set in stages through LDA analysis, and the primary processing by the rule-based filtering unit 1532 is combined with the processing by the machine learning filtering unit 1534 as a hybrid, which yields more accurate results (a garbage filtering accuracy of 80% or more was achieved). This helps in quick data-based decision making and can reduce the time and manpower required to build a conventional heuristic rule-based filtering system.
  • the above-described method according to various embodiments of the present invention may be implemented as a program and provided to each server or device while being stored in various non-transitory computer readable media. Accordingly, the user terminal 300 may access the server or device and download the program.
  • the non-transitory readable medium refers to a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short moment, such as a register, cache, or memory.
  • examples of the non-transitory readable medium include a CD, DVD, hard disk, Blu-ray disc, USB memory, memory card, ROM, and the like.

Abstract

Disclosed is a document classification apparatus which, according to an embodiment, comprises: a classification system definition unit that defines classification system information by analyzing a plurality of items of marketing document data in order to build a marketing-specialized knowledge graph; a rule-based filtering unit that primarily removes garbage data by performing rule-based primary filtering of the plurality of items of marketing document data according to the defined classification system information; and a machine learning filtering unit that secondarily removes garbage data through machine learning filtering, based on a pre-trained model, of the marketing document data remaining after the primary filtering.
PCT/KR2020/015583 2019-11-25 2020-11-09 Document classification method for marketing knowledge graph, and apparatus therefor WO2021107447A1

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020190152515A KR20210063880A 2019-11-25 2019-11-25 Document classification processing method for marketing knowledge graph and apparatus therefor
KR1020190152516A KR20210063881A 2019-11-25 2019-11-25 Program and recording medium for document classification processing
KR10-2019-0152516 2019-11-25
KR10-2019-0152515 2019-11-25

Publications (1)

Publication Number Publication Date
WO2021107447A1 (fr) 2021-06-03

Family

ID=76130278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015583 WO2021107447A1 2019-11-25 2020-11-09 Document classification method for marketing knowledge graph, and apparatus therefor

Country Status (1)

Country Link
WO (1) WO2021107447A1 (fr)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20891477; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20891477; Country of ref document: EP; Kind code of ref document: A1)