WO2020111827A1

WO2020111827A1 - Automatic profile generation server and method

Info

Publication number: WO2020111827A1
Application number: PCT/KR2019/016608
Authority: WO
Inventors: 정희동; 이상범; 조민희; 김동희
Original assignee: 주식회사 로켓펀치
Priority date: 2018-11-29
Filing date: 2019-11-28
Publication date: 2020-06-04
Also published as: KR102185733B1; KR20200064490A

Abstract

Disclosed are an automatic profile generation server and method. An automatic profile generation server according to an embodiment includes: a collection module which periodically collects documents, including articles, columns, and interviews, in a web space including news sites and blogs; a database which stores the collected documents, the sources of the documents, and web space information, and stores profile generation information including keywords for generating profile information from the documents, and tags representing information categories in which business information and the keywords are included; an extraction module which analyzes sentences included in a document from which profile information is to be extracted, extracts keywords, tags each of the letters constituting the keywords with tag information which is profile category information, and generates profile reserve information; and a generation module which collects the extracted profile reserve information, merges continuously tagged letters to generate keywords that are pieces of the profile information, and separates the keywords from the tags to generate the profile information.

Description

Profile auto-generation server and method

The present disclosure relates to an automatic profile generation server and method. More specifically, it relates to a server and method that, when a document such as an article or column collected online is input, automatically sorts the profile information in the document by item and outputs it.

Unless otherwise indicated herein, the content described in this section is not prior art to the claims of this application and is not admitted to be prior art by inclusion in this section.

Artificial intelligence technology is being researched and developed in various fields. Recently, artificial intelligence programs that are useful in everyday life, such as big data analysis, speech recognition, and natural language understanding, have become widespread and are used in various smart terminals.

Among the fields of artificial intelligence technology, natural language understanding enables advanced data processing, such as interpreting, classifying, and drawing inferences from the language content of documents, to be performed by an automated system rather than by a person. Artificial intelligence technology for language processing has recently been applied to smart device control and smart home services, making smart terminals more convenient to control. Artificial intelligence for language recognition can be used not only for speech recognition but also for interpreting recorded language information such as documents and sentences, recognizing words, and extracting important information.

In particular, as technology has developed, much of the corporate and personal information that economically active people look for and use has come to be created online, and digital content containing information about many people, such as articles, columns, and interviews, is created constantly. Because this content is scattered across many places, users find the information they need manually through online search services or offline materials. Likewise, organizations that collect large amounts of information by category, such as media outlets, distribute information by creating and updating personal information manually.

Manually managed profile information could be automated by creating and updating it with language recognition technology. However, because conventional language recognition technology grasps the meanings of words mainly through morpheme classification and analysis, it often misrecognizes profile information, in which proper nouns, foreign words, and new words are frequently used.

The embodiment provides a profile information generation server and method that automatically collect, in data units, the various business information needed by economically active people, such as corporate information, personal information, and book information, and automatically extract and process the collected data into a form that is convenient for people to use.

An automatic profile generation server according to an embodiment includes: a collection module that periodically collects documents, including articles, columns, and interviews, in a web space including news sites and blogs; a database that stores the collected documents together with their sources and web space information, and stores profile generation information including keywords for generating profile information from the documents and tags indicating business information and the information categories that contain the keywords; an extraction module that analyzes sentences included in a document from which profile information is to be extracted, extracts keywords, and generates profile preliminary information by tagging each letter constituting a keyword with tag information, which is profile category information; and a generation module that collects the extracted profile preliminary information, merges consecutively tagged letters to generate keywords that are profile information, and classifies the keywords and tags to generate profile information.

An automatic profile generation method according to another embodiment includes: (A) periodically collecting, by a collection module, documents including articles, columns, and interviews in a web space including news sites and blogs; (B) storing, by a database, the collected documents together with their sources and web space information, and storing profile generation information including keywords for generating profile information from the documents and tags indicating business information and the information categories that contain the keywords; (C) analyzing, by an extraction module, sentences included in a document from which profile information is to be extracted, extracting keywords, and generating profile preliminary information by tagging each letter constituting a keyword with tag information, which is profile category information; and (D) collecting, by a generation module, the extracted profile preliminary information, generating keywords that are profile information by merging consecutively tagged letters, and generating profile information by classifying the keywords and tags.

The profile information generation server and method according to the embodiment enable automatic and accurate extraction of profile information, which is important information about people, companies, and products from various online contents.

As profile data extracted through machine learning accumulates, the accuracy and speed of profile information extraction can improve. In addition, when a specific keyword is repeatedly extracted for the profile information of the same person, the reliability of that keyword is calculated, making it possible to gauge how accurate the specific profile information is.

By automatically calculating the reliability of profile information, separating the profile data of different people who share the same name, and continuously updating profile information, the profile information generation server and method according to an embodiment prevent incorrect profile information from being generated and spread.

It should be understood that the effects of the present invention are not limited to the above-described effects, and include all effects that can be deduced from the configuration of the invention described in the detailed description or claims of the present invention.

FIG. 1 is a diagram showing a schematic data processing block of a profile generation server according to an embodiment.

FIG. 2 is a view showing in more detail the data processing block of the profile information generation server according to the embodiment.

FIG. 3 is a view for explaining the machine learning process of the profile information generation server according to the embodiment.

FIG. 4 is a view for explaining a process of generating profile information according to an embodiment.

FIG. 5 is a diagram showing a data processing flow for automatically generating profile information according to an embodiment.

FIG. 6 is a diagram showing a data processing process for generating profile preliminary information according to an embodiment.

FIG. 7 is a view for explaining a profile information generation process according to an embodiment.

The automatic profile generation server according to the embodiment includes: a collection module that periodically collects documents, including articles, columns, and interviews, in a web space including news sites and blogs; a database that stores the collected documents together with their sources and web space information, and stores profile generation information including keywords for generating profile information from the documents and tags indicating the information categories that contain the keywords and business information; an extraction module that analyzes sentences included in a document from which profile information is to be extracted, extracts keywords, and generates profile preliminary information by tagging each letter constituting a keyword with tag information, which is profile category information; and a generation module that collects the extracted profile preliminary information, merges consecutively tagged letters to generate keywords that are profile information, and classifies the keywords and tags to generate profile information.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains, and the invention is defined only by the scope of the claims. The same reference numerals refer to the same components throughout the specification.

In describing embodiments of the present invention, when it is determined that a detailed description of known functions or configurations may unnecessarily obscure the subject matter of the present invention, the detailed description will be omitted. In addition, terms to be described later are terms defined in consideration of functions in an embodiment of the present invention, which may vary according to a user's or operator's intention or practice. Therefore, the definition should be made based on the contents throughout this specification.

Referring to FIG. 1, the profile generation server according to the embodiment may include a collection module 110, a database 130, an extraction module 150, and a generation module 170.

As used herein, the term 'module' should be interpreted to include software, hardware, or a combination thereof, depending on the context in which the term is used. For example, the software may be machine code, firmware, embedded code, or application software. As another example, the hardware may be a circuit, processor, computer, integrated circuit, integrated circuit core, sensor, micro-electro-mechanical system (MEMS), passive device, or a combination thereof.

The collection module 110 periodically collects documents from various web spaces and external servers. For example, the collection module 110 periodically collects document data recording articles, columns, interviews, and the like from websites such as news sites, blogs, and various social networking services (SNS).

The database 130 stores the data needed to generate profile information, such as the collected documents, their sources and web space information, and profile creation information. For example, the keywords and tags needed to generate profile information may be stored in the database 130. In an embodiment, keywords are content data representing profile information, namely words and proper nouns extracted from the sentences input to the server. A tag is the profile information category of a keyword and may be higher-level information for a specific keyword. For example, when the keyword is 'manager', its tag may be 'position', and when the keyword is '30', its tag may be 'age'. The database 130 accumulates and stores keywords, tags, and the profile information generated from them, and updates and stores changed profile information for the same person.

The extraction module 150 analyzes the sentences included in the document from which profile information is to be extracted and extracts keywords from them. It then generates profile preliminary information by tagging each letter constituting a keyword with a tag indicating the profile category information. In the embodiment, if the sentence 'Baek Hyun-a, CEO of Elvision Inc., is a veteran with over 10 years of industry experience' is input to the server, the extraction module extracts 'Elvision' (엘비전) as a keyword and adds a tag to each letter constituting it. Specifically, data such as '엘_company, 비_company, 전_company' may be the profile preliminary information. In an embodiment, the tag information added to a keyword may be selected based on other keywords adjacent to that keyword, or accumulated keyword tag information may be loaded from the database and used. Continuing the example, the module can recognize 'Inc.', a word adjacent to 'Elvision', and select 'company' as the tag information added to each letter constituting the keyword 'Elvision'.
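The letter-level tagging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the cue-word table `CUE_TAGS` and the functions `choose_tag` and `tag_letters` are hypothetical, and in the described system keywords and tags come from the trained extraction model and the database rather than from a fixed lookup table.

```python
# Hypothetical cue words whose presence next to a keyword suggests its profile category.
CUE_TAGS = {"주식회사": "company", "Inc.": "company"}


def choose_tag(keyword, neighbors, stored_tags):
    """Pick a tag from adjacent cue words, falling back to tags accumulated in the database."""
    for word in neighbors:
        if word in CUE_TAGS:
            return CUE_TAGS[word]
    return stored_tags.get(keyword)  # None when nothing is known about the keyword


def tag_letters(keyword, tag):
    """Attach the same tag to every letter (syllable) of the keyword."""
    return [(letter, tag) for letter in keyword]


tag = choose_tag("엘비전", ["Inc."], stored_tags={})
preliminary = tag_letters("엘비전", tag)
print(", ".join(f"{letter}_{t}" for letter, t in preliminary))  # 엘_company, 비_company, 전_company
```

Keeping the tag on every letter, rather than on the whole word, is what lets the later merge step rebuild keywords that a morpheme analyzer would split or miss.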

The generation module 170 collects the extracted profile preliminary information to generate keywords and classifies the keywords according to the profile information category. For example, when the same tag is added to consecutive letters, the generation module 170 merges the letters sharing that tag into a keyword. Specifically, when company tags appear consecutively, the letters '엘', '비', and '전' carrying the same tag are collected and merged to generate the keyword '엘비전' (Elvision). The generation module 170 then sorts the keywords by the tag information attached to them and generates and displays the classified profile information. Continuing the example, it can generate profile information that pairs the keyword with its tag in the form 'Company: Elvision'.
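The merge-and-classify step can be illustrated with a short Python sketch. It assumes that letters carrying no profile information are marked with `None`; the function names `merge_tagged_letters` and `to_profile` are hypothetical.

```python
from itertools import groupby


def merge_tagged_letters(tagged):
    """Merge runs of consecutive letters that share a tag into (keyword, tag) pairs."""
    keywords = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag is None:  # untagged letters carry no profile information
            continue
        keywords.append(("".join(letter for letter, _ in run), tag))
    return keywords


def to_profile(keywords):
    """Sort keywords by their tag, e.g. {'company': ['엘비전']}."""
    profile = {}
    for keyword, tag in keywords:
        profile.setdefault(tag, []).append(keyword)
    return profile


tagged = [("엘", "company"), ("비", "company"), ("전", "company"),
          ("의", None), ("대", "position"), ("표", "position")]
profile = to_profile(merge_tagged_letters(tagged))
for tag, words in profile.items():
    print(f"{tag}: {', '.join(words)}")  # company: 엘비전, then position: 대표
```

Under this simple rule, two different keywords with the same tag that sit directly next to each other would be merged into one; the description does not address that case, and tagging schemes such as BIO avoid it by marking where each keyword begins.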

In addition, in the embodiment, the generation module 170 stores each keyword after generating it. When merging tagged letters while analyzing new input data, if the merged letters match a previously stored keyword at or above a predetermined percentage, the stored keyword can be recommended. Continuing the example, if the profile creation server 100 has already generated and stored the company-name keyword '엘비전' (Elvision), then when the letters '엘_company, 비_company' are input consecutively, the generation module 170 calculates their match rate against the letters and tags of the stored keyword '엘비전'. When the calculated match rate is at or above a certain level (the reference value), '엘비전' can be automatically extracted as the keyword corresponding to the company item of the profile information. In the embodiment, when the generation module 170 recognizes only '엘비', a match rate of 66% (two of three letters) with the stored keyword '엘비전' is calculated, so even though only two letters and their tags have been recognized, the generation module 170 may automatically recommend the keyword '엘비전'. In an embodiment, the reference value of the match rate for automatic keyword recommendation may vary with the number of letters and tags constituting the stored keyword. For example, for a keyword composed of three letters, 66% (the letters and tags of the first two letters match) can be set as the reference value for automatically recommending the keyword, and a reference value of 60% may likewise be set for automatic keyword recommendation.
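A possible reading of the match-rate rule is sketched below in Python. Assumptions: the match rate is the share of the stored keyword's (letter, tag) pairs matched in order from the first letter, and the length-to-threshold table `THRESHOLDS` is hypothetical (only the 66% and 60% values come from the text). Note that two of three letters gives about 0.667, which the text reports as 66%.

```python
# Hypothetical reference values: 66% for 3-letter keywords, 60% otherwise.
# The mapping from keyword length to threshold is an assumption.
THRESHOLDS = {3: 0.66}
DEFAULT_THRESHOLD = 0.60


def match_rate(partial, stored):
    """Share of the stored keyword's (letter, tag) pairs matched, in order, by the partial input."""
    matched = 0
    for got, want in zip(partial, stored):
        if got != want:
            break
        matched += 1
    return matched / len(stored)


def recommend(partial, stored_keywords):
    """Return the first stored (keyword, tag) whose match rate reaches its reference value."""
    for keyword, tag in stored_keywords:
        stored = [(letter, tag) for letter in keyword]
        threshold = THRESHOLDS.get(len(keyword), DEFAULT_THRESHOLD)
        if match_rate(partial, stored) >= threshold:
            return keyword, tag
    return None


stored_keywords = [("엘비전", "company")]
partial = [("엘", "company"), ("비", "company")]
print(recommend(partial, stored_keywords))  # ('엘비전', 'company')
```

Because both the letter and its tag must match, '엘' tagged as a position would not trigger a company-name recommendation.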

FIG. 2 is a view showing in more detail the data processing block of the profile information generation server according to the embodiment, and FIG. 3 is a view for explaining the machine learning process of the profile information generation server according to the embodiment.

Referring to FIG. 2, the database 130 of the profile information generation server according to the embodiment may include a keyword storage unit 131, a tag storage unit 133, and a profile preliminary information storage unit 135. The extraction module 150 may include a learning unit 151, an extraction unit 153, and a tagging unit 155; the generation module 170 may include a generation unit 171, a classification unit 173, and an output unit 175; and the calculation module 190 may include a counting unit 191 and a calculation unit 193.

In the keyword storage unit 131 of the database, the proper nouns and words that make up profile information are classified and stored. The tag storage unit 133 stores the detailed item information of the profile information; for example, it stores the categories that make up profile information, such as occupation, age, date of birth, affiliation, institution, position, career, distinctive features, address, job, and annual sales. The profile preliminary information storage unit 135 stores profile preliminary information, that is, the letters constituting keywords, each tagged with its tag.
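The three storage units can be pictured with a small in-memory Python stand-in. This is only a sketch for illustration: the class `ProfileDatabase` and its method `store_keyword` are hypothetical, and a real server would use persistent storage rather than Python containers.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileDatabase:
    """In-memory stand-in for the database 130 and its three storage units."""
    keywords: dict = field(default_factory=dict)     # keyword storage unit 131: keyword -> category
    tags: set = field(default_factory=set)           # tag storage unit 133: known profile categories
    preliminary: list = field(default_factory=list)  # unit 135: letter-level (letter, tag) records

    def store_keyword(self, keyword, tag):
        """Record a classified keyword, its category, and its letter-tagged form."""
        self.keywords[keyword] = tag
        self.tags.add(tag)
        self.preliminary.append([(letter, tag) for letter in keyword])


db = ProfileDatabase()
db.store_keyword("엘비전", "company")
db.store_keyword("대표", "position")
print(sorted(db.tags))    # ['company', 'position']
print(db.preliminary[1])  # [('대', 'position'), ('표', 'position')]
```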

The learning unit 151 of the extraction module 150 analyzes the meanings of the words included in a sentence and each word's position in the sentence to infer the meanings of and correlations between words, and performs machine learning to extract profile preliminary information. In an embodiment, the machine learning model may be trained to perform named entity recognition (hereinafter, NER).

In an embodiment, the generation module 170 may use the tags of the letters adjacent to a specific letter to correct a tagging error on that letter within a word. For example, after analyzing the tags in the input sentence other than the 'last name' and 'first name' tags, if a tag does not appear on two or more consecutive letters, the generation module 170 checks the tags of the letters immediately before and after that letter. If the preceding and following letters carry the same type of tag, the tag of the intermediate letter is changed to match them, and a keyword including the letter with the changed tag is then generated. Specifically, given A_tag1, B_tag2, C_tag1, D_tag1, and E_tag1, the generation module 170 may change the tag of B to tag1 and recognize 'ABCDE' as tag1. This lowers the profile generation error rate caused by tagging errors.
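The neighbour-based correction can be sketched in Python as follows. Assumptions: an "isolated" tag is one that differs from both neighbours, letters whose preceding neighbour is untagged are left alone, decisions use the original tags so that corrections do not cascade, and the tag names `last_name` and `first_name` are hypothetical labels for the excluded name tags.

```python
# Name tags are excluded from correction, following the description; the labels are hypothetical.
EXCLUDED_TAGS = {"last_name", "first_name"}


def correct_isolated_tags(tagged):
    """Replace a single-letter tag with its neighbours' tag when both neighbours agree."""
    tags = [tag for _, tag in tagged]
    fixed = list(tags)  # decisions use the original tags so corrections do not cascade
    for i in range(1, len(tags) - 1):
        prev_tag, cur, next_tag = tags[i - 1], tags[i], tags[i + 1]
        if cur in EXCLUDED_TAGS or prev_tag in EXCLUDED_TAGS or prev_tag is None:
            continue
        if prev_tag == next_tag and cur != prev_tag:
            fixed[i] = prev_tag
    return [(letter, tag) for (letter, _), tag in zip(tagged, fixed)]


tagged = [("A", "tag1"), ("B", "tag2"), ("C", "tag1"), ("D", "tag1"), ("E", "tag1")]
print(correct_isolated_tags(tagged))  # every letter now carries tag1
```

After this correction, the merge step described earlier would join A through E into the single tag1 keyword 'ABCDE'.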

Referring to FIG. 3, in the machine learning process of the extraction module, the extraction module receives from the database profile preliminary data in which letters are tagged with keywords and their classifications. It then builds a model for profile information through a training process on the received data; in an embodiment, various neural networks, including LSTM (a type of RNN) and CNN, may be used. The generated model then makes predictions on new inputs. That is, when a document is input, the extraction unit 153 automatically extracts keywords according to the result of machine learning.

The tagging unit 155 assigns a tag indicating the category or metadata of a keyword to each letter of the extracted keyword. In an embodiment, when another word adjacent to the keyword indicates profile category information, the corresponding tag may be added to each letter of the keyword.

The generation unit 171 of the generation module 170 collects the letter-level tags from the extraction module 150 and merges consecutively tagged letters to generate keywords that are profile information. The classification unit 173 then classifies each generated keyword according to the profile information category it represents, for example according to the tag information given to the keyword.

The output unit 175 displays profile information in which keywords are sorted according to tag information.

The calculation module 190 may calculate profile importance according to the number of times keywords and tags are extracted from the collected documents, and, when a specific keyword is repeatedly extracted for the profile information of the same person, may calculate the reliability of that keyword. To this end, the counting unit 191 counts the number of times each keyword and tag has been extracted, and the calculation unit 193 calculates a keyword reliability proportional to the number of times the same keyword has been counted for the same person.
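A counting-based reliability score might look like the Python sketch below. The description only says reliability is proportional to the count; normalizing by the person's total extractions for the same tag is an assumption, as are the class `KeywordReliability` and the placeholder name 홍길동 (a conventional Korean "John Doe").

```python
from collections import Counter


class KeywordReliability:
    """Hypothetical sketch of the counting unit 191 and calculation unit 193."""

    def __init__(self):
        self.counts = Counter()  # (person, tag, keyword) -> number of extractions

    def record(self, person, tag, keyword):
        self.counts[(person, tag, keyword)] += 1

    def tag_importance(self, tag):
        """Total extractions of a tag across all people (one possible 'profile importance')."""
        return sum(n for (_, t, _), n in self.counts.items() if t == tag)

    def reliability(self, person, tag, keyword):
        """Share of this person's extractions for the tag that produced this keyword."""
        total = sum(n for (p, t, _), n in self.counts.items() if p == person and t == tag)
        return self.counts[(person, tag, keyword)] / total if total else 0.0


tracker = KeywordReliability()
for _ in range(3):
    tracker.record("홍길동", "position", "대표")
tracker.record("홍길동", "position", "이사")
print(tracker.reliability("홍길동", "position", "대표"))  # 0.75
```

For a fixed number of extractions per person and tag, this score grows in direct proportion to how often the same keyword is extracted, matching the stated rule.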

In an embodiment, the generation module 170 may independently generate and manage profile information for the same person, or update the profile for the same person when the profile is changed.

The generation module 170 compares the names in the generated profile information; if the names are the same, it compares the profile information in categories other than the name, and if no other profile information matches, it creates new profile information for a different person with the same name. In addition, in the embodiment, the generation module 170 may determine whether generated profile information belongs to the same person by comparing unique information, such as age and date of birth, in profiles generated with the same name. If the name and unique information match, profile information in the other categories is compared, and if differing profile information exists, the earlier profile can be updated according to the time at which the profile information was generated.
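The same-name handling could be approximated in Python as follows. This sketch assumes profiles are dictionaries with `name`, `generated_at`, and optional unique fields; the field names, the functions `same_person` and `upsert_profile`, and the sample data are hypothetical. It also assumes that two same-name profiles sharing no unique field are treated as different people.

```python
# Unique information named in the description; the exact field set is an assumption.
UNIQUE_FIELDS = ("age", "birth_date")


def same_person(a, b):
    """Treat two same-name profiles as one person if every shared unique field agrees."""
    shared = [f for f in UNIQUE_FIELDS if f in a and f in b]
    return bool(shared) and all(a[f] == b[f] for f in shared)


def upsert_profile(profiles, new):
    """Update the matching profile of the same person, or add a profile for a namesake."""
    for existing in profiles:
        if existing["name"] == new["name"] and same_person(existing, new):
            if new["generated_at"] >= existing["generated_at"]:
                existing.update(new)  # the newer profile wins for differing categories
            else:
                for key, value in new.items():
                    existing.setdefault(key, value)  # older data only fills gaps
            return existing
    profiles.append(dict(new))
    return profiles[-1]


profiles = []
upsert_profile(profiles, {"name": "홍길동", "birth_date": "1980-01-01", "position": "이사", "generated_at": 1})
upsert_profile(profiles, {"name": "홍길동", "birth_date": "1980-01-01", "position": "대표", "generated_at": 2})
upsert_profile(profiles, {"name": "홍길동", "birth_date": "1975-05-05", "generated_at": 3})
print(len(profiles), profiles[0]["position"])  # 2 대표
```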

FIG. 4 is a view for explaining a learning process of the profile information generation server and the learning data of the profile information generation server according to the embodiment.

Referring to FIG. 4, when the sentence (10) "CEO Shin Yong-soo of 3D Eye Pictures, who majored in imaging at university, developed the world's first underwater 3D imaging equipment and completed its patent registration." is input to the server, the server separates the sentence into individual letters, regardless of spaces, words, or morphemes. Then, through semantic analysis of each word, a tag is added to each letter that can indicate profile information. As shown in FIG. 4, the major tag is assigned to the letter '영' of the keyword '영상학' (imaging), and the title tag is assigned to the letter '대' of the keyword '대표' (CEO). The letters, tags, and data shown in table (a) of FIG. 4 are letter-tagged profile preliminary information and are used as learning data for the profile information generation server.
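Letter-level learning data of the kind shown in table (a) of FIG. 4 might be produced from a sentence with annotated keywords as in the Python sketch below. The input sentence is a short hypothetical fragment rather than sentence (10), the `annotations` format and the function `letters_with_tags` are assumptions, and untagged letters are marked with `None`.

```python
def letters_with_tags(sentence, annotations):
    """Split a sentence into letters, ignoring spaces, and tag the letters of annotated keywords.

    annotations: hypothetical (keyword, tag) pairs; every occurrence of a keyword is tagged,
    and all other letters get None.
    """
    tags = [None] * len(sentence)
    for keyword, tag in annotations:
        if not keyword:
            continue  # an empty keyword would match everywhere
        start = sentence.find(keyword)
        while start != -1:
            for i in range(start, start + len(keyword)):
                tags[i] = tag
            start = sentence.find(keyword, start + len(keyword))
    # Spaces are dropped, so letters are the units regardless of word or morpheme boundaries.
    return [(ch, tag) for ch, tag in zip(sentence, tags) if not ch.isspace()]


rows = letters_with_tags("영상학 전공 대표", [("영상학", "major"), ("대표", "title")])
for letter, tag in rows:
    print(letter, tag)
```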

In an embodiment, once profile preliminary information has been generated by tagging each letter, keywords are generated by merging consecutive letters that have the same tag information, and the tag attached to each keyword is separated out as that keyword's category information, producing profile information such as that shown in (b) of FIG. 4.

When valid information such as profile information is extracted by semantic analysis of conventional Korean (Hangul) text, the words produced by a morpheme analyzer are generally used as the semantic units. If the above sentence were used as the input of a morpheme analyzer, '영상학' (imaging) or '대표' (CEO) could be selected as words, and tags such as 'major' and 'position' could be assigned to them. However, tagging morphemes is likely to produce inaccurate profile information, because proper nouns and company names or personal names containing many new words are often not recognized. Because the profile generation server according to the embodiment generates tag information by tagging every letter without using a morpheme analyzer, it can accurately recognize important profile information such as foreign words, proper nouns, and company names or personal names containing many new words.

Hereinafter, a method of generating profile information will be sequentially described. Since the operation (function) of the profile information generation method according to the embodiment is essentially the same as that of the profile information generation server, a description overlapping with FIGS. 1 to 4 will be omitted.

FIG. 5 is a diagram illustrating a data processing flow for automatically generating profile information according to an embodiment.

In step S510, the collection module of the profile auto-generation server periodically collects documents, including articles, columns, and interviews, from a web space including news sites and blogs.

In step S530, the database stores the collected documents together with their sources and web space information, and stores profile generation information including keywords for generating profile information from the documents and tags indicating business information and the information categories that contain the keywords.

In step S550, the extraction module analyzes the sentences included in the document from which profile information is to be extracted, extracts keywords from them, and generates profile preliminary information by tagging the letters constituting each keyword with profile category information.

In step S570, the generation module collects the extracted profile preliminary information, classifies the keywords according to the profile information category, generates profile words by merging consecutively tagged letters, and combines the keywords and profile words to generate profile information.

In step S590, the profile information is displayed according to the categories of the keywords and profile words.

FIG. 6 is a diagram illustrating a data processing process for generating profile preliminary information according to an embodiment.

In step S551, the meanings of the words in a sentence and each word's position in the sentence are analyzed to infer the semantic relationships and correlations between words, and machine learning is performed to extract profile preliminary information.

In step S553, keywords are extracted from the input document according to the result of the machine learning.

In step S555, profile preliminary information is generated by assigning a tag indicating the category or metadata of the keyword to each letter of the extracted keyword.

FIG. 7 is a view for explaining a process of generating profile information according to an embodiment.

Referring to FIG. 7, when the sentence (20) "Jeon Yong-jun, a game caster in professional gaming for more than 10 years, is a pioneer in the field of game casting in Korea and abroad." is input to the server, the server separates the words constituting the sentence, and the letters constituting each word, according to the spacing. Then, through semantic analysis of each word, a tag is added to each letter that can indicate profile information. As shown in FIG. 7, the title tag is assigned to the letter '게' of the keyword '게임캐스터' (game caster), and the last-name tag is assigned to the letter '전' of the name '전용준' (Jeon Yong-jun). Once profile preliminary information has been generated by tagging each letter, keywords are generated by merging consecutive letters with the same tag information, and the tag attached to each keyword is separated out as that keyword's category information, producing the profile information shown in FIG. 7.
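Unlike FIG. 4, the FIG. 7 example splits the sentence by spacing first. A Python sketch of that variant follows. Keeping each letter's word index lets a merge step avoid joining letters across a space; that design choice, the function names, and the sample tags (including `first_name`) are assumptions for illustration.

```python
from itertools import groupby


def split_by_spacing(sentence):
    """Split on whitespace into words, then each word into letters, keeping the word index."""
    return [(index, letter) for index, word in enumerate(sentence.split()) for letter in word]


def merge_within_words(indexed_tags):
    """Merge consecutive same-tag letters of (word_index, letter, tag), never across words."""
    keywords = []
    for (_, tag), run in groupby(indexed_tags, key=lambda item: (item[0], item[2])):
        if tag is not None:
            keywords.append(("".join(item[1] for item in run), tag))
    return keywords


pairs = split_by_spacing("게임캐스터 전용준")
hypothetical_tags = ["title"] * 5 + ["last_name", "first_name", "first_name"]
indexed = [(i, letter, tag) for (i, letter), tag in zip(pairs, hypothetical_tags)]
print(merge_within_words(indexed))  # [('게임캐스터', 'title'), ('전', 'last_name'), ('용준', 'first_name')]
```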

The profile information generation server and method according to the embodiment enable automatic and accurate extraction of profile information, which is important information about people, companies, and products, from various online content.

As the profile data extracted through machine learning accumulates, the accuracy and speed of profile data extraction can be improved.

The disclosed content is only an example and can be variously modified by a person having ordinary skill in the art without departing from the gist of the claims; the scope of protection is not limited to these examples.

The invention automatically extracts profile information, which is important information about people, companies, and products, from various online content, and prevents incorrect profile information from being generated and spread.

Claims

An automatic profile generation server, comprising:

a collection module that periodically collects documents, including articles, columns, and interviews, in a web space including news sites and blogs;

a database that stores the collected documents together with their sources and web space information, and stores profile generation information including keywords for generating profile information from the documents and tags indicating the information categories that contain the keywords and business information;

an extraction module that analyzes sentences included in a document from which profile information is to be extracted, extracts keywords, and generates profile preliminary information by tagging each letter constituting a keyword with tag information, which is profile category information; and

a generation module that collects the extracted profile preliminary information, merges consecutively tagged letters to generate keywords that are profile information, and classifies the keywords and tags to generate the profile information.
The automatic profile generation server of claim 1, wherein the extraction module comprises:

a learning unit that analyzes the meanings of words included in a sentence and their positions in the sentence to infer semantic relationships and correlations between words, and performs machine learning to generate profile preliminary information;

an extraction unit that extracts keywords from an input document according to the machine learning result; and

a tagging unit that assigns a tag indicating a category or metadata of each extracted keyword to each letter included in the keyword.
The automatic profile generation server of claim 1, wherein the generation module comprises:

a generation unit that collects, from the extraction module, the profile preliminary information tagged to each letter and merges consecutive letters tagged with the same tag to generate keywords that are profile information;

a classification unit that classifies the profile information according to the category containing each generated keyword or the keyword's tag information; and

an output unit that displays the classified keywords and the tag information, which is the category of each keyword, in a profile information format.
The automatic profile generation server of claim 1, further comprising:

an operation module that counts the number of times keywords and tags have been extracted from the collected documents and, when the same keyword is repeatedly extracted in the profile information of the same person, calculates the reliability of the extracted keyword.
The automatic profile generation server of claim 1, wherein the automatic profile generation server compares the names in the generated profile information; if the names are the same, compares the profile information other than the name; and, if no matching profile information other than the name exists, generates the profile information as that of a different person with the same name.
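The same-name rule can be sketched as a lookup over stored profiles represented as plain category-to-value dictionaries. The matching criterion used here, that two profiles with the same name belong to the same person if they agree on at least one other category, is an assumption; the claim only states that a separate same-name profile is created when no other profile information matches.

```python
from __future__ import annotations


def find_or_create_profile(profiles: list[dict[str, str]], new: dict[str, str]) -> int:
    """Return the index of the stored profile that `new` belongs to.

    A stored profile matches when it has the same NAME and shares at least one
    other (category, value) pair with `new` (an assumed criterion). If nothing
    matches, `new` is appended as a separate same-name person's profile.
    """
    for index, existing in enumerate(profiles):
        if existing.get("NAME") != new.get("NAME"):
            continue  # different name: not a candidate
        if any(
            category != "NAME" and existing.get(category) == value
            for category, value in new.items()
        ):
            return index
    profiles.append(dict(new))
    return len(profiles) - 1


profiles = [{"NAME": "Kim", "SCHOOL": "A University"}]
print(find_or_create_profile(profiles, {"NAME": "Kim", "SCHOOL": "A University", "POSITION": "CEO"}))  # 0
print(find_or_create_profile(profiles, {"NAME": "Kim", "SCHOOL": "B University"}))  # 1, a namesake
```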
The automatic profile generation server of claim 5, wherein, when profile information generated under the same name is determined to belong to the same person by comparing age and date of birth, and a comparison of the profile information in other categories reveals differing information, the automatic profile generation server updates the previous profile according to the time at which the profile information was created.
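The update rule can be sketched by attaching a creation time to each profile record and letting the newer record's values win once the two records are judged to describe the same person. The category keys (`NAME`, `AGE`, `BIRTH_DATE`) and the exact identity rule (all three present and equal) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

IDENTITY_KEYS = ("NAME", "AGE", "BIRTH_DATE")  # hypothetical category names


@dataclass
class ProfileRecord:
    fields: dict[str, str]  # category -> value
    created_at: datetime    # when this profile information was generated


def is_same_person(a: ProfileRecord, b: ProfileRecord) -> bool:
    """Assumed rule: name, age and date of birth are all present and equal."""
    return all(
        a.fields.get(key) is not None and a.fields.get(key) == b.fields.get(key)
        for key in IDENTITY_KEYS
    )


def update_profile(previous: ProfileRecord, candidate: ProfileRecord) -> ProfileRecord:
    """If both records describe the same person, let the newer record's values win."""
    if not is_same_person(previous, candidate):
        return previous
    older, newer = sorted((previous, candidate), key=lambda record: record.created_at)
    return ProfileRecord({**older.fields, **newer.fields}, newer.created_at)


old = ProfileRecord({"NAME": "Kim", "AGE": "40", "BIRTH_DATE": "1978-03-01", "POSITION": "CTO"}, datetime(2018, 1, 10))
new = ProfileRecord({"NAME": "Kim", "AGE": "40", "BIRTH_DATE": "1978-03-01", "POSITION": "CEO"}, datetime(2018, 9, 5))
print(update_profile(old, new).fields["POSITION"])  # CEO
```

Comparing ages as plain strings ignores that a person's stated age changes between articles written years apart; that refinement is left out of the sketch.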
The automatic profile generation server of claim 1, wherein the database comprises:

A keyword storage unit that classifies keywords according to profile information categories, including education, age, school, department, and position, and stores the classified keyword data; and

A tag storage unit that generates a category or metadata for each keyword as tag information, stores, as profile preliminary information, the letters of each keyword tagged with the corresponding tag information, and stores the tag information.
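One way to lay out the two storage units is a pair of relational tables, sketched here with Python's built-in `sqlite3` module. The table and column names are hypothetical, and the keyword's category is used directly as its tag, which is only one of the options the claim allows (it also mentions metadata).

```python
import sqlite3

# Hypothetical schema for the keyword storage unit and the tag storage unit.
SCHEMA = """
CREATE TABLE keyword (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,  -- e.g. EDUCATION, AGE, SCHOOL, DEPARTMENT, POSITION
    keyword  TEXT NOT NULL,
    UNIQUE (category, keyword)
);
CREATE TABLE tagged_letter (  -- profile preliminary information, one row per letter
    keyword_id INTEGER NOT NULL REFERENCES keyword(id),
    position   INTEGER NOT NULL,  -- index of the letter within the keyword
    letter     TEXT NOT NULL,
    tag        TEXT NOT NULL,
    PRIMARY KEY (keyword_id, position)
);
"""


def store_keyword(conn: sqlite3.Connection, category: str, keyword: str) -> int:
    """Store a classified keyword and its per-letter tags; return the keyword's id."""
    conn.execute(
        "INSERT OR IGNORE INTO keyword (category, keyword) VALUES (?, ?)",
        (category, keyword),
    )
    (keyword_id,) = conn.execute(
        "SELECT id FROM keyword WHERE category = ? AND keyword = ?",
        (category, keyword),
    ).fetchone()
    # Simplification: the category doubles as the tag for every letter.
    conn.executemany(
        "INSERT OR REPLACE INTO tagged_letter (keyword_id, position, letter, tag) "
        "VALUES (?, ?, ?, ?)",
        [(keyword_id, i, letter, category) for i, letter in enumerate(keyword)],
    )
    return keyword_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
kid = store_keyword(conn, "POSITION", "CEO")
print(conn.execute(
    "SELECT letter, tag FROM tagged_letter WHERE keyword_id = ? ORDER BY position", (kid,)
).fetchall())
```

Storing a keyword twice is idempotent here because of the `UNIQUE (category, keyword)` constraint and the composite primary key on the letter table.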
An automatic profile generation method, comprising:

(A) periodically collecting, by a collection module, documents including articles, columns, and interviews from a web space including news sites and blogs;

(B) storing, by a database, the collected documents together with their sources and web space information, and storing profile generation information including keywords for generating profile information from the documents and tags indicating business information and the categories of information that contain the keywords;

(C) analyzing, by an extraction module, the sentences included in a document from which profile information is to be extracted, extracting keywords, and generating profile preliminary information by tagging each letter constituting a keyword with tag information, which is profile category information; and

(D) collecting, by a generation module, the extracted profile preliminary information, merging consecutively tagged letters to generate keywords, which are profile information, and classifying the keywords and tags to generate the profile information.
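Steps (A) and (B) amount to a collection pass that is re-run on a schedule and stores each document with its source and web space. The sketch below assumes hypothetical fetcher callables standing in for site-specific crawlers, and keys the store by URL so that a document seen in an earlier pass is not stored twice. The scheduling itself (for example, a cron job calling `collect` at fixed intervals) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Document:
    url: str        # source of the document
    web_space: str  # e.g. "news site" or "blog"
    kind: str       # e.g. "article", "column", "interview"
    text: str


Fetcher = Callable[[], Iterable[Document]]  # hypothetical crawler interface


def collect(fetchers: Iterable[Fetcher], store: dict[str, Document]) -> int:
    """Run one collection pass; store unseen documents by URL and return how many were added."""
    added = 0
    for fetch in fetchers:
        for doc in fetch():
            if doc.url not in store:
                store[doc.url] = doc
                added += 1
    return added


def demo_fetcher() -> list[Document]:
    """Stand-in for a real crawler; always returns the same single document."""
    return [Document("https://example.com/a", "news site", "interview", "...")]


store: dict[str, Document] = {}
print(collect([demo_fetcher], store), collect([demo_fetcher], store))  # 1 0
```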
The automatic profile generation method of claim 8, wherein step (C) comprises:

Performing semantic analysis of the words included in a sentence and of their positions within the sentence to determine semantic relationships and correlations between the words, and performing machine learning to generate the profile preliminary information;

Extracting keywords from an input document based on the machine learning result; and

Assigning a tag indicating the category or metadata of a keyword to each letter of the extracted keyword.
The automatic profile generation method of claim 8, wherein step (D) comprises:

Collecting, from the extraction module, the profile preliminary information tagged to each letter, and merging consecutive letters bearing the same tag to generate keywords, which are profile information;

Classifying the profile information according to the category containing each generated keyword or according to the keyword's tag information; and

Displaying the classified keywords, together with tag information indicating each keyword's category, according to a profile information format.
The automatic profile generation method of claim 8, further comprising:

(E) counting, by a calculation module, the number of times keywords and tags have been extracted from the collected documents and, when the same keyword is repeatedly extracted for the profile information of the same person, calculating the reliability of that keyword.
The automatic profile generation method of claim 8, wherein step (D) comprises comparing the names in the generated profile information; if the names are the same, comparing the profile information other than the name; and, if no matching profile information other than the name exists, generating the profile information as that of a different person with the same name.
The automatic profile generation method of claim 12, wherein step (D) further comprises, when profile information generated under the same name is determined to belong to the same person by comparing age and date of birth, and a comparison of the profile information in other categories reveals differing information, updating the previous profile according to the time at which the profile information was created.