WO2019112117A1 - Method and computer program for inferring meta information of a text content creator - Google Patents

Method and computer program for inferring meta information of a text content creator

Info

Publication number
WO2019112117A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
meta information
syllable
creator
morpheme
Prior art date
Application number
PCT/KR2018/001409
Other languages
English (en)
Korean (ko)
Inventor
박외진
오성식
오세진
하헌규
Original Assignee
(주)아크릴
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)아크릴
Publication of WO2019112117A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • Embodiments of the present invention relate to a method and computer program for inferring meta information of a text content creator, and more particularly, to a method for inferring meta information of a content creator based on morpheme and syllable of text content.
  • such terminals go beyond the conventional function of simply receiving and displaying information, and are implemented to use various functions such as generating secondary information from received information.
  • the amount of content posted on the web is increasing exponentially, which makes it difficult to grasp the attributes of individual contents (for example, to understand the age and gender of the creator).
  • the present invention intends to deduce the author's meta information based on the morphological characteristics of the text content.
  • the present invention intends to deduce the author's meta information based on the syllable characteristic of the text content, that is, the dialogue (or speech) of the text content, and deduce the author's meta information with higher accuracy.
  • the present invention intends to deduce the author's meta information in consideration of both morphological features and syllable characteristics of text contents.
  • a method for inferring meta information of a text content creator includes: receiving text content; dividing the text content into one or more morphemes, and determining a morpheme vector from the one or more morphemes based on a morpheme-vector converter; dividing the text content into one or more syllables, and determining a syllable vector from the one or more syllables based on a syllable-vector converter; and determining a meta information vector corresponding to the creator's meta information based on the morpheme vector, the syllable vector, and a creator recognizer.
  • the morpheme-vector converter is a data set expressing a correlation between a plurality of morphemes and a plurality of morpheme vectors
  • the syllable-vector converter is a data set expressing a correlation between a plurality of syllables and a plurality of syllable vectors
  • the method of inferring the author's meta information may further include generating one or more divided contents by dividing the text contents into predetermined units after the step of receiving the text contents.
  • determining the morpheme vector comprises determining a morpheme vector for each of the one or more divided contents
  • determining the syllable vector comprises determining a syllable vector for each of the one or more divided contents
  • determining the meta information vector may comprise determining a meta information vector corresponding to the creator's meta information based on the morpheme vector of each of the one or more divided contents, the syllable vector of each of the one or more divided contents, and the creator recognizer.
  • the method for inferring the author's meta information includes: determining meta information of a creator for each of the at least one divided contents based on a meta information vector for each of the at least one divided contents after determining the meta information vector; And determining meta information of the text content creator based on the meta information of the creator of the at least one divided content.
  • the predetermined unit may be a sentence unit.
  • the method of inferring the author's meta information may further include determining meta information of the text content creator based on the determined meta information vector after determining the meta information vector.
  • the meta information of the text content creator may include at least one of the age range of the creator, the gender of the creator, the region related to the creator, the political tendency of the creator, the academic background of the creator, and the marital status of the creator.
  • the step of determining meta information of the creator comprises: determining the age range of the text content creator in the form of a probability that the creator belongs to each of a plurality of candidate age ranges; and determining the gender of the text content creator.
  • the method for inferring the creator's meta information comprises: training the morpheme-vector converter based on a plurality of first learning data including a first test morpheme and a first morpheme vector corresponding to the first test morpheme; training the syllable-vector converter based on a plurality of second learning data including a first test syllable and a first syllable vector corresponding to the first test syllable; and training the creator recognizer based on third learning data including a second morpheme vector, a second syllable vector, and a meta information vector corresponding to the second morpheme vector and the second syllable vector.
  • the determining the meta information vector comprises: combining the morpheme vector and the syllable vector to generate a content vector; And determining the meta information vector based on the content vector and the creator recognizer.
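The claimed steps can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the lookup tables, the averaging of unit vectors, the linear recognizer, and every name (`MORPHEME_TABLE`, `SYLLABLE_TABLE`, `RECOGNIZER`, `infer_meta_vector`) are hypothetical, and the morphemes and syllables are assumed to be supplied already segmented.

```python
# Hypothetical lookup tables standing in for the trained converters.
MORPHEME_TABLE = {"peace": [1.0, 0.0], "help": [0.0, 1.0]}
SYLLABLE_TABLE = {"pe": [0.5, 0.5], "ace": [1.0, 0.0], "help": [0.0, 2.0]}
# Hypothetical recognizer: one weight row per meta information output.
RECOGNIZER = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def average(vectors, dim):
    """Average equal-length vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def to_vector(units, table, dim=2):
    """Look up each known unit and pool the results into a single vector."""
    return average([table[u] for u in units if u in table], dim)


def infer_meta_vector(morphemes, syllables):
    morpheme_vec = to_vector(morphemes, MORPHEME_TABLE)
    syllable_vec = to_vector(syllables, SYLLABLE_TABLE)
    # Combine the two vectors into a content vector (list concatenation).
    content_vec = morpheme_vec + syllable_vec
    # Apply the recognizer as a matrix-vector product (an assumption).
    return [sum(w * x for w, x in zip(row, content_vec)) for row in RECOGNIZER]


meta_vec = infer_meta_vector(["peace", "help"], ["pe", "ace", "help"])
```

Pooling by averaging is just one simple way to turn several unit vectors into a single vector; the source does not say how the converters reduce multiple morphemes or syllables to one vector.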
  • the author's meta information can be inferred based on the morphological characteristics of the text content.
  • the author's meta information can be inferred based on the syllable characteristic of the text content, that is, the dialogue (or speech) of the text content, and the author's meta information can be deduced with higher accuracy.
  • FIG. 1 schematically illustrates a content creator meta information reasoning system according to an embodiment of the present invention.
  • FIG. 2 schematically shows a configuration of a creator meta information inference apparatus according to an embodiment of the present invention.
  • FIGS. 3A to 3C are diagrams for explaining a method by which a creator meta information inference apparatus according to an embodiment of the present invention generates and/or trains a morpheme-vector converter, a syllable-vector converter, and a creator recognizer.
  • FIG. 4 is a diagram for explaining a method of dividing a text content including a plurality of sentences into divided contents according to an embodiment of the present invention.
  • FIG. 5 is a diagram for explaining a method of inferring author's meta information from a text content (or a divided content) according to an embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating a method for inferring meta information of a text content creator by a creator meta information inference apparatus according to an exemplary embodiment of the present invention.
  • FIG. 7 illustrates an example of a screen displayed on a display unit of a user terminal according to an exemplary embodiment of the present invention.
  • a method of inferring meta information of a creator of text content comprises: receiving text content; dividing the text content into one or more morphemes, and determining a morpheme vector from the one or more morphemes based on a morpheme-vector converter; dividing the text content into one or more syllables, and determining a syllable vector from the one or more syllables based on a syllable-vector converter; and determining a meta information vector corresponding to the creator's meta information based on the morpheme vector, the syllable vector, and a creator recognizer.
  • FIG. 1 schematically illustrates a content creator meta information reasoning system according to an embodiment of the present invention.
  • a content creator meta information reasoning system may include a server 100, a user terminal 200, an external device 300, and a communication network 400 connecting them.
  • the content creator meta information reasoning system is a system in which the server 100 receives text content from the user terminal 200 and/or the external device 300 and infers the meta information of the creator of the received content. The system can also allow the server 100 to acquire, from the user terminal 200 and/or the external device 300, content whose creator's meta information is known in advance, and to train the creator recognizer based on that content. A more detailed description will be given later.
  • 'creator's meta information' may mean personal information about the creator, such as the creator's age group, sex, residence area, political inclination, and the like.
  • 'text content' may mean various contents including text in at least a part of the contents.
  • text content may mean content that contains only text.
  • the server 100 according to an embodiment of the present invention can infer the author's meta information by analyzing the entire text content.
  • the text content may also mean content that includes other content, such as images, in addition to text.
  • the server 100 according to an embodiment of the present invention may separate only the text from the content and infer the content creator's meta information therefrom.
  • in the following description, it is assumed for convenience that the text content includes only text.
  • the user terminal 200 may refer to various devices capable of transmitting and receiving the above-described text contents to and from the server 100.
  • the terminal may be the personal computer 202 or the portable terminal 201.
  • the portable terminal 201 is shown as a smart phone, but the spirit of the present invention is not limited thereto.
  • the user terminal 200 may include display means for displaying the content, and input means for obtaining a user's input on the content.
  • the input means and the display means can be configured in various ways.
  • the input means may include, but is not limited to, a keyboard, a mouse, a trackball, a microphone, a button, a touch panel, and the like.
  • the external device 300 may refer to various devices that transmit and receive data to and from the server 100 and / or the user terminal 200 via the communication network 400.
  • the external device 300 may be an apparatus that provides learning data for learning a creator recognizer provided in the server 100.
  • the external device 300 may be a server that provides content (e.g., a comment on a newspaper article or an article) and meta information of a creator of the content (e.g., meta information of a reporter who wrote the article or meta information of a user who wrote the comment) .
  • there may be a single external device 300 or a plurality of external devices 300.
  • the external device 300 may also be a device that transmits, to the server 100, identification information on text content that the external device 300 itself provides to the user terminal 200, and receives the meta information of the creator of that text content in return.
  • the external device 300 may be a service-based server that provides a service for providing meta information about articles for which users want to know the author's meta information.
  • both of the above-described cases are exemplary, and the spirit of the present invention is not limited thereto.
  • the communication network 400 connects the server 100, the user terminal 200, and the external device 300.
  • the communication network 400 provides a connection path so that the user terminal 200 can transmit and receive packet data after connecting to the server 100.
  • the communication network 400 may include, for example, wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Service Digital Networks), and wireless networks such as wireless LANs, CDMA, and Bluetooth, but the scope of the present invention is not limited thereto.
  • the server 100 may receive the text content from the user terminal 200 and / or the external device 300 and infer the meta information of the received content creator.
  • the server 100 may acquire one or more learning contents marked with meta information from the user terminal 200 and / or the external device 300, and train the creator recognizer based thereon.
  • the server 100 may include an author meta information inference device as shown in FIG.
  • FIG. 2 schematically shows a configuration of a creator meta information inference apparatus 110 according to an embodiment of the present invention.
  • the author meta information inference apparatus 110 may include a communication unit 111, a control unit 112, and a memory 113. Also, although not shown in the figure, the author meta information inference apparatus 110 according to the present embodiment may further include an input / output unit, a program storage unit, and the like.
  • the communication unit 111 allows the creator meta information inference apparatus 110 to transmit and receive signals, such as control signals or data signals, through a wired connection with another network apparatus such as the user terminal 200, and may include the hardware and software necessary for this.
  • the control unit 112 may include any type of device capable of processing data, such as a processor.
  • the term 'processor' may mean a data processing device embedded in hardware, for example, having a circuit physically structured to perform a function represented by a code or an instruction contained in the program.
  • examples of a data processing device embedded in hardware include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), and an FPGA (Field Programmable Gate Array), but the scope of the present invention is not limited thereto.
  • the memory 113 performs a function of temporarily or permanently storing the data processed by the author meta-information inferring apparatus 110.
  • the memory 113 may include a magnetic storage medium or a flash storage medium, but the scope of the present invention is not limited thereto.
  • although the creator meta information inference apparatus 110 is described below as being provided in the server 100, it may be provided separately from the server 100 depending on how roles are allocated.
  • the server 100, that is, the creator meta information inference apparatus 110, can receive the text content whose creator's meta information is to be inferred from the user terminal 200 and/or the external apparatus 300, and can infer the meta information of the creator of the text content.
  • the creator meta information inference device 110 may acquire one or more learning contents marked with meta information of the creator from the user terminal 200 and / or the external device 300, and train the creator recognizer based on this.
  • the author meta information inference apparatus 110 generates and / or learns a creator recognizer from one or more learning contents
  • the 'author recognizer' may be a data set representing a correlation between a plurality of morpheme vectors and a plurality of syllable vectors and a plurality of meta information vectors. That is, the author recognizer may represent a correspondence relationship between a plurality of morpheme vectors and a plurality of syllable vectors and a plurality of meta information vectors.
  • the morpheme vector may be a vector generated based on the morphological analysis of the text content in which the author's meta information is to be inferred.
  • the syllable vector may also be a vector generated based on a syllable analysis for the text content for which the author's meta information is to be inferred.
  • the meta information vector is a vector corresponding to the meta information of the creator of the text content, and may be a vector including information on various items (for example, age group, sex, residence area, political orientation, etc.).
  • Such 'author recognizer' may be generated by machine learning based on a plurality of learning data.
  • the learning data may be received from the external device 300 described above.
  • the learning data may be stored in the memory 113 of the creator meta information inference apparatus 110.
  • the author recognizer may be a data set as described above.
  • the author recognizer may be a data set comprising a plurality of numbers, such as a matrix.
  • the present invention is not limited thereto.
  • the 'morpheme-vector converter' may be a data set expressing a correlation between a plurality of morphemes and a plurality of morpheme vectors.
  • the 'syllable-vector converter' may be a data set expressing a correlation between a plurality of syllables and a plurality of syllable vectors.
  • Both the morpheme-vector converter and the syllable-vector converter described above may be generated by machine learning based on a plurality of learning data similar to the author recognizer. For example, in the case of a morpheme-vector converter, it may be generated based on learning data including a morpheme and a morpheme vector corresponding to the morpheme. Similarly, in the case of a syllable-vector converter, it may be generated based on learning data including a certain syllable and a syllable vector corresponding to the syllable.
  • FIGS. 3A through 3C are diagrams illustrating a method by which the creator meta information inference apparatus 110 according to an exemplary embodiment of the present invention generates and/or trains the morpheme-vector converter 520, the syllable-vector converter 620, and the creator recognizer 720.
  • to generate the morpheme-vector converter 520, the controller 112 of the creator meta information inference apparatus 110 may receive and/or obtain a plurality of first learning data 500, each including a first test morpheme and a first morpheme vector corresponding to the first test morpheme.
  • the controller 112 can acquire learning data including a morpheme and a morpheme vector corresponding to the morpheme.
  • each item of the first learning data may include a morpheme and a morpheme vector 511 corresponding to that morpheme.
  • a single morpheme or a plurality of morphemes may correspond to any one morpheme vector.
  • the controller 112 may learn the morphological-vector converter 520 based on the received and / or obtained first learning data 500.
  • the morpheme-vector converter 520 may be a correlation between a plurality of morphemes generated by a machine learning technique and a plurality of morpheme vectors, that is, mapping information between them.
  • the control unit 112 may train the morpheme-vector converter 520 by updating the data set of the morpheme-vector converter 520 so that the morphemes of the first learning data 500 and the morpheme vectors corresponding to those morphemes are mapped to each other.
  • in other words, the control unit 112 can appropriately adjust the coefficients constituting the morpheme-vector converter 520 so that each morpheme of the first learning data 500 is mapped to its corresponding morpheme vector.
  • the control unit 112 can acquire the morpheme-vector converter 520 with improved accuracy by updating the coefficients based on the plurality of learning data.
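The coefficient updating described above might look like the sketch below. It is an assumption-laden illustration, since the source only says that a machine learning technique adjusts the converter's coefficients: here the converter is modeled as one coefficient row per unit, fitted by gradient steps on the squared error against the target vector. The function name `train_converter`, the learning data format, and the hyperparameters are all hypothetical.

```python
def train_converter(learning_data, dim, epochs=200, lr=0.1):
    """Fit one coefficient row per unit so that it approaches its target vector.

    learning_data: list of (unit, target_vector) pairs (hypothetical format).
    """
    table = {}
    for _ in range(epochs):
        for unit, target in learning_data:
            row = table.setdefault(unit, [0.0] * dim)
            # Gradient step on the squared error between the row and the target.
            for i in range(dim):
                row[i] -= lr * 2.0 * (row[i] - target[i])
    return table


# Hypothetical first learning data: (morpheme, morpheme vector) pairs.
first_learning_data = [("peace", [0.2, 0.8]), ("help", [0.9, 0.1])]
converter = train_converter(first_learning_data, dim=2)
```

Because the syllable-vector converter is described in the same terms, the same hypothetical function could be called with (syllable, syllable vector) pairs.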
  • the present invention can make it possible to deduce the author's meta information based on the morphological feature of the text content.
  • to generate the syllable-vector converter 620, the controller 112 may receive and/or obtain a plurality of second learning data 600.
  • the controller 112 can acquire the second learning data 600 including the syllable and the syllable vector corresponding to the syllable.
  • each item of the second learning data may include a syllable and a syllable vector 611 corresponding to that syllable.
  • a single syllable or a plurality of syllables may correspond to any one syllable vector.
  • the controller 112 can learn the syllable-vector converter 620 based on the second learning data 600 received and / or obtained.
  • the syllable-vector converter 620 may be a correlation between a plurality of syllables generated by a machine learning technique and a plurality of syllable vectors, that is, mapping information of both syllables.
  • the control unit 112 may train the syllable-vector converter 620 by updating the data set of the syllable-vector converter 620 so that the syllables of the second learning data 600 and the syllable vectors corresponding to those syllables are mapped to each other.
  • in other words, the controller 112 can appropriately adjust the coefficients constituting the syllable-vector converter 620 so that each syllable of the second learning data 600 is mapped to its corresponding syllable vector.
  • the control unit 112 can acquire the syllable-vector converter 620 with a higher accuracy by updating the coefficients based on the plurality of learning data.
  • accordingly, the creator's meta information can be inferred based on the syllable characteristics of the text content, that is, the dialogue (or speech) of the text content, and can therefore be inferred with higher accuracy.
  • to generate the creator recognizer 720, the controller 112 may receive and/or obtain a plurality of third learning data 700, each including a second morpheme vector, a second syllable vector, and a meta information vector corresponding to the second morpheme vector and the second syllable vector.
  • the controller 112 according to an embodiment of the present invention can acquire learning data including a morpheme vector, a syllable vector, and a meta information vector corresponding to the morpheme vector and the syllable vector.
  • each item of the third learning data may include a morpheme vector Vm2, a syllable vector Vs2, and a meta information vector 711 corresponding thereto.
  • the controller 112 can learn the creator recognizer 720 based on the received and / or obtained third learning data 700.
  • the author recognizer 720 may be a correlation between a plurality of morpheme vectors and a plurality of syllable vectors and a plurality of meta information vectors generated by a machine learning technique, that is, mapping information of both.
  • the control unit 112 may train the creator recognizer 720 by updating the data set of the creator recognizer 720 so that the morpheme vector and syllable vector of the learning data and the corresponding meta information vector are mapped to each other.
  • in other words, the controller 112 can appropriately adjust the coefficients constituting the creator recognizer 720 so that the morpheme vector and syllable vector of the learning data are mapped to the corresponding meta information vector.
  • the control unit 112 can acquire the creator recognizer 720 with improved accuracy by updating the coefficients based on the plurality of learning data.
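One way to picture this training is a softmax classifier over the concatenated morpheme and syllable vectors. The sketch below is an illustration only: the source describes the creator recognizer as a data set of coefficients (such as a matrix) produced by machine learning, but does not name a model, so the softmax regression, the one-hot meta information vectors, and all names and hyperparameters here are assumptions.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, content_vec):
    """Apply the weight matrix to a content vector and normalize the result."""
    scores = [sum(w * x for w, x in zip(row, content_vec)) for row in weights]
    return softmax(scores)


def train_recognizer(third_learning_data, n_classes, epochs=500, lr=0.5):
    """third_learning_data: (morpheme_vec, syllable_vec, one_hot_meta_vec) triples."""
    dim = len(third_learning_data[0][0]) + len(third_learning_data[0][1])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for morpheme_vec, syllable_vec, meta_vec in third_learning_data:
            x = morpheme_vec + syllable_vec  # content vector by concatenation
            probs = predict(weights, x)
            for k in range(n_classes):
                err = probs[k] - meta_vec[k]  # gradient of the cross-entropy loss
                for i in range(dim):
                    weights[k][i] -= lr * err * x[i]
    return weights


# Hypothetical third learning data with two classes.
data = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
]
weights = train_recognizer(data, n_classes=2)
probs = predict(weights, [1.0, 0.0, 1.0, 0.0])
```

After training on this toy data, `probs` assigns most of the probability to the first class for the first training example.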
  • the present invention allows the author's meta information to be inferred in consideration of both morphological features and syllable characteristics of the text content.
  • in this way, the controller 112 can generate and/or train the morpheme-vector converter 520, the syllable-vector converter 620, and the creator recognizer 720 used for inferring the creator's meta information from the text content.
  • the control unit 112 may receive the text content for inferring the author's meta information from the user terminal 200 and / or the external device 300.
  • the text content may include various other kinds of content (e.g., images) in addition to text, as described above.
  • the text content may include only one sentence or may include a plurality of sentences. If the text content includes a plurality of sentences, the control unit 112 according to an embodiment of the present invention may divide the text content into a plurality of contents to infer the author's meta information.
  • FIG. 4 is a diagram for explaining a method of dividing text content 800 including a plurality of sentences into divided contents 810, 820 and 830 according to an embodiment of the present invention.
  • the controller 112 may generate the one or more divided contents 810, 820, and 830 by dividing the text content 800 into predetermined units.
  • the predetermined unit may be a sentence unit, a paragraph unit, or a topic unit.
  • the present invention is not limited thereto.
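Dividing text content into sentence units could be sketched as follows. The splitting rule (whitespace that follows `.`, `!`, or `?`) and the function name `split_into_sentences` are assumptions made for illustration; the source does not specify how sentence boundaries are detected.

```python
import re


def split_into_sentences(text_content):
    """Split on whitespace that follows '.', '!' or '?' (a simplifying assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text_content.strip())
    return [part for part in parts if part]


divided_contents = split_into_sentences("I went home. It was late! Was anyone there?")
```

A rule this simple will mis-split abbreviations and text without terminal punctuation, so a real system would likely need a language-specific sentence segmenter.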
  • FIG. 5 is a diagram for explaining a method by which the controller 112 infers the creator's meta information from the text content 810 (or the divided content) according to an embodiment of the present invention.
  • control unit 112 may receive or acquire text content, and may divide it into predetermined units, if necessary, to generate one or more divided contents.
  • the control unit 112 may divide the text content 810 into one or more morphemes 811 according to an embodiment of the present invention.
  • for example, the control unit 112 may divide the text content into one or more morphemes such as 'Korea', 'peace', 'for the purpose', and 'please help'.
  • the control unit 112 according to an embodiment of the present invention can determine the morpheme vector 821 from one or more morphemes 811 that are divided based on the morpheme-vector converter 520.
  • the morpheme-vector converter 520 may be a correlation between a plurality of morphemes generated by a machine learning technique and a plurality of morpheme vectors, that is, mapping information between them.
  • control unit 112 may input one or more morphemes 811 to the morpheme-vector converter 520 and obtain a morpheme vector 821 corresponding to one or more morphemes 811 as a result.
  • the present invention can deduce the author's meta information based on the morphological characteristics of the text content.
  • the control unit 112 may divide the text content 810 into one or more syllables 812. For example, as in the above-described example, when the text content is 'Please do it for the peace of Korea', the control unit 112 can divide the content into one or more syllables such as 'Daejun', 'Han', and 'Min'.
  • the controller 112 may determine the syllable vector 822 from the one or more divided syllables 812 based on the syllable-vector converter 620.
  • the syllable-vector converter 620 may be a correlation between a plurality of syllables generated by a machine learning technique and a plurality of syllable vectors, that is, mapping information of both syllables.
  • the control unit 112 may input the one or more syllables 812 to the syllable-vector converter 620 and obtain, as a result, a syllable vector 822 corresponding to the one or more syllables 812.
  • the present invention can deduce the author's meta information based on the syllable characteristic of the text content, that is, the dialogue (or speech) of the text content, and can infer the author's meta information with higher accuracy.
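For Korean text, syllable division can be illustrated by treating each precomposed Hangul syllable character as one syllable. This is an assumption, since the source does not state how syllables are delimited; the range used is the Unicode Hangul Syllables block (U+AC00 to U+D7A3), and the function name and the sample string are hypothetical.

```python
def split_into_syllables(text_content):
    """Return the precomposed Hangul syllable characters found in the text."""
    return [ch for ch in text_content if "\uac00" <= ch <= "\ud7a3"]


# Hypothetical sample input; spaces and non-Hangul characters are skipped.
syllables = split_into_syllables("대한민국 평화")
```

Filtering to Hangul drops digits, Latin letters, and punctuation, which may or may not be desirable, as such characters could also carry stylistic signal.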
  • the control unit 112 may determine a meta information vector 830 corresponding to the creator's meta information based on the morpheme vector 821 and the syllable vector 822 determined by the above-described process, and on the creator recognizer 720.
  • as described above, the creator recognizer 720 may be a correlation, generated by a machine learning technique, between a plurality of morpheme vectors and syllable vectors on the one hand and a plurality of meta information vectors on the other, that is, mapping information between the two.
  • the control unit 112 may input the morpheme vector 821 and the syllable vector 822 to the creator recognizer 720 and obtain, as a result, the meta information vector 830 corresponding to the morpheme vector 821 and the syllable vector 822.
  • the controller 112 may generate a content vector by merging the morpheme vector and the syllable vector, and determine the meta information vector 830 based on the generated content vector and the creator recognizer 720.
  • merging vectors may mean generating a new vector whose number of dimensions equals the sum of the dimensions of the two vectors, or it may mean generating a vector of a new dimension (a dimension equal to or less than the sum of the dimensions of the two vectors) through a predetermined operation.
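The two senses of merging can be sketched as follows. Concatenation gives a vector whose dimension is the sum of the two input dimensions. For the lower-dimensional case, a fixed projection matrix is used here purely as an example of a "predetermined operation"; the source does not say which operation is intended, and the function names and the matrix values are hypothetical.

```python
def merge_by_concatenation(morpheme_vec, syllable_vec):
    """Content vector whose dimension is the sum of the two input dimensions."""
    return list(morpheme_vec) + list(syllable_vec)


def merge_by_projection(morpheme_vec, syllable_vec, projection):
    """Concatenate, then reduce the dimension with a (hypothetical) projection matrix."""
    joined = merge_by_concatenation(morpheme_vec, syllable_vec)
    return [sum(w * x for w, x in zip(row, joined)) for row in projection]


vm, vs = [0.1, 0.2, 0.3], [0.4, 0.5]
content_a = merge_by_concatenation(vm, vs)  # 5 dimensions (3 + 2)
projection = [[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]]  # maps 5 dimensions to 2
content_b = merge_by_projection(vm, vs, projection)
```

Concatenation preserves all information from both vectors, while a projection trades some of it for a smaller input to the creator recognizer.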
  • the control unit 112 may perform this series of processes (determining a morpheme vector, a syllable vector, and a meta information vector) on each divided content generated by dividing the text content into predetermined units.
  • the control unit 112 can determine a morpheme vector for each of the one or more divided contents.
  • the control unit 112 may also determine a syllable vector for each of the one or more divided contents.
  • the control unit 112 can determine a meta information vector corresponding to the creator's meta information based on the morpheme vector of each of the one or more divided contents, the syllable vector of each of the one or more divided contents, and the creator recognizer. In this way, the control unit 112 can determine the meta information vector for each divided content.
  • a method of determining author's meta information for the entire contents will be described later.
  • the controller 112 can determine the meta information 840 of the creator of the text content 810 based on the meta information vector 830 determined by the above process.
  • the meta information of the content creator may include at least one of the age range of the creator, the gender of the creator, the region related to the creator, the political tendency of the creator, the academic background of the creator, and the marital status of the creator.
  • the above-mentioned items are exemplary; any item that can serve as meta information about a person can be used as meta information in the present invention.
  • the controller 112 can determine the meta information for each item in the form of probabilities over a plurality of options for that item. For example, the control unit 112 may determine the age range of the text content creator in the form of probabilities that the creator belongs to each of a plurality of candidate age groups (10s, 20s, 30s, 40s, 50s, etc.). Similarly, the control unit 112 can determine the gender of the creator in the form of a probability that the creator is male and a probability that the creator is female.
  • the above-mentioned items of age group and gender are illustrative and not intended to limit the scope of the present invention.
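Expressing each item as probabilities over its options could be sketched as below. The layout of the meta information vector (five raw age-group scores followed by two gender scores) and the use of a softmax per item are assumptions for illustration; the source only says that each item is determined in probability form.

```python
import math

AGE_GROUPS = ["teens", "20s", "30s", "40s", "50s"]  # hypothetical option labels
GENDERS = ["male", "female"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_meta_vector(meta_vec):
    """Split the vector by item and normalize each item separately."""
    age_scores = meta_vec[:len(AGE_GROUPS)]
    gender_scores = meta_vec[len(AGE_GROUPS):]
    return {
        "age": dict(zip(AGE_GROUPS, softmax(age_scores))),
        "gender": dict(zip(GENDERS, softmax(gender_scores))),
    }


meta = decode_meta_vector([0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0])
```

Normalizing each item separately keeps the age-group probabilities and the gender probabilities independent of one another.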
  • when a meta information vector is determined for each of the divided contents, the controller 112 according to an embodiment of the present invention can determine meta information for each divided content.
  • the controller 112 may determine the meta information about the full text content by merging the meta information about each divided content. For example, when the probabilities that the creator belongs to each of a plurality of candidate age groups are determined for each of the divided contents, the control unit 112 can determine the meta information about the full text content based on the sum of the probabilities for each age group across the divided contents (for example, the sum of the probabilities of belonging to the teens).
  • the present invention is not limited thereto.
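The merging of per-content probabilities can be sketched as follows. The summation follows the description; the final renormalisation step and the function name `merge_meta_information` are assumptions added so that the merged values again read as probabilities.

```python
def merge_meta_information(per_content_probs):
    """Sum option probabilities across divided contents, then renormalise."""
    totals = {}
    for probs in per_content_probs:
        for option, p in probs.items():
            totals[option] = totals.get(option, 0.0) + p
    grand_total = sum(totals.values())
    return {option: t / grand_total for option, t in totals.items()}

# Hypothetical age-range probabilities for three divided contents.
per_content_age = [
    {"10s": 0.6, "20s": 0.3, "30s": 0.1},
    {"10s": 0.2, "20s": 0.5, "30s": 0.3},
    {"10s": 0.5, "20s": 0.4, "30s": 0.1},
]
merged_age = merge_meta_information(per_content_age)
most_likely_age = max(merged_age, key=merged_age.get)
```

In this made-up example the second divided content favours the 20s, but the summed evidence across all three still favours the 10s.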
  • the present invention can deduce the author's meta information in consideration of both morphological features and syllable characteristics of the text content.
  • FIG. 6 is a flowchart illustrating a method by which the author meta information inference apparatus 110 infers meta information of a text content creator according to an embodiment of the present invention.
  • the description of the contents overlapping with those described in Figs. 1 to 5 will be omitted.
  • the author meta information inference apparatus 110 can generate and/or train a morpheme-vector converter, a syllable-vector converter, and a creator recognizer (S61).
  • the 'author recognizer' may be a data set representing a correlation between a plurality of morpheme vectors and a plurality of syllable vectors and a plurality of meta information vectors. That is, the author recognizer may represent a correspondence relationship between a plurality of morpheme vectors and a plurality of syllable vectors and a plurality of meta information vectors.
  • the morpheme vector may be a vector generated based on the morphological analysis of the text content in which the author's meta information is to be inferred.
  • the syllable vector may be a vector generated based on a syllable analysis for the text content in which the author's meta information is to be inferred.
  • the meta information vector is a vector corresponding to the meta information of the creator of the text content, and may be a vector including information on various items (for example, age, sex, residence area, etc.).
  • Such 'author recognizer' may be generated by machine learning based on a plurality of learning data.
  • the learning data may be received from the external device 300 described above.
  • the learning data may be stored in the memory 113 of the author meta information inference apparatus 110.
  • the author recognizer may be a data set as described above.
  • the author recognizer may be a data set comprising a plurality of numbers, such as a matrix.
  • the present invention is not limited thereto.
  • the 'morpheme-vector converter' may be a data set expressing a correlation between a plurality of morphemes and a plurality of morpheme vectors.
  • the 'syllable-vector converter' may be a data set expressing a correlation between a plurality of syllables and a plurality of syllable vectors.
  • Both the morpheme-vector converter and the syllable-vector converter described above may be generated by machine learning based on a plurality of learning data similar to the author recognizer. For example, in the case of a morpheme-vector converter, it may be generated based on learning data including a morpheme and a morpheme vector corresponding to the morpheme. Similarly, in the case of a syllable-vector converter, it may be generated based on learning data including a certain syllable and a syllable vector corresponding to the syllable.
  • hereinafter, a method by which the author meta information inference apparatus 110 generates the morpheme-vector converter 520, the syllable-vector converter 620, and the creator recognizer 720 will be described.
  • the author meta information inference apparatus 110 can receive and/or obtain a plurality of first learning data 500, each including a first test morpheme and a first morpheme vector corresponding to the first test morpheme, for generating the morpheme-vector converter 520.
  • the author meta information inference apparatus 110 can acquire learning data including a morpheme and a morpheme vector corresponding to the morpheme.
  • a morpheme and a morpheme vector 511 corresponding to the morpheme may be included in the first learning data 500.
  • one morpheme or a plurality of morphemes may correspond to any one morpheme vector.
  • the author meta information inference apparatus 110 can learn the morpheme-vector converter 520 based on the received and / or obtained first learning data 500.
  • the morpheme-vector converter 520 may be a correlation between a plurality of morphemes generated by a machine learning technique and a plurality of morpheme vectors, that is, mapping information between them.
  • the author meta information inference apparatus 110 can train the morpheme-vector converter 520 by updating the data set of the morpheme-vector converter 520 so that the morphemes of the first learning data 500 and the morpheme vectors corresponding to those morphemes are mapped to each other.
  • for example, the author meta information inference apparatus 110 can adjust the coefficients constituting the morpheme-vector converter 520 so that each morpheme of the first learning data 500 and the morpheme vector corresponding to that morpheme correspond to each other.
  • the author meta information inference apparatus 110 can acquire the morpheme-vector converter 520 with improved accuracy by updating the coefficients based on the plurality of learning data.
  • the present invention can make it possible to deduce the author's meta information based on the morphological feature of the text content.
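The coefficient adjustment described for the morpheme-vector converter can be sketched under strong simplifying assumptions: the converter is taken to be a table with one coefficient row per morpheme, fitted by gradient descent on squared error. The description does not name a learning algorithm, so this is a stand-in, and `train_converter` and the sample data are hypothetical.

```python
def train_converter(learning_data, dim, epochs=200, lr=0.1):
    """Fit one coefficient row per morpheme by gradient descent on squared error."""
    table = {}
    for _ in range(epochs):
        for morpheme, target in learning_data:
            row = table.setdefault(morpheme, [0.0] * dim)
            for i in range(dim):
                # Gradient of 0.5 * (row[i] - target[i]) ** 2 with respect to row[i].
                row[i] -= lr * (row[i] - target[i])
    return table

# Hypothetical first learning data: (morpheme, corresponding morpheme vector) pairs.
first_learning_data = [("Korea", [0.9, 0.1]), ("peace", [0.2, 0.7])]
converter = train_converter(first_learning_data, dim=2)
```

The same loop would apply to the syllable-vector converter with (syllable, syllable vector) pairs as learning data.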
  • the author meta information inference apparatus 110 can receive and/or obtain a plurality of second learning data 600, each including a first test syllable and a first syllable vector corresponding to the first test syllable, for generating the syllable-vector converter 620.
  • the author meta information inference apparatus 110 can acquire second learning data 600 including a syllable and a syllable vector corresponding to the syllable.
  • a syllable and a syllable vector 611 corresponding to the syllable may be included in the second learning data 600.
  • one syllable or a plurality of syllables may correspond to any one syllable vector.
  • the author meta information inference apparatus 110 can learn the syllable-vector converter 620 based on the second learning data 600 received and / or obtained.
  • the syllable-vector converter 620 may be a correlation between a plurality of syllables and a plurality of syllable vectors generated by a machine learning technique, that is, mapping information between them.
  • the author meta information inference apparatus 110 can train the syllable-vector converter 620 by updating the data set of the syllable-vector converter 620 so that the syllables of the second learning data 600 and the syllable vectors corresponding to those syllables are mapped to each other.
  • for example, the author meta information inference apparatus 110 can adjust the coefficients constituting the syllable-vector converter 620 so that each syllable of the second learning data 600 and the syllable vector corresponding to that syllable are mapped to each other.
  • the author meta-information inference apparatus 110 can acquire the syllable-vector converter 620 with improved accuracy by updating the coefficients based on the plurality of learning data.
  • accordingly, the present invention can deduce the author's meta information based on the syllable characteristics of the text content, that is, the dialogue (or speech) of the text content, and can thereby infer the author's meta information with higher accuracy.
  • the author meta information inference apparatus 110 can receive and/or obtain a plurality of third learning data 700, each including a second morpheme vector, a second syllable vector, and a meta information vector corresponding to the second morpheme vector and the second syllable vector, for generating the author recognizer 720.
  • the author meta information inference apparatus 110 can acquire learning data including a morpheme vector, a syllable vector, and a meta information vector corresponding to the morpheme vector and the syllable vector.
  • a morpheme vector Vm2, a syllable vector Vs2, and a meta information vector 711 corresponding thereto may be included.
  • the author meta information inference apparatus 110 can learn the author recognizer 720 based on the received and / or obtained third learning data 700.
  • the author recognizer 720 may be a correlation between a plurality of morpheme vectors and a plurality of syllable vectors on the one hand and a plurality of meta information vectors on the other, generated by a machine learning technique, that is, mapping information between them.
  • the author meta information inference apparatus 110 can train the author recognizer 720 by updating the data set of the author recognizer 720 so that the morpheme vectors and syllable vectors of the learning data and the corresponding meta information vectors are mapped to each other.
  • for example, the author meta information inference apparatus 110 can adjust the values of the coefficients of the creator recognizer 720 so that the morpheme vector and syllable vector of the learning data and the corresponding meta information vector are mapped to each other.
  • the author meta information inference apparatus 110 can acquire the creator recognizer 720 with improved accuracy by updating the coefficients based on the plurality of learning data.
  • the present invention allows the author's meta information to be inferred in consideration of both morphological features and syllable characteristics of the text content.
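Training of the creator recognizer can be sketched as a small softmax regression over the concatenated morpheme and syllable vectors. This model family is an assumption chosen for brevity; the description only says that coefficients are updated by machine learning. The functions `predict` and `train_recognizer`, the two-option target, and all sample values are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, morpheme_vec, syllable_vec):
    """Map a (morpheme vector, syllable vector) pair to option probabilities."""
    features = morpheme_vec + syllable_vec + [1.0]  # trailing 1.0 acts as a bias input
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)

def train_recognizer(learning_data, n_outputs, epochs=500, lr=0.5):
    """Adjust the coefficients so that inputs map to their meta information vectors."""
    n_features = len(learning_data[0][0]) + len(learning_data[0][1]) + 1
    weights = [[0.0] * n_features for _ in range(n_outputs)]
    for _ in range(epochs):
        for morpheme_vec, syllable_vec, target in learning_data:
            features = morpheme_vec + syllable_vec + [1.0]
            probs = predict(weights, morpheme_vec, syllable_vec)
            for k in range(n_outputs):
                error = probs[k] - target[k]  # cross-entropy gradient for score k
                for j in range(n_features):
                    weights[k][j] -= lr * error * features[j]
    return weights

# Hypothetical third learning data: (morpheme vector, syllable vector, meta information vector).
third_learning_data = [
    ([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    ([0.9, 0.2], [1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.1, 0.8], [0.0, 1.0]),
    ([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
]
recognizer = train_recognizer(third_learning_data, n_outputs=2)
probs = predict(recognizer, [0.95, 0.05], [0.9, 0.1])
```

Here the meta information vector has only two entries (one item with two options); a real recognizer would cover every item of meta information.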
  • in this manner, the author meta information inference apparatus 110 can generate and/or train the morpheme-vector converter 520, the syllable-vector converter 620, and the creator recognizer 720.
  • the author meta information inference apparatus 110 can receive the text content for which the author's meta information is to be inferred (S62). At this time, as described above, the text content may be received together with other content (e.g., images, etc.).
  • the text content may include only one sentence or may include a plurality of sentences.
  • the author meta information inference apparatus 110 may divide the text content into a plurality of contents and infer the author's meta information.
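Dividing text content into a plurality of contents can be sketched as below, assuming the unit of division is the sentence. The description only requires some predetermined unit, so the sentence boundary rule and the name `divide_content` are illustrative.

```python
import re

def divide_content(text):
    """Split on sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

divided = divide_content("Please do it for peace. Is it possible? Yes!")
```

Each element of `divided` would then go through the morpheme, syllable, and meta information vector steps on its own.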
  • the author meta information inference apparatus 110 can divide the text content 810 into one or more morphemes 811 and determine a morpheme vector 821 from the divided morphemes 811 (S63).
  • for example, when the text content is 'please do it for the peace of Korea', the author meta information inference apparatus 110 may divide the text content into one or more morphemes such as 'Korea', 'peace', 'for the purpose', etc., and the morpheme vector 821 can be determined based thereon.
  • the morpheme-vector converter 520 may be a correlation between a plurality of morphemes generated by a machine learning technique and a plurality of morpheme vectors, that is, mapping information between them.
  • the author meta information inference apparatus 110 may input the one or more morphemes 811 into the morpheme-vector converter 520 and obtain, as a result, a morpheme vector 821 corresponding to the one or more morphemes 811.
  • the present invention can deduce the author's meta information based on the morphological characteristics of the text content.
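Step S63 can be sketched as a lookup followed by pooling. The morphemes are assumed to be already separated by some morphological analyzer, and averaging is an assumed way of combining several morpheme vectors into one; the description does not state how multiple morphemes are combined. The name `morpheme_vector` and the sample converter are hypothetical.

```python
def morpheme_vector(morphemes, converter, dim):
    """Average the vectors of known morphemes; unknown morphemes are skipped."""
    known = [converter[m] for m in morphemes if m in converter]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical trained converter: morpheme -> morpheme vector.
converter = {"Korea": [1.0, 0.0], "peace": [0.0, 1.0]}
vm = morpheme_vector(["Korea", "peace", "for"], converter, dim=2)
```

The morpheme 'for' has no entry in this toy converter, so only the two known morphemes contribute to the result.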
  • the author meta information inference apparatus 110 can divide the text content 810 into one or more syllables 812 and determine a syllable vector 822 from the one or more divided syllables 812 (S64).
  • for example, the author meta information inference apparatus 110 may divide the text content into one or more syllables such as 'Dae', 'Han', 'Min', 'Guk', etc., and the syllable vector 822 may be determined based thereon.
  • the syllable-vector converter 620 may be a correlation between a plurality of syllables and a plurality of syllable vectors generated by a machine learning technique, that is, mapping information between them.
  • the author meta information inference apparatus 110 may input the one or more syllables 812 into the syllable-vector converter 620 and thereby obtain a syllable vector 822 corresponding to the one or more syllables 812.
  • the present invention can deduce the author's meta information based on the syllable characteristic of the text content, that is, the dialogue (or speech) of the text content, and can infer the author's meta information with higher accuracy.
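For Korean text, the syllable division in step S64 can be sketched by relying on the fact that each precomposed Hangul syllable is a single Unicode character in the range U+AC00 to U+D7A3. Treating one such character as one syllable, and skipping everything else, is an assumption of this sketch; `split_syllables` is a hypothetical name.

```python
def split_syllables(text):
    """Return the precomposed Hangul syllables in text, in order."""
    return [ch for ch in text if "\uac00" <= ch <= "\ud7a3"]

syllables = split_syllables("대한민국 평화")
```

Spaces, punctuation, and non-Hangul characters are dropped here; a fuller implementation would need a policy for them.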
  • the author meta information inference apparatus 110 can determine a meta information vector 830 corresponding to the creator's meta information based on the morpheme vector 821 and the syllable vector 822 determined by the above-described process and on the creator recognizer 720 (S65).
  • the author recognizer 720 may be a correlation between a plurality of morpheme vectors and a plurality of syllable vectors on the one hand and a plurality of meta information vectors on the other, generated by a machine learning technique, that is, mapping information between them.
  • the author meta information inference apparatus 110 can input the morpheme vector 821 and the syllable vector 822 to the author recognizer 720 and obtain, as a result, the meta information vector 830 corresponding to the morpheme vector 821 and the syllable vector 822. Meanwhile, the author meta information inference apparatus 110 according to an embodiment of the present invention may generate a content vector by merging the morpheme vector and the syllable vector, and may determine the meta information vector 830 based on the generated content vector and the creator recognizer 720.
  • merging vectors may mean generating a new vector whose number of dimensions equals the sum of the dimensions of the two vectors (for example, by concatenation), or generating a vector of a new dimension (equal to or less than the sum of the dimensions of the two vectors) through a predetermined operation.
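The two ways of merging vectors can be sketched as follows. Concatenation gives a vector whose dimension is the sum of the two input dimensions; for the second case, a fixed linear projection is used here as one possible "predetermined operation", which is an assumption. The names `concatenate` and `project` and the matrix values are hypothetical.

```python
def concatenate(vm, vs):
    """Content vector whose dimension is len(vm) + len(vs)."""
    return list(vm) + list(vs)

def project(vector, matrix):
    """Reduce the dimension by multiplying with a matrix that has fewer rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

content = concatenate([0.2, 0.8], [0.5, 0.1, 0.4])               # 5 dimensions
reduced = project(content, [[1, 0, 1, 0, 0], [0, 1, 0, 1, 1]])   # 2 dimensions
```

In a trained system the projection coefficients would be learned together with the creator recognizer rather than fixed by hand.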
  • the author meta information inference apparatus 110 can perform the same series of processes (determining a morpheme vector, a syllable vector, and a meta information vector) on each divided content generated by dividing the text content into predetermined units.
  • the author meta information inference apparatus 110 may determine a morpheme vector for each of the one or more divided contents.
  • the author meta information inference apparatus 110 may also determine a syllable vector for each of the one or more divided contents.
  • the author meta information inference apparatus 110 can also determine a meta information vector corresponding to the creator's meta information based on the morpheme vector of each of the one or more divided contents, the syllable vector of each of the one or more divided contents, and the creator recognizer. In this manner, the author meta information inference apparatus 110 can determine a meta information vector for each divided content. A method of determining the author's meta information for the entire content will be described later.
  • the author meta information inference apparatus 110 can determine the meta information 840 of the creator of the text content 810 based on the meta information vector 830 determined by the above-described process (S66).
  • the meta information of the content creator may include at least one of the age range of the creator, the gender of the creator, the region related to the creator, the political tendency of the creator, the academic background of the creator, and the marital status of the creator.
  • the above-mentioned items are exemplary; any item that can serve as meta information of a person can be used as meta information in the present invention.
  • the author meta information inference apparatus 110 can determine the meta information for each item as a probability over a plurality of options for that item. For example, the author meta information inference apparatus 110 may determine the age range of the text content creator as the probability that the creator belongs to each of a plurality of candidate age groups (10s, 20s, 30s, 40s, 50s, etc.). Similarly, the author meta information inference apparatus 110 can determine the creator's gender as a probability that the creator is male and a probability that the creator is female.
  • the above-mentioned items of age group and gender are illustrative and not intended to limit the scope of the present invention.
  • the author meta information inference apparatus 110 can determine meta information about each divided content.
  • the author meta information inference apparatus 110 may determine meta information about the entire text content by merging the meta information about each divided content. For example, when the probability that the creator belongs to each of a plurality of candidate age groups is determined for each divided content, the author meta information inference apparatus 110 may determine the meta information for the full text content based on the sum of the probabilities for each age group (for example, the sum of the probabilities of belonging to the teens, etc.).
  • the present invention is not limited thereto.
  • the present invention can deduce the author's meta information in consideration of both morphological features and syllable characteristics of the text content.
  • FIG. 7 is an illustration of a screen 900 displayed on the display unit of the user terminal 200 according to an embodiment of the present invention.
  • the server 100 can provide a service that provides the author's meta information for text content input by the user through the user terminal 200, and the user can access the server 100 and be presented with a screen 900 as shown in Fig. 7.
  • the server 100 can provide the inferred meta information of the content creator to the user terminal 200 together with the screen 900.
  • the screen 900 may include a region 910 in which the content input by the user is displayed, a region 920 in which the gender of the inferred creator is displayed in the form of a probability, and a region 930 in which the age range of the inferred creator is displayed in the form of a probability.
  • the embodiments of the present invention described above can be embodied in the form of a computer program that can be executed on various components on a computer, and the computer program can be recorded on a computer-readable medium.
  • the medium may store a computer-executable program. Examples of the medium include magnetic media such as a hard disk, a floppy disk, and a magnetic tape; optical recording media such as CD-ROM and DVD; magneto-optical media such as a floptical disk; and media configured to store program instructions, including ROM, RAM, flash memory, and the like.
  • the computer program may be designed and configured specifically for the present invention or may be known and used by those skilled in the computer software field.
  • Examples of computer programs may include machine language code such as those produced by a compiler, as well as high-level language code that may be executed by a computer using an interpreter or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

According to one embodiment, the present invention relates to a method for inferring meta information of a text content creator, and the method may comprise the steps of: receiving text content; dividing the text content into one or more morphemes, and determining a morpheme vector from the divided morpheme(s) based on a morpheme-vector converter; dividing the text content into one or more syllables, and determining a syllable vector from the divided syllable(s) based on a syllable-vector converter; and determining a meta information vector corresponding to meta information of a creator based on the morpheme vector, the syllable vector, and a creator recognizer.
PCT/KR2018/001409 2017-12-05 2018-02-01 Procédé et programme informatique pour inférer des méta-informations d'un créateur de contenu textuel WO2019112117A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2017-0166201 2017-12-05
KR1020170166201A KR101985900B1 (ko) 2017-12-05 2017-12-05 텍스트 콘텐츠 작성자의 메타정보를 추론하는 방법 및 컴퓨터 프로그램

Publications (1)

Publication Number Publication Date
WO2019112117A1 true WO2019112117A1 (fr) 2019-06-13

Family

ID=66750524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/001409 WO2019112117A1 (fr) 2017-12-05 2018-02-01 Procédé et programme informatique pour inférer des méta-informations d'un créateur de contenu textuel

Country Status (2)

Country Link
KR (1) KR101985900B1 (fr)
WO (1) WO2019112117A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144575A (zh) * 2019-12-05 2020-05-12 支付宝(杭州)信息技术有限公司 舆情预警模型的训练方法、预警方法、装置、设备及介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102488620B1 (ko) * 2021-12-17 2023-01-18 주식회사 텐 자산화 된 학습 데이터 및 자산화 된 인공 신경망 데이터와 관련된 수익을 분배하는 방법
KR102488619B1 (ko) * 2021-12-17 2023-01-18 주식회사 텐 학습된 인공 신경망을 자산화 하는 방법
KR102488618B1 (ko) * 2021-12-17 2023-01-18 주식회사 텐 인공 신경망의 학습을 위한 학습 데이터를 자산화 하는 방법

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005025602A (ja) * 2003-07-04 2005-01-27 Matsushita Electric Ind Co Ltd 文章・言語生成装置およびその選択方法
KR20090034052A (ko) * 2007-10-02 2009-04-07 동국대학교 산학협력단 감정정보 추출 장치 및 방법
KR20110062896A (ko) * 2009-12-04 2011-06-10 한국과학기술원 지역 정보 검색 장치 및 방법
KR20120109943A (ko) * 2011-03-28 2012-10-09 가톨릭대학교 산학협력단 문장에 내재한 감정 분석을 위한 감정 분류 방법
KR20130036863A (ko) * 2011-10-05 2013-04-15 (주)워드워즈 의미적 자질을 이용한 문서 분류 시스템 및 그 방법

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100701044B1 (ko) * 2004-07-20 2007-03-29 황상석 온라인망을 기반으로 하는 위급상황 처리 시스템
TWI280029B (en) * 2004-10-27 2007-04-21 Inst Information Industry Method and system for data authorization and mobile device using the same
KR101178068B1 (ko) * 2005-07-14 2012-08-30 주식회사 케이티 텍스트의 카테고리 분류 장치 및 그 방법
EP2864938A1 (fr) * 2012-06-21 2015-04-29 Thomson Licensing Procédé et appareil de déduction de données démographiques d'utilisateurs
US20150379092A1 (en) * 2014-06-26 2015-12-31 Hapara Inc. Recommending literacy activities in view of document revisions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEONG SIK OH: "Jonathan Brain", KOREAN INFORMATION PROCESSING SYSTEM COMPETITION, 13 October 2017 (2017-10-13) *

Also Published As

Publication number Publication date
KR101985900B1 (ko) 2019-09-03

Similar Documents

Publication Publication Date Title
WO2019112117A1 (fr) Procédé et programme informatique pour inférer des méta-informations d'un créateur de contenu textuel
WO2017150860A1 (fr) Prédiction de saisie de texte sur la base d'informations démographiques d'utilisateur et d'informations de contexte
WO2019117466A1 (fr) Dispositif électronique pour analyser la signification de la parole, et son procédé de fonctionnement
WO2020246702A1 (fr) Dispositif électronique et procédé de commande de dispositif électronique associé
WO2020213842A1 (fr) Structures multi-modèles pour la classification et la détermination d'intention
WO2018097439A1 (fr) Dispositif électronique destiné à la réalisation d'une traduction par le partage d'un contexte d'émission de parole et son procédé de fonctionnement
WO2021091022A1 (fr) Système d'apprentissage automatique et procédé de fonctionnement pour système d'apprentissage automatique
WO2020130447A1 (fr) Procédé de fourniture de phrases basé sur un personnage et dispositif électronique de prise en charge de ce dernier
WO2019050137A1 (fr) Système et procédé pour déterminer des caractères d'entrée sur la base d'une entrée par balayage
WO2021029643A1 (fr) Système et procédé de modification d'un résultat de reconnaissance vocale
WO2018124464A1 (fr) Dispositif électronique et procédé de fourniture de service de recherche de dispositif électronique
WO2021246812A1 (fr) Solution et dispositif d'analyse de niveau de positivité d'actualités utilisant un modèle nlp à apprentissage profond
WO2019107674A1 (fr) Appareil informatique et procédé d'entrée d'informations de l'appareil informatique
WO2021096150A1 (fr) Procédé de génération d'un module de leçon personnalisé pour l'apprentissage d'une langue
WO2020209661A1 (fr) Dispositif électronique de génération d'une réponse en langage naturel et procédé associé
WO2023121165A1 (fr) Procédé de génération de modèle qui prédit une corrélation entre des entités comprenant une maladie, un gène, un matériel et un symptôme à partir de données de document et qui délivre un texte d'argument d'unité et système utilisant ledit procédé
WO2018191889A1 (fr) Procédé et appareil de traitement de photo, et dispositif informatique
WO2020076086A1 (fr) Système de traitement d'énoncé d'utilisateur et son procédé de fonctionnement
WO2020171545A1 (fr) Dispositif électronique et système de traitement de saisie d'utilisateur et procédé associé
WO2020213885A1 (fr) Serveur et son procédé de commande
WO2022114451A1 (fr) Procédé de formation de réseau de neurones artificiel et procédé d'évaluation de la prononciation l'utilisant
WO2019117567A1 (fr) Procédé et appareil de gestion de navigation de contenu web
WO2024025039A1 (fr) Système et procédé permettant d'analyser l'état d'un utilisateur dans un métavers
WO2021020727A1 (fr) Dispositif électronique et procédé d'identification de niveau de langage d'objet
WO2023085736A1 (fr) Dispositif électronique et procédé de commande de dispositif électronique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18886944; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18886944; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 22.01.2021 DATED 25.01.2021))
122 Ep: pct application non-entry in european phase (Ref document number: 18886944; Country of ref document: EP; Kind code of ref document: A1)