WO2019194343A1 - Mobile apparatus and method for classifying a sentence into a plurality of classes - Google Patents

Mobile apparatus and method for classifying a sentence into a plurality of classes

Info

Publication number
WO2019194343A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
classes
natural
language
processed
Prior art date
Application number
PCT/KR2018/004623
Other languages
English (en)
Inventor
Ji Hun Park
Original Assignee
Phill It Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Phill It Co., Ltd. filed Critical Phill It Co., Ltd.
Publication of WO2019194343A1 publication Critical patent/WO2019194343A1/fr

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/157 Transformation using dictionaries or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • One or more embodiments relate to mobile apparatuses and methods of classifying a sentence into a plurality of classes.
  • One or more embodiments include mobile apparatuses and methods of classifying a sentence into a plurality of classes, which may increase the accuracy of class classification of an input sentence and give a discrimination to a score of each classified class.
  • a method of classifying a sentence into a plurality of classes includes: performing natural language processing on a first sentence; extracting a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence; detecting a similar word corresponding to the extracted keyword by using a word embedding model; generating a second sentence by merging the detected similar word into the natural-language-processed first sentence; and classifying the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • a mobile apparatus for classifying a sentence into a plurality of classes includes: a user interface device; a memory storing a computer-executable instruction; and a processor executing the computer-executable instruction to perform natural language processing on a first sentence input through the user interface device, extract a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence, detect a similar word corresponding to the extracted keyword by using a word embedding model, generate a second sentence by merging the detected similar word into the natural-language-processed first sentence, and classify the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • a non-transitory computer-readable storage medium having stored therein processor-executable instructions includes: instructions for performing natural language processing on a first sentence; instructions for extracting a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence; instructions for detecting a similar word corresponding to the extracted keyword by using a word embedding model; instructions for generating a second sentence by merging the detected similar word into the natural-language-processed first sentence; and instructions for classifying the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • FIG. 1 is a block diagram illustrating a configuration of a mobile apparatus for classifying a sentence into a plurality of classes according to an embodiment
  • FIG. 2 is a diagram illustrating a process of classifying a sentence into a plurality of classes according to an embodiment
  • FIG. 3 is a diagram illustrating a process of generating a second sentence from a first sentence in a process of classifying a sentence into a plurality of classes according to an embodiment
  • FIG. 4 is a diagram illustrating a result of classifying each of a first sentence and a second sentence into a plurality of classes according to an embodiment
  • FIG. 5 is a diagram illustrating a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer according to an embodiment
  • FIG. 6 is a diagram illustrating a process of classifying a sentence into a plurality of classes by using each of convolutional neural network-based text classification models using each of different functions as an activation function of an output layer according to an embodiment
  • FIG. 7 is a diagram illustrating a result of classifying a sentence into a plurality of classes by using each of convolutional neural network-based text classification models using each of different functions as an activation function of an output layer according to an embodiment
  • FIG. 8 is a flowchart illustrating a method of classifying a sentence into a plurality of classes according to an embodiment.
  • a method of classifying a sentence into a plurality of classes includes: performing natural language processing on a first sentence; extracting a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence; detecting a similar word corresponding to the extracted keyword by using a word embedding model; generating a second sentence by merging the detected similar word into the natural-language-processed first sentence; and classifying the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • the present embodiments relate to mobile apparatuses and methods of classifying a sentence into a plurality of classes, and detailed descriptions of matters widely known to those of ordinary skill in the art are omitted herein.
  • FIG. 1 is a block diagram illustrating a configuration of a mobile apparatus 100 for classifying a sentence into a plurality of classes according to an embodiment.
  • the mobile apparatus 100 for classifying a sentence into a plurality of classes may include a memory 110, a processor 120, and a user interface device 130.
  • the processor 120 may include, for example, a central processing unit (CPU).
  • the mobile apparatus 100 may be an electronic apparatus such as a smart phone, a tablet PC, or a laptop computer that may mount an operating system (OS) and execute an application installed therein to display a processing result according to a user input.
  • the application may be a term collectively referring to an application program or a mobile application. A user may select and execute an application to be executed among various types of applications installed in the mobile apparatus 100.
  • the memory 110 may store software and/or programs.
  • the memory 110 may store various types of data and programs such as an application and an application programming interface (API).
  • the processor 120 may access and use data stored in the memory 110 or may store new data in the memory 110. Also, the processor 120 may execute a program installed in the memory 110. Also, the processor 120 may install an application received from outside in the memory 110.
  • the processor 120 may include at least one processing module.
  • the processor 120 may control other components included in the mobile apparatus 100 to perform an operation corresponding to a user input received through the user interface device 130.
  • the user interface device 130 may receive a user input or the like from the user.
  • the user interface device 130 may display information such as an application execution result, a processing result corresponding to a user input, and a state of the mobile apparatus 100 in the mobile apparatus 100.
  • the user interface device 130 may include hardware units for receiving an input from the user or providing an output from the mobile apparatus 100 and may include a dedicated software module for driving the hardware units.
  • the user interface device 130 may be a touch screen but is not limited thereto.
  • the memory 110 may store instructions executable by the processor 120.
  • the processor 120 may execute the instructions stored in the memory 110.
  • the processor 120 may execute the application installed in the mobile apparatus 100 according to a user input.
  • the processor 120 may perform a preprocessing process such as natural language processing on a first sentence.
  • the first sentence may be input through the user interface device 130.
  • the processor 120 may perform natural language processing by extracting a prototype of each of the words constituting the first sentence and removing stop words and non-keywords having an importance lower than a predetermined standard based on a term frequency and an inverse document frequency from the first sentence including the extracted prototypes.
  • the prototype of each word may be extracted according to lemmatization.
  • the stop words may be, for example, articles, prepositions, or conjunctions that are included in a sentence but do not convey meaning in the sentence.
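  • As an illustration of this preprocessing step, the minimal sketch below lemmatizes the words of an input sentence and removes stop words. NLTK is assumed as the toolkit only for illustration; the patent does not name an implementation, and the importance-based non-keyword filtering is handled by the TF-IDF step that follows.

```python
# Minimal preprocessing sketch. NLTK is an assumed toolkit (the patent does not
# name one); requires nltk.download("punkt"), nltk.download("wordnet"),
# and nltk.download("stopwords") to have been run once.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

def preprocess(first_sentence: str) -> list[str]:
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words("english"))
    tokens = word_tokenize(first_sentence.lower())
    # Extract the base form ("prototype") of each word via lemmatization,
    # then drop stop words such as articles, prepositions, and conjunctions.
    lemmas = [lemmatizer.lemmatize(tok) for tok in tokens if tok.isalpha()]
    return [tok for tok in lemmas if tok not in stop_words]

print(preprocess("I like to play Python with a computer"))
# -> ['like', 'play', 'python', 'computer']
```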
  • a Term Frequency-Inverse Document Frequency (TF-IDF) score may be a statistical value indicating how important a word is in a sentence or document, and may be used to extract a keyword from a sentence or document.
  • the TF-IDF score may be the product of a term frequency and an inverse document frequency.
  • the processor 120 may extract a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence. For example, the processor 120 may extract a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence based on a term frequency and an inverse document frequency. Based on the TF-IDF scores, the processor 120 may extract, as keywords, the top N words ranked by score or the words whose TF-IDF scores exceed a predetermined threshold value, as sketched below.
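  • A possible realization of this keyword-extraction step is sketched below with scikit-learn's TfidfVectorizer; the reference corpus, the value of N, and the threshold are illustrative placeholders, since the text only states that the TF-IDF score (the product of a term frequency and an inverse document frequency) drives the selection.

```python
# TF-IDF keyword extraction sketch using scikit-learn (an assumed library).
# TF-IDF(t, d) = tf(t, d) * idf(t); the corpus, N, and threshold below are
# illustrative placeholders, not values taken from the patent.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "play computer python",                  # preprocessed first sentence
    "walk the dog in the park every day",    # reference documents for the IDF term
    "cook dinner and wash the dishes",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)
terms = vectorizer.get_feature_names_out()

# TF-IDF scores of the words in the first (target) sentence.
scores = {t: s for t, s in zip(terms, tfidf_matrix[0].toarray()[0]) if s > 0}

top_n_keywords = sorted(scores, key=scores.get, reverse=True)[:2]
above_threshold = [t for t, s in scores.items() if s >= 0.5]
print(top_n_keywords, above_threshold)   # e.g. ['computer', 'play'] and all words above 0.5
```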
  • the processor 120 may detect a similar word corresponding to the extracted keyword by using a word embedding model.
  • the processor 120 may detect, by using a word embedding model, similar words corresponding either to all of the extracted keywords or only to the keywords belonging to certain groups when the extracted keywords are grouped according to importance.
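  • The similar-word lookup could look like the sketch below, which uses gensim's word2vec as one possible word embedding model; the toy training corpus and the number of neighbours are assumptions, and in practice the embedding model received from the server 200 would simply be loaded.

```python
# Similar-word detection sketch. gensim's word2vec is an assumed choice of
# word embedding model; the toy corpus and topn value are placeholders.
from gensim.models import Word2Vec

sentences = [
    ["computer", "python", "programming", "language"],
    ["computer", "coding", "programming"],
    ["python", "programming", "language", "coding"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

def similar_words(keyword: str, topn: int = 3) -> list[str]:
    # Nearest neighbours of the keyword in the embedding space.
    return [word for word, _ in model.wv.most_similar(keyword, topn=topn)]

print(similar_words("python"))   # e.g. ['programming', 'language', 'coding'] (toy model; order may vary)
```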
  • the processor 120 may generate a second sentence by merging the detected similar word into the natural-language-processed first sentence.
  • the processor 120 may generate the second sentence by merging similar words, prioritized by similarity among the detected similar words, at positions between keywords or at a position next to a keyword of the natural-language-processed first sentence.
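  • One way the merging step could be implemented is sketched below: the highest-priority similar words are inserted immediately after their keywords in the preprocessed first sentence (insertion between keywords would work analogously). The insertion positions and the per-keyword limit are assumptions, as the description above leaves them open.

```python
# Second-sentence generation sketch: insert each keyword's highest-priority
# similar words right after the keyword in the preprocessed first sentence.
# The insertion position and the per-keyword limit are illustrative assumptions.
def generate_second_sentence(
    processed_tokens: list[str],       # e.g. ["play", "computer", "python"]
    similar: dict[str, list[str]],     # keyword -> similar words, ordered by similarity
    per_keyword: int = 2,
) -> str:
    merged: list[str] = []
    for token in processed_tokens:
        merged.append(token)
        for candidate in similar.get(token, [])[:per_keyword]:
            if candidate not in merged:    # skip words already in the sentence
                merged.append(candidate)
    return " ".join(merged)

print(generate_second_sentence(
    ["play", "computer", "python"],
    {"python": ["programming", "language"]},   # only some detected similar words are merged
))
# -> "play computer python programming language" (cf. the FIG. 3 example)
```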
  • the processor 120 may classify the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • the processor 120 may calculate a binary-classification-style score for each of the plurality of classes based on the sigmoid function and acquire normalized scores through normalization with respect to the scores of the classes prioritized based on those scores.
  • the processor 120 may provide the classes having the priority and the normalized score of each of the classes having the priority together.
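  • The scoring and normalization described above can be sketched as follows: each class receives an independent sigmoid (binary-classification-style) score, the upper n classes are kept, and their scores are normalized. The normalization formula is not spelled out in the text; a softmax over the upper n raw scores is assumed here because it reproduces the normalized values shown later for FIG. 7.

```python
# Per-class sigmoid scoring + top-n normalization sketch. The patent does not
# give the normalization formula; a softmax over the top-n raw scores is
# assumed here because it reproduces the normalized values shown in FIG. 7.
import math

def top_n_normalized(class_scores: dict[str, float], n: int = 3) -> dict[str, float]:
    top = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    exps = [math.exp(score) for _cls, score in top]
    total = sum(exps)
    return {cls: e / total for (cls, _score), e in zip(top, exps)}

# Raw per-class sigmoid scores from the FIG. 7 example (the "pet" score is made up).
sigmoid_scores = {"computer": 0.9161, "hobby": 0.8034, "sports": 0.5326, "pet": 0.1102}
print(top_n_normalized(sigmoid_scores))
# -> {'computer': 0.3883..., 'hobby': 0.3469..., 'sports': 0.2646...}
```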
  • the processor 120 may determine whether a predetermined condition for classification into the plurality of classes is satisfied, based on a length of the natural-language-processed first sentence and a score based on a term frequency and an inverse document frequency with respect to the natural-language-processed first sentence. As a result of the determination, the processor 120 may classify the natural-language-processed first sentence into a plurality of classes by using the text classification model when the predetermined condition is satisfied, and classify the second sentence into a plurality of classes by using the text classification model when the predetermined condition is not satisfied.
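  • A rough sketch of this branching logic follows. The concrete thresholds, and the assumption that the condition being satisfied means the first sentence is already long and informative enough, are illustrative guesses; the text only names sentence length and the TF-IDF-based score as the criteria.

```python
# Branching sketch: classify the preprocessed first sentence directly when it
# is judged long/informative enough, otherwise classify the augmented second
# sentence. Thresholds and the direction of the test are assumed values.
MIN_TOKENS = 5         # assumed minimum token count
MIN_TFIDF_TOTAL = 1.0  # assumed minimum aggregate TF-IDF score

def choose_sentence_to_classify(
    processed_tokens: list[str],
    tfidf_scores: dict[str, float],
    second_sentence: str,
) -> str:
    total_tfidf = sum(tfidf_scores.get(tok, 0.0) for tok in processed_tokens)
    condition_satisfied = len(processed_tokens) >= MIN_TOKENS and total_tfidf >= MIN_TFIDF_TOTAL
    return " ".join(processed_tokens) if condition_satisfied else second_sentence

# classes = text_classification_model(choose_sentence_to_classify(tokens, scores, second))
# 'text_classification_model' is a hypothetical handle to the CNN model described below.
```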
  • the mobile apparatus 100 may perform wired/wireless communication with another device or network.
  • the mobile apparatus 100 may include a communication module that supports at least one of various wired/wireless communication methods.
  • the mobile apparatus 100 may be connected to an external device located outside the mobile apparatus 100 to transmit/receive signals or data thereto/therefrom.
  • the mobile apparatus 100 may receive, from a server 200, a word embedding model or a text classification model trained in the server 200, according to an update period.
  • FIG. 2 is a diagram illustrating a process of classifying a sentence into a plurality of classes according to an embodiment.
  • the mobile apparatus 100 may classify a sentence including text into a plurality of classes by using a convolutional neural network-based text classification model.
  • when the convolutional neural network-based text classification model is insufficiently trained or the amount of training data is small, the accuracy of classification into a plurality of classes may suffer when a short sentence is input.
  • also, the scores of the respective classes may lack discrimination in operations that use those scores.
  • a description will be given of a method by which accurate classification processing may be performed and the score of each class may have discrimination even when a short sentence is input or even when the number of classified classes increases.
  • FIG. 3 is a diagram illustrating a process of generating a second sentence from a first sentence in a process of classifying a sentence into a plurality of classes according to an embodiment.
  • a second sentence generated from a first sentence may be longer than the first sentence.
  • as the length of a sentence increases, the number of words included in the sentence increases, and thus the accuracy of classifying the sentence into classes may increase.
  • the mobile apparatus 100 may perform natural language processing on the first sentence and then find a keyword from the first sentence.
  • the mobile apparatus 100 may acquire a short sentence of "play computer Python” by extracting a prototype of each of the words and then removing stop words and non-keywords from the first sentence including the extracted prototypes.
  • the mobile apparatus 100 may extract "computer” and "Python” as keywords from the sentence “play computer Python” and detect similar words corresponding to "computer” and "Python”.
  • the mobile apparatus 100 may generate a second sentence by merging the similar words such that all or some of the detected similar terms "programming", "coding”, and “language” are located between “computer” and “Python” or after "computer” or “Python” in the short sentence "play computer Python". As illustrated in FIG. 3, the second sentence may be "play computer Python programming language”.
  • FIG. 4 is a diagram illustrating a result of classifying each of a first sentence and a second sentence into a plurality of classes according to an embodiment.
  • FIG. 4 illustrates a result of classification into a plurality of classes with respect to each of the natural-language-processed first sentence "play computer Python” and the second sentence "play computer Python programming language” illustrated in FIG. 3. Since the word “Python” in the preprocessed first sentence "play computer Python” may be determined as referring to a type of snake, it may be classified as a class "pet". On the other hand, since the second sentence "play computer Python programming language” includes more words such as “programming” and “language” in addition to "computer” and “Python", the word “Python” may be determined as referring to a programming language name instead of a type of snake.
  • since the second sentence further includes similar words corresponding to the keywords in addition to the keywords of the first sentence, the word "Python" may be prevented from being misinterpreted.
  • accordingly, an inaccurate class such as "pet" may be avoided and the class "programming" may receive the highest score, so the accuracy of the result of classification into a plurality of classes may increase.
  • FIG. 5 is a diagram illustrating a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer according to an embodiment.
  • the text classification model may be a model for classifying sentences into classes and may be used to categorize sentences; a convolutional neural network-based text classification model is one example.
  • a convolutional neural network-based text classification model using a sigmoid function as an activation function of the output layer may use a hyperbolic tangent function as an activation function of the 1D convolution (Conv1D) layer and may use Adam as the optimizer.
  • Categorical cross-entropy may be used as a loss function of the output layer.
  • the text classification model may calculate a score for each of the plurality of classes as an individual score, as in binary classification. Also, a normalized score may be acquired through normalization with respect to the scores of the upper n classes having the highest calculated scores. In this case, even when the number of classes is large, distinguishable scores may be expected for the upper n classes.
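  • For reference, a minimal Keras sketch of such a model is given below. The sigmoid output activation, the hyperbolic tangent activation on the Conv1D layer, the Adam optimizer, and the categorical cross-entropy loss follow the description above; the vocabulary size, sequence length, embedding dimension, and filter settings are placeholders, and the 250 classes are taken from the FIG. 6 example.

```python
# CNN-based text classification sketch (TensorFlow/Keras assumed). Only the
# activations, optimizer, and loss follow the description above; the other
# hyperparameters are placeholders, since FIG. 5 is not reproduced here.
import tensorflow as tf

VOCAB_SIZE = 20000   # placeholder
MAX_LEN = 50         # placeholder
NUM_CLASSES = 250    # number of classes mentioned for the FIG. 6 example

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Conv1D(filters=128, kernel_size=3, activation="tanh"),  # tanh on the Conv1D layer
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),               # sigmoid output layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()

# scores = model.predict(encoded_second_sentence)[0]  # one independent score per class
# 'encoded_second_sentence' is a hypothetical integer-encoded, padded input batch.
```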
  • FIG. 6 is a diagram illustrating a process of classifying a sentence into a plurality of classes by using each of convolutional neural network-based text classification models using each of different functions as an activation function of an output layer according to an embodiment.
  • FIG. 6 illustrates a process of classifying the sentence "My hobby is to play FIFA Online with a computer game" into a plurality of classes when a softmax function used for multi-classification (or multi-class classification) is used and when a sigmoid function used for binary classification (or binary class classification) is used.
  • when a sigmoid function is used to classify a sentence into a plurality of classes, the score value of each class is like a score of binary classification, and each score may be normalized with respect to the upper n classes.
  • even when a sentence is classified into a plurality of classes by using a softmax function, each score is likewise normalized with respect to the upper n classes; in FIG. 6, each score is normalized with respect to the upper 3 classes among 250 classes.
  • FIG. 7 is a diagram illustrating a result of classifying a sentence into a plurality of classes by using each of convolutional neural network-based text classification models using each of different functions as an activation function of an output layer according to an embodiment.
  • in FIG. 7, the result of the process described above with reference to FIG. 6 is illustrated.
  • when the softmax function is used, the upper 3 classes “computer”, “hobby”, and “sports” may have almost identical score values; and when normalization is performed thereon, the score values normalized to "0.3334", "0.3333", and "0.3333" may become even more similar, and thus the scores of the classes may have lower discrimination.
  • when the sigmoid function is used, the upper 3 classes "computer", "hobby", and "sports" may have score values of "0.9161", "0.8034", and "0.5326", respectively; and even when normalization is performed thereon, they may have normalized score values of "0.3883", "0.3469", and "0.2646", respectively, and thus discrimination may be expected in the score of each class.
  • FIG. 8 is a flowchart illustrating a method of classifying a sentence into a plurality of classes according to an embodiment. Although omitted below, the above description of the mobile apparatus 100 for classifying a sentence into a plurality of classes may also be applied to a method of classifying a sentence into a plurality of classes.
  • the mobile apparatus 100 may perform natural language processing on a first sentence.
  • the mobile apparatus 100 may extract a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence.
  • the mobile apparatus 100 may extract a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence based on a term frequency and an inverse document frequency.
  • the mobile apparatus 100 may detect a similar word corresponding to the extracted keyword by using a word embedding model.
  • the mobile apparatus 100 may generate a second sentence by merging the detected similar word into the natural-language-processed first sentence.
  • the mobile apparatus 100 may generate the second sentence by merging similar words, prioritized by similarity among the detected similar words, at positions between keywords or at a position next to a keyword of the natural-language-processed first sentence.
  • the mobile apparatus 100 may classify the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model using a sigmoid function as an activation function of an output layer.
  • the mobile apparatus 100 may calculate a binary-classification-style score for each of the plurality of classes based on the sigmoid function and acquire normalized scores through normalization with respect to the scores of the classes prioritized based on those scores.
  • the mobile apparatus 100 may provide the classes having the priority and the normalized score of each of the classes having the priority together.
  • the mobile apparatus 100 may determine whether a predetermined condition for classification into the plurality of classes is satisfied, based on a length of the natural-language-processed first sentence and a score based on a term frequency and an inverse document frequency with respect to the natural-language-processed first sentence. As a result of the determination, the mobile apparatus 100 may classify the natural-language-processed first sentence into a plurality of classes by using the text classification model when the predetermined condition is satisfied, and classify the second sentence into a plurality of classes by using the text classification model when the predetermined condition is not satisfied.
  • the above embodiments of the method of classifying a sentence into a plurality of classes may be provided in the form of a computer program or application stored in a computer-readable storage medium to cause a computer to perform the method of classifying a sentence into a plurality of classes.
  • the above embodiments may be implemented in the form of a computer-readable recording medium storing a computer-executable instruction and data. At least one of the instruction and the data may be stored in the form of program code and may, when executed by a processor, generate a predetermined program module to perform a predetermined operation.
  • the computer-readable recording medium may be Read-Only Memory (ROM), Random-Access Memory (RAM), flash memories, Compact Disk Read-Only Memory (CD-ROM), Compact Disk Recordable (CD-R), CD+R, Compact Disk Rewritable (CD-RW), CD+RW, Digital Versatile Disk Read-Only Memory (DVD-ROM), Digital Versatile Disk Recordable (DVD-R), DVD+R, Digital Versatile Disk Rewritable (DVD-RW), DVD+RW, Digital Versatile Disk Random-Access Memory (DVD-RAM), Blu-ray Disk Read-Only Memory (BD-ROM), Blu-ray Disk Recordable (BD-R), Blu-ray Disk Recordable Low to High (BD-R LTH), Blu-ray Disk Recordable Erasable (BD-RE), magnetic tapes, floppy disks, magneto-optical data storages, optical data storages, hard disks, Solid-State Disk (SSD), or any device that may store instructions or software.

Abstract

The invention relates to a mobile apparatus and a method for classifying a sentence into a plurality of classes, which may increase the accuracy of class classification of an input sentence and give discrimination to the score of each classified class, the method including: performing natural language processing on a first sentence; extracting a keyword having an importance higher than a predetermined standard from the natural-language-processed first sentence; detecting a similar word corresponding to the extracted keyword by using a word embedding model; generating a second sentence by merging the detected similar word into the natural-language-processed first sentence; and classifying the generated second sentence into a plurality of classes by using a convolutional neural network-based text classification model that uses a sigmoid function as an activation function of an output layer.
PCT/KR2018/004623 2018-04-02 2018-04-20 Mobile apparatus and method for classifying a sentence into a plurality of classes WO2019194343A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180038217A KR20190115319A (ko) 2018-04-02 2018-04-02 Mobile apparatus and method for classifying a sentence into a plurality of classes
KR10-2018-0038217 2018-04-02

Publications (1)

Publication Number Publication Date
WO2019194343A1 true WO2019194343A1 (fr) 2019-10-10

Family

ID=68100860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/004623 WO2019194343A1 (fr) 2018-04-02 2018-04-20 Mobile apparatus and method for classifying a sentence into a plurality of classes

Country Status (2)

Country Link
KR (1) KR20190115319A (fr)
WO (1) WO2019194343A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100384A (zh) * 2020-11-10 2020-12-18 北京智慧星光信息技术有限公司 Data opinion extraction method, apparatus, device, and storage medium
CN113010740A (zh) * 2021-03-09 2021-06-22 腾讯科技(深圳)有限公司 Word weight generation method, apparatus, device, and medium
CN113254595A (zh) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chitchat recognition method and apparatus, electronic device, and storage medium
CN113449099A (zh) * 2020-03-25 2021-09-28 瑞典爱立信有限公司 Text classification method and text classification device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102315215B1 (ko) * 2019-10-02 2021-10-20 (주)디앤아이파비스 Method for obtaining a word set of a patent document and method for determining the similarity of patent documents based on the obtained word set
KR102315214B1 (ko) * 2019-10-02 2021-10-20 (주)디앤아이파비스 Method, apparatus, and system for determining the similarity of patent documents using similarity scores and dissimilarity scores
KR102440193B1 (ko) * 2022-03-08 2022-09-05 그린캣소프트(주) Method for augmenting training data of a natural-language classification neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140188830A1 (en) * 2012-12-27 2014-07-03 Sas Institute Inc. Social Community Identification for Automatic Document Classification
US20140350964A1 (en) * 2013-05-22 2014-11-27 Quantros, Inc. Probabilistic event classification systems and methods
US20170116204A1 (en) * 2015-08-24 2017-04-27 Hasan Davulcu Systems and methods for narrative detection and frame detection using generalized concepts and relations
US20180060305A1 (en) * 2016-08-25 2018-03-01 International Business Machines Corporation Semantic hierarchical grouping of text fragments
WO2018046412A1 (fr) * 2016-09-07 2018-03-15 Koninklijke Philips N.V. Classification semi-supervisée avec auto-codeur empilé

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140188830A1 (en) * 2012-12-27 2014-07-03 Sas Institute Inc. Social Community Identification for Automatic Document Classification
US20140350964A1 (en) * 2013-05-22 2014-11-27 Quantros, Inc. Probabilistic event classification systems and methods
US20170116204A1 (en) * 2015-08-24 2017-04-27 Hasan Davulcu Systems and methods for narrative detection and frame detection using generalized concepts and relations
US20180060305A1 (en) * 2016-08-25 2018-03-01 International Business Machines Corporation Semantic hierarchical grouping of text fragments
WO2018046412A1 (fr) * 2016-09-07 2018-03-15 Koninklijke Philips N.V. Classification semi-supervisée avec auto-codeur empilé

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449099A (zh) * 2020-03-25 2021-09-28 瑞典爱立信有限公司 Text classification method and text classification device
CN113449099B (zh) * 2020-03-25 2024-02-23 瑞典爱立信有限公司 Text classification method and text classification device
CN112100384A (zh) * 2020-11-10 2020-12-18 北京智慧星光信息技术有限公司 Data opinion extraction method, apparatus, device, and storage medium
CN112100384B (zh) * 2020-11-10 2021-02-02 北京智慧星光信息技术有限公司 Data opinion extraction method, apparatus, device, and storage medium
CN113010740A (zh) * 2021-03-09 2021-06-22 腾讯科技(深圳)有限公司 Word weight generation method, apparatus, device, and medium
CN113010740B (zh) * 2021-03-09 2023-05-30 腾讯科技(深圳)有限公司 Word weight generation method, apparatus, device, and medium
CN113254595A (zh) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chitchat recognition method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
KR20190115319A (ko) 2019-10-11

Similar Documents

Publication Publication Date Title
WO2019194343A1 (fr) Mobile apparatus and method for classifying a sentence into a plurality of classes
US9542477B2 (en) Method of automated discovery of topics relatedness
US10831762B2 (en) Extracting and denoising concept mentions using distributed representations of concepts
US20180357511A1 (en) Recommending machine learning techniques, features, and feature relevance scores
US20160239500A1 (en) System and methods for extracting facts from unstructured text
US7840521B2 (en) Computer-based method and system for efficient categorizing of digital documents
CN110532376B (zh) 分类文本以确定用于选择机器学习算法结果的目标类型
US20180173495A1 (en) Duplicate and similar bug report detection and retrieval using neural networks
KR102048638B1 (ko) 콘텐츠 인식 방법 및 시스템
TW202020691A (zh) 特徵詞的確定方法、裝置和伺服器
CN112183099A (zh) 基于半监督小样本扩展的命名实体识别方法及系统
US9141883B1 (en) Method, hard negative proposer, and classifier for supporting to collect hard negative images using a similarity map
CN108550054B (zh) 一种内容质量评估方法、装置、设备和介质
WO2021182689A1 (fr) Détection et association automatiques de nouveaux attributs à des entités dans des bases de connaissances
US10628749B2 (en) Automatically assessing question answering system performance across possible confidence values
WO2022042297A1 (fr) Procédé et appareil de regroupement de textes, dispositif électronique et support de stockage
CN111324810A (zh) 一种信息过滤方法、装置及电子设备
WO2022143608A1 (fr) Procédé et appareil d'étiquetage de langues, dispositif informatique et support de stockage
CN114117038A (zh) 一种文档分类方法、装置、系统及电子设备
KR20200106108A (ko) 딥러닝 기반의 특허정보 워드임베딩 방법 및 그 시스템
CN117216687A (zh) 一种基于集成学习的大语言模型生成文本检测方法
CN116821903A (zh) 检测规则确定及恶意二进制文件检测方法、设备及介质
US9342795B1 (en) Assisted learning for document classification
CN114153954A (zh) 测试用例推荐方法、装置、电子设备及存储介质
CN111382267B (zh) 一种问题分类方法、问题分类装置及电子设备

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18913308

Country of ref document: EP

Kind code of ref document: A1