WO2012030049A3 - Apparatus and method for classifying similar documents with a dynamic threshold applied - Google Patents

Apparatus and method for classifying similar documents with a dynamic threshold applied

Info

Publication number
WO2012030049A3
Authority
WO
WIPO (PCT)
Prior art keywords
threshold value
document
applying
dynamic threshold
documents
Prior art date
Application number
PCT/KR2011/003590
Other languages
English (en)
French (fr)
Other versions
WO2012030049A2 (ko)
Inventor
정한민
김평
이승우
이미경
서동민
성원경
Original Assignee
한국과학기술정보연구원
Priority date
2010-09-01
Filing date
2011-05-16
Publication date
2012-04-26
Application filed by 한국과학기술정보연구원
Publication of WO2012030049A2
Publication of WO2012030049A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

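The present invention relates to a document browsing apparatus and method to which a dynamic threshold is applied. The apparatus comprises: a document management module that stores, for documents that are input or already stored, the similarity between each pair of documents; a similar document search module that searches the document management module for documents whose similarity to a reference document is equal to or greater than a set threshold; and a similar document classification module that groups the retrieved documents into a single cluster.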
PCT/KR2011/003590 2010-09-01 2011-05-16 Apparatus and method for classifying similar documents with a dynamic threshold applied WO2012030049A2 (ko)
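
The three modules named in the abstract (similarity storage, threshold search against a reference document, and grouping of the hits into one cluster) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract names no similarity measure, so Jaccard similarity over whitespace tokens is assumed here. It also does not say how the threshold is adjusted, so the threshold is simply a caller-supplied parameter. All class and function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (an assumed measure, not from the patent)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class DocumentStore:
    """Hypothetical stand-in for the document management module:
    keeps each document's tokens and the similarity of every document pair."""

    def __init__(self) -> None:
        self.tokens: dict[str, set[str]] = {}
        self.similarity: dict[frozenset[str], float] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Compute and store the similarity to every document already held.
        new_tokens = set(text.lower().split())
        for other_id, other_tokens in self.tokens.items():
            pair = frozenset((doc_id, other_id))
            self.similarity[pair] = jaccard(new_tokens, other_tokens)
        self.tokens[doc_id] = new_tokens

    def sim(self, a: str, b: str) -> float:
        return self.similarity[frozenset((a, b))]


def find_similar(store: DocumentStore, reference_id: str, threshold: float) -> list[str]:
    """Similar document search: documents whose stored similarity to the
    reference document is at or above the threshold."""
    return [
        doc_id
        for doc_id in store.tokens
        if doc_id != reference_id and store.sim(reference_id, doc_id) >= threshold
    ]


def cluster(store: DocumentStore, reference_id: str, threshold: float) -> set[str]:
    """Similar document classification: the reference document and its hits form one cluster."""
    return {reference_id, *find_similar(store, reference_id, threshold)}


store = DocumentStore()
store.add("a", "dynamic threshold document clustering")
store.add("b", "dynamic threshold document search")
store.add("c", "weather forecast for tomorrow")
store.add("d", "document clustering method")

# The same reference document yields different clusters as the threshold changes.
strict = cluster(store, "a", 0.5)
loose = cluster(store, "a", 0.4)
```

Because the pairwise similarities are stored once when a document is added, changing the threshold only repeats the lookup step, not the similarity computation.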

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2010-0085384 2010-09-01
KR1020100085384A KR101035037B1 (ko) 2010-09-01 2010-09-01 Apparatus and method for classifying similar documents with a dynamic threshold applied

Publications (2)

Publication Number Publication Date
WO2012030049A2 (ko) 2012-03-08
WO2012030049A3 (ko) 2012-04-26

Family

ID=44366141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2011/003590 WO2012030049A2 (ko) 2010-09-01 2011-05-16 Apparatus and method for classifying similar documents with a dynamic threshold applied

Country Status (2)

Country Link
KR (1) KR101035037B1 (ko)
WO (1) WO2012030049A2 (ko)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101847847B1 (ko) 2016-11-15 2018-04-12 주식회사 와이즈넛 Method for clustering documents of unstructured text data using deep learning
US11176179B2 (en) 2019-09-24 2021-11-16 International Business Machines Corporation Assigning a new problem record based on a similarity to previous problem records
KR102376489B1 (ko) * 2019-11-22 2022-03-18 주식회사 와이즈넛 Apparatus and method for text document clustering and topic generation based on word ranking
KR102373146B1 (ko) * 2020-03-24 2022-03-14 경북대학교 산학협력단 Cluster-based apparatus and method for removing duplicate documents

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4828091B2 (ja) * 2003-03-05 2011-11-30 Hewlett-Packard Company Clustering method, program, and apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HEWLETT PACKARD CO, HP, PUBLICATION NO. 2004-78896, 13 September 2004 (2004-09-13) *
LG ELECTRONICS INC., PUBLICATION NO. 2007-0102036, 18 October 2007 (2007-10-18) *
SUNG HO JANG: "Design and Implementation of Keyword-based Document Clustering System", GRADUATE SCHOOL OF KOOKMIN UNIVERSITY MASTER'S DEGREE THESIS, 31 July 2003 (2003-07-31), pages 21 - 39 *

Also Published As

Publication number Publication date
WO2012030049A2 (ko) 2012-03-08
KR101035037B1 (ko) 2011-05-19

Similar Documents

Publication Publication Date Title
WO2012070840A3 (ko) Consensus search apparatus and method
EP3748629A4 (en) IDENTIFICATION METHOD AND DEVICE FOR LANGUAGE KEYWORDS, COMPUTER READABLE STORAGE MEDIUM AND COMPUTER DEVICE
MX2019001112A (es) System and method for implementing containers that extract and apply semantic page knowledge
WO2014183956A3 (en) Social media content analysis and output
GB2482630A (en) A data retrieval and indexing method and apparatus
WO2013173826A3 (en) Populating and searching a drug informatics database
WO2012129149A3 (en) Aggregating search results based on associating data instances with knowledge base entities
EP3051432A4 (en) Semantic information acquisition method, keyword expansion method thereof, and search method and system
WO2011097066A3 (en) Semantic table of contents for search results
WO2015170191A3 (en) Method and apparatus for screening promotion keywords
WO2011112744A3 (en) User role based customizable semantic search
WO2009140272A3 (en) Search results with most clicked next objects
WO2014085776A3 (en) Web search ranking
WO2010141799A3 (en) Feature engineering and user behavior analysis
WO2014025705A3 (en) Search result ranking and presentation
WO2013163644A3 (en) Updating a search index used to facilitate application searches
WO2010008800A3 (en) Query identification and association
WO2011149961A3 (en) Systems and methods for identifying intersections using content metadata
WO2011159516A3 (en) Semantic content searching
GB201209093D0 (en) Method of searching for document data files based on keywords,and computer system and computer program thereof
WO2014043200A3 (en) Dynamic data acquisition method and system
CA2879417A1 (en) Structured search queries based on social-graph information
WO2007089289A3 (en) Method for ranking and sorting electronic documents in a search result list based on relevance
GB2490070A (en) Systems and methods for ranking documents
WO2009029675A3 (en) Method and system for data context service

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 11822024
    Country of ref document: EP
    Kind code of ref document: A2

NENP Non-entry into the national phase
    Ref country code: DE

122 EP: PCT application non-entry in European phase
    Ref document number: 11822024
    Country of ref document: EP
    Kind code of ref document: A2