KR20030024309A - Slang Remover Program On Web board - Google Patents

Info

Publication number
KR20030024309A
Authority
KR
South Korea
Prior art keywords
slang
text
bulletin
token
list
Prior art date
Application number
KR1020010057389A
Other languages
Korean (ko)
Inventor
정상호
Original Assignee
정상호
Priority date
Filing date
Publication date
Application filed by 정상호
Priority to KR1020010057389A
Publication of KR20030024309A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

PURPOSE: A program for removing slang from a web bulletin board is provided that conveys the correct meaning to other netizens while protecting freedom of expression on the Internet, and that lets a web manager administer the bulletin board efficiently through functions such as monitoring and operation-time setting. CONSTITUTION: The list of bulletins entered after the input date of the most recently processed bulletin is read with a 'select' query through the JDBC (Java DataBase Connectivity) interface of the bulletin board RDBMS (Relational DataBase Management System). The text of each bulletin is passed to a tokenizer in turn. The tokenizer scans the text and divides it into tokens using a defined special character set. A slang extracting module uses the first phoneme of each token to find a key into the slang list loaded in main memory. The specific list found under that key is then searched with a binary search for an entry whose pattern matches the token. If a pattern-matched token is found, the substituted text is obtained from the tokenizer. The text of the bulletin containing the found slang is updated with the substituted text.

Description

Program for Processing Vulgar and Coarse Language on Web Bulletin Boards {Slang Remover Program On Web board}

The present invention relates to a program for removing vulgar and coarse language, which makes it possible to handle efficiently the otherwise difficult task of managing such language on the Internet.

In the past, Internet bulletin boards handled vulgar and coarse language by blocking slang at the moment the user entered it. Because this left netizens' urge to express themselves unsatisfied, it led them to invent new coarse words (swear words).

Furthermore, as the Internet has developed, programs meant to defend against some netizens' severely coarse expressions have prompted netizens to coin swear words ever more boldly. In addition, attempts to block coarse language on bulletin boards sometimes block even words used in ordinary language, which restricts netizens' freedom of expression.

Unlike the existing blocking methods, the present invention is a program that preserves netizens' freedom of expression while, acting as a bulletin board management module, showing other netizens purified language, and so achieves a twofold effect.

In addition, several functions have been added so that webmasters can manage their boards efficiently.

What distinguishes it from existing programs is that it does not block. Instead, after a user has posted coarse language, when another user clicks on that content the coarse words are replaced or deleted, so that only proper expressions are shown.

The present invention is a program designed so that the correct meaning can be conveyed to other netizens while freedom of expression on the Internet is protected.

In addition, a replacement function, a deletion function, an upload function and the like have been added so that the web administrator can manage the bulletin board efficiently, and a monitoring function, an operating-time setting function and the like are provided so that administration runs smoothly.

* Program name: Web Cleaner

FIG. 1 is an overall configuration diagram of the program.

FIG. 2 shows the slang processing module of the program.

Webcleaner was developed following the MVC (Model-View-Controller) design pattern, and it accesses the database through JDBC (Java Database Connectivity), which is independent of the type of database. Because Webcleaner is written in Java and uses JDBC, it runs on any platform, connects to any RDBMS (relational database management system) for which a JDBC driver exists, and operates wherever a Java runtime environment and a JSP/servlet engine are available.
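As an illustration of what database-independent access through JDBC might look like, the following is a minimal sketch and not the patent's actual code. The class name `BoardDaoSketch`, the table name `board`, and the columns `id`, `body`, and `created_at` are all hypothetical; the description does not specify a schema. Only standard `java.sql` calls are used, so the same code would in principle work with any RDBMS that supplies a JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.LinkedHashMap;
import java.util.Map;

final class BoardDaoSketch {
    // Hypothetical schema: table "board" with columns id, body, created_at.
    static final String SELECT_NEW =
            "SELECT id, body FROM board WHERE created_at > ? ORDER BY created_at";
    static final String UPDATE_BODY = "UPDATE board SET body = ? WHERE id = ?";

    /** Reads the posts entered after the most recently processed post. */
    static Map<Long, String> readPostsAfter(Connection conn, Timestamp lastProcessed)
            throws SQLException {
        Map<Long, String> posts = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_NEW)) {
            ps.setTimestamp(1, lastProcessed);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    posts.put(rs.getLong("id"), rs.getString("body"));
                }
            }
        }
        return posts;
    }

    /** Writes the cleaned text back; returns the number of rows updated. */
    static int updateBody(Connection conn, long id, String cleaned) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_BODY)) {
            ps.setString(1, cleaned);
            ps.setLong(2, id);
            return ps.executeUpdate();
        }
    }
}
```

The `Connection` would come from whichever JDBC driver the site's database provides; nothing in the sketch depends on a particular vendor.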

FIG. 1 is a configuration diagram of the system. In the web server and web application server portion of the figure, a servlet receives the HTTP request and calls the business logic running in the Java runtime environment, thereby acting as the Controller; the business logic hands the result of its communication with the back-end system to a JavaBean, which is the Model. A JSP then acts as the View, rendering the contents of the JavaBean on screen and sending the response. The back-end system holds the bulletin board DB, the slang list file, a file that stores results, and a board information file containing the list of boards whose slang is to be processed. When a request arrives to start slang processing for a particular board, the cleaner thread for that board, among the cleaner threads in the business logic, is started. The cleaner threads run as multiple threads in the Java virtual machine and share the slang list held in main memory, which is efficient and allows the boards to be processed concurrently.
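The idea of several cleaner threads sharing one in-memory slang list can be sketched as follows. This is an illustrative assumption about the threading layout, not the patent's implementation: the class `CleanerThreadsSketch`, the method `cleanAll`, and the whitespace-only word splitting are all invented for the example, and the sketch merely counts matches instead of replacing text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class CleanerThreadsSketch {
    /** One read-only slang list in memory, shared by every cleaner thread. */
    private final Set<String> sharedSlang;

    CleanerThreadsSketch(Set<String> slang) {
        this.sharedSlang = Collections.unmodifiableSet(new HashSet<>(slang));
    }

    /** Starts one cleaner thread per board, waits for all, returns slang hits per board. */
    Map<String, Integer> cleanAll(Map<String, List<String>> postsByBoard) {
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (Map.Entry<String, List<String>> board : postsByBoard.entrySet()) {
            Thread t = new Thread(
                    () -> hits.put(board.getKey(), countSlang(board.getValue())),
                    "cleaner-" + board.getKey());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for cleaners", e);
            }
        }
        return hits;
    }

    /** Counts whitespace-separated words that appear in the shared slang list. */
    private int countSlang(List<String> posts) {
        int found = 0;
        for (String post : posts) {
            for (String word : post.split("\\s+")) {
                if (sharedSlang.contains(word)) {
                    found++;
                }
            }
        }
        return found;
    }
}
```

Because the shared set is never modified after construction, the threads can read it without locking, which is one plausible reading of why the description calls the shared list efficient.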

FIG. 2 shows the process of obtaining body texts from the bulletin board RDBMS, extracting slang from them, and producing the replaced text. The steps, in order, are as follows.

(1) A select query is issued to the bulletin board RDBMS through the JDBC interface to read the list of posts entered after the input date of the most recently processed post.

(2) The body text of each post is passed to the tokenizer in turn. The tokenizer scans the body text once, splits it into words (tokens) using a defined set of special characters, and records the index of each token within the body text.

(3) The slang extraction module asks the tokenizer to hand over the tokens one by one and, from the phoneme of each token's first syllable, determines the key into the slang list loaded in main memory and goes to that entry.

(4) Within the specific list reached in the slang list, a binary search is used to find whether any entry matches the token's pattern.

(5) Once the body text has been searched to its end in this way, if any token matching a pattern in the slang list has been found, the tokenizer is asked for the text with the slang portions replaced, which yields the replacement text. Because the tokenizer recorded each token's index within the body text while tokenizing, it can produce the replacement text.

(6) An SQL statement is executed against the bulletin board RDBMS through the JDBC interface to update the body text of each post in which slang was found with the replacement text.

(7) Information about the replaced posts is stored in the result file, organized by the date of replacement.
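Steps (2) through (5) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the patented implementation: the class `SlangFilterSketch`, the delimiter set, and the mask string are hypothetical; "pattern matching" is simplified to exact string equality; and the key is taken to be the initial consonant of a leading Hangul syllable (computed from the standard Unicode Hangul syllable layout), falling back to the first character itself for other scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SlangFilterSketch {
    // Assumed delimiter set; the description only says "a defined set of special characters".
    static final String DELIMITERS = " \t\n.,!?;:()[]\"'";

    /** A token plus the index of its first character in the body text (step 2). */
    static final class Token {
        final String text;
        final int start;
        Token(String text, int start) { this.text = text; this.start = start; }
    }

    /** Step 2: one pass over the text, splitting on delimiters and recording offsets. */
    static List<Token> tokenize(String body) {
        List<Token> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= body.length(); i++) {
            boolean boundary = i == body.length() || DELIMITERS.indexOf(body.charAt(i)) >= 0;
            if (!boundary && start < 0) {
                start = i;
            } else if (boundary && start >= 0) {
                tokens.add(new Token(body.substring(start, i), start));
                start = -1;
            }
        }
        return tokens;
    }

    /** Step 3: key = initial-consonant jamo of a Hangul syllable, else the character itself. */
    static char keyOf(String token) {
        char c = token.charAt(0);
        if (c >= 0xAC00 && c <= 0xD7A3) {
            return (char) (0x1100 + (c - 0xAC00) / 588); // 588 = 21 vowels * 28 finals
        }
        return c;
    }

    /** Builds the in-memory slang list: key -> sorted array, ready for binary search. */
    static Map<Character, String[]> buildIndex(List<String> slang) {
        Map<Character, List<String>> buckets = new HashMap<>();
        for (String w : slang) {
            buckets.computeIfAbsent(keyOf(w), k -> new ArrayList<>()).add(w);
        }
        Map<Character, String[]> index = new HashMap<>();
        for (Map.Entry<Character, List<String>> e : buckets.entrySet()) {
            String[] sorted = e.getValue().toArray(new String[0]);
            Arrays.sort(sorted);
            index.put(e.getKey(), sorted);
        }
        return index;
    }

    /** Steps 4-5: binary-search each token, then rebuild the text from the recorded offsets. */
    static String clean(String body, Map<Character, String[]> index, String mask) {
        StringBuilder out = new StringBuilder();
        int copied = 0; // everything before this index has already been emitted
        for (Token t : tokenize(body)) {
            String[] bucket = index.get(keyOf(t.text));
            if (bucket != null && Arrays.binarySearch(bucket, t.text) >= 0) {
                out.append(body, copied, t.start).append(mask);
                copied = t.start + t.text.length();
            }
        }
        return out.append(body, copied, body.length()).toString();
    }
}
```

For example, with an index built from the hypothetical word list `darn`, `dang`, `heck`, calling `clean("Well, darn it! What the heck?", index, "***")` yields `Well, *** it! What the ***?`. Keeping the token offsets is what lets the replacement preserve the original punctuation and spacing without re-scanning the text.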

The overall configuration of the present invention can be divided into a coarse-language analysis module, an installation module, a system configuration module, and a management module.

As described above, the coarse and vulgar language remover takes into account both the web administrator's convenience and netizens' psychology, and because it removes rather than blocks, it can be a practical program for upholding netiquette on the Internet.

The results of the present invention are summarized as follows.

Claims (1)

A program which, for efficient bulletin board management with respect to vulgar and coarse language on the Internet, uses a vulgar and coarse language analyzer to remove such language, rather than block it, when a user enters slang or the like.
KR1020010057389A 2001-09-17 2001-09-17 Slang Remover Program On Web board KR20030024309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020010057389A KR20030024309A (en) 2001-09-17 2001-09-17 Slang Remover Program On Web board

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020010057389A KR20030024309A (en) 2001-09-17 2001-09-17 Slang Remover Program On Web board

Publications (1)

Publication Number Publication Date
KR20030024309A 2003-03-26

Family

ID=27724397

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020010057389A KR20030024309A (en) 2001-09-17 2001-09-17 Slang Remover Program On Web board

Country Status (1)

Country Link
KR (1) KR20030024309A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005036412A1 (en) * 2003-10-16 2005-04-21 Nhn Corporation A method of managing bulletin on internet and a system thereof
KR100817848B1 (en) * 2006-06-26 2008-03-31 (주)트리니티소프트 Method for network-based data inspection and apparatus thereof
WO2010090382A1 (en) * 2009-02-03 2010-08-12 Jang Sung-Hee Online protection system and protection method
CN104252463A (en) * 2013-06-26 2014-12-31 中国银联股份有限公司 Db2 database management method based on web system
CN104252463B (en) * 2013-06-26 2018-09-04 中国银联股份有限公司 A kind of db2 data base management methods based on web system

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application
E601 Decision to refuse application