KR101349967B1 - Method of Improving Logic to Propose Query for Mobile Keyboard Typo Pattern and the Device Thereof

Info

Publication number: KR101349967B1
Application number: KR1020120061063A
Authority: KR
Inventors: 김효민; 서희철; 이태호; 김태일
Original assignee: 네이버 주식회사
Priority date: 2012-06-07
Filing date: 2012-06-07
Publication date: 2014-02-07
Also published as: KR20130139447A

Abstract

Disclosed are a method and an apparatus for correcting, based on statistical data, a user query that contains duplicated characters. The user query correction method may include: determining whether an input user query is a typo query; extracting at least one correction candidate string when the user query is a typo query; calculating a candidate score for each of the at least one correction candidate string; assigning a weight to any correction candidate string in which the character preceding or following the corrected character is identical to the corrected character; and determining a correction query from among the correction candidate strings according to the candidate scores and the weights.

Description

Method of Improving Logic to Propose Query for Mobile Keyboard Typo Pattern and the Device Thereof

The present invention relates to a method and apparatus for correcting a user query based on statistical data, and more particularly, to a method and apparatus for correcting a user query that contains duplicated syllables or jamo (Hangul letters) and has been determined to be a typo query.

A user can perform a search to obtain desired information by entering a query in the query input box of a search page. If the user presses a wrong key on the keyboard or presses a key more than once, an unintended query may be entered.

When a user searches with such a typo query, results entirely unrelated to the originally intended ones may be returned, and search quality may degrade.

In particular, with the widespread adoption of smartphones, users on mobile devices may tap an adjacent key by mistake or convert a consonant incorrectly. Most mobile typos are Hangul-to-Hangul typos rather than Korean/English input-mode typos. Hangul-to-Hangul typos on mobile fall into two types: adjacent-key click errors, in which a mistaken tap on a neighboring key inserts an unnecessary jamo, or in which deleting a particular jamo yields the correct form; and character conversion errors, in which repeated taps on a consonant key convert the consonant too far, or the consonant is not converted at all.

Specifically, among adjacent-key click errors, the most common pattern is the replacement error, in which a wrong consonant or vowel is entered. For users of the Cheonjiin keypad, typos containing an inserted dot or period also occur. Table 1 shows examples.

Table 1. Replacement errors
Correct form | Typo
멜론 (Melon) | 멜ㄹㅎㄴ
울랄라세션 (Ulala Session) | 울랄라ㅛㅔ션
날씨 (weather) | 날ㅆㄱㅆ
수능 (College Scholastic Ability Test) | 순ㄷㅇ
네이버 (Naver) | 네이.ㅓ
네이버영화 (Naver Movies) | 네이버영호ㅓ

Character conversion errors include typos in which a consonant is excessively converted into a tense consonant, and typos in which a consonant is left unconverted and ends up as a wrong final consonant (batchim). Table 2 shows examples.

Table 2. Character conversion errors
Error type | Correct form | Typo
Excessive conversion | 나도꽃 (Me Too, Flower) | 나도꼬ㅉ
 | 연금복권 (Pension Lottery) | 연금보퀀
 | 인피니트 (Infinite) | 인피니ㄸ
 | 비스트 (Beast) | 비스ㄸ
 | 원더걸스 (Wonder Girls) | 원더걸ㅆ
 | 뿌리깊은나무 (Deep Rooted Tree) | 뿌니깊은나무
 | 뱀파이어검사 (Vampire Prosecutor) | 뱅파이어검사
 | 다음 (next) | 다응

Because users may enter typo queries unconsciously, a method is needed that, when a typo query is entered, changes it in real time into the correct-form query the user originally intended. However, it is difficult for a system such as a search engine to identify the correct-form query that the user meant to enter, and the correct-form query proposed by the system may instead lead to entirely irrelevant results.

Accordingly, when a user enters a typo query, a method is required that reflects the user's intent and provides a highly accurate correct-form query.

According to an embodiment, there are provided a user query correction method and apparatus that, when correcting a user query determined to be a typo, can correct the typo query more accurately by calculating probability-based candidate scores from dictionary data composed of typo/correct-form query pairs.

According to an embodiment, there are provided a user query correction method and apparatus that, when correcting a user query determined to be a typo, can correct the typo query more accurately by assigning a weight to the candidate score if the typo in a typo/correct-form query pair contains a duplicated syllable.

According to an embodiment, there are provided a user query correction method and apparatus that, when correcting a user query determined to be a typo, can correct the typo query more accurately by assigning a weight to the candidate score if the typo in a typo/correct-form query pair contains a duplicated jamo.

A user query correction method according to an embodiment may include: determining whether an input user query is a typo query; extracting at least one correction candidate string when the user query is a typo query; calculating a candidate score for the at least one correction candidate string; assigning a weight to a correction candidate string in which the character preceding or following the corrected character is identical to the corrected character; and determining a correction query from among the correction candidate strings according to the candidate score and the weight.

In a user query correction method according to another embodiment, calculating the candidate score for the at least one correction candidate string may further include: calculating a first probability for the correct-form query based on dictionary data composed of typo/correct-form query pairs; calculating a second probability for the user query based on the same dictionary data; and calculating the candidate score according to the first probability and the second probability.

In a user query correction method according to yet another embodiment, determining the correction query according to the candidate score and the weight may comprise adding the weight to the candidate score and determining, as the correction query, the correction candidate string having the largest resulting candidate score.

In a user query correction method according to yet another embodiment, the string in which the character preceding or following the corrected character is identical to the corrected character may be a string in which a jamo of the preceding or following character is identical to a jamo of the corrected character, and correcting the user query according to the correction query may comprise deleting the jamo of the corrected character from the typo according to the correction query.

In a user query correction method according to yet another embodiment, the string in which the character preceding or following the corrected character is identical to the corrected character may be a string in which the syllable of the preceding or following character is identical to the syllable of the corrected character, and correcting the user query according to the correction query may comprise deleting the syllable of the corrected character from the typo according to the correction query.

In a user query correction method according to yet another embodiment, extracting at least one correction candidate string when the user query is a typo query may include: determining whether the user query is registered as a typo query in dictionary data composed of typo/correct-form query pairs; and, if it is registered, extracting from the dictionary data the correction candidate string corresponding to the user query.

A user query correction apparatus according to an embodiment includes a processing device and a computer storage medium containing instructions that, when executed by the processing device, cause the processing device to perform operations. The operations may include: determining whether an input user query is a typo query; extracting at least one correction candidate string when the user query is a typo query; calculating a candidate score for the at least one correction candidate string; assigning a weight to a correction candidate string in which the character preceding or following the corrected character is identical to the corrected character; and determining a correction query from among the correction candidate strings according to the candidate score and the weight.

In a user query correction apparatus according to another embodiment, calculating the candidate score for the at least one correction candidate string may further include: calculating a first probability for the correct-form query based on dictionary data composed of typo/correct-form query pairs; calculating a second probability for the user query based on the same dictionary data; and calculating the candidate score according to the first probability and the second probability.

In a user query correction apparatus according to yet another embodiment, determining the correction query according to the candidate score and the weight may comprise adding the weight to the candidate score and determining, as the correction query, the correction candidate string having the largest resulting candidate score.

In a user query correction apparatus according to yet another embodiment, the string in which the character preceding or following the corrected character is identical to the corrected character may be a string in which a jamo of the preceding or following character is identical to a jamo of the corrected character, and correcting the user query according to the correction query may comprise deleting the jamo of the corrected character from the typo according to the correction query.

In a user query correction apparatus according to yet another embodiment, the string in which the character preceding or following the corrected character is identical to the corrected character may be a string in which the syllable of the preceding or following character is identical to the syllable of the corrected character, and correcting the user query according to the correction query may comprise deleting the syllable of the corrected character from the typo according to the correction query.

In a user query correction apparatus according to yet another embodiment, extracting at least one correction candidate string when the user query is a typo query may include: determining whether the user query is registered as a typo query in dictionary data composed of typo/correct-form query pairs; and, if it is registered, extracting from the dictionary data the correction candidate string corresponding to the user query.

According to an embodiment, a computer-readable storage medium may be provided that stores one or more programs including instructions for performing the user query correction method.

In the user query correction method and apparatus according to an embodiment, when a user query determined to be a typo is corrected, the typo query can be corrected more accurately by calculating probability-based candidate scores from dictionary data composed of typo/correct-form query pairs.

In the user query correction method and apparatus according to an embodiment, when a user query determined to be a typo is corrected, the typo query can be corrected more accurately and in line with the user's intent by assigning a weight to the candidate score if the typo in a typo/correct-form query pair contains a duplicated syllable.

In the user query correction method and apparatus according to an embodiment, when a user query determined to be a typo is corrected, the typo query can be corrected more accurately by assigning a weight to the candidate score if the typo in a typo/correct-form query pair contains a duplicated jamo.

FIG. 1 is a diagram schematically illustrating the relationship between a user terminal and a user query correction apparatus, according to an embodiment.
FIG. 2 is a flowchart illustrating a method of correcting a user query, according to an embodiment.
FIG. 3 is a flowchart illustrating a method of extracting correction candidate strings, according to an embodiment.
FIG. 4 is a flowchart illustrating a method of calculating a candidate score, according to an embodiment.
FIG. 5 is a block diagram illustrating the detailed configuration of a user query correction apparatus, according to an embodiment.

Hereinafter, each embodiment is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically illustrating the relationship between a user terminal 190 and a user query correction apparatus 100, according to an embodiment.

At the user terminal 190, a user may enter a user query 110 to perform a search. The user query may consist of one or more words. The entered user query may be transmitted to the user query correction apparatus 100, which may determine whether it is a typo query. The user query correction apparatus 100 is an electronic device capable of communicating with the user terminal 190 and, according to an embodiment, may be a server.

If the user query is determined to be a typo query, the user query correction apparatus 100 may correct it and provide a correction query 120. Conversely, if the user query correction apparatus 100 does not determine the user query to be a typo query, it may not provide any correction query.

The user terminal 190 is a device capable of communicating with the user query correction apparatus 100 and may include a PC, a laptop, an IPTV, a smartphone, a tablet PC, and any other electronic device capable of wired or wireless communication. According to an embodiment, typo queries containing duplicated characters may be a particular problem on portable terminals such as smartphones.

FIG. 2 is a flowchart illustrating a method of correcting a user query, according to an embodiment.

In step S210, it may be determined whether the input user query is a typo query. If it is determined not to be a typo query, the process may end immediately without performing the subsequent steps. A user query is a query entered by a user and may consist of a word or a set of words entered when the user searches or writes a document. A typo query may refer to a user query that results from, for example, an incorrect Korean/English input-mode switch or an incorrect key press. According to an embodiment, incorrect key presses may occur frequently on portable terminals such as smartphones.

Specifically, a user query may be determined to be a typo query in any of the following cases: the query string is not found in a dictionary of correct-form words; the query appears in a previously collected list of error words that users commonly make; a web search for the query returns markedly few documents; the query occurs rarely or not at all in text composed of correct-form words, such as novels or newspaper articles; or the query contains incomplete Hangul.
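The criteria above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the word sets, the `MIN_DOCUMENT_COUNT` threshold, and the function names are all hypothetical, and the low-frequency-in-corpus criterion is omitted for brevity.

```python
# Hypothetical sketch of the typo-query check in step S210.
# All names, data, and the threshold below are illustrative assumptions.

CORRECT_WORDS = {"네이버", "날씨", "하나카드"}   # stand-in for a correct-form dictionary
KNOWN_TYPOS = {"한국카트", "네이버 지싯쇼핀"}     # stand-in for a collected list of common errors
MIN_DOCUMENT_COUNT = 10                           # assumed cut-off for "markedly few" documents


def has_incomplete_hangul(query: str) -> bool:
    """Return True if the query contains a standalone jamo such as 'ㄹ' or 'ㅓ'."""
    # U+3131..U+3163 covers the isolated consonants and vowels
    # of the Hangul Compatibility Jamo block.
    return any("\u3131" <= ch <= "\u3163" for ch in query)


def is_typo_query(query: str, document_count: int) -> bool:
    """Flag the query as a typo if any one of the sketched criteria holds."""
    if has_incomplete_hangul(query):
        return True
    if query in KNOWN_TYPOS:
        return True
    if query not in CORRECT_WORDS:
        return True
    # document_count is assumed to come from a web search for the query.
    return document_count < MIN_DOCUMENT_COUNT
```

In this sketch a single matching criterion is enough to flag the query; the source lists the criteria as alternatives and does not say how they are combined in practice.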

According to an embodiment, it may be determined whether the input user query and the correction candidate strings are registered among the correct-form queries of the dictionary data composed of typo/correct-form query pairs. This dictionary data contains a correct-form query corresponding to each typo query. A typo query may contain spaces, and the correct-form query may retain those spaces as they are. Table 3 shows an example of dictionary data composed of typo/correct-form query pairs.

Table 3. Dictionary data of typo/correct-form query pairs
Typo query | Correct-form query
감명깊게 읽은 채 | 감명깊게 읽은 책 (a book that left a deep impression)
한국카트 | 한국카드 (Korea Card)
원더걸스 소학 | 원더걸스 소핫 (Wonder Girls So Hot)
네이버 지싯쇼핀 | 네이버 지식쇼핑 (Naver Knowledge Shopping)

Dictionary data composed of correct-form words refers to data containing correct-form words. Specifically, correct-form words may be extracted from highly accurate data such as Korean dictionaries and encyclopedias. Whereas dictionary data composed of typo/correct-form query pairs provides a correct-form query for an entire typo query, dictionary data composed of correct-form words may provide a correct-form word for each word that makes up the typo query.

Next, in the correction candidate string extraction step (S220), correction candidate strings may be extracted, for the user query determined to be a typo query, from the dictionary data composed of typo/correct-form query pairs. For example, for the typo query "소네시대", the strings "소녀시대" and "소년시대", which are the correct-form queries of the matching typo/correct-form pairs, may be extracted as correction candidate strings.
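A minimal sketch of the lookup in step S220, assuming the pair dictionary is held in memory as a list of tuples. The function names and the storage layout are hypothetical. The sample pairs follow Table 3 and the "소네시대" example in the text.

```python
from collections import defaultdict

# Illustrative typo/correct-form pairs (typo first, correct form second).
PAIRS = [
    ("감명깊게 읽은 채", "감명깊게 읽은 책"),
    ("한국카트", "한국카드"),
    ("원더걸스 소학", "원더걸스 소핫"),
    ("네이버 지싯쇼핀", "네이버 지식쇼핑"),
    ("소네시대", "소녀시대"),
    ("소네시대", "소년시대"),
]


def build_index(pairs):
    """Map each typo query to the list of correct-form queries paired with it."""
    index = defaultdict(list)
    for typo, correct in pairs:
        if correct not in index[typo]:
            index[typo].append(correct)
    return index


def extract_candidates(query, index):
    """Return the correction candidate strings for a typo query (empty if unknown)."""
    return list(index.get(query, []))


index = build_index(PAIRS)
print(extract_candidates("소네시대", index))
```

One typo query can map to several correct-form queries, which is why the index stores a list per typo and leaves the choice between candidates to the later scoring steps.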

Then, in the candidate score calculation step (S230), the candidate score may be calculated based on the probability of entering the correct-form query and the probability of entering the user query. This is described in detail with reference to FIG. 4.

Next, in step S240, it may be determined whether the extracted correction candidate string corrects a duplicated character. A character is duplicated when the character preceding or following the corrected character is identical to the corrected character. Specifically, either a syllable or a jamo may be duplicated. For example, "자동차" extracted as a correction candidate string for "자동차차", in which the syllable "차" appears twice, is a case of deleting a duplicated syllable. Likewise, "하나카드" extracted as a correction candidate string for "한나카드", in which the jamo "ㄴ" of "한나" appears twice, is a case of deleting a duplicated jamo.

If the correction candidate string was obtained by correcting a duplicated character, the process may proceed to the weighting step (S250). Otherwise, it may proceed to the correction query determination step (S260).
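The duplicate check of step S240 can be sketched as follows. This is an illustrative reading of the text, not the patented implementation: it decomposes precomposed Hangul syllables into jamo with the standard Unicode arithmetic, then tests whether the candidate equals the query with one syllable or jamo removed, where the removed unit matches its neighbor. The function names are hypothetical.

```python
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
             "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def to_jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into a list of jamo."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            out.append(CHOSEONG[code // 588])
            out.append(JUNGSEONG[(code % 588) // 28])
            if code % 28:
                out.append(JONGSEONG[code % 28])
        else:
            out.append(ch)  # keep non-syllable characters unchanged
    return out


def removes_adjacent_duplicate(query_units, candidate_units):
    """True if candidate is query minus one unit that equals its previous or next unit."""
    if len(query_units) != len(candidate_units) + 1:
        return False
    for i, unit in enumerate(query_units):
        if query_units[:i] + query_units[i + 1:] != candidate_units:
            continue
        same_as_prev = i > 0 and query_units[i - 1] == unit
        same_as_next = i + 1 < len(query_units) and query_units[i + 1] == unit
        if same_as_prev or same_as_next:
            return True
    return False


def is_duplicate_correction(query, candidate):
    """Check duplicated-syllable deletion first, then duplicated-jamo deletion."""
    return (removes_adjacent_duplicate(list(query), list(candidate))
            or removes_adjacent_duplicate(to_jamo(query), to_jamo(candidate)))
```

With this sketch, "자동차차" to "자동차" is detected at the syllable level and "한나카드" to "하나카드" at the jamo level, while "한나카드" to "한자카드" is a substitution and is not flagged. Compound final consonants such as "ㄻ" are treated as a single jamo here, which is a simplification.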

In the weighting step (S250), among the correction candidate strings corresponding to the user query determined to be a typo query, a weight may be assigned to the candidate score of any correction candidate string in which the character preceding or following the corrected character is identical to the corrected character. The weight may be calculated based on a probability derived from how frequently users produce typos with duplicated syllables or duplicated jamo.

For example, when "한나카드" is entered as the user query, the possible candidates include "하나카드" and "한자카드". "하나카드" is a candidate string obtained by deleting the duplicated jamo "ㄴ", whereas "한자카드" is a candidate string obtained by changing "ㄴ" to "ㅈ". According to an embodiment, on a portable terminal using the Cheonjiin keypad it is reasonable to assume that "ㄴ" was mistakenly pressed twice, so a weight may be assigned only to "하나카드" among the correction candidate strings, or a larger weight may be assigned to it than to "한자카드".

Next, in the correction query determination step (S260), a correction query may be determined from among the correction candidate strings according to the candidate scores and weights. Specifically, the candidate score according to the first probability for the correct-form query, the candidate score according to the second probability for the typo query, and the weight for the duplicated character may be summed, and the correction candidate string with the largest total may be determined to be the correction query.

For example, suppose that "한자카드" and "하나카드" are the correction candidate strings for "한나카드". For "한자카드", the candidate score is the sum of the scores based on the first probability, derived from how frequently "한자카드" is entered, and the second probability, derived from how frequently "자" is mistyped as "나". For "하나카드", the total is the sum of the candidate score based on the first probability, derived from how frequently "하나카드" is entered, and the second probability, derived from how frequently "하" is mistyped as "한", plus the weight for the correction that deletes the duplicated "ㄴ". If the probability-based candidate scores are similar, "하나카드" carries the duplicate-character weight, so its total exceeds that of "한자카드" and it may be determined to be the correction query.
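The summation in steps S250 and S260 can be illustrated with a short sketch. The numeric scores and the `DUPLICATE_WEIGHT` value below are invented for illustration only; the source gives no concrete numbers, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    first_score: float     # score from the probability that the correct-form query is entered
    second_score: float    # score from the probability of the specific mistyping
    fixes_duplicate: bool  # True if the correction deletes a duplicated syllable or jamo


DUPLICATE_WEIGHT = 0.2  # assumed value; the source derives it from typo frequency statistics


def total_score(candidate: Candidate) -> float:
    """Sum both probability-based scores and, where applicable, the duplicate weight."""
    weight = DUPLICATE_WEIGHT if candidate.fixes_duplicate else 0.0
    return candidate.first_score + candidate.second_score + weight


def choose_correction(candidates) -> str:
    """Return the candidate string with the largest total score."""
    return max(candidates, key=total_score).text


# Made-up scores for the "한나카드" example: the two candidates are close before weighting.
candidates = [
    Candidate("한자카드", 0.32, 0.25, False),
    Candidate("하나카드", 0.30, 0.25, True),
]
print(choose_correction(candidates))
```

With these made-up numbers "한자카드" scores slightly higher before weighting, and the duplicate weight is what moves "하나카드" to the top, mirroring the example in the text.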

FIG. 3 is a flowchart illustrating a method of extracting correction candidate strings, according to an embodiment.

In step S321, it may be determined whether the input user query is registered among the correct-form queries of the dictionary data composed of typo/correct-form query pairs. According to an embodiment, this may be performed in advance in step S210 of FIG. 2. If the input user query is registered there, the correct-form word providing step (S322) may be performed. If it is not registered, the algorithm-based extraction step (S323) may be performed.

In the correct-form word providing step (S322), when the user query is found in the dictionary data, the corresponding correct-form query may be extracted as a correction candidate string. There may be one or more correction candidate strings for a user query. According to an embodiment, dictionary data for correction candidate strings obtained by deleting a duplicated consonant or syllable from the user query may be built separately, as shown in Table 4.

Table 4. Dictionary data including corrections that delete a duplicated character
Typo query | Correct-form query (duplicated character deleted) | Correct-form query |
삼확고속 | 삼화고속 (Samhwa Express) | 삼활고속 | ...
한나카드 | 하나카드 (Hana Card) | 한자카드 | ...
자동차차 | 자동차 (car) | 자동차창 (car window) | ...
?본 | 밀본 | |
금금값 | 금값 (gold price) | 감금값 | ...

In the algorithm-based extraction step (S323), when the user query is not found in the dictionary data, correction candidate strings may be extracted by an algorithm. According to an embodiment, the dictionary data composed of typo/correct-form query pairs is statistically accumulated data on typos, so if the input user query is not found in it, the dictionary cannot supply a suitable correction query. Therefore, a large list of correct-form words may be prepared in advance, and correction candidate strings may be extracted using an algorithm that selects from that list the correct-form words that are similar in form to the user query.
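The source does not name the similarity algorithm used in step S323. As one possible stand-in, the sketch below uses the Python standard library's `difflib.get_close_matches`, which ranks words by a sequence-similarity ratio. The word list, the function name, and the `cutoff` value are assumptions for illustration.

```python
import difflib

# Stand-in for the large list of correct-form words described in the text.
CORRECT_WORDS = ["소녀시대", "소년시대", "날씨", "네이버"]


def extract_similar(query, words, limit=3, cutoff=0.6):
    """Return up to `limit` words whose similarity ratio to the query is at least `cutoff`."""
    return difflib.get_close_matches(query, words, n=limit, cutoff=cutoff)


print(extract_similar("소네시대", CORRECT_WORDS))
```

A syllable-level ratio like this treats "네" and "녀" as entirely different units. Comparing jamo sequences instead would capture the fact that they differ by a single vowel, which may be closer to what "similar in form" means for Hangul.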

FIG. 4 is a flowchart illustrating a method of calculating a candidate score according to an embodiment.

In the first probability calculation step (S431), a probability for the correct query may be calculated, as defined in Equation 1. For example, if the input user query is "한나카드" and "하나카드" (Hana Card) is extracted as a correction candidate string for it, the probability that a user enters "하나카드" may be the first probability for the correct query.

Next, in the second probability calculation step (S432), a probability for the user query may be calculated, as defined in Equation 1. For example, if the input user query is "한나카드" and "하나카드" is extracted as a correction candidate string for it, the probability that the user mistyped "하" (ha) as "한" (han) may be the second probability for the user query.

According to an embodiment, the probabilities for the correct query and the user query described above may be determined according to Equation 1 below, which calculates a syllable conversion probability based on the syllables in which the two queries differ.

Q denotes the user query, and Q' denotes the correct query obtained by correction through the dictionary data of typo-correct query pairs. A syllable-level conversion probability may be used for $P(Q' \mid Q)$. Here, $P(Q' \mid Q)$ may denote the probability that a user who mistook a typo for the correct spelling recognizes the mistake and corrects it. Alternatively, $P(Q' \mid Q)$ may be interpreted as the probability that a user who has entered a typo query recognizes that the query was entered incorrectly and then enters the correct query.

$P(Q' \mid Q)$ may be replaced by an expression involving $P(Q \mid Q')$. Here, $P(Q \mid Q')$ may be interpreted as the probability that the user has the correct query in mind but produces a typo while typing.

If conversion probabilities are computed over all the words that make up a user query, a data sparsity problem may arise. In addition, the amount of computation may grow rapidly as the number of words increases. Therefore, according to an embodiment, syllable-level conversion probabilities may be calculated only for the syllable sequences in which the user query and the correct query differ.

For example, the conversion probability in Equation 1 may be determined according to Equation 2.

In Equation 2, the per-segment term denotes the inter-syllable conversion probability. The words $q_{ij}$ and $q'_{ij}$ may be split at the syllable sequences in which they differ. Equation 2 assumes that k segments result from the split, and the probability may be calculated for those segments whose syllable sequences differ. For example, if the user query is abcd and the correct query is abed, the inter-syllable conversion probability $P(abed \mid abcd)$ may become $P(e \mid c)$.
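The splitting of a word pair into differing syllable sequences can be sketched in Python as follows. The sketch uses `difflib.SequenceMatcher` to find the differing segments and, as an assumption not confirmed by the text, combines the per-segment probabilities by multiplication; `differing_segments`, `word_conversion_probability`, and the `segment_prob` table are hypothetical names.

```python
from difflib import SequenceMatcher
from math import prod


def differing_segments(user_word, correct_word):
    """Return (user_part, correct_part) for every stretch where the words differ."""
    matcher = SequenceMatcher(None, user_word, correct_word, autojunk=False)
    return [
        (user_word[i1:i2], correct_word[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def word_conversion_probability(user_word, correct_word, segment_prob):
    """Combine per-segment probabilities for one word pair.

    Assumption: the segments are treated as independent and multiplied.
    segment_prob maps (user_part, correct_part) to a conversion probability.
    """
    segments = differing_segments(user_word, correct_word)
    return prod(segment_prob[segment] for segment in segments)
```

For the pair abcd and abed, `differing_segments` returns the single segment `("c", "e")`, so the word-level value reduces to the probability stored for that segment.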

For example, the inter-syllable conversion probability may be calculated through the following process, using the dictionary data of typo-correct query pairs, QC (the input frequency of a user query), and QQ (the input frequency of a user query pair).

(1) Assign QC and QQ to each typo-correct query pair in the dictionary data. For example: abcd (qc: 10) - abed (qc: 100), qq: 5.

(2) Determine the substrings in which the typo-correct query pair differs (c-e).

(3) Calculate the frequencies of the substrings. Specifically, sum qc and qq over all typo-correct query pairs in the dictionary data in which the c-e pair appears. For example: c (qc: 50) - e (qc: 1000), qq: 20.

(4) Calculate the syllable conversion probability from the calculated frequencies.

$P(e \mid c) = qq / qc = 20/50$
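Steps (1) to (4) can be sketched in Python as follows. The record layout `(typo, correct, qc_typo, qq)` and the function names are assumptions made for illustration, and the sample counts in the usage note are invented so that they add up to the figures in the worked example (qc: 50, qq: 20). The ratio `qq / qc` follows the 20/50 value given in the text.

```python
from collections import defaultdict


def differing_part(typo, correct):
    """Strip the common prefix and suffix, returning the parts that differ."""
    start = 0
    while start < min(len(typo), len(correct)) and typo[start] == correct[start]:
        start += 1
    end_t, end_c = len(typo), len(correct)
    while end_t > start and end_c > start and typo[end_t - 1] == correct[end_c - 1]:
        end_t -= 1
        end_c -= 1
    return typo[start:end_t], correct[start:end_c]


def syllable_conversion_probabilities(pairs):
    """pairs: iterable of (typo, correct, qc_typo, qq) records (assumed layout)."""
    qc_sum = defaultdict(int)  # summed input frequency of the typo side
    qq_sum = defaultdict(int)  # summed frequency of the typo-correct pair
    for typo, correct, qc_typo, qq in pairs:
        key = differing_part(typo, correct)  # step (2)
        qc_sum[key] += qc_typo               # step (3)
        qq_sum[key] += qq
    # Step (4): ratio of pair frequency to typo frequency, as in 20/50.
    return {key: qq_sum[key] / qc_sum[key] for key in qq_sum}
```

With the hypothetical records `("abcd", "abed", 10, 5)` and `("xcz", "xez", 40, 15)`, the c-e pair accumulates qc 50 and qq 20, giving a conversion probability of 0.4.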

In the candidate score calculation step (S433), a candidate score may be calculated from the first probability and the second probability. The candidate score may be obtained by using the probability as is, by converting the probability into a frequency, or by any other formula that converts a probability into a score.
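One possible reading of step S433 is sketched below. The text does not state how the two probabilities are combined, so the multiplication here is an assumption, as are the function name `candidate_score` and the optional `total_queries` scaling used to express the result as a frequency.

```python
def candidate_score(first_probability, second_probability, total_queries=None):
    """Hypothetical candidate score built from the two probabilities.

    Assumption: the probabilities are multiplied. If total_queries is given,
    the product is scaled to an expected frequency instead of a probability.
    """
    score = first_probability * second_probability
    if total_queries is not None:
        score *= total_queries
    return score
```

For instance, `candidate_score(0.5, 0.25)` returns `0.125`, and passing `total_queries=1000` returns `125.0`.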

FIG. 5 is a block diagram illustrating a detailed configuration of the user query correction apparatus 500 according to an embodiment.

The user query correction apparatus 500 may include a processing device 501 and a storage medium 502. The user query correction apparatus 500 is an electronic device that can receive a user query 510 from a user terminal 590 and transmit a correction query 520; it may be a server, but is not limited thereto.

The processing device 501 processes operations according to a program stored in the storage medium 502 and may include a CPU or another microprocessor.

The storage medium 502 may store a program including instructions that cause the processing device 501 to perform operations according to the steps of: determining whether an input user query is a typo query; extracting at least one correction candidate string when the user query is a typo query; calculating a candidate score for the at least one correction candidate string; assigning a weight to any correction candidate string in which the character before or after the corrected character is the same as the corrected character; and determining a correction query from among the correction candidate strings according to the candidate score and the weight. The storage medium 502 may be, for example, a hard disk, an SSD, an SD card, or other hardware capable of storing a program.
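The final two operations, weighting and selection, can be sketched as follows. Adding the weight to the candidate score and taking the maximum follows the wording of claim 3; the `Candidate` record, the function name `choose_correction`, and the default weight of 0.1 are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str              # correction candidate string
    score: float           # candidate score from the probability step
    fixes_duplicate: bool  # True if this candidate removes a duplicated character


def choose_correction(candidates, duplicate_weight=0.1):
    """Return the candidate text with the highest weighted score, or None."""
    if not candidates:
        return None

    def weighted(candidate):
        # Candidates that correct a duplicated character receive extra weight.
        bonus = duplicate_weight if candidate.fixes_duplicate else 0.0
        return candidate.score + bonus

    return max(candidates, key=weighted).text
```

With hypothetical scores of 0.30 for "한자카드" and 0.25 for "하나카드", the duplicate-deletion weight lifts "하나카드" to 0.35 and it is selected; with a weight of zero the higher raw score wins instead.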

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

500: user query correction apparatus
501: processing device
502: storage medium
510: user query
520: correction query
590: user terminal

Claims

1. A user query correction method comprising:
determining whether an input user query is a typo query;
extracting at least one correction candidate string when the user query is a typo query;
calculating a candidate score for the at least one correction candidate string;
determining whether the at least one correction candidate string corrects a duplicate character in the user query;
based on the determination result, when the at least one correction candidate string includes a string that corrects a duplicate character in the user query, assigning a relatively higher weight to that correction candidate string than to the remaining correction candidate strings; and
determining a correction query from among the at least one correction candidate string according to the candidate score and the weight.

2. The method of claim 1,
wherein calculating the candidate score for the at least one correction candidate string further comprises:
calculating a first probability for a correct query based on dictionary data composed of typo-correct query pairs;
calculating a second probability for the user query based on the dictionary data composed of the typo-correct query pairs; and
calculating the candidate score according to the first probability and the second probability.

3. The method of claim 2,
wherein determining the correction query from among the at least one correction candidate string according to the candidate score and the weight comprises
adding the weight to the candidate score and determining, as the correction query, the correction candidate string having the largest candidate score among the at least one correction candidate string.

4. The method of claim 1,
wherein the string in which the character before or after the corrected character is the same as the corrected character
is a string in which a letter (jamo) or syllable of the character before or after the corrected character is the same as a letter or syllable of the corrected character, and
wherein correcting the user query according to the correction query comprises
deleting, from the typo, the letter or syllable of the corrected character according to the correction query.

5. The method of claim 1,
wherein extracting the at least one correction candidate string when the user query is a typo query comprises:
determining whether the user query is registered as a typo query in dictionary data composed of typo-correct query pairs; and
when the user query is registered as a typo query, extracting a correction candidate string corresponding to the user query from the dictionary data.

6. A user query correction apparatus, as an electronic device, comprising:
a processing device; and
a computer storage medium comprising instructions that, when executed by the processing device, cause the processing device to perform operations comprising:
determining whether an input user query is a typo query;
extracting at least one correction candidate string when the user query is a typo query;
calculating a candidate score for the at least one correction candidate string;
determining whether the at least one correction candidate string corrects a duplicate character in the user query;
based on the determination result, when the at least one correction candidate string includes a string that corrects a duplicate character in the user query, assigning a relatively higher weight to that correction candidate string than to the remaining correction candidate strings; and
determining a correction query from among the at least one correction candidate string according to the candidate score and the weight.

7. The apparatus of claim 6,
wherein calculating the candidate score for the at least one correction candidate string further comprises:
calculating a first probability for a correct query based on dictionary data composed of typo-correct query pairs;
calculating a second probability for the user query based on the dictionary data composed of the typo-correct query pairs; and
calculating the candidate score according to the first probability and the second probability.

8. The apparatus of claim 7,
wherein determining the correction query from among the at least one correction candidate string according to the candidate score and the weight comprises
adding the weight to the candidate score and determining, as the correction query, the correction candidate string having the largest candidate score among the at least one correction candidate string.

9. The apparatus of claim 6,
wherein the string in which the character before or after the corrected character is the same as the corrected character
is a string in which a letter (jamo) or syllable of the character before or after the corrected character is the same as a letter or syllable of the corrected character, and
wherein correcting the user query according to the correction query comprises
deleting, from the typo, the letter or syllable of the corrected character according to the correction query.

10. The apparatus of claim 6,
wherein extracting the at least one correction candidate string when the user query is a typo query comprises:
determining whether the user query is registered as a typo query in dictionary data composed of typo-correct query pairs; and
when the user query is registered as a typo query, extracting a correction candidate string corresponding to the user query from the dictionary data.

11. A computer-readable storage medium having stored thereon one or more programs comprising instructions for performing the method of any one of claims 1 to 5.