CN110309258A - Input checking method, server and computer-readable storage medium - Google Patents

Input checking method, server and computer-readable storage medium

Info

Publication number
CN110309258A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810214555.4A
Other languages
Chinese (zh)
Other versions
CN110309258B (en
Inventor
李小文
李晟
刘松
梁俊
蒋忠强
陈敏
杨东
王伟
邢荣荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
Zhongchang (suzhou) Software Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongchang (suzhou) Software Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Zhongchang (suzhou) Software Technology Co Ltd
Priority to CN201810214555.4A priority Critical patent/CN110309258B/en
Publication of CN110309258A publication Critical patent/CN110309258A/en
Application granted granted Critical
Publication of CN110309258B publication Critical patent/CN110309258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

Embodiments of the invention disclose an input checking method, a server, and a computer-readable storage medium. When the keywords to be checked are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, where a keyword to be checked is a keyword used when a search is performed for a sentence to be searched. When the attribute of a first keyword to be checked among the at least two attributes belongs to a preset check attribute, a first entity corresponding to the first keyword to be checked is determined according to that attribute. At least one entity related to the first entity is looked up in a preset entity relationship library. Based on a preset relevance calculation method, the entity to be checked that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so that a search is performed using the first entity and the entity to be checked. The second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.

Description

Input checking method, server, and computer-readable storage medium
Technical field
The present invention relates to search engine technology in the Internet field, and in particular to an input checking method, a server, and a computer-readable storage medium.
Background
According to statistics, when users enter keywords on the Internet to query related information, the probability of an input error is 10% to 15%. These erroneous keywords cause the network to return wrong results or no results at all, which greatly reduces the speed and accuracy of queries. Correcting the keywords entered by the user before searching is therefore particularly important.
Common checking methods include checking based on edit distance, checking based on a model, and checking based on a dictionary. The edit distance is the minimum number of edit operations needed to change one character string into another; the permitted edit operations include substituting one character for another, inserting a character, and deleting a character. The edit-distance-based method looks up, in an edit distance dictionary, the result with the smallest edit distance to the keyword, and can catch extra or missing characters. The model-based method trains a model on a large amount of data and then uses the model to check the input keyword, and can catch homophone errors. The dictionary-based method first records erroneous query information and the corresponding correct query information in a check dictionary, then looks up the correct information in the check dictionary to replace the keyword, and can catch errors involving similar-looking characters.
However, the edit-distance-based, model-based, and dictionary-based methods can only identify misspellings in a character string. When the user enters "Chopsticks Brothers apple", none of the three methods finds a misspelling, although what the user wants to search for is "Chopsticks Brothers Little Apple". The prior art cannot use the semantics of the user's input to check the input character string, so the accuracy of checking is low.
Summary of the invention
To solve the above technical problem, embodiments of the present invention aim to provide an input checking method, a server, and a computer-readable storage medium that can check the character string entered by a user in combination with the semantics of the input, improving the accuracy of checking.
The technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides an input checking method, comprising:
when the keywords to be checked are at least two keywords to be checked, obtaining, from a preset standard corpus, at least two attributes corresponding to the at least two keywords to be checked, where a keyword to be checked is a keyword used when a search is performed for a sentence to be searched;
when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset check attribute, determining, according to the attribute of the first keyword to be checked, a first entity corresponding to the first keyword to be checked;
looking up, in a preset entity relationship library, at least one entity related to the first entity;
determining, based on a preset relevance calculation method and from the at least one entity, the entity to be checked that has the highest relevance to a second keyword to be checked, so that a search is performed using the first entity and the entity to be checked, where the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
In the above method, before the obtaining, from the preset standard corpus, of the at least two attributes corresponding to the at least two keywords to be checked, the method further comprises:
when a sentence to be searched entered by a user is received, identifying a first keyword from the sentence to be searched using a preset input template;
determining, from the preset standard corpus and according to a preset spell check strategy, the keyword to be checked corresponding to the first keyword.
In the above method, the determining, based on the preset relevance calculation method and from the at least one entity, of the entity to be checked that has the highest relevance to the second keyword to be checked comprises:
calculating, according to a preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked;
determining, from the at least one edit distance, a first edit distance, which is the smallest edit distance, and the entity to be checked corresponding to the first edit distance.
In the above method, the determining, from the preset standard corpus and according to the preset spell check strategy, of the keyword to be checked corresponding to the first keyword comprises:
matching the first keyword against the preset standard corpus;
when the first keyword does not match a first preset keyword in the preset standard corpus, converting the first keyword into a pinyin to be checked;
when the pinyin to be checked matches a first pinyin in the preset standard corpus, obtaining a second preset keyword corresponding to the first pinyin;
adding the second preset keyword to the keywords to be checked.
In the above method, after the converting of the first keyword into the pinyin to be checked, the method further comprises:
when the pinyin to be checked does not match the first pinyin, looking up, in the preset standard corpus, a third preset keyword whose edit distance to the first keyword is the smallest;
adding the third preset keyword to the keywords to be checked.
In the above method, after the looking up, in the preset standard corpus, of the third preset keyword whose edit distance to the first keyword is the smallest, and before the adding of the third preset keyword to the keywords to be checked, the method further comprises:
when the third preset keyword includes at least two keywords, looking up, in the preset standard corpus, at least two search counts corresponding to the at least two keywords;
determining, according to the at least two search counts and from the at least two keywords, a fourth preset keyword, which has the highest search count; correspondingly,
the adding of the third preset keyword to the keywords to be checked comprises:
adding the fourth preset keyword to the keywords to be checked.
In the above method, after the matching of the first keyword against the preset standard corpus, the method further comprises:
when the first keyword matches the first preset keyword, adding the first keyword to the keywords to be checked.
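The spell check strategy in the preceding paragraphs (exact match, then pinyin match, then smallest edit distance with search counts breaking ties) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `keyword_to_check`, `toy_pinyin`, and the tiny pinyin table are hypothetical, the corpus is modeled as a plain mapping from entity name to search count, and the count for 筷子兄弟 (Chopsticks Brothers) is made up. A real system would use a complete pinyin conversion table.

```python
from typing import Callable, Dict

# Toy pinyin table (assumption): only the characters used in this example.
TOY_PINYIN = {"周": "zhou", "杰": "jie", "伦": "lun", "轮": "lun",
              "筷": "kuai", "子": "zi", "兄": "xiong", "弟": "di"}

# Corpus modeled as entity name -> search count (counts are illustrative).
CORPUS = {"周杰伦": 10000, "筷子兄弟": 8000}


def toy_pinyin(text: str) -> str:
    """Convert text to pinyin; unknown characters pass through unchanged."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only one row of the table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def keyword_to_check(first_keyword: str,
                     search_counts: Dict[str, int],
                     to_pinyin: Callable[[str], str]) -> str:
    """Map a recognized keyword to a corpus keyword (corpus must be non-empty)."""
    # Step 1: exact match against the corpus.
    if first_keyword in search_counts:
        return first_keyword
    # Step 2: pinyin match, which catches homophone errors.
    target = to_pinyin(first_keyword)
    for entry in search_counts:
        if to_pinyin(entry) == target:
            return entry
    # Step 3: smallest edit distance; ties go to the highest search count.
    distances = {e: edit_distance(first_keyword, e) for e in search_counts}
    best = min(distances.values())
    nearest = [e for e, d in distances.items() if d == best]
    return max(nearest, key=lambda e: search_counts[e])
```

For instance, `keyword_to_check("周杰轮", CORPUS, toy_pinyin)` falls through the exact match and is resolved by the pinyin step, while `keyword_to_check("筷子兄", CORPUS, toy_pinyin)` reaches the edit distance step.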
In the above method, after the search is performed using the first entity and the entity to be checked, the method further comprises:
obtaining a first historical search count of the first entity and a second historical search count of the entity to be checked;
adding the first historical search count and the second historical search count to the preset standard corpus.
In the above method, after the determining, from the preset standard corpus and according to the preset spell check strategy, of the keyword to be checked corresponding to the first keyword, the method further comprises:
when it is determined that the number of keywords to be checked is one, searching using the keyword to be checked.
An embodiment of the present invention provides a server comprising a processor, a memory, and a communication bus, where the processor is configured to execute an operating program stored in the memory to perform the following steps:
when the keywords to be checked are at least two keywords to be checked, obtaining, from a preset standard corpus, at least two attributes corresponding to the at least two keywords to be checked, where a keyword to be checked is a keyword used when a search is performed for a sentence to be searched; when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset check attribute, determining, according to the attribute of the first keyword to be checked, a first entity corresponding to the first keyword to be checked; looking up, in a preset entity relationship library, at least one entity related to the first entity; and determining, based on a preset relevance calculation method and from the at least one entity, the entity to be checked that has the highest relevance to a second keyword to be checked, so that a search is performed using the first entity and the entity to be checked, where the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
In the above server, the processor is further configured to, when a sentence to be searched entered by a user is received, identify a first keyword from the sentence to be searched using a preset input template, and determine, from the preset standard corpus and according to a preset spell check strategy, the keyword to be checked corresponding to the first keyword.
In the above server, the processor is further configured to calculate, according to a preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked, and determine, from the at least one edit distance, a first edit distance, which is the smallest edit distance, and the entity to be checked corresponding to the first edit distance.
In the above server, the processor is further configured to match the first keyword against the preset standard corpus; when the first keyword does not match a first preset keyword in the preset standard corpus, convert the first keyword into a pinyin to be checked; when the pinyin to be checked matches a first pinyin in the preset standard corpus, obtain a second preset keyword corresponding to the first pinyin; and add the second preset keyword to the keywords to be checked.
In the above server, the processor is further configured to, when the pinyin to be checked does not match the first pinyin, look up, in the preset standard corpus, a third preset keyword whose edit distance to the first keyword is the smallest, and add the third preset keyword to the keywords to be checked.
In the above server, the processor is further configured to, when the third preset keyword includes at least two keywords, look up, in the preset standard corpus, at least two search counts corresponding to the at least two keywords; determine, according to the at least two search counts and from the at least two keywords, a fourth preset keyword, which has the highest search count; and add the fourth preset keyword to the keywords to be checked.
In the above server, the processor is further configured to, when the first keyword matches the first preset keyword, add the first keyword to the keywords to be checked.
In the above server, the processor is further configured to obtain a first historical search count of the first entity and a second historical search count of the entity to be checked, and add the first historical search count and the second historical search count to the preset standard corpus.
In the above server, the processor is further configured to, when it is determined that the number of keywords to be checked is one, search using the keyword to be checked.
An embodiment of the present invention provides a computer-readable storage medium that stores a computer program and is applied to a server, where the computer program, when executed by a processor, implements any of the input checking methods described above.
Embodiments of the present invention provide an input checking method, a server, and a computer-readable storage medium. When the keywords to be checked are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, where a keyword to be checked is a keyword used when a search is performed for a sentence to be searched. When the attribute of a first keyword to be checked among the at least two attributes belongs to a preset check attribute, a first entity corresponding to the first keyword to be checked is determined according to that attribute. At least one entity related to the first entity is looked up in a preset entity relationship library. Based on a preset relevance calculation method, the entity to be checked that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so that a search is performed using the first entity and the entity to be checked; the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked. With this implementation, a preset entity relationship library that stores the relationships between entities is provided on the server. When the server receives at least two keywords to be checked, such as "Chopsticks Brothers apple", the server looks up, in the preset entity relationship library, the first entity "Chopsticks Brothers" corresponding to the first keyword to be checked and at least one entity related to "Chopsticks Brothers", namely the songs sung by "Chopsticks Brothers", and determines, from those songs, the entity to be checked "Little Apple", which has the highest relevance to the second keyword to be checked, "apple". The first entity "Chopsticks Brothers" and the entity to be checked "Little Apple" are then the two entities with the highest relevance, and the server searches using "Chopsticks Brothers Little Apple". The server can thus correct the keywords to be checked in combination with the user's semantics, improving the accuracy of input checking.
Brief description of the drawings
Fig. 1 is a first flowchart of an input checking method provided by an embodiment of the present invention;
Fig. 2 is a module diagram corresponding to an illustrative input checking method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of illustrative relationships between entities provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the checking logic of an illustrative input check provided by an embodiment of the present invention;
Fig. 5 is a second flowchart of an input checking method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of illustrative rule-based error correction provided by an embodiment of the present invention;
Fig. 7 is a flowchart of illustrative semantics-based error correction provided by an embodiment of the present invention;
Fig. 8 is a first structural diagram of a server provided by an embodiment of the present invention.
Detailed description of the embodiments
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1
An embodiment of the present invention provides an input checking method. As shown in Fig. 1, the method may include:
S101. When the keywords to be checked are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, where a keyword to be checked is a keyword used when a search is performed for a sentence to be searched.
The input checking method provided by this embodiment applies to scenarios in which a server corrects errors in the keywords of a sentence entered by a user.
In this embodiment, as shown in Fig. 2, the input checking method comprises three modules: offline library building, online fault tolerance, and incremental learning. Offline library building preprocesses and trains on the corpus to obtain templates, a corpus dictionary, an edit distance dictionary, and a knowledge graph. Online fault tolerance, after a sentence to be searched entered by the user is received, uses the templates, corpus dictionary, edit distance dictionary, and knowledge graph obtained by offline library building to perform fault tolerance online. Incremental learning, after online fault tolerance, determines the updated search counts from the search log and writes them into the corresponding dictionary, completing the incremental learning process.
In this embodiment, when the server receives the sentence to be searched that the user enters in the input box, it identifies the first keyword from the sentence using the preset input template and then determines the number of first keywords. When the server determines that the number of first keywords is at least two, it looks up, in the preset standard corpus, the at least two attributes corresponding to the at least two keywords to be checked.
Further, before the server obtains the at least two attributes corresponding to the at least two keywords to be checked, it determines, from the preset standard corpus and according to the preset spell check strategy, the keyword to be checked corresponding to the first keyword.
In this embodiment, the server obtains preset corpus information using a crawler system or a database. The preset corpus information includes entity names, the attribute corresponding to each entity name, and search counts, for example: Zhou Jielun, singer, and 10,000 times, where Zhou Jielun is the entity name, singer is the attribute, and 10,000 is the search count.
In this embodiment, the server preprocesses the obtained preset corpus information, which includes entity names, the entity attribute corresponding to each entity name, and search counts. Preprocessing includes converting traditional Chinese characters in the preset corpus information to simplified characters, converting full-width symbols to half-width, converting uppercase letters to lowercase, and removing special characters, spaces, and illegal characters. The server then trains on the preprocessed preset corpus information to generate the preset input template, the preset standard corpus, and the knowledge graph (the preset entity relationship library). The preset standard corpus includes a corpus dictionary and an edit distance dictionary. The corpus dictionary stores entity names, including Chinese names and the corresponding English names; the specifics are chosen according to the actual situation and are not limited by this embodiment. The edit distance dictionary stores entity names, entity attributes, and search counts; the specifics are likewise chosen according to the actual situation and are not limited by this embodiment.
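Part of the preprocessing described above can be sketched with the Python standard library. This is an illustrative sketch only: the function name `normalize` and the allow-list of kept characters are assumptions, and the traditional-to-simplified conversion is omitted because it needs a character mapping table that the source does not specify.

```python
import re
import unicodedata

# Characters kept after cleaning: digits, lowercase ASCII letters, and CJK
# ideographs. This allow-list is an assumption; the source only says that
# special characters, spaces, and illegal characters are removed.
_DISALLOWED = re.compile(r"[^0-9a-z\u4e00-\u9fff]")


def normalize(text: str) -> str:
    """Normalize a corpus entry or query string before dictionary lookup."""
    # NFKC folds full-width letters, digits, and punctuation to half-width.
    text = unicodedata.normalize("NFKC", text)
    # Uppercase letters become lowercase.
    text = text.lower()
    # Drop spaces, punctuation, and anything else outside the allow-list.
    return _DISALLOWED.sub("", text)
```

Applying the same function to both the corpus entries and the incoming query keeps dictionary lookups from failing on differences in width, case, or spacing.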
In this embodiment, a preset edit distance calculation method is provided with the edit distance dictionary, and the server can use it to determine the keyword with the smallest edit distance to the first keyword.
In this embodiment, the server first identifies the first keyword from the sentence to be searched using the preset input template and then looks up the first keyword in the corpus dictionary. If it is found, the first keyword is determined to be a keyword to be checked. If it is not found, the first keyword is converted into English, and the English first keyword is looked up in the corpus dictionary. If that is found, the keyword in the corpus dictionary corresponding to the English first keyword is determined to be the keyword to be checked; if not, the keyword with the smallest edit distance to the first keyword is looked up in the edit distance dictionary and determined to be the keyword to be checked.
In this embodiment, the server uses the edit distance dictionary to obtain the at least two attributes corresponding to the at least two keywords to be checked.
In this embodiment, the server determines the number of first keywords either after identifying the first keyword or after determining the keyword to be checked corresponding to the first keyword; the choice is made according to the actual situation and is not limited by this embodiment.
S102. When the attribute of a first keyword to be checked among the at least two attributes belongs to a preset check attribute, a first entity corresponding to the first keyword to be checked is determined according to the attribute of the first keyword to be checked.
After the server obtains the at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus, it determines in turn whether each of the at least two attributes belongs to a preset check attribute. When the server determines that the attribute of the first keyword to be checked belongs to a preset check attribute, it determines the first entity corresponding to the first keyword to be checked according to that attribute.
In this embodiment, the preset check attributes that require further checking are configured on the server in advance. After determining the at least two attributes, the server determines in turn whether each belongs to a preset check attribute. When it determines that the attribute of the first keyword to be checked belongs to a preset check attribute, the server looks up the first entity corresponding to the first keyword to be checked in the knowledge graph.
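The attribute test in S102 can be sketched as follows. The names `PRESET_CHECK_ATTRIBUTES` and `split_by_check_attribute` are hypothetical, and treating "singer" as a check attribute is an assumption based on the attribute given in the source's Zhou Jielun example; the source does not list the actual preset check attributes.

```python
from typing import Dict, List, Optional, Tuple

# Attributes that trigger the semantic check (assumed for illustration).
PRESET_CHECK_ATTRIBUTES = {"singer"}


def split_by_check_attribute(
    keywords: List[str], attributes: Dict[str, str]
) -> Optional[Tuple[str, List[str]]]:
    """Return (first keyword to be checked, remaining keywords), or None.

    None means the semantic check does not apply: either there are fewer
    than two keywords, or no keyword has a preset check attribute.
    """
    if len(keywords) < 2:
        return None  # a single keyword is searched directly
    for keyword in keywords:
        # Keywords without a known attribute yield None, which never matches.
        if attributes.get(keyword) in PRESET_CHECK_ATTRIBUTES:
            rest = [k for k in keywords if k != keyword]
            return keyword, rest
    return None
```

With `attributes = {"Chopsticks Brothers": "singer"}`, the keywords `["Chopsticks Brothers", "apple"]` split into the first keyword to be checked, "Chopsticks Brothers", and the remaining keyword "apple".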
S103. At least one entity related to the first entity is looked up in the preset entity relationship library.
After the server determines the first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked, it looks up at least one entity related to the first entity in the preset entity relationship library.
In this embodiment, the knowledge graph is a graph-based data structure made up of nodes and edges, where each node represents an entity and each edge represents a relationship between two entities. The knowledge graph can be represented as entity-relationship-entity triples and stored using the traditional Resource Description Framework (RDF) or a graph database. The knowledge graph is used to query the at least one entity related to the first entity.
As shown in Fig. 3, the relationship between the pair of entities Zhou Jielun and "Cowboy Is Very Busy" is "work".
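A triple-based entity relationship library and the lookup in S103 can be sketched in memory as follows. This is a simplified stand-in for an RDF store or graph database, and the names `TRIPLES`, `build_index`, and `related_entities` are hypothetical. The two triples reflect examples given in the source text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)

TRIPLES: List[Triple] = [
    ("Zhou Jielun", "work", "Cowboy Is Very Busy"),
    ("Chopsticks Brothers", "work", "Little Apple"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (relationship, tail) pairs by head entity for fast lookup."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relationship, tail in triples:
        index[head].append((relationship, tail))
    return index


def related_entities(index: Dict[str, List[Tuple[str, str]]],
                     entity: str) -> List[str]:
    """Return every entity directly connected to the given entity."""
    return [tail for _, tail in index.get(entity, [])]
```

Here `related_entities(build_index(TRIPLES), "Zhou Jielun")` returns the one work linked to that entity, and an unknown entity returns an empty list.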
S104. Based on a preset relevance calculation method, the entity to be checked that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so that a search is performed using the first entity and the entity to be checked. The second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
After the server finds at least one entity related to the first entity in the preset entity relationship library, it determines, based on the preset relevance calculation method and from the at least one entity, the entity to be checked that has the highest relevance to the second keyword to be checked, and searches using the first entity and the entity to be checked.
In this embodiment, the preset relevance calculation method includes an edit distance calculation method, a vector space model, and the like; the choice is made according to the actual situation and is not limited by this embodiment.
In this embodiment, the server calculates, according to the preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked, and then determines, from the at least one edit distance, the first edit distance, which is the smallest, and the entity to be checked corresponding to the first edit distance.
In this embodiment, the preset edit distance calculation method is as follows. Define d_{i,j} as the edit distance needed to change the character string a of length i into the character string b of length j. If the final length of string a is m and the final length of string b is n, then d is an (m+1) × (n+1) matrix, because conversions involving a string of length 0 must also be represented.
Here, w_del(b_i) and w_ins(a_j) are both 1. When the i-th character of string a is not equal to the j-th character of string b, w_sub(a_j, b_i) = 1; otherwise it is 0.
Preferably, the server looks up, in the knowledge graph, at least one entity associated with the first keyword to be checked and determines whether the second keyword to be checked is present among the at least one entity. If it is present, the first keyword to be checked and the second keyword to be checked are returned directly; otherwise, at least one edit distance between the at least one entity and the second keyword to be checked is calculated according to the preset edit distance calculation method.
In this embodiment, the server sorts the at least one edit distance in descending or ascending order; the specific sorting method is chosen according to the actual situation and is not limited by this embodiment.
In this embodiment, the server determines the first edit distance, which is the smallest, from the sorted edit distances, then determines the entity to be checked corresponding to the first edit distance, and searches using the first entity and the entity to be checked.
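The distance matrix and the selection step described above can be sketched as follows, assuming unit costs for insertion, deletion, and substitution as stated in the text. The function names are hypothetical. The example uses assumed original-language strings, because edit distance is computed character by character: 苹果 ("apple") and 小苹果 ("Little Apple") follow the source's example, while 老男孩 ("Old Boys") is an extra candidate added purely for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Edit distance with unit insertion, deletion, and substitution costs."""
    m, n = len(a), len(b)
    # d[i][j]: distance between the first i characters of a and the first
    # j characters of b; the extra row and column cover the empty string.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # delete a character
                d[i][j - 1] + 1,                 # insert a character
                d[i - 1][j - 1] + substitution,  # substitute or keep
            )
    return d[m][n]


def entity_to_check(second_keyword: str, candidates: List[str]) -> str:
    """Pick the related entity closest to the second keyword to be checked.

    Assumes candidates is non-empty.
    """
    # Shortcut: the keyword already names one of the related entities.
    if second_keyword in candidates:
        return second_keyword
    # Otherwise take the candidate with the smallest edit distance.
    return min(candidates, key=lambda e: edit_distance(e, second_keyword))
```

Calling `entity_to_check("苹果", ["小苹果", "老男孩"])` selects 小苹果, which differs from the keyword by a single inserted character. Taking the minimum directly gives the same result as sorting the distances and reading off the smallest.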
Further, after the server searches using the first entity and the entity to be checked, the count for this search record in the search log is incremented by one, and the server writes the updated search count for this record into the edit distance dictionary, completing the incremental learning process.
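The count update in the incremental learning step can be sketched as follows. Modeling the search counts in the edit distance dictionary as a `collections.Counter` and the name `apply_search_log` are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def apply_search_log(search_counts: Counter,
                     searched_entities: Iterable[str]) -> None:
    """Add one to the stored search count of every entity that was searched."""
    # Counter.update with an iterable counts each element once per occurrence.
    search_counts.update(searched_entities)
```

Starting from `Counter({"Zhou Jielun": 10000})`, applying a log that contains "Zhou Jielun" once raises its count to 10,001, and an entity not seen before starts at 1.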
Illustratively, as shown in Fig. 4, the checking logic of the input check is as follows. After a query request for a sentence to be queried is received, deep preprocessing is first performed on the sentence: a first keyword is selected using a template, and formatting operations are then applied to the first keyword, including converting Traditional Chinese characters to Simplified, converting full-width symbols to half-width, converting uppercase letters to lowercase, converting Arabic numerals to Simplified Chinese numerals, and removing special characters, spaces, and illegal characters; the error correction dictionary is then used to determine whether the first keyword contains near-form (visually similar) characters. Next, rule-based error correction is performed on the first keyword: the corpus dictionary is used to identify homophone errors, and the edit distance dictionary is used to identify fuzzy-pinyin errors and extra or missing characters, yielding the keywords to be checked. When it is determined that there are at least two keywords to be checked, and that the first attribute of the first keyword to be checked among them belongs to the preset check attributes, semantics-based error correction is performed on the at least two keywords to be checked: the knowledge graph is used to find at least one entity related to the first keyword to be checked, and the entity to be queried with the smallest edit distance to the second keyword to be checked is determined from the at least one entity, which completes the checking process for the sentence to be queried.
It can be understood that a preset entity relationship library storing the relationships between entities is provided on the server. When the server receives at least two keywords to be checked, such as "Chopsticks Brothers Apple", the server looks up in the preset entity relationship library the first entity corresponding to the first keyword to be checked, "Chopsticks Brothers", together with at least one entity related to "Chopsticks Brothers", namely the songs sung by "Chopsticks Brothers", and determines from those songs the entity to be queried that is most relevant to the second keyword to be checked, "Apple", which is "Little Apple". The first entity "Chopsticks Brothers" and the entity to be queried "Little Apple" are then the two most relevant entities, and the server searches using "Chopsticks Brothers Little Apple". The server can thus correct the keywords to be checked in light of the user's semantics, which improves the accuracy of the input check.
Embodiment two
The embodiment of the present invention provides an input checking method. As shown in Fig. 5, the method may include:
S201, when the server receives an input sentence to be searched, the server uses a preset input template to identify a first keyword in the sentence to be searched.
The input checking method provided by the embodiment of the present invention is applicable to scenarios in which the server performs error correction on the keywords in a sentence input by a user.
In the embodiment of the present invention, as shown in Fig. 2, the input checking method comprises three modules: offline library building, online fault tolerance, and incremental learning. Offline library building preprocesses and trains on the corpus to obtain the templates, the corpus dictionary, the edit distance dictionary, and the knowledge graph. Online fault tolerance, after a sentence to be searched input by the user is received, uses the templates, corpus dictionary, edit distance dictionary, and knowledge graph obtained by offline library building to correct the input online. Incremental learning, after online fault tolerance, determines the updated search counts from the search log and writes them into the corresponding dictionary, completing the incremental learning process.
In the embodiment of the present invention, the user enters the sentence to be queried in an input box, and the server uses the preset input template to identify the first keyword in the sentence to be queried.
Illustratively, an input template is predefined, such as "I want to listen to songs by ()", where the type of () is singer. When the user inputs "I want to listen to songs by Chen Yixun", the server returns "Chen Yixun" as the singer according to the predefined input template.
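The template example can be sketched with a regular expression. This is a minimal illustration, not the patented implementation: the table TEMPLATES, the function extract_first_keyword, and the English wording of the pattern are all assumptions made for the example.

```python
import re

# Hypothetical template table: each entry pairs a pattern with the attribute
# of its slot. The English sentence stands in for the original template.
TEMPLATES = [
    (re.compile(r"^I want to listen to songs by (?P<slot>.+)$"), "singer"),
]


def extract_first_keyword(sentence):
    """Return (keyword, attribute) for the first matching template, else None."""
    for pattern, attribute in TEMPLATES:
        match = pattern.match(sentence.strip())
        if match:
            return match.group("slot").strip(), attribute
    return None
```

Keeping the slot attribute next to the pattern lets the later attribute check reuse it without a second lookup.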
In the embodiment of the present invention, when the server receives a keyword group input by the user, the server directly decomposes the keyword group into first keywords using a word segmentation method.
S202, the server matches the first keyword against the preset standard corpus.
After the server uses the preset input template to identify the first keyword in the sentence to be searched, the server matches the first keyword against the preset standard corpus.
In the embodiment of the present invention, the server performs deep preprocessing on the first keyword, including converting Traditional Chinese characters to Simplified, converting full-width symbols to half-width, converting uppercase letters to lowercase, converting Arabic numerals to Simplified Chinese numerals, and removing special characters, spaces, and illegal characters, and uses the error correction dictionary to look up the near-form counterpart of the first keyword. The server then performs rule-based error correction on the preprocessed first keyword: it looks up the first keyword in the corpus dictionary and, if it is found, determines the first keyword to be a keyword to be checked. If the first keyword is not found, it is converted into English, and the English first keyword is looked up in the corpus dictionary; if it is found, the keyword in the corpus dictionary corresponding to the English first keyword is determined to be a keyword to be checked. If it is still not found, the keyword with the smallest edit distance to the first keyword is looked up in the edit distance dictionary and determined to be a keyword to be checked.
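Part of the formatting step can be approximated with the Python standard library. The sketch below is illustrative only: the function name normalize_keyword is hypothetical, Unicode NFKC normalization is used as a stand-in for full-width to half-width conversion (it also folds other compatibility characters), and the Traditional-to-Simplified and numeral conversions are omitted because they need mapping tables that the text does not specify.

```python
import re
import unicodedata


def normalize_keyword(keyword):
    """Apply a subset of the formatting operations to a keyword."""
    # NFKC folds full-width letters, digits and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", keyword)
    # Uppercase letters to lowercase.
    text = text.lower()
    # Drop spaces and symbols; \W is Unicode-aware for str patterns, so
    # letters, digits, CJK characters and the underscore are kept.
    return re.sub(r"\W+", "", text)
```

Running normalization before any dictionary lookup means the dictionaries only need to store one canonical form per entity.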
In the embodiment of the present invention, the error correction dictionary stores erroneous query information, the corresponding correct query information, and the attribute of the correct query information. For example, if "Bi Zhiqian" and "Xue Zhiqian singer" are configured in the error correction dictionary, then when the first keyword is "Bi Zhiqian" it can be corrected to "Xue Zhiqian", and the attribute of "Xue Zhiqian" can be identified as "singer".
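One possible in-memory layout for such a dictionary is a plain mapping from the erroneous query to the correct query and its attribute. The names ERROR_CORRECTION and correct_near_form are hypothetical, and the single entry simply mirrors the example in the text.

```python
# Hypothetical layout: erroneous query -> (correct query, attribute).
ERROR_CORRECTION = {
    "Bi Zhiqian": ("Xue Zhiqian", "singer"),
}


def correct_near_form(keyword):
    """Return (keyword, attribute); the input is returned with a None
    attribute when the dictionary has no entry for it."""
    return ERROR_CORRECTION.get(keyword, (keyword, None))
```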
In the embodiment of the present invention, the server obtains preset corpus information using a crawler system or a database, where the preset corpus information includes entity names, the attributes corresponding to the entity names, and search counts. For example, in "Zhou Jielun, singer, 10,000", Zhou Jielun is the entity name, singer is the attribute, and 10,000 is the search count.
In the embodiment of the present invention, the server trains on the preprocessed preset corpus information to generate the preset input templates, the preset standard corpus, and the knowledge graph (the preset entity relationship library). The preset standard corpus includes the corpus dictionary and the edit distance dictionary. The corpus dictionary stores entity names, including Chinese names and the corresponding English names; the edit distance dictionary stores entity names, entity attributes, and search counts. The specific contents are selected according to the actual situation and are not specifically limited in the embodiment of the present invention.
In the embodiment of the present invention, the preset edit distance calculation method is provided in the edit distance dictionary, and the server can determine, according to that method, the keyword with the smallest edit distance to the first keyword.
In the embodiment of the present invention, the server judges the word length of the first keyword and, when the word length is less than a preset length threshold, looks up the first keyword in the corpus dictionary.
Illustratively, if the first keyword is Chinese and its length is less than 10, the first keyword is looked up in the corpus dictionary.
S203, when the first keyword matches the first preset keyword, the server adds the first keyword to the keywords to be checked.
When the server determines that the first keyword matches the first preset keyword in the preset standard corpus, the server adds the first keyword to the keywords to be checked.
In the embodiment of the present invention, when the server judges that the first keyword matches the first preset keyword in the corpus dictionary, this indicates that the server has found the first keyword in the corpus dictionary; the server then looks up the fifth attribute corresponding to the first preset keyword in the edit distance dictionary.
In the embodiment of the present invention, the server adds the first keyword to the keywords to be checked and adds the fifth attribute to the at least two attributes.
S204, when the first keyword does not match the first preset keyword in the preset standard corpus, the server converts the first keyword into the pinyin to be checked.
When the server judges that the first keyword does not match the first preset keyword in the preset standard corpus, the server converts the first keyword into the pinyin to be checked.
In the embodiment of the present invention, when the server judges that the first keyword does not match the first preset keyword in the corpus dictionary, the server converts the first keyword into the pinyin to be checked.
Illustratively, the user inputs a word that the server does not find in the corpus dictionary; the server then converts that word into the pinyin "xuezhiqian".
S203 and S204 are two alternative steps following S202; which one is executed is selected according to the actual situation and is not specifically limited in the embodiment of the present invention.
S205, when the pinyin to be checked matches a first pinyin in the preset standard corpus, the server obtains the second preset keyword corresponding to the first pinyin.
After the server converts the first keyword into the pinyin to be checked, the server matches the pinyin to be checked against the first pinyin in the preset standard corpus; when they match, the server obtains the second preset keyword corresponding to the first pinyin.
In the embodiment of the present invention, the server judges the length of the pinyin to be checked and, when the length is less than a preset length threshold, matches the pinyin to be checked against the first pinyin in the corpus dictionary; when the match succeeds, the server obtains the second preset keyword corresponding to the first pinyin and the second attribute corresponding to the second preset keyword.
Illustratively, if the length of the pinyin to be checked is less than 20, the server looks up the pinyin to be checked in the corpus dictionary.
Illustratively, the server finds in the corpus dictionary that the second preset keyword corresponding to "xuezhiqian" is "Xue Zhiqian"; the server then obtains from the edit distance dictionary that the attribute of "Xue Zhiqian" is "singer".
S206, the server adds the second preset keyword to the keywords to be checked.
After the server obtains the second preset keyword, the server adds the second preset keyword to the keywords to be checked.
In the embodiment of the present invention, the server adds the second preset keyword to the keywords to be checked and adds the second attribute to the at least two attributes.
S207, when the pinyin to be checked does not match the first pinyin, the server looks up in the preset standard corpus the third preset keyword with the smallest edit distance to the first keyword.
When the server judges that the pinyin to be checked does not match the first pinyin, the server looks up in the preset standard corpus the third preset keyword with the smallest edit distance to the first keyword.
In the embodiment of the present invention, when the server judges that the pinyin to be checked does not match the first pinyin, the server uses the edit distance dictionary to look up the third preset keyword with the smallest edit distance to the first keyword and the third attribute corresponding to the third preset keyword.
Illustratively, as shown in Fig. 6, the process by which the server performs rule-based error correction on the first keyword is as follows:
1, apply the formatting operations to the first keyword;
2, match the first keyword against the corpus dictionary;
3, when the match succeeds, determine the first keyword to be a keyword to be checked;
4, when the match fails, convert the first keyword into the pinyin to be checked;
5, match the pinyin to be checked against the first pinyin in the corpus dictionary;
6, when the match succeeds, look up the second preset keyword corresponding to the first pinyin in the corpus dictionary and determine the second preset keyword to be a keyword to be checked;
7, when the match fails, use the edit distance dictionary to look up the third preset keyword with the smallest edit distance to the first keyword, and determine the third preset keyword to be a keyword to be checked.
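Steps 2 to 7 form a three-level fallback, which can be sketched as below. Everything named here is an assumption for illustration: the corpus dictionary is modelled as a mapping from entity name to pinyin, the pinyin converter and the distance function are supplied by the caller, and the toy data use romanized names with a crude stand-in distance rather than a real edit distance.

```python
def rule_based_correct(keyword, corpus, to_pinyin, distance):
    """Return the keyword to be checked for an already formatted keyword.

    corpus maps entity name -> pinyin; to_pinyin and distance are
    caller-supplied stand-ins for the converter and the preset method.
    """
    # Steps 2-3: exact match in the corpus dictionary.
    if keyword in corpus:
        return keyword
    # Steps 4-6: convert to pinyin and match against stored pinyin.
    pinyin = to_pinyin(keyword)
    for name, name_pinyin in corpus.items():
        if name_pinyin == pinyin:
            return name
    # Step 7: fall back to the entry with the smallest distance.
    return min(corpus, key=lambda name: distance(keyword, name))


# Toy data for illustration only.
CORPUS = {"Xue Zhiqian": "xuezhiqian", "Zhou Jielun": "zhoujielun"}


def toy_pinyin(text):
    # Stand-in converter: strip spaces and lowercase a romanized name.
    return text.replace(" ", "").lower()


def toy_distance(a, b):
    # Crude stand-in: differing positions plus the length gap.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
```

Ordering the checks from cheapest to most expensive means the distance computation only runs when both exact lookups have failed.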
S205-S206 and S207 are two alternative branches following S204; which one is executed is selected according to the actual situation and is not specifically limited in the embodiment of the present invention.
S208, when the third preset keyword includes at least two keywords, the server looks up in the preset standard corpus the at least two search counts corresponding to the at least two keywords.
After the server finds the third preset keyword with the smallest edit distance to the first keyword, the server judges whether the third preset keyword includes at least two keywords; when it does, the server looks up in the preset standard corpus the at least two search counts corresponding to the at least two keywords.
In the embodiment of the present invention, when the server judges that at least two keywords all have the smallest edit distance to the first keyword, the server looks up the at least two search counts corresponding to the at least two keywords in the edit distance dictionary.
S209, server from least two keywords, determine that searching times are most according at least two searching times The 4th predetermined keyword.
After server finds at least two keywords corresponding at least two searching times, server will basis At least two searching times determine the 4th most predetermined keyword of searching times from least two keywords.
In the embodiment of the present invention, server is at least two searching times according to descending or ascending suitable Sequence is ranked up, and later, server therefrom determines the 4th most predetermined keyword of searching times and the 4th pass to be checked Corresponding 4th attribute of keyword.
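The tie-break in S208 and S209 can be sketched in two small functions. The names are hypothetical, the distance function is supplied by the caller, and taking the maximum directly is a substitution for the sort described above that yields the same keyword.

```python
def nearest_candidates(keyword, names, distance):
    """Return every name tied for the smallest distance to keyword.

    Assumes names is non-empty; distance is a caller-supplied function.
    """
    scored = [(distance(keyword, name), name) for name in names]
    best = min(score for score, _ in scored)
    return [name for score, name in scored if score == best]


def pick_most_searched(candidates, search_counts):
    """Among tied candidates, return the one with the highest search count.

    Candidates missing from search_counts are treated as never searched.
    """
    return max(candidates, key=lambda name: search_counts.get(name, 0))
```

When only one candidate has the smallest distance, nearest_candidates returns a single-element list and the tie-break is a no-op.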
S210, the server adds the fourth preset keyword to the keywords to be checked.
After the server determines the fourth preset keyword with the highest search count, the server adds the fourth preset keyword to the keywords to be checked.
In the embodiment of the present invention, the server adds the fourth preset keyword to the keywords to be checked and adds the fourth attribute to the at least two attributes.
S211, when the server judges that the number of keywords to be checked is one, the server searches using the keyword to be checked.
After the server determines the keyword to be checked corresponding to the first keyword from the preset standard corpus, the server judges the number of keywords to be checked; when the number is one, the server searches using the keyword to be checked.
In the embodiment of the present invention, the server searches with the keyword to be checked and displays the search results on the current display interface.
S212, when the keywords to be checked are at least two keywords to be checked, the server obtains from the preset standard corpus the at least two attributes corresponding to the at least two keywords to be checked.
After the server determines from the preset standard corpus the at least two keywords to be checked corresponding to at least two keywords, the server obtains from the preset standard corpus the at least two attributes corresponding to the at least two keywords to be checked.
In the embodiment of the present invention, the at least two attributes corresponding to the at least two keywords to be checked are stored in the edit distance dictionary; when the server uses the edit distance dictionary to perform spelling correction on the first keyword and obtains the at least two keywords to be checked, the server can obtain their corresponding at least two attributes at the same time.
S211 and S212 are two alternative steps following S210; which one is executed is selected according to the actual situation and is not specifically limited in the embodiment of the present invention.
S213, when the attribute of the first keyword to be checked among the at least two attributes belongs to the preset check attributes, the server determines, according to the attribute of the first keyword to be checked, the first entity corresponding to the first keyword to be checked.
After the server obtains from the preset standard corpus the at least two attributes corresponding to the at least two keywords to be checked, the server judges in turn whether each of the at least two attributes belongs to the preset check attributes; when the server judges that the attribute of the first keyword to be checked belongs to the preset check attributes, the server determines, according to that attribute, the first entity corresponding to the first keyword to be checked.
In the embodiment of the present invention, preset check attributes that require further checking are set on the server in advance. After the server determines the at least two attributes, it judges in turn whether each of them belongs to the preset check attributes; when it judges that the attribute of the first keyword to be checked belongs to the preset check attributes, the server looks up the first entity corresponding to the first keyword to be checked in the knowledge graph.
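The attribute test in S213 amounts to a set-membership check. In the sketch below, the set PRESET_CHECK_ATTRIBUTES, its single member, and the function name are hypothetical; the text does not list which attributes are preset.

```python
# Hypothetical set of attributes that trigger the semantic check.
PRESET_CHECK_ATTRIBUTES = {"singer"}


def select_first_keyword(keywords, attributes):
    """Return the first keyword whose attribute is a preset check
    attribute, or None when no attribute qualifies."""
    for keyword, attribute in zip(keywords, attributes):
        if attribute in PRESET_CHECK_ATTRIBUTES:
            return keyword
    return None
```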
S214, the server looks up at least one entity related to the first entity in the preset entity relationship library.
After the server determines, according to the attribute of the first keyword to be checked, the first entity corresponding to the first keyword to be checked, the server looks up at least one entity related to the first entity in the preset entity relationship library.
In the embodiment of the present invention, the knowledge graph is a graph-based data structure composed of nodes and edges, where each node represents an entity and each edge represents a relationship between two entities. The knowledge graph can be represented by entity-relationship-entity triples and stored in a conventional RDF store or a graph database; it is used to query at least one entity related to the first keyword to be checked.
In the embodiment of the present invention, the knowledge graph is generated as follows. For structured and semi-structured data, D2R or metadata collection tools are used for batch processing; entities and attributes are extracted from the data, and entity-relationship triples are established to build the knowledge graph. For unstructured text information, natural language processing techniques are used to perform word segmentation and syntactic dependency analysis on the text, constraints for identifying classes are constructed, words that satisfy the constraints are identified as entities of the corresponding classes, and data values are supplemented.
As shown in Fig. 3, the relationship between the entity pair Zhou Jielun and "Cowboy Is Very Busy" is "work".
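A triple-based graph of this kind can be held in a small adjacency structure. The class below is a minimal in-memory sketch with hypothetical names; it does not model RDF or any particular graph database.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory store of (head, relation, tail) triples."""

    def __init__(self):
        # head entity -> list of (relation, tail entity)
        self._edges = defaultdict(list)

    def add_triple(self, head, relation, tail):
        self._edges[head].append((relation, tail))

    def related(self, head, relation=None):
        """Return the tails linked to head, optionally filtered by relation."""
        return [
            tail
            for rel, tail in self._edges.get(head, [])
            if relation is None or rel == relation
        ]
```

For the example above, calling add_triple("Zhou Jielun", "work", "Cowboy Is Very Busy") makes related("Zhou Jielun", "work") return that song, which is the lookup that S214 relies on.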
S215, the server calculates, according to the preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked.
After the server finds at least one entity related to the first keyword to be checked, the server calculates, according to the preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked, where the second keyword to be checked is a keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked.
In the embodiment of the present invention, the preset edit distance calculation method is as follows: define d_{i,j} as the edit distance required to change a character string a of length i into a character string b of length j. If the final length of character string a is m and the final length of character string b is n, then d is an (m+1) × (n+1) matrix, because conversions involving a character string of length 0 must also be represented.
Here, w_del(a_i) and w_ins(b_j) are both 1; w_sub(a_i, b_j) = 1 when the i-th character of character string a is not equal to the j-th character of character string b, and 0 otherwise.
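The matrix described above can be filled in with a short dynamic program. The sketch below assumes the method is the standard unit-cost Levenshtein distance, which is what the (m+1) × (n+1) matrix and the stated weights suggest; the function name edit_distance is hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b."""
    m, n = len(a), len(b)
    # d[i][j] is the distance between the first i characters of a
    # and the first j characters of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w_sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + w_sub,  # substitution or match
            )
    return d[m][n]
```

The extra row and column correspond to the empty prefix, which is why the matrix has one more row and column than the string lengths.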
Preferably, the server searches the knowledge graph for at least one entity associated with the first keyword to be checked and judges whether the second keyword to be checked is present among the at least one entity. If it is, the server directly returns the first keyword to be checked and the second keyword to be checked; otherwise, the server calculates, according to the preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked.
S216, server from least one editing distance, determine the smallest first editing distance of editing distance and The corresponding entity to be checked of first editing distance, with the process scanned for using first instance and entity to be checked.
When server calculate at least one editing distance between at least one entity and the second keyword to be checked it Afterwards, server will determine the smallest first editing distance of editing distance, and determine first from least one editing distance The corresponding entity to be checked of editing distance, with the process scanned for using first instance and entity to be checked.
In the embodiment of the present invention, server is suitable according to from big to small or from small to large by least one editing distance Sequence is ranked up, and specific sort method is selected according to the actual situation, and the embodiment of the present invention does not do specific restriction.
In the embodiment of the present invention, server determines editing distance minimum from least one editing distance after sequence The first editing distance, later, server determines the corresponding entity to be checked of the first editing distance, and using first instance and to Query entity scans for.
Illustratively, as shown in Fig. 7, the logic by which the server performs semantics-based error correction on the keywords to be checked is as follows:
1, look up at least one entity related to the first keyword to be checked in the knowledge graph;
2, judge whether the second keyword to be checked matches any of the at least one entity;
3, when it matches, directly return the first keyword to be checked and the second keyword to be checked;
4, when it does not match, calculate in turn the at least one edit distance between the at least one entity and the second keyword to be checked;
5, from the at least one edit distance, determine the first edit distance, which is the smallest edit distance, and the entity to be queried corresponding to the first edit distance;
6, return the first keyword to be checked and the entity to be queried.
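The six steps above can be sketched as a single function. The names are hypothetical, the knowledge graph is reduced to a mapping from an entity to its related entities, and the distance function is supplied by the caller; the toy data use a crude character-set distance as a stand-in for the preset edit distance, and the second song title is a placeholder.

```python
def semantic_correct(first_keyword, second_keyword, knowledge_graph, distance):
    """Return the pair of terms to search with."""
    # Step 1: entities related to the first keyword (empty if unknown).
    candidates = knowledge_graph.get(first_keyword, [])
    # Steps 2-3: nothing to correct if the second keyword is already an
    # entity. Returning the input unchanged when there are no candidates
    # is an added assumption, not something the text specifies.
    if not candidates or second_keyword in candidates:
        return first_keyword, second_keyword
    # Steps 4-5: choose the related entity with the smallest distance.
    best = min(candidates, key=lambda entity: distance(second_keyword, entity))
    # Step 6: return the first keyword and the corrected entity.
    return first_keyword, best


# Toy data for illustration only.
GRAPH = {"Chopsticks Brothers": ["Little Apple", "Another Song"]}


def char_set_distance(a, b):
    # Crude stand-in: number of characters not shared by the two strings.
    return len(set(a) ^ set(b))
```

Restricting the candidates to entities related to the first keyword is what makes the correction semantic: the second keyword is only ever replaced by something the first entity is actually linked to.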
S217, the server obtains the first historical search count of the first entity and the second historical search count of the entity to be queried.
After the server searches using the first entity and the entity to be queried, the server obtains the first historical search count of the first entity and the second historical search count of the entity to be queried.
In the embodiment of the present invention, the server obtains the first historical search count of the first entity and the second historical search count of the entity to be queried from the search log.
S218, the server adds the first historical search count and the second historical search count to the preset standard corpus.
After the server obtains the first historical search count of the first entity and the second historical search count of the entity to be queried, the server adds the first historical search count and the second historical search count to the preset standard corpus.
In the embodiment of the present invention, the server updates the first historical search count and the second historical search count into the corresponding positions of the edit distance dictionary, completing the incremental learning process.
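The incremental learning step can be sketched as two small updates: one to the search log and one to the dictionary. The data layout is an assumption (a mapping from entity to count for the log, and a mapping from entity to a record with "attribute" and "search_count" fields for the edit distance dictionary), and the function names are hypothetical.

```python
def record_search(search_log, entities):
    """Increment the logged search count of each entity used in a search."""
    for entity in entities:
        search_log[entity] = search_log.get(entity, 0) + 1


def sync_counts(edit_distance_dictionary, search_log, entities):
    """Copy the logged counts of the given entities into the dictionary.

    Entities not yet in the dictionary get a record with an unknown attribute.
    """
    for entity in entities:
        record = edit_distance_dictionary.setdefault(entity, {"attribute": None})
        record["search_count"] = search_log.get(entity, 0)
```

Because the tie-break among equally close candidates uses these counts, each completed search shifts later corrections toward entities that users actually look for.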
It can be understood that a preset entity relationship library storing the relationships between entities is provided on the server. When the server receives at least two keywords to be checked, such as "Chopsticks Brothers Apple", the server looks up in the preset entity relationship library the first entity corresponding to the first keyword to be checked, "Chopsticks Brothers", together with at least one entity related to "Chopsticks Brothers", namely the songs sung by "Chopsticks Brothers", and determines from those songs the entity to be queried that is most relevant to the second keyword to be checked, "Apple", which is "Little Apple". The first entity "Chopsticks Brothers" and the entity to be queried "Little Apple" are then the two most relevant entities, and the server searches using "Chopsticks Brothers Little Apple". The server can thus correct the keywords to be checked in light of the user's semantics, which improves the accuracy of the input check.
Embodiment three
Fig. 8 is a first schematic structural diagram of the server proposed by the embodiment of the present invention. In practical applications, based on the same inventive concept as Embodiment One and Embodiment Two, as shown in Fig. 8, the server 1 of the embodiment of the present invention includes a processor 10, a memory 11, and a communication bus 12. In specific embodiments, the processor 10 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the functions of the processor; the embodiment of the present invention imposes no specific limitation.
In the embodiment of the present invention, the communication bus 12 is used to implement connection and communication between the processor 10 and the memory 11; the processor 10 is used to execute the operating program stored in the memory 11 to implement the following steps:
The processor 10 is configured to: when the keywords to be checked are at least two keywords to be checked, obtain from the preset standard corpus the at least two attributes corresponding to the at least two keywords to be checked, the keywords to be checked being the keywords used when searching with the sentence to be searched; when the attribute of the first keyword to be checked among the at least two attributes belongs to the preset check attributes, determine, according to the attribute of the first keyword to be checked, the first entity corresponding to the first keyword to be checked; look up in the preset entity relationship library at least one entity related to the first entity; and determine, based on a preset relevance calculation method, from the at least one entity the entity to be queried that is most relevant to the second keyword to be checked, so as to search using the first entity and the entity to be queried, the second keyword to be checked being a keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked.
In the embodiment of the present invention, further, above-mentioned processor 10 is also used to when the language to be searched for receiving user's input When sentence, using default input template, the first keyword is identified from the sentence to be searched;According to default spell check plan Slightly, the corresponding keyword to be checked of first keyword is determined from the preset standard corpus.
In embodiments of the present invention, further, above-mentioned processor 10 is also used to according to default editing distance calculating side Method calculates at least one editing distance between at least one described entity and second keyword to be checked;From it is described to In a few editing distance, determine that the smallest first editing distance of editing distance and first editing distance are corresponding The entity to be checked.
In embodiments of the present invention, further, above-mentioned processor 10 is also used to first keyword and pre- bidding Quasi- corpus is matched;When the first predetermined keyword in first keyword and the preset standard corpus mismatches When, first keyword is converted into phonetic to be checked;When in the phonetic to be checked and the preset standard corpus When first phonetic matches, corresponding second predetermined keyword of first phonetic is obtained;Second predetermined keyword is added To in the keyword to be checked.
In embodiments of the present invention, further, above-mentioned processor 10 is also used to when the phonetic to be checked and described the When one phonetic mismatches, the third nearest with the editing distance of first keyword is searched from the preset standard corpus Predetermined keyword;The third predetermined keyword is added in the keyword to be checked.
In embodiments of the present invention, further, above-mentioned processor 10 is also used to wrap in the third predetermined keyword When including at least two keywords, at least two keyword corresponding at least two is searched from the preset standard corpus Searching times;According at least two searching times, from least two keyword, determine that searching times are most 4th predetermined keyword;4th predetermined keyword is added in the keyword to be checked.
In embodiments of the present invention, further, above-mentioned processor 10 is also used to when first keyword and described the When one predetermined keyword matches, first keyword is added in the keyword to be checked.
In an embodiment of the present invention, the processor 10 is further configured to: obtain a first historical search count of the first entity and a second historical search count of the entity to be checked, respectively; and add the first historical search count and the second historical search count to the preset standard corpus.
In an embodiment of the present invention, the processor 10 is further configured to: when it is determined that the number of keywords to be checked is one, perform the search using that keyword to be checked.
With the server proposed in the embodiments of the present invention, when the keywords to be checked comprise at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from the preset standard corpus, a keyword to be checked being a keyword used when searching with the statement to be searched. When the attribute of a first keyword to be checked among the at least two attributes belongs to the preset check attributes, the first entity corresponding to the first keyword to be checked is determined according to that attribute. At least one entity related to the first entity is then looked up in the preset entity relation library. Based on the preset relatedness calculation method, the entity to be checked that has the highest relatedness to the second keyword to be checked is determined from the at least one entity, so that the search is performed using the first entity and the entity to be checked; the second keyword to be checked is the keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked. It can be seen that the server is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, for example "Chopsticks Brothers apple", it looks up, in the preset entity relation library, the first entity corresponding to the first keyword to be checked, "Chopsticks Brothers", and at least one entity related to "Chopsticks Brothers", namely the songs sung by "Chopsticks Brothers". From those songs it determines the entity to be checked that has the highest relatedness to the second keyword to be checked, "apple", which is "Little Apple". The first entity "Chopsticks Brothers" and the entity to be checked "Little Apple" are then the two most closely related entities, and the server searches with "Chopsticks Brothers Little Apple". The server can thus correct the keywords to be checked in light of the user's semantics, which improves the accuracy of input checking.
An embodiment of the present invention provides a computer readable storage medium storing one or more programs. The one or more programs are executable by one or more processors and are applied in a server; when executed by a processor, the programs implement the methods of Embodiment 1 and Embodiment 2.
Specifically, when the program instructions corresponding to the input checking method in this embodiment are read or executed by an electronic device, the following steps are performed:
When the keywords to be checked comprise at least two keywords to be checked, obtaining, from the preset standard corpus, at least two attributes corresponding to the at least two keywords to be checked, a keyword to be checked being a keyword used when searching with the statement to be searched;
When the attribute of a first keyword to be checked among the at least two attributes belongs to the preset check attributes, determining, according to the attribute of the first keyword to be checked, the first entity corresponding to the first keyword to be checked;
Searching the preset entity relation library for at least one entity related to the first entity;
Determining, based on the preset relatedness calculation method, from the at least one entity, the entity to be checked that has the highest relatedness to the second keyword to be checked, so as to perform the search using the first entity and the entity to be checked, the second keyword to be checked being the keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked.
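As a non-limiting illustration, the first three steps above (attribute lookup, first-entity determination and related-entity lookup) might be sketched in Python as follows. The dictionaries `CORPUS_ATTRIBUTES`, `CHECK_ATTRIBUTES` and `ENTITY_RELATIONS`, the function `find_candidates` and the sample data are hypothetical stand-ins for the preset standard corpus, the preset check attributes and the preset entity relation library. The sketch also assumes that the first entity has the same name as the first keyword to be checked. The sample strings are the original Chinese forms of a singer (筷子兄弟), the word "apple" (苹果) and three song titles.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical preset standard corpus: keyword -> attribute.
CORPUS_ATTRIBUTES: Dict[str, str] = {"筷子兄弟": "singer", "苹果": "fruit"}
# Hypothetical preset check attributes.
CHECK_ATTRIBUTES: Set[str] = {"singer"}
# Hypothetical preset entity relation library: entity -> related entities.
ENTITY_RELATIONS: Dict[str, List[str]] = {"筷子兄弟": ["小苹果", "老男孩", "父亲"]}


def find_candidates(keywords: List[str]) -> Optional[Tuple[str, str, List[str]]]:
    """Return (first entity, second keyword, related entities), or None."""
    if len(keywords) < 2:
        return None  # a single keyword is searched directly
    # Step 1: look up the attribute of every keyword to be checked.
    attributes = {kw: CORPUS_ATTRIBUTES.get(kw) for kw in keywords}
    # Step 2: the first keyword whose attribute is a check attribute
    # determines the first entity (assumed to share the keyword's name).
    first = next((kw for kw in keywords if attributes[kw] in CHECK_ATTRIBUTES), None)
    if first is None:
        return None
    # The second keyword is a keyword other than the first one.
    second = next((kw for kw in keywords if kw != first), None)
    if second is None:
        return None
    # Step 3: fetch the entities related to the first entity.
    return first, second, ENTITY_RELATIONS.get(first, [])
```

The fourth step then only has to rank the returned candidates against the second keyword, which keeps the relatedness measure replaceable.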
In an embodiment of the present invention, before the at least two attributes corresponding to the at least two keywords to be checked are obtained from the preset standard corpus, the one or more programs, when executed by the one or more processors, further implement the following steps:
When a statement to be searched that is input by a user is received, identifying a first keyword from the statement to be searched by using a preset input template;
Determining, according to a preset spell check strategy, the keyword to be checked corresponding to the first keyword from the preset standard corpus.
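The form of the preset input template is not detailed in this passage. Purely as an assumption, the sketch below models a template as a regular expression with named groups and falls back to whitespace splitting when no template matches. The template wording, the group names and the function `extract_keywords` are hypothetical.

```python
import re
from typing import List

# Hypothetical input template; the patent text does not specify its form.
TEMPLATES = [
    re.compile(r"^play (?P<song>.+) by (?P<singer>.+)$", re.IGNORECASE),
]


def extract_keywords(statement: str) -> List[str]:
    """Identify the first keywords from a statement to be searched."""
    text = statement.strip()
    for pattern in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Return the slots in a fixed order: singer first, then song.
            return [match.group("singer"), match.group("song")]
    # No template matched: fall back to whitespace-separated tokens.
    return text.split()
```

Each keyword returned by such a step would then be passed through the spell check strategy before any attribute lookup.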
In an embodiment of the present invention, to calculate, based on the preset relatedness calculation method and the at least one entity, the entity to be checked that has the highest relatedness to the second keyword to be checked, the one or more programs, when executed by the one or more processors, specifically implement the following steps:
Calculating, according to a preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked;
Determining, from the at least one edit distance, the smallest edit distance as a first edit distance, together with the entity to be checked that corresponds to the first edit distance.
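As a non-limiting illustration, the two steps above might be sketched as follows, assuming that the preset edit distance calculation method is the Levenshtein distance (the text does not name a specific method). The function names and the sample candidates are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def closest_entity(entities: List[str], second_keyword: str) -> Tuple[str, int]:
    """Return the entity with the smallest edit distance, and that distance."""
    if not entities:
        raise ValueError("no candidate entities")
    distances = [(edit_distance(entity, second_keyword), entity) for entity in entities]
    best_distance, best_entity = min(distances, key=lambda pair: pair[0])
    return best_entity, best_distance
```

With the candidate song titles 小苹果, 老男孩 and 父亲 and the second keyword 苹果, the distances are 1, 3 and 2 respectively, so 小苹果 is selected as the entity to be checked.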
In an embodiment of the present invention, to determine, according to the preset spell check strategy, the keyword to be checked corresponding to the first keyword from the preset standard corpus, the one or more programs, when executed by the one or more processors, specifically implement the following steps:
Matching the first keyword against the preset standard corpus;
When the first keyword does not match a first preset keyword in the preset standard corpus, converting the first keyword into a pinyin to be checked;
When the pinyin to be checked matches a first pinyin in the preset standard corpus, obtaining a second preset keyword corresponding to the first pinyin;
Adding the second preset keyword to the keywords to be checked.
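As a non-limiting illustration, the exact-match and pinyin-match steps above might be sketched as follows. No pinyin converter is named in the text, so the sketch uses a tiny hypothetical per-character table `CHAR_PINYIN` (tones omitted) in place of a real converter, and a hypothetical `CORPUS_PINYIN` mapping in place of the preset standard corpus.

```python
from typing import Dict, Optional

# Hypothetical per-character pinyin table standing in for a real converter.
CHAR_PINYIN: Dict[str, str] = {"苹": "ping", "平": "ping", "果": "guo", "小": "xiao"}
# Hypothetical preset standard corpus: preset keyword -> its pinyin.
CORPUS_PINYIN: Dict[str, str] = {"苹果": "pingguo", "小苹果": "xiaopingguo"}


def to_pinyin(word: str) -> Optional[str]:
    """Convert a word to toneless pinyin, or None if a character is unknown."""
    syllables = [CHAR_PINYIN.get(ch) for ch in word]
    if any(s is None for s in syllables):
        return None
    return "".join(syllables)


def check_keyword(first_keyword: str) -> Optional[str]:
    """Return the keyword to be checked, or None if neither match succeeds."""
    # Exact match against the corpus: keep the keyword as typed.
    if first_keyword in CORPUS_PINYIN:
        return first_keyword
    # Otherwise compare by pinyin, which catches homophone typos.
    pinyin = to_pinyin(first_keyword)
    if pinyin is None:
        return None
    for keyword, keyword_pinyin in CORPUS_PINYIN.items():
        if keyword_pinyin == pinyin:
            return keyword
    return None
```

Here the homophone typo 平果 is converted to "pingguo" and resolved to the corpus keyword 苹果. A `None` result signals that a further fallback, such as an edit-distance lookup, is needed.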
In an embodiment of the present invention, after the first keyword is converted into the pinyin to be checked, the one or more programs, when executed by the one or more processors, further implement the following steps:
When the pinyin to be checked does not match the first pinyin, searching the preset standard corpus for a third preset keyword having the smallest edit distance to the first keyword;
Adding the third preset keyword to the keywords to be checked.
In an embodiment of the present invention, after the third preset keyword having the smallest edit distance to the first keyword is found in the preset standard corpus and before the third preset keyword is added to the keywords to be checked, the one or more programs, when executed by the one or more processors, further implement the following steps:
When the third preset keyword includes at least two keywords, looking up, in the preset standard corpus, at least two search counts corresponding to the at least two keywords;
Determining, according to the at least two search counts, a fourth preset keyword with the highest search count from the at least two keywords; correspondingly,
To add the third preset keyword to the keywords to be checked, the one or more programs, when executed by the one or more processors, specifically implement the following step:
Adding the fourth preset keyword to the keywords to be checked.
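As a non-limiting illustration, the nearest-keyword fallback with the search-count tie-break might be sketched as follows. The Levenshtein distance is again an assumed choice of edit distance, and the function names and the sample search counts are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via memoised recursion on prefix lengths."""
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(dist(i - 1, j) + 1, dist(i, j - 1) + 1, dist(i - 1, j - 1) + cost)
    return dist(len(a), len(b))


def nearest_keyword(first_keyword: str, search_counts: Dict[str, int]) -> Optional[str]:
    """Pick the closest corpus keyword; break ties by the highest search count."""
    if not search_counts:
        return None
    distances = {kw: edit_distance(first_keyword, kw) for kw in search_counts}
    smallest = min(distances.values())
    # Several keywords may share the smallest distance (the "third preset keyword").
    tied = [kw for kw, d in distances.items() if d == smallest]
    # The most searched of the tied keywords plays the role of the fourth preset keyword.
    return max(tied, key=lambda kw: search_counts[kw])
```

For the input "aple" and a corpus holding "apple" (120 searches), "ample" (15) and "maple" (40), all three candidates are at distance 1, so the most searched one, "apple", is chosen.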
In an embodiment of the present invention, after the first keyword is matched against the preset standard corpus, the one or more programs, when executed by the one or more processors, further implement the following step:
When the first keyword matches the first preset keyword, adding the first keyword to the keywords to be checked.
In an embodiment of the present invention, after the search is performed using the first entity and the entity to be checked, the one or more programs, when executed by the one or more processors, further implement the following steps:
Obtaining a first historical search count of the first entity and a second historical search count of the entity to be checked, respectively;
Adding the first historical search count and the second historical search count to the preset standard corpus.
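The text does not say how the historical search counts are maintained. One plausible reading, shown here only as an assumption, is that each completed search increments the counts of both entities and writes them back to the corpus. The `Counter` store and the function `record_search` are hypothetical.

```python
from collections import Counter


def record_search(corpus_counts: Counter, first_entity: str, checked_entity: str) -> None:
    """Increment and store the search counts of both entities (assumed behaviour)."""
    for entity in (first_entity, checked_entity):
        # Counter returns 0 for unseen keys, so new entities start at 1.
        corpus_counts[entity] += 1
```

Counts stored this way could later serve as the search counts used to break ties between equally close preset keywords.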
In an embodiment of the present invention, after the keyword to be checked corresponding to the first keyword is determined from the preset standard corpus according to the preset spell check strategy, the one or more programs, when executed by the one or more processors, further implement the following step:
When it is determined that the number of keywords to be checked is one, performing the search using that keyword to be checked.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (19)

1. An input checking method, characterized by comprising:
when the keywords to be checked comprise at least two keywords to be checked, obtaining, from a preset standard corpus, at least two attributes corresponding to the at least two keywords to be checked, a keyword to be checked being a keyword used when searching with a statement to be searched;
when the attribute of a first keyword to be checked among the at least two attributes belongs to preset check attributes, determining, according to the attribute of the first keyword to be checked, a first entity corresponding to the first keyword to be checked;
searching a preset entity relation library for at least one entity related to the first entity;
determining, based on a preset relatedness calculation method, from the at least one entity, an entity to be checked that has the highest relatedness to a second keyword to be checked, so as to perform a search using the first entity and the entity to be checked, the second keyword to be checked being the keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked.
2. The method according to claim 1, characterized in that, before the obtaining, from the preset standard corpus, of the at least two attributes corresponding to the at least two keywords to be checked, the method further comprises:
when a statement to be searched that is input is received, identifying a first keyword from the statement to be searched by using a preset input template;
determining, according to a preset spell check strategy, the keyword to be checked corresponding to the first keyword from the preset standard corpus.
3. The method according to claim 1, characterized in that calculating, based on the preset relatedness calculation method and the at least one entity, the entity to be checked that has the highest relatedness to the second keyword to be checked comprises:
calculating, according to a preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked;
determining, from the at least one edit distance, the smallest edit distance as a first edit distance, together with the entity to be checked that corresponds to the first edit distance.
4. The method according to claim 2, characterized in that the determining, according to the preset spell check strategy, of the keyword to be checked corresponding to the first keyword from the preset standard corpus comprises:
matching the first keyword against the preset standard corpus;
when the first keyword does not match a first preset keyword in the preset standard corpus, converting the first keyword into a pinyin to be checked;
when the pinyin to be checked matches a first pinyin in the preset standard corpus, obtaining a second preset keyword corresponding to the first pinyin;
adding the second preset keyword to the keywords to be checked.
5. The method according to claim 4, characterized in that, after the converting of the first keyword into the pinyin to be checked, the method further comprises:
when the pinyin to be checked does not match the first pinyin, searching the preset standard corpus for a third preset keyword having the smallest edit distance to the first keyword;
adding the third preset keyword to the keywords to be checked.
6. The method according to claim 5, characterized in that, after the searching of the preset standard corpus for the third preset keyword having the smallest edit distance to the first keyword and before the adding of the third preset keyword to the keywords to be checked, the method further comprises:
when the third preset keyword includes at least two keywords, looking up, in the preset standard corpus, at least two search counts corresponding to the at least two keywords;
determining, according to the at least two search counts, a fourth preset keyword with the highest search count from the at least two keywords; correspondingly,
the adding of the third preset keyword to the keywords to be checked comprises:
adding the fourth preset keyword to the keywords to be checked.
7. The method according to claim 4, characterized in that, after the matching of the first keyword against the preset standard corpus, the method further comprises:
when the first keyword matches the first preset keyword, adding the first keyword to the keywords to be checked.
8. The method according to claim 1, characterized in that, after the search is performed using the first entity and the entity to be checked, the method further comprises:
obtaining a first historical search count of the first entity and a second historical search count of the entity to be checked, respectively;
adding the first historical search count and the second historical search count to the preset standard corpus.
9. The method according to claim 2, characterized in that, after the determining, according to the preset spell check strategy, of the keyword to be checked corresponding to the first keyword from the preset standard corpus, the method further comprises:
when it is determined that the number of keywords to be checked is one, performing the search using that keyword to be checked.
10. A server, characterized in that the server comprises a processor, a memory and a communication bus, the processor being configured to execute an operating program stored in the memory so as to implement the following steps:
when the keywords to be checked comprise at least two keywords to be checked, obtaining, from a preset standard corpus, at least two attributes corresponding to the at least two keywords to be checked, a keyword to be checked being a keyword used when searching with a statement to be searched; when the attribute of a first keyword to be checked among the at least two attributes belongs to preset check attributes, determining, according to the attribute of the first keyword to be checked, a first entity corresponding to the first keyword to be checked; searching a preset entity relation library for at least one entity related to the first entity; determining, based on a preset relatedness calculation method, from the at least one entity, an entity to be checked that has the highest relatedness to a second keyword to be checked, so as to perform a search using the first entity and the entity to be checked, the second keyword to be checked being the keyword to be checked, among the at least two keywords to be checked, other than the first keyword to be checked.
11. The server according to claim 10, characterized in that
the processor is further configured to: when a statement to be searched that is input by a user is received, identify a first keyword from the statement to be searched by using a preset input template; and determine, according to a preset spell check strategy, the keyword to be checked corresponding to the first keyword from the preset standard corpus.
12. The server according to claim 10, characterized in that
the processor is further configured to: calculate, according to a preset edit distance calculation method, at least one edit distance between the at least one entity and the second keyword to be checked; and determine, from the at least one edit distance, the smallest edit distance as a first edit distance, together with the entity to be checked that corresponds to the first edit distance.
13. The server according to claim 11, characterized in that
the processor is further configured to: match the first keyword against the preset standard corpus; when the first keyword does not match a first preset keyword in the preset standard corpus, convert the first keyword into a pinyin to be checked; when the pinyin to be checked matches a first pinyin in the preset standard corpus, obtain a second preset keyword corresponding to the first pinyin; and add the second preset keyword to the keywords to be checked.
14. The server according to claim 13, characterized in that
the processor is further configured to: when the pinyin to be checked does not match the first pinyin, search the preset standard corpus for a third preset keyword having the smallest edit distance to the first keyword; and add the third preset keyword to the keywords to be checked.
15. The server according to claim 14, characterized in that
the processor is further configured to: when the third preset keyword includes at least two keywords, look up, in the preset standard corpus, at least two search counts corresponding to the at least two keywords; determine, according to the at least two search counts, a fourth preset keyword with the highest search count from the at least two keywords; and add the fourth preset keyword to the keywords to be checked.
16. The server according to claim 13, characterized in that
the processor is further configured to: when the first keyword matches the first preset keyword, add the first keyword to the keywords to be checked.
17. The server according to claim 10, characterized in that
the processor is further configured to: obtain a first historical search count of the first entity and a second historical search count of the entity to be checked, respectively; and add the first historical search count and the second historical search count to the preset standard corpus.
18. The server according to claim 11, characterized in that
the processor is further configured to: when it is determined that the number of keywords to be checked is one, perform the search using that keyword to be checked.
19. A computer readable storage medium having a computer program stored thereon and applied to a server, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN201810214555.4A 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium Active CN110309258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810214555.4A CN110309258B (en) 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810214555.4A CN110309258B (en) 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110309258A true CN110309258A (en) 2019-10-08
CN110309258B CN110309258B (en) 2022-03-29

Family

ID=68073330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810214555.4A Active CN110309258B (en) 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110309258B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507073A (en) * 2020-12-07 2021-03-16 云南电网有限责任公司普洱供电局 Content verification method of power distribution network operation file and related equipment
WO2021143299A1 (en) * 2020-01-17 2021-07-22 华为技术有限公司 Semantic error correction method, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120232904A1 (en) * 2011-03-10 2012-09-13 Samsung Electronics Co., Ltd. Method and apparatus for correcting a word in speech input text
CN102708100A (en) * 2011-03-28 2012-10-03 北京百度网讯科技有限公司 Method and device for digging relation keyword of relevant entity word and application thereof
US20150154265A1 (en) * 2013-12-02 2015-06-04 Qbase, LLC Search suggestions using fuzzy-score matching and entity co-occurrence
US20160275148A1 (en) * 2015-03-20 2016-09-22 Huawei Technologies Co., Ltd. Database query method and device
CN107526812A (en) * 2017-08-24 2017-12-29 北京奇艺世纪科技有限公司 A kind of searching method, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120232904A1 (en) * 2011-03-10 2012-09-13 Samsung Electronics Co., Ltd. Method and apparatus for correcting a word in speech input text
CN102708100A (en) * 2011-03-28 2012-10-03 北京百度网讯科技有限公司 Method and device for digging relation keyword of relevant entity word and application thereof
US20150154265A1 (en) * 2013-12-02 2015-06-04 Qbase, LLC Search suggestions using fuzzy-score matching and entity co-occurrence
US20160275148A1 (en) * 2015-03-20 2016-09-22 Huawei Technologies Co., Ltd. Database query method and device
CN106033466A (en) * 2015-03-20 2016-10-19 华为技术有限公司 Database query method and device
CN107526812A (en) * 2017-08-24 2017-12-29 北京奇艺世纪科技有限公司 A kind of searching method, device and electronic equipment

Also Published As

Publication number Publication date
CN110309258B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN116628172B (en) Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN110399457B (en) Intelligent question answering method and system
JP6309644B2 (en) Method, system, and storage medium for realizing smart question answer
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN104657439A (en) Generation system and method for structured query sentence used for precise retrieval of natural language
JPH08241335A (en) Method and system for vague character string retrieval using fuzzy indeterminative finite automation
CN104657440A (en) Structured query statement generating system and method
CN101131706A (en) Query amending method and system thereof
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN112115232A (en) Data error correction method and device and server
US20220245353A1 (en) System and method for entity labeling in a natural language understanding (nlu) framework
CN112507089A (en) Intelligent question-answering engine based on knowledge graph and implementation method thereof
CN115114419A (en) Question and answer processing method and device, electronic equipment and computer readable medium
CN111159381A (en) Data searching method and device
CN110309258A (en) A kind of input checking method, server and computer readable storage medium
CN113157887A (en) Knowledge question-answering intention identification method and device and computer equipment
CN106407332B (en) Search method and device based on artificial intelligence
CN117251455A (en) Intelligent report generation method and system based on large model
CN111898024A (en) Intelligent question and answer method and device, readable storage medium and computing equipment
CN106776590A (en) A kind of method and system for obtaining entry translation
CN115828854A (en) Efficient table entity linking method based on context disambiguation
US20220229990A1 (en) System and method for lookup source segmentation scoring in a natural language understanding (nlu) framework
US20220229986A1 (en) System and method for compiling and using taxonomy lookup sources in a natural language understanding (nlu) framework
US20230142351A1 (en) Methods and systems for searching and retrieving information
CN114780700A (en) Intelligent question-answering method, device, equipment and medium based on machine reading understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220706

Address after: 610041 China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan

Patentee after: China Mobile (Chengdu) information and Communication Technology Co.,Ltd.

Patentee after: CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd.

Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Address before: 100032 No. 29, Finance Street, Beijing, Xicheng District

Patentee before: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Patentee before: CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd.