CN110309258B - Input checking method, server and computer readable storage medium - Google Patents

Publication number
CN110309258B
CN110309258B CN201810214555.4A CN201810214555A CN110309258B CN 110309258 B CN110309258 B CN 110309258B CN 201810214555 A CN201810214555 A CN 201810214555A CN 110309258 B CN110309258 B CN 110309258B
Authority
CN
China
Prior art keywords
keyword
preset
checked
entity
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810214555.4A
Other languages
Chinese (zh)
Other versions
CN110309258A (en)
Inventor
李小文
李晟
刘松
梁俊
蒋忠强
陈敏
杨东
王伟
邢荣荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd
Priority to CN201810214555.4A
Publication of CN110309258A
Application granted
Publication of CN110309258B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

An embodiment of the invention discloses an input checking method, a server, and a computer-readable storage medium. When there are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, where the keywords to be checked are the keywords corresponding to a sentence to be searched. When the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, a first entity corresponding to the first keyword to be checked is determined according to that attribute. At least one entity related to the first entity is looked up in a preset entity relation library. Based on a preset relevance calculation method, the entity to be queried that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so that the search is performed using the first entity and the entity to be queried; the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.

Description

Input checking method, server and computer readable storage medium
Technical Field
The present invention relates to search engine technology in the internet field, and in particular, to an input check method, a server, and a computer-readable storage medium.
Background
According to statistics, when users enter keywords on the internet to look up information, the probability of an input error is 10% to 15%. An incorrect keyword can cause the search to return wrong results or no results at all, which greatly reduces query speed and accuracy. Correcting the keywords entered by the user before searching is therefore particularly important.
Commonly used checking methods are edit-distance-based, model-based, and dictionary-based. The edit distance is the minimum number of editing operations required to convert one character string into another; the permitted operations include replacing one character with another, inserting a character, and deleting a character. The edit-distance-based method looks up, in an edit distance dictionary, the result with the smallest edit distance to the keyword, and can therefore detect extra or missing characters. The model-based method trains a model on a large amount of data and then uses the model to check input keywords, and can therefore detect homophone errors. The dictionary-based method first records incorrect queries and their corresponding correct queries in a checking dictionary, and then looks up the correct replacement for a keyword in that dictionary, and can therefore detect errors involving similar-looking characters.
However, these three methods can only identify spelling errors in character strings. When a user enters "chopsticks brother apple", none of the three methods finds a spelling error, yet the user actually wants to search for "chopsticks brother small apple". The prior art cannot check the input string in combination with the semantics of the user's input, so the accuracy of the check is low.
Disclosure of Invention
To solve the above technical problem, embodiments of the present invention aim to provide an input checking method, a server, and a computer-readable storage medium that can check a character string input by a user in combination with the semantics of the user's input, thereby improving the accuracy of the check.
The technical scheme of the invention is realized as follows:
An embodiment of the invention provides an input checking method, which comprises the following steps:
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched;
when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked;
searching at least one entity related to the first entity from a preset entity relation library;
and based on a preset relevance calculation method, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity, so as to perform the search using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
In the above method, before obtaining the at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus, the method further includes:
when a sentence to be searched input by a user is received, identifying a first keyword from the sentence to be searched by using a preset input template;
and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking strategy.
In the above method, determining, based on the preset relevance calculation method, the entity to be queried with the highest relevance to the second keyword to be checked from the at least one entity includes:
calculating at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method;
and determining, from the at least one edit distance, a first edit distance that is the smallest, and the entity to be queried corresponding to the first edit distance.
In the above method, determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to the preset spell checking strategy includes:
matching the first keyword against the preset standard corpus;
when the first keyword does not match a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked;
when the pinyin to be checked matches a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin;
and adding the second preset keyword to the keywords to be checked.
In the above method, after converting the first keyword into the pinyin to be checked, the method further includes:
when the pinyin to be checked does not match the first pinyin, searching the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword;
and adding the third preset keyword to the keywords to be checked.
In the above method, after searching the preset standard corpus for the third preset keyword with the smallest edit distance to the first keyword, and before adding the third preset keyword to the keywords to be checked, the method further includes:
when the third preset keyword comprises at least two keywords, looking up at least two search counts corresponding to the at least two keywords in the preset standard corpus;
determining, according to the at least two search counts, a fourth preset keyword with the largest search count from the at least two keywords;
correspondingly, adding the third preset keyword to the keywords to be checked includes:
adding the fourth preset keyword to the keywords to be checked.
In the above method, after matching the first keyword against the preset standard corpus, the method further includes:
when the first keyword matches the first preset keyword, adding the first keyword to the keywords to be checked.
In the above method, after the search is performed using the first entity and the entity to be queried, the method further includes:
respectively acquiring a first historical search count of the first entity and a second historical search count of the entity to be queried;
adding the first historical search count and the second historical search count to the preset standard corpus.
In the above method, after determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to the preset spell checking strategy, the method further includes:
when it is determined that there is only one keyword to be checked, searching with that keyword to be checked.
An embodiment of the present invention provides a server, where the server includes a processor and a memory, and the processor is configured to execute a program stored in the memory so as to implement the following steps:
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched; when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching at least one entity related to the first entity from a preset entity relation library; and based on a preset relevance calculation method, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity, so as to perform the search using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
In the server, the processor is further configured to, when receiving a sentence to be searched input by a user, identify a first keyword from the sentence to be searched by using a preset input template; and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In the server, the processor is further configured to calculate at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method; and to determine, from the at least one edit distance, a first edit distance that is the smallest, and the entity to be queried corresponding to the first edit distance.
In the server, the processor is further configured to match the first keyword against the preset standard corpus; when the first keyword does not match a first preset keyword in the preset standard corpus, to convert the first keyword into pinyin to be checked; when the pinyin to be checked matches a first pinyin in the preset standard corpus, to acquire a second preset keyword corresponding to the first pinyin; and to add the second preset keyword to the keywords to be checked.
In the server, the processor is further configured to search the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword when the pinyin to be checked does not match the first pinyin; and to add the third preset keyword to the keywords to be checked.
In the server, the processor is further configured to look up, when the third preset keyword includes at least two keywords, at least two search counts corresponding to the at least two keywords in the preset standard corpus; to determine, according to the at least two search counts, a fourth preset keyword with the largest search count from the at least two keywords; and to add the fourth preset keyword to the keywords to be checked.
In the server, the processor is further configured to add the first keyword to the keywords to be checked when the first keyword matches the first preset keyword.
In the server, the processor is further configured to obtain a first historical search frequency of the first entity and a second historical search frequency of the entity to be queried, respectively; adding the first historical search times and the second historical search times to the preset standard corpus.
In the server, the processor is further configured to search by using the keyword to be checked when it is determined that the number of the keyword to be checked is one.
An embodiment of the invention provides a computer-readable storage medium applied to a server. The storage medium stores a computer program which, when executed by a processor, implements any one of the input checking methods described above.
The embodiment of the invention provides an input checking method, a server, and a computer-readable storage medium. When there are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, where the keywords to be checked are the keywords corresponding to a sentence to be searched. When the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, a first entity corresponding to the first keyword to be checked is determined according to that attribute. At least one entity related to the first entity is looked up in a preset entity relation library. Based on a preset relevance calculation method, the entity to be queried that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so that the search is performed using the first entity and the entity to be queried; the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
With this method, the server is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as 'chopsticks brother apple', it looks up the first entity 'chopsticks brother' corresponding to the first keyword to be checked, together with at least one entity related to 'chopsticks brother', namely the songs sung by 'chopsticks brother'. From those songs it determines the entity to be queried, 'small apple', that has the highest relevance to the second keyword to be checked, 'apple'. The first entity 'chopsticks brother' and the entity to be queried 'small apple' are then the two entities with the highest relevance, and the server searches with 'chopsticks brother small apple'. The server can thus correct the keywords to be checked in combination with the semantics of the user's input, which improves the accuracy of the input check.
Drawings
FIG. 1 is a first flowchart of an input checking method according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an exemplary input checking method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating exemplary entities and relationships between entities, according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of exemplary input inspection logic according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a second input checking method according to an embodiment of the present invention;
FIG. 6 is a flow chart of an exemplary rule-based error correction provided by an embodiment of the present invention;
FIG. 7 is a flowchart of an exemplary semantic error correction based approach provided by an embodiment of the present invention;
FIG. 8 is a first schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Example one
An embodiment of the present invention provides an input checking method, as shown in fig. 1, the method may include:
S101, when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched.
The input checking method provided by the embodiment of the invention is suitable for a scene that a server corrects the error of the keywords in the sentences input by the user.
In the embodiment of the invention, as shown in FIG. 2, the input checking method comprises three modules: offline library building, online fault tolerance, and incremental learning. Offline library building preprocesses and trains on the corpus to obtain a template, a corpus dictionary, an edit distance dictionary, and a knowledge graph. Online fault tolerance, performed after a sentence to be searched input by a user is received, uses the template, corpus dictionary, edit distance dictionary, and knowledge graph obtained by offline library building. Incremental learning determines the updated search counts from the search log after online fault tolerance and writes them into the corresponding dictionary.
In the embodiment of the invention, when the server receives the sentence to be searched that the user enters in the input box, it identifies the first keyword from the sentence by using the preset input template and then determines the number of keywords. When the number is at least two, the server looks up the at least two attributes corresponding to the at least two keywords to be checked in the preset standard corpus.
Further, before the server obtains at least two attributes corresponding to at least two keywords to be checked, the server determines the keywords to be checked corresponding to the first keyword from a preset standard corpus according to a preset spell checking strategy.
In the embodiment of the present invention, the server obtains preset corpus information by using a crawler system or a database. The preset corpus information includes entity names, the attributes corresponding to the entity names, and search counts. For example, in the record 'Zhou Jie Lun, singer, 10,000 times', Zhou Jie Lun is the entity name, singer is the attribute, and 10,000 is the search count.
In the embodiment of the invention, the server preprocesses the acquired preset corpus information. The preprocessing includes converting traditional Chinese characters to simplified characters, converting full-width symbols to half-width, converting uppercase letters to lowercase, and removing special symbols, spaces, and illegal characters. The server then trains on the preprocessed corpus information to generate a preset input template, a preset standard corpus, and a knowledge graph (the preset entity relation library). The preset standard corpus includes a corpus dictionary and an edit distance dictionary. The corpus dictionary stores entity names, including Chinese names and the corresponding English names; the edit distance dictionary stores entity names, entity attributes, and search counts. What is stored in each dictionary is selected according to the actual situation.
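The formatting steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `normalize` and the two-entry `TRAD_TO_SIMP` table are hypothetical, and a real system would need a complete traditional-to-simplified conversion table. Full-width to half-width folding is done here with Unicode NFKC normalization, which is an assumed choice.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny traditional-to-simplified table.
# A real system would load a complete conversion table instead.
TRAD_TO_SIMP = {"蘋": "苹", "東": "东"}


def normalize(text: str) -> str:
    """Apply the formatting steps to one corpus entry or keyword."""
    # NFKC folds full-width letters, digits and punctuation to half-width.
    text = unicodedata.normalize("NFKC", text)
    # Uppercase to lowercase.
    text = text.lower()
    # Traditional to simplified, character by character.
    text = "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
    # Drop spaces, punctuation and other non-word characters.
    return re.sub(r"[\W_]+", "", text)


print(normalize("ＡＢＣ　１２３！"))
print(normalize("蘋果"))
```

Removing every space is appropriate for a single entity name or keyword; a whole sentence would need to be split into keywords before this step.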
In the embodiment of the invention, the preset editing distance calculation method is arranged in the editing distance dictionary, and the server can determine the keyword closest to the first keyword editing distance according to the preset editing distance calculation method.
In the embodiment of the invention, the server first identifies the first keyword from the sentence to be searched by using the preset input template and then looks up the first keyword in the corpus dictionary. If the first keyword is found, it is determined to be the keyword to be checked. If it is not found, the server converts the first keyword into English and looks up the English first keyword in the corpus dictionary. If that lookup succeeds, the keyword in the corpus dictionary corresponding to the English first keyword is determined to be the keyword to be checked; if it fails, the server looks up the keyword with the smallest edit distance to the first keyword in the edit distance dictionary and determines that keyword to be the keyword to be checked.
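The lookup cascade described above, together with the search-count tie-break given in the Disclosure of Invention, can be sketched as follows. This is a hedged illustration only: `check_keyword`, `key_of`, `TOY_KEYS` and `COUNTS` are hypothetical names, the conversion step is represented by a caller-supplied function (here a toy pinyin table, following the pinyin variant of the method), and the entry "小苹花" is made up purely to force a tie. The sketch assumes a non-empty dictionary.

```python
from typing import Callable, Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insert, delete and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def check_keyword(
    keyword: str,
    counts: Dict[str, int],
    key_of: Callable[[str], Optional[str]],
) -> str:
    """Return the standard keyword for `keyword` (counts: name -> search count)."""
    # Step 1: exact match against the standard names.
    if keyword in counts:
        return keyword
    # Step 2: match on the converted form (e.g. pinyin) to catch homophones.
    key = key_of(keyword)
    if key is not None:
        for name in counts:
            if key_of(name) == key:
                return name
    # Step 3: smallest edit distance; ties go to the most-searched name.
    distances = {name: edit_distance(keyword, name) for name in counts}
    best = min(distances.values())
    ties = [name for name, dist in distances.items() if dist == best]
    return max(ties, key=lambda name: counts[name])


# Toy data (assumptions): a pinyin table and a name -> search-count dictionary.
TOY_KEYS = {"筷子兄弟": "kuaizixiongdi", "快子兄弟": "kuaizixiongdi"}
COUNTS = {"筷子兄弟": 500, "小苹果": 900, "小苹花": 10}

print(check_keyword("快子兄弟", COUNTS, TOY_KEYS.get))  # homophone typo
print(check_keyword("小苹", COUNTS, TOY_KEYS.get))      # missing character
```

Passing the conversion in as a function keeps the sketch independent of any particular pinyin library.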
In the embodiment of the invention, the server acquires at least two attributes corresponding to at least two keywords to be checked by utilizing the edit distance dictionary.
In the embodiment of the present invention, the server may determine the number of keywords either after the first keyword is identified or after the keyword to be checked corresponding to the first keyword is determined. The timing is selected according to the actual situation and is not specifically limited in the embodiment of the present invention.
S102, when the attribute of the first keyword to be checked in the at least two attributes belongs to the preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
When the server judges that the attribute of a first keyword to be checked in the at least two attributes belongs to the preset checking attribute, the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
In the embodiment of the invention, the server is preset with preset checking attributes which need to be further checked, after the server determines at least two attributes, the server sequentially judges whether the at least two attributes belong to the preset checking attributes, and when the attribute of a first keyword to be checked in the at least two attributes belongs to the preset checking attributes, the server searches a first entity corresponding to the first keyword to be checked from the knowledge graph.
S103, at least one entity related to the first entity is searched from a preset entity relation library.
After the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked, the server searches at least one entity related to the first entity from a preset entity relation library.
In the embodiment of the invention, the knowledge graph is a data structure based on a graph and consists of nodes and edges, wherein each node represents an entity, and each edge is a relationship between the entities. The knowledge graph may be represented by entity-relationship-entity triples, and stored using a conventional Resource Description Framework (RDF) or a graph database, and is used to query at least one entity related to the first entity.
As shown in FIG. 3, two entities are connected by an edge that names their relationship; for the pair of entities illustrated there, the relationship is that one entity is a work of the other.
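As a rough illustration of the entity-relationship-entity representation, the sketch below keeps triples in an in-memory list and indexes them by head entity. It does not use RDF or a graph database; the names `TRIPLES`, `build_index` and `related_entities`, the relation label "sings", and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Illustrative sample data only.
TRIPLES: List[Triple] = [
    ("chopsticks brother", "sings", "small apple"),
    ("chopsticks brother", "sings", "old boys"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by head entity for fast neighbour lookup."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def related_entities(
    index: Dict[str, List[Tuple[str, str]]],
    entity: str,
    relation: Optional[str] = None,
) -> List[str]:
    """Return entities linked from `entity`, optionally filtered by relation."""
    return [
        tail
        for rel, tail in index.get(entity, [])
        if relation is None or rel == relation
    ]


INDEX = build_index(TRIPLES)
print(related_entities(INDEX, "chopsticks brother"))
```

Using `index.get` rather than indexing avoids creating empty entries for entities that are not in the graph.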
S104, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity based on a preset relevance calculation method, so as to perform the search using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword, among the at least two keywords to be checked, other than the first keyword to be checked.
After the server searches at least one entity related to the first entity from the preset entity relation library, the server determines an entity to be queried with the highest correlation degree with the second keyword to be queried from the at least one entity based on a preset correlation degree calculation method, and searches by using the first entity and the entity to be queried.
In the embodiment of the present invention, the preset correlation calculation method includes an edit distance calculation method, a vector space model, and the like, which are specifically selected according to actual situations, and the embodiment of the present invention is not specifically limited.
In the embodiment of the invention, the server calculates at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method, and then determines, from the at least one edit distance, the first edit distance that is the smallest and the entity to be queried corresponding to that first edit distance.
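In the embodiment of the present invention, the preset edit distance calculation method is as follows. Let $d_{i,j}$ denote the edit distance required to convert the first $i$ characters of character string $a$ into the first $j$ characters of character string $b$. If the full length of $a$ is $m$ and the full length of $b$ is $n$, then $d$ is an $(m+1) \times (n+1)$ matrix, because conversions involving strings of length 0 must also be represented. The matrix is filled in as:

$$d_{0,0} = 0, \qquad d_{i,0} = \sum_{k=1}^{i} w_{\mathrm{del}}(a_k), \quad 1 \le i \le m$$

$$d_{0,j} = \sum_{k=1}^{j} w_{\mathrm{ins}}(b_k), \quad 1 \le j \le n$$

$$d_{i,j} = \min \begin{cases} d_{i-1,j} + w_{\mathrm{del}}(a_i) \\ d_{i,j-1} + w_{\mathrm{ins}}(b_j) \\ d_{i-1,j-1} + w_{\mathrm{sub}}(a_i, b_j) \end{cases} \quad 1 \le i \le m,\ 1 \le j \le n$$

where $w_{\mathrm{del}}(a_i)$ and $w_{\mathrm{ins}}(b_j)$ are both 1, and $w_{\mathrm{sub}}(a_i, b_j) = 1$ when the $i$-th character of $a$ is not equal to the $j$-th character of $b$ and 0 otherwise. The edit distance between $a$ and $b$ is $d_{m,n}$.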
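The edit distance defined above can be computed with a dynamic-programming table. The sketch below is one straightforward way to do it, assuming unit costs for insertion, deletion and substitution as stated; the function name `edit_distance` is chosen for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Fill the (m+1) x (n+1) matrix d and return d[m][n]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # First column: delete the first i characters of a.
    for i in range(1, m + 1):
        d[i][0] = i
    # First row: insert the first j characters of b.
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Substitution costs 0 when the two characters are equal.
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # delete a[i-1]
                d[i][j - 1] + 1,        # insert b[j-1]
                d[i - 1][j - 1] + sub,  # substitute or keep
            )
    return d[m][n]


print(edit_distance("kitten", "sitting"))
```

The full matrix uses O(mn) memory; keeping only the previous row is enough when just the final distance is needed.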
Preferably, the server searches at least one entity associated with the first keyword to be checked from the knowledge graph, judges whether the second keyword to be checked exists in the at least one entity, and directly returns the first keyword to be checked and the second keyword to be checked when the second keyword exists in the at least one entity, otherwise, calculates at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
In the embodiment of the present invention, the server sorts the at least one edit distance in ascending or descending order; the specific sorting method is selected according to the actual situation.
In the embodiment of the invention, the server determines the first editing distance with the minimum editing distance from the at least one editing distance after sequencing, then the server determines the entity to be queried corresponding to the first editing distance, and searches by using the first entity and the entity to be queried.
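The selection step above, including the shortcut of returning the keywords unchanged when the second keyword already names a related entity, can be sketched as follows. The function name `resolve_second_keyword`, the `RELATIONS` mapping and the sample data are assumptions for illustration. Returning the input unchanged when no related entities exist is also an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def resolve_second_keyword(
    first_entity: str,
    second_keyword: str,
    relations: Dict[str, List[str]],
) -> Tuple[str, str]:
    """Pick the related entity closest to the second keyword."""
    candidates = relations.get(first_entity, [])
    # Nothing to compare against, or the keyword is already a related entity.
    if not candidates or second_keyword in candidates:
        return first_entity, second_keyword
    # Smallest edit distance wins; min() keeps the first candidate on ties.
    best = min(candidates, key=lambda e: edit_distance(e, second_keyword))
    return first_entity, best


# Illustrative sample data only.
RELATIONS = {"chopsticks brother": ["old boys", "small apple"]}

print(resolve_second_keyword("chopsticks brother", "apple", RELATIONS))
```

Taking the minimum directly gives the same result as sorting the distances and reading off the first one, without the cost of a full sort.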
Further, after the server searches by using the first entity and the entity to be queried, adding one to the search record in the search log, and updating the search times of the search record to the edit distance dictionary by the server to complete the incremental learning process.
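A minimal sketch of this incremental update is shown below, assuming the search log is a list of (first entity, queried entity) pairs and the dictionary's search counts are held in a `collections.Counter`. Both the log format and the name `apply_search_log` are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def apply_search_log(counts: Counter, log: Iterable[Tuple[str, str]]) -> Counter:
    """Add one search to each entity used in every logged query."""
    for first_entity, queried_entity in log:
        counts[first_entity] += 1
        counts[queried_entity] += 1
    return counts


# Illustrative sample data only.
counts = Counter({"chopsticks brother": 500, "small apple": 900})
apply_search_log(counts, [("chopsticks brother", "small apple")])
print(counts["chopsticks brother"], counts["small apple"])
```

Entities that have never been searched start at zero automatically, so new entries need no special handling.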
Illustratively, as shown in FIG. 4, the check logic for the input check is: after receiving a query request of a sentence to be queried, performing deep preprocessing on the sentence to be queried, including screening out a first keyword by using a template, then performing formatting operation on the first keyword, including conversion from a traditional Chinese character to a simplified Chinese character, conversion from a symbol full angle to a half angle, conversion from an upper case to a lower case of a letter, conversion from an Arabic number to a simple Chinese character, removing a special symbol space, an illegal character and the like, and then determining whether a near word exists in the first keyword by using an error correction dictionary; then, error correction is carried out on the first keyword based on rules, including homonym word identification by using a corpus dictionary, fuzzy pinyin identification and multi-word and few-word identification by using an edit distance dictionary, and the keyword to be detected is obtained; when the number of the keywords to be checked is at least two and the first attribute of the first keyword to be checked in the at least two keywords to be checked belongs to the preset checking attribute, error correction is carried out on the at least two keywords to be checked based on semantics.
It can be understood that the server is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as "chopsticks brother apple", the server searches the preset entity relation library for the first entity "chopsticks brother" corresponding to the first keyword to be checked and for at least one entity related to "chopsticks brother", namely the songs sung by "chopsticks brother". From those songs, the server determines the entity to be queried that has the highest correlation with the second keyword to be checked "apple", namely "small apple". The first entity "chopsticks brother" and the entity to be queried "small apple" are then the two entities with the highest correlation, and the server searches by using "chopsticks brother small apple". The server can thus correct errors in the keywords to be checked by taking the semantics of the user's input into account, which improves the accuracy of input checking.
Example two
An embodiment of the present invention provides an input checking method, as shown in fig. 5, the method may include:
S201, when the server receives an input sentence to be searched, the server identifies a first keyword from the sentence to be searched by using a preset input template.
The input checking method provided by the embodiment of the invention is suitable for a scene that a server corrects the error of the keywords in the sentences input by the user.
In the embodiment of the invention, as shown in fig. 2, the input checking method comprises three modules: offline library building, online fault tolerance and incremental learning. Offline library building preprocesses and trains on the corpus to obtain a template, a corpus dictionary, an edit distance dictionary and a knowledge graph. Online fault tolerance means that, after a sentence to be searched input by a user is received, fault tolerance is performed online by using the template, the corpus dictionary, the edit distance dictionary and the knowledge graph obtained by offline library building. Incremental learning is the process of determining the updated search times from the search log after online fault tolerance and updating them into the corresponding dictionary.
In the embodiment of the invention, a user inputs the sentence to be searched in the input box, and the server uses the preset input template to identify the first keyword from the sentence to be searched.
Illustratively, an input template is predefined, such as "songs that I want to listen to ()", where the type of the slot "()" is singer; when the user inputs "songs that I want to listen to game show", the server returns "game show" as the singer according to the predefined input template.
In the embodiment of the invention, when the server receives a keyword group input by the user, the server directly decomposes the keyword group into first keywords by using a word segmentation method.
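By way of non-limiting illustration, the template matching described above could be sketched as follows. This is a minimal sketch, assuming the template is stored as a regular expression whose slot "()" becomes a named group; the names `TEMPLATES` and `extract_first_keyword` are hypothetical and do not come from the embodiment.

```python
import re
from typing import Optional, Tuple

# Hypothetical template table: each entry pairs a pattern with the attribute
# type of its slot. The slot "()" of the template is written as a named group.
TEMPLATES = [
    (re.compile(r"^songs that i want to listen to (?P<slot>.+)$", re.IGNORECASE), "singer"),
]


def extract_first_keyword(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (first keyword, slot type) for the first matching template, else None."""
    text = sentence.strip()
    for pattern, slot_type in TEMPLATES:
        match = pattern.match(text)
        if match:
            return match.group("slot").strip(), slot_type
    return None
```

A sentence that matches no template returns `None`, in which case a caller might fall back to word segmentation of the input.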
S202, the server matches the first keyword with a preset standard corpus.
After the server identifies the first keyword from the sentence to be searched by using the preset input template, the server needs to match the first keyword with a preset standard corpus.
In the embodiment of the invention, the server performs deep preprocessing on the first keyword, which includes converting traditional Chinese characters to simplified Chinese characters, converting full-width symbols to half-width symbols, converting upper-case letters to lower-case letters, converting Arabic numerals to simplified Chinese characters, removing special symbols, spaces, illegal characters and the like, and searching for near-form characters corresponding to the first keyword by using the error correction dictionary. The server then performs rule-based error correction on the preprocessed first keyword: it searches for the first keyword in the corpus dictionary and, when the first keyword is found, determines the first keyword as a keyword to be checked. When the first keyword is not found, the server converts the first keyword into English and searches for the English first keyword in the corpus dictionary; when it is found, the keyword corresponding to the English first keyword in the corpus dictionary is determined as a keyword to be checked. When it is not found, the server searches the edit distance dictionary for the keyword with the minimum edit distance to the first keyword and determines that keyword as the keyword to be checked.
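By way of non-limiting illustration, part of the formatting operation could be sketched as follows. This is a minimal sketch under stated assumptions: Unicode NFKC normalization stands in for the full-width to half-width conversion (it also folds some other compatibility forms), digits are mapped one by one to Chinese numerals, and the traditional-to-simplified conversion and near-form lookup are omitted because they require dictionaries not given here. The name `format_keyword` is hypothetical.

```python
import re
import unicodedata

# Digit-by-digit mapping; an assumption, since the text does not state
# whether "12" should become "一二" or "十二".
DIGIT_TO_CHINESE = str.maketrans("0123456789", "零一二三四五六七八九")


def format_keyword(keyword: str) -> str:
    """Normalize width and case, map digits, and strip symbols and spaces."""
    # NFKC folds full-width letters, digits and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", keyword)
    text = text.lower()
    text = text.translate(DIGIT_TO_CHINESE)
    # \W matches anything that is not a Unicode word character, so Chinese
    # characters and letters are kept; the underscore is removed explicitly.
    return re.sub(r"[\W_]+", "", text)
```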
In the embodiment of the invention, the error correction dictionary stores the error query information and the corresponding correct query information, and the attributes corresponding to the correct query information, for example, the error correction dictionary is provided with a Figure and a Figure singer, so that the error can be corrected to a Figure when the first keyword is the Figure, and the attribute of the Figure can be identified to a singer.
In the embodiment of the present invention, the server obtains preset corpus information by using a crawler system or a database, where the preset corpus information includes entity names, the attributes corresponding to the entity names, and search times, for example: Zhou Jie Lun, singer, 10,000 times; here Zhou Jie Lun is the entity name, the attribute is singer, and the number of searches is 10,000.
In the embodiment of the invention, a server trains the preprocessed preset corpus information to respectively generate a preset input template, a preset standard corpus and a knowledge graph (a preset entity relation library), wherein the preset standard corpus comprises a corpus dictionary and an editing distance dictionary, entity names including Chinese names and corresponding English names are stored in the corpus dictionary, and the entity names are specifically selected according to actual conditions; the editing distance dictionary stores entity names, entity attributes and search times, and the entity names, the entity attributes and the search times are specifically selected according to actual conditions.
In the embodiment of the invention, the preset editing distance calculation method is arranged in the editing distance dictionary, and the server can determine the keyword closest to the first keyword editing distance according to the preset editing distance calculation method.
In the embodiment of the invention, the server judges the word length of the first keyword, and when the word length is smaller than a preset length threshold, the first keyword is searched from the corpus dictionary.
Illustratively, if the first keyword is Chinese and has a length less than 10, the first keyword is looked up from the corpus dictionary.
S203, when the first keyword is matched with the first preset keyword, the server adds the first keyword to the keyword to be checked.
When the server determines that the first keyword is matched with a first preset keyword in a preset standard corpus, the server needs to add the first keyword to the keyword to be checked.
In the embodiment of the invention, when the server judges that the first keyword matches the first preset keyword in the corpus dictionary, this indicates that the server has found the first keyword in the corpus dictionary; at this point, the server searches the edit distance dictionary for the fifth attribute corresponding to the first preset keyword.
In the embodiment of the invention, the server respectively adds the first keyword to the keyword to be detected and adds the fifth attribute to at least two attributes.
S204, when the first keyword is not matched with the first preset keyword in the preset standard corpus, the server converts the first keyword into pinyin to be checked.
When the server judges that the first keyword is not matched with the first preset keyword in the preset standard corpus, the server needs to convert the first keyword into pinyin to be checked.
In the embodiment of the invention, when the server judges that the first keyword is not matched with the first preset keyword in the corpus dictionary, the server converts the first keyword into the pinyin to be checked.
Illustratively, the user enters "before learning", and the server does not find the word in the corpus dictionary; at this point the server converts "before learning" into the pinyin "xuezhiqian".
S203 and S204 are two alternative branches after S202; which one is executed depends on the actual situation, and the embodiment of the present invention imposes no specific limitation.
S205, when the pinyin to be checked is matched with the first pinyin in the preset standard corpus, the server acquires a second preset keyword corresponding to the first pinyin.
When the server converts the first keyword into the pinyin to be checked, the server matches the pinyin to be checked with the first pinyin in the preset standard corpus, and when the pinyin to be checked is matched with the first pinyin in the preset standard corpus, the server acquires a second preset keyword corresponding to the first pinyin.
In the embodiment of the invention, the server judges the pinyin length of the pinyin to be checked, matches the pinyin to be checked with the first pinyin in the corpus dictionary when the pinyin length is smaller than a preset length threshold value, and acquires a second preset keyword corresponding to the first pinyin and a second attribute corresponding to the second preset keyword when the matching is successful.
Illustratively, if the length of the pinyin to be checked is less than 20, the server searches the pinyin to be checked from the corpus dictionary.
Illustratively, the server finds in the corpus dictionary that the second preset keyword corresponding to "xuezhiqian" is "hummer", and then obtains from the edit distance dictionary that the attribute of "hummer" is "singer".
S206, the server adds the second preset keyword to the keyword to be checked.
After the server acquires the second preset keyword, the server adds the second preset keyword to the keyword to be checked.
In the embodiment of the invention, the server adds the second preset keyword to the keyword to be checked and adds the second attribute to at least two attributes.
S207, when the pinyin to be checked does not match the first pinyin, the server searches the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword.
When the server judges that the pinyin to be checked is not matched with the first pinyin, the server needs to search a third preset keyword which is closest to the editing distance of the first keyword from the preset standard corpus.
In the embodiment of the invention, when the server judges that the pinyin to be checked is not matched with the first pinyin, the server searches a third preset keyword which is closest to the editing distance of the first keyword and a third attribute corresponding to the third preset keyword by using the editing distance dictionary.
Illustratively, as shown in fig. 6, the process of the server performing rule-based error correction on the first keyword is as follows:
1. formatting the first keyword;
2. matching the first keyword with a corpus dictionary;
3. when the matching is successful, determining the first keyword as a keyword to be checked;
4. when the matching fails, converting the first keyword into a pinyin to be detected;
5. matching the pinyin to be detected with a first pinyin in a corpus dictionary;
6. when the matching is successful, searching a second preset keyword corresponding to the first pinyin from the corpus dictionary, and determining the second preset keyword as a keyword to be checked;
7. when the matching fails, searching for a third preset keyword with the minimum edit distance to the first keyword by using the edit distance dictionary, and determining the third preset keyword as the keyword to be checked.
S205-S206 and S207 are two alternative branches after S204; which one is executed depends on the actual situation, and the embodiment of the present invention imposes no specific limitation.
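By way of non-limiting illustration, steps 2 to 7 above could be sketched as follows. This is a minimal sketch, assuming the corpus dictionary is a mapping from standard name to pinyin and that pinyin conversion is supplied by the caller; `rule_based_correct`, `corpus_names` and `to_pinyin` are hypothetical names, and the unit-cost edit distance is an assumption about the preset calculation method.

```python
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one row of the matrix at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def rule_based_correct(
    keyword: str,
    corpus_names: Dict[str, str],     # standard name -> pinyin (hypothetical layout)
    to_pinyin: Callable[[str], str],  # hypothetical converter supplied by the caller
) -> List[str]:
    """Return the candidate keywords to be checked for one formatted keyword."""
    if not corpus_names:
        return []
    # Steps 2-3: exact match against the corpus dictionary.
    if keyword in corpus_names:
        return [keyword]
    # Steps 4-6: homophone match on the pinyin to be checked.
    pinyin = to_pinyin(keyword)
    if pinyin:
        homophones = [name for name, value in corpus_names.items() if value == pinyin]
        if homophones:
            return homophones
    # Step 7: names at the minimum edit distance; ties are all returned so
    # that a later step can choose among them by search times.
    distances = {name: edit_distance(keyword, name) for name in corpus_names}
    best = min(distances.values())
    return [name for name, distance in distances.items() if distance == best]
```

With toy data such as `{"周杰伦": "zhoujielun", "小苹果": "xiaopingguo"}`, a homophone misspelling resolves through the pinyin branch, and a truncated input such as "周杰" resolves through the edit distance branch.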
S208, when the third preset keyword comprises at least two keywords, the server searches at least two search times corresponding to the at least two keywords from the preset standard corpus.
After the server finds out a third preset keyword which is closest to the editing distance of the first keyword, the server needs to judge whether the third preset keyword comprises at least two keywords, and when the server judges that the third preset keyword comprises the at least two keywords, the server searches for at least two search times corresponding to the at least two keywords from a preset standard corpus.
In the embodiment of the invention, when the server judges that at least two keywords are closest to the editing distance of the first keyword, the server searches at least two search times corresponding to the at least two keywords from the editing distance dictionary.
S209, the server determines a fourth preset keyword with the most searching times from the at least two keywords according to the at least two searching times.
After the server finds at least two search times corresponding to the at least two keywords, the server determines a fourth preset keyword with the largest search time from the at least two keywords according to the at least two search times.
In the embodiment of the invention, the server sorts the at least two search times in ascending or descending order, and then determines the fourth preset keyword with the highest search times and the fourth attribute corresponding to the fourth preset keyword.
S210, the server adds a fourth preset keyword to the keyword to be checked.
After the server determines the fourth preset keyword with the largest searching frequency, the server needs to add the fourth preset keyword to the keyword to be detected.
In the embodiment of the invention, the server adds the fourth preset keyword to the keyword to be detected and adds the fourth attribute to at least two attributes.
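By way of non-limiting illustration, the selection in S208 to S210 could be sketched as follows. This is a minimal sketch, assuming the edit distance dictionary is held as a mapping from name to (attribute, search times); `pick_most_searched` and `entries` are hypothetical names.

```python
from typing import Dict, List, Tuple


def pick_most_searched(
    candidates: List[str],
    entries: Dict[str, Tuple[str, int]],  # name -> (attribute, search times)
) -> Tuple[str, str]:
    """Among equally close candidates, return (name, attribute) of the most searched."""
    # max() keeps the first candidate when several share the highest count.
    best = max(candidates, key=lambda name: entries[name][1])
    return best, entries[best][0]
```

Taking the maximum directly avoids a full sort, which gives the same result as sorting in either order and reading off the entry with the highest search times.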
S211, when the server judges that the number of the keywords to be detected is one, the server searches by using the keywords to be detected.
After the server determines the keywords to be detected corresponding to the first keyword from the preset standard corpus, the server needs to judge the number of the keywords to be detected, and when the number of the keywords to be detected is one, the server needs to search by using the keywords to be detected.
In the embodiment of the invention, the server searches the keywords to be checked and displays the search result on the current display interface.
S212, when the keywords to be detected are at least two keywords to be detected, the server obtains at least two attributes corresponding to the at least two keywords to be detected from a preset standard corpus.
After the server determines that there are at least two keywords to be checked, the server acquires at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus.
In the embodiment of the invention, the editing distance dictionary stores at least two attributes corresponding to at least two keywords to be checked, and when the server uses the editing distance dictionary to perform spelling error correction on the first keyword to obtain at least two keywords to be checked, the server can simultaneously obtain at least two attributes corresponding to at least two keywords to be checked.
S211 and S212 are two alternative branches after S210; which one is executed depends on the actual situation, and the embodiment of the present invention imposes no specific limitation.
S213, when the attribute of the first keyword to be checked in the at least two attributes belongs to the preset checking attribute, the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
When the server obtains at least two attributes corresponding to at least two keywords to be checked from a preset standard corpus, the server needs to sequentially judge whether the at least two attributes belong to preset checking attributes, and when the server judges that the attribute of the first keyword to be checked belongs to the preset checking attributes, the server needs to determine a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
In the embodiment of the invention, the server is preset with preset checking attributes which need to be further checked, when the server determines at least two attributes, the server sequentially judges whether the at least two attributes belong to the preset checking attributes, and when the attribute of the first keyword to be checked is judged to belong to the preset checking attributes, the server searches the first entity corresponding to the first keyword to be checked from the knowledge graph.
S214, the server searches at least one entity related to the first entity from a preset entity relation library.
After the server determines the first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked, the server searches at least one entity related to the first entity from the preset entity relationship library.
In the embodiment of the invention, the knowledge graph is a data structure based on a graph and consists of nodes and edges, wherein each node represents an entity, and each edge is a relationship between the entities. The knowledge graph may be represented by entity-relationship-entity triples, and stored using a conventional RDF or graph database, and is used to query at least one entity related to the first keyword to be checked.
In the embodiment of the invention, the generation process of the knowledge graph comprises the following steps: for structured and semi-structured data, carrying out batch processing by using D2R or a data acquisition tool, extracting entities and attributes from the data, establishing triples of entity relations, and constructing a knowledge graph; for unstructured text information, natural language processing technology is utilized to carry out word segmentation, syntactic dependency analysis and class recognition constraint on the text, the vocabulary meeting the constraint is constructed into entities of corresponding classes, and data values are supplemented.
As shown in fig. 3, the pair of entities "workmanship" and "jean are busy" is connected by the relationship "work".
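By way of non-limiting illustration, the entity-relationship-entity triples and the lookup of related entities could be sketched as follows. This is a minimal in-memory sketch, not an RDF store or graph database; the class name `KnowledgeGraph` and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)


class KnowledgeGraph:
    """Nodes are entities; each stored triple is one directed edge."""

    def __init__(self, triples: List[Triple]) -> None:
        self._out: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
        for head, relationship, tail in triples:
            self._out[head].add((relationship, tail))

    def related(self, entity: str) -> List[str]:
        """Return the entities reachable from `entity` by one edge, sorted by name."""
        return sorted(tail for _, tail in self._out.get(entity, ()))


# Toy data for illustration only.
graph = KnowledgeGraph([
    ("chopsticks brother", "work", "small apple"),
    ("chopsticks brother", "work", "song B"),
])
```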
S215, the server calculates at least one edit distance between at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
After the server finds at least one entity related to the first keyword to be checked, the server calculates at least one edit distance between the at least one entity and a second keyword to be checked according to a preset edit distance calculation method, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked.
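In the embodiment of the present invention, the preset edit distance calculation method is as follows. Let $d_{i,j}$ denote the edit distance required to change the first $i$ characters of character string $a$ into the first $j$ characters of character string $b$. If the full length of character string $a$ is $m$ and the full length of character string $b$ is $n$, then $d$ is an $(m+1) \times (n+1)$ matrix, because transitions from and to strings of length 0 must also be represented; the edit distance between $a$ and $b$ is $d_{m,n}$.
$$d_{i,0} = \sum_{k=1}^{i} w_{\mathrm{del}}(a_k), \quad 0 \le i \le m$$
$$d_{0,j} = \sum_{k=1}^{j} w_{\mathrm{ins}}(b_k), \quad 0 \le j \le n$$
$$d_{i,j} = \min\left\{ d_{i-1,j} + w_{\mathrm{del}}(a_i),\; d_{i,j-1} + w_{\mathrm{ins}}(b_j),\; d_{i-1,j-1} + w_{\mathrm{sub}}(a_i, b_j) \right\}, \quad 1 \le i \le m,\; 1 \le j \le n$$
Wherein, $w_{\mathrm{del}}(a_i)$ and $w_{\mathrm{ins}}(b_j)$ are both 1; $w_{\mathrm{sub}}(a_i, b_j) = 1$ when the $i$-th character of character string $a$ is not equal to the $j$-th character of character string $b$, and otherwise it equals 0.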
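By way of non-limiting illustration, the matrix described above could be filled in as follows. This is a minimal sketch, assuming unit weights for deletion and insertion and a substitution weight of 1 for unequal characters and 0 otherwise; the function names are hypothetical.

```python
from typing import List


def edit_distance_matrix(a: str, b: str) -> List[List[int]]:
    """Build the (m+1) x (n+1) matrix d, where d[i][j] is the distance
    between the first i characters of a and the first j characters of b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete i characters to reach the empty string
    for j in range(1, n + 1):
        d[0][j] = j  # insert j characters starting from the empty string
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # substitution or match
            )
    return d


def edit_distance(a: str, b: str) -> int:
    """Return the bottom-right entry of the matrix, i.e. the full distance."""
    return edit_distance_matrix(a, b)[len(a)][len(b)]
```

The row and column for length 0 are what make the matrix one larger than the strings in each dimension.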
Preferably, the server searches at least one entity associated with the first keyword to be checked from the knowledge graph, judges whether the second keyword to be checked exists in the at least one entity, and directly returns the first keyword to be checked and the second keyword to be checked when the second keyword exists in the at least one entity, otherwise, calculates at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
S216, the server determines a first editing distance with the minimum editing distance and an entity to be queried corresponding to the first editing distance from the at least one editing distance, and the first entity and the entity to be queried are used for searching.
After the server calculates at least one edit distance between at least one entity and a second keyword to be checked, the server determines a first edit distance with the minimum edit distance from the at least one edit distance, and determines an entity to be queried corresponding to the first edit distance, so as to utilize the first entity and the entity to be queried to perform a searching process.
In the embodiment of the present invention, the server sorts the at least one edit distance in ascending or descending order; the specific sorting method is selected according to the actual situation.
In the embodiment of the invention, the server determines, from the sorted at least one edit distance, the first edit distance, which is the smallest one; the server then determines the entity to be queried corresponding to the first edit distance and searches by using the first entity and the entity to be queried.
Illustratively, as shown in fig. 7, the logic of semantic-based error correction performed by the server on the first keyword to be checked is as follows:
1. searching at least one entity related to the first keyword to be checked from the knowledge graph;
2. judging whether the second keyword to be detected is matched with at least one entity;
3. when the keywords are matched, directly returning the first keywords to be checked and the second keywords to be checked;
4. when they do not match, calculating in sequence at least one edit distance between the at least one entity and the second keyword to be checked;
5. determining a first editing distance with the minimum editing distance and an entity to be queried corresponding to the first editing distance from at least one editing distance;
6. and returning the first keyword to be checked and the entity to be inquired.
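By way of non-limiting illustration, the six steps above could be sketched as follows. This is a minimal sketch, assuming the knowledge graph is reduced to a mapping from an entity to its related entities and that edit distance is the unit-cost variant; `semantic_correct` and the toy graph are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one row of the matrix at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def semantic_correct(first: str, second: str, graph: Dict[str, List[str]]) -> Tuple[str, str]:
    """Return (first keyword, entity to be queried) for a two-keyword query."""
    related = graph.get(first, [])          # step 1: related entities
    if not related or second in related:    # steps 2-3: exact match (or nothing to compare)
        return first, second
    # Steps 4-5: the related entity at the minimum edit distance.
    best = min(related, key=lambda entity: edit_distance(entity, second))
    return first, best                      # step 6


# Toy graph: 筷子兄弟 (chopsticks brother) with two works,
# 小苹果 (small apple) and 父亲 (a second, illustrative title).
toy_graph = {"筷子兄弟": ["小苹果", "父亲"]}
```

With the toy graph, the query pair ("筷子兄弟", "苹果") is corrected to ("筷子兄弟", "小苹果"), because "小苹果" is one insertion away from "苹果" while "父亲" is two substitutions away.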
S217, the server respectively obtains the first historical search times of the first entity and the second historical search times of the entity to be inquired.
After the server searches by using the first entity and the entity to be queried, the server needs to respectively obtain the first historical search times of the first entity and the second historical search times of the entity to be queried.
In the embodiment of the invention, the server acquires the first historical search times of the first entity and the second historical search times of the entity to be inquired from the search log.
S218, the server adds the first historical search times and the second historical search times to a preset standard corpus.
After the server respectively obtains the first historical search times of the first entity and the second historical search times of the entity to be queried, the server needs to add the first historical search times and the second historical search times to the preset standard corpus.
In the embodiment of the invention, the server updates the first historical search times and the second historical search times to the corresponding positions of the editing distance dictionary to complete the incremental learning process.
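By way of non-limiting illustration, the incremental learning update could be sketched as follows. This is a minimal sketch, assuming the search log is a sequence of searched entity names and that the stored search times are increased by the number of occurrences in the log; `apply_search_log` and the sample counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def apply_search_log(search_counts: Dict[str, int], search_log: Iterable[str]) -> Dict[str, int]:
    """Return a copy of the stored search times with the logged searches added."""
    updated = dict(search_counts)
    for entity, hits in Counter(search_log).items():
        updated[entity] = updated.get(entity, 0) + hits
    return updated


# Illustrative counts only.
counts = {"chopsticks brother": 10000, "small apple": 500}
counts = apply_search_log(counts, ["chopsticks brother", "small apple"])
```

Returning a new mapping rather than mutating the input is a design choice of this sketch; an implementation that updates the dictionary in place would serve equally well.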
It can be understood that the server is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as "chopsticks brother apple", the server searches the preset entity relation library for the first entity "chopsticks brother" corresponding to the first keyword to be checked and for at least one entity related to "chopsticks brother", namely the songs sung by "chopsticks brother". From those songs, the server determines the entity to be queried that has the highest correlation with the second keyword to be checked "apple", namely "small apple". The first entity "chopsticks brother" and the entity to be queried "small apple" are then the two entities with the highest correlation, and the server searches by using "chopsticks brother small apple". The server can thus correct errors in the keywords to be checked by taking the semantics of the user's input into account, which improves the accuracy of input checking.
Example three
Fig. 8 is a schematic diagram of the composition structure of a server according to an embodiment of the present invention. In practical applications, based on the same inventive concept as the first and second embodiments, as shown in fig. 8, a server 1 according to an embodiment of the present invention includes: a processor 10, a memory 11, and a communication bus 12. In a specific embodiment, the processor 10 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It will be appreciated that other electronic devices may be used to implement the processor functions described above, and the embodiment of the present invention imposes no specific limitation.
In the embodiment of the present invention, the communication bus 12 is used for realizing connection communication between the processor 10 and the memory 11; the processor 10 is configured to execute the operating program stored in the memory 11 to implement the following steps:
the processor 10 is configured to: when there are at least two keywords to be checked, obtain at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, where a keyword to be checked is a keyword corresponding to a sentence to be searched; when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, determine a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; search a preset entity relation library for at least one entity related to the first entity; and, based on a preset relevance calculation method, determine from the at least one entity an entity to be queried that has the highest relevance to a second keyword to be checked, so as to search by using the first entity and the entity to be queried, where the second keyword to be checked is the keyword to be checked other than the first keyword to be checked among the at least two keywords to be checked.
In the embodiment of the present invention, the processor 10 is further configured to: when a sentence to be searched that is input by a user is received, identify a first keyword from the sentence to be searched by using a preset input template; and determine the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In the embodiment of the present invention, the processor 10 is further configured to: calculate at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method; and determine, from the at least one edit distance, a first edit distance that is the smallest edit distance and the entity to be queried corresponding to the first edit distance.
In the embodiment of the present invention, the processor 10 is further configured to: match the first keyword with the preset standard corpus; when the first keyword does not match a first preset keyword in the preset standard corpus, convert the first keyword into pinyin to be checked; when the pinyin to be checked matches a first pinyin in the preset standard corpus, acquire a second preset keyword corresponding to the first pinyin; and add the second preset keyword to the keywords to be checked.
In the embodiment of the present invention, the processor 10 is further configured to: when the pinyin to be checked does not match the first pinyin, search the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword; and add the third preset keyword to the keywords to be checked.
In the embodiment of the present invention, the processor 10 is further configured to: when the third preset keyword includes at least two keywords, search the preset standard corpus for at least two search times corresponding to the at least two keywords; determine, according to the at least two search times, a fourth preset keyword with the largest search times from the at least two keywords; and add the fourth preset keyword to the keywords to be checked.
In the embodiment of the present invention, the processor 10 is further configured to add the first keyword to the keywords to be checked when the first keyword matches the first preset keyword.
In the embodiment of the present invention, the processor 10 is further configured to: respectively obtain a first historical search times of the first entity and a second historical search times of the entity to be queried; and add the first historical search times and the second historical search times to the preset standard corpus.
In the embodiment of the present invention, the processor 10 is further configured to search by using the keyword to be checked when it is determined that the number of keywords to be checked is one.
According to the server provided by the embodiment of the invention, when there are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from the preset standard corpus, where a keyword to be checked is a keyword corresponding to the sentence to be searched; when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, a first entity corresponding to the first keyword to be checked is determined according to the attribute of the first keyword to be checked; at least one entity related to the first entity is searched for in a preset entity relation library; and, based on a preset relevance calculation method, an entity to be queried that has the highest relevance to a second keyword to be checked is determined from the at least one entity, so as to search by using the first entity and the entity to be queried, where the second keyword to be checked is the keyword to be checked other than the first keyword to be checked among the at least two keywords to be checked.
Therefore, the server provided by the embodiment of the invention is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as "chopsticks brother apple", the server searches for the first entity "chopsticks brother" corresponding to the first keyword to be checked and for at least one entity related to "chopsticks brother", namely the songs sung by "chopsticks brother". From those songs, the server determines the entity to be queried that has the highest correlation with the second keyword to be checked "apple", namely "small apple". The first entity "chopsticks brother" and the entity to be queried "small apple" are then the two entities with the highest correlation, and the server searches by using "chopsticks brother small apple". The server can thus correct errors in the keywords to be checked by taking the semantics of the user's input into account, which improves the accuracy of input checking.
The embodiment of the invention provides a computer-readable storage medium, which stores one or more programs, wherein the one or more programs are executable by one or more processors and are applied to a server, and when the programs are executed by the processors, the method according to the first embodiment and the second embodiment is realized.
Specifically, the program instructions corresponding to an input checking method in the embodiment are read or executed by an electronic device, and include the following steps:
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched;
when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked;
searching at least one entity related to the first entity from a preset entity relation library;
and based on a preset relevance calculation method, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity so as to utilize the first entity and the entity to be queried to perform a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked.
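The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionaries standing in for the preset standard corpus, the preset checking attributes and the preset entity relation library, the sample entries in them, and the function names `relevance` and `resolve_entities` are all hypothetical. The relevance measure here is the similarity ratio from Python's `difflib`, used only as a stand-in for the unspecified "preset relevance calculation method".

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data; the source does not specify any storage format.
STANDARD_CORPUS: Dict[str, str] = {          # keyword -> attribute
    "Chopsticks Brothers": "singer",
    "Apple": "song",
}
CHECK_ATTRIBUTES = {"singer"}                # stand-in for the preset checking attributes
ENTITY_RELATIONS: Dict[str, List[str]] = {   # entity -> related entities
    "Chopsticks Brothers": ["Little Apple", "Old Boys"],
}


def relevance(entity: str, keyword: str) -> float:
    """Assumed relevance measure: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, entity, keyword).ratio()


def resolve_entities(keywords: List[str]) -> Optional[Tuple[str, str]]:
    """Return (first entity, entity to be queried), or None if the flow does not apply."""
    if len(keywords) < 2:
        return None
    # Step 1: look up the attribute of every keyword to be checked.
    attributes = [STANDARD_CORPUS.get(kw) for kw in keywords]
    # Step 2: the first keyword whose attribute is a checking attribute names the first entity.
    first_idx = next(
        (i for i, attr in enumerate(attributes) if attr in CHECK_ATTRIBUTES), None
    )
    if first_idx is None:
        return None
    first_entity = keywords[first_idx]
    second_keyword = keywords[1 - first_idx] if len(keywords) == 2 else next(
        kw for i, kw in enumerate(keywords) if i != first_idx
    )
    # Step 3: fetch the entities related to the first entity.
    candidates = ENTITY_RELATIONS.get(first_entity, [])
    if not candidates:
        return None
    # Step 4: keep the related entity most relevant to the second keyword.
    best = max(candidates, key=lambda entity: relevance(entity, second_keyword))
    return first_entity, best
```

With the toy data above, `resolve_entities(["Chopsticks Brothers", "Apple"])` pairs the singer with the related song that best matches the second keyword.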
In an embodiment of the present invention, before obtaining at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, the one or more programs are executed by the one or more processors, and further implement the following steps:
when a sentence to be searched input by a user is received, identifying a first keyword from the sentence to be searched by using a preset input template;
and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In an embodiment of the present invention, further, in determining, based on the preset relevance calculation method, the entity to be queried with the highest relevance to the second keyword to be checked from the at least one entity, the one or more programs are executed by the one or more processors to specifically implement the following steps:
calculating at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method;
and determining, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
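The two steps above can be illustrated with the standard Levenshtein edit distance. The source only says "a preset edit distance calculation method", so the choice of Levenshtein distance is an assumption, and the names `edit_distance` and `closest_entity` and the sample strings are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_entity(entities: List[str], keyword: str) -> Tuple[str, int]:
    """Return the entity with the smallest edit distance to keyword, and that distance.

    Expects at least one entity, matching "the at least one entity" in the text.
    """
    distances = [(entity, edit_distance(entity, keyword)) for entity in entities]
    return min(distances, key=lambda pair: pair[1])
```

If several entities share the smallest distance, this sketch simply returns the first one in list order; the source does not say how such ties are handled at this step.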
In an embodiment of the present invention, further, in determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to the preset spell checking policy, the one or more programs are executed by the one or more processors to specifically implement the following steps:
matching the first keyword with a preset standard corpus;
when the first keyword is not matched with a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked;
when the pinyin to be checked is matched with a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin;
and adding the second preset keyword to the keywords to be checked.
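The exact-match and pinyin-match steps above can be sketched as follows. Everything here is illustrative: the character-to-pinyin table is a tiny hypothetical stand-in for a real pinyin converter, and the corpus contents and the names `to_pinyin` and `check_keyword` are assumptions rather than anything given in the source.

```python
from typing import Dict, List, Optional

# Hypothetical miniature pinyin table; a real system would use a full converter.
CHAR_TO_PINYIN: Dict[str, str] = {"小": "xiao", "苹": "ping", "平": "ping", "果": "guo"}

# Hypothetical preset standard corpus holding a single keyword.
STANDARD_KEYWORDS: List[str] = ["小苹果"]


def to_pinyin(text: str) -> str:
    """Convert text to a space-separated pinyin string (unknown characters pass through)."""
    return " ".join(CHAR_TO_PINYIN.get(ch, ch) for ch in text)


# Pinyin stored alongside each corpus keyword, indexed for lookup.
PINYIN_INDEX: Dict[str, str] = {to_pinyin(kw): kw for kw in STANDARD_KEYWORDS}


def check_keyword(first_keyword: str) -> Optional[str]:
    """Return the corpus keyword for first_keyword, or None if neither stage matches."""
    # Stage 1: exact match against the corpus.
    if first_keyword in STANDARD_KEYWORDS:
        return first_keyword
    # Stage 2: convert to pinyin and match against the pinyin stored in the corpus.
    return PINYIN_INDEX.get(to_pinyin(first_keyword))
```

In this toy setup the homophone typo `小平果` has the same pinyin as the corpus keyword `小苹果`, so `check_keyword("小平果")` returns the corpus keyword. A `None` result corresponds to the case where the pinyin does not match either.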
In an embodiment of the present invention, further, after the first keyword is converted into the pinyin to be checked, the one or more programs are executed by the one or more processors, and then the following steps are implemented:
when the pinyin to be checked is not matched with the first pinyin, searching the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword;
and adding the third preset keyword to the keywords to be checked.
In an embodiment of the present invention, further, after searching the preset standard corpus for the third preset keyword with the smallest edit distance to the first keyword, and before adding the third preset keyword to the keywords to be checked, the one or more programs are executed by the one or more processors to further implement the following steps:
when the third preset keyword comprises at least two keywords, searching the preset standard corpus for at least two search counts corresponding to the at least two keywords;
determining, according to the at least two search counts, a fourth preset keyword with the largest search count from the at least two keywords; correspondingly,
in adding the third preset keyword to the keywords to be checked, the one or more programs are executed by the one or more processors to specifically implement the following step:
adding the fourth preset keyword to the keywords to be checked.
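The tie-break described above, where several corpus keywords share the smallest edit distance and the most frequently searched one is kept, might look like this. The function name `pick_most_searched`, the sample keywords and the sample counts are hypothetical, and treating a keyword with no recorded count as having a count of zero is an assumption.

```python
from typing import Dict, List


def pick_most_searched(candidates: List[str], search_counts: Dict[str, int]) -> str:
    """Among equally close candidates, return the one with the largest search count.

    Candidates missing from search_counts are treated as never searched (assumption).
    """
    return max(candidates, key=lambda keyword: search_counts.get(keyword, 0))


# Two hypothetical corpus keywords tied at the same edit distance from the input.
tied = ["Little Apple", "Little Maple"]
counts = {"Little Apple": 120, "Little Maple": 3}
chosen = pick_most_searched(tied, counts)
```

Here `chosen` plays the role of the fourth preset keyword that is added to the keywords to be checked.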
In an embodiment of the present invention, further, after matching the first keyword with a preset standard corpus, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
when the first keyword is matched with the first preset keyword, adding the first keyword to the keywords to be checked.
In an embodiment of the present invention, further, after the process of searching by using the first entity and the entity to be queried is performed, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
respectively acquiring a first historical search count of the first entity and a second historical search count of the entity to be queried;
and adding the first historical search count and the second historical search count to the preset standard corpus.
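A minimal sketch of this bookkeeping step follows, assuming the corpus is a dictionary of per-entity records and the historical counts come from a separate log. The record layout, the `search_count` field name and the function name `record_search_counts` are all hypothetical; the source does not say how the counts are stored.

```python
from typing import Dict, Iterable


def record_search_counts(
    corpus: Dict[str, dict],
    history: Dict[str, int],
    entities: Iterable[str],
) -> None:
    """Copy each entity's historical search count into its corpus entry."""
    for entity in entities:
        entry = corpus.setdefault(entity, {})        # create the entry if absent
        entry["search_count"] = history.get(entity, 0)


# Hypothetical corpus and search history.
corpus = {"Chopsticks Brothers": {"attribute": "singer"}}
history = {"Chopsticks Brothers": 42, "Little Apple": 120}
record_search_counts(corpus, history, ["Chopsticks Brothers", "Little Apple"])
```

Storing the counts in the corpus is what allows a later spell check to prefer the more frequently searched keyword when several candidates are equally close.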
In an embodiment of the present invention, further, after determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking policy, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
and when the number of the keywords to be checked is judged to be one, searching by using the keywords to be checked.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (17)

1. An input checking method, comprising:
when an input sentence to be searched is received, recognizing a first keyword from the sentence to be searched by using a preset input template;
determining keywords to be checked corresponding to the first keywords from a preset standard corpus according to a preset spelling check strategy;
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched;
when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked;
searching at least one entity related to the first entity from a preset entity relation library;
determining an entity to be queried with the highest correlation degree with a second keyword to be checked from the at least one entity based on a preset correlation degree calculation method so as to utilize the first entity and the entity to be queried for a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked;
the preset input template, the preset standard corpus and the preset entity relation library are generated based on preset corpus information; the preset corpus information at least comprises entity names and entity attributes corresponding to the entity names.
2. The method according to claim 1, wherein the calculating the entity to be queried with the highest relevance to the second keyword to be queried based on the preset relevance calculating method and the at least one entity comprises:
calculating at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method;
and determining, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
3. The method according to claim 1, wherein the determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking policy comprises:
matching the first keyword with a preset standard corpus;
when the first keyword is not matched with a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked;
when the pinyin to be checked is matched with a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin;
and adding the second preset keyword to the keywords to be checked.
4. The method of claim 3, wherein after converting the first keyword into the pinyin to be checked, the method further comprises:
when the pinyin to be checked is not matched with the first pinyin, searching the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword;
and adding the third preset keyword to the keywords to be checked.
5. The method according to claim 4, wherein after the third preset keyword with the smallest edit distance to the first keyword is searched from the preset standard corpus, and before the third preset keyword is added to the keywords to be checked, the method further comprises:
when the third preset keyword comprises at least two keywords, searching the preset standard corpus for at least two search counts corresponding to the at least two keywords;
determining, according to the at least two search counts, a fourth preset keyword with the largest search count from the at least two keywords; correspondingly,
the adding the third preset keyword to the keywords to be checked comprises:
adding the fourth preset keyword to the keywords to be checked.
6. The method according to claim 3, wherein after matching the first keyword with a predetermined standard corpus, the method further comprises:
and when the first keyword is matched with the first preset keyword, adding the first keyword to the keywords to be checked.
7. The method of claim 1, wherein after the process of searching with the first entity and the entity to be queried, the method further comprises:
respectively acquiring a first historical search count of the first entity and a second historical search count of the entity to be queried;
and adding the first historical search count and the second historical search count to the preset standard corpus.
8. The method according to claim 1, wherein after determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking policy, the method further comprises:
and when the number of the keywords to be checked is judged to be one, searching by using the keywords to be checked.
9. A server, characterized in that the server comprises: the processor is used for executing the running program stored in the memory so as to realize the following steps:
when a sentence to be searched input by a user is received, identifying a first keyword from the sentence to be searched by using a preset input template; determining keywords to be checked corresponding to the first keywords from a preset standard corpus according to a preset spelling check strategy; when the keywords to be detected are at least two keywords to be detected, acquiring at least two attributes corresponding to the at least two keywords to be detected from the preset standard corpus, wherein the keywords to be detected are keywords corresponding to the sentences to be searched; when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching at least one entity related to the first entity from a preset entity relation library; determining an entity to be queried with the highest correlation degree with a second keyword to be checked from the at least one entity based on a preset correlation degree calculation method so as to utilize the first entity keyword and the entity to be queried for a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked;
the preset input template, the preset standard corpus and the preset entity relation library are generated based on preset corpus information; the preset corpus information at least comprises entity names and entity attributes corresponding to the entity names.
10. The server according to claim 9,
the processor is further configured to calculate at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method; and determine, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
11. The server according to claim 9,
the processor is further configured to match the first keyword with the preset standard corpus; when the first keyword is not matched with a first preset keyword in the preset standard corpus, convert the first keyword into pinyin to be checked; when the pinyin to be checked is matched with a first pinyin in the preset standard corpus, acquire a second preset keyword corresponding to the first pinyin; and add the second preset keyword to the keywords to be checked.
12. The server according to claim 11,
the processor is further configured to search the preset standard corpus for a third preset keyword with the smallest edit distance to the first keyword when the pinyin to be checked is not matched with the first pinyin; and add the third preset keyword to the keywords to be checked.
13. The server according to claim 12,
the processor is further configured to search the preset standard corpus for at least two search counts corresponding to at least two keywords when the third preset keyword comprises the at least two keywords; determine, according to the at least two search counts, a fourth preset keyword with the largest search count from the at least two keywords; and add the fourth preset keyword to the keywords to be checked.
14. The server according to claim 11,
the processor is further configured to add the first keyword to the keywords to be checked when the first keyword matches the first preset keyword.
15. The server according to claim 9,
the processor is further configured to obtain a first historical search count of the first entity and a second historical search count of the entity to be queried respectively; and add the first historical search count and the second historical search count to the preset standard corpus.
16. The server according to claim 9,
and the processor is also used for searching by using the keywords to be checked when the number of the keywords to be checked is judged to be one.
17. A computer-readable storage medium, on which a computer program is stored, for application to a server, characterized in that the computer program, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN201810214555.4A 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium Active CN110309258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810214555.4A CN110309258B (en) 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110309258A CN110309258A (en) 2019-10-08
CN110309258B true CN110309258B (en) 2022-03-29

Family

ID=68073330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810214555.4A Active CN110309258B (en) 2018-03-15 2018-03-15 Input checking method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110309258B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291571A (en) * 2020-01-17 2020-06-16 华为技术有限公司 Semantic error correction method, electronic device and storage medium
CN112507073A (en) * 2020-12-07 2021-03-16 云南电网有限责任公司普洱供电局 Content verification method of power distribution network operation file and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708100A (en) * 2011-03-28 2012-10-03 北京百度网讯科技有限公司 Method and device for digging relation keyword of relevant entity word and application thereof
CN106033466A (en) * 2015-03-20 2016-10-19 华为技术有限公司 Database query method and device
CN107526812A (en) * 2017-08-24 2017-12-29 北京奇艺世纪科技有限公司 A kind of searching method, device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682763B (en) * 2011-03-10 2014-07-16 北京三星通信技术研究有限公司 Method, device and terminal for correcting named entity vocabularies in voice input text
US9208204B2 (en) * 2013-12-02 2015-12-08 Qbase, LLC Search suggestions using fuzzy-score matching and entity co-occurrence

Also Published As

Publication number Publication date
CN110309258A (en) 2019-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220706

Address after: 610041 China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan

Patentee after: China Mobile (Chengdu) information and Communication Technology Co.,Ltd.

Patentee after: CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd.

Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Address before: 100032 No. 29, Finance Street, Beijing, Xicheng District

Patentee before: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Patentee before: CHINA MOBILE (SUZHOU) SOFTWARE TECHNOLOGY Co.,Ltd.