Disclosure of Invention
To solve the above technical problem, embodiments of the present invention provide an input checking method, a server, and a computer-readable storage medium, which can check a character string input by a user in combination with the semantics of the user's input, thereby improving the accuracy of the check.
The technical scheme of the invention is realized as follows:
An embodiment of the invention provides an input checking method, which comprises the following steps:
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to a sentence to be searched;
when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked;
searching a preset entity relation library for at least one entity related to the first entity;
and determining, based on a preset relevance calculation method, an entity to be queried that has the highest relevance to a second keyword to be checked from the at least one entity, so as to perform a search by using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword to be checked other than the first keyword to be checked among the at least two keywords to be checked.
In the above method, before the obtaining at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus, the method further includes:
when a sentence to be searched input by a user is received, identifying a first keyword from the sentence to be searched by using a preset input template;
and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In the above method, the determining, based on the preset relevance calculation method, the entity to be queried having the highest relevance to the second keyword to be checked from the at least one entity includes:
calculating at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method;
and determining, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
In the above method, the determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to the preset spelling check strategy includes:
matching the first keyword against the preset standard corpus;
when the first keyword does not match a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked;
when the pinyin to be checked matches a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin;
and adding the second preset keyword to the keywords to be checked.
In the above method, after the converting the first keyword into the pinyin to be checked, the method further includes:
when the pinyin to be checked does not match the first pinyin, searching the preset standard corpus for a third preset keyword having the smallest edit distance to the first keyword;
and adding the third preset keyword to the keywords to be checked.
In the above method, after the searching the preset standard corpus for the third preset keyword having the smallest edit distance to the first keyword, and before the adding the third preset keyword to the keywords to be checked, the method further includes:
when the third preset keyword comprises at least two keywords, looking up at least two search counts corresponding to the at least two keywords in the preset standard corpus;
determining, according to the at least two search counts, a fourth preset keyword having the largest search count from the at least two keywords;
correspondingly, the adding the third preset keyword to the keywords to be checked includes:
adding the fourth preset keyword to the keywords to be checked.
In the above method, after the matching the first keyword against the preset standard corpus, the method further includes:
when the first keyword matches the first preset keyword, adding the first keyword to the keywords to be checked.
In the above method, after the search is performed by using the first entity and the entity to be queried, the method further includes:
respectively acquiring a first historical search count of the first entity and a second historical search count of the entity to be queried;
and adding the first historical search count and the second historical search count to the preset standard corpus.
In the method, after determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking policy, the method further includes:
and when the number of the keywords to be checked is judged to be one, searching by using the keywords to be checked.
An embodiment of the present invention provides a server, where the server includes a processor and a memory, the processor being configured to execute a program stored in the memory so as to implement the following steps:
when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to a sentence to be searched; when the attribute of a first keyword to be checked among the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching a preset entity relation library for at least one entity related to the first entity; and determining, based on a preset relevance calculation method, an entity to be queried that has the highest relevance to a second keyword to be checked from the at least one entity, so as to perform a search by using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword to be checked other than the first keyword to be checked among the at least two keywords to be checked.
In the server, the processor is further configured to, when receiving a sentence to be searched input by a user, identify a first keyword from the sentence to be searched by using a preset input template; and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In the server, the processor is further configured to calculate at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method; and determine, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
In the server, the processor is further configured to match the first keyword against the preset standard corpus; when the first keyword does not match a first preset keyword in the preset standard corpus, convert the first keyword into pinyin to be checked; when the pinyin to be checked matches a first pinyin in the preset standard corpus, acquire a second preset keyword corresponding to the first pinyin; and add the second preset keyword to the keywords to be checked.
In the server, the processor is further configured to, when the pinyin to be checked does not match the first pinyin, search the preset standard corpus for a third preset keyword having the smallest edit distance to the first keyword; and add the third preset keyword to the keywords to be checked.
In the server, the processor is further configured to, when the third preset keyword includes at least two keywords, look up at least two search counts corresponding to the at least two keywords in the preset standard corpus; determine, according to the at least two search counts, a fourth preset keyword having the largest search count from the at least two keywords; and add the fourth preset keyword to the keywords to be checked.
In the server, the processor is further configured to add the first keyword to the keyword to be checked when the first keyword matches the first preset keyword.
In the server, the processor is further configured to obtain a first historical search count of the first entity and a second historical search count of the entity to be queried, respectively; and add the first historical search count and the second historical search count to the preset standard corpus.
In the server, the processor is further configured to search by using the keyword to be checked when it is determined that the number of the keyword to be checked is one.
An embodiment of the invention provides a computer-readable storage medium that stores a computer program and is applied to a server, wherein the computer program, when executed by a processor, implements any one of the input checking methods described above.
The embodiment of the invention provides an input checking method, a server and a computer readable storage medium, wherein when keywords to be checked are at least two keywords to be checked, at least two attributes corresponding to the at least two keywords to be checked are obtained from a preset standard corpus, and the keywords to be checked are corresponding keywords when sentences to be searched are searched; when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching at least one entity related to the first entity from a preset entity relation library; and determining an entity to be queried with the highest correlation degree with a second keyword to be checked from at least one entity based on a preset correlation degree calculation method so as to utilize the first entity and the entity to be queried to perform a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in at least two keywords to be checked. 
With this method, the server is provided with a preset entity relation library that stores the relationships between entities. When the server receives at least two keywords to be checked, such as 'Chopsticks Brothers apple', the server finds the first entity 'Chopsticks Brothers' corresponding to the first keyword to be checked and at least one entity related to 'Chopsticks Brothers', namely the songs sung by 'Chopsticks Brothers', and determines, from those songs, the entity to be queried 'Little Apple' that has the highest relevance to the second keyword to be checked 'apple'. The first entity 'Chopsticks Brothers' and the entity to be queried 'Little Apple' are then the two most relevant entities, and the server searches with 'Chopsticks Brothers Little Apple'. The server can thus correct the keywords to be checked in combination with the user's semantics, thereby improving the accuracy of the input check.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Example one
An embodiment of the present invention provides an input checking method, as shown in fig. 1, the method may include:
S101, when there are at least two keywords to be checked, acquiring at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, wherein the keywords to be checked are the keywords corresponding to the sentence to be searched.
The input checking method provided by the embodiment of the invention is suitable for a scene that a server corrects the error of the keywords in the sentences input by the user.
In the embodiment of the invention, as shown in fig. 2, the input checking method comprises three modules: offline library building, online fault tolerance, and incremental learning. Offline library building preprocesses and trains on the corpus to obtain a template, a corpus dictionary, an edit distance dictionary and a knowledge graph. Online fault tolerance, after a sentence to be searched input by a user is received, performs fault-tolerant correction online by using the template, corpus dictionary, edit distance dictionary and knowledge graph obtained by offline library building. Incremental learning determines the updated search counts from the search log after online fault tolerance and writes them into the corresponding dictionary.
In the embodiment of the invention, when the server receives the sentence to be searched that the user inputs in the input box, the server identifies the first keyword from the sentence to be searched by using the preset input template. The server then determines the number of first keywords, and when that number is at least two, the server looks up the at least two attributes corresponding to the at least two keywords to be checked in the preset standard corpus.
Further, before the server obtains at least two attributes corresponding to at least two keywords to be checked, the server determines the keywords to be checked corresponding to the first keyword from a preset standard corpus according to a preset spell checking strategy.
In the embodiment of the present invention, the server obtains preset corpus information by using a crawler system or a database, where the preset corpus information includes entity names, the attributes corresponding to the entity names, and search counts, for example: "Zhou Jielun, singer, 10,000 times", where Zhou Jielun is the entity name, singer is the attribute, and 10,000 is the search count.
In the embodiment of the invention, the server preprocesses the acquired preset corpus information, which comprises entity names, the entity attributes corresponding to the entity names, and search counts. The preprocessing comprises converting traditional Chinese characters in the preset corpus information into simplified characters, converting full-width symbols into half-width symbols, converting upper-case letters into lower-case letters, and removing special symbols, spaces and illegal characters. The server then trains on the preprocessed preset corpus information to generate a preset input template, a preset standard corpus and a knowledge graph (the preset entity relation library), respectively. The preset standard corpus comprises a corpus dictionary and an edit distance dictionary. The corpus dictionary stores entity names, including Chinese names and the corresponding English names, selected according to actual conditions; the edit distance dictionary stores entity names, entity attributes and search counts, likewise selected according to actual conditions.
In the embodiment of the invention, the preset edit distance calculation method is configured in the edit distance dictionary, and the server can determine the keyword having the smallest edit distance to the first keyword according to the preset edit distance calculation method.
In the embodiment of the invention, the server first identifies a first keyword from the sentence to be searched by using the preset input template, and then looks up the first keyword in the corpus dictionary. When the first keyword is found, it is determined to be a keyword to be checked. When the first keyword is not found, the server converts the first keyword into English and looks up the converted first keyword in the corpus dictionary. When the converted keyword is found, the keyword in the corpus dictionary corresponding to it is determined to be the keyword to be checked; when it is not found, the server searches the edit distance dictionary for the keyword having the smallest edit distance to the first keyword and determines that keyword to be the keyword to be checked.
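The three-stage lookup in the preceding paragraph can be sketched as below. All names are hypothetical: `convert` stands for the conversion step, `converted_index` maps converted forms back to dictionary keywords, and `nearest` stands for the edit distance dictionary lookup, assumed here to return every keyword tied at the smallest distance. The tie-break by largest search count follows the rule stated in the disclosure for the case where several keywords are equally close.

```python
from typing import Callable, Dict, List, Optional, Set


def resolve_keyword(
    first_keyword: str,
    corpus_dictionary: Set[str],
    converted_index: Dict[str, str],
    convert: Callable[[str], str],
    nearest: Callable[[str], List[str]],
    search_counts: Dict[str, int],
) -> Optional[str]:
    """Return the keyword to be checked for one first keyword, or None."""
    # Stage 1: the keyword is already a standard entry.
    if first_keyword in corpus_dictionary:
        return first_keyword
    # Stage 2: look the converted form up and map it back to a standard entry.
    converted = convert(first_keyword)
    if converted in converted_index:
        return converted_index[converted]
    # Stage 3: fall back to the closest entries by edit distance and, if
    # several are tied, prefer the one searched most often.
    candidates = nearest(first_keyword)
    if not candidates:
        return None
    return max(candidates, key=lambda word: search_counts.get(word, 0))
```

Passing the conversion and nearest-neighbour steps in as callables keeps the sketch independent of how those dictionaries are actually stored.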
In the embodiment of the invention, the server acquires at least two attributes corresponding to at least two keywords to be checked by utilizing the edit distance dictionary.
In the embodiment of the present invention, the server may determine the number of keywords either after the first keyword is identified or after the keyword to be checked corresponding to the first keyword is determined; the timing is selected according to the actual situation and is not specifically limited in the embodiment of the present invention.
S102, when the attribute of the first keyword to be checked among the at least two attributes belongs to the preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
When the server judges that the attribute of a first keyword to be checked in the at least two attributes belongs to the preset checking attribute, the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
In the embodiment of the invention, the server is preset with preset checking attributes which need to be further checked, after the server determines at least two attributes, the server sequentially judges whether the at least two attributes belong to the preset checking attributes, and when the attribute of a first keyword to be checked in the at least two attributes belongs to the preset checking attributes, the server searches a first entity corresponding to the first keyword to be checked from the knowledge graph.
S103, at least one entity related to the first entity is searched from a preset entity relation library.
After the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked, the server searches at least one entity related to the first entity from a preset entity relation library.
In the embodiment of the invention, the knowledge graph is a data structure based on a graph and consists of nodes and edges, wherein each node represents an entity, and each edge is a relationship between the entities. The knowledge graph may be represented by entity-relationship-entity triples, and stored using a conventional Resource Description Framework (RDF) or a graph database, and is used to query at least one entity related to the first entity.
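A minimal in-memory sketch of such a triple store is shown below. It is not an RDF store or a graph database; the relation name `sings`, the sample triples and the function names are assumptions chosen only to illustrate the lookup of entities related to a first entity.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)

# Illustrative triples only; a real library would be built offline from the corpus.
TRIPLES: List[Triple] = [
    ("筷子兄弟", "sings", "小苹果"),
    ("筷子兄弟", "sings", "老男孩"),
    ("周杰伦", "sings", "稻香"),
]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group outgoing edges by head entity so lookups avoid a full scan."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def related_entities(
    index: Dict[str, List[Tuple[str, str]]],
    entity: str,
    relation: Optional[str] = None,
) -> List[str]:
    """Return the entities linked to `entity`, optionally filtered by relation."""
    return [
        tail
        for rel, tail in index.get(entity, [])
        if relation is None or rel == relation
    ]
```

Calling `related_entities(build_index(TRIPLES), "筷子兄弟")` returns the two songs attached to that singer in the sample data; an unknown entity returns an empty list.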
As shown in fig. 3, each pair of connected entities is linked by an edge that names their relationship; for example, the relationship between one illustrated pair of entities is that of a work.
S104, determining, based on a preset relevance calculation method, an entity to be queried that has the highest relevance to a second keyword to be checked from the at least one entity, so as to perform a search by using the first entity and the entity to be queried, wherein the second keyword to be checked is the keyword to be checked other than the first keyword to be checked among the at least two keywords to be checked.
After the server searches at least one entity related to the first entity from the preset entity relation library, the server determines an entity to be queried with the highest correlation degree with the second keyword to be queried from the at least one entity based on a preset correlation degree calculation method, and searches by using the first entity and the entity to be queried.
In the embodiment of the present invention, the preset correlation calculation method includes an edit distance calculation method, a vector space model, and the like, which are specifically selected according to actual situations, and the embodiment of the present invention is not specifically limited.
In the embodiment of the invention, the server calculates at least one edit distance between the at least one entity and the second keyword to be checked according to the preset edit distance calculation method, and then determines, from the at least one edit distance, a first edit distance that is the smallest, together with the entity to be queried corresponding to the first edit distance.
In the embodiment of the present invention, the preset edit distance calculation method is as follows. Let $d_{i,j}$ denote the edit distance required to change the first $i$ characters of character string $a$ into the first $j$ characters of character string $b$. If the full length of string $a$ is $m$ and the full length of string $b$ is $n$, then, because transformations involving strings of length 0 must also be represented, $d$ is an $(m+1) \times (n+1)$ matrix.
Here the deletion cost $w_{\mathrm{del}}(a_i)$ and the insertion cost $w_{\mathrm{ins}}(b_j)$ are both 1; the substitution cost $w_{\mathrm{sub}}(a_i, b_j)$ is 1 when the $i$-th character of string $a$ is not equal to the $j$-th character of string $b$, and 0 otherwise.
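The recurrence that these weights feed into is not reproduced in the text. A standard dynamic-programming form consistent with the definitions above is given here as an assumption, not as a quotation of the original formula, writing $a_i$ for the $i$-th character of string $a$ and $b_j$ for the $j$-th character of string $b$:

```latex
\begin{aligned}
d_{0,0} &= 0,\\
d_{i,0} &= d_{i-1,0} + w_{\mathrm{del}}(a_i), && 1 \le i \le m,\\
d_{0,j} &= d_{0,j-1} + w_{\mathrm{ins}}(b_j), && 1 \le j \le n,\\
d_{i,j} &= \min\bigl\{\, d_{i-1,j} + w_{\mathrm{del}}(a_i),\;
                        d_{i,j-1} + w_{\mathrm{ins}}(b_j),\;
                        d_{i-1,j-1} + w_{\mathrm{sub}}(a_i, b_j) \,\bigr\},
        && 1 \le i \le m,\ 1 \le j \le n.
\end{aligned}
```

Under this form, $d_{m,n}$ is the edit distance between the two complete strings, and with unit insertion and deletion costs the first column and first row reduce to $d_{i,0} = i$ and $d_{0,j} = j$.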
Preferably, the server searches at least one entity associated with the first keyword to be checked from the knowledge graph, judges whether the second keyword to be checked exists in the at least one entity, and directly returns the first keyword to be checked and the second keyword to be checked when the second keyword exists in the at least one entity, otherwise, calculates at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
In the embodiment of the present invention, the server sorts the at least one edit distance in ascending or descending order; the specific sorting method is selected according to the actual situation.
In the embodiment of the invention, the server determines the first editing distance with the minimum editing distance from the at least one editing distance after sequencing, then the server determines the entity to be queried corresponding to the first editing distance, and searches by using the first entity and the entity to be queried.
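The selection step, including the exact-match shortcut described above, can be sketched as follows. The function names are hypothetical, unit costs are assumed for insertion, deletion and substitution, and the song titles in the usage note are illustrative data rather than content of the entity relation library.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one matrix row at a time."""
    previous = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def pick_entity(second_keyword: str, related: List[str]) -> Optional[str]:
    """Choose the related entity closest to the second keyword to be checked."""
    if not related:
        return None
    # Shortcut: the keyword is itself one of the related entities.
    if second_keyword in related:
        return second_keyword
    # Otherwise take the entity with the smallest edit distance; on a tie,
    # min() keeps the first such entity in the list.
    return min(related, key=lambda entity: edit_distance(entity, second_keyword))
```

With the related entities `["老男孩", "小苹果", "父亲"]` and the second keyword `"苹果"`, the distances are 3, 1 and 2, so `pick_entity` returns `"小苹果"`. Taking the minimum directly avoids the explicit sort, which is only needed if the ranked list is used elsewhere.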
Further, after the server searches by using the first entity and the entity to be queried, the count of the corresponding search record in the search log is incremented by one, and the server updates the search count of that record into the edit distance dictionary, completing the incremental learning process.
Illustratively, as shown in FIG. 4, the checking logic of the input check is as follows. After a query request carrying a sentence to be searched is received, the sentence is deeply preprocessed: a first keyword is screened out by using the template, and the first keyword is then formatted, including converting traditional Chinese characters to simplified characters, full-width symbols to half-width symbols, upper-case letters to lower-case letters, and Arabic numerals to simplified Chinese numerals, and removing special symbols, spaces, illegal characters and the like; an error correction dictionary is then used to determine whether the first keyword contains a near-form (visually similar) character. Next, rule-based error correction is performed on the first keyword, including homophone recognition using the corpus dictionary, fuzzy pinyin recognition, and recognition of extra or missing characters using the edit distance dictionary, to obtain the keywords to be checked. When there are at least two keywords to be checked and the attribute of a first keyword to be checked among them belongs to the preset checking attribute, semantics-based error correction is performed on the at least two keywords to be checked.
It can be understood that the server is provided with a preset entity relation library that stores the relationships between entities. When the server receives at least two keywords to be checked, such as 'Chopsticks Brothers apple', the server looks up, in the preset entity relation library, the first entity 'Chopsticks Brothers' corresponding to the first keyword to be checked and at least one entity related to 'Chopsticks Brothers', namely the songs sung by 'Chopsticks Brothers', and determines, from those songs, the entity to be queried 'Little Apple' that has the highest relevance to the second keyword to be checked 'apple'. The first entity 'Chopsticks Brothers' and the entity to be queried 'Little Apple' are then the two most relevant entities, and the server searches with 'Chopsticks Brothers Little Apple'. The server can thus correct the keywords to be checked in combination with the user's semantics, thereby improving the accuracy of the input check.
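The incremental learning step can be pictured with the sketch below. The layout of the edit distance dictionary (a mapping from entity name to a record holding `attribute` and `search_count`) and the function name are assumptions made for illustration; the text does not specify a storage format.

```python
from collections import Counter
from typing import Dict, Iterable


def apply_search_log(
    edit_distance_dictionary: Dict[str, dict],
    search_log: Iterable[str],
) -> None:
    """Fold the searches recorded in the log into the stored search counts."""
    hits_per_entity = Counter(search_log)
    for name, hits in hits_per_entity.items():
        # Entities not seen before get a fresh record with an unknown attribute.
        record = edit_distance_dictionary.setdefault(
            name, {"attribute": None, "search_count": 0}
        )
        record["search_count"] += hits
```

Because the counts feed the tie-break between equally close keywords, frequently searched entities gradually become the preferred corrections.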
Example two
An embodiment of the present invention provides an input checking method, as shown in fig. 5, the method may include:
S201, when the server receives an input sentence to be searched, the server identifies a first keyword from the sentence to be searched by using a preset input template.
The input checking method provided by the embodiment of the invention is suitable for a scene that a server corrects the error of the keywords in the sentences input by the user.
In the embodiment of the invention, as shown in fig. 2, the input checking method comprises three modules: offline library building, online fault tolerance, and incremental learning. Offline library building preprocesses and trains on the corpus to obtain a template, a corpus dictionary, an edit distance dictionary and a knowledge graph. Online fault tolerance, after a sentence to be searched input by a user is received, performs fault-tolerant correction online by using the template, corpus dictionary, edit distance dictionary and knowledge graph obtained by offline library building. Incremental learning determines the updated search counts from the search log after online fault tolerance and writes them into the corresponding dictionary.
In the embodiment of the invention, the user inputs the sentence to be searched in the input box, and the server uses the preset input template to identify the first keyword from the sentence to be searched.
Illustratively, an input template is predefined, such as "I want to listen to songs by ( )", where the type of the slot ( ) is singer. When the user inputs "I want to listen to songs by game show", the server returns "game show" as the singer according to the predefined input template.
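Template matching of this kind can be sketched with a regular expression. The English template string, the `TEMPLATES` list and the function name are assumptions for illustration; the embodiment's templates are produced by offline training and are not necessarily regular expressions.

```python
import re
from typing import Optional, Tuple

# Hypothetical template list: each pattern has one slot and a slot type.
TEMPLATES = [
    (re.compile(r"^i want to listen to songs by (?P<slot>.+)$"), "singer"),
]


def extract_keyword(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (first keyword, slot type) if a template matches, else None."""
    text = sentence.strip().lower()
    for pattern, slot_type in TEMPLATES:
        match = pattern.match(text)
        if match:
            return match.group("slot").strip(), slot_type
    return None
```

A sentence that matches no template returns `None`, in which case a word segmentation method would be the natural fallback for splitting the input into keywords.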
In the embodiment of the invention, when the server receives the keyword group input by the user, the server directly decomposes the keyword group input by the user into the first keyword according to the word segmentation method.
S202, the server matches the first keyword with a preset standard corpus.
After the server identifies the first keyword from the sentence to be searched by using the preset input template, the server needs to match the first keyword with a preset standard corpus.
In the embodiment of the invention, the server deeply preprocesses the first keyword, including converting traditional Chinese characters to simplified characters, full-width symbols to half-width symbols, upper-case letters to lower-case letters, and Arabic numerals to simplified Chinese numerals, removing special symbols, spaces, illegal characters and the like, and looking up near-form characters corresponding to the first keyword by using an error correction dictionary. The server then performs rule-based error correction on the preprocessed first keyword: it looks up the first keyword in the corpus dictionary, and when the first keyword is found, it is determined to be a keyword to be checked. When the first keyword is not found, the server converts the first keyword into English and looks up the converted first keyword in the corpus dictionary. When the converted keyword is found, the keyword in the corpus dictionary corresponding to it is determined to be the keyword to be checked; when it is not found, the server searches the edit distance dictionary for the keyword having the smallest edit distance to the first keyword and determines that keyword to be the keyword to be checked.
In the embodiment of the invention, the error correction dictionary stores erroneous query information, the corresponding correct query information, and the attribute corresponding to the correct query information. For example, when the error correction dictionary holds an erroneous form together with its correct form and the attribute singer, a first keyword equal to the erroneous form can be corrected to the correct form, and its attribute can be identified as singer.
In the embodiment of the present invention, the server obtains preset corpus information by using a crawler system or a database, where the preset corpus information includes entity names, the attributes corresponding to the entity names, and search counts, for example: "Zhou Jielun, singer, 10,000 times", where Zhou Jielun is the entity name, singer is the attribute, and 10,000 is the search count.
In the embodiment of the invention, the server trains on the preprocessed preset corpus information to generate a preset input template, a preset standard corpus and a knowledge graph (the preset entity relation library), respectively. The preset standard corpus comprises a corpus dictionary and an edit distance dictionary. The corpus dictionary stores entity names, including Chinese names and the corresponding English names, selected according to actual conditions; the edit distance dictionary stores entity names, entity attributes and search counts, likewise selected according to actual conditions.
In the embodiment of the invention, the preset edit distance calculation method is configured in the edit distance dictionary, and the server can determine the keyword having the smallest edit distance to the first keyword according to the preset edit distance calculation method.
In the embodiment of the invention, the server judges the word length of the first keyword, and when the word length is smaller than a preset length threshold, the first keyword is searched from the corpus dictionary.
Illustratively, if the first keyword is Chinese and has a length less than 10, the first keyword is looked up from the corpus dictionary.
S203, when the first keyword is matched with the first preset keyword, the server adds the first keyword to the keyword to be checked.
When the server determines that the first keyword is matched with a first preset keyword in a preset standard corpus, the server needs to add the first keyword to the keyword to be checked.
In the embodiment of the invention, when the server judges that the first keyword matches the first preset keyword in the corpus dictionary, this indicates that the server has found the first keyword in the corpus dictionary; at this time, the server searches the editing distance dictionary for the fifth attribute corresponding to the first preset keyword.
In the embodiment of the invention, the server respectively adds the first keyword to the keyword to be detected and adds the fifth attribute to at least two attributes.
S204, when the first keyword is not matched with the first preset keyword in the preset standard corpus, the server converts the first keyword into pinyin to be checked.
When the server judges that the first keyword is not matched with the first preset keyword in the preset standard corpus, the server needs to convert the first keyword into pinyin to be checked.
In the embodiment of the invention, when the server judges that the first keyword is not matched with the first preset keyword in the corpus dictionary, the server converts the first keyword into the pinyin to be checked.
Illustratively, the user enters "before learning", and the server does not find this word in the corpus dictionary; at this point the server converts "before learning" into the pinyin "xuezhiqian".
S203 and S204 are two parallel steps after S202, and are specifically selected to be executed according to actual situations, and the embodiment of the present invention is not specifically limited.
S205, when the pinyin to be checked is matched with the first pinyin in the preset standard corpus, the server acquires a second preset keyword corresponding to the first pinyin.
When the server converts the first keyword into the pinyin to be checked, the server matches the pinyin to be checked with the first pinyin in the preset standard corpus, and when the pinyin to be checked is matched with the first pinyin in the preset standard corpus, the server acquires a second preset keyword corresponding to the first pinyin.
In the embodiment of the invention, the server judges the pinyin length of the pinyin to be checked, matches the pinyin to be checked with the first pinyin in the corpus dictionary when the pinyin length is smaller than a preset length threshold value, and acquires a second preset keyword corresponding to the first pinyin and a second attribute corresponding to the second preset keyword when the matching is successful.
Illustratively, if the length of the pinyin to be checked is less than 20, the server searches the pinyin to be checked from the corpus dictionary.
Illustratively, the server finds in the corpus dictionary that the second preset keyword corresponding to "xuezhiqian" is "Xue Zhiqian", and then obtains from the editing distance dictionary that the attribute of "Xue Zhiqian" is "singer".
S206, the server adds the second preset keyword to the keyword to be checked.
After the server acquires the second preset keyword, the server adds the second preset keyword to the keyword to be checked.
In the embodiment of the invention, the server adds the second preset keyword to the keyword to be checked and adds the second attribute to at least two attributes.
S207, when the pinyin to be checked is not matched with the first pinyin, the server searches the preset standard corpus for a third preset keyword whose editing distance to the first keyword is the smallest.
When the server judges that the pinyin to be checked is not matched with the first pinyin, the server needs to search a third preset keyword which is closest to the editing distance of the first keyword from the preset standard corpus.
In the embodiment of the invention, when the server judges that the pinyin to be checked is not matched with the first pinyin, the server searches a third preset keyword which is closest to the editing distance of the first keyword and a third attribute corresponding to the third preset keyword by using the editing distance dictionary.
Illustratively, as shown in fig. 6, the process of the server performing rule-based error correction on the first keyword is as follows:
1. formatting the first keyword;
2. matching the first keyword with a corpus dictionary;
3. when the matching is successful, determining the first keyword as a keyword to be checked;
4. when the matching fails, converting the first keyword into a pinyin to be detected;
5. matching the pinyin to be detected with a first pinyin in a corpus dictionary;
6. when the matching is successful, searching a second preset keyword corresponding to the first pinyin from the corpus dictionary, and determining the second preset keyword as a keyword to be checked;
7. when the matching fails, searching, by using the editing distance dictionary, for a third preset keyword with the minimum editing distance to the first keyword, and determining the third preset keyword as the keyword to be checked.
S205-S206 and S207 are two parallel steps after S204, and are specifically selected to be executed according to actual situations, and the embodiment of the present invention is not specifically limited.
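The rule-based flow of fig. 6 can be sketched in Python as a three-stage fallback. This is a minimal sketch under stated assumptions: `rule_based_correct`, `levenshtein`, and the toy `to_pinyin` table are hypothetical names introduced here, the length thresholds of the earlier steps are omitted, and a real system would use a proper pinyin converter instead of the lookup table.

```python
def levenshtein(a, b):
    """Edit distance with unit insert, delete and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def rule_based_correct(keyword, names, pinyin_index, to_pinyin):
    """Return (candidate keywords to be checked, stage that produced them)."""
    keyword = keyword.strip()                    # 1. formatting
    if keyword in names:                         # 2-3. exact match
        return [keyword], "exact"
    pinyin = to_pinyin(keyword)                  # 4. convert to pinyin
    if pinyin in pinyin_index:                   # 5-6. pinyin match
        return [pinyin_index[pinyin]], "pinyin"
    # 7. fall back to the closest name(s); ties are all returned
    best = min(levenshtein(keyword, n) for n in names)
    return [n for n in names if levenshtein(keyword, n) == best], "edit_distance"


# Toy data; the pinyin table is a stand-in for a real converter.
names = ["Xue Zhiqian", "Zhou Jie Lun"]
pinyin_index = {"xuezhiqian": "Xue Zhiqian", "zhoujielun": "Zhou Jie Lun"}
toy_table = {"before learning": "xuezhiqian"}


def to_pinyin(text):
    return toy_table.get(text, text.lower().replace(" ", ""))
```

Returning every name tied at the minimum distance keeps the sketch compatible with the later step in which the server chooses among several closest keywords by search times.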
S208, when the third preset keyword comprises at least two keywords, the server searches at least two search times corresponding to the at least two keywords from the preset standard corpus.
After the server finds out a third preset keyword which is closest to the editing distance of the first keyword, the server needs to judge whether the third preset keyword comprises at least two keywords, and when the server judges that the third preset keyword comprises the at least two keywords, the server searches for at least two search times corresponding to the at least two keywords from a preset standard corpus.
In the embodiment of the invention, when the server judges that at least two keywords are closest to the editing distance of the first keyword, the server searches at least two search times corresponding to the at least two keywords from the editing distance dictionary.
S209, the server determines a fourth preset keyword with the most searching times from the at least two keywords according to the at least two searching times.
After the server finds at least two search times corresponding to the at least two keywords, the server determines a fourth preset keyword with the largest search time from the at least two keywords according to the at least two search times.
In the embodiment of the invention, the server sorts the at least two search times in ascending or descending order, and then determines the fourth preset keyword with the largest number of search times and the fourth attribute corresponding to the fourth preset keyword.
S210, the server adds a fourth preset keyword to the keyword to be checked.
After the server determines the fourth preset keyword with the largest searching frequency, the server needs to add the fourth preset keyword to the keyword to be detected.
In the embodiment of the invention, the server adds the fourth preset keyword to the keyword to be detected and adds the fourth attribute to at least two attributes.
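Steps S208 to S210 amount to breaking a tie among equally close keywords by search times. A minimal sketch follows, assuming the editing distance dictionary maps each entity name to an (attribute, search times) pair; the function name and the entries "entity A" and "entity B" are hypothetical.

```python
def pick_most_searched(candidates, edit_distance_dict):
    """Return (keyword, attribute) for the candidate with the most search times."""
    best = max(candidates, key=lambda name: edit_distance_dict[name][1])
    return best, edit_distance_dict[best][0]


# Hypothetical entries: name -> (attribute, search times)
toy_dict = {"entity A": ("singer", 120), "entity B": ("singer", 4500)}
fourth_keyword, fourth_attribute = pick_most_searched(["entity A", "entity B"], toy_dict)
```

Taking the maximum directly avoids a full sort; sorting in either order and reading the appropriate end gives the same result.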
S211, when the server judges that the number of the keywords to be detected is one, the server searches by using the keywords to be detected.
After the server determines the keywords to be detected corresponding to the first keyword from the preset standard corpus, the server needs to judge the number of the keywords to be detected, and when the number of the keywords to be detected is one, the server needs to search by using the keywords to be detected.
In the embodiment of the invention, the server searches the keywords to be checked and displays the search result on the current display interface.
S212, when the keywords to be detected are at least two keywords to be detected, the server obtains at least two attributes corresponding to the at least two keywords to be detected from a preset standard corpus.
After the server determines, from the preset standard corpus, at least two keywords to be checked corresponding to the first keyword, the server acquires at least two attributes corresponding to the at least two keywords to be checked from the preset standard corpus.
In the embodiment of the invention, the editing distance dictionary stores at least two attributes corresponding to at least two keywords to be checked, and when the server uses the editing distance dictionary to perform spelling error correction on the first keyword to obtain at least two keywords to be checked, the server can simultaneously obtain at least two attributes corresponding to at least two keywords to be checked.
S211 and S212 are two parallel steps after S210, which are specifically selected according to actual situations, and the embodiment of the present invention is not specifically limited.
S213, when the attribute of the first keyword to be checked in the at least two attributes belongs to the preset checking attribute, the server determines a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
When the server obtains at least two attributes corresponding to at least two keywords to be checked from a preset standard corpus, the server needs to sequentially judge whether the at least two attributes belong to preset checking attributes, and when the server judges that the attribute of the first keyword to be checked belongs to the preset checking attributes, the server needs to determine a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked.
In the embodiment of the invention, the server is preset with preset checking attributes which need to be further checked, when the server determines at least two attributes, the server sequentially judges whether the at least two attributes belong to the preset checking attributes, and when the attribute of the first keyword to be checked is judged to belong to the preset checking attributes, the server searches the first entity corresponding to the first keyword to be checked from the knowledge graph.
S214, the server searches at least one entity related to the first entity from a preset entity relation library.
After the server determines the first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked, the server searches at least one entity related to the first entity from the preset entity relationship library.
In the embodiment of the invention, the knowledge graph is a data structure based on a graph and consists of nodes and edges, wherein each node represents an entity, and each edge is a relationship between the entities. The knowledge graph may be represented by entity-relationship-entity triples, and stored using a conventional RDF or graph database, and is used to query at least one entity related to the first keyword to be checked.
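The entity-relationship-entity representation can be illustrated with a small in-memory structure. This is only a sketch: the `KnowledgeGraph` class, its `related` method, and the triple "placeholder song two" are assumptions for illustration, and an actual deployment would more likely query an RDF store or graph database as described above.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Stores (head entity, relationship, tail entity) triples."""

    def __init__(self, triples):
        self._out = defaultdict(list)   # head -> [(relationship, tail), ...]
        for head, relation, tail in triples:
            self._out[head].append((relation, tail))

    def related(self, entity, relation=None):
        """Entities reachable from `entity`, optionally filtered by relationship."""
        return [tail for rel, tail in self._out.get(entity, [])
                if relation is None or rel == relation]


kg = KnowledgeGraph([
    ("chopsticks brother", "work", "small apple"),
    ("chopsticks brother", "work", "placeholder song two"),  # hypothetical entry
])
works = kg.related("chopsticks brother", "work")
```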
In the embodiment of the invention, the generation process of the knowledge graph comprises the following steps: for structured and semi-structured data, carrying out batch processing by using D2R or a data acquisition tool, extracting entities and attributes from the data, establishing triples of entity relations, and constructing a knowledge graph; for unstructured text information, natural language processing technology is utilized to carry out word segmentation, syntactic dependency analysis and class recognition constraint on the text, the vocabulary meeting the constraint is constructed into entities of corresponding classes, and data values are supplemented.
As shown in fig. 3, the pair of entities illustrated there is connected by an edge whose relationship is "work".
S215, the server calculates at least one edit distance between at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
After the server finds at least one entity related to the first keyword to be checked, the server calculates at least one edit distance between the at least one entity and a second keyword to be checked according to a preset edit distance calculation method, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked.
In the embodiment of the present invention, the preset edit distance calculation method is as follows: let $d_{i,j}$ denote the edit distance required to change the first $i$ characters of character string $a$ into the first $j$ characters of character string $b$. If the full length of character string $a$ is $m$ and the full length of character string $b$ is $n$, then $d$ is a matrix of size $(m+1) \times (n+1)$, since transitions involving strings of length 0 must also be represented.
Here, the deletion weight $w_{\mathrm{del}}(b_i)$ and the insertion weight $w_{\mathrm{ins}}(a_j)$ are both 1; the substitution weight $w_{\mathrm{sub}}(a_j, b_i)$ is 1 when the two characters being compared are not equal, and 0 otherwise.
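The recurrence that fills the matrix is not reproduced in this text. The sketch below assumes the standard Levenshtein recurrence with the unit weights stated above, which is the usual reading of such a definition; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a, b):
    """Fill the (m+1) x (n+1) matrix d and return d[m][n]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                      # delete all i characters of a
    for j in range(1, n + 1):
        d[0][j] = j                      # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w_sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion, weight 1
                          d[i][j - 1] + 1,          # insertion, weight 1
                          d[i - 1][j - 1] + w_sub)  # substitution, weight 0 or 1
    return d[m][n]
```

For example, `edit_distance("apple", "small apple")` is 6, because six characters must be inserted in front of "apple".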
Preferably, the server searches at least one entity associated with the first keyword to be checked from the knowledge graph, judges whether the second keyword to be checked exists in the at least one entity, and directly returns the first keyword to be checked and the second keyword to be checked when the second keyword exists in the at least one entity, otherwise, calculates at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method.
S216, the server determines a first editing distance with the minimum editing distance and an entity to be queried corresponding to the first editing distance from the at least one editing distance, and the first entity and the entity to be queried are used for searching.
After the server calculates at least one edit distance between at least one entity and a second keyword to be checked, the server determines a first edit distance with the minimum edit distance from the at least one edit distance, and determines an entity to be queried corresponding to the first edit distance, so as to utilize the first entity and the entity to be queried to perform a searching process.
In the embodiment of the present invention, the server sorts the at least one edit distance in ascending or descending order, and the specific sorting method is selected according to the actual situation.
In the embodiment of the invention, the server determines the first editing distance with the minimum editing distance from the at least one editing distance after sequencing, then the server determines the entity to be queried corresponding to the first editing distance, and searches by using the first entity and the entity to be queried.
Illustratively, as shown in fig. 7, the logic of semantic-based error correction performed by the server on the first keyword to be checked is as follows:
1. searching at least one entity related to the first keyword to be checked from the knowledge graph;
2. judging whether the second keyword to be checked matches any of the at least one entity;
3. when a match is found, directly returning the first keyword to be checked and the second keyword to be checked;
4. when no match is found, calculating in sequence at least one editing distance between the at least one entity and the second keyword to be checked;
5. determining a first editing distance with the minimum editing distance and an entity to be queried corresponding to the first editing distance from at least one editing distance;
6. and returning the first keyword to be checked and the entity to be inquired.
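The semantic-based flow of fig. 7 can be sketched as follows. The names `semantic_correct` and `related_entities` are hypothetical, the related-entity table stands in for a knowledge graph lookup, "placeholder song two" is an invented entry, and returning the input unchanged when no related entity exists is an assumption, since the text does not cover that case.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit weights (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def semantic_correct(first_entity, second_keyword, related_entities):
    """Return (first entity, entity to be queried)."""
    candidates = related_entities.get(first_entity, [])   # 1. related entities
    if not candidates:
        return first_entity, second_keyword               # assumption: no change
    if second_keyword in candidates:                       # 2-3. direct match
        return first_entity, second_keyword
    # 4-5. otherwise pick the related entity with the minimum edit distance
    best = min(candidates, key=lambda e: edit_distance(e, second_keyword))
    return first_entity, best                              # 6. return the pair


related_entities = {
    "chopsticks brother": ["small apple", "placeholder song two"],
}
result = semantic_correct("chopsticks brother", "apple", related_entities)
```

Because only entities related to the first entity are compared, the candidate set stays small and the correction is constrained by what the user is likely to mean.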
S217, the server respectively obtains the first historical search times of the first entity and the second historical search times of the entity to be inquired.
After the server searches by using the first entity and the entity to be queried, the server needs to respectively obtain the first historical search times of the first entity and the second historical search times of the entity to be queried.
In the embodiment of the invention, the server acquires the first historical search times of the first entity and the second historical search times of the entity to be inquired from the search log.
S218, the server adds the first historical search times and the second historical search times to a preset standard corpus.
After the server respectively obtains a first historical search frequency of the first entity and a second historical search frequency of the entity to be queried, the server needs to add the first historical search frequency and the second historical search frequency to a preset standard corpus.
In the embodiment of the invention, the server updates the first historical search times and the second historical search times to the corresponding positions of the editing distance dictionary to complete the incremental learning process.
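A minimal sketch of this update step, assuming the editing distance dictionary maps an entity name to an (attribute, search times) pair and that the search log has already been reduced to a name-to-count mapping; the function name and the sample values are hypothetical.

```python
def update_search_counts(edit_distance_dict, historical_counts):
    """Write the historical search times into the matching dictionary entries."""
    for name, count in historical_counts.items():
        attribute, _ = edit_distance_dict.get(name, (None, 0))
        edit_distance_dict[name] = (attribute, count)   # attribute is kept as-is
    return edit_distance_dict


# Hypothetical values for illustration only.
toy_dict = {"chopsticks brother": ("singer", 100), "small apple": ("song", 50)}
update_search_counts(toy_dict, {"chopsticks brother": 101, "small apple": 51})
```

Keeping the counts current means that later ties between equally close keywords are resolved with up-to-date search times.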
It is understood that the server is provided with a preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as "chopsticks brother apple", the server searches the preset entity relation library for the first entity "chopsticks brother" corresponding to the first keyword to be checked and for at least one entity related to "chopsticks brother", namely the songs sung by "chopsticks brother". From these songs the server determines the entity to be queried, "small apple", which has the highest relevance to the second keyword to be checked, "apple". The first entity "chopsticks brother" and the entity to be queried "small apple" are then the two entities with the highest relevance, and the server searches by using "chopsticks brother small apple". In this way the server corrects the keywords to be checked in combination with the semantics of the user input, which improves the accuracy of the input check.
EXAMPLE III
Fig. 8 is a schematic diagram of a composition structure of a server according to an embodiment of the present invention. In practical applications, based on the same inventive concept as the first embodiment and the second embodiment, as shown in fig. 8, a server 1 according to an embodiment of the present invention includes: a processor 10, a memory 11, and a communication bus 12. In a specific embodiment, the processor 10 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It will be appreciated that other electronic devices may be used to implement the processor functions described above, and the embodiments of the present invention are not specifically limited in this respect.
In the embodiment of the present invention, the communication bus 12 is used for realizing connection communication between the processor 10 and the memory 11; the processor 10 is configured to execute the operating program stored in the memory 11 to implement the following steps:
the processor 10 is configured to, when a keyword to be checked is at least two keywords to be checked, obtain at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, where the keyword to be checked is a keyword corresponding to a sentence to be searched; when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching at least one entity related to the first entity from a preset entity relation library; and based on a preset relevance calculation method, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity so as to utilize the first entity and the entity to be queried to perform a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked.
In an embodiment of the present invention, the processor 10 is further configured to, when receiving a sentence to be searched, which is input by a user, identify a first keyword from the sentence to be searched by using a preset input template; and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In an embodiment of the present invention, further, the processor 10 is further configured to calculate at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method; and determine, from the at least one edit distance, a first edit distance that is the minimum edit distance and the entity to be queried corresponding to the first edit distance.
In an embodiment of the present invention, the processor 10 is further configured to match the first keyword with a preset standard corpus; when the first keyword is not matched with a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked; when the pinyin to be checked is matched with a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin; and adding the second preset keyword to the keyword to be detected.
In an embodiment of the present invention, the processor 10 is further configured to search, when the pinyin to be checked is not matched with the first pinyin, a third preset keyword closest to an editing distance of the first keyword from the preset standard corpus; and adding the third preset keyword to the keyword to be detected.
In an embodiment of the present invention, further, the processor 10 is further configured to search, when the third preset keyword includes at least two keywords, at least two search times corresponding to the at least two keywords from the preset standard corpus; determining a fourth preset keyword with the largest searching times from the at least two keywords according to the at least two searching times; and adding the fourth preset keyword to the keyword to be detected.
In an embodiment of the present invention, the processor 10 is further configured to add the first keyword to the keyword to be checked when the first keyword matches the first preset keyword.
In this embodiment of the present invention, further, the processor 10 is further configured to obtain a first historical search frequency of the first entity and a second historical search frequency of the entity to be queried, respectively; adding the first historical search times and the second historical search times to the preset standard corpus.
In an embodiment of the present invention, the processor 10 is further configured to perform a search by using the keyword to be checked when it is determined that the number of the keyword to be checked is one.
According to the server provided by the embodiment of the invention, when the keywords to be detected are at least two keywords to be detected, at least two attributes corresponding to the at least two keywords to be detected are obtained from the preset standard corpus, and the keywords to be detected are corresponding keywords when the sentences to be searched are searched; when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked; searching at least one entity related to the first entity from a preset entity relation library; and determining an entity to be queried with the highest correlation degree with a second keyword to be checked from at least one entity based on a preset correlation degree calculation method so as to utilize the first entity and the entity to be queried to perform a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in at least two keywords to be checked. 
Therefore, the server provided by the embodiment of the invention is provided with the preset entity relation library that stores the relations between entities. When the server receives at least two keywords to be checked, such as "chopsticks brother apple", the server searches for the first entity "chopsticks brother" corresponding to the first keyword to be checked and for at least one entity related to "chopsticks brother", namely the songs sung by "chopsticks brother", and determines from these songs the entity to be queried, "small apple", which has the highest relevance to the second keyword to be checked, "apple". At this time, the first entity "chopsticks brother" and the entity to be queried "small apple" are the two entities with the highest relevance, and the server searches by using "chopsticks brother small apple". In this way the server corrects the keywords to be checked in combination with the semantics of the user input, thereby improving the accuracy of the input check.
The embodiment of the invention provides a computer-readable storage medium, which stores one or more programs, wherein the one or more programs are executable by one or more processors and are applied to a server, and when the programs are executed by the processors, the method according to the first embodiment and the second embodiment is realized.
Specifically, the program instructions corresponding to an input checking method in the embodiment are read or executed by an electronic device, and include the following steps:
when the keywords to be detected are at least two keywords to be detected, acquiring at least two attributes corresponding to the at least two keywords to be detected from a preset standard corpus, wherein the keywords to be detected are keywords corresponding to the sentences to be searched;
when the attribute of a first keyword to be checked in the at least two attributes belongs to a preset checking attribute, determining a first entity corresponding to the first keyword to be checked according to the attribute of the first keyword to be checked;
searching at least one entity related to the first entity from a preset entity relation library;
and based on a preset relevance calculation method, determining an entity to be queried with the highest relevance to a second keyword to be checked from the at least one entity so as to utilize the first entity and the entity to be queried to perform a searching process, wherein the second keyword to be checked is the keyword to be checked except the first keyword to be checked in the at least two keywords to be checked.
In an embodiment of the present invention, before obtaining at least two attributes corresponding to the at least two keywords to be checked from a preset standard corpus, the one or more programs are executed by the one or more processors, and further implement the following steps:
when a sentence to be searched input by a user is received, identifying a first keyword from the sentence to be searched by using a preset input template;
and determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spelling check strategy.
In an embodiment of the present invention, further, for determining, based on the preset relevance calculation method, the entity to be queried having the highest relevance to the second keyword to be checked from the at least one entity, the one or more programs are executed by the one or more processors to specifically implement the following steps:
calculating at least one edit distance between the at least one entity and the second keyword to be checked according to a preset edit distance calculation method;
and determining a first editing distance with the minimum editing distance and the entity to be queried corresponding to the first editing distance from the at least one editing distance.
In an embodiment of the present invention, further, according to a preset spell checking policy, a keyword to be checked corresponding to the first keyword is determined from the preset standard corpus, and the one or more programs are executed by the one or more processors, and specifically implement the following steps:
matching the first keyword with a preset standard corpus;
when the first keyword is not matched with a first preset keyword in the preset standard corpus, converting the first keyword into pinyin to be checked;
when the pinyin to be checked is matched with a first pinyin in the preset standard corpus, acquiring a second preset keyword corresponding to the first pinyin;
and adding the second preset keyword to the keyword to be detected.
In an embodiment of the present invention, further, after the first keyword is converted into the pinyin to be checked, the one or more programs are executed by the one or more processors, and then the following steps are implemented:
when the pinyin to be checked is not matched with the first pinyin, searching a third preset keyword which is closest to the editing distance of the first keyword from the preset standard corpus;
and adding the third preset keyword to the keyword to be detected.
In an embodiment of the present invention, further, after searching for a third preset keyword closest to an editing distance of the first keyword from the preset standard corpus, before adding the third preset keyword to the keyword to be detected, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
when the third preset keyword comprises at least two keywords, searching at least two search times corresponding to the at least two keywords from the preset standard corpus;
determining a fourth preset keyword with the largest searching times from the at least two keywords according to the at least two searching times;
correspondingly, for the adding of the third preset keyword to the keyword to be checked, the one or more programs are executed by the one or more processors to specifically implement the following step:
and adding the fourth preset keyword to the keyword to be detected.
In an embodiment of the present invention, further, after matching the first keyword with a preset standard corpus, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
when the first keyword is matched with the first preset keyword, adding the first keyword to the keyword to be checked.
In an embodiment of the present invention, further, after the process of searching by using the first entity and the entity to be queried is performed, the one or more programs are executed by the one or more processors, and the following steps are further implemented:
respectively acquiring a first historical search frequency of the first entity and a second historical search frequency of the entity to be inquired;
adding the first historical search times and the second historical search times to the preset standard corpus.
In an embodiment of the present invention, further, after determining the keyword to be checked corresponding to the first keyword from the preset standard corpus according to a preset spell checking policy, the one or more programs are executed by the one or more processors to further implement the following step:
when it is determined that there is only one keyword to be checked, performing the search using that keyword to be checked.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.