CN113535895A

CN113535895A - Search text processing method and device, electronic equipment and medium

Info

Publication number: CN113535895A
Application number: CN202110695117.6A
Authority: CN
Inventors: 钱昉
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2021-10-22

Abstract

The embodiments of the application provide a method, a device, electronic equipment and a medium for processing a search text, aiming to improve the accuracy of query error correction. The method comprises: determining, from a search text to be processed, an error correction text segment currently to be corrected; performing text recall based on the error correction text segment to obtain a plurality of texts to be recalled corresponding to the error correction text segment; retrieving search objects in a target index library based on each of the texts to be recalled, wherein the target index library stores a plurality of index records each indexed by a descriptor, one index record corresponds to one or more search objects, and the descriptors are word segments of the names of the search objects; and determining, from the plurality of texts to be recalled, a target text for correcting the error correction text segment according to the retrieval results corresponding to the respective texts to be recalled.

Description

Search text processing method and device, electronic equipment and medium

Technical Field

The present application relates to the field of computer processing technologies, and in particular, to a method and an apparatus for processing a search text, an electronic device, and a medium.

Background

With the popularization of network technology, various network platforms come into play, and different network platforms provide different services for users. In order to facilitate users to use the network platform, the network platform generally provides query services for the users, so that the users can efficiently acquire information required by the users.

In the related art, in order to respond accurately to the query intention of a user, the network platform performs error correction on a search text after receiving it from the user; this is called query error correction. Specifically, query error correction refers to correcting some text segments, or the entire text, of a search text entered by a user, so as to return search results that better match the real intention of the user. For example, "unitary shorthand" input by the user is corrected to "western shorthand"; or, in the input "li jiang li", the segment "li jiang", which is the text segment requiring error correction, is corrected to "lijiang".

The method adopted for query error correction in the related art is generally a recall-and-rank method, whose main process is as follows: some texts are recalled for the text segment needing error correction, and a language model is then used to score and sort the recalled texts, so that the text segment is corrected based on the top-ranked texts. In this approach, however, the ranking of the recalled texts depends mainly on the language model, which mainly learns the regularities of ordinary human expression. In scenarios focused on merchant and commodity search, merchant/commodity names are highly varied and idiosyncratic and are difficult for a language model to learn, so the texts that the language model selects for correcting the search text are not accurate enough, and the accuracy of query error correction is consequently low.

Disclosure of Invention

In order to solve the above problems, the present application provides a method, an apparatus, an electronic device, and a medium for processing a search text, which aim to improve accuracy of query error correction.

In a first aspect of the embodiments of the present disclosure, a method for processing a search text is provided, where the method includes:

determining a current error correction text segment to be corrected from a search text to be processed;

text recalling is carried out on the basis of the error correction text segments, and a plurality of texts to be recalled corresponding to the error correction text segments are obtained;

retrieving search objects in a target index library based on each of the texts to be recalled, wherein the target index library stores a plurality of index records each indexed by a descriptor; one index record corresponds to one or more search objects, and the descriptors are word segments of the names of the search objects;

and determining a target text for correcting the error correction text segment from the plurality of texts to be recalled according to the retrieval results corresponding to the plurality of texts to be recalled respectively.

Optionally, retrieving the search object of the target index library based on the plurality of texts to be recalled respectively includes:

based on the texts to be recalled, searching a search object of a target index library by multiple different granularities, wherein the multiple different granularities at least comprise a fragment text searching granularity and a complete text searching granularity;

determining a target text for correcting the error correction text segment from the plurality of texts to be recalled according to the retrieval result corresponding to each of the plurality of texts to be recalled, wherein the determining comprises:

and determining a target text for correcting the error correction text segments from the plurality of texts to be recalled according to the retrieval results of the plurality of texts to be recalled, which correspond to different granularities respectively.

Optionally, based on the multiple texts to be recalled, the search object of the target index repository is retrieved at multiple different granularities, including:

respectively taking the plurality of texts to be recalled as retrieval texts, retrieving the search objects of the target index library to obtain at least one candidate recall text of the retrieved search objects;

replacing the error correction text segments in the search text with the at least one candidate recall text respectively to obtain candidate search texts corresponding to the at least one candidate recall text respectively;

searching a search object of the target index library by taking the candidate search text as a search text;

determining a target text for correcting the error correction text segment from the plurality of texts to be recalled according to the retrieval results of the plurality of texts to be recalled, which correspond to different granularities, respectively, and the method comprises the following steps:

and determining the target text from the at least one candidate recall text according to the retrieval result corresponding to each candidate search text.
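The two retrieval granularities described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the in-memory `index` dictionary, the whitespace tokenization and the function names `search_full` and `retrieve_two_granularities` are all assumptions made for the example.

```python
def search_full(text, index):
    """Complete-text granularity: intersect the postings of every word.

    Assumption: words are whitespace-separated and `index` maps a
    descriptor to a list of search object ids.
    """
    postings = [set(index.get(word, ())) for word in text.split()]
    return set.intersection(*postings) if postings else set()


def retrieve_two_granularities(search_text, segment, recalled, index):
    # Fragment granularity: keep only recalled texts that hit the index.
    candidates = [t for t in recalled if index.get(t)]
    results = {}
    for candidate in candidates:
        # Replace the error correction segment to form a candidate search text.
        candidate_search_text = search_text.replace(segment, candidate, 1)
        # Complete-text granularity: retrieve with the whole candidate text.
        results[candidate] = search_full(candidate_search_text, index)
    return results


# Hypothetical index and query, for illustration only.
index = {"Yunnan": [1, 2], "Lijiang": [1, 2], "Hotel": [1], "Lijiao": [3]}
results = retrieve_two_granularities(
    "Yunnan Lijang Hotel", "Lijang", ["Lijiang", "Lijiao", "Lijan"], index
)
print(results)
```

Here "Lijan" is dropped at the fragment granularity because it hits no index record, and "Lijiao" survives the fragment check but retrieves nothing once the complete candidate search text is used.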

Optionally, determining, according to search results of different granularities corresponding to the multiple texts to be recalled, a target text for error correction of the error correction text segment from the multiple texts to be recalled includes:

under the complete text retrieval granularity, acquiring a target search object corresponding to an index record hit by each candidate search text; the candidate search text is obtained by carrying out error correction processing on the search text by a text to be recalled;

determining the multidimensional characteristics corresponding to the candidate search texts based on the candidate search texts and the corresponding target search objects; the multi-dimensional characteristics of each candidate search text comprise similarity characteristics between the candidate search text and the name of the target search object, attribute characteristics of the target search object and context language characteristics of the candidate recall text;

and screening the target text from the candidate recall texts corresponding to the candidate search texts based on the multi-dimensional features corresponding to the candidate search texts.
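A minimal sketch of screening by multi-dimensional features is given below. The concrete features (a string-similarity ratio, a hypothetical `popularity` attribute and an externally supplied context score) and the weighted-sum scoring are assumptions chosen for illustration; the application does not fix how the features are combined.

```python
from difflib import SequenceMatcher


def features(candidate_search_text, target, context_score):
    """Collect the three feature groups for one candidate search text."""
    return {
        # Similarity between the candidate search text and the object name.
        "similarity": SequenceMatcher(
            None, candidate_search_text, target["name"]
        ).ratio(),
        # Attribute feature of the target search object (hypothetical field).
        "popularity": target["popularity"],
        # Context language feature, assumed to come from a language model.
        "context": context_score,
    }


def screen(candidates, weights):
    """Return the candidate recall text with the highest weighted score."""
    def score(item):
        return sum(weights[name] * value for name, value in item["features"].items())
    return max(candidates, key=score)["recall_text"]


candidates = [
    {
        "recall_text": "Lijiang",
        "features": features(
            "Yunnan Lijiang Hotel",
            {"name": "Yunnan Lijiang Hotel", "popularity": 0.8},
            context_score=0.9,
        ),
    },
    {
        "recall_text": "Lijiao",
        "features": features(
            "Yunnan Lijiao Hotel",
            {"name": "Lijiao Noodle Shop", "popularity": 0.3},
            context_score=0.2,
        ),
    },
]
best = screen(candidates, {"similarity": 0.5, "popularity": 0.2, "context": 0.3})
print(best)
```

A learned ranking model could replace the weighted sum without changing the surrounding flow.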

Optionally, in a case where there are multiple error correction text segments, the method further includes:

when the similarity characteristic of the screened target text is determined to be larger than or equal to a similarity threshold value, replacing the current error correction text segment to be corrected in the search text with the target text to obtain a corrected search text;

and when the similarity characteristic of the screened target text is smaller than the similarity threshold value, determining the target text corresponding to the next error correction text segment until all error correction text segments are traversed, and respectively replacing the error correction text segments in the search text with the corresponding target texts to obtain the error-corrected search text.
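One possible reading of this traversal is sketched below. It assumes that a target text whose similarity feature reaches the threshold ends the traversal immediately, and that otherwise every segment is replaced after all segments have been visited; the helper `pick_target`, the threshold values and the sample data are hypothetical.

```python
def correct(search_text, segments, pick_target, threshold=0.9):
    """Traverse error correction segments and return the corrected text.

    pick_target(search_text, segment) is an assumed callback returning
    (target_text, similarity_feature) for one segment.
    """
    pending = []
    for segment in segments:
        target, similarity = pick_target(search_text, segment)
        if similarity >= threshold:
            # Confident match: replace the current segment and stop.
            return search_text.replace(segment, target, 1)
        pending.append((segment, target))
    # No segment reached the threshold: apply every replacement.
    corrected = search_text
    for segment, target in pending:
        corrected = corrected.replace(segment, target, 1)
    return corrected


# Hypothetical per-segment results, for illustration only.
lookup = {"Yunan": ("Yunnan", 0.6), "Lijang": ("Lijiang", 0.95)}

def pick(text, segment):
    return lookup[segment]

early_stop = correct("Yunan Lijang Hotel", ["Yunan", "Lijang"], pick, threshold=0.9)
replace_all = correct("Yunan Lijang Hotel", ["Yunan", "Lijang"], pick, threshold=0.99)
print(early_stop)
print(replace_all)
```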

Optionally, the error correction text segment is obtained by:

obtaining a position identifier of a user sending the search text;

performing word segmentation processing and/or entity recognition on the search text to obtain a plurality of text segments;

respectively taking the text segments and the position identifications as retrieval texts, retrieving the index records in the target index library to obtain a hit search object corresponding to the index record hit by each text segment;

and determining error correction text segments needing error correction from the text segments according to intersection taking results of the hit search object corresponding to each text segment in the text segments and the hit search objects corresponding to other text segments.
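The intersection-based detection above might be sketched as follows. The decision rule used here (a segment is flagged when its hit search objects share nothing with those of any other segment) is an assumption, since the text only states that the intersection results are used; the index keyed by `(segment, location_id)` is likewise hypothetical.

```python
def find_error_segments(segments, location_id, index):
    """Flag segments whose hit objects do not overlap with any other segment.

    Assumption: `index` maps (text_segment, location_id) to the ids of the
    search objects hit by that segment at that location.
    """
    hits = {seg: set(index.get((seg, location_id), ())) for seg in segments}
    suspects = []
    for seg in segments:
        others = [hits[other] for other in segments if other != seg]
        # Assumed rule: an empty intersection with every other segment
        # marks this segment as needing error correction.
        if not any(hits[seg] & other for other in others):
            suspects.append(seg)
    return suspects


# Hypothetical index at location 101, for illustration only.
index = {
    ("Yunnan", 101): [1, 2],
    ("Dali", 101): [1],
    ("Hotel", 101): [1, 3],
}
suspects = find_error_segments(["Yunnan", "Dali", "li jiang", "Hotel"], 101, index)
print(suspects)
```

Under this rule a search text consisting of a single segment is always flagged, so a real system would need an additional check for that case.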

Optionally, performing text recall on the basis of the error correction text segment to obtain a plurality of texts to be recalled corresponding to the error correction text segment, where the text recall includes:

determining a plurality of similar texts associated with the error correction text segments from a plurality of preset text dictionaries, wherein different text dictionaries correspond to different error correction dimensions, and the error correction dimensions at least comprise a similar-pronunciation dimension and a similar-glyph dimension;

filtering the plurality of similar texts based on a preset language model to obtain a similar text set, wherein the similar text set comprises a plurality of filtered similar texts;

and determining a text to be recalled from the similar text set based on the editing distance between each similar text in the similar text set and the error correction text fragment.
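The three recall steps above (dictionary lookup, language-model filtering, edit-distance selection) can be sketched as follows. The dictionary layout, the `lm_accepts` callback standing in for the preset language model, and the `max_distance` cut-off are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def recall(segment, dictionaries, lm_accepts, max_distance=2):
    # Step 1: gather similar texts from every error correction dimension.
    similar = []
    for dictionary in dictionaries.values():
        similar.extend(dictionary.get(segment, []))
    # Step 2: de-duplicate, then filter with the (assumed) language model.
    similar_set = [t for t in dict.fromkeys(similar) if lm_accepts(t)]
    # Step 3: keep texts within the edit-distance cut-off.
    return [t for t in similar_set if edit_distance(t, segment) <= max_distance]


# Hypothetical dictionaries, for illustration only.
dictionaries = {
    "pronunciation": {"li jiang": ["lijiang", "lijang"]},
    "glyph": {"li jiang": ["li jiong", "lijiang", "li jiangsu province"]},
}
recalled = recall("li jiang", dictionaries, lm_accepts=lambda t: t != "lijang")
print(recalled)
```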

Optionally, the method further comprises:

acquiring a plurality of text segment corresponding relations, wherein each text segment corresponding relation is obtained by aligning a search text input by a user in historical search behavior data with a name of a search object clicked by the user;

obtaining a name fragment corresponding to the error correction text fragment from the corresponding relation of the text fragments;

and taking the name fragment corresponding to the error correction text fragment as a similar text and adding the similar text into the similar text set.
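As an illustration of how such text segment correspondences could be mined, the sketch below aligns each historical query with the name of the clicked search object at word level using Python's `difflib.SequenceMatcher`. The choice of aligner, the whitespace tokenization and the click-log format are assumptions; the application does not specify the alignment technique.

```python
from difflib import SequenceMatcher


def aligned_pairs(query, clicked_name):
    """Return (query fragment, name fragment) pairs that differ after alignment."""
    q_words, n_words = query.split(), clicked_name.split()
    pairs = []
    matcher = SequenceMatcher(None, q_words, n_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(q_words[i1:i2]), " ".join(n_words[j1:j2])))
    return pairs


def correspondences(click_log):
    """Build a table from a query fragment to the name fragments it aligned with."""
    table = {}
    for query, name in click_log:
        for fragment, name_fragment in aligned_pairs(query, name):
            table.setdefault(fragment, set()).add(name_fragment)
    return table


# Hypothetical click log of (search text, clicked object name) pairs.
click_log = [("Yunnan Lijang Hotel", "Yunnan Lijiang Hotel")]
table = correspondences(click_log)
print(table)
```

The name fragments found for an error correction text segment can then be added to the similar text set.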

Optionally, the target index library is obtained by:

obtaining sample information of a plurality of search object samples, wherein the sample information comprises names, addresses and identifications of the search object samples;

performing word segmentation processing on the name of each search object sample to obtain a plurality of descriptors;

obtaining a search object sample to which each descriptor in a plurality of descriptors belongs;

taking each descriptor as an index item, and constructing an index record of the descriptor based on the sample information of the search object sample to which the descriptor belongs to obtain the target index library; wherein, each index record at least comprises the identification, the category and the address of the search object sample to which the description word belongs.

In a second aspect of the embodiments of the present application, a search text processing apparatus is provided, where the apparatus includes:

the error correction text determination module is used for determining the current error correction text segment to be corrected from the search text to be processed;

the recall module is used for recalling texts based on the error correction text segments to obtain a plurality of texts to be recalled corresponding to the error correction text segments;

the retrieval module is used for retrieving the search objects in the target index library based on each of the texts to be recalled, wherein the target index library stores a plurality of index records each indexed by a descriptor; one index record corresponds to one or more search objects, and the descriptors are word segments of the names of the search objects;

a target text obtaining module, configured to determine, according to the retrieval result corresponding to each of the multiple texts to be recalled, a target text for error correction of the error correction text segment from the multiple texts to be recalled.

In a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the method for processing search text according to the first aspect of the present disclosure is implemented.

In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program for causing a processor to execute the search text processing method according to the first aspect.

The method for processing the search text in the embodiment of the application can determine the current error correction text segment to be corrected from the search text to be processed; text recalling is carried out on the basis of the error correction text fragments, and a plurality of texts to be recalled corresponding to the error correction text fragments are obtained; and then, searching a search object of a target index library based on the plurality of texts to be recalled as search terms, and determining a target text for correcting the error correction text segment from the plurality of texts to be recalled according to respective corresponding search results of the plurality of texts to be recalled.

By adopting the embodiment of the application, the accuracy rate of query error correction can be improved at least from the following aspects:

On the one hand, after the plurality of texts to be recalled are obtained, each of them is used as a search term to retrieve search objects from the target index library. The retrieval quality of each text to be recalled, for example the number and accuracy of the search objects retrieved, can thus be obtained from the target index library and evaluated more accurately through the retrieval results, which improves the accuracy of the target text chosen for correcting the error correction text segment and thereby the error correction accuracy.

On the other hand, because each index record in the target index library is indexed by a word segment of a search object's name, retrieval based on a text to be recalled relies on the similarity between that text and the names. Therefore, even if a merchant/commodity name is unusual, the retrieval quality of the text to be recalled can still be evaluated correctly, which improves the accuracy of the subsequent ranking of the texts to be recalled and thereby the error correction accuracy.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments or the related art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.

FIG. 1 is a flowchart illustrating steps for building a target index repository according to an embodiment of the present application;

FIG. 2 is a block flow diagram of a search text processing method according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating steps of a method for processing search text according to an embodiment of the present application;

FIG. 4 is a schematic overall flowchart of a search text processing method according to an embodiment of the present application;

FIG. 5 is an overall flowchart of recalling a plurality of texts to be recalled based on a plurality of text dictionaries according to an embodiment of the present application;

FIG. 6 is a flowchart illustrating steps of recalling a plurality of texts to be recalled based on a plurality of text dictionaries according to an embodiment of the present application;

FIG. 7 is a flowchart illustrating the steps of retrieval at two different granularities according to an embodiment of the present application;

FIG. 8 is a flowchart illustrating steps of determining the target text according to the retrieval results corresponding to the candidate search texts according to an embodiment of the present application;

FIG. 9 is a flowchart illustrating steps of determining error correction text segments according to an embodiment of the present application;

FIG. 10 is a schematic block diagram of an apparatus for processing search text according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In view of the low accuracy of query error correction in the related art, the application provides a method for processing search texts whose core technical concept is as follows: knowledge data of merchants/commodities and the like is indexed by each word segment in the merchant/commodity names, and a plurality of index records are constructed to obtain a target index library that can be used for retrieval. The texts to be recalled are then ranked and evaluated according to their retrieval results on the target index library. This avoids the problem that ranking the texts to be recalled with a traditional language model alone is biased toward ordinary natural-language expressions and cannot cope with the diversity of merchant/commodity names, thereby improving the accuracy of the recalled text used for query error correction and hence the error correction accuracy.

First, how to construct the target index library required by the present application is described, referring to fig. 1, which shows a flowchart of steps for constructing the target index library, and as shown in fig. 1, the method specifically includes the following steps:

step S101: sample information of a plurality of search object samples is obtained, the sample information including names, addresses, and identifications of the search object samples.

In this embodiment, a search object sample may be an object sample such as a merchant, an address, a hotel, a scenic spot, a dish, a commodity or a movie; of course, in specific implementations the search object sample is not limited to these objects. The sample information of a search object sample may be information describing features of the object, including but not limited to the name of the object, the address where the object is located, the attributes and identifier of the object, and the contact information of the object. Specifically, when the sample information of the search object samples is obtained, stored information such as commodity information may be obtained from a database in the background of the network platform.

Step S102: and performing word segmentation processing on the name of each search object sample to obtain a plurality of descriptors.

In some practical cases, the obtained sample information of the search object sample may be dispersed, for example, the name of the merchant and the address of the merchant may be counted in different files, and therefore, the sample information may be sorted according to the search object sample to classify the sample information of the same search object sample into one class, so as to construct a corresponding index record according to the sample information of each search object sample.

In practice, the names of different search objects may share repeated words; for example, "Yunnan Lijiang Hotel" and "Yunnan Lijiang Marble" have words in common. Therefore, the name of each search object sample may be subjected to word segmentation so that the name is split into a plurality of descriptors, and an index record is then constructed for each descriptor, with the descriptor serving as the index. During retrieval, because the descriptor is the index item and one descriptor may appear in the names of a plurality of search object samples, a plurality of search object samples can be retrieved at once, which improves the retrieval coverage of the target index library.

The descriptors of a search object sample are obtained by performing word segmentation on its name. For example, if the name of the search object sample is "Yunnan Dali Lijiang Hotel", word segmentation may yield four descriptors, "Yunnan", "Dali", "Lijiang" and "Hotel", or, with a different segmentation, five descriptors, "Yunnan", "Dali", "Lijiang", "Dada" and "Hotel".

The names of search objects may be segmented according to conventional word segmentation practice. For example, Yunnan, Dali and Lijiang are place names, each of which is conventionally treated as one word, and "hotel", which describes the service type of the merchant, is likewise a conventional word.

The name of the search object sample in this embodiment may be the complete name of the search object sample.

Step S103: and obtaining a search object to which each description word in the plurality of description words belongs.

In this embodiment, since the word segmentation processing may be performed on the names of the plurality of search object samples, a huge number of descriptors may be obtained.

As described above, one descriptor may appear in the names of a plurality of search object samples, and thus may belong to a plurality of search object samples. Accordingly, after the plurality of descriptors are obtained, the search objects to which each descriptor belongs can be determined. For example, for the descriptor "Yunnan", it may be determined that the search objects to which it belongs are "Yunnan Lijiang Hotel" and "Yunnan Lijiang Marble".

Step S104: taking each descriptor as an index item, and constructing an index record of the descriptor based on the sample information of the search object sample to which the descriptor belongs; wherein, each index record at least comprises the identification and the category of the search object sample to which the description word belongs.

In this embodiment, since a plurality of descriptors for each search object sample are obtained and the search objects to which the plurality of descriptors respectively belong are obtained, the index record of the descriptors can be constructed based on the sample information of the search object sample to which the descriptors belong. Specifically, the identifier and the category of the search target sample to which the descriptor belongs may be used as the content in the index record.

Illustratively, taking the search object sample named "Yunnan Dali Lijiang Hotel" as an example, four descriptors are obtained: "Yunnan", "Dali", "Lijiang" and "Hotel". An index record of the following form can then be constructed for each descriptor:

{"key_word": "Yunnan", "poi_count": 2, "data": [[103013894, 10], [124314127, 10]]}

Here, key_word is the descriptor, poi_count is the number of corresponding merchants, and data lists the id and type of each merchant. Similarly, there are index records with "Dali", "Lijiang" and "Hotel" as index entries.
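Steps S101 to S104 can be sketched as a small inverted-index builder that emits records in the form shown above. This is a minimal sketch under stated assumptions: the sample field names (`id`, `name`, `category`), the whitespace-based segmentation and the pairing of the two ids from the record above with particular names are hypothetical.

```python
from collections import defaultdict


def build_index(samples, segment):
    """Build one index record per descriptor.

    samples: iterable of dicts with assumed keys "id", "name", "category".
    segment: assumed callable that splits a name into descriptors.
    """
    postings = defaultdict(list)
    for sample in samples:
        # Index each descriptor once per object, even if it repeats in the name.
        for word in dict.fromkeys(segment(sample["name"])):
            postings[word].append([sample["id"], sample["category"]])
    return {
        word: {"key_word": word, "poi_count": len(data), "data": data}
        for word, data in postings.items()
    }


# Hypothetical samples; whitespace splitting stands in for a real segmenter.
samples = [
    {"id": 103013894, "name": "Yunnan Dali Lijiang Hotel", "category": 10},
    {"id": 124314127, "name": "Yunnan Lijiang Marble", "category": 10},
]
index = build_index(samples, str.split)
print(index["Yunnan"])
```

For Chinese names a dedicated word segmenter would take the place of `str.split`.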

In practice, the same descriptor may appear in the names of search objects located in different places. In one embodiment, when an index record is constructed, a descriptor-address index item may be built with reference to the address where the search object is located; that is, key_word is a combination of the descriptor and the id of the address where the search object is located. This refines the granularity of the index records so that search objects located in different regions can be classified by region.

Specifically, for each descriptor, the multiple search object samples to which the descriptor belongs may be classified according to their addresses, so that the search object samples at the same address are grouped into one class, and an index record is then constructed for the search objects that have the same descriptor and are located in the same region.

By way of example, the search object sample "Yunnan Dali Lijiang Hotel" is located in Lijiang, Yunnan, and the search object sample "Guilin Mountain and Water Hotel" is located in Guilin. They share the descriptor "Hotel" but, being in different regions, fall into different index records for that descriptor, whose index entries key_word may be expressed as "Hotel-101" and "Hotel-102".
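The descriptor-address index items can be sketched by keying each record on the descriptor joined with the address id. The field name `address_id`, the object ids and the `-` separator follow the "Hotel-101" example but are otherwise assumptions.

```python
from collections import defaultdict


def build_regional_index(samples, segment):
    """Build index records keyed by "<descriptor>-<address id>"."""
    postings = defaultdict(list)
    for sample in samples:
        for word in dict.fromkeys(segment(sample["name"])):
            key = f'{word}-{sample["address_id"]}'
            postings[key].append([sample["id"], sample["category"]])
    return {
        key: {"key_word": key, "poi_count": len(data), "data": data}
        for key, data in postings.items()
    }


# Hypothetical samples; ids and address ids are illustrative.
samples = [
    {"id": 1, "name": "Yunnan Dali Lijiang Hotel", "address_id": 101, "category": 10},
    {"id": 2, "name": "Guilin Mountain and Water Hotel", "address_id": 102, "category": 10},
]
regional = build_regional_index(samples, str.split)
hotel_keys = sorted(key for key in regional if key.startswith("Hotel-"))
print(hotel_keys)
```

A query carrying the user's location identifier can then look up "Hotel-101" directly instead of filtering a global "Hotel" record.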

With the target index library construction scheme of this embodiment, the same search object can appear in different index records, and one index record can lead to a plurality of different search objects. Thus, when searching with a keyword, hitting one index record can retrieve a plurality of search objects. For example, if the index record of "Yunnan" is hit, the search objects "Yunnan Lijiang Hotel" and "Yunnan Lijiang Marble" are obtained at the same time. Of course, these examples are merely illustrative; in practice, with a large number of search object samples, one descriptor may correspond to many more search object samples.

After the target index library is obtained, it can be used to perform query error correction. Referring to fig. 2, a framework flowchart of a method for processing a search text according to an embodiment of the present application is shown. Fig. 2 illustrates the overall processing by taking the search text to be processed as "yunnan herli-jiang hotel"; it is understood that the search text used as an example in fig. 2 should not be construed as limiting the present application.

Referring to fig. 3, a schematic flow chart of steps of a search text processing method according to the present application is shown, where the search text processing method according to the present application may be applied to a background server of a network platform, and as shown in fig. 3, the search text processing method may specifically include the following steps:

step S301: and determining the error correction text segment to be corrected currently from the search text to be processed.

In this embodiment, the to-be-processed search text may be a text sent from a front end, the to-be-processed search text may be obtained after a user inputs a corresponding text in a search box, and for the to-be-processed search text, an error correction text segment in which error correction is required may be determined.

Specifically, the error correction text segment may be a word segment of the search text. For example, if the search text is "Yunnan li jiang hotel", error detection may determine that "li jiang" is the error correction text segment needing correction. In practice, an error detection model from the related art can be used to determine the error correction text segment in the search text. In some embodiments, one search text may contain one or more error correction text segments; when there are several, each error correction text segment may be corrected in turn. Thus, the error correction text segment currently to be corrected, as referred to in this application, is the text segment being corrected at the present moment.

Step S302: and performing text recall on the basis of the error correction text segments to obtain a plurality of texts to be recalled corresponding to the error correction text segments.

In this embodiment, text recall may be performed for the error correction text segment. Specifically, candidate texts similar to the error correction text segment may be obtained from one or more lexicons, each storing a plurality of texts, and used as texts to be recalled. When the texts to be recalled are obtained from a lexicon, the similarity between each text in the lexicon and the error correction text segment may be determined, and the texts whose similarity is higher than a set threshold are taken as the texts to be recalled.

A text to be recalled that is similar to the error correction text segment may be a text with a similar pronunciation, for example "Lijiang river" and "Li river", or a text whose characters have a similar shape.

Illustratively, taking the error correction text segment "li jiang" as an example, a plurality of texts to be recalled similar to "li jiang", such as "lijiang", "drijiang" and "li jiang", may be recalled.

Of course, in practice, the plurality of texts to be recalled may be obtained in other manners.

Step S303: retrieving search objects in the target index library based on each of the plurality of texts to be recalled.

The target index library stores a plurality of index records, each indexed by a description word; one index record corresponds to one or more search objects, and the description words are participles in the names of the search objects.

In this embodiment, because the target index library stores a plurality of index records indexed by description words, one index record may correspond to the plurality of search objects whose names contain that description word, the description word being a participle in the name of each such search object. In this way, the target index library can be searched with each of the plurality of texts to be recalled as a search term, so as to obtain the search objects corresponding to the hit index records and thereby a plurality of search objects.

Illustratively, taking the texts to be recalled "lijiang", "drijiang" and "li jiang" as an example, the target index library is searched with "lijiang", "drijiang" and "li jiang" as the search terms respectively, and the plurality of search objects corresponding to each of them are obtained.
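The target index library described above behaves like an inverted index keyed by description words. The following is a minimal sketch under stated assumptions: the object identifiers, the pre-segmented names and the function names `build_index` and `retrieve` are hypothetical, and word segmentation is assumed to have been done already.

```python
from collections import defaultdict


def build_index(objects: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each description word (a participle of an object name) to the objects whose names contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for object_id, description_words in objects.items():
        for word in description_words:
            index[word].add(object_id)
    return dict(index)


def retrieve(index: dict[str, set[str]], search_term: str) -> set[str]:
    """Return the search objects of the hit index record, or an empty set on a miss."""
    return index.get(search_term, set())


# Hypothetical search objects with already-segmented names.
objects = {
    "poi_1": ["lijiang", "grand", "hotel"],
    "poi_2": ["lijiang", "inn"],
    "poi_3": ["dali", "hotel"],
}
index = build_index(objects)
print(sorted(retrieve(index, "lijiang")))
print(retrieve(index, "drijiang"))
```

A text to be recalled that hits no index record simply returns an empty result, which later stages can use to discard it.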

Step S304: determining, from the plurality of texts to be recalled, a target text for correcting the error correction text segment according to the retrieval results corresponding to the respective texts to be recalled.

In this embodiment, since each text to be recalled retrieves its own plurality of search objects, the retrieval result of a text to be recalled may refer to the search objects it retrieves. The plurality of texts to be recalled may then be sorted according to, for example, the number and accuracy of the search objects each of them retrieves.

In a specific implementation, the texts to be recalled may be sorted according to the similarity between the names of the search objects each of them retrieves and the search text to be processed. For example, the higher the similarity between the name of a retrieved search object and the search text to be processed, the more accurate the results retrieved with the corresponding text to be recalled, and thus the higher that text to be recalled may be ranked.

Of course, in some other embodiments, the texts to be recalled may also be sorted according to both the number of search objects each of them retrieves and the similarity between each text to be recalled and the search text to be processed. The more search objects a text to be recalled retrieves, the richer the results it can retrieve, and the higher it may be ranked; the higher its similarity to the search text to be processed, the more accurate the results it can retrieve. Ranking the texts to be recalled by both number and similarity yields a more accurate ranking result.

When the target text for correcting the error correction text segment is determined from the plurality of texts to be recalled, one or more texts to be recalled which are ranked at the top can be used as the target text. And then, replacing the error correction text segment in the search text to be processed with the target text to realize error correction of the search text.
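The ranking and replacement in step S304 can be sketched as below. This is one possible reading, not the definitive scoring: it assumes the `difflib` ratio as the name similarity, takes the best name similarity as the primary key and the number of retrieved objects as a tie-breaker, and uses the hypothetical function names `rank_texts_to_recall` and `apply_correction`.

```python
from difflib import SequenceMatcher


def rank_texts_to_recall(search_text: str, results: dict[str, list[str]]) -> list[str]:
    """Rank texts to be recalled; `results` maps each text to the names of the objects it retrieved."""
    def score(text: str) -> tuple[float, int]:
        names = results[text]
        best = max((SequenceMatcher(None, search_text, n).ratio() for n in names), default=0.0)
        return (best, len(names))  # assumed ordering: similarity first, then result count
    return sorted(results, key=score, reverse=True)


def apply_correction(search_text: str, segment: str, target_text: str) -> str:
    """Replace the error correction text segment with the chosen target text."""
    return search_text.replace(segment, target_text, 1)


results = {"drijiang": [], "lijiang": ["yunnan lijiang hotel", "lijiang inn"]}
ranked = rank_texts_to_recall("yunnan li jiang hotel", results)
print(apply_correction("yunnan li jiang hotel", "li jiang", ranked[0]))
```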

By adopting the technical scheme of the embodiment of the application, the accuracy of query error correction can be improved at least from the following two aspects:

On the other hand, because each index record in the target index library is indexed by a participle in the name of a search object, when a text to be recalled is used as a search term to search the target index library, the match between the text to be recalled and the names serves as the retrieval basis. Therefore, even if a merchant or commodity has an unusual name, the ranking of the texts to be recalled is not affected, and a more accurate target text can be recalled for error correction, thereby improving error correction accuracy.

Referring to fig. 4, a schematic flowchart of an overall search text processing method proposed in an embodiment of the present application is shown, and as shown in fig. 4, a process of performing text recall based on an error correction text segment until a target text for error correction is determined is exemplarily given.

In the text recall stage, similar texts may be recalled using one or both of segment-based recall and a plurality of text dictionaries, and the similar texts are then sorted by their edit distance to the error correction text segment to obtain the texts to be recalled.

In the step of determining the target text according to the retrieval results of the texts to be recalled, the search objects in the target index library may be retrieved at a plurality of different granularities based on the plurality of texts to be recalled. For example, the texts to be recalled may first be filtered according to whether each of them retrieves any search object, and the original search text may then be given a preliminary error correction based on the filtered texts to be recalled, so as to construct complete search texts.

Of course, when the target text is finally determined according to the retrieval result of the complete search text, the text to be recalled may be ranked and scored according to the matching degree between the retrieval result and the complete search text, so as to obtain the target text.

Next, the steps of a search text processing method according to the present application will be described in blocks with reference to a flowchart shown in fig. 4.

First, the text recall stage is described. In this stage, recall may be performed using segment alignment processing together with a plurality of text dictionaries. Specifically, in practice, recall may be based on the plurality of text dictionaries alone, on segment alignment processing alone, or on both at the same time. Fig. 5 shows an overall flowchart of the process of recalling a plurality of texts to be recalled based on a plurality of text dictionaries, and fig. 6 shows a flowchart of the steps of that process. As shown in fig. 6, the process may specifically include the following steps:

Step S601: determining, from a plurality of preset text dictionaries, a plurality of similar texts associated with the error correction text segment.

Different text dictionaries correspond to different error correction dimensions, and the error correction dimensions at least comprise a phonetic dimension and a form dimension.

In this embodiment, as shown in fig. 4, the plurality of text dictionaries may include a near-sound dictionary and a near-shape dictionary. The near-sound dictionary helps recall similar texts whose pronunciation is similar to that of the error correction text segment, and the near-shape dictionary helps recall similar texts whose glyphs are similar to those of the error correction text segment, for example, characters sharing the same radical. The text dictionary constructed on the near-sound dimension mainly considers three types: homophones, homophones that nevertheless differ, and characters that sound different but are easily confused. The text dictionary constructed on the near-shape dimension mainly splits characters into roots and uses the edit distance between the root sequences of two characters as their similarity.

Of course, in addition to the near-sound dictionary and the near-shape dictionary described above, as shown in fig. 4, an error-prone text dictionary storing error-prone words may also be included.

In a specific implementation, for each character in the error correction text segment, similar characters may be recalled from the different text dictionaries, and each similar character is then substituted for the original character in the segment to obtain a similar text.

For example, taking "li jiang" as an example, characters similar to the "li" character, such as "li" and "tame", may be recalled from the different text dictionaries and each spliced with "river" to obtain a plurality of corresponding similar texts; similarly, characters similar to the "river" character may be recalled from the different text dictionaries to obtain further similar texts.
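The character-by-character substitution of step S601 can be sketched as follows. The dictionaries here are hypothetical placeholders (single Latin letters stand in for characters), and the function name `generate_similar_texts` is an assumption made for illustration.

```python
def generate_similar_texts(segment: str, dictionaries: list[dict[str, list[str]]]) -> list[str]:
    """Substitute one character at a time with its similar characters from each dictionary."""
    seen: set[str] = set()
    similar: list[str] = []
    for position, char in enumerate(segment):
        for dictionary in dictionaries:
            for alternative in dictionary.get(char, []):
                text = segment[:position] + alternative + segment[position + 1:]
                if text != segment and text not in seen:
                    seen.add(text)
                    similar.append(text)
    return similar


# Placeholder dictionaries: near-sound and near-shape neighbours of each character.
near_sound = {"a": ["x"]}
near_shape = {"a": ["y"], "b": ["z"]}
print(generate_similar_texts("ab", [near_sound, near_shape]))
```

Substituting one position at a time keeps the candidate set small; substituting several positions at once would grow it combinatorially.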

Step S602: filtering the plurality of similar texts based on a preset language model to obtain a similar text set.

The similar text set comprises the similar texts that remain after filtering.

In this embodiment, the language model may be obtained by training a preset model with a plurality of labeled text samples as training samples. The language model mainly learns the regularities of human expression, and its core is learning how frequently words occur together. Accordingly, the language model of the present application may be used to determine whether an input text is a phrase that conforms to reading and writing habits; for example, "lijiang" is a place name and therefore a phrase that conforms to reading and writing habits, whereas "scooping" is a phrase that does not.

As shown in fig. 5, the language model can determine whether each obtained similar text is a phrase conforming to reading and writing habits, so that similar texts that are not can be filtered out to obtain the filtered similar text set; the similar texts stored in the set are all those retained by the language model.
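As a rough illustration of the filtering in step S602, the sketch below replaces the trained language model with a much cruder stand-in: a table of adjacent-character pair counts built from a small corpus. This is an assumption made purely for illustration; the function names and the `min_count` parameter are hypothetical and do not come from the embodiment.

```python
from collections import Counter


def train_pair_counts(corpus: list[str]) -> Counter:
    """Count adjacent character pairs in the corpus (a toy substitute for a language model)."""
    counts: Counter = Counter()
    for text in corpus:
        counts.update(zip(text, text[1:]))
    return counts


def is_plausible(text: str, counts: Counter, min_count: int = 1) -> bool:
    """A text passes if every adjacent pair in it was seen at least min_count times."""
    return all(counts[pair] >= min_count for pair in zip(text, text[1:]))


def filter_similar_texts(texts: list[str], counts: Counter, min_count: int = 1) -> list[str]:
    """Keep only the similar texts the toy model considers plausible."""
    return [text for text in texts if is_plausible(text, counts, min_count)]


counts = train_pair_counts(["lijiang", "jiangnan"])
print(filter_similar_texts(["lijiang", "lqjiang"], counts))
```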

Step S603: determining the texts to be recalled from the similar text set based on the edit distance between each similar text in the similar text set and the error correction text segment.

In this embodiment, the filtered similar texts may be sorted by their respective edit distances to the error correction text segment, where the edit distance between a similar text and the error correction text segment is the minimum number of editing operations required to convert the similar text into the error correction text segment.

In a specific implementation, the similar texts in the similar text set may be sorted in ascending order of edit distance, and a preset number of top-ranked similar texts are determined as the texts to be recalled.
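Step S603 can be sketched with the standard Levenshtein edit distance, which counts single-character insertions, deletions and substitutions. The embodiment does not state which editing operations are counted, so that choice, along with the function names and the `top_n` parameter, is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def select_texts_to_recall(segment: str, similar_texts: list[str], top_n: int) -> list[str]:
    """Sort similar texts by ascending edit distance to the segment and keep the first top_n."""
    return sorted(similar_texts, key=lambda text: edit_distance(text, segment))[:top_n]


print(select_texts_to_recall("lijang", ["lijiang", "beijing", "lijang"], 2))
```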

With the implementation of this embodiment, the near-sound and near-shape dimensions are introduced for text recall, and a language model is incorporated to filter the recalled similar texts, so that the similar texts retained in the similar text set tend toward natural language expression. Sorting them by edit distance further improves the relevance between the texts to be recalled and the error correction text segment, and thus the recall quality.

Of course, as shown in fig. 5, in some embodiments, similar texts may be recalled in the text recall stage using one or both of segment-based recall and the plurality of text dictionaries; for example, recall may be performed by combining the plurality of text dictionaries with segment alignment.

In a specific implementation, a plurality of text segment correspondences may be obtained, where each text segment correspondence is obtained by aligning a search text input by a user in historical search behavior data with the name of the search object clicked by that user. After the name segment corresponding to the error correction text segment is obtained from the plurality of text segment correspondences, that name segment is added to the similar text set as a similar text.

In this embodiment, each text segment correspondence represents a correspondence between one text segment in a search text and one text segment in the name of a search object. For example, assuming that the search text input by the user is "Ji coffee hotel road construction" and the name of the search object clicked by the user is "Ji coffee hotel road construction", aligning the two yields the correspondence "Ji coffee hotel" -> "Ji coffee hotel".

In this way, a text segment correspondence reflects where the search text input by the user differs from the search object the user actually clicked, so that text recall based on user behavior data can be realized and a recall text representing the user's actual search intention is more likely to be recalled. Further, as shown in fig. 5, when there are a plurality of text segment correspondences, the name segments corresponding to the error correction text segment can be recalled.

For example, taking the error-corrected text segment as "li jiang", assuming that there is a correspondence relationship between "li jiang" - > "lijiang", among the correspondence relationships between the text segments, the name segment having a correspondence relationship with "li jiang" may include: lijiang river and Lijiang river.

In specific implementation, the correspondence between the text segments may be obtained through the following processes:

First, a plurality of pieces of search behavior data may be obtained, each including a search text input by a user and the name of the search object clicked by that user. Then, the search text input by the user in each piece of search behavior data is aligned with the name of the search object clicked by the user to obtain correspondences between search text segments and name segments; these correspondences are the text segment correspondences.

Accordingly, after the name fragment corresponding to the error correction text fragment is obtained, the name fragment corresponding to the error correction text fragment can be added to the similar text set to serve as the similar text in the similar text set, and the subsequent process of determining the text to be recalled based on the editing distance between each similar text and the error correction text fragment is participated in.
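The alignment of search texts with clicked names can be sketched as follows. The embodiment does not specify the alignment algorithm; this sketch assumes token-level alignment with `difflib.SequenceMatcher` and treats each "replace" span as a text segment correspondence. The click log and the function names `align` and `build_correspondences` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def align(query_tokens: list[str], name_tokens: list[str]) -> list[tuple[str, str]]:
    """Return (search text segment, name segment) pairs where the two token lists differ."""
    matcher = SequenceMatcher(None, query_tokens, name_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(query_tokens[i1:i2]), " ".join(name_tokens[j1:j2])))
    return pairs


def build_correspondences(click_log: list[tuple[list[str], list[str]]]) -> dict[str, set[str]]:
    """Collect, for every differing search text segment, the name segments it was aligned to."""
    table: dict[str, set[str]] = defaultdict(set)
    for query_tokens, name_tokens in click_log:
        for segment, name_segment in align(query_tokens, name_tokens):
            table[segment].add(name_segment)
    return dict(table)


click_log = [
    (["lijang", "inn"], ["lijiang", "inn"]),
    (["dali", "inn"], ["dali", "inn"]),
]
table = build_correspondences(click_log)
print(table.get("lijang", set()))  # name segments to add to the similar text set
```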

Next, a stage of determining a target text according to a retrieval result of the text to be recalled is introduced.

At this stage, search objects in the target index library may be retrieved at different granularities based on the plurality of texts to be recalled, where the different granularities at least include a granularity that uses the texts to be recalled as search terms and a granularity that uses candidate search texts as search terms; the candidate search texts are obtained by performing error correction processing on the search text to be processed.

Correspondingly, when the target text for error correction of the error correction text segment is determined from the multiple texts to be recalled according to the retrieval results corresponding to the multiple texts to be recalled, the target text for error correction of the error correction text segment may be determined from the multiple texts to be recalled according to the retrieval results of different granularities corresponding to the multiple texts to be recalled.

When the candidate search text is used as a search term, the candidate search text may be obtained by replacing the error correction text segment in the search text with a text to be recalled. For example, taking the search text "yunan major li jiang hotel" as an example, "li jiang" may be replaced with the text to be recalled "Lijiang river" to obtain "yunan major Lijiang river hotel".

In this embodiment, a search object in the target index library may be searched with a plurality of different granularities based on the text to be recalled, where the granularity may refer to the length of a search term for search.

For example, the result retrieved by using the "Lijiang river" as the search term may be different from the result retrieved by using the "Lijiang river hotel" as the search term, in which the former retrieves a larger number of search objects and the latter retrieves a smaller number of search objects.

In specific implementation, the segment text retrieval granularity may be the granularity of a text to be recalled as a search term, and the complete text retrieval granularity may be the granularity of a candidate search text as a search term. Specifically, the text to be recalled may be used as a search term to perform a search, so as to obtain a search result of the text to be recalled, and the candidate search text may be used as a search term to perform a search, so as to obtain a search result of the candidate search text.

In some specific implementations, when the texts to be recalled are ranked according to the retrieval result of the texts to be recalled and the retrieval result of the candidate search texts, an intersection may be taken between the retrieval result of the texts to be recalled and the retrieval result of the candidate search texts, and the texts to be recalled are ranked according to the search objects obtained after the intersection.

For example, suppose that the text to be recalled "Lijiang river" retrieves 50 search objects and the candidate search text "Lijiang Hotel in southern Cloud" retrieves 20 search objects, and 10 search objects are obtained after taking the intersection. Then "Lijiang river" can be ranked according to these 10 search objects.
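The intersection-based ranking can be sketched as below. The embodiment says only that the texts are ranked "according to the search objects obtained after the intersection"; ranking by the size of the intersection is an assumption, as is the function name `rank_by_overlap`.

```python
def rank_by_overlap(
    segment_results: dict[str, set],
    full_text_results: dict[str, set],
) -> list[tuple[str, int]]:
    """For each text to be recalled, intersect its own results with those of its candidate search text."""
    overlap_sizes = {
        text: len(objects & full_text_results.get(text, set()))
        for text, objects in segment_results.items()
    }
    # Assumed ranking criterion: larger intersections rank first.
    return sorted(overlap_sizes.items(), key=lambda item: item[1], reverse=True)


# Mirrors the example: 50 objects at segment granularity, 20 at full-text granularity, 10 in common.
segment_results = {"drijiang": {100}, "lijiang": set(range(50))}
full_text_results = {"lijiang": set(range(40, 60))}
print(rank_by_overlap(segment_results, full_text_results))
```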

As shown in fig. 4, in some embodiments, when searching for a search object in the target index library based on the multiple texts to be recalled, the texts to be recalled are used as search words respectively, the texts to be recalled are further filtered according to the search result, and then the error correction processing is performed on the search texts to be processed based on the filtered texts to be recalled, and candidate search texts obtained after the error correction processing are used as search words and are searched for in the target index library. Referring to fig. 7, a flowchart of the retrieval steps at two different granularities is shown, and as shown in fig. 7, the method specifically includes the following steps:

Step S701: searching the target index library with each of the plurality of texts to be recalled as a search term, to obtain at least one candidate recall text for which search objects are retrieved.

As shown in fig. 4, in this embodiment, when the plurality of texts to be recalled are used as search terms to search the target index library, if a text to be recalled hits an index record, the search objects corresponding to the hit index record are obtained, which indicates that the text to be recalled has a result. If no index record is hit, no search object can be retrieved, which indicates that the text to be recalled has no result; such a text to be recalled may be discarded.

In this way, at least one candidate recall text for which search objects are retrieved can be obtained, and the plurality of texts to be recalled are filtered so that unreasonable texts to be recalled are removed.

Step S702: replacing the error correction text segment in the search text with each of the at least one candidate recall text to obtain the candidate search text corresponding to each candidate recall text.

Step S703: searching the target index library with each candidate search text as a search term.

In this embodiment, for each candidate recall text, the error correction text segment in the search text to be processed may be replaced by the candidate recall text, so as to obtain a candidate search text corresponding to the candidate recall text. And then, taking the candidate search text as a search word, and searching in a target index library.

When the candidate search text is used as a search word and searched in a target index library, the candidate search text can be subjected to word segmentation processing, so that a plurality of search text segments for searching are obtained, then, the search objects corresponding to the index records hit by each search text segment are intersected, the hit search objects are obtained, and the hit search objects are the search results of the candidate search text.

For example, taking the candidate search text "yunan major lijiang hotel" as an example, the index records hit in the target index library by "yunan", "major", "lijiang" and "hotel" may be obtained. Assuming that each hit index record corresponds to 100 search objects, an intersection may be taken over these sets of 100 search objects, for example through a duplicate check, to obtain the search objects that appear in all of the hit index records; these common search objects are used as the retrieval result of "yunan major lijiang hotel".
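Steps S701 to S703 can be sketched together as follows. The sketch assumes the search text is already segmented into tokens, that the error correction text segment is a single token, and that the index is a plain mapping from description word to object identifiers; the function names and sample data are hypothetical.

```python
def search_full_text(index: dict[str, set], tokens: list[str]) -> set:
    """Intersect the search objects of the index records hit by every token."""
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings) if postings else set()


def retrieve_two_granularities(
    index: dict[str, set],
    query_tokens: list[str],
    segment: str,
    texts_to_recall: list[str],
) -> dict[str, set]:
    """Return, per surviving candidate recall text, the results of its candidate search text."""
    results = {}
    for text in texts_to_recall:
        if not index.get(text):  # S701: discard texts that retrieve no search object
            continue
        candidate_tokens = [text if token == segment else token for token in query_tokens]  # S702
        results[text] = search_full_text(index, candidate_tokens)  # S703
    return results


index = {"yunnan": {1, 2, 3}, "lijiang": {2, 3, 4}, "hotel": {3, 4, 5}}
print(retrieve_two_granularities(index, ["yunnan", "lijang", "hotel"], "lijang", ["lijiang", "drijiang"]))
```

Filtering at the segment granularity first avoids building and searching candidate search texts that could never return a result.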

Correspondingly, when the target text for correcting the error correction text segment is determined from the plurality of texts to be recalled according to the retrieval results of different granularities, the target text may be determined from the at least one candidate recall text according to the retrieval results corresponding to the respective candidate search texts.

In this embodiment, since each candidate search text is obtained by correcting the search text to be processed with a retained candidate recall text, and the retrieval result of each candidate search text is available, in some embodiments the candidate recall texts may be scored and sorted according to the search objects retrieved by their candidate search texts, and the target text is determined from the candidate recall texts according to the sorted scores.

In some embodiments, as shown in fig. 4, when the target text is determined from the at least one candidate recall text according to the retrieval result corresponding to each candidate search text, that is, at the granularity that uses the candidate search text as a search term, the multidimensional feature corresponding to each candidate search text may be determined, and the target text may then be screened according to the multidimensional features.

Referring to fig. 8, a schematic flowchart illustrating a step of determining the target text according to a retrieval result corresponding to the candidate search text is shown, and as shown in fig. 8, the method may specifically include the following steps:

Step S801: obtaining the retrieval results corresponding to the respective candidate search texts, where the retrieval result corresponding to each candidate search text at least comprises the target search objects corresponding to the hit index records.

In this embodiment, as described in the foregoing embodiment, when the candidate search text is used as a search word and is searched in the target index library, the search result of the candidate search text includes a hit search object, and the hit search object is a target search object.

Step S802: determining the multidimensional features corresponding to the respective candidate search texts based on each candidate search text and its corresponding target search objects.

The multi-dimensional characteristics of each candidate search text comprise similarity characteristics between the candidate search text and the name of the target search object, attribute characteristics of the target search object and context language characteristics of the candidate recall text.

In this embodiment, one search object may include a plurality of attributes, and different attributes may reflect characteristics of the search object, and may further reflect search quality of the candidate search text. As shown in fig. 4, the plurality of attributes of the search object may include merchant attributes and commodity attributes, when the target search object is a merchant, an evaluation tag of the target search object, such as a star tag, may also be obtained, and when the target search object is a commodity, a category of the commodity, such as a medicine, a living good, a dish, and the like, may also be obtained.

The similarity feature reflects the similarity between the name of the target search object and the candidate search text; the higher the similarity, the more accurate the search.

The attribute feature reflects the search quality of the target search object retrieved with the candidate search text. The search quality may be determined according to the search scenario; for example, in a food search scenario, a target object that is a dish has lower search quality than a target object that is a merchant, whereas in a regional food search scenario, a target object that is a dish has higher search quality than one that is a merchant.

The context language features may be obtained by inputting the candidate recall text and its context into a language model, respectively. The context features reflect how reasonable the candidate recall text is alongside the other text segments in the candidate search text. For example, if the candidate recall text is "Lijiang river" and the contexts adjacent to "Lijiang river" in the candidate search text are "greater reason" and "big hotel", the context features of the candidate recall text may be obtained through the language model.

In a specific implementation, the multidimensional features of each candidate search text may be used to score and rank the candidate search texts. In one example, as shown in fig. 4, the multidimensional feature may be a multidimensional score; specifically, the multidimensional feature of a candidate search text may be obtained through the following steps:

First, a first score of the candidate recall text corresponding to the candidate search text is determined according to the attributes of the target search object; the attributes at least comprise merchant attributes and commodity attributes, and different attributes may correspond to different first scores.

Then, inputting the candidate recall texts corresponding to the candidate search texts and the context features of the candidate recall texts into a language model respectively to obtain second scores corresponding to the candidate recall texts and third scores corresponding to the context features of the candidate recall texts. And determining the multi-dimensional characteristics of the candidate search texts according to the first score, the second score and the third score. Specifically, the first score, the second score, and the third score may be weighted and summed according to the weight corresponding to the first score, the weight corresponding to the second score, and the weight corresponding to the third score, so as to obtain a multidimensional score, and then each candidate recall text may be ranked according to the multidimensional score of each candidate recall text.
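The weighted summation of the three scores can be sketched as follows. The weight values and the sample scores are hypothetical, since the embodiment does not give concrete numbers, and the function names are assumptions.

```python
def multidimensional_score(
    first: float,
    second: float,
    third: float,
    weights: tuple[float, float, float],
) -> float:
    """Weighted sum of the attribute score, the recall text score and the context score."""
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third


def rank_by_score(
    scores: dict[str, tuple[float, float, float]],
    weights: tuple[float, float, float],
) -> list[str]:
    """Rank candidate recall texts from the highest multidimensional score to the lowest."""
    return sorted(
        scores,
        key=lambda text: multidimensional_score(*scores[text], weights),
        reverse=True,
    )


# Hypothetical (first, second, third) scores per candidate recall text.
scores = {"drijiang": (0.5, 0.1, 0.2), "lijiang": (1.0, 0.9, 0.8)}
print(rank_by_score(scores, weights=(0.5, 0.3, 0.2)))
```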

Step S803: screening the target text from the candidate recall texts corresponding to the candidate search texts based on the multidimensional features corresponding to the respective candidate search texts.

In this embodiment, because the multidimensional features corresponding to the candidate search texts are obtained, the candidate search texts may be ranked according to the multidimensional features, where the multidimensional features may be multidimensional scores of the candidate search texts, and the candidate recall texts corresponding to the candidate search texts may be ranked in an order from high scores to low scores.

By adopting the technical scheme of the embodiment of the application, the target index library can be retrieved with different granularities by utilizing the text to be recalled, and the retrieval capability of the text to be recalled can be reflected locally and integrally by the retrieval with different granularities, so that the accuracy of sequencing and evaluating the text to be recalled can be improved. And when the target text is screened according to the retrieval result corresponding to the candidate search text, the candidate recall text can be ranked according to the commodity attribute of the searched target search object, the similarity between the name of the target search object and the candidate search text, the context characteristics of the candidate recall text and the reasonability of the natural language expression of the candidate recall text, so that the candidate recall text is comprehensively ranked from the dimension of text similarity, the dimension of natural language and the quality dimension of the target search object, and the reasonability and accuracy of ranking are improved.

In some practical cases, there may be a plurality of error correction text segments that need to be corrected in one search text, and in this case, the error correction order of the plurality of error correction text segments may be determined according to the positions of the plurality of error correction text segments in the search text, and then according to the error correction order, the steps of the above embodiment are sequentially performed for each error correction text segment to determine the target text corresponding to each error correction text segment, and then the plurality of error correction text segments in the search text are respectively replaced with the respective corresponding target texts, so as to obtain the corrected search text.

It should be noted that, in the case that there are a plurality of error correction text segments, the search text to be processed on which the next error correction text segment is based may be: the original search text is corrected based on the target text corresponding to the previous corrected text segment, which can also be understood as an iterative correction mode, that is, the correction on the next corrected text segment is performed on the correction result of the previous corrected text segment.

As shown in fig. 4, in a specific implementation, when a stage of searching a target index library by using a candidate search text as a search word is entered, the candidate search text may be obtained by correcting a search text based on a target text corresponding to a previous error correction text segment and a candidate recall text corresponding to a next error correction text segment.

By way of example, taking "yunan Dali Li Jiang Dasanjiang Sporhut division point" as the search text, the error correction text segments include "Li Jiang", "Sporhut" and "division point". When error correction reaches "Sporhut", "Li Jiang" already has its corresponding target text "Lijiang", and the candidate recall texts corresponding to the current "Sporhut" include "Sporhut" and "Hotel"; the candidate search text obtained is then "Yunnan Dali Lijiang Sporhut city division point".
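The construction of candidate search texts in this iterative mode can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name `build_candidate_search_texts`, its parameters, and the English placeholder strings are assumptions introduced here, and plain string replacement stands in for whatever segment-position bookkeeping an actual system would use.

```python
from typing import List, Tuple


def build_candidate_search_texts(
    search_text: str,
    resolved: List[Tuple[str, str]],
    current_segment: str,
    candidate_recalls: List[str],
) -> List[str]:
    """Build one candidate search text per candidate recall text.

    `resolved` holds (error segment, target text) pairs already decided for
    earlier segments. All names here are hypothetical.
    """
    base = search_text
    # Apply the corrections already chosen for the previous segments.
    for wrong, target in resolved:
        base = base.replace(wrong, target, 1)
    # Substitute each candidate recall text for the current segment.
    return [base.replace(current_segment, cand, 1) for cand in candidate_recalls]


candidates = build_candidate_search_texts(
    "yunnan dali li jiang grand hotle branch",
    resolved=[("li jiang", "lijiang")],
    current_segment="hotle",
    candidate_recalls=["hotel", "hostel"],
)
print(candidates)
```

Each resulting string would then be used as a search word against the target index library, so the later segment ("branch" here) is evaluated only after the earlier corrections are in place.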

In still other embodiments, to improve error correction efficiency, when the steps of the foregoing embodiments are performed in sequence for each error correction text segment, each time the target text corresponding to the current error correction text segment is determined, it may be checked whether the similarity characteristic of that target text is not less than a similarity threshold.

The similarity characteristic may reflect the similarity between the name of the target search object and the candidate search text. The higher the similarity, the more accurate the search, which indicates that the candidate search text can retrieve a complete and accurate search object. As shown in fig. 4, when the similarity characteristic is not less than the similarity threshold, the corresponding target text may be returned together with the retrieved target search object, and the target search object may be returned to the user as the search result of the search text to be processed.

In this case, error correction can be ended without processing the subsequent error correction text segments, and the search text corrected so far is used as the error-corrected text of the search text to be processed in subsequent searches.

When the similarity characteristic of the screened target text is determined to be smaller than the similarity threshold, the target text corresponding to the next error correction text segment is determined until all error correction text segments are traversed, and the error correction text segments in the search text are respectively replaced by the corresponding target texts to obtain the error-corrected search text.

In this case, the target search object retrieved by the candidate search text is not a complete and accurate search object, so the subsequent error correction text segments still need to be corrected. The target text corresponding to the next error correction text segment can therefore be determined according to the scheme of the above embodiment until all error correction text segments have been traversed, after which the error correction text segments in the search text are replaced with their corresponding target texts to obtain the error-corrected search text.

Of course, in some practical cases, if a target text whose similarity reaches the similarity threshold is obtained while determining the target text for a subsequent error correction text segment, error correction may also be ended at that point, and the search text is corrected based on the target texts determined for the error correction text segments processed so far.

For example, taking "yunan main li jiang lusan shop city division point" as the search text, the error correction text segments include "li jiang", "spill shop" and "division point". When error correction reaches "spill shop", it is determined that the similarity corresponding to the retrieved target search object is greater than the similarity threshold, so error correction can be ended. "li jiang" and "spill shop" are then replaced by their corresponding target texts "lijiang" and "hotel" to obtain the final error-corrected search text "yunan main li jiang river lusan city division point".

By adopting the embodiment, the error correction efficiency and the accuracy of the search text with a plurality of error correction text segments can be improved.
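The early-termination behaviour described above can be sketched as a simple loop. This is a hedged illustration under stated assumptions: `pick_target` is a hypothetical callback standing in for the whole recall, retrieval and ranking pipeline, and it is assumed to return the chosen target text together with its similarity characteristic; the placeholder strings and scores are invented for the example.

```python
from typing import Callable, List, Tuple


def correct_with_early_stop(
    search_text: str,
    segments: List[str],
    pick_target: Callable[[str, str], Tuple[str, float]],
    threshold: float,
) -> str:
    """Correct segments in order, stopping once the similarity is high enough."""
    corrected = search_text
    for segment in segments:
        # Hypothetical callback: returns (target text, similarity characteristic).
        target, similarity = pick_target(corrected, segment)
        corrected = corrected.replace(segment, target, 1)
        if similarity >= threshold:
            # A complete and accurate search object was found; skip the rest.
            break
    return corrected


# Toy stand-in for the ranking pipeline (assumed values).
toy = {
    "li jiang": ("lijiang", 0.4),
    "hotle": ("hotel", 0.9),
    "brnch": ("branch", 0.95),
}
result = correct_with_early_stop(
    "yunnan li jiang hotle brnch",
    ["li jiang", "hotle", "brnch"],
    lambda text, seg: toy[seg],
    threshold=0.8,
)
print(result)
```

With the toy scores above, the loop stops after the second segment, so the third segment is left untouched, mirroring the example in which only the first two error correction text segments are replaced.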

The above describes how the error correction text segment is corrected in the present application. In practice, because the present application provides the target index library, the error correction text segment that needs correction in the search text can also be located based on the target index library. In an embodiment, fig. 9 shows a flowchart of the steps for determining an error correction text segment, which may specifically include the following steps:

Step S901: obtaining the position identifier of the user who sends the search text.

In this embodiment, the search text to be processed may carry a position identifier of the user's location. The position identifier may be a region number; for example, if the number of Yunnan is 101, the search text may carry the identifier 101.

Step S902: performing word segmentation processing and/or entity recognition on the search text to obtain a plurality of text segments.

In this embodiment, the search text may be segmented at two granularities: word segmentation and entity recognition. Word segmentation takes words of natural language as the granularity, while entity recognition takes the word attribute of each text segment in the search text as the granularity; for example, "Yunnan" has a place name attribute and "hotel" has a business attribute.

For example, taking the search text of "yunan Dali Li Jiang DaHotel" as an example, after the word segmentation processing is performed on "yunan Dali Li Jiang DaHotel", text segments of "yunan", "Dali", "Li Jiang", "Da", "Hotel" can be obtained. These text fragments are then checked for errors.

For example, taking the search text of "yunan university li jiang hotel" as an example, after the search text of "yunan university li jiang hotel" is subjected to entity recognition, text segments of "yunan university", "li jiang" and "hotel" can be obtained. These text fragments are then checked for errors.

Step S903: retrieving the index records in the target index library with each of the text segments, together with the position identifier, as search words, to obtain the hit search objects corresponding to the index records hit by each text segment.

In this embodiment, as described in the above embodiment of constructing the target index library, the target index library may use descriptor-address pairs as index items. Each text segment, together with the position identifier, can therefore be used as a search word to retrieve the index records in the target index library, obtaining the search objects corresponding to the hit index records, that is, the hit search objects.

Each text segment may hit one or more search objects, or may hit none.

Step S904: determining, from the plurality of text segments, the error correction text segments needing error correction according to the result of intersecting the hit search objects corresponding to each text segment with the hit search objects corresponding to the other text segments.

In this embodiment, for each text segment, the hit search objects corresponding to the text segment may be intersected with the hit search objects corresponding to the other text segments. Taking the intersection can be regarded as a duplicate check, that is, determining whether two text segments correspond to the same hit search object. Through the intersection processing, the error correction text segments whose meaning differs greatly from that of the other text segments can be found.

If the text segment shares hit search objects with the other text segments, it can be determined that the text segment does not need to be corrected; if it shares no hit search object with any other text segment, it can be determined that the text segment needs to be corrected. Of course, if the text segment shares hit search objects with some of the other text segments but not with the rest, it may also be determined that the text segment does not need to be corrected.

For fully understanding the error detection of the present application based on the target index library, an example is listed below for illustration, which is, of course, only for convenience of understanding and does not represent a limitation to the actual situation:

Still taking the search text "yunan Dali Li Jiang DaHotel" as an example, after word segmentation, the text segments "yunan", "Dali", "Li Jiang", "Da" and "Hotel" can be obtained, together with the hit search objects of each. "yunan" shares hit search objects with "Dali", "Da" and "Hotel", so "yunan" is excluded. If "li jiang" shares no hit search object with "yunan", "Dali" or "hotel", "li jiang" can be determined to be a text segment needing error correction. In this case, the attributes of "li jiang" differ greatly from those of the other text segments in the search text, which indicates that the presence of "li jiang" in the search text is unreasonable, so error correction is needed.
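The intersection-based detection of steps S903 and S904 can be sketched as follows. This is a minimal sketch under stated assumptions: the index is modelled as a plain dictionary keyed by a hypothetical (descriptor, position identifier) pair and mapping to a set of search object identifiers, and the segments, identifiers and region number are invented placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index layout: (descriptor, position identifier) -> object ids.
Index = Dict[Tuple[str, int], Set[int]]


def detect_error_segments(
    segments: List[str], location_id: int, index: Index
) -> List[str]:
    """Flag segments whose hit search objects overlap with no other segment's."""
    # Step S903: look up every segment together with the position identifier.
    hits = {seg: index.get((seg, location_id), set()) for seg in segments}
    flagged = []
    # Step S904: intersect each segment's hits with the hits of the others.
    for seg in segments:
        others = [hits[other] for other in segments if other != seg]
        if not any(hits[seg] & other_hits for other_hits in others):
            flagged.append(seg)
    return flagged


toy_index: Index = {
    ("yunnan", 101): {1, 2, 3},
    ("dali", 101): {1, 2},
    ("hotel", 101): {2, 3},
    ("li jiang", 101): {9},
}
flagged = detect_error_segments(
    ["yunnan", "dali", "li jiang", "hotel"], 101, toy_index
)
print(flagged)
```

A segment that overlaps with at least one other segment is kept, which matches the rule that a partial intersection is enough to treat the segment as not needing correction.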

Certainly, in some practical cases, the text segments obtained by word segmentation of the search text and the text segments obtained by entity recognition of the search text may both be obtained, and error detection is then performed separately on the two sets of text segments, yielding the text segments requiring error correction under word segmentation and those requiring error correction under entity recognition.

Through this implementation, error detection on the search text can be performed using the target index library. Because each index record in the target index library is indexed by a participle in the name of a search object, and that index reflects the correct name of the search object, error detection using the target index library can improve error detection accuracy. With higher error detection accuracy, a detected error correction text segment is more likely to be a segment that truly needs correction, so missed detection and false detection can be avoided, and the accuracy of query error correction for the search text can be improved.

By adopting the technical scheme of the embodiment of the application, the method has the following advantages:

1. The error detection accuracy is improved at the error detection stage, and the problems of missed detection and false detection are avoided.

Because each index record in the target index library is indexed by a participle in the name of a search object, the index can reflect the correct name of the search object; performing error detection with the target index library therefore avoids the problem of false detection. In addition, because the search text is segmented at different granularities during error detection, the flexibility of error detection is improved, so that text segments needing error correction are detected more easily and the problem of missed detection is avoided.

2. The accuracy of the recalled text is improved from the text recall stage, so that the accuracy of subsequent query error correction is improved.

On one hand, fields with similar sound and shape are introduced for text recall, and a language model is used to filter the recalled similar texts, so that the filtered similar texts retained in the similar text set favor natural language expression; they are then ranked by edit distance, which improves the relevance between the texts to be recalled and the error correction text segment and improves the recall quality.

On the other hand, user search behavior data can be mined to obtain text segment correspondences, which reflect the text segments that differ between the search text input by a user and the search object the user actually clicked. Text recall based on user behavior data is thus realized, a recall text representing the user's actual search intention is more likely to be recalled, and the recall accuracy is improved.

3. The sorting accuracy is improved from the stage of sorting the recalled texts, so that the accuracy of the obtained target texts for error correction is improved.

After the plurality of texts to be recalled are obtained, the search objects of the target index library are retrieved with the texts to be recalled as search words. Retrieving the target index library thus yields the search quality of each text to be recalled when used as a search word, for example, the number and accuracy of the retrieved search objects. The recalled texts can then be ranked by combining the similarity characteristic, the attribute characteristic and the context language characteristic associated with the hit search objects. This overcomes the defects of scoring with a traditional language model alone, such as favoring natural language expression and assigning a low statistical frequency to new words (for example, creative merchant names). The search quality of retrieving with each text to be recalled can therefore be evaluated more accurately, which improves the accuracy of the target text used to correct the error correction text segment and hence the error correction accuracy.

4. The method can effectively cope with the error correction of the multi-component complex error correction text segment.

The plurality of error correction text segments in the search text are corrected one by one, and each correction is performed on the correction result of the previous error correction text segment. Where the characteristics of the error correction text segments differ, correcting an error composed of multiple components is thereby reduced to correcting single error correction text segments each composed of a single component, so that complex errors composed of multiple components are corrected effectively.

5. The error correction efficiency can be improved.

When there are a plurality of error correction text segments, the process can be ended as soon as some error correction text segment has a recall text whose corresponding similarity is higher than the similarity threshold, so that error correction efficiency is improved while the accuracy of the recalled text is ensured.

Based on the same inventive concept as the above embodiments, a second aspect of the embodiments of the present disclosure provides a search text processing apparatus. Fig. 10 shows a block diagram of the search text processing apparatus, which may specifically include the following modules:

an error correction text determination module 1001, configured to determine a current error correction text segment to be corrected from a search text to be processed;

a recall module 1002, configured to perform text recall based on the error correction text segments to obtain multiple texts to be recalled corresponding to the error correction text segments;

a retrieval module 1003, configured to retrieve, based on the multiple texts to be recalled, a search object in a target index library, where multiple index records using description words as indexes are stored in the target index library; one index record corresponds to one or more search objects, and the description words are participles in the names of the search objects;

a target text obtaining module 1004, configured to determine, according to the search result corresponding to each of the multiple texts to be recalled, a target text for error correction of the error correction text segment from the multiple texts to be recalled.

Optionally, the retrieving module 1003 may be specifically configured to perform, based on the multiple texts to be recalled, retrieval on a search object of a target index library with multiple different granularities, where the multiple different granularities at least include a fragment text retrieval granularity and a complete text retrieval granularity;

the target text obtaining module 1004 may be specifically configured to determine, according to the retrieval results of the multiple texts to be recalled, which correspond to different granularities, a target text for error correction of the error correction text segment from the multiple texts to be recalled.

Optionally, the retrieving module 1003 may specifically include the following units:

the first granularity retrieval unit is used for retrieving the search objects of the target index library by respectively taking the plurality of texts to be recalled as retrieval texts, to obtain at least one candidate recall text for which a search object is retrieved;

the processing unit is used for replacing the error correction text segments in the search text with the at least one candidate recall text respectively to obtain candidate search texts corresponding to the at least one candidate recall text respectively;

the second granularity retrieval unit is used for retrieving the search object of the target index library by taking the candidate search text as a retrieval text;

the target text obtaining module 1004 may be specifically configured to determine the target text from the at least one candidate recall text according to a retrieval result corresponding to each of the candidate search texts.
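The cooperation of the first granularity retrieval unit, the processing unit and the second granularity retrieval unit can be sketched as follows. This is only an illustrative sketch: `search_index` is a hypothetical callable standing in for retrieval against the target index library, and `toy_search`, `NAMES` and the placeholder strings are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

NAMES = {1: "lijiang grand hotel", 2: "dali lake hostel"}  # toy search objects


def toy_search(query: str) -> List[int]:
    """Toy retrieval: ids of objects whose name contains every query token."""
    tokens = query.split()
    return [
        oid for oid, name in NAMES.items()
        if all(tok in name.split() for tok in tokens)
    ]


def retrieve_two_granularities(
    search_text: str,
    error_segment: str,
    recall_texts: List[str],
    search_index: Callable[[str], List[int]],
) -> Dict[str, Tuple[str, List[int]]]:
    # Fragment granularity: keep recall texts that retrieve any search object.
    candidates = [text for text in recall_texts if search_index(text)]
    results = {}
    for cand in candidates:
        # Replace the error segment to form the candidate search text.
        candidate_search_text = search_text.replace(error_segment, cand, 1)
        # Complete-text granularity: retrieve with the whole candidate text.
        results[cand] = (candidate_search_text, search_index(candidate_search_text))
    return results


results = retrieve_two_granularities(
    "li jiang grand hotel", "li jiang", ["lijiang", "lijang", "dali"], toy_search
)
print(results)
```

In this toy run, "lijang" is dropped at the fragment granularity because it retrieves nothing, while "dali" survives that stage but retrieves nothing at the complete-text granularity, which is the kind of signal the later ranking can use.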

Optionally, the target text obtaining module 1004 may specifically include the following units:

a search result obtaining unit, configured to obtain, at the complete text retrieval granularity, a target search object corresponding to an index record hit by each candidate search text; the candidate search text is obtained by performing error correction processing on the search text with a text to be recalled;

the characteristic determining unit is used for determining the multidimensional characteristic corresponding to each candidate searching text based on each candidate searching text and the corresponding target searching object; the multi-dimensional characteristics of each candidate search text comprise similarity characteristics between the candidate search text and the name of the target search object, attribute characteristics of the target search object and context language characteristics of the candidate recall text;

and the screening unit is used for screening the target text from the candidate recall texts corresponding to the candidate search texts based on the multi-dimensional features corresponding to the candidate search texts.
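The screening by multi-dimensional features can be illustrated with a small scoring sketch. Several assumptions are made here: the three features are combined by a fixed weighted sum, which is a stand-in chosen for the example and not a combination rule stated in this section; the similarity feature is approximated with Python's `difflib.SequenceMatcher`; and the `Candidate` fields, weights and numeric values are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Candidate:
    recall_text: str        # candidate recall text
    search_text: str        # candidate search text built from it
    object_name: str        # name of the retrieved target search object
    object_quality: float   # attribute feature, assumed to lie in [0, 1]
    language_score: float   # context language feature, assumed in [0, 1]


def score(c: Candidate, weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    # Similarity feature between the candidate search text and the object name.
    similarity = SequenceMatcher(None, c.search_text, c.object_name).ratio()
    return (
        weights[0] * similarity
        + weights[1] * c.object_quality
        + weights[2] * c.language_score
    )


def pick_target(candidates: List[Candidate]) -> str:
    """Return the recall text of the highest-scoring candidate."""
    return max(candidates, key=score).recall_text


pool = [
    Candidate("lijiang", "lijiang grand hotel", "lijiang grand hotel", 0.8, 0.7),
    Candidate("dali", "dali grand hotel", "dali lake hostel", 0.6, 0.6),
]
target = pick_target(pool)
print(target)
```

A learned ranking model could replace the weighted sum without changing the surrounding flow; the sketch only shows how the three feature dimensions feed one decision.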

Optionally, in a case where there are a plurality of error correction text segments, the apparatus may further include the following modules:

the first processing module is used for replacing the current error correction text segment to be corrected in the search text with the target text to obtain an error corrected search text when the similarity characteristic of the screened target text is determined to be greater than or equal to a similarity threshold value;

and the second processing module is used for determining the target text corresponding to the next error correction text segment when the similarity characteristic of the screened target text is determined to be smaller than the similarity threshold value, and replacing the error correction text segments in the search text with the corresponding target texts respectively to obtain the error-corrected search text until all the error correction text segments are traversed.

Optionally, the apparatus may further include an error detection module, and the detection module may specifically include the following units:

the position obtaining unit is used for obtaining a position identifier of the user sending the search text;

the identification unit is used for performing word segmentation processing and/or entity identification on the search text to obtain a plurality of text segments;

the retrieval unit is used for retrieving the plurality of index records in the target index library by respectively taking the plurality of text segments and the position identification as retrieval texts to obtain a hit search object corresponding to the index record hit by each text segment;

and the determining unit is used for determining error correction text segments needing error correction from the text segments according to the intersection taking result of the hit search object corresponding to each text segment and the hit search objects corresponding to other text segments.

The recall module may specifically include the following units:

the first recalling unit is used for determining a plurality of similar texts associated with the error correction text segments from a plurality of preset text dictionaries, wherein different text dictionaries correspond to different error correction dimensions, and the error correction dimensions at least comprise a phonetic dimension and a form dimension;

the filtering unit is used for filtering the plurality of similar texts based on a preset language model to obtain a similar text set, wherein the similar text set comprises a plurality of filtered similar texts;

and the determining unit is used for determining the text to be recalled from the similar text set on the basis of the editing distance between each similar text in the similar text set and the error correction text fragment.
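The three recall units above can be sketched together as follows. This is a hedged sketch: the dictionaries are modelled as plain mappings from an error segment to similar texts, `lm_score` is a hypothetical callable standing in for the preset language model, `min_score` and `top_k` are assumed parameters, and the edit distance is a standard Levenshtein distance, which the section does not specify further.

```python
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def recall_texts(
    segment: str,
    dictionaries: Dict[str, Dict[str, List[str]]],
    lm_score: Callable[[str], float],
    min_score: float,
    top_k: int,
) -> List[str]:
    # First recall unit: gather similar texts from every dictionary dimension.
    similar: List[str] = []
    for entries in dictionaries.values():
        for text in entries.get(segment, []):
            if text not in similar:
                similar.append(text)
    # Filtering unit: keep texts the (hypothetical) language model accepts.
    kept = [text for text in similar if lm_score(text) >= min_score]
    # Determining unit: prefer texts closest to the error segment.
    kept.sort(key=lambda text: edit_distance(text, segment))
    return kept[:top_k]


dictionaries = {
    "phonetic": {"hotle": ["hotel", "hotal"]},
    "shape": {"hotle": ["hostel", "hotel"]},
}
toy_lm = {"hotel": 0.9, "hostel": 0.8, "hotal": 0.1}
recalled = recall_texts("hotle", dictionaries, toy_lm.get, min_score=0.5, top_k=2)
print(recalled)
```

Keeping the language-model filter ahead of the edit-distance sort means implausible strings are discarded before closeness to the error segment is considered.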

Optionally, the apparatus further specifically includes the following modules:

the obtaining module is used for obtaining a plurality of text segment corresponding relations, and each text segment corresponding relation is obtained by aligning a search text input by a user in the historical search behavior data with the name of a search object clicked by the user;

the second recall module is used for obtaining the name fragment corresponding to the error correction text fragment from the corresponding relation of the plurality of text fragments;

and the adding module is used for adding the name fragment corresponding to the error correction text fragment into the similar text set as a similar text.
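Mining text segment correspondences from historical search behavior can be sketched as follows. The alignment method is an assumption: tokens are split on whitespace and aligned with Python's `difflib.SequenceMatcher`, whereas an actual system would likely use its own segmenter and alignment; the function name and the click log entries are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple


def mine_correspondences(click_log: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map differing query fragments to the clicked-name fragments they align with."""
    mapping: Dict[str, Set[str]] = {}
    for query, clicked_name in click_log:
        q_tokens, n_tokens = query.split(), clicked_name.split()
        matcher = SequenceMatcher(None, q_tokens, n_tokens)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # A "replace" span marks where the query differs from the name.
            if tag == "replace":
                wrong = " ".join(q_tokens[i1:i2])
                right = " ".join(n_tokens[j1:j2])
                mapping.setdefault(wrong, set()).add(right)
    return mapping


log = [
    ("lijang grand hotel", "lijiang grand hotel"),
    ("dali hotle", "dali hotel"),
]
correspondences = mine_correspondences(log)
print(correspondences)
```

The name fragment found for an error correction text segment, for example `correspondences["hotle"]`, could then be added to the similar text set alongside the dictionary-based candidates.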

Optionally, the apparatus may further include an index library construction module, specifically including:

a sample information obtaining unit configured to obtain sample information of a plurality of search object samples, the sample information including names, addresses, and identifications of the search object samples;

the information processing unit is used for carrying out word segmentation processing on the name of each search object sample to obtain a plurality of descriptors;

the classification unit is used for obtaining a search object sample to which each descriptor in the descriptors belongs;

the construction unit is used for taking each descriptor as an index item and constructing an index record of the descriptor based on the sample information of the search object sample to which the descriptor belongs to obtain the target index library; wherein, each index record at least comprises the identification, the category and the address of the search object sample to which the description word belongs.
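The index library construction can be sketched as an inverted index keyed by descriptor. This is a minimal sketch under assumptions: samples are modelled as dictionaries with hypothetical keys `id`, `name`, `address` and `category`, whitespace splitting stands in for the word segmentation step, and the sample data are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_target_index(
    samples: List[dict], tokenize: Callable[[str], List[str]] = str.split
) -> Dict[str, List[dict]]:
    """Build descriptor -> index record entries from search object samples."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for sample in samples:
        # Word segmentation of the name yields the descriptors.
        for descriptor in set(tokenize(sample["name"])):
            # Each entry keeps the identification, category and address.
            index[descriptor].append(
                {
                    "id": sample["id"],
                    "category": sample.get("category"),
                    "address": sample["address"],
                }
            )
    return dict(index)


samples = [
    {"id": 1, "name": "lijiang grand hotel", "address": "lijiang", "category": "hotel"},
    {"id": 2, "name": "dali lake hotel", "address": "dali", "category": "hotel"},
]
index = build_target_index(samples)
print(sorted(index))
```

Looking up a descriptor such as `index["hotel"]` then returns every search object sample whose name contains that descriptor, which is the lookup the error detection and retrieval steps rely on.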

The embodiment of the present invention further provides an electronic device, which may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the search text processing method.

Embodiments of the present application also provide a non-transitory computer-readable storage medium. When instructions stored in the storage medium are executed by a processor, the processor is enabled to perform the operations of the above-mentioned search text processing method of the present application.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The method, the device, the electronic device and the computer storage medium for processing the search text provided by the invention are introduced in detail, and a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A search text processing method, the method comprising:

2. The method of claim 1, wherein retrieving search objects of a target index repository based on the plurality of texts to be recalled respectively comprises:

3. The method of claim 2, wherein retrieving search objects of a target index repository at a plurality of different granularities based on the plurality of texts to be recalled comprises:

4. The method according to claim 2 or 3, wherein determining a target text for correcting the error correction text segment from the plurality of texts to be recalled according to the retrieval results of the plurality of texts to be recalled with different granularities respectively comprises:

5. The method of claim 4, wherein in the case that the error correction text segment is plural, the method further comprises:

6. The method according to any of claims 1-5 or 7, characterized in that the error corrected text passage is obtained by:

obtaining a position identifier of a user sending the search text;

and determining error correction text segments needing error correction from the plurality of text segments according to intersection results of the hit search object corresponding to each text segment and the hit search objects corresponding to other text segments.

7. The method according to any one of claims 1 to 5, wherein performing text recall based on the corrected text segments to obtain a plurality of texts to be recalled corresponding to the corrected text segments comprises:

8. The method of claim 7, further comprising:

9. The method according to any one of claims 1 to 5, wherein the target index repository is obtained by:

10. A search text processing apparatus, characterized in that the apparatus comprises:

and the target text obtaining module is used for determining a target text for correcting the error correction text segments from the plurality of texts to be recalled according to the retrieval results corresponding to the plurality of texts to be recalled respectively.

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing implementing the search text processing method according to any one of claims 1 to 9.

12. A computer-readable storage medium storing a computer program for causing a processor to execute the search text processing method according to any one of claims 1 to 9.