CN116069174A

CN116069174A - Input association method, electronic equipment and storage medium

Info

Publication number: CN116069174A
Application number: CN202310144621.6A
Authority: CN
Inventors: 薄满辉; 籍焱; 王凯; 张丽颖; 刘丰; 尚亚南
Original assignee: China Travelsky Mobile Technology Co Ltd
Current assignee: China Travelsky Mobile Technology Co Ltd
Priority date: 2023-02-21
Filing date: 2023-02-21
Publication date: 2023-05-05

Abstract

The invention provides an input association method comprising the following steps: acquiring an input target character string; acquiring a plurality of target entity word banks; traversing the target entity word banks and, for the current target entity word bank, if any entity word in it is contained in the target character string, acquiring the fixed sentence corresponding to that entity word from the current target entity word bank as a current output result; acquiring sentences beginning with the target character string from a first set corpus and, if corresponding target sentences are acquired, taking them as a second output result; performing word segmentation on the target character string to obtain a word segmentation set; acquiring, from a second set corpus, the sentences containing each word in the word segmentation set to obtain corresponding sentence sets; if the sentence sets have an intersection, taking the sentences in the intersection as a third output result; and outputting the results. The invention also provides an electronic device and a storage medium. The invention can output association words that are as rich as possible.

Description

Input association method, electronic equipment and storage medium

Technical Field

The present invention relates to the field of intelligent search, and in particular, to an input association method, an electronic device, and a storage medium.

Background

With the rapid growth of internet technology, people increasingly depend on the internet to obtain the information they need. When a user searches for content through a search box, the user typically enters words one at a time; the search box looks up association words matching the input in a pre-built association word bank and displays them in a list below the search box, so that the user can click a recommended association word and search for the desired content directly, without typing any further. However, existing input association methods either require the user to enter characters with a relatively complete meaning before an association word can be given, or, because the corpus is limited, fail to provide any association word when nothing matches. When the characters entered by the user are ambiguous or too few, for example a single character, no corresponding association word can be given, which results in poor applicability and a poor user experience.

Disclosure of Invention

To address the above technical problems, the invention adopts the following technical solution:

An embodiment of the invention provides an input association method comprising the following steps:

S100, acquiring an input target character string;

S200, acquiring n target entity word banks, wherein each target entity word bank comprises a plurality of entity words and corresponding fixed sentences, the entity word types corresponding to any two target entity word banks are different, and the entity words in the same target entity word bank correspond to the same entity word type;

S300, traversing the n target entity word banks and, for the i-th target entity word bank, if any entity word in the i-th target entity word bank is contained in the target character string, acquiring the fixed sentences corresponding to that entity word from the i-th target entity word bank as $k_i$ output results, where i ranges from 1 to n; taking the $(k_1 + k_2 + \ldots + k_i + \ldots + k_n)$ output results as a first output result; executing S400;

S400, acquiring first target sentences beginning with the target character string from a first set corpus; if corresponding first target sentences are acquired, taking them as a second output result and executing S500; otherwise, executing S500 directly;

S500, performing word segmentation on the target character string to obtain a word segmentation set $P = (P_1, P_2, \ldots, P_j, \ldots, P_m)$, where $P_j$ is the j-th word in P, j ranges from 1 to m, and m is the number of words in P; if m > 1, executing S600; otherwise, executing S800;

S600, acquiring, from a second set corpus, the second target sentences comprising $P_j$ to obtain the sentence set $W_j = (w_{j1}, w_{j2}, \ldots, w_{jr}, \ldots, w_{jh(j)})$ of $P_j$, where $w_{jr}$ is the r-th second target sentence in $W_j$, r ranges from 1 to h(j), and h(j) is the number of second target sentences in $W_j$;

S700, if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m \neq \text{Null}$, taking the sentences in $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m$ as a third output result and executing S810;

S800, taking at least part of the first output result and the second output result as a final output result and outputting it;

S810, taking at least part of the first output result, the second output result and the third output result as a final output result and outputting it.

The invention has at least the following beneficial effects:

In the input association method provided by the embodiments of the invention, fixed sentence matching is first performed on the input character string, and matching is then performed using the first set corpus. Where no suitable association word is matched, the character string is segmented into words and matched against the second set corpus. If a suitable association word is still not matched, synonym replacement and/or keyword extraction is applied to the character string, and matching is performed against the second set corpus based on the results, so that the association words provided are as rich and accurate as possible and the user experience is good.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of an input association method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

An embodiment of the present invention provides an input association method, as shown in fig. 1, which may include the following steps:

s100, acquiring an input target character string.

In the embodiment of the present invention, the target character string may be the character string composed of all characters entered by the user in the input box of a set information-providing website. For example, if the user enters "southern aviation delay", the target character string is "southern aviation delay"; likewise, if the user enters "I want to go to Beijing", the target character string is "I want to go to Beijing".

S200, acquiring n target entity word banks; each target entity word bank comprises a plurality of entity words and corresponding fixed sentences, the entity word types corresponding to any two target entity word banks are different, and the entity words in the same target entity word bank correspond to the same entity word type.

In the embodiment of the present invention, the n target entity word banks may be stored in advance on a server that is communicatively connected to the set information-providing website. In one example, each target entity word bank may include an entity word list storing a number of entity words and one fixed sentence table associated with that list. In another example, each target entity word bank may include an entity word list storing a number of entity words and a number of fixed sentence tables associated with those entity words. Preferably, to reduce storage resources, all fixed sentences may be stored in the same table.

The types and number of target entity word banks may be set based on actual needs. In one exemplary embodiment, the target entity word banks may be aviation-related, for example entity word banks for entity word types such as avionics, airports, and security checks. Those skilled in the art will recognize that target entity word banks constructed by any method fall within the scope of the present invention.

S300, traversing the n target entity word banks and, for the i-th target entity word bank, if any entity word in the i-th target entity word bank is contained in the target character string, acquiring the fixed sentences corresponding to that entity word from the i-th target entity word bank as $k_i$ output results, where i ranges from 1 to n; taking the $(k_1 + k_2 + \ldots + k_i + \ldots + k_n)$ output results as a first output result; executing S400.

Specifically, for each target entity word bank, each entity word in the bank may be compared with the target character string; if any entity word is contained in the target character string, the corresponding fixed sentences are acquired from that word bank as the output result for that word bank. Those skilled in the art will appreciate that it may happen that no entity word in any target entity word bank is contained in the target character string, in which case the first output result is Null.
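The traversal in S300 can be illustrated with a minimal sketch. The data layout (a dictionary keyed by entity word type), the sample entries and the function name `first_output_result` are assumptions made for illustration only; the text does not prescribe a storage format.

```python
from typing import Dict, List

# Hypothetical word banks: entity word type -> {entity word: [fixed sentences]}.
WORD_BANKS: Dict[str, Dict[str, List[str]]] = {
    "airport": {"Beijing": ["Beijing airport check-in counters"]},
    "security check": {"liquid": ["Rules for carrying liquids through security"]},
}


def first_output_result(target: str,
                        banks: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Collect the fixed sentences of every entity word contained in target."""
    results: List[str] = []
    for bank in banks.values():                 # i = 1 .. n
        for entity_word, sentences in bank.items():
            if entity_word in target:           # entity word is a substring of the target string
                results.extend(sentences)       # the k_i results contributed by this bank
    return results                              # empty list plays the role of Null
```

With these sample banks, the input "I want to go to Beijing" yields the single fixed sentence stored for "Beijing", while an input containing no entity word yields an empty list.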

S400, acquiring a first target sentence beginning with the target character string from a first set corpus, taking the acquired first target sentence as a second output result if the corresponding first target sentence is acquired, and executing S500; otherwise, S500 is directly performed.

In the embodiment of the present invention, the first set corpus may be a prefix tree corpus, for example an existing prefix tree corpus.

In the embodiment of the invention, an acquired target sentence is a sentence whose intent is the same as or close to the intent of the target character string.
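As an illustration of how a prefix tree corpus can serve S400, the following sketch stores sentences character by character and returns every stored sentence that begins with a given string. The class and method names (`PrefixTreeCorpus`, `starting_with`) and the sample sentences are hypothetical; the text only states that the first set corpus may be a prefix tree.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_sentence = False    # True if a stored sentence ends at this node


class PrefixTreeCorpus:
    """Character-level prefix tree over a list of sentences."""

    def __init__(self, sentences: List[str]) -> None:
        self.root = TrieNode()
        for sentence in sentences:
            self.insert(sentence)

    def insert(self, sentence: str) -> None:
        node = self.root
        for ch in sentence:
            node = node.children.setdefault(ch, TrieNode())
        node.is_sentence = True

    def starting_with(self, prefix: str) -> List[str]:
        """Return all stored sentences that begin with prefix, sorted."""
        node = self.root
        for ch in prefix:
            nxt = node.children.get(ch)
            if nxt is None:
                return []           # no sentence starts with this prefix
            node = nxt
        found: List[str] = []
        stack = [(node, prefix)]
        while stack:                # depth-first walk below the prefix node
            cur, text = stack.pop()
            if cur.is_sentence:
                found.append(text)
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

The lookup cost depends on the prefix length and the size of the matching subtree rather than on the size of the whole corpus, which is one plausible reason for choosing this structure for prefix matching.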

S500, performing word segmentation on the target character string to obtain a word segmentation set $P = (P_1, P_2, \ldots, P_j, \ldots, P_m)$, where $P_j$ is the j-th word in P, j ranges from 1 to m, and m is the number of words in P; if m > 1, S600 is performed; otherwise, i.e., $m \le 1$, S800 is performed.

S600, acquiring, from the second set corpus, the second target sentences comprising $P_j$ to obtain the sentence set $W_j = (w_{j1}, w_{j2}, \ldots, w_{jr}, \ldots, w_{jh(j)})$ of $P_j$, where $w_{jr}$ is the r-th second target sentence in $W_j$, r ranges from 1 to h(j), and h(j) is the number of second target sentences in $W_j$.

In the embodiment of the present invention, the second set corpus may hold the same corpus as the first set corpus, differing only in the manner in which the corpus is stored; the second set corpus may be an existing corpus.

S700, if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m \neq \text{Null}$, that is, the m sentence sets $W_1, W_2, \ldots, W_j, \ldots, W_m$ have an intersection and share at least one common sentence, the sentences in $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m$ are taken as a third output result, and S810 is performed.
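Steps S500 to S700 can be sketched as a lookup in an inverted index followed by a set intersection. Two simplifications are assumed here and are not taken from the text: word segmentation is approximated by splitting on whitespace (a real deployment would need a proper segmenter for Chinese input), and the second set corpus is modelled as a dictionary from word to the set of sentences containing it. All names are hypothetical.

```python
from typing import Dict, List, Set


def build_inverted_index(sentences: List[str]) -> Dict[str, Set[str]]:
    """Map each word to the set of sentences that contain it."""
    index: Dict[str, Set[str]] = {}
    for sentence in sentences:
        for word in set(sentence.split()):
            index.setdefault(word, set()).add(sentence)
    return index


def third_output_result(target: str, index: Dict[str, Set[str]]) -> Set[str]:
    words = target.split()                  # S500: word segmentation set P (whitespace stand-in)
    if len(words) <= 1:                     # m <= 1: S600/S700 are skipped
        return set()
    sentence_sets = [index.get(w, set()) for w in words]   # S600: W_1 .. W_m
    return set.intersection(*sentence_sets)                # S700: W_1 ∩ ... ∩ W_m
```

An empty returned set corresponds to the Null case, in which the later embodiments fall back to substitute words or keywords.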

S800, taking at least part of the first output result and the second output result as a final output result and outputting it.

S810, taking at least part of the first output result, the second output result and the third output result as a final output result and outputting it. In the embodiment of the invention, the output result may be displayed on the user's display screen.

If the output result includes only the second output result, N sentences may be selected, for example randomly, from the target sentences acquired in S400 as the output result, where N is the set number of output sentences and may be set based on actual needs. Those skilled in the art will recognize that if fewer than N target sentences are acquired in S400, all of the acquired target sentences may be taken as the output result.

If the output result includes a first output result and a second output result, the first output result includes A1 fixed sentences and the second output result includes A2 sentences, where A1 + A2 = N, N is the set number of output sentences, and A1 and A2 may be set based on actual needs. The A1 fixed sentences may be selected, for example randomly, from the $(k_1 + k_2 + \ldots + k_i + \ldots + k_n)$ output results after de-duplication. The A2 sentences may be selected, for example randomly, from the target sentences acquired in S400. Those skilled in the art will recognize that if the total number of sentences in the first output result and the second output result is less than N, all of the first output result and the second output result may be taken as the output result.

If the output result includes a first output result, a second output result, and a third output result, the first output result may include B1 fixed sentences, the second output result B2 sentences, and the third output result B3 sentences, where B1 + B2 + B3 = N. B1, B2 and B3 may be set based on actual needs. Those skilled in the art will recognize that if the total number of sentences in the first output result, the second output result, and the third output result is less than N, all of the first output result, the second output result, and the third output result may be taken as the output result.
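The quota-based selection described above can be sketched as follows. The function names, the use of random sampling and the decision not to redistribute an unused quota from one group to another are assumptions of this sketch; the text only says that the quotas sum to N and that everything is output when there are too few sentences.

```python
import random
from typing import List, Optional, Sequence


def pick(sentences: Sequence[str], quota: int, rng: random.Random) -> List[str]:
    """De-duplicate, then choose at most `quota` sentences at random."""
    unique = list(dict.fromkeys(sentences))     # order-preserving de-duplication
    if len(unique) <= quota:
        return unique
    return rng.sample(unique, quota)


def final_output(groups: Sequence[Sequence[str]],
                 quotas: Sequence[int],
                 rng: Optional[random.Random] = None) -> List[str]:
    """groups: output results in order; quotas: e.g. (B1, B2, B3), summing to N."""
    rng = rng or random.Random()
    deduped = [list(dict.fromkeys(g)) for g in groups]
    if sum(len(g) for g in deduped) <= sum(quotas):
        # No more than N sentences in total: output all of them.
        return [s for g in deduped for s in g]
    out: List[str] = []
    for group, quota in zip(deduped, quotas):
        out.extend(pick(group, quota, rng))
    return out
```

For example, `final_output([first, second], [A1, A2])` covers the two-result case and `final_output([first, second, third], [B1, B2, B3])` the three-result case.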

In the input association method provided by this embodiment, association words are matched using the target entity word banks, the first set corpus and the second set corpus, so that as many association words as possible can be matched.

In another embodiment of the present invention, S700 further includes: if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, that is, the m sentence sets $W_1, W_2, \ldots, W_j, \ldots, W_m$ have no intersection and share no common sentence, the following steps are performed:

S710, acquiring the substitute words of $P_j$ and, based on $P_j$ and its substitute words, forming the word combination $PB_j = (P_j, P_{j1}, P_{j2}, \ldots, P_{jx}, \ldots, P_{jf(j)})$ of $P_j$, where $P_{jx}$ is the x-th substitute word of $P_j$, x ranges from 1 to f(j), and f(j) is the number of substitute words of $P_j$.

In the embodiment of the invention, a substitute word of $P_j$ is a word with a meaning similar to that of $P_j$; for example, the substitute word for "late" is "deferred", and the substitute words for "south" are "south", "east", etc. The substitute words of $P_j$ may be obtained from a preset substitute word list.

S720, based on $PB_1, PB_2, \ldots, PB_j, \ldots, PB_m$, obtaining H combined word segmentation sets $PC = (PC_1, PC_2, \ldots, PC_s, \ldots, PC_H)$, where the s-th combined word segmentation set is $PC_s = (PC_{s1}, PC_{s2}, \ldots, PC_{sj}, \ldots, PC_{sm})$, $PC_{sj}$ is the j-th word in $PC_s$, $PC_{sj} \in PB_j$, and $PC_s \neq P$; that is, each combined word segmentation set contains one word from each of the word combinations $PB_1, PB_2, \ldots, PB_j, \ldots, PB_m$, any two combined word segmentation sets are different, and PC does not include P; s ranges from 1 to H; S730 is performed.

In the embodiment of the present invention, the H combined word segmentation sets may be obtained from $PB_1, PB_2, \ldots, PB_j, \ldots, PB_m$ by existing permutation and combination methods, i.e. $H = f(1) \cdot f(2) \cdot \ldots \cdot f(j) \cdot \ldots \cdot f(m) - 1$.
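The construction of the combined word segmentation sets in S710 and S720 amounts to a Cartesian product over the word combinations, with the original segmentation P removed. The sketch below assumes a hypothetical substitute word list given as a dictionary; the function name is also illustrative. The number of sets it returns is the product of the sizes of the word combinations minus one, which is this sketch's reading of the permutation and combination step.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def combined_word_sets(P: Sequence[str],
                       substitutes: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    # S710: PB_j is P_j followed by its substitute words (possibly none).
    PB = [[word] + list(substitutes.get(word, [])) for word in P]
    original = tuple(P)
    # S720: one word from each PB_j, excluding the combination equal to P itself.
    return [combo for combo in product(*PB) if combo != original]
```

For instance, with P = ("southern", "late") and a list that gives "late" the substitutes "delayed" and "deferred", the result is the two sets ("southern", "delayed") and ("southern", "deferred").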

S730, acquiring, from the second set corpus, the third target sentences comprising $PC_{sj}$ to obtain the sentence set $WC_{sj} = (wc^{1}_{sj}, wc^{2}_{sj}, \ldots, wc^{u}_{sj}, \ldots, wc^{f(sj)}_{sj})$ of $PC_{sj}$, where $wc^{u}_{sj}$ is the u-th third target sentence in $WC_{sj}$, u ranges from 1 to f(sj), and f(sj) is the number of third target sentences in $WC_{sj}$.

S740, obtaining a target sentence result set $T = (T_1, T_2, \ldots, T_s, \ldots, T_H)$, where the s-th target sentence result is $T_s = WC_{s1} \cap WC_{s2} \cap \ldots \cap WC_{sj} \cap \ldots \cap WC_{sm}$; if at least one target sentence result in T is not Null, that is, at least one target sentence result contains a sentence, the target sentence results that are not Null are taken as a fourth output result, and S900 is executed.

In a preferred embodiment of the present invention, taking the target sentence result that is not Null as the fourth output result may include:

If a combined word segmentation set corresponding to a non-Null target sentence result in T includes a word in P, the third target sentences are acquired from the target sentence result of that combined word segmentation set as the fourth output result; that is, sentences are preferentially acquired from the combined word segmentation sets that include words in P. More preferably, the sentences are acquired from the combined word segmentation set that includes the most words in P.
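The intersection in S730 and S740, together with the preference for combined sets that keep the most words of P, might be sketched as below. The inverted index format (word to set of sentences), the position-wise overlap count and the function name are assumptions of this sketch rather than details given in the text.

```python
from typing import Dict, List, Sequence, Set, Tuple


def fourth_output_result(P: Sequence[str],
                         combos: Sequence[Tuple[str, ...]],
                         index: Dict[str, Set[str]]) -> Set[str]:
    scored: List[Tuple[int, Set[str]]] = []
    for combo in combos:
        sentence_sets = [index.get(w, set()) for w in combo]   # WC_s1 .. WC_sm
        t_s = set.intersection(*sentence_sets)                 # T_s
        if t_s:                                                # keep only non-Null results
            overlap = sum(1 for a, b in zip(combo, P) if a == b)
            scored.append((overlap, t_s))
    if not scored:
        return set()                                           # T is Null
    best = max(overlap for overlap, _ in scored)
    result: Set[str] = set()
    for overlap, t_s in scored:
        if overlap == best:                                    # prefer sets sharing most words with P
            result |= t_s
    return result
```

An empty result signals that every target sentence result is Null, which is the condition under which the keyword-based embodiments take over.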

S900, taking at least part of the first output result, the second output result and the fourth output result as final output results and outputting.

In this embodiment, if the output result includes a first output result, a second output result, and a fourth output result, the first output result may include C1 fixed sentences, the second output result C2 sentences, and the fourth output result C3 sentences, where C1 + C2 + C3 = N. C1, C2 and C3 may be set based on actual needs. Those skilled in the art will recognize that if the total number of sentences in the first output result, the second output result, and the fourth output result is less than N, all of the first output result, the second output result, and the fourth output result may be taken as the output result.

In the input association method provided by this embodiment, association words are first matched using the target entity word banks and the first set corpus; the target character string is then segmented into words and association words are matched using the second set corpus. When no association word is matched from the segmented words, the words in the target character string are replaced with substitute words and matching is performed against the second set corpus based on the replaced words. Compared with the previous embodiment, this can further match as many association words as possible.

In another embodiment of the present invention, S740 further includes: if T is Null, that is, every target sentence result is Null and contains no sentence, the following steps are executed:

S741, acquiring the keywords in P.

In the embodiment of the invention, a keyword is a word obtained from the words in P according to a preset rule. In an exemplary embodiment, the keywords in P may be obtained based on an existing word importance measure, for example based on information entropy. Those skilled in the art know that obtaining keywords through information entropy may be prior art.
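The text does not define how information entropy is computed for a word. As one possible stand-in, the sketch below scores each word by its self-information, the negative base-2 logarithm of its smoothed unigram probability in a corpus, and takes the highest-scoring word in P as the keyword. The scoring rule, the add-one smoothing, the whitespace tokenisation and the function name are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def keyword_by_information(P: Sequence[str], corpus_sentences: List[str]) -> str:
    counts = Counter(w for s in corpus_sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1                 # one extra slot for unseen words

    def information(word: str) -> float:
        # Add-one smoothing keeps the probability of unseen words above zero.
        p = (counts[word] + 1) / (total + vocab)
        return -math.log2(p)

    # Rarer words carry more information; ties resolve to the earliest word in P.
    return max(P, key=information)
```

Under this scoring a frequent function word such as "the" loses to a rarer content word such as "refund"; a word absent from the corpus scores highest of all, so a practical system might restrict the candidates to words that occur in the second set corpus.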

S742, acquiring a fourth target sentence corresponding to the keyword from the second set corpus, taking the acquired fourth target sentence as a fifth output result, and executing S1000.

S1000, taking at least part of the first output result, the second output result and the fifth output result as final output results and outputting.

In another embodiment of the present invention, S740 further includes: if T is Null, S743 is performed.

S743, acquiring the keywords in P based on a set keyword table; S744 is performed.

The set keyword table may be an existing keyword table, and is stored in the server in advance.

In the embodiment of the invention, if one word in the set keyword table is included in P, the word is taken as the keyword of P. If two or more words in the set keyword table are included in P, in one example, one word may be randomly selected as the keyword of P, and in another example, the word having the highest information entropy may be selected as the keyword of P.

S744, acquiring a fifth target sentence corresponding to the keyword in P from the second set corpus, taking the acquired fifth target sentence as a fifth output result, and executing S1001.

S1001, taking at least part of the first output result, the second output result and the fifth output result as final output results, and outputting.

In an embodiment of the present invention, in S743, if P does not include any keyword in the set keyword table, S1001 may be directly executed, except that the fifth output result at this time is Null.

In another embodiment of the present invention, in S743, if P does not include any keyword in the set keyword table, S745 may be performed:

S745, acquiring the keywords in P based on word importance, and executing S744.
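The table lookup of S743 with the importance-based fallback of S745 can be sketched as a single function. The `importance` dictionary stands in for whatever word importance measure is used (for example an information-based score), and choosing the highest-scoring table hit is one of the two options the text mentions for multiple hits; the names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def pick_keyword(P: Sequence[str],
                 keyword_table: Set[str],
                 importance: Dict[str, float]) -> Optional[str]:
    if not P:
        return None
    # S743: words of P that appear in the set keyword table.
    hits = [w for w in P if w in keyword_table]
    # S745: with no table hit, fall back to all words of P.
    candidates = hits if hits else list(P)
    # Among the candidates, take the word with the highest importance score.
    return max(candidates, key=lambda w: importance.get(w, 0.0))
```

The returned keyword would then be used in S744 to look up sentences in the second set corpus.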

In this embodiment, if the output result includes a first output result, a second output result, and a fifth output result, the first output result may include D1 fixed sentences, the second output result D2 sentences, and the fifth output result D3 sentences, where D1 + D2 + D3 = N. D1, D2 and D3 may be set based on actual needs. Those skilled in the art will recognize that if the total number of sentences in the first output result, the second output result, and the fifth output result is less than N, all of the first output result, the second output result, and the fifth output result may be taken as the output result.

In the input association method provided by this embodiment, association words are first matched using the target entity word banks and the first set corpus; the target character string is then segmented into words and association words are matched using the second set corpus. When no association word is matched from the segmented words, the words in the target character string are replaced with substitute words and matching is performed against the second set corpus based on the replaced words; if still no association word is matched, matching is performed based on the keywords in the target character string. Compared with the previous embodiment, this can further match as many association words as possible.

In another embodiment of the present invention, S700 further includes: if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, the following steps are performed:

S711, acquiring the keywords in P.

In the embodiment of the invention, the keywords in P can be acquired based on the existing word importance, for example, the keywords in P can be acquired based on the information entropy. Those skilled in the art know that obtaining keywords by entropy of information may be prior art.

S712, obtaining a sixth target sentence corresponding to the keyword in the P from the second set corpus, and taking the obtained sixth target sentence as a fourth output result, and executing S820.

S820, taking at least part of the first output result, the second output result and the fourth output result as final output results and outputting.

In another embodiment of the present invention, S700 further includes: if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, S713 is performed.

S713, acquiring the keywords in P based on the set keyword table; S714 is performed.

S714, acquiring sentences corresponding to the keywords from the second set corpus, taking the acquired sentences as a fourth output result, and executing S820.

S820, taking at least part of the first output result, the second output result and the fourth output result as a final output result and outputting it.

In an embodiment of the present invention, in S713, if P does not include any keyword in the set keyword table, S820 may be directly performed, except that the fourth output result at this time is Null.

In another embodiment of the present invention, in S713, if P does not include any keyword in the set keyword table, S715 may be executed:

S715, acquiring the keywords in P based on word importance, and executing S714.

In this embodiment, if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, matching is performed using keywords; as with the substitute word matching described above, this can further match as many association words as possible.

In another embodiment of the present invention, S100 is replaced with:

S110, acquiring the length L of the target character string, and executing S200 if L > L0; otherwise, executing S300. L0 is a set length and may be set based on actual needs; in one exemplary embodiment, L0 is 2 or 3 characters, preferably 3 characters.

In this embodiment, fixed sentence matching is performed only when the target string length is greater than L0, so that matching time can be saved and matching efficiency can be improved compared with the foregoing embodiment.
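The length gate described in the preceding paragraph can be sketched as a thin wrapper around whatever function performs fixed sentence matching. The wrapper name, the callable parameter and the default threshold are illustrative assumptions; only the comparison of the string length against L0 comes from the text.

```python
from typing import Callable, List

L0 = 3  # set length; the text suggests 2 or 3 characters, preferably 3


def gated_fixed_sentences(target: str,
                          match_fixed: Callable[[str], List[str]],
                          l0: int = L0) -> List[str]:
    """Run fixed sentence matching only when the target string is longer than l0."""
    if len(target) > l0:
        return match_fixed(target)
    return []   # short input: skip the word bank traversal entirely
```

For short inputs the word bank traversal is skipped, which is where the time saving mentioned above would come from.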

Embodiments of the present invention also provide a non-transitory computer readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program for implementing one of the method embodiments, the at least one instruction or the at least one program being loaded and executed by a processor to implement the methods provided by the embodiments described above.

Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.

Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the present disclosure is defined by the appended claims.

Claims

1. An input association method, comprising the steps of:

S100, acquiring an input target character string;

S300, traversing n target entity word banks and, for the i-th target entity word bank, if any entity word in the i-th target entity word bank is contained in the target character string, acquiring the fixed sentences corresponding to that entity word from the i-th target entity word bank as $k_i$ output results, where i ranges from 1 to n; taking the $(k_1 + k_2 + \ldots + k_i + \ldots + k_n)$ output results as a first output result; executing S400;

S700, if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m \neq \text{Null}$, taking the sentences in $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m$ as a third output result and executing S810;

S800, taking at least part of the first output result and the second output result as a final output result and outputting it;

S810, taking at least part of the first output result, the second output result and the third output result as a final output result and outputting it.

2. The method of claim 1, wherein S700 further comprises: if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, the following steps are performed:

S710, acquiring the substitute words of $P_j$ and, based on $P_j$ and its substitute words, forming the word combination $PB_j = (P_j, P_{j1}, P_{j2}, \ldots, P_{jx}, \ldots, P_{jf(j)})$ of $P_j$, where $P_{jx}$ is the x-th substitute word of $P_j$, x ranges from 1 to f(j), and f(j) is the number of substitute words of $P_j$;

S720, based on $PB_1, PB_2, \ldots, PB_j, \ldots, PB_m$, obtaining H combined word segmentation sets $PC = (PC_1, PC_2, \ldots, PC_s, \ldots, PC_H)$, where the s-th combined word segmentation set is $PC_s = (PC_{s1}, PC_{s2}, \ldots, PC_{sj}, \ldots, PC_{sm})$, $PC_{sj}$ is the j-th word in $PC_s$, $PC_{sj} \in PB_j$, and $PC_s \neq P$; executing S730; s ranges from 1 to H, and $H = f(1) \cdot f(2) \cdot \ldots \cdot f(j) \cdot \ldots \cdot f(m) - 1$;

S730, acquiring, from the second set corpus, the third target sentences comprising $PC_{sj}$ to obtain the sentence set $WC_{sj} = (wc^{1}_{sj}, wc^{2}_{sj}, \ldots, wc^{u}_{sj}, \ldots, wc^{f(sj)}_{sj})$ of $PC_{sj}$, where $wc^{u}_{sj}$ is the u-th third target sentence in $WC_{sj}$, u ranges from 1 to f(sj), and f(sj) is the number of third target sentences in $WC_{sj}$;

S740, obtaining a target sentence result set $T = (T_1, T_2, \ldots, T_s, \ldots, T_H)$, where the s-th target sentence result is $T_s = WC_{s1} \cap WC_{s2} \cap \ldots \cap WC_{sj} \cap \ldots \cap WC_{sm}$; if at least one target sentence result in T is not Null, taking the target sentence results that are not Null as a fourth output result, and executing S900;

3. The method of claim 2, wherein taking a target sentence result that is not Null as a fourth output result comprises:

if the combined word segmentation set corresponding to the target sentence result which is not Null in the T comprises the word segmentation in the P, acquiring a third target sentence from the combined word segmentation set comprising the word segmentation in the P as a fourth output result.

4. The method of claim 2, wherein S740 further comprises: if T is Null, executing S741;

S741, acquiring the keywords in P, the keywords being words obtained from the words in P according to a preset rule;

S742, acquiring a fourth target sentence corresponding to the keyword from the second set corpus, taking the acquired fourth target sentence as a fifth output result, and executing S1000;

5. The method of claim 2, wherein S740 further comprises: if T is Null, executing S743;

S743, acquiring the keywords in P based on a set keyword table; executing S744;

S744, acquiring a fifth target sentence corresponding to the keyword in P from the second set corpus, taking the acquired fifth target sentence as a fifth output result, and executing S1001;

6. The method of claim 1, wherein S700 further comprises: if $W_1 \cap W_2 \cap \ldots \cap W_j \cap \ldots \cap W_m = \text{Null}$, the following steps are performed:

S711, acquiring the keywords in P, the keywords being words obtained from the words in P according to a preset rule;

S712, acquiring a sixth target sentence corresponding to the keyword in P from the second set corpus, taking the acquired sixth target sentence as a fourth output result, and executing S820;

7. The method of claim 1, wherein in S800, the first output result includes A1 fixed sentences and the second output result includes A2 first target sentences, where A1 + A2 = N and N is a set number of output sentences;

in S810, the first output result includes B1 fixed sentences, the second output result includes B2 first target sentences, and the third output result includes B3 second target sentences, where B1 + B2 + B3 = N.

8. The method of claim 1, wherein the first set corpus is a prefix tree corpus.

9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the method of any one of claims 1-8.

10. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 9.