CN110209765B

CN110209765B - Method and device for searching keywords according to meanings

Info

Publication number: CN110209765B
Application number: CN201910433774.6A
Authority: CN
Inventors: 程波
Original assignee: Wuhan Greenet Information Service Co Ltd
Current assignee: Wuhan Greenet Information Service Co Ltd
Priority date: 2019-05-23
Filing date: 2019-05-23
Publication date: 2021-03-30
Anticipated expiration: 2039-05-23
Also published as: CN110209765A

Abstract

The invention relates to the technical field of semantic search, and provides a method and a device for searching keywords according to the semantic meaning. Splitting context information content in the initial matching result according to a preset splitting rule to obtain at least two groups of entry objects; acquiring a corresponding word skipping probability table according to the attribute information of the target object to be searched; searching the word skip probability table according to the sequence of each vocabulary entry contained in each group of vocabulary entry objects to obtain the establishment probability of each group of vocabulary entry objects; and screening the initial matching result according to the establishment probability of each group of entries to obtain the screened matching result. The semantic judgment method adopted by the invention has simple and clear logic and high accuracy after long-time verification.

Description

Method and device for searching keywords according to meanings

[ technical field ]

The invention relates to the technical field of semantic search, in particular to a method and a device for searching keywords according to the semantic meaning.

[ background of the invention ]

In internet applications and traffic monitoring projects, keywords often need to be searched. For example, in financial news, if the content contains the name of a certain stock or fund, the current price is automatically displayed after the name; as another example, in a traffic monitoring project, web pages containing a certain keyword need to be blocked. In all of these tasks, a keyword search must be performed on the content.

However, matching only on the binary representation of characters, without regard to their meaning, can produce unexpected results. For example, if web pages containing the binary sequence of the keyword "China" are to be blocked, a science fiction novel containing the sentence "the concept of the country in the constantan civilization does not exist at all" will also be blocked, because in the original text the characters of the keyword happen to be adjacent across the boundary between two neighboring words; this is obviously not what the author of the blocking policy intended. In addition, since character search is essentially a comparison of binary data in a computer, finding the binary data of a keyword in the traffic may be pure coincidence: the hit may, for example, be part of an integer value that does not represent characters at all, and counting it as a hit may cause unexpected problems. As a further example, an article on a comprehensive website contains the phrase "continuously increasing the proportion of low-grade products in industrial and agricultural products"; the three characters of "agricultural products" in the article are often highlighted and followed by the market quotation of a stock named "Agricultural Products", which is obviously inappropriate. There are many such cases, and their root cause is that the semantics of the keyword are not considered during the search. Of course, a mature word segmentation method could be applied to the whole article and the keyword then searched among the resulting words; the semantics would be correct, but the implementation is complex and the efficiency is extremely low.

In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.

[ summary of the invention ]

The technical problem to be solved by the invention is that prior-art keyword search methods easily return results whose semantics do not match the keyword, while the improved semantics-based search methods that do exist are complex to implement and inefficient.

The technical problem to be further solved by the invention is how to identify the target search results more effectively in a big data analysis environment.

The invention adopts the following technical scheme:

in a first aspect, the present invention provides a method for semantically searching for a keyword, which obtains a keyword to be searched and traffic data of each target object to be searched, and obtains an initial matching result by matching the keyword to be searched and the traffic data, wherein the initial matching result includes context information content corresponding to the keyword to be searched in each traffic data, and includes:

splitting the context information content in the initial matching result according to a preset splitting rule to obtain at least two groups of entry objects;

acquiring a corresponding word skipping probability table according to the attribute information of the target object to be searched;

searching the word skip probability table according to the sequence of each vocabulary entry contained in each group of vocabulary entry objects to obtain the establishment probability of each group of vocabulary entry objects;

and screening the initial matching result according to the establishment probability of each group of entries to obtain the screened matching result.

Preferably, the keyword to be searched is X₁, X₂, …, Xₙ₋₁, Xₙ, where Xᵢ represents a character and i ∈ [1, n]. The preset splitting rule specifically comprises the following steps:

splitting the context information content in the matching result according to at least two splitting modes to obtain at least two groups of vocabulary entry objects; wherein, the split mode includes:

Splitting mode one: in the context information content, look up in the word stock the entry formed by X₁ together with the character(s) preceding it; if a match is found, the matched entry is denoted W₂; if not, X₁ alone is recognized as a word and denoted W₂. Continue to look for one word before W₂ in the context information content, denoted W₁. X₂, …, Xₙ₋₁, Xₙ is denoted W₃. In the context information content, look for one word after X₂, …, Xₙ₋₁, Xₙ, denoted W₄. This yields a group of entry objects, denoted W₁W₂W₃W₄;

Splitting mode two: in the context information content, look for one word before X₁, X₂, …, Xₙ₋₁, denoted C₁. X₁, X₂, …, Xₙ₋₁ is denoted C₂. Combine Xₙ with the characters that follow it and find the longest matching word, denoted C₃. Continue to look for one word after C₃, denoted C₄. This yields a group of entry objects, denoted C₁C₂C₃C₄;

Splitting mode three: take X₁, X₂, …, Xₙ₋₁, Xₙ as one word, denoted M₂. In the context information content, look for one word before X₁, denoted M₁. In the context information content, look for two words after Xₙ, denoted M₃ and M₄. This yields a group of entry objects, denoted M₁M₂M₃M₄;

Splitting mode four: take X₁, X₂, …, Xₙ₋₁, Xₙ as one word, denoted N₃. In the context information content, look for two words before X₁, denoted N₁ and N₂. In the context information content, look for one word after Xₙ, denoted N₄. This yields a group of entry objects, denoted N₁N₂N₃N₄.

Preferably, looking for a word before X₁ or looking for a word after Xₙ is specifically implemented as follows:

in the context information content, starting from the starting reference object of the search, the length of the run of consecutive characters is increased one character at a time and each run is matched against the word stock; when a run no longer yields a match, the run of the previous round's length is taken as the word found before X₁ or after Xₙ;

wherein the starting reference object comprises X₁ or Xₙ.

Preferably, the initial matching result is screened according to the establishment probability of each group of entries to obtain the screened matching result, and the method specifically includes:

if the probability of M₁M₂M₃M₄ or N₁N₂N₃N₄ is less than the probability of W₁W₂W₃W₄ and/or C₁C₂C₃C₄, removing the corresponding target object from the initial matching result;

if the probability of M₁M₂M₃M₄ or N₁N₂N₃N₄ is greater than or equal to the probability of W₁W₂W₃W₄ and/or C₁C₂C₃C₄, retaining the target object in the screened matching result.

Preferably, if the process of obtaining the initial matching result and the process of obtaining the filtered matching result are executed in parallel, the method further includes:

analyzing and obtaining a distribution map of each target object according to the attribute information of each target object contained in the screened matching result; wherein the area of the map is calibrated by the attribute information;

for first attribute information for which the proportion of target objects in a certain area exceeds a preset threshold, adding a weighted value when subsequently calculating the probability of M₁M₂M₃M₄ or N₁N₂N₃N₄, so that target objects belonging to the first attribute information have a higher probability of passing the screening.

Preferably, when the target object to be searched is a web page, the attribute information of the target object to be searched specifically includes one or more items of a website topic type, a web page title content, and a web page text classification.

Preferably, the topic type of the website comprises one or more of news, finance, sports, entertainment and comprehensive;

the webpage text classification comprises one or more of prose, narrative text and comprehensive text.

Preferably, the word hop probability table specifically includes:

analyzing the flow data of the potential target object through the big data, and obtaining the part of speech of each entry in the corresponding flow data according to a word bank matching mode; wherein, the part of speech includes one or more items of nouns, verbs, adjectives, adverbs, prepositions, sentence heads, sentence tails and punctuation marks;

wherein, the jump probability table records the probability of completing the corresponding forward and backward sequence jump among the vocabulary entries corresponding to each part of speech.

Preferably, the matching the keyword to be searched and the flow data to obtain an initial matching result specifically includes:

converting the keyword into codes to be searched in UTF-8, GB2312 and/or BIG5, and matching the traffic data of the target objects to be searched one by one against the codes to be searched to obtain an initial matching result.
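The byte-level matching described here can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `encode_keyword` and `initial_match`, the `context_bytes` window size and the sample sentence are all assumptions introduced for the example; only the three encodings come from the text.

```python
# Encodings named in the text; everything else below is a hypothetical sketch.
ENCODINGS = ("utf-8", "gb2312", "big5")


def encode_keyword(keyword):
    """Return {encoding: bytes} for each encoding that can represent the keyword."""
    patterns = {}
    for enc in ENCODINGS:
        try:
            patterns[enc] = keyword.encode(enc)
        except UnicodeEncodeError:
            continue  # the keyword has no representation in this encoding
    return patterns


def initial_match(keyword, traffic, context_bytes=30):
    """Byte-level search of raw traffic; returns (encoding, offset, context) per hit."""
    hits = []
    for enc, pattern in encode_keyword(keyword).items():
        start = traffic.find(pattern)
        while start != -1:
            lo = max(0, start - context_bytes)
            hi = start + len(pattern) + context_bytes
            # Characters cut in half at the window edges are dropped on decoding.
            context = traffic[lo:hi].decode(enc, errors="ignore")
            hits.append((enc, start, context))
            start = traffic.find(pattern, start + 1)
    return hits


# Toy traffic: the keyword bytes occur only across a word boundary.
traffic = "文明中国家的概念".encode("utf-8")
hits = initial_match("中国", traffic)
```

Every hit carries its surrounding context, which is the input the later splitting and screening steps work on; a hit found this way says nothing yet about whether the keyword is meant semantically.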

Preferably, when the number of words of the keyword exceeds a preset value, before the splitting of the context information content in the initial matching result according to a preset splitting rule is executed to obtain at least two sets of entry objects, the method includes:

matching to obtain part-of-speech combinations of the keywords according to a word bank;

and obtaining the weighted value in the probability calculation process of each group of vocabulary item objects corresponding to each attribute information according to the part of speech combination.

In a second aspect, the present invention further provides an apparatus for semantic searching keywords, which is used to implement the method for semantic searching keywords in the first aspect, and the apparatus includes:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the method of semantically searching for keywords according to the first aspect.

In a third aspect, the present invention also provides a non-transitory computer storage medium storing computer-executable instructions for execution by one or more processors for performing the method for semantic searching for keywords according to the first aspect.

Compared with the prior art, the method for judging the semantics has the advantages that the logic is simple and clear, and the accuracy is high after long-time verification.

The traditional method is to perform word segmentation on the whole article or the whole sentence and then search in the resulting set of words. The invention instead analyzes the keyword in advance, searches for it by binary matching, and only then judges whether the matched content is semantically consistent, so the overall flow is more efficient; the performance cost depends mainly on the keyword hit rate.

In the preferred scheme of the invention, in the searching process, the attribute information of the searched target object is also dynamically collected and sorted, so that a weighted value with more referential meaning is provided for the subsequent calculation process, and the searching accuracy is further improved.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a schematic flowchart of a method for searching keywords semantically according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the effect of rendering context in the initial matching result according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a splitting manner according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another splitting method provided by the embodiment of the invention;

FIG. 5 is a schematic structural diagram of another splitting method provided by an embodiment of the present invention;

fig. 6 is a schematic structural diagram of another splitting manner provided in the embodiment of the present invention;

FIG. 7 is a schematic diagram of a probability solution of a splitting manner according to an embodiment of the present invention;

fig. 8 is a flowchart illustrating a method for using a part-of-speech weighted value of a long keyword according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of an apparatus for searching keywords semantically according to an embodiment of the present invention.

[ detailed description ]

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.

In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Before implementing embodiments of the present invention, it is usually required to perform some conventional operations in retrieving keywords, such as: the method comprises the steps of obtaining keywords to be searched and flow data of target objects to be searched, and obtaining an initial matching result by matching the keywords to be searched and the flow data, wherein the initial matching result comprises context information content corresponding to the keywords to be searched in the flow data.

In the embodiment of the present invention, the traffic data of the target object to be searched may be represented by various web portals and web page contents that can be acquired in the internet, and various media contents that are acquired through internet channels and contain a text expression form.

The matching of the keywords to be searched against the traffic data can be realized with existing search and matching algorithms, and a preferred implementation is described in embodiment 1 of the invention. The core of the invention lies in what happens after the initial matching is completed and the context information content corresponding to the keyword in each piece of traffic data has been obtained: semantic analysis is used to discriminate which matching results agree with the search intention and which do not, so as to obtain the screened matching results and reduce the time a person browsing the results wastes on meaningless matches.

Example 1:

an embodiment 1 of the present invention provides a method for semantically searching for a keyword, where based on the initial matching result obtained, as shown in fig. 1, the method includes:

in step 201, the context information content in the initial matching result is split according to a preset splitting rule, so as to obtain at least two groups of vocabulary entry objects.

In the embodiment of the invention, the preset splitting rule is first given a preliminary definition: any keyword composed of one or more characters is preliminarily split in the following three ways: 1. split into "head character" + "remaining characters"; 2. split into "remaining characters" + "tail character"; 3. not split, keeping the "complete characters". Then, in a second splitting step, the context information content of the keyword is combined with the above preliminary splitting results to generate entry objects of a uniform format. An entry object combination may comprise 2 entries, 3 entries, 4 entries, and so on. However, experimental verification shows that combinations of 2 or 3 entries cannot effectively restore the word-jump characteristics of the context in which the keyword is located, while in combinations of more than 4 entries some entries are so far from the keyword that their relationship to it is weak and they contribute little to judging the keyword's semantics. Therefore, in the embodiment of the present invention the entry object combination is preferably formed of 4 entries. The implementation of the corresponding entry object combinations is explained in detail later in this embodiment.

In step 202, a corresponding word hop probability table is obtained according to the attribute information of the target object to be searched.

When the target object to be searched is a web page, the attribute information of the target object to be searched specifically comprises one or more of a website topic type, web page title content and a web page text classification. The website topic type comprises one or more of news, finance, sports, entertainment and comprehensive; the web page text classification comprises one or more of prose, narrative text and comprehensive text. For example, the type of information (prose, narrative, comprehensive, etc.) is determined from the URL, and the word jump probability table of the corresponding classification is selected.
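Selecting a table by attribute information could look like the sketch below. The host names, the `SITE_CLASS` mapping and the function name `select_hop_table` are hypothetical; the two probability values are the only ones the description gives for both the general and the prose table, and all other entries are omitted rather than invented.

```python
from urllib.parse import urlparse

# Hypothetical mapping from host name to web page text classification.
SITE_CLASS = {
    "prose.example.com": "prose",
    "news.example.com": "narrative",
}

# Illustrative tables keyed by (from part of speech, to part of speech).
HOP_TABLES = {
    "general": {("punctuation", "preposition"): 0.30},
    "prose": {("punctuation", "preposition"): 0.80},
}


def select_hop_table(url):
    """Pick the table for the page's classification, else the general table."""
    host = urlparse(url).hostname
    classification = SITE_CLASS.get(host, "general")
    return HOP_TABLES.get(classification, HOP_TABLES["general"])


table = select_hop_table("http://prose.example.com/essay/1")
```

Falling back to the general table when the classification is unknown, or when no table exists for it, mirrors the general table the description reserves for cases where the attribute information cannot be determined.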

As can be seen from the above example, the attribute information of the target object to be searched is also one of the bases for generating the word jump probability table in the embodiment of the present invention, and in the subsequent embodiments of the present invention, several typical contents of the word jump probability table will be specifically shown.

In step 203, the word jump probability table is searched according to the order of each vocabulary entry included in each set of vocabulary entry objects, and the establishment probability of each set of vocabulary entry objects is obtained.

The purpose of this step is to determine, by means of a word jump probability table obtained from analysis of historical big data, the probability that each of the current splitting modes holds. Among the different splitting modes, only when an entry object combination that keeps the keyword intact prevails in the probability calculation is the corresponding traffic data of the target object in the initial matching result considered semantically consistent with the keyword entered by the user.

In step 204, the initial matching result is filtered according to the establishment probability of each group of entries, so as to obtain the filtered matching result.

Wherein, each group of entries is the entry object combination obtained by different splitting modes.

In the following, how to form the entry combinations is described for the 4-entry form of entry object combination analyzed above. Take the keyword to be searched as X₁, X₂, …, Xₙ₋₁, Xₙ, where Xᵢ represents a character and i ∈ [1, n]. The preset splitting rule involved in step 201 of embodiment 1 (into which the preliminary splitting described above is merged) specifically includes:

the context information content in the matching result is divided according to at least two of the following dividing modes as shown in fig. 2, wherein the keyword is contained in the context information content, so as to obtain at least two groups of vocabulary entry objects; wherein, the split mode includes:

Splitting mode one: in the context information content, look up in the word stock the entry formed by X₁ together with the character(s) preceding it; if a match is found, the matched entry is denoted W₂; if not, X₁ alone is recognized as a word and denoted W₂. Continue to look for one word before W₂ in the context information content, denoted W₁. X₂, …, Xₙ₋₁, Xₙ is denoted W₃. In the context information content, look for one word after X₂, …, Xₙ₋₁, Xₙ, denoted W₄. This yields a group of entry objects, denoted W₁W₂W₃W₄, as shown in FIG. 3;

Splitting mode two: in the context information content, look for one word before X₁, X₂, …, Xₙ₋₁, denoted C₁. X₁, X₂, …, Xₙ₋₁ is denoted C₂. Combine Xₙ with the characters that follow it and find the longest matching word, denoted C₃. Continue to look for one word after C₃, denoted C₄. This yields a group of entry objects, denoted C₁C₂C₃C₄, as shown in FIG. 4;

Splitting mode three: take X₁, X₂, …, Xₙ₋₁, Xₙ as one word, denoted M₂. In the context information content, look for one word before X₁, denoted M₁. In the context information content, look for two words after Xₙ, denoted M₃ and M₄. This yields a group of entry objects, denoted M₁M₂M₃M₄, as shown in FIG. 5;

Splitting mode four: take X₁, X₂, …, Xₙ₋₁, Xₙ as one word, denoted N₃. In the context information content, look for two words before X₁, denoted N₁ and N₂. In the context information content, look for one word after Xₙ, denoted N₄. This yields a group of entry objects, denoted N₁N₂N₃N₄, as shown in FIG. 6.
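The four splitting modes can be sketched in a few lines. This is one possible reading under stated assumptions: the names `word_before`, `word_after` and `split_groups` are hypothetical, the word stock is modelled as a plain set of strings, a single character is accepted as a word when nothing longer matches, and an empty string stands for "no word available" at the edge of the context. The sample sentence and word stock are toy data.

```python
def word_before(text, pos, lexicon):
    """Word ending just before index pos, grown leftwards one character at a time."""
    if pos <= 0:
        return ""
    length = 1  # assumption: a lone character counts as a word
    while pos - length > 0 and text[pos - length - 1:pos] in lexicon:
        length += 1
    return text[pos - length:pos]


def word_after(text, pos, lexicon):
    """Word starting at index pos, grown rightwards one character at a time."""
    if pos >= len(text):
        return ""
    length = 1
    while pos + length < len(text) and text[pos:pos + length + 1] in lexicon:
        length += 1
    return text[pos:pos + length]


def split_groups(text, start, n, lexicon):
    """Build the W, C, M and N entry groups for a keyword of n characters at start."""
    end = start + n                       # index just past X_n
    keyword = text[start:end]
    before_kw = word_before(text, start, lexicon)
    after_kw = word_after(text, end, lexicon)

    # Mode one: X_1 joins the preceding characters; X_2..X_n stays together.
    w2 = word_before(text, start + 1, lexicon)
    w1 = word_before(text, start + 1 - len(w2), lexicon)
    w3 = text[start + 1:end]

    # Mode two: X_1..X_{n-1} stays together; X_n joins the following characters.
    c2 = text[start:end - 1]
    c3 = word_after(text, end - 1, lexicon)
    c4 = word_after(text, end - 1 + len(c3), lexicon)

    # Modes three and four keep the keyword whole.
    m4 = word_after(text, end + len(after_kw), lexicon)
    n1 = word_before(text, start - len(before_kw), lexicon)

    return {
        "W": (w1, w2, w3, after_kw),
        "C": (before_kw, c2, c3, c4),
        "M": (before_kw, keyword, after_kw, m4),
        "N": (n1, before_kw, keyword, after_kw),
    }


text = "文明中国家的概念根本不存在"
lexicon = {"文明", "中国", "国家", "概念", "根本", "存在"}
groups = split_groups(text, text.find("中国"), 2, lexicon)
# groups["C"] == ("文明", "中", "国家", "的")
# groups["M"] == ("文明", "中国", "家", "的")
```

In this toy sentence, mode two recovers the natural reading in which the last keyword character belongs to the following word, while mode three forces the keyword to stand as a word and leaves an isolated character after it.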

Looking for a word before X₁ or looking for a word after Xₙ is specifically implemented as follows:

in the context information content, starting from the starting reference object of the search, the length of the run of consecutive characters is increased one character at a time and each run is matched against the word stock; when a run no longer yields a match, the run of the previous round's length is taken as the word found before X₁ or after Xₙ;

wherein the starting reference object comprises X₁ or Xₙ.

It should be emphasized that "looking for a word before X₁ or after Xₙ" is only one of the forms this operation takes in the splitting modes above. For example, looking for a word before X₁ also appears in the splitting modes as "continue to look for one word before W₂ in the context information content", and in a specific splitting mode the word being sought may include X₁ itself, as in "the entry formed by X₁ together with the character(s) preceding it is denoted W₂ if a match is found". In every form, however, the basic principle can adopt the implementation given above, namely: "in the context information content, starting from the starting reference object of the search, the length of the run of consecutive characters is increased one character at a time and each run is matched against the word stock; when a run no longer yields a match, the run of the previous round's length is taken as the word found".
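The incremental lookup can be illustrated for the forward direction as below. The function name `find_word_after`, the set-based word stock and the rule that a lone character is returned when nothing longer matches are assumptions made for this sketch; the stop-at-first-failure behavior follows the wording of the description literally.

```python
def find_word_after(text, pos, lexicon):
    """Grow the run text[pos:pos+k] by one character per round.

    The loop stops at the first run that is not in the word stock and
    returns the run from the previous round.
    """
    if pos >= len(text):
        return ""
    length = 1  # assumption: a lone character is the fallback word
    while pos + length < len(text) and text[pos:pos + length + 1] in lexicon:
        length += 1
    return text[pos:pos + length]


lexicon = {"ab", "abc", "abcde"}
word = find_word_after("abcdef", 0, lexicon)
# word == "abc": "abcd" is not in the word stock, so the search stops there
# even though the longer entry "abcde" exists.
```

The example shows a consequence of stopping at the first miss: an entry whose shorter prefixes are not all in the word stock is never reached. A word stock organized so that such prefixes are present, or a search that keeps growing up to a maximum length, would avoid this, but neither is stated in the description.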

Further, with reference to the above example of entry object combinations, the screening of the initial matching result according to the establishment probability of each group of entries in step 204 of embodiment 1, to obtain the screened matching result, is specifically implemented as:

if the probability of M₁M₂M₃M₄ or N₁N₂N₃N₄ is less than the probability of W₁W₂W₃W₄ and/or C₁C₂C₃C₄, the corresponding target object is removed from the initial matching result; if it is greater than or equal to that probability, the target object is retained in the screened matching result. FIG. 7 is a schematic diagram of calculating the probability of M₁M₂M₃M₄, where the probability value is P₁ × P₂ × P₃: P₁ is the probability of jumping from the part of speech of M₁ to the part of speech of M₂, P₂ is the probability of jumping from the part of speech of M₂ to the part of speech of M₃, and P₃ is the probability of jumping from the part of speech of M₃ to the part of speech of M₄. The values of P₁, P₂ and P₃ can be obtained by looking up the jump probability table.
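The product P₁ × P₂ × P₃ and the screening comparison can be sketched as follows. The names `group_probability` and `keep_match`, the `pos_of` part-of-speech lookup and all numeric values are hypothetical. The description's "or" and "and/or" leave the exact comparison open; this sketch assumes the best whole-keyword group (M or N) is compared with the best split-keyword group (W or C).

```python
def group_probability(group, pos_of, hop_table, default=0.0):
    """Multiply the jump probabilities between consecutive entries of a group."""
    tags = [pos_of.get(entry, "unknown") for entry in group]
    prob = 1.0
    for src, dst in zip(tags, tags[1:]):
        prob *= hop_table.get((src, dst), default)
    return prob


def keep_match(groups, pos_of, hop_table):
    """Retain the match when a whole-keyword group is at least as probable."""
    p = {name: group_probability(g, pos_of, hop_table) for name, g in groups.items()}
    whole = max(p.get("M", 0.0), p.get("N", 0.0))
    split = max(p.get("W", 0.0), p.get("C", 0.0))
    return whole >= split


# Toy data: entries a..d with made-up parts of speech and jump probabilities.
pos_of = {"a": "noun", "b": "verb", "c": "noun", "d": "punctuation"}
hop_table = {
    ("noun", "verb"): 0.5,
    ("verb", "noun"): 0.5,
    ("noun", "punctuation"): 0.5,
    ("noun", "noun"): 0.1,
}
p_m = group_probability(("a", "b", "c", "d"), pos_of, hop_table)  # 0.5 * 0.5 * 0.5
kept = keep_match(
    {"M": ("a", "b", "c", "d"), "W": ("a", "a", "a", "a")}, pos_of, hop_table
)
```

Jumps missing from the table are given probability 0 here, which makes any group containing an unseen jump lose; a small non-zero default would be a softer choice.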

Consider one implementation situation: when the amount of target traffic data to be retrieved is large, the process of obtaining the initial matching result and the process of obtaining the screened matching result are preferably executed in parallel, in which case the method further includes:

analyzing and obtaining a distribution map of each target object according to the attribute information of each target object contained in the screened matching result, wherein the areas of the map are calibrated by the attribute information;

for first attribute information for which the proportion of target objects in a certain area exceeds a preset threshold, adding a weighted value when subsequently calculating the probability of M₁M₂M₃M₄ or N₁N₂N₃N₄, so that target objects belonging to the first attribute information have a higher probability of passing the screening. To improve the effect of the weighted value, the identification of target objects in the area may be confirmed by an operator; accordingly, the criterion "the proportion of target objects exceeds a preset threshold" may be replaced with "the number of times target objects were judged incorrect is less than a preset threshold". The preset threshold may be set empirically, taking into account the total amount of analyzed traffic data of the target objects to be searched.
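A minimal sketch of the attribute-based weighting follows. The names `attribute_weights` and `weighted_probability`, the multiplicative `boost` and the default values are assumptions; the description only says that a weighted value is applied when an attribute's share of the screened results exceeds a preset threshold.

```python
from collections import Counter


def attribute_weights(kept_attributes, threshold=0.5, boost=1.2):
    """Return {attribute: boost} for attributes whose share exceeds the threshold.

    kept_attributes lists the attribute (e.g. website topic type) of every
    target object that passed the screening so far.
    """
    counts = Counter(kept_attributes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {attr: boost for attr, c in counts.items() if c / total > threshold}


def weighted_probability(prob, attribute, weights):
    """Scale the probability of a whole-keyword group (M or N) by the weight."""
    return prob * weights.get(attribute, 1.0)


weights = attribute_weights(["finance", "finance", "finance", "news"], boost=1.5)
boosted = weighted_probability(0.2, "finance", weights)
unchanged = weighted_probability(0.2, "news", weights)
```

Because the initial matching and the screening run in parallel, the weights would be recomputed as more screened results arrive; how often to refresh them is left open here.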

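In this embodiment of the present invention, the word jump probability table is specifically obtained as follows: the traffic data of potential target objects is analyzed by means of big data, and the part of speech of each entry in the traffic data is obtained by word-stock matching, where the parts of speech include one or more of nouns, verbs, adjectives, adverbs, prepositions, sentence heads, sentence tails and punctuation marks; the jump probability table records the probability of each forward-to-backward jump between entries of the respective parts of speech.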

For example, the general hop probability table used when the attribute information cannot be determined is as follows:

For prose, sentences are on average shorter and punctuation marks more frequent, for example "day, blue, heart, gray.", and the jump probabilities are illustrated as follows:

Jump	Probability
P(adjective -> noun)	0.81
P(period -> noun)	0.88
P(period -> adjective)	0.21
P(verb -> noun)	0.72
P(verb -> adjective)	0.19
P(preposition -> adjective)	0.55
P(preposition -> punctuation mark)	0.10
P(noun -> punctuation mark)	0.66
P(punctuation mark -> noun)	0.91
P(punctuation mark -> preposition)	0.80

Comparing the two, it can be clearly seen that the probability of "punctuation mark -> preposition" is strengthened in prose, reaching 0.80, whereas it is only 0.30 in the general jump probability table. The other probability values are given as examples only and should not be taken as true values; the probability values of the different jump types can be obtained by statistics over the semantic analysis of existing traffic data, namely as the ratio of the number of occurrences of each jump type to the total number of jumps in the traffic data.
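The ratio described above can be computed as in the sketch below, which takes the description literally: each jump type's count is divided by the total number of jumps. The function name `build_hop_table` and the input format (one list of part-of-speech tags per sentence, already produced by word-stock matching) are assumptions.

```python
from collections import Counter


def build_hop_table(tagged_sentences):
    """Estimate jump probabilities as count(jump type) / total number of jumps."""
    counts = Counter()
    for tags in tagged_sentences:
        # Each pair of consecutive parts of speech is one jump.
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {jump: c / total for jump, c in counts.items()}


hop_table = build_hop_table([
    ["noun", "verb", "noun"],
    ["noun", "verb"],
])
# ("noun", "verb") occurs twice and ("verb", "noun") once, out of three jumps.
```

With this normalization all entries of one table sum to 1. The illustrative prose values listed earlier do not, so the real tables may be normalized differently, for instance per source part of speech; the description does not settle this.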

In the embodiment of the present invention, matching is performed first and semantics are analyzed afterwards, in contrast to the prior-art approach of segmenting the traffic data semantically and then matching. The invention therefore further provides a method for completing the preliminary matching, in which matching the keyword to be searched against the traffic data to obtain an initial matching result specifically includes: converting the keyword into codes to be searched in UTF-8, GB2312 and/or BIG5, and matching the traffic data of the target objects to be searched one by one against the codes to be searched to obtain the initial matching result.

The keywords for searching set forth in the above contents of the embodiment of the present invention generally mean that the keywords themselves have entry characteristics, and in an actual situation, the expression form of the keywords may also be entry combinations, even sentences and the like, and at this time, the keywords have a part-of-speech combination characteristic; as can be known from practice, in the traffic data of different attribute information, the proportions of different parts of speech combinations are greatly different, so that, in combination with the embodiment of the present invention, there is a possible improvement, as shown in fig. 8, when the number of words of the keyword exceeds a preset value (i.e. the keyword is not formed by a single entry by default), before the splitting of the context information content in the initial matching result according to a preset splitting rule is performed to obtain at least two sets of entry objects, the method includes:

in step 301, a part-of-speech combination for the keyword is obtained by matching according to a lexicon.

In the embodiment of the present invention, the functions of the word stock at least include determining a part of speech by matching, determining by matching that a run of characters is a complete entry, and determining by matching the existence probability of each entry when two or more entries are satisfied at the same time. The last function is particularly relevant to "looking for a word before X₁ or after Xₙ" in the embodiment of the invention, specifically when the search is set to complete only once no further match is found; if the search is instead set to complete on the first match, the situation in which two or more entries are satisfied at the same time does not arise.

In step 302, according to the part of speech combination, a weighted value in the process of calculating the probability of each group of vocabulary item objects corresponding to each attribute information is obtained.

The keywords are simply split, so that most scenes with ambiguous semantics can be solved; in general, the keyword itself set by the user is a whole (a word or a sentence) and has independent and complete meaning, but in few cases, the first character of the keyword is a part of other words, or the last character is a part of other words, such as the keyword "china", and then the person in the search strategy must consider "china" as a concept of a country, but in this case: "the concept of the country in this stellar culture does not exist at all" but "china" is not a word and is less likely to be the concept of the country. In rare cases, the front 2 words or the rear 2 words of the keywords belong to other words, and only a common scene needs to be considered, so that the logic is simple and easy to implement, and the performance loss is low.

In addition, the invention does not need to perform semantic analysis on the whole article or the whole sentence; it only determines the combination with the maximum probability according to the word attributes marked in the lexicon. Meanwhile, the probability value is weighted by combining the length of the keyword and the occurrence frequency of the keyword in the whole information, so that a very high semantic accuracy can be obtained.

Different information classifications have different word jump probability tables. Because semantic analysis is not performed on the whole article or sentence, the accuracy of the word jump probability table needs to be made as high as possible.
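The selection of the maximum-probability combination can be sketched as follows. The sketch rests on several assumptions that are not taken from the invention: the table is keyed by pairs of consecutive word attributes, the probability of a group is the product of its jump probabilities, and the weighting formula in `keyword_weight`, together with all table values and names, is hypothetical.

```python
from math import prod

# Hypothetical tables (assumption): one word jump probability table per
# information classification, keyed by consecutive word-attribute pairs.
JUMP_TABLES = {
    "news": {("noun", "noun"): 0.30, ("prep", "noun"): 0.60,
             ("noun", "verb"): 0.50},
    "finance": {("noun", "noun"): 0.45, ("prep", "noun"): 0.55,
                ("noun", "verb"): 0.40},
}


def group_probability(attributes, table, floor=0.01):
    """Multiply the jump probabilities over consecutive attribute pairs."""
    pairs = zip(attributes, attributes[1:])
    return prod(table.get(pair, floor) for pair in pairs)


def keyword_weight(keyword_length, occurrences, alpha=0.1, beta=0.05):
    """Hypothetical weighting by keyword length and occurrence frequency."""
    return 1.0 + alpha * keyword_length + beta * occurrences


def best_group(groups, classification, keyword_length, occurrences):
    """Return the name of the group with the highest weighted probability.

    groups -- list of (name, attribute sequence, keeps_keyword_whole)
    """
    table = JUMP_TABLES[classification]
    scored = []
    for name, attributes, keeps_keyword_whole in groups:
        probability = group_probability(attributes, table)
        if keeps_keyword_whole:
            # Only the whole-keyword reading receives the weighting.
            probability *= keyword_weight(keyword_length, occurrences)
        scored.append((probability, name))
    return max(scored)[1]
```

A call such as `best_group([("whole", ["prep", "noun", "verb"], True), ("split", ["noun", "noun", "verb"], False)], "news", 2, 3)` compares the weighted probability of the whole-keyword reading with that of the split reading under the "news" table. Because each classification has its own table, the same pair of groups may be ranked differently for a different classification.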

Example 2:

Fig. 9 is a schematic structural diagram of an apparatus for semantically searching for keywords according to an embodiment of the present invention. The apparatus for semantically searching for keywords of the present embodiment includes one or more processors 21 and a memory 22. In fig. 9, one processor 21 is taken as an example.

The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 9 illustrates the connection by a bus as an example.

The memory 22, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs and non-volatile computer-executable programs, such as the method for semantically searching for keywords in Example 1. The processor 21 performs the method for semantically searching for keywords by executing the non-volatile software programs and instructions stored in the memory 22.

The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method for semantically searching for keywords in Example 1 above, e.g., perform the various steps shown in fig. 1 and/or fig. 7 described above.

It should be noted that the information interaction between, and the execution processes of, the modules and units in the above apparatus and system are based on the same concept as the method embodiments of the present invention; for specific details, refer to the description of the method embodiments, which is not repeated here.

Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by a program instructing the associated hardware. The program may be stored on a computer-readable storage medium, which may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for searching keywords according to meanings, in which a keyword to be searched and flow data of a target object to be searched are obtained, and an initial matching result is obtained by matching the keyword to be searched against the flow data, wherein the initial matching result comprises context information content corresponding to the keyword to be searched in the flow data, the method being characterized by comprising the following steps:

screening the initial matching result according to the establishment probability of each group of entries to obtain a screened matching result;

the keyword to be searched is X₁, X₂, …, X_{n-1}, X_n, wherein X_i represents a character, i ∈ [1, n]; the preset splitting rule specifically comprises the following steps:

2. The method for semantic search of keywords according to claim 1, wherein searching for a word before the position X₁ or searching for a word after the position X_n is specifically implemented as follows:

wherein the starting reference object comprises said X₁ or said X_n.

3. The method of claim 1, wherein the step of screening the initial matching result according to the establishment probability of each group of entries to obtain a screened matching result comprises:

4. The method for semantic search of keywords according to claim 1, wherein the process of obtaining the initial matching results and the process of obtaining the filtered matching results are executed in parallel, and the method further comprises:

5. The method for searching keywords semantically according to claim 1, wherein when the target object to be searched is a web page, the attribute information of the target object to be searched is specifically one or more of a website topic type, a web page title content, and a web page text classification.

6. The method for semantic search of keywords according to claim 5, wherein the topic type of the website comprises one or more of news, finance, sports, entertainment, and general (comprehensive);

7. The method for semantic search of a keyword according to claim 1, wherein the word hop probability table is specifically:

8. The method for semantic search of keywords according to claim 1, wherein when the number of words of the keyword exceeds a preset value, before the splitting of the context information content in the initial matching result according to a preset splitting rule is performed to obtain at least two sets of entry objects, the method comprises:

9. An apparatus for semantically searching for a keyword, the apparatus comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the method of semantically searching for keywords according to any of claims 1-8.