CN117421418A - Text searching method and device based on keywords and electronic equipment - Google Patents

Text searching method and device based on keywords and electronic equipment

Info

Publication number
CN117421418A
Authority
CN
China
Prior art keywords
index
keyword
search
keywords
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311483219.7A
Other languages
Chinese (zh)
Inventor
张若晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311483219.7A priority Critical patent/CN117421418A/en
Publication of CN117421418A publication Critical patent/CN117421418A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures

Abstract

The application discloses a text searching method and device based on keywords and electronic equipment, and relates to the field of big data, the field of financial science and technology and other related technical fields. The text searching method based on the keywords comprises the following steps: obtaining N search keywords corresponding to a target user, determining the similarity between each search keyword in the N search keywords and each index keyword in the M index keywords, determining financial labels corresponding to each index keyword in the M index keywords and emotion labels corresponding to each index keyword, and determining search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial labels corresponding to each index keyword and the emotion labels corresponding to each index keyword. The method and the device solve the technical problem that the accuracy of the search results returned by text search based on keywords in the prior art is low.

Description

Text searching method and device based on keywords and electronic equipment
Technical Field
The application relates to the field of big data, the field of financial science and technology and other related technical fields, and in particular relates to a text searching method and device based on keywords and electronic equipment.
Background
With the development of big data technology, users rely on search engines more and more frequently. After a user enters search keywords into a search box, the search engine matches them against the index keywords in an index library. In the prior art, the weight of each index keyword is generally determined solely from the similarity between that index keyword and the search keywords, and the search result is then determined from this single similarity feature. As a result, keyword-based text search in the prior art returns search results of low accuracy.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The application provides a text searching method and device based on keywords and electronic equipment, and aims to at least solve the technical problem that in the prior art, the accuracy of a search result returned by text searching based on the keywords is low.
According to one aspect of the present application, there is provided a text search method based on keywords, including: acquiring N search keywords corresponding to a target user, wherein N is a positive integer; determining the similarity between each search keyword in N search keywords and each index keyword in M index keywords, wherein M is a positive integer; determining financial labels corresponding to each index keyword and emotion labels corresponding to each index keyword in M index keywords, wherein the financial labels are used for representing the types of financial products corresponding to the index keywords, and the emotion labels are used for representing the interest level of target users on the index keywords; and determining search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, and the financial label corresponding to each index keyword and the emotion label corresponding to each index keyword.
Optionally, the keyword-based text searching method further comprises: performing data cleaning on the search text that the target user inputs into the search box to obtain a target search text, wherein the data cleaning is used for unifying the text format of the search text; performing word segmentation on the target search text to obtain P words, wherein P is a positive integer greater than or equal to N; acquiring a first keyword weight corresponding to each of the P words, wherein the first keyword weight is used for representing the frequency of occurrence of the word in the target search text; acquiring a second keyword weight corresponding to each of the P words, wherein the second keyword weight is used for representing the frequency of occurrence of the word in a historical search text, and the historical search text is text input into the search box by the target user within a preset historical time period; and determining the N search keywords corresponding to the target user according to the first keyword weight and the second keyword weight corresponding to each of the P words.
Optionally, a weighted summation is performed on the first keyword weight and the second keyword weight corresponding to each of the P words to obtain a target keyword weight corresponding to the word; a first target word is determined not to be a search keyword, wherein the target keyword weight corresponding to the first target word is smaller than a preset keyword weight; and a second target word is determined to be a search keyword, wherein the target keyword weight corresponding to the second target word is greater than or equal to the preset keyword weight.
Optionally, R index texts included in the index library are obtained, wherein each index text includes S index keywords, R and S are positive integers, and S is less than or equal to M; an index word frequency corresponding to each of the M index keywords is determined, wherein the index word frequency is used for representing the proportion of the index keyword among the S index keywords included in each of the R index texts; an index inverse document frequency corresponding to each of the M index keywords is determined, wherein the index inverse document frequency is used for representing the ratio of the number of index texts that include the index keyword to the R index texts; and an index vector corresponding to each index keyword is determined according to the index word frequency and the index inverse document frequency corresponding to that index keyword.
Optionally, determining a vector inner product between a search vector corresponding to each search keyword and an index vector corresponding to each index keyword; determining the modular length of the search vector corresponding to each search keyword and the modular length of the index vector corresponding to each index keyword; and determining the similarity between each search keyword and each index keyword according to the ratio of the vector inner product corresponding to each search keyword to the target product, wherein the target product is obtained by multiplying the modular length corresponding to the search keyword and the modular length corresponding to the index keyword.
Optionally, obtaining a historical feedback result corresponding to the M index keywords, wherein the historical feedback result comprises browsing times and browsing time of the target user on index texts corresponding to each index keyword in the M index keywords; determining a first emotion score corresponding to each index keyword according to the browsing times of the index text corresponding to the index keyword, wherein the browsing times of the index text are in direct proportion to the first emotion score; determining a second emotion score corresponding to each index keyword according to the browsing time length of the index text corresponding to each index keyword, wherein the browsing time length of the index text is in direct proportion to the second emotion score; performing weighted average calculation on the first emotion scores and the second emotion scores corresponding to each index keyword to obtain target emotion scores corresponding to the index keywords; and determining the emotion label corresponding to each index keyword according to the target emotion score corresponding to the index keyword.
Optionally, a similarity score corresponding to each index keyword is determined according to the similarity between the index keyword and each search keyword, wherein the similarity score is in direct proportion to the similarity; a financial score corresponding to the financial label of each index keyword is determined, wherein the financial score is used for representing the user's degree of purchase preference for the financial product corresponding to the financial label; and the search results corresponding to the N search keywords are determined according to the similarity score, the financial score and the target emotion score corresponding to each of the S index keywords included in each index text.
Optionally, taking the sum of similarity scores corresponding to each of the S index keywords included in each index text as a first index score of the index text; taking the sum of financial scores corresponding to each index keyword in the S index keywords included in each index text as a second index score of the index text; taking the sum of target emotion scores corresponding to each index keyword in S index keywords contained in each index text as a third index score of the index text; performing weighted average calculation on the first index score, the second index score and the third index score corresponding to each index text to obtain a target index score corresponding to the index text; and sequencing the R index texts according to the target index scores corresponding to the index texts to obtain sequencing results, and taking the sequencing results corresponding to the R index texts as search results corresponding to the N search keywords.
According to another aspect of the present application, there is also provided a text search apparatus based on keywords, including: the acquisition unit is used for acquiring N search keywords corresponding to the target user, wherein N is a positive integer; a first determining unit, configured to determine a similarity between each of the N search keywords and each of the M index keywords, where M is a positive integer; the second determining unit is used for determining financial labels corresponding to each index keyword in the M index keywords and emotion labels corresponding to each index keyword, wherein the financial labels are used for representing the types of financial products corresponding to the index keywords, and the emotion labels are used for representing the interest level of a target user on the index keywords; and the third determining unit is used for determining the search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword and the emotion label corresponding to each index keyword.
According to another aspect of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the keyword-based text search method of any one of the above.
According to another aspect of the present application, there is also provided an electronic device, wherein the electronic device includes one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the keyword-based text search method of any of the above.
In the present application, N search keywords corresponding to a target user are first obtained, where N is a positive integer. Next, the similarity between each of the N search keywords and each of M index keywords is determined, where M is a positive integer. Then, the financial label and the emotion label corresponding to each of the M index keywords are determined, where the financial label characterizes the type of financial product corresponding to the index keyword and the emotion label characterizes the target user's level of interest in the index keyword. Finally, the search results corresponding to the N search keywords are determined according to the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword, and the emotion label corresponding to each index keyword.
As can be seen from the above, whereas the prior art determines the search result according to a single feature of the index keywords, the present application determines the search results corresponding to the N search keywords according to multiple features of the index keywords, namely the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword, and the emotion label corresponding to each index keyword, so that search results are determined according to the type of financial product and the user's interest preference information.
Therefore, by determining the search results corresponding to the N search keywords according to multiple features of the index keywords, the technical scheme of the application determines search results according to the type of financial product and the user's interest preference information. This improves the accuracy of the search results returned by keyword-based text search and thereby solves the technical problem of low accuracy of such search results in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of an alternative keyword-based text search method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method of determining the similarity between each search keyword and each index keyword according to an embodiment of the present application;
FIG. 3 is a flowchart of an alternative method of determining the emotion label corresponding to each index keyword, according to an embodiment of the present application;
FIG. 4 is a flow chart of an alternative method of determining search results according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative keyword-based text search apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be further noted that the information (including the historical feedback results corresponding to the user and the search keywords that the user enters into the search box) and the data (including but not limited to data for presentation and analyzed data) involved in the application are information and data authorized by the user or fully authorized by all parties. For example, an interface is provided between the system and the relevant user or institution; before acquiring the relevant information, the system sends an acquisition request to the user or institution through the interface, and acquires the relevant information only after receiving the consent fed back by the user or institution.
The present application is further illustrated below in conjunction with various embodiments.
Example 1
According to an embodiment of the present application, an embodiment of a keyword-based text search method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The present application provides a keyword-based text search system (abbreviated as the search system) for executing the keyword-based text search method of the present application. FIG. 1 is a flowchart of an alternative keyword-based text search method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, N search keywords corresponding to a target user are obtained.
In step S101, N is a positive integer.
Optionally, the search system first acquires the search text that the target user inputs into the search box, and then sequentially performs data cleaning, word segmentation and stop-word removal on the search text to obtain the N search keywords corresponding to the target user.
Step S102, determining the similarity between each search keyword in the N search keywords and each index keyword in the M index keywords.
In step S102, M is a positive integer.
Optionally, the search system first obtains a search vector corresponding to each search keyword, then obtains an index vector corresponding to each index keyword, and then determines the similarity between each search keyword and each index keyword according to the cosine distance, Euclidean distance, Manhattan distance, or cosine similarity between the search vector and the index vector.
Step S103, determining financial labels corresponding to each index keyword in the M index keywords and emotion labels corresponding to each index keyword.
In step S103, the financial tag is used to characterize the category of the financial product corresponding to the index keyword, and the emotion tag is used to characterize the interest level of the target user in the index keyword.
Optionally, the search system acquires the historical browsing times and the historical browsing time of the target user on the index text corresponding to each index keyword, so that the interest level of the target user on the index keywords is determined according to the historical feedback result (namely the historical browsing times and the historical browsing time of the index text) corresponding to each index keyword.
Step S104, determining the search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword, and the emotion label corresponding to each index keyword.
As can be seen from the above, whereas the prior art determines the search result according to a single feature of the index keywords, the present application determines the search results corresponding to the N search keywords according to multiple features of the index keywords, namely the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword, and the emotion label corresponding to each index keyword, so that search results are determined according to the type of financial product and the user's interest preference information.
Therefore, by determining the search results corresponding to the N search keywords according to multiple features of the index keywords, the technical scheme of the application determines search results according to the type of financial product and the user's interest preference information. This improves the accuracy of the search results returned by keyword-based text search and thereby solves the technical problem of low accuracy of such search results in the prior art.
In an alternative embodiment, the search system performs data cleaning on the search text that the target user inputs into the search box to obtain a target search text, wherein the data cleaning is used for unifying the text format of the search text. The search system then performs word segmentation on the target search text to obtain P words, wherein P is a positive integer greater than or equal to N. Next, it obtains a first keyword weight corresponding to each of the P words, wherein the first keyword weight is used for representing the frequency of occurrence of the word in the target search text, and a second keyword weight corresponding to each of the P words, wherein the second keyword weight is used for representing the frequency of occurrence of the word in the historical search text, and the historical search text is text input into the search box by the target user within a preset historical time period. Finally, it determines the N search keywords corresponding to the target user according to the first keyword weight and the second keyword weight corresponding to each of the P words.
Optionally, after performing word segmentation on the target search text to obtain the P words, the search system removes stop words from the P words to obtain Q first words, where Q is a positive integer less than or equal to P and the stop words include at least conjunctions, prepositions, pronouns and articles. The search system then determines the N search keywords according to the first keyword weight and the second keyword weight corresponding to each of the Q first words.
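The cleaning, segmentation and stop-word removal steps described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the stop-word list and the whitespace-based segmentation are all assumptions made for the example (Chinese text would need a dedicated word segmenter, which the source does not name).

```python
import re
import unicodedata

# Hypothetical stop-word list. The source only states that stop words include
# at least conjunctions, prepositions, pronouns and articles.
STOP_WORDS = {"and", "or", "but", "of", "in", "on", "for", "to", "it", "the", "a", "an"}


def clean_text(search_text):
    """Unify the text format: fold full-width forms, lowercase, drop punctuation."""
    text = unicodedata.normalize("NFKC", search_text).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation becomes a space
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def segment(target_text):
    """Assumed segmentation: split on whitespace to obtain the P words."""
    return target_text.split()


def remove_stop_words(words):
    """Drop stop words from the P words, leaving the Q first words."""
    return [w for w in words if w not in STOP_WORDS]


target = clean_text("  The  Fund, and the LOAN! ")
p_words = segment(target)
q_words = remove_stop_words(p_words)
print(target, p_words, q_words)
```

Here P is 5 and Q is 2, which is consistent with the constraint that Q is less than or equal to P.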
In an alternative embodiment, the search system performs weighted summation calculation on a first keyword weight and a second keyword weight corresponding to each word in the P words to obtain a target keyword weight corresponding to the word, then determines that the first target word is not a search keyword, wherein the target keyword weight corresponding to the first target word is smaller than a preset keyword weight, and then determines that the second target word is a search keyword, wherein the target keyword weight corresponding to the second target word is greater than or equal to the preset keyword weight.
Optionally, the search system sorts the P words in descending order of the target keyword weight corresponding to each word to obtain a sorting result, and selects the first N words in the sorting result as the N search keywords.
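As a rough sketch of the keyword selection described above, the following example computes the two frequency-based weights, combines them by weighted summation and keeps the words whose combined weight reaches a preset value. The coefficient `alpha`, the value of `threshold` and the use of relative frequency as the weight are assumptions chosen for illustration; the source does not specify them.

```python
from collections import Counter


def frequency_weights(words, vocabulary):
    """Relative frequency of each vocabulary word within `words` (0.0 if absent)."""
    counts = Counter(words)
    total = len(words)
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}


def select_keywords(current_words, history_words, alpha=0.6, threshold=0.2):
    """Return (search keywords in descending weight order, target weights).

    alpha and threshold are hypothetical values, not taken from the source.
    """
    vocabulary = list(dict.fromkeys(current_words))        # de-duplicate, keep order
    first = frequency_weights(current_words, vocabulary)   # first keyword weight
    second = frequency_weights(history_words, vocabulary)  # second keyword weight
    target = {w: alpha * first[w] + (1 - alpha) * second[w] for w in vocabulary}
    kept = [w for w in vocabulary if target[w] >= threshold]
    kept.sort(key=lambda w: target[w], reverse=True)
    return kept, target


keywords, weights = select_keywords(
    ["fund", "fund", "loan", "rate"],
    ["fund", "insurance", "fund", "loan"],
)
print(keywords, weights)
```

In this run "rate" falls below the assumed threshold and is discarded, while "fund" and "loan" are kept as search keywords.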
In an alternative embodiment, the search system firstly obtains R index texts included in the index library, where each index text includes S index keywords, R and S are positive integers, and S is less than or equal to M, secondly determines an index word frequency corresponding to each index keyword of the M index keywords, where the index word frequency is used to characterize a number ratio of the index keywords to the S index keywords included in each index text of the R index texts, and then determines an index inverse document frequency corresponding to each index keyword of the M index keywords, where the index inverse document frequency is used to characterize a number ratio of the index text including the index keyword of the R index texts to the R index texts, and then determines an index vector corresponding to the index keyword according to the index word frequency and the index inverse document frequency corresponding to each index keyword of the M index keywords.
Optionally, the search system determines a search word frequency corresponding to each of the N search keywords, where the search word frequency is used to characterize a number ratio of the search keywords among the P words, and then determines a search inverse document frequency corresponding to each of the N search keywords, where the search inverse document frequency is used to characterize a number ratio of the search text including the search keywords among the R historical search texts, R is a positive integer, and then determines a search vector corresponding to each of the N search keywords according to the search word frequency and the search inverse document frequency corresponding to the search keywords.
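One possible reading of the vector construction above is sketched below: each keyword receives a vector with one component per text, where the component is the keyword's term frequency in that text multiplied by an inverse document frequency. The source does not give the exact formulas or the vector layout, so the smoothed form `log(R / df) + 1` and the one-component-per-text layout are assumptions made for this example.

```python
import math


def tfidf_vectors(texts):
    """texts: list of R keyword lists. Returns {keyword: [tf-idf value per text]}."""
    r = len(texts)
    vocabulary = sorted({kw for text in texts for kw in text})
    vectors = {}
    for kw in vocabulary:
        # Number of texts that contain the keyword (always >= 1 here).
        df = sum(1 for text in texts if kw in text)
        # Assumed smoothed inverse document frequency.
        idf = math.log(r / df) + 1.0
        vectors[kw] = [
            (text.count(kw) / len(text) if text else 0.0) * idf  # tf * idf
            for text in texts
        ]
    return vectors


index_texts = [["fund", "fund", "loan"], ["loan", "insurance"], ["fund"]]
index_vectors = tfidf_vectors(index_texts)
print(index_vectors["fund"])
```

The same function could be applied to the historical search texts to obtain search vectors of the same length, provided both collections contain R texts as the passage above suggests.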
In an alternative embodiment, FIG. 2 is a flowchart of an alternative method of determining the similarity between each search keyword and each index keyword according to an embodiment of the present application. As shown in FIG. 2, the method comprises the following steps:
Step S201, a search vector corresponding to each of the N search keywords is obtained.
Step S202, determining the vector inner product between the search vector corresponding to each search keyword and the index vector corresponding to each index keyword.
Step S203, determining the modular length of the search vector corresponding to each search keyword and the modular length of the index vector corresponding to each index keyword.
Step S204, the similarity between each search keyword and each index keyword is determined according to the ratio of the vector inner product corresponding to each search keyword to the target product.
In step S204, the target product is obtained by multiplying the modulo length corresponding to the search keyword by the modulo length corresponding to the index keyword.
For example, assuming that the search vector corresponding to one of the N search keywords is vector A and the index vector corresponding to one of the M index keywords is vector B, the similarity between vector A and vector B is $(A \cdot B) / (\|A\| \, \|B\|)$, where $A \cdot B$ represents the inner product of vector A and vector B, $\|A\|$ represents the modular length of vector A, and $\|B\|$ represents the modular length of vector B.
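Steps S202 to S204 amount to the cosine similarity of the two vectors, which can be sketched in a few lines. The function name and the convention of returning 0.0 for a zero-length vector are assumptions for this illustration; the source does not say how a zero vector is handled.

```python
import math


def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their modular lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    inner = sum(x * y for x, y in zip(a, b))   # step S202
    norm_a = math.sqrt(sum(x * x for x in a))  # step S203
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return inner / (norm_a * norm_b)           # step S204


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```

For the vectors (3, 4) and (4, 3), the inner product is 24 and both modular lengths are 5, so the similarity is 24/25.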
In an alternative embodiment, FIG. 3 is a flowchart of an alternative method for determining the emotion label corresponding to each index keyword according to an embodiment of the present application. As shown in FIG. 3, the method includes the following steps:
Step S301, obtaining historical feedback results corresponding to the M index keywords.
In step S301, the history feedback result includes the browsing times and browsing durations of the target user on the index text corresponding to each of the M index keywords.
Step S302, determining a first emotion score corresponding to each index keyword according to the browsing times of the index text corresponding to the index keyword.
In step S302, the number of browses of the index text is proportional to the first emotion score.
Step S303, determining a second emotion score corresponding to each index keyword according to the browsing duration of the index text corresponding to the index keyword.
In step S303, the browsing duration of the index text is proportional to the second emotion score.
Step S304, weighted average calculation is carried out on the first emotion scores and the second emotion scores corresponding to each index keyword, and target emotion scores corresponding to the index keywords are obtained.
Step S305, determining emotion labels corresponding to the index keywords according to the target emotion scores corresponding to the index keywords.
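Steps S301 to S305 can be illustrated with the following sketch. The normalization by a maximum count and maximum duration, the equal weights, the two thresholds and the three label names are all hypothetical; the source only requires that each score be proportional to its input and that the two scores be combined by a weighted average.

```python
def emotion_label(view_count, view_seconds, max_count, max_seconds,
                  w_count=0.5, w_duration=0.5, thresholds=(0.33, 0.66)):
    """Return (target emotion score, emotion label) for one index keyword."""
    # First emotion score: proportional to the browsing count (step S302).
    first = view_count / max_count if max_count else 0.0
    # Second emotion score: proportional to the browsing duration (step S303).
    second = view_seconds / max_seconds if max_seconds else 0.0
    # Weighted average of the two scores (step S304).
    target = (w_count * first + w_duration * second) / (w_count + w_duration)
    # Map the target score to a label (step S305); label names are assumed.
    low, high = thresholds
    if target >= high:
        label = "high interest"
    elif target >= low:
        label = "medium interest"
    else:
        label = "low interest"
    return target, label


print(emotion_label(8, 300, 10, 600))
```

With 8 of at most 10 views and 300 of at most 600 seconds, the two scores are 0.8 and 0.5, and their equal-weight average falls in the assumed middle band.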
Optionally, the financial labels corresponding to the index keywords include, but are not limited to, a first label, a second label and a third label, wherein the first label indicates that the type of financial product corresponding to the index keyword is a fund product, the second label indicates that the type is a loan product, and the third label indicates that the type is an insurance product. The search system determines the financial score of the first label, the financial score of the second label and the financial score of the third label according to the sales corresponding to fund products, the sales corresponding to loan products and the sales corresponding to insurance products, respectively.
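The source does not state how sales are converted into financial scores. One simple possibility, shown here purely as an assumption, is to use each product type's share of total sales as the score of the corresponding label; the label names and sales figures below are made up for the example.

```python
def financial_scores(sales_by_label):
    """Map each financial label to its share of total sales (assumed scoring rule)."""
    total = sum(sales_by_label.values())
    if total == 0:
        return {label: 0.0 for label in sales_by_label}
    return {label: sales / total for label, sales in sales_by_label.items()}


# Hypothetical sales figures for the three product types named in the text.
scores = financial_scores({"fund": 50, "loan": 30, "insurance": 20})
print(scores)
```

Under this rule a product type that sells more receives a higher financial score, which matches the stated purpose of representing users' purchase preference.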
In an alternative embodiment, the search system first determines a similarity score corresponding to each index keyword according to the similarity between that index keyword and each search keyword, where the similarity score is proportional to the similarity. The search system then determines a financial score corresponding to the financial tag of each index keyword, where the financial score characterizes the user's purchase preference for the financial product corresponding to the financial tag. Finally, the search system determines the search results corresponding to the N search keywords according to the similarity score, the financial score, and the target emotion score corresponding to each of the S index keywords included in each index text.
In an alternative embodiment, FIG. 4 is a flowchart of an alternative method of determining search results according to an embodiment of the present application. As shown in FIG. 4, the method comprises the following steps:
Step S401, taking the sum of similarity scores corresponding to each of the S index keywords included in each index text as the first index score of the index text.
Step S402, taking the sum of financial scores corresponding to each index keyword in S index keywords included in each index text as a second index score of the index text.
Step S403, taking the sum of the target emotion scores corresponding to each of the S index keywords included in each index text as the third index score of the index text.
Step S404, performing weighted average calculation on the first index score, the second index score and the third index score corresponding to each index text to obtain a target index score corresponding to the index text.
Step S405, sorting R index texts according to the target index scores corresponding to each index text to obtain sorting results, and taking the sorting results corresponding to the R index texts as search results corresponding to N search keywords.
Optionally, after obtaining the target index score corresponding to each index text, the search system determines W target index texts from the R index texts according to those scores, where W is a positive integer less than or equal to R and the target index texts are the index texts whose target index scores are greater than or equal to a preset score threshold. The search system then sorts the W target index texts in descending order of target index score to obtain a ranking result, and uses the ranking result as the search result corresponding to the N search keywords.
As can be seen from the above, whereas the prior art determines search results according to a single feature of the index keywords, the present application determines the search results corresponding to the N search keywords according to multiple features of the index keywords. These features include the similarity between each search keyword and each index keyword, the financial tag corresponding to each index keyword, and the emotion tag corresponding to each index keyword, so that search results are determined according to the type of financial product and the user's interest preference information.
Therefore, by determining the search results corresponding to the N search keywords according to multiple features of the index keywords, the technical solution of the present application determines search results according to the type of financial product and the user's interest preference information. This improves the accuracy of the search results returned by keyword-based text search and solves the prior-art technical problem of low accuracy of such search results.
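The scoring and ranking of steps S401 to S405, together with the optional score threshold, can be sketched as follows. The weights, the dictionary-based data layout, and the function name are assumptions for illustration; the application does not fix any of them.

```python
def rank_index_texts(texts, sim, fin, emo, weights=(0.5, 0.25, 0.25), threshold=None):
    """Rank index texts by target index score.

    texts: {text id: list of index keywords in that text}
    sim, fin, emo: {index keyword: similarity / financial / target emotion score}
    weights and threshold are hypothetical parameters.
    """
    w1, w2, w3 = weights
    total_w = w1 + w2 + w3
    scored = []
    for text_id, keywords in texts.items():
        first = sum(sim.get(k, 0.0) for k in keywords)    # S401: first index score
        second = sum(fin.get(k, 0.0) for k in keywords)   # S402: second index score
        third = sum(emo.get(k, 0.0) for k in keywords)    # S403: third index score
        # S404: weighted average of the three index scores
        target = (w1 * first + w2 * second + w3 * third) / total_w
        # Optional filter: keep only texts at or above the preset threshold
        if threshold is None or target >= threshold:
            scored.append((text_id, target))
    # S405: sort in descending order of target index score
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

Calling the function without `threshold` ranks all R index texts, which corresponds to step S405; passing a threshold corresponds to the optional selection of W target index texts.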
Example 2
According to an embodiment of the present application, an embodiment of a keyword-based text search apparatus is provided. FIG. 5 is a schematic diagram of an alternative keyword-based text search apparatus according to an embodiment of the present application. As shown in FIG. 5, the keyword-based text search apparatus includes: a first acquisition unit 501, a first determination unit 502, a second determination unit 503, and a third determination unit 504.
Optionally, the first obtaining unit is configured to obtain N search keywords corresponding to the target user, where N is a positive integer, the first determining unit is configured to determine a similarity between each search keyword in the N search keywords and each index keyword in the M index keywords, where M is a positive integer, the second determining unit is configured to determine a financial tag corresponding to each index keyword in the M index keywords and an emotion tag corresponding to each index keyword, where the financial tag is used to characterize a category of a financial product corresponding to the index keyword, the emotion tag is used to characterize an interest level of the target user in the index keywords, and the third determining unit is configured to determine a search result corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial tag corresponding to each index keyword, and the emotion tag corresponding to each index keyword.
In an alternative embodiment, the first acquisition unit further comprises: a data cleaning subunit, a word segmentation subunit, a first acquisition subunit, a second acquisition subunit, and a first determination subunit.
Optionally, the data cleaning subunit is configured to perform data cleaning on the search text that the target user inputs into a search box to obtain a target search text, where the data cleaning unifies the text format of the search text. The word segmentation subunit is configured to perform word segmentation on the target search text to obtain P words, where P is a positive integer greater than or equal to N. The first acquisition subunit is configured to obtain a first keyword weight corresponding to each of the P words, where the first keyword weight characterizes the frequency with which the word occurs in the target search text. The second acquisition subunit is configured to obtain a second keyword weight corresponding to each of the P words, where the second keyword weight characterizes the frequency with which the word occurs in a historical search text, the historical search text being text that the target user input into the search box within a preset historical time period. The first determination subunit is configured to determine the N search keywords corresponding to the target user according to the first keyword weight and the second keyword weight corresponding to each of the P words.
In an alternative embodiment, the first determination subunit further comprises: a first calculation module, a first determination module, and a second determination module.
Optionally, the first calculation module is configured to perform a weighted summation of the first keyword weight and the second keyword weight corresponding to each of the P words to obtain a target keyword weight corresponding to the word. The first determination module is configured to determine that a first target word is not a search keyword, where the target keyword weight corresponding to the first target word is smaller than a preset keyword weight. The second determination module is configured to determine that a second target word is a search keyword, where the target keyword weight corresponding to the second target word is greater than or equal to the preset keyword weight.
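The keyword selection performed by these modules can be sketched in Python as follows. The sketch assumes that word segmentation has already produced lists of words, that both keyword weights are relative frequencies, and that the weighting coefficients `alpha` and `beta` and the preset keyword weight `min_weight` take the illustrative values shown; none of these values come from the application.

```python
from collections import Counter


def select_search_keywords(words, history_words, alpha=0.6, beta=0.4, min_weight=0.2):
    """Return the words whose target keyword weight reaches min_weight.

    words: the P words segmented from the target search text
    history_words: words segmented from the historical search text
    """
    current = Counter(words)
    history = Counter(history_words)
    n_current = len(words) or 1
    n_history = len(history_words) or 1
    selected = []
    for word in current:  # iterates in order of first occurrence
        first = current[word] / n_current    # first keyword weight
        second = history[word] / n_history   # second keyword weight (0 if unseen)
        target = alpha * first + beta * second  # weighted summation
        if target >= min_weight:
            selected.append(word)            # word is a search keyword
    return selected
```

A word that is rare in the current query and absent from the user's history falls below the preset weight and is dropped, while a word that recurs in the history can still qualify even if it appears only once in the current query.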
In an alternative embodiment, the keyword-based text search apparatus further includes: a second acquisition unit, a fourth determination unit, a fifth determination unit, and a sixth determination unit.
Optionally, the second acquisition unit is configured to obtain R index texts included in the index library, where each index text includes S index keywords, R and S are positive integers, and S is less than or equal to M. The fourth determination unit is configured to determine an index word frequency corresponding to each of the M index keywords, where the index word frequency characterizes the proportion of the S index keywords in each of the R index texts that the index keyword accounts for. The fifth determination unit is configured to determine an index inverse document frequency corresponding to each of the M index keywords, where the index inverse document frequency characterizes the proportion of the R index texts that include the index keyword. The sixth determination unit is configured to determine an index vector corresponding to each index keyword according to the index word frequency and the index inverse document frequency corresponding to that index keyword.
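A possible construction of the index vectors is sketched below. The application does not state how the index word frequency and the index inverse document frequency are combined; the sketch assumes an R-dimensional vector per index keyword whose r-th component is the product of the word frequency in the r-th index text and a smoothed logarithmic inverse document frequency. Both the vector layout and the smoothing are assumptions.

```python
import math


def index_vectors(index_texts):
    """Return {index keyword: list of R TF-IDF components}.

    index_texts: list of R index texts, each given as a list of index keywords.
    """
    r = len(index_texts)
    vocabulary = sorted({kw for text in index_texts for kw in text})
    vectors = {}
    for kw in vocabulary:
        # Number of index texts that contain the keyword
        df = sum(1 for text in index_texts if kw in text)
        # Smoothed inverse document frequency (assumed form, always positive)
        idf = math.log((1 + r) / (1 + df)) + 1.0
        # Word frequency in each text times the inverse document frequency
        vectors[kw] = [
            (text.count(kw) / len(text)) * idf if text else 0.0
            for text in index_texts
        ]
    return vectors
```

The smoothing keeps the inverse document frequency positive even for a keyword that appears in every index text, so no keyword that occurs in the index library is mapped to an all-zero vector.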
In an alternative embodiment, the first determination unit comprises: a third acquisition subunit, a second determination subunit, a third determination subunit, and a fourth determination subunit.
Optionally, the third acquisition subunit is configured to obtain a search vector corresponding to each of the N search keywords. The second determination subunit is configured to determine the vector inner product between the search vector corresponding to each search keyword and the index vector corresponding to each index keyword. The third determination subunit is configured to determine the modular length of the search vector corresponding to each search keyword and the modular length of the index vector corresponding to each index keyword. The fourth determination subunit is configured to determine the similarity between each search keyword and each index keyword according to the ratio of the corresponding vector inner product to a target product, where the target product is obtained by multiplying the modular length of the search vector by the modular length of the index vector.
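The ratio of the vector inner product to the product of the two modular lengths is the cosine similarity. A minimal Python sketch follows; it assumes that the modular length is the Euclidean norm and that a zero-length vector yields a similarity of 0, a convention the application does not specify.

```python
import math


def cosine_similarity(search_vec, index_vec):
    """Inner product divided by the product of the two modular lengths."""
    if len(search_vec) != len(index_vec):
        raise ValueError("vectors must have the same dimension")
    inner = sum(a * b for a, b in zip(search_vec, index_vec))
    norm_search = math.sqrt(sum(a * a for a in search_vec))
    norm_index = math.sqrt(sum(b * b for b in index_vec))
    target_product = norm_search * norm_index
    if target_product == 0.0:
        return 0.0  # assumed convention for zero vectors
    return inner / target_product
```

For non-negative vectors such as TF-IDF vectors the result lies between 0 and 1, with larger values indicating that the search keyword and the index keyword are distributed more alike.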
In an alternative embodiment, the second determining unit comprises: a fourth acquisition subunit, a fifth determination subunit, a sixth determination subunit, a calculation subunit, and a seventh determination subunit.
Optionally, the fourth obtaining subunit is configured to obtain a historical feedback result corresponding to the M index keywords, where the historical feedback result includes a browsing frequency and a browsing duration of an index text corresponding to each index keyword in the M index keywords, the fifth determining subunit is configured to determine a first emotion score corresponding to the index keyword according to the browsing frequency of the index text corresponding to each index keyword, where the browsing frequency of the index text is in direct proportion to the first emotion score, the sixth determining subunit is configured to determine a second emotion score corresponding to the index keyword according to the browsing duration of the index text corresponding to each index keyword, where the browsing duration of the index text is in direct proportion to the second emotion score, the calculating subunit is configured to perform weighted average calculation on the first emotion score and the second emotion score corresponding to each index keyword, to obtain a target emotion score corresponding to the index keyword, and the seventh determining subunit is configured to determine an emotion tag corresponding to the index keyword according to the target emotion score corresponding to each index keyword.
In an alternative embodiment, the third determining unit further comprises: an eighth determination subunit, a ninth determination subunit, and a tenth determination subunit.
Optionally, the eighth determining subunit is configured to determine a similarity score corresponding to each search keyword according to a similarity between each index keyword and each search keyword, where the similarity score is proportional to the similarity, and the ninth determining subunit is configured to determine a financial score corresponding to a financial tag of each index keyword, where the financial score is used to characterize a purchase preference degree of a user for a financial product corresponding to the financial tag, and the tenth determining subunit is configured to determine a search result corresponding to N search keywords according to the similarity score, the financial score, and the target emotion score corresponding to each index keyword in the S index keywords included in each index text.
In an alternative embodiment, the tenth determination subunit further comprises: a third determination module, a fourth determination module, a fifth determination module, a second calculation module, and a sorting module.
Optionally, the third determining module is configured to use a sum of similarity scores corresponding to each of the S index keywords included in each index text as a first index score of the index text, the fourth determining module is configured to use a sum of financial scores corresponding to each of the S index keywords included in each index text as a second index score of the index text, and the fifth determining module is configured to use a sum of target emotion scores corresponding to each of the S index keywords included in each index text as a third index score of the index text; the second calculation module is used for carrying out weighted average calculation on the first index score, the second index score and the third index score corresponding to each index text to obtain a target index score corresponding to the index text, and the ordering module is used for ordering R index texts according to the target index score corresponding to each index text to obtain an ordering result, and the ordering result corresponding to the R index texts is used as a search result corresponding to N search keywords.
In the present application, N search keywords corresponding to a target user are first obtained, where N is a positive integer. Next, the similarity between each of the N search keywords and each of M index keywords is determined, where M is a positive integer. Then, the financial tag and the emotion tag corresponding to each of the M index keywords are determined, where the financial tag characterizes the type of financial product corresponding to the index keyword and the emotion tag characterizes the target user's level of interest in the index keyword. Finally, the search results corresponding to the N search keywords are determined according to the similarity between each search keyword and each index keyword, the financial tag corresponding to each index keyword, and the emotion tag corresponding to each index keyword.
As can be seen from the above, whereas the prior art determines search results according to a single feature of the index keywords, the present application determines the search results corresponding to the N search keywords according to multiple features of the index keywords. These features include the similarity between each search keyword and each index keyword, the financial tag corresponding to each index keyword, and the emotion tag corresponding to each index keyword, so that search results are determined according to the type of financial product and the user's interest preference information.
Therefore, by determining the search results corresponding to the N search keywords according to multiple features of the index keywords, the technical solution of the present application determines search results according to the type of financial product and the user's interest preference information. This improves the accuracy of the search results returned by keyword-based text search and solves the prior-art technical problem of low accuracy of such search results.
Example 3
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium including a stored computer program, where the computer program, when executed, controls the device in which the computer-readable storage medium is located to perform any of the keyword-based text search methods of Example 1 above.
Example 4
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the keyword-based text search methods of Example 1 above by executing the executable instructions.
FIG. 6 is a schematic diagram of an alternative electronic device according to an embodiment of the present application. As shown in FIG. 6, the embodiment of the present application provides an electronic device that includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor implements any of the keyword-based text search methods of Example 1 above when executing the program.
The foregoing embodiment numbers of the present application are for description only and do not indicate that any embodiment is better or worse than another.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis. For parts that are not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (11)

1. A keyword-based text search method, comprising:
acquiring N search keywords corresponding to a target user, wherein N is a positive integer;
determining the similarity between each search keyword in the N search keywords and each index keyword in M index keywords, wherein M is a positive integer;
determining a financial tag corresponding to each index keyword and an emotion tag corresponding to each index keyword in the M index keywords, wherein the financial tag is used for representing the category of financial products corresponding to the index keywords, and the emotion tag is used for representing the interest level of the target user in the index keywords;
and determining search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, and the financial label corresponding to each index keyword and the emotion label corresponding to each index keyword.
2. The keyword-based text search method of claim 1, wherein obtaining N search keywords corresponding to a target user comprises:
performing data cleaning on the search text input by the target user into a search box to obtain a target search text, wherein the data cleaning is used for unifying the text format of the search text;
performing word segmentation on the target search text to obtain P words, wherein P is a positive integer greater than or equal to N;
acquiring a first keyword weight corresponding to each word in the P words, wherein the first keyword weight corresponding to each word is used for representing the frequency of occurrence of the word in the target search text;
acquiring a second keyword weight corresponding to each word in the P words, wherein the second keyword weight corresponding to each word is used for representing the frequency of occurrence of the word in a historical search text, and the historical search text is a text input into the search box by the target user within a preset historical time period;
and determining N search keywords corresponding to the target user according to the first keyword weight and the second keyword weight corresponding to each word in the P words.
3. The keyword-based text search method of claim 2, wherein determining N search keywords corresponding to the target user according to the first keyword weight and the second keyword weight corresponding to each word of the P words comprises:
performing a weighted summation of the first keyword weight and the second keyword weight corresponding to each word in the P words to obtain a target keyword weight corresponding to the word;
determining that a first target word is not the search keyword, wherein a target keyword weight corresponding to the first target word is smaller than a preset keyword weight;
and determining that the second target word is the search keyword, wherein a target keyword weight corresponding to the second target word is greater than or equal to the preset keyword weight.
4. The keyword-based text search method of claim 1, wherein prior to determining the similarity between each of the N search keywords and each of the M index keywords, the keyword-based text search method further comprises:
acquiring R index texts included in an index library, wherein each index text comprises S index keywords, R and S are positive integers, and S is smaller than or equal to M;
determining an index word frequency corresponding to each index keyword in the M index keywords, wherein the index word frequency is used for representing the proportion that the index keyword accounts for among the S index keywords included in each index text in the R index texts;
determining an index inverse document frequency corresponding to each index keyword in the M index keywords, wherein the index inverse document frequency is used for representing the proportion of index texts that include the index keyword among the R index texts;
and determining an index vector corresponding to each index keyword according to the index word frequency and the index inverse document frequency corresponding to each index keyword in the M index keywords.
5. The keyword-based text search method of claim 1, wherein determining a similarity between each of the N search keywords and each of the M index keywords comprises:
acquiring a search vector corresponding to each search keyword in the N search keywords;
determining the vector inner product between the search vector corresponding to each search keyword and the index vector corresponding to each index keyword;
determining the modular length of the search vector corresponding to each search keyword and the modular length of the index vector corresponding to each index keyword;
and determining the similarity between each search keyword and each index keyword according to the ratio of the vector inner product corresponding to each search keyword to a target product, wherein the target product is obtained by multiplying the modular length corresponding to the search keyword and the modular length corresponding to the index keyword.
6. The keyword-based text search method of claim 4, wherein determining an emotion tag corresponding to each of the M index keywords comprises:
acquiring historical feedback results corresponding to the M index keywords, wherein the historical feedback results comprise the browsing times and browsing duration of the target user for the index text corresponding to each index keyword in the M index keywords;
determining a first emotion score corresponding to each index keyword according to the browsing times of the index text corresponding to the index keyword, wherein the browsing times of the index text are in direct proportion to the first emotion score;
determining a second emotion score corresponding to each index keyword according to the browsing duration of the index text corresponding to the index keyword, wherein the browsing duration of the index text is in direct proportion to the second emotion score;
performing weighted average calculation on the first emotion scores and the second emotion scores corresponding to each index keyword to obtain target emotion scores corresponding to the index keywords;
and determining the emotion label corresponding to each index keyword according to the target emotion score corresponding to each index keyword.
7. The keyword-based text search method of claim 6, wherein determining the search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial tag corresponding to each index keyword, and the emotion tag corresponding to each index keyword, comprises:
determining a similarity score corresponding to each search keyword according to the similarity between each index keyword and each search keyword, wherein the similarity score is in direct proportion to the similarity;
determining financial scores corresponding to financial labels of the index keywords, wherein the financial scores are used for representing purchase preference degrees of users on financial products corresponding to the financial labels;
and determining search results corresponding to the N search keywords according to the similarity scores, the financial scores and the target emotion scores corresponding to each of the S index keywords included in each index text.
8. The keyword-based text search method of claim 7, wherein determining the search results corresponding to the N search keywords according to the similarity score, the financial score, and the target emotion score corresponding to each of the S index keywords included in each of the index texts, comprises:
taking the sum of similarity scores corresponding to each index keyword in S index keywords contained in each index text as a first index score of the index text;
taking the sum of financial scores corresponding to each index keyword in the S index keywords contained in each index text as a second index score of the index text;
taking the sum of target emotion scores corresponding to each index keyword in the S index keywords contained in each index text as a third index score of the index text;
performing weighted average calculation on the first index score, the second index score and the third index score corresponding to each index text to obtain a target index score corresponding to the index text;
and sequencing the R index texts according to the target index scores corresponding to the index texts to obtain sequencing results, and taking the sequencing results corresponding to the R index texts as search results corresponding to the N search keywords.
9. A keyword-based text search apparatus, comprising:
the first acquisition unit is used for acquiring N search keywords corresponding to a target user, wherein N is a positive integer;
a first determining unit, configured to determine a similarity between each of the N search keywords and each of M index keywords, where M is a positive integer;
a second determining unit, configured to determine a financial label corresponding to each index keyword and an emotion label corresponding to each index keyword in the M index keywords, where the financial label is used to characterize a category of a financial product corresponding to the index keyword, and the emotion label is used to characterize an interest level of the target user in the index keyword;
and a third determining unit, configured to determine search results corresponding to the N search keywords according to the similarity between each search keyword and each index keyword, the financial label corresponding to each index keyword, and the emotion label corresponding to each index keyword.
10. A computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and wherein the computer program when executed controls a device in which the computer readable storage medium is located to perform the keyword-based text search method according to any one of claims 1 to 8.
11. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the keyword-based text search method of any of claims 1 to 8.
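The scoring and ranking steps recited in claim 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `KeywordScores`, `target_index_score` and `rank_index_texts`, the example weights, the normalization by the sum of the weights, and the descending sort order are all assumptions made for illustration. The claim only states that three per-text sums are combined by a weighted average and that the index texts are then sorted by the result.

```python
from dataclasses import dataclass


@dataclass
class KeywordScores:
    """Per-keyword scores for one index keyword (hypothetical container)."""
    similarity: float  # similarity score of the index keyword
    financial: float   # financial score of the keyword's financial label
    emotion: float     # target emotion score of the keyword


def target_index_score(keywords, weights=(0.5, 0.3, 0.2)):
    """Combine the S keywords of one index text into a target index score.

    The weights are illustrative assumptions; the claim does not fix them.
    """
    # First, second and third index scores: plain sums over the keywords.
    first = sum(k.similarity for k in keywords)
    second = sum(k.financial for k in keywords)
    third = sum(k.emotion for k in keywords)
    w1, w2, w3 = weights
    # Weighted average, normalized by the sum of the weights (assumption).
    return (w1 * first + w2 * second + w3 * third) / (w1 + w2 + w3)


def rank_index_texts(index_texts, weights=(0.5, 0.3, 0.2)):
    """Return index-text ids sorted by target index score.

    `index_texts` maps a text id to its list of KeywordScores.
    Highest score first is an assumption; the claim only says "sorting".
    """
    scores = {
        text_id: target_index_score(keywords, weights)
        for text_id, keywords in index_texts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    example = {
        "text_a": [KeywordScores(0.9, 0.2, 0.1), KeywordScores(0.4, 0.6, 0.3)],
        "text_b": [KeywordScores(0.2, 0.1, 0.0)],
    }
    print(rank_index_texts(example))
```

Because each index score is a sum rather than a mean, an index text containing more matching keywords tends to score higher under this sketch; whether that is intended depends on how S is chosen for each text.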
CN202311483219.7A 2023-11-08 2023-11-08 Text searching method and device based on keywords and electronic equipment Pending CN117421418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311483219.7A CN117421418A (en) 2023-11-08 2023-11-08 Text searching method and device based on keywords and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311483219.7A CN117421418A (en) 2023-11-08 2023-11-08 Text searching method and device based on keywords and electronic equipment

Publications (1)

Publication Number Publication Date
CN117421418A true CN117421418A (en) 2024-01-19

Family

ID=89528191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311483219.7A Pending CN117421418A (en) 2023-11-08 2023-11-08 Text searching method and device based on keywords and electronic equipment

Country Status (1)

Country Link
CN (1) CN117421418A (en)

Similar Documents

Publication Publication Date Title
US9460117B2 (en) Image searching
CN110019669B (en) Text retrieval method and device
CN110046303B (en) Information recommendation method and device based on demand matching platform
US8290925B1 (en) Locating product references in content pages
CN111061954B (en) Search result sorting method and device and storage medium
CN113688313A (en) Training method of prediction model, information pushing method and device
CN112070550A (en) Keyword determination method, device and equipment based on search platform and storage medium
CN107391535A (en) Method and device for searching documents in a document application
CN117033744A (en) Data query method and device, storage medium and electronic equipment
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
CN117421418A (en) Text searching method and device based on keywords and electronic equipment
CN106886546B (en) Construction method and equipment of data website
CN114676677A (en) Information processing method, information processing apparatus, server, and storage medium
CN114840762A (en) Recommended content determining method and device and electronic equipment
CN110210030B (en) Statement analysis method and device
CN113590805A (en) Method and device for searching textile commodity names based on knowledge graph
CN114661958A (en) Tree structure data searching method and device, electronic equipment and storage medium
CN110019771B (en) Text processing method and device
CN111144098A (en) Recall method and device for expanded question sentence
CN112561744A (en) Method and device for generating similar case retrieval report
CN115499400B (en) Information processing method and device, electronic device and computer storage medium
CN116610869B (en) Recommended content management method and device, electronic equipment and storage medium
CN110781365A (en) Commodity searching method, device and system and electronic equipment
CN114218259B (en) Multi-dimensional scientific information search method and system based on big data SaaS
CN117591624B (en) Test case recommendation method based on semantic index relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination