CN117743376B

CN117743376B - Big data mining method, device and storage medium for digital financial service

Info

Publication number: CN117743376B
Application number: CN202410183608.6A
Authority: CN
Inventors: 张钰琨
Original assignee: Blue Flame Technology Chengdu Co ltd
Current assignee: Blue Flame Technology Chengdu Co ltd
Priority date: 2024-02-19
Filing date: 2024-02-19
Publication date: 2024-05-03
Anticipated expiration: 2044-02-19
Also published as: CN117743376A

Abstract

The invention belongs to the technical field of big data mining and discloses a big data mining method, device and storage medium for digital financial services. The invention uses big data mining technology to mine a user's recent financial service search records and performs keyword extraction on those records to obtain keywords of the financial services that interest the user. The invention also provides a keyword expansion step that yields expanded keywords for the financial service keywords. Based on the financial service keywords and the expanded keywords, a plurality of financial services of interest to the user can then be matched. Compared with the conventional technology, the invention improves both the diversity and the accuracy of recommendation, offers users more diversified choices, and is suitable for large-scale application and popularization.

Description

Big data mining method, device and storage medium for digital financial service

Technical Field

The invention belongs to the technical field of big data mining, and particularly relates to a big data mining method, a device and a storage medium for digital financial services.

Background

With rising living standards, more and more users purchase digital financial services for financial investment or insurance. Digital financial services combine Internet and information technology with traditional financial service models (for example, online loans, online insurance and online funds) and have developed rapidly as a new generation of financial services. At present, a financial system mainly receives financial service requests from users passively and processes the corresponding transaction based on the financial service product named in the request (for example, the user selects an insurance service according to his or her own needs and then consults the sales personnel of the corresponding insurance company to complete the purchase). The financial system may also recommend similar products to a user based on the financial service products the user has already purchased, so as to promote sales. However, this recommendation approach has the following disadvantage: because the system can only receive user requests passively, or recommend similar financial products piecemeal to users who have already made a purchase, the recommended financial service products lack variety and cannot offer users more diversified choices. Therefore, how to provide a big data mining method for digital financial services that can recommend diversified financial service products to users has become a problem to be solved.

Disclosure of Invention

The invention aims to provide a big data mining method, device and storage medium for digital financial services, so as to solve the problem in the prior art that the recommended financial service products lack variety and cannot offer users more diversified choices.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, a big data mining method for digital financial services is provided, including:

acquiring a web search corpus of a target user, wherein the web search corpus comprises financial service search records of the target user in a preset historical time period;

classifying the web search corpus, so as to take the financial service search records that are search sentences as texts to be segmented and the financial service search records that are search keywords as first keywords;

extracting keywords from each text to be segmented to obtain second keywords of each text to be segmented;

determining financial search keywords based on each first keyword and each second keyword, and performing keyword mining processing on the financial search keywords to obtain at least one expanded keyword;

performing financial service matching processing according to the financial search keywords and the at least one expanded keyword, so as to match a first financial service corresponding to the financial search keywords and a second financial service corresponding to each expanded keyword;

and taking the first financial service and the second financial service as recommended financial services, and pushing the recommended financial services to the user terminal corresponding to the target user.

Based on the above disclosure, the invention first crawls the financial service search records of the target user in a preset historical time period and generates a web search corpus from the crawled records. The web search corpus is then classified, so that the search records in the corpus that are search keywords serve as first keywords and the search records that are search sentences serve as texts to be segmented. Next, keywords are extracted from each text to be segmented to obtain its second keywords. The first keywords and the second keywords are thus the keywords corresponding to the financial services that interest the user, and financial service products can be matched on that basis. Specifically, the financial search keywords are determined from the first keywords and the second keywords, and the financial search keywords are further mined to obtain expanded keywords. Finally, financial services are matched based on the keywords of the financial services of interest to the user and their corresponding expanded keywords, so that the financial services of interest to the user are obtained and pushed to the target user.

Through the above design, the invention uses big data mining technology to mine the user's recent financial service search records and obtains, through keyword extraction, the keywords of the financial services that interest the user. The invention also provides a keyword expansion step that yields expanded keywords for the financial service keywords. Based on the financial service keywords and the expanded keywords, a plurality of financial services of interest to the user can then be matched. Compared with the conventional technology, the invention improves both the diversity and the accuracy of recommendation, offers users more diversified choices, and is suitable for large-scale application and popularization.

In one possible design, performing keyword extraction processing on each text to be segmented to obtain a second keyword of each text to be segmented includes:

for any text to be segmented, performing word segmentation processing on the text to be segmented to obtain a plurality of segmented words;

constructing a semantic network graph of the text to be segmented by using the plurality of segmented words, wherein the semantic network graph comprises a plurality of nodes, each node corresponds to one of the plurality of segmented words, and when the segmented words corresponding to any two nodes have an association relationship, the two nodes are connected by an association edge;

calculating the relevance of each of the plurality of segmented words relative to the text to be segmented based on each text to be segmented and the semantic network graph;

and determining the second keyword of the text to be segmented from the plurality of segmented words according to the relevance of each segmented word relative to the text to be segmented.

In one possible design, calculating, based on each text to be segmented and the semantic network graph, the relevance of each of the plurality of segmented words relative to the text to be segmented includes:

for any node in the semantic network graph, calculating the term frequency-inverse document frequency (TF-IDF) value of the segmented word corresponding to the node based on each text to be segmented;

calculating, according to the semantic network graph, the importance and the clustering contribution of the segmented word corresponding to the node in the semantic network graph;

and calculating the relevance of the segmented word corresponding to the node relative to the text to be segmented according to the TF-IDF value of the segmented word corresponding to the node and the importance and clustering contribution of that segmented word in the semantic network graph.

In one possible design, calculating, based on each text to be segmented, the term frequency-inverse document frequency (TF-IDF) value of the segmented word corresponding to the node includes:

calculating the TF-IDF value of the segmented word corresponding to the node according to the following formula (1);

$$W_{t,d} = \frac{n_{t,d}}{\max_{k} n_{k,d}} \times \log\frac{N}{N_t} \tag{1}$$

In the above formula (1), $W_{t,d}$ represents the TF-IDF value of the segmented word $t$ corresponding to the node, $n_{t,d}$ represents the number of occurrences of the segmented word $t$ in the text to be segmented $d$, $\max_{k} n_{k,d}$ represents the number of occurrences of the most frequent segmented word in the text to be segmented $d$, $N_t$ represents the number of texts to be segmented that contain the segmented word $t$, and $N$ represents the total number of texts to be segmented.

In one possible design, calculating, according to the semantic network graph, the importance and the clustering contribution of the segmented word corresponding to the node in the semantic network graph includes:

calculating the importance of the segmented word corresponding to the node in the semantic network graph according to the following formula (2), and calculating the clustering contribution of the segmented word corresponding to the node in the semantic network graph according to the following formula (3);

$$I = \bar{L}_1 - \bar{L}_2 \tag{2}$$

In the above formula (2), $I$ represents the importance of the segmented word corresponding to the node in the semantic network graph, $\bar{L}_1$ represents the average length of the first target paths in the semantic network graph, and $\bar{L}_2$ represents the average length of the second target paths in the semantic network graph, wherein a first target path is a path that contains the node and a second target path is a path that does not contain the node;

$$C = \frac{1}{n}\sum_{i=1}^{n} c_i - \frac{1}{k}\sum_{j=1}^{k} c'_j \tag{3}$$

In the above formula (3), $C$ represents the clustering contribution of the segmented word corresponding to the node in the semantic network graph, $c_i$ represents the clustering coefficient of the segmented word corresponding to the i-th node in the semantic network graph, and $c'_j$ represents the clustering coefficient of the segmented word corresponding to the j-th node in the target semantic network graph, wherein $c_i$ is determined by $q$ and $v$, $q$ represents the number of nodes directly connected to the i-th node in the semantic network graph, $v$ represents the total number of the association edges between the i-th node and the target nodes and the association edges among the target nodes in the semantic network graph, the target nodes are the nodes directly connected to the i-th node, $n$ is the total number of nodes in the semantic network graph, $k$ is the total number of nodes in the target semantic network graph, and the target semantic network graph is the semantic network graph after the node is deleted;

Correspondingly, calculating the relevance of the segmented word corresponding to the node relative to the text to be segmented, according to the term frequency-inverse document frequency (TF-IDF) value of that segmented word and its importance and clustering contribution in the semantic network graph, includes:

calculating the relevance of the segmented word corresponding to the node relative to the text to be segmented according to the following formula (4);

$$R = \alpha W_{t,d} + \beta I + \gamma C \tag{4}$$

In the above formula (4), $R$ represents the relevance of the segmented word corresponding to the node relative to the text to be segmented, $W_{t,d}$ represents the TF-IDF value of the segmented word corresponding to the node, $I$ and $C$ represent the importance and the clustering contribution of that segmented word in the semantic network graph, and $\alpha$, $\beta$ and $\gamma$ represent adjustment factors.

In one possible design, determining the second keyword of the text to be segmented from the plurality of segmented words according to the relevance of each segmented word relative to the text to be segmented includes:

sorting the segmented words in descending order of relevance to obtain an initial keyword sequence;

performing, based on the text to be segmented, keyword merging processing on the initial keywords in the initial keyword sequence to obtain a plurality of candidate keywords;

and selecting the top p candidate keywords from the plurality of candidate keywords as the second keywords of the text to be segmented, wherein p is a positive integer greater than 1.
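The sorting and top-p selection just described can be sketched in Python as follows. This is a minimal illustration rather than the claimed implementation: the function name `select_second_keywords` is hypothetical, and because the merging rule is not specified in this passage, the merge step is left as an optional callback that by default changes nothing.

```python
def select_second_keywords(relevance, p=2, merge=None):
    """Pick the top-p candidate keywords from a {segmented word: relevance} mapping.

    merge -- optional function taking the sorted initial keyword list and
             returning the candidate list (hypothetical hook; the actual
             merging rule is not given in this passage).
    """
    # Initial keyword sequence: relevance from large to small,
    # ties broken alphabetically so the order is deterministic.
    initial = sorted(relevance, key=lambda word: (-relevance[word], word))
    # Keyword merging is pluggable; by default nothing is merged.
    candidates = merge(initial) if merge else initial
    # Keep the first p candidates as second keywords.
    return candidates[:p]


scores = {"loan": 0.9, "rate": 0.7, "apply": 0.2}
second_keywords = select_second_keywords(scores, p=2)
```

With the toy scores above, the two highest-scoring segmented words are kept and the lowest-scoring one is dropped.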

In one possible design, performing keyword mining processing on the financial search keyword to obtain at least one expanded keyword includes:

acquiring an expansion word database and the semantic vector of the financial search keyword, wherein the expansion word database stores a large number of expansion words and the semantic vector of each expansion word;

calculating the vector distance between the semantic vector of the financial search keyword and the semantic vector of each expansion word in the expansion word database;

and determining at least one expanded keyword of the financial search keyword according to the vector distance between the semantic vector of the financial search keyword and the semantic vector of each expansion word.
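A minimal Python sketch of this expansion step is shown below. Several things are assumptions made for illustration: the text does not say which vector distance is used, so cosine distance is chosen here; the selection rule is taken to be "the `top_k` nearest expansion words"; and the three-dimensional vectors and the names `cosine_distance`, `expand_keyword` and `expansion_db` are invented toy values, not data from the source.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def expand_keyword(query_vec, expansion_db, top_k=2):
    """Return the top_k expansion words closest to the query vector."""
    ranked = sorted(
        expansion_db,
        key=lambda word: cosine_distance(query_vec, expansion_db[word]),
    )
    return ranked[:top_k]


# Toy expansion word database: word -> semantic vector (illustrative values).
expansion_db = {
    "mortgage": [0.9, 0.1, 0.0],
    "credit line": [0.8, 0.3, 0.1],
    "pet insurance": [0.0, 0.2, 0.9],
}
loan_vec = [1.0, 0.2, 0.0]  # assumed semantic vector of the keyword "loan"
expanded = expand_keyword(loan_vec, expansion_db, top_k=2)
```

A distance threshold could be used instead of a fixed `top_k`; either choice keeps only the expansion words whose vectors lie close to the financial search keyword.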

In a second aspect, a big data mining device for digital financial services is provided, comprising:

a crawling unit, configured to acquire a web search corpus of a target user, wherein the web search corpus comprises financial service search records of the target user in a preset historical time period;

a classification unit, configured to classify the web search corpus, so as to take the financial service search records that are search sentences as texts to be segmented and the financial service search records that are search keywords as first keywords;

a keyword extraction unit, configured to extract keywords from each text to be segmented to obtain second keywords of each text to be segmented;

a keyword expansion unit, configured to determine financial search keywords based on the first keywords and the second keywords, and to perform keyword mining processing on the financial search keywords to obtain at least one expanded keyword;

a financial service matching unit, configured to perform financial service matching processing according to the financial search keywords and the at least one expanded keyword, so as to match a first financial service corresponding to the financial search keywords and a second financial service corresponding to each expanded keyword;

and a pushing unit, configured to take the first financial service and the second financial service as recommended financial services and push the recommended financial services to the user terminal corresponding to the target user.

In a third aspect, another big data mining device for digital financial services is provided. Taking an electronic device as an example, the device includes a memory, a processor and a transceiver that are communicatively connected in sequence, wherein the memory is configured to store a computer program, the transceiver is configured to send and receive messages, and the processor is configured to read the computer program and execute the big data mining method for digital financial services according to the first aspect or any one of the possible designs of the first aspect.

In a fourth aspect, a storage medium is provided, on which instructions are stored; when the instructions are run on a computer, the big data mining method for digital financial services according to the first aspect or any one of the possible designs of the first aspect is performed.

The beneficial effects are that:

The invention uses big data mining technology to mine the user's recent financial service search records and obtains, through keyword extraction, the keywords of the financial services that interest the user. The invention also provides a keyword expansion step that yields expanded keywords for the financial service keywords. Based on the financial service keywords and the expanded keywords, a plurality of financial services of interest to the user can then be matched. Compared with the conventional technology, the invention improves both the diversity and the accuracy of recommendation, offers users more diversified choices, and is suitable for large-scale application and popularization.

Drawings

FIG. 1 is a schematic flow chart of steps of a big data mining method for digital financial services according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a big data mining device for digital financial services according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the present invention is briefly described below with reference to the accompanying drawings and the embodiments. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort. It should be noted that the description of these embodiments is intended to aid understanding of the present invention and does not limit it.

Examples:

Referring to FIG. 1, the big data mining method for digital financial services provided in this embodiment crawls the user's financial service search records from a recent time period, extracts the keywords of the financial services that interest the user, and expands the extracted keywords to obtain expanded keywords. Finally, the financial services of interest to the user are matched according to the extracted keywords and their expanded keywords, and the services are pushed to the user. Compared with the conventional technology, the method improves both the diversity and the accuracy of recommendation, offers users more diversified choices, and is suitable for large-scale application and popularization. In this embodiment, the method may run, for example but not limited to, on the digital financial server side; it is to be understood that this execution subject does not limit the embodiments of the present application. Accordingly, the steps of the method may be, but are not limited to, as shown in the following steps S1 to S6.

S1, acquiring a web search corpus of a target user, wherein the web search corpus comprises financial service search records of the target user in a preset historical time period; in this embodiment, the financial service search records of the target user may be crawled by, but not limited to, web crawler technology, and the crawling sources may be, but are not limited to, search engines, websites, financial apps and the like; the crawling period may be, but is not limited to, the month, half month or week before the current time; meanwhile, after a plurality of search records and/or logs have been crawled, the data can be cleaned and screened to obtain the financial service search records of the target user in the preset historical time period, and the resulting search records are then used to form the web search corpus.
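The cleaning and screening part of step S1 can be illustrated with the short Python sketch below. It assumes the crawler has already produced `(timestamp, query)` pairs; the crawling itself and the screening for financial relevance are not shown. The function name `build_corpus`, the 30-day default window and the de-duplication rule are illustrative assumptions rather than details taken from the embodiment.

```python
from datetime import datetime, timedelta


def build_corpus(records, now, days=30):
    """Keep non-empty, de-duplicated queries issued within the last `days` days.

    records -- iterable of (timestamp, query) tuples from the crawler
    now     -- the current time used as the end of the window
    """
    start = now - timedelta(days=days)
    corpus, seen = [], set()
    for timestamp, query in records:
        text = query.strip()
        # Drop empty queries and anything outside the preset historical period.
        if not text or not (start <= timestamp <= now):
            continue
        # Assumed cleaning rule: keep each distinct query once.
        if text in seen:
            continue
        seen.add(text)
        corpus.append(text)
    return corpus


now = datetime(2024, 2, 19)
records = [
    (datetime(2024, 2, 10), "car insurance"),
    (datetime(2023, 12, 1), "loan"),  # older than the window
    (datetime(2024, 2, 15), " which fund is good for beginners "),
    (datetime(2024, 2, 16), "car insurance"),  # duplicate query
]
corpus = build_corpus(records, now, days=30)
```

Changing `days` to 15 or 7 corresponds to the half-month and one-week windows mentioned in the step.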

After the web search corpus of the target user is constructed, the keywords the user searched for in the preset historical time period can be extracted from the web search corpus, so as to obtain the keywords of the financial services that interest the user; the keyword extraction process may be, but is not limited to, as shown in the following steps S2 and S3.

S2, classifying the web search corpus, so as to take the financial service search records that are search sentences as texts to be segmented and the financial service search records that are search keywords as first keywords; in this embodiment, the search records that are already keywords need to be screened out of the web search corpus, that is, the records of searches the target user made directly with keywords, such as "car insurance" or "loan", are all treated as search records that are search keywords, while the remaining search records in the web search corpus are treated as search records that are search sentences; the search records that are already keywords therefore need no keyword extraction processing and can be used directly as the financial keywords of interest to the user (namely, the first keywords), whereas the search records that are search sentences do need keyword extraction processing; this design avoids unnecessary keyword extraction and thereby improves the operation speed.
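As a rough illustration of step S2, the Python sketch below splits a corpus into first keywords and texts to be segmented. The embodiment does not state how a record is recognized as "already a keyword", so the sketch assumes a lookup against a known vocabulary of financial terms; `classify_records` and `keyword_vocab` are hypothetical names.

```python
def classify_records(corpus, keyword_vocab):
    """Split search records into (first_keywords, texts_to_segment).

    A record counts as a search keyword when it appears verbatim in
    keyword_vocab (an assumed rule); everything else is a search sentence.
    """
    first_keywords, texts_to_segment = [], []
    for record in corpus:
        if record in keyword_vocab:
            first_keywords.append(record)
        else:
            texts_to_segment.append(record)
    return first_keywords, texts_to_segment


keyword_vocab = {"car insurance", "loan", "fund"}
corpus = ["car insurance", "which fund is good for beginners", "loan"]
first_keywords, texts_to_segment = classify_records(corpus, keyword_vocab)
```

Only the records in `texts_to_segment` go on to keyword extraction, which is where the saving in computation comes from.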

After the classification of the web search corpus is completed, keyword extraction processing can be performed on the search records that are search sentences, so as to extract the second keywords corresponding to each search sentence; the keyword extraction process for each text to be segmented (i.e., each search record that is a search sentence) is shown in the following step S3.

S3, extracting keywords from each text to be segmented to obtain second keywords of each text to be segmented; in a specific application, since the keyword extraction process is the same for every text to be segmented, the process is described below by taking any one text to be segmented as an example, and the specific process may be, but is not limited to, as shown in the following steps S31 to S34.

S31, for any text to be segmented, performing word segmentation processing on the text to be segmented to obtain a plurality of segmented words; in this embodiment, a word segmenter, for example the jieba Chinese word segmenter, may be used, but is not limited to being used, to segment the text to be segmented; after the plurality of segmented words are obtained, processing such as stop-word removal is also performed, so that irrelevant words such as stop words do not interfere with the subsequent keyword extraction.
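Step S31 can be sketched in Python as below. To keep the example runnable without third-party packages, the tokenizer is a parameter that defaults to whitespace splitting; for Chinese text, the jieba segmenter mentioned above could be supplied instead (for example by passing `jieba.lcut`). The function name `segment` and the small stop-word list are illustrative assumptions.

```python
DEFAULT_STOP_WORDS = frozenset({"the", "a", "is", "for", "which", "to", "of"})


def segment(text, tokenizer=str.split, stop_words=DEFAULT_STOP_WORDS):
    """Segment a text into words and drop stop words.

    tokenizer  -- any callable mapping a string to a list of tokens
                  (whitespace splitting by default; a Chinese word
                  segmenter could be passed in its place)
    stop_words -- words to discard after segmentation
    """
    return [word for word in tokenizer(text) if word not in stop_words]


tokens = segment("which fund is good for beginners")
```

Removing stop words before graph construction keeps function words from becoming highly connected nodes in the semantic network graph.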

In this embodiment, the segmented words of the text to be segmented are used to construct a semantic network graph, and the relevance of each segmented word relative to the text to be segmented is then calculated based on the semantic network graph; finally, keywords are extracted according to the relevance; the construction of the semantic network graph is shown in the following step S32.

S32, constructing a semantic network graph of the text to be segmented by using the plurality of segmented words, wherein the semantic network graph comprises a plurality of nodes, each node corresponds to one of the plurality of segmented words, and when the segmented words corresponding to any two nodes have an association relationship, the two nodes are connected by an association edge; in this embodiment, the semantic network graph takes single words or phrases as nodes and the association relationships between words or phrases as edges, so that a text or article is mapped to a graph whose nodes are words and whose edges are the relationships between words; of course, the semantic network graph is a common technique in the word segmentation field; for example, it can be constructed based on the part of speech of each word by using a k-nearest-neighbor coupling graph construction algorithm.
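One simple way to build such a graph is sketched below in Python. It is a simplified stand-in rather than the construction named above: two segmented words are treated as associated when they occur within `k` positions of each other in the text, and part of speech is ignored. The function name `build_semantic_graph` and the adjacency-dictionary representation are assumptions made for illustration.

```python
def build_semantic_graph(words, k=2):
    """Build an undirected graph as {word: set of associated words}.

    Assumed association rule: two distinct words are connected by an
    association edge when they appear within k positions of each other.
    """
    graph = {word: set() for word in words}
    for i, word in enumerate(words):
        # Look ahead at most k positions from the current word.
        for j in range(i + 1, min(i + k + 1, len(words))):
            other = words[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


graph = build_semantic_graph(["fund", "good", "beginners", "risk"], k=1)
```

Each key of the returned dictionary is a node, and each entry in its set is the far end of one association edge.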

After the semantic network graph of the text to be segmented is constructed, the relevance of each segmented word relative to the text to be segmented is calculated by taking all the texts to be segmented into account; the relevance calculation is shown in the following step S33.

S33, calculating the relevance of each of the plurality of segmented words relative to the text to be segmented based on each text to be segmented and the semantic network graph; in this embodiment, since the relevance calculation is the same for every segmented word, the calculation of the relevance of a segmented word relative to the text to be segmented is described by taking any one segmented word as an example, and the specific calculation may be, but is not limited to, as shown in the following steps S33a to S33c.

S33a, for any node in the semantic network graph, calculating the term frequency-inverse document frequency (TF-IDF) value of the segmented word corresponding to the node based on each text to be segmented; in a specific implementation, the importance of a word to a text can be determined by how often it occurs in that text and how often it occurs in the other texts: the more often a word occurs in the text and the less often it occurs in the other texts, the more important the word is to the text and the higher its relevance; this embodiment therefore uses the TF-IDF value as one of the indicators for quantifying the relevance of the segmented word corresponding to the node.

Meanwhile, for example but not limited to, the following formula (1) may be used to calculate the TF-IDF value of the segmented word corresponding to the node.

$$W_{t,d} = \frac{n_{t,d}}{\max_{k} n_{k,d}} \times \log\frac{N}{N_t} \tag{1}$$

In the above formula (1), $W_{t,d}$ represents the TF-IDF value of the segmented word $t$ corresponding to the node, $n_{t,d}$ represents the number of occurrences of the segmented word $t$ in the text to be segmented $d$, $\max_{k} n_{k,d}$ represents the number of occurrences of the most frequent segmented word in the text to be segmented $d$, $N_t$ represents the number of texts to be segmented that contain the segmented word $t$, and $N$ represents the total number of texts to be segmented.
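The quantities defined for formula (1) can be computed as in the Python sketch below. It assumes a common TF-IDF form: the term frequency is the word's count divided by the count of the most frequent word in the same text, and the inverse document frequency is the natural logarithm of the total number of texts over the number of texts containing the word. The log base, the absence of smoothing and the function name `tf_idf` are assumptions, not details confirmed by the source.

```python
import math


def tf_idf(word, doc, docs):
    """TF-IDF of `word` in `doc`, where every text is a non-empty list of words."""
    counts = {}
    for token in doc:
        counts[token] = counts.get(token, 0) + 1
    # Term frequency, normalized by the most frequent word in this text.
    tf = counts.get(word, 0) / max(counts.values())
    # Number of texts that contain the word at least once.
    containing = sum(1 for text in docs if word in text)
    if containing == 0:
        return 0.0
    # Inverse document frequency (natural log, no smoothing: an assumption).
    idf = math.log(len(docs) / containing)
    return tf * idf


docs = [
    ["fund", "risk", "fund"],
    ["loan", "rate"],
    ["fund", "loan"],
    ["insurance"],
]
fund_score = tf_idf("fund", docs[0], docs)
```

In the toy data, "fund" is the most frequent word of the first text (normalized frequency 1) and appears in two of the four texts, so its value is the natural logarithm of 2.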

After the TF-IDF value of the segmented word corresponding to the node is calculated based on formula (1), the importance and the clustering contribution of that segmented word in the semantic network graph are calculated, so that the relevance of the segmented word relative to the text to be segmented can later be calculated by combining them with the TF-IDF value; the calculation of the importance and the clustering contribution of the node may be, but is not limited to, as shown in the following step S33b.

S33b, calculating, according to the semantic network graph, the importance and the clustering contribution of the segmented word corresponding to the node in the semantic network graph; in this embodiment, for example but not limited to, the following formula (2) may be used to calculate the importance of the segmented word corresponding to the node in the semantic network graph.

$$I = \bar{L}_1 - \bar{L}_2 \tag{2}$$

In the above formula (2), $I$ represents the importance of the segmented word corresponding to the node in the semantic network graph, $\bar{L}_1$ represents the average length of the first target paths in the semantic network graph, and $\bar{L}_2$ represents the average length of the second target paths in the semantic network graph, wherein a first target path is a path that contains the node and a second target path is a path that does not contain the node; in a specific application, suppose that the association edges between node A and node B in the semantic network graph are the association edge between node A and node C and the association edge between node C and node B; the path from node A to node B then consists of those two association edges, and if the node in question is node C, the path from node A to node B is a first target path; similarly, if node A is directly connected to node B, the path between node A and node B does not pass through node C and is therefore a second target path; the remaining first target paths and second target paths are determined in the same way, which is not repeated here; meanwhile, the length of any first target path is the number of association edges it contains.

Thus, formula (2) amounts to calculating the difference between the average length of the paths that contain the node and the average length of the paths that do not, that is, the change in the average path length of the semantic network graph after the node is removed from it; this change indicates how much the node contributes to the network graph: the larger the change, the greater the association it indicates, and the smaller the change, the smaller the association.
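The Python sketch below illustrates one reading of this importance measure. It assumes that a "path" is a shortest path between a pair of nodes, found by breadth-first search; that a path "contains" the node when the node appears anywhere on it, endpoints included; and that the importance is the mean length of the containing paths minus the mean length of the others. The helper names and the four-node example graph are illustrative, and when several shortest paths tie, only the one found first is counted.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, src, dst):
    """Return one shortest path from src to dst as a node list, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def mean(values):
    return sum(values) / len(values) if values else 0.0


def importance(graph, target):
    """Mean length of shortest paths containing target minus mean of the rest."""
    with_target, without_target = [], []
    for a, b in combinations(sorted(graph), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue  # unreachable pairs are skipped
        length = len(path) - 1  # path length = number of association edges
        (with_target if target in path else without_target).append(length)
    return mean(with_target) - mean(without_target)


# A-C, C-B and A-D are the association edges of a small example graph.
graph = {"A": {"C", "D"}, "B": {"C"}, "C": {"A", "B"}, "D": {"A"}}
score = importance(graph, "C")
```

In this graph five of the six shortest paths involve node C (lengths 2, 1, 1, 3 and 2) and one does not (length 1), giving a score of 1.8 - 1.0 = 0.8.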

After the importance of the segmented word corresponding to the node in the semantic network graph is obtained, the clustering contribution of that segmented word in the semantic network graph can be calculated; for example but not limited to, the clustering contribution may be calculated according to the following formula (3).

$$C = \frac{1}{n}\sum_{i=1}^{n} c_i - \frac{1}{k}\sum_{j=1}^{k} c'_j \tag{3}$$

In the above formula (3), $C$ represents the clustering contribution of the segmented word corresponding to the node in the semantic network graph, $c_i$ represents the clustering coefficient of the segmented word corresponding to the i-th node in the semantic network graph, and $c'_j$ represents the clustering coefficient of the segmented word corresponding to the j-th node in the target semantic network graph, wherein $c_i$ is determined by $q$ and $v$, $q$ represents the number of nodes directly connected to the i-th node in the semantic network graph, $v$ represents the total number of the association edges between the i-th node and the target nodes and the association edges among the target nodes in the semantic network graph, the target nodes are the nodes directly connected to the i-th node, $n$ is the total number of nodes in the semantic network graph, $k$ is the total number of nodes in the target semantic network graph, and the target semantic network graph is the semantic network graph after the node is deleted; in this embodiment, $v$ corresponds to the number of association edges that actually exist between the i-th node and the nodes connected to it; formula (3) represents the change in the average clustering coefficient of the whole semantic network graph after the node is removed; this change in the average clustering coefficient therefore measures, from a local perspective, the importance of the node to the network graph.
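The change in the average clustering coefficient can be illustrated with the Python sketch below. Because the exact expression for the per-node coefficient is not recoverable from this text, the sketch substitutes the standard local clustering coefficient (links among a node's neighbors divided by the number of possible links among them), which may differ from the patent's definition of $v$. The contribution is computed as the average before removal minus the average after removal, and all function names are illustrative.

```python
def local_clustering(graph, node):
    """Standard local clustering coefficient (a swapped-in definition)."""
    neighbors = graph[node]
    q = len(neighbors)
    if q < 2:
        return 0.0
    # Count each link between two neighbors once.
    links = sum(
        1 for u in neighbors for w in neighbors if u < w and w in graph[u]
    )
    return 2.0 * links / (q * (q - 1))


def average_clustering(graph):
    """Mean local clustering coefficient over a non-empty graph."""
    return sum(local_clustering(graph, node) for node in graph) / len(graph)


def remove_node(graph, node):
    """Copy of the graph with `node` and its association edges deleted."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


def clustering_contribution(graph, node):
    """Average clustering before removing `node` minus the average after."""
    return average_clustering(graph) - average_clustering(remove_node(graph, node))


# Triangle A-B-C with a pendant node D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
contribution_d = clustering_contribution(graph, "D")
```

For the example graph the average coefficient is 7/12 with node D present and 1 once it is removed, so the contribution of D is -5/12 under this sign convention.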

After the term frequency-inverse document frequency (TF-IDF) value, the importance and the clustering contribution degree of the word segment corresponding to the any node are calculated through formulas (1) to (3), these three values can be used to calculate the relevance of that word segment relative to the any text to be segmented, as shown in the following step S33c.

S33c, calculating the relevance of the word segment corresponding to the any node relative to the any text to be segmented according to the TF-IDF value of the word segment and its importance and clustering contribution degree in the semantic network graph; in this embodiment, the relevance may be calculated, for example but not limited to, according to the following formula (4).

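$$R = \alpha\,T + \beta\,I + \gamma\,C \qquad (4)$$

In the above formula (4), $R$ represents the relevance of the word segment corresponding to the any node relative to the any text to be segmented; $T$, $I$ and $C$ represent the TF-IDF value, the importance and the clustering contribution degree of that word segment obtained from formulas (1) to (3); and $\alpha$, $\beta$ and $\gamma$ represent the adjustment factors. In specific applications, $\alpha$, $\beta$ and $\gamma$ may be, for example, 0.4, 0.35 and 0.25 in this order.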
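A minimal Python sketch of step S33c follows. It assumes that the relevance is a weighted sum of the three values with the example adjustment factors 0.4, 0.35 and 0.25 applied in the order TF-IDF value, importance, clustering contribution degree; the weighted-sum form, the function name `relevance` and the sample figures are assumptions made for illustration.

```python
def relevance(tfidf, importance, clustering, factors=(0.4, 0.35, 0.25)):
    """Weighted sum of the three per-word values (assumed form of formula (4))."""
    alpha, beta, gamma = factors
    return alpha * tfidf + beta * importance + gamma * clustering


# Hypothetical per-word values: (TF-IDF, importance, clustering contribution).
features = {
    "car insurance": (0.50, 0.20, 0.10),
    "fund": (0.30, 0.10, 0.05),
}
scores = {word: relevance(*values) for word, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the example factors sum to 1, the score stays on the same scale as its inputs when the three values are normalized to a common range.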

Through steps S31 to S33 and their sub-steps, the relevance of any word segment relative to any text to be segmented can be calculated; the relevance of every other word segment relative to that text can then be calculated in the same way; finally, keywords are extracted according to the relevance, as shown in step S34 below.

S34, determining a second keyword of the any text to be segmented from the plurality of word segments according to the relevance of each word segment relative to that text; in this embodiment, the second keyword may be determined, for example but not limited to, by the following steps S34a to S34c.

S34a, sorting the word segments in descending order of relevance to obtain an initial keyword sequence.

S34b, performing keyword merging on the initial keywords in the initial keyword sequence based on the any text to be segmented, so as to obtain a plurality of candidate keywords; in this embodiment, after the word segments are sorted in descending order of relevance, the initial keywords are merged according to whether they are adjacent to one another in the text to be segmented. For example, if the third initial keyword in the initial keyword sequence is a single word, it is judged whether the next or the previous initial keyword in the sequence is adjacent to it in the original text; if so, the two are merged; if not, the single-word keyword is deleted. The remaining initial keywords are handled in the same way; when an initial keyword is a multi-word keyword and no other initial keyword is adjacent to it in the original text, it is used directly as a candidate keyword. In this way, a plurality of candidate keywords are obtained.

After obtaining a plurality of candidate keywords, a second keyword of the text to be cut may be extracted from the plurality of candidate keywords, where the extraction process is shown in step S34c below.

S34c, selecting the top p candidate keywords from the plurality of candidate keywords as the second keywords of the any text to be segmented, where p is a positive integer greater than 1; in this embodiment, p may be, for example but not limited to, 25% of P, where P is the total number of candidate keywords; of course, the ratio may be set according to actual use and is not limited to the foregoing example.
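Steps S34a to S34c can be sketched in Python as follows. This is one possible reading rather than the patented procedure: a "single word" is assumed to be a single-character segment, each keyword is only compared with the next keyword in the ranked sequence, a merged candidate keeps the position of its higher-ranked part, and all names and sample data are hypothetical.

```python
import math


def adjacent_in_text(tokens, first, second):
    """True if `first` is immediately followed by `second` in the segmented text."""
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))


def merge_keywords(ranked, tokens):
    """S34b sketch: merge rank neighbours that are adjacent in the text."""
    candidates, i = [], 0
    while i < len(ranked):
        word = ranked[i]
        nxt = ranked[i + 1] if i + 1 < len(ranked) else None
        if nxt is not None and adjacent_in_text(tokens, word, nxt):
            candidates.append(word + nxt)
            i += 2
        elif nxt is not None and adjacent_in_text(tokens, nxt, word):
            candidates.append(nxt + word)
            i += 2
        elif len(word) > 1:
            candidates.append(word)  # multi-character keyword kept as is
            i += 1
        else:
            i += 1  # single-character keyword with no adjacent partner is dropped
    return candidates


def select_keywords(tokens, scores, ratio=0.25):
    ranked = sorted(scores, key=scores.get, reverse=True)  # S34a
    candidates = merge_keywords(ranked, tokens)            # S34b
    p = max(2, math.ceil(ratio * len(candidates)))         # S34c, p > 1
    return candidates[:p]


# Segmented search sentence "I want to buy car insurance and funds" (toy data).
tokens = ["我", "想", "买", "车", "险", "和", "基金"]
scores = {"车": 0.9, "险": 0.8, "基金": 0.7, "买": 0.3,
          "想": 0.2, "我": 0.1, "和": 0.05}
keywords = select_keywords(tokens, scores)
```

With this toy input, the adjacent segments 车 and 险 are merged into the candidate 车险 ("car insurance"), and the two highest-ranked candidates are returned.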

The second keywords of any text to be cut can be obtained through the steps S31-S34 and the substeps, and then the second keywords of the rest texts to be cut can be obtained according to the same principle; in this way, the first keyword may be combined to determine the financial search keyword obtained by performing the financial service big data mining this time, where the determination process of the financial search keyword may be, but is not limited to, as shown in the following step S4.

S4, determining financial search keywords based on the first keywords and the second keywords, and carrying out keyword mining processing on the search keywords to obtain at least one expanded keyword; in this embodiment, the keywords belonging to the type of financial service product may be selected from the first keywords and the second keywords, or the keywords belonging to the type of financial service product and the name keywords of the merchant of the financial service product may be selected as the financial search keywords; then, keyword mining processing is required to be performed on the financial search keywords so as to find out similar keywords, and therefore the matching range is enlarged.

In the present embodiment, the expansion processing of the financial search keyword can be performed by using the following steps S41 to S43, for example.

S41, acquiring an expansion word database and the semantic vectors of the financial search keywords, wherein the expansion word database stores a large number of expansion words and the semantic vector of each expansion word. In this embodiment, when the financial search keywords contain only product-type keywords, the semantic vectors of all the financial search keywords are obtained directly; when the financial search keywords also contain a merchant keyword, no semantic vector is obtained for the merchant keyword, and keywords of merchants of the same type are obtained directly instead. For example, suppose the financial search keywords include "safety" (a merchant name), "car insurance" and "fund"; semantic expansion is then performed on "car insurance" and "fund", while same-type merchant expansion is performed on "safety", yielding, for example, the same-type merchant keywords "personal insurance" and "life". Further, the semantic vectors of the financial search keywords may be output, for example but not limited to, by a BERT model trained on a corpus; since generating semantic vectors with a BERT model is a common word-vectorization technique, its principle is not repeated here. In addition, the expansion word database is stored in advance in the digital financial server and is called when needed.

After the semantic vector of the financial search keyword is obtained, the expansion keyword closest to the semantic vector can be determined according to the distance between the semantic vector and each expansion word in the expansion word database; the determination process of the expanded keyword is as follows in step S42 and step S43.

S42, calculating a vector distance between the semantic vector of the financial search keyword and the semantic vector of each expansion word in the expansion word database; in this embodiment, the distance formula between the vectors is a common formula of similarity measurement, and the principle thereof is not described again.

S43, determining at least one expansion keyword of the financial search keyword according to the vector distance between the semantic vector of the financial search keyword and the semantic vector of each expansion word; in this embodiment, for example but not limited to, the expansion word with the smallest vector distance is used as the expansion keyword, or the 3 expansion words with the smallest vector distances are used as the expansion keywords.
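Steps S42 and S43 can be illustrated with the following Python sketch. It assumes cosine distance as the vector distance (the text only calls for a common similarity measure) and uses tiny hand-written vectors in place of BERT embeddings; `EXPANSION_DB`, `cosine_distance` and `expand_keyword` are hypothetical names.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def expand_keyword(query_vec, expansion_db, top_k=3):
    """Return the top_k expansion words closest to the query vector (S42 + S43)."""
    ranked = sorted(expansion_db,
                    key=lambda word: cosine_distance(query_vec, expansion_db[word]))
    return ranked[:top_k]


# Toy expansion word database: word -> semantic vector (made-up values).
EXPANSION_DB = {
    "vehicle insurance": [0.9, 0.1, 0.0],
    "accident insurance": [0.7, 0.3, 0.1],
    "index fund": [0.0, 0.2, 0.9],
    "savings deposit": [0.1, 0.9, 0.2],
}

query = [1.0, 0.0, 0.0]  # stand-in vector for the keyword "car insurance"
closest = expand_keyword(query, EXPANSION_DB, top_k=1)
top_three = expand_keyword(query, EXPANSION_DB, top_k=3)
```

Setting `top_k=1` corresponds to taking the single nearest expansion word, and `top_k=3` to taking the three nearest, the two options named in step S43.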

Thus, after the expanded keywords of the financial search keywords are obtained through the steps S41 to S43, the matching of financial services can be performed; wherein the matching process is as shown in step S5 below.

S5, according to the financial search keywords and the at least one expansion keyword, financial service matching processing is carried out so as to match out a first financial service corresponding to the financial search keywords and a second financial service corresponding to each expansion keyword; in this embodiment, keyword matching is performed according to the financial search keywords and the expanded keywords, so as to obtain financial service products containing the financial search keywords and/or the expanded keywords.
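The keyword matching of step S5 can be sketched as a simple substring search over product names. This is only an assumed matching rule for illustration; the catalogue entries and the function name `match_services` are hypothetical, and a product already matched by a financial search keyword is not repeated in the second list.

```python
def match_services(catalog, search_keywords, expanded_keywords):
    """Return (first, second): products matching search / expansion keywords."""

    def contains_any(name, keywords):
        lowered = name.lower()
        return any(keyword.lower() in lowered for keyword in keywords)

    first = [name for name in catalog if contains_any(name, search_keywords)]
    second = [name for name in catalog
              if name not in first and contains_any(name, expanded_keywords)]
    return first, second


# Hypothetical product catalogue and keywords.
catalog = [
    "Comprehensive car insurance plan",
    "Vehicle insurance basic",
    "Global index fund",
    "Fixed-term deposit",
]
first, second = match_services(
    catalog,
    search_keywords=["car insurance"],
    expanded_keywords=["vehicle insurance", "index fund"],
)
recommended = first + second  # services to be pushed to the target user
```

The combined list plays the role of the recommended financial services, with the directly matched products placed before those found through expansion keywords.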

After the first financial service corresponding to each financial search keyword and the second financial service corresponding to each expansion keyword are matched, the matched financial service can be used as a recommended financial service, so that the recommended financial service is recommended to a target user; wherein the recommendation process is shown in the following step S6.

S6, using the first financial service and the second financial service as recommended financial services, and pushing the recommended financial services to the user terminal corresponding to the target user; in this embodiment, the recommended financial services may be pushed to the target user in the form of, but not limited to, a short message, an APP pop-up window, or a telephone call.

According to the big data mining method of the digital financial service described in detail in the steps S1 to S6, keywords of the financial service of interest to the user are extracted by crawling a financial service search record of the user in a latest time period, and expansion is performed based on the extracted keywords, so that expanded keywords are obtained; finally, matching the financial service of interest of the user according to the extracted keyword of the financial service of interest of the user and the expansion keyword thereof, and pushing the service; compared with the traditional technology, the method not only improves the diversity of recommendation, but also improves the accuracy of recommendation, can bring more diversified choices to users, and is suitable for large-scale application and popularization.

As shown in fig. 2, a second aspect of the present embodiment provides a hardware device for implementing the big data mining method for digital financial services according to the first aspect of the present embodiment, including:

The crawling unit is used for acquiring a network search corpus of the target user, wherein the network search corpus comprises financial service search records of the target user in a preset historical time period.

The classification unit is used for performing classification processing on the web search corpus so as to take financial service search records belonging to search sentences as texts to be segmented and financial service search records belonging to search keywords as first keywords.

The keyword extraction unit is used for carrying out keyword extraction processing on each text to be segmented so as to obtain second keywords of each text to be segmented.

The keyword expansion unit is used for determining financial search keywords based on the first keywords and the second keywords, and carrying out keyword mining processing on the financial search keywords to obtain at least one expansion keyword.

The financial service matching unit is used for carrying out financial service matching processing according to the financial search keyword and the at least one expansion keyword so as to match out a first financial service corresponding to the financial search keyword and a second financial service corresponding to each expansion keyword.

The working process, working details and technical effects of the device provided in this embodiment may refer to the first aspect of the embodiment, and are not described herein again.

As shown in fig. 3, a third aspect of the present embodiment provides another big data mining apparatus for digital financial services, taking the apparatus as an electronic device, including: the system comprises a memory, a processor and a transceiver which are connected in sequence in communication, wherein the memory is used for storing a computer program, the transceiver is used for receiving and transmitting messages, and the processor is used for reading the computer program and executing the big data mining method of the digital financial service according to the first aspect of the embodiment.

By way of specific example, the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out memory (First Input First Output, FIFO) and/or first-in-last-out memory (First In Last Out, FILO), and the like; the processor may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array), and may also include a main processor and a coprocessor, where the main processor, also called a CPU (Central Processing Unit), is a processor for processing data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state.

In some embodiments, the processor may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen; for example, the processor may be, but is not limited to, a microprocessor of the STM32F105 family, a reduced instruction set computer (RISC) microprocessor, an X86 or other architecture processor, or a processor integrating an embedded neural-network processing unit (NPU). The transceiver may be, but is not limited to, a wireless fidelity (Wi-Fi) transceiver, a Bluetooth transceiver, a General Packet Radio Service (GPRS) transceiver, a ZigBee transceiver (a low-power local area network protocol based on the IEEE 802.15.4 standard), a 3G transceiver, a 4G transceiver, and/or a 5G transceiver, etc. In addition, the device may include, but is not limited to, a power module, a display screen, and other necessary components.

The working process, working details and technical effects of the electronic device provided in this embodiment may refer to the first aspect of the embodiment, and are not described herein again.

A fourth aspect of the present embodiment provides a storage medium storing instructions that implement the big data mining method of the digital financial service according to the first aspect of the present embodiment; that is, the storage medium stores instructions which, when executed on a computer, perform the big data mining method of the digital financial service according to the first aspect of the present embodiment.

The storage medium refers to a carrier for storing data, and may include, but is not limited to, a floppy disk, an optical disk, a hard disk, a flash Memory, a flash disk, and/or a Memory Stick (Memory Stick), where the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.

The working process, working details and technical effects of the storage medium provided in this embodiment may refer to the first aspect of the embodiment, and are not described herein again.

Finally, it should be noted that: the foregoing description is only of the preferred embodiments of the invention and is not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for mining big data of a digital financial service, comprising:

The first financial service and the second financial service are used as recommended financial services, and the recommended financial services are pushed to user terminals corresponding to the target users;

Extracting keywords from each text to be segmented to obtain second keywords of each text to be segmented, wherein the extracting the keywords comprises the following steps:

Determining a second keyword of the any text to be segmented from a plurality of word segments according to the relevance of each word segment relative to the any text to be segmented;

Based on each text to be segmented and the semantic network graph, calculating the relevance of each word segment in the plurality of word segments relative to the any text to be segmented, wherein the method comprises the following steps:

2. The big data mining method of a digital financial service according to claim 1, wherein calculating word frequency-reverse document frequency values of the word segments corresponding to the any node based on each text to be segmented, comprises:

（1）

In the above formula (1), the result is the term frequency-inverse document frequency (TF-IDF) value of the word segment corresponding to the any node; the quantities from which it is calculated are the number of occurrences of that word segment in the any text d to be segmented, the number of occurrences of the most frequent word segment in the any text d to be segmented, the number of texts to be segmented that contain that word segment, and the total number of texts to be segmented.

3. The big data mining method of digital financial service according to claim 1, wherein calculating the importance and the cluster contribution of the word segment corresponding to any node in the semantic network map according to the semantic network map comprises:

$$I = \bar{L}_1 - \bar{L}_2 \qquad (2)$$

In the above formula (2), $I$ represents the importance of the word segment corresponding to the any node in the semantic network graph, $\bar{L}_1$ represents the average length of the first target paths in the semantic network graph, and $\bar{L}_2$ represents the average length of the second target paths in the semantic network graph, wherein any first target path is a path containing the any node, and any second target path is a path not containing the any node;

$$C = \frac{1}{n}\sum_{i=1}^{n} c_i - \frac{1}{k}\sum_{j=1}^{k} c'_j \qquad (3)$$

In the above formula (3), $C$ represents the clustering contribution degree of the word segment corresponding to the any node in the semantic network graph, $c_i$ represents the clustering coefficient of the word segment corresponding to the i-th node in the semantic network graph, and $c'_j$ represents the clustering coefficient of the word segment corresponding to the j-th node in the target semantic network graph, wherein the clustering coefficient of the i-th node is determined from q and v, q represents the number of nodes directly connected with the i-th node in the semantic network graph, v represents the total number of association edges between the i-th node and the target nodes together with the association edges among the target nodes in the semantic network graph, each target node is a node directly connected with the i-th node, n is the total number of nodes in the semantic network graph, k is the total number of nodes in the target semantic network graph, and the target semantic network graph is the semantic network graph after the any node is deleted;

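$$R = \alpha\,T + \beta\,I + \gamma\,C \qquad (4)$$

In the above formula (4), $R$ represents the relevance of the word segment corresponding to the any node relative to the any text to be segmented, $T$ represents the term frequency-inverse document frequency (TF-IDF) value of the word segment corresponding to the any node, $I$ and $C$ represent the importance and the clustering contribution degree given by formulas (2) and (3), and $\alpha$, $\beta$ and $\gamma$ represent the adjustment factors.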

4. The big data mining method of a digital financial service according to claim 1, wherein determining the second keyword of the any text to be cut from a plurality of words according to the relevance of each word to the any text to be cut, includes:

5. The method of claim 1, wherein performing keyword mining on the financial search keywords to obtain at least one expanded keyword comprises:

And determining at least one expansion keyword of the financial search keyword according to the vector distance between the semantic vector of the financial search keyword and the semantic vector of each expansion word.

6. A big data mining apparatus for digital financial services, comprising:

the pushing unit is used for taking the first financial service and the second financial service as recommended financial services and pushing the recommended financial services to the user terminals corresponding to the target users;

7. A big data mining apparatus for digital financial services, comprising: a memory, a processor and a transceiver in communication, wherein the memory is configured to store a computer program, the transceiver is configured to receive and transmit messages, and the processor is configured to read the computer program and perform the big data mining method of the digital financial service of any of claims 1-5.

8. A storage medium having instructions stored thereon which, when executed on a computer, perform the method of big data mining of a digital financial service as claimed in any one of claims 1 to 5.