WO2023272850A1

WO2023272850A1 - Decision tree-based product matching method, apparatus and device, and storage medium

Info

Publication number: WO2023272850A1
Application number: PCT/CN2021/108777
Authority: WO
Inventors: 平高明
Original assignee: 未鲲(上海)科技服务有限公司
Priority date: 2021-06-29
Filing date: 2021-07-28
Publication date: 2023-01-05
Also published as: CN113450147A; CN113450147B

Abstract

The present application relates to the technical field of artificial intelligence, and discloses a decision tree-based product matching method, apparatus and device, and a storage medium. The method comprises: when an unread email is detected at fixed time periods, obtaining a subject and body content of the unread email; then inputting the subject and the body content into a trained decision tree for decision determination; if the unread email is related to product sales, then identifying the content in the unread email as target content; extracting product related information in the target content; and then matching the product name with a corresponding sales amount to obtain product sales information, and outputting the product sales information. The present application further relates to blockchain technology, and the product sales information is stored in a blockchain. The present application achieves the automatic determination, by means of a trained decision tree, of whether unread email is related to product sales information, and product related information in the email is extracted for product matching, helping to improve product matching efficiency.

Description

Product matching method, device, equipment and storage medium based on decision tree

This application claims priority to Chinese patent application No. 202110725709.8, filed with the China Patent Office on June 29, 2021 and entitled "Decision Tree-Based Product Matching Method, Device, Equipment, and Storage Medium", the entire content of which is incorporated into this application by reference.

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a decision tree-based product matching method, device, equipment and storage medium.

Background

As the business continues to expand, day-to-day operations have become an integral part of the normal running of every company. Sales operations, as part of daily operations, aim to establish a product sales ecosystem, maintain this ecosystem, and form an ecological closed loop for user contributions and product content consumption. After the product sales strategy is clarified, operators need to mechanically enter the sales strategy instructions.

In existing product sales operations, the sales plan in an email often has to be identified manually and then split and matched to obtain the matching plan for the related products. However, the inventor realized that this requires someone to watch the mailbox and judge whether each email is related to product sales. Emails may be missed, which delays the follow-up product analysis and results in low product matching efficiency. There is an urgent need for a method that can improve product matching efficiency.

Summary of the invention

The purpose of the embodiments of the present application is to propose a decision tree-based product matching method, device, equipment, and storage medium, so as to improve product matching efficiency.

In the first aspect, the embodiment of the present application provides a decision tree-based product matching method, including:

Periodically detecting the receiving status of a target mailbox through a scheduled task;

If it is detected that there is an unread email in the target mailbox, obtaining the subject and body content of the unread email;

Inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;

If the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as the target content;

Traversing the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

Matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.

In the second aspect, the embodiment of the present application provides a decision tree-based product matching device, including:

A receiving status detection module, configured to periodically detect the receiving status of the target mailbox through a scheduled task;

An unread email acquisition module, configured to acquire the subject and body content of the unread email if it is detected that there is an unread email in the target mailbox;

An unread email judging module, configured to input the subject and body content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result;

A target content identification module, configured to identify the content in the unread email as the target content if the first judgment result is that the unread email is related to product sales;

A product-related information extraction module, configured to traverse the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

The product sales information output module is used to match the product name with the corresponding sales amount through preset matching rules to obtain product sales information, and output the product sales information based on the sales time.

In a third aspect, the embodiment of the present application also provides a computer device, which includes a memory and a processor, where computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:

In the fourth aspect, the embodiment of the present application further provides a computer-readable storage medium that stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor executes the following steps:

Embodiments of the present application provide a decision tree-based product matching method, apparatus, device, and storage medium. The embodiment of the present application uses the trained decision tree to automatically judge whether an unread email is related to product sales information and, if so, extracts the product-related information in the email for product matching, which helps to improve product matching efficiency.

Description of drawings

In order to illustrate the solution in this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The accompanying drawings in the following description show only some embodiments of the application. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is the application environment diagram of the product matching method based on the decision tree provided by the embodiment of the present application;

Fig. 2 is an implementation flowchart of a decision tree-based product matching method provided according to an embodiment of the present application;

Fig. 3 is an implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

Fig. 4 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

Fig. 5 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

Fig. 6 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

Fig. 7 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

Fig. 8 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;

FIG. 9 is a schematic diagram of a decision tree-based product matching device provided in an embodiment of the present application;

Fig. 10 is a schematic diagram of a computer device provided by an embodiment of the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application. The terms used in the description of the application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the above drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or in the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings.

The present application will be described in detail below in conjunction with the accompanying drawings and embodiments.

Referring to FIG. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is used as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.

Users can use terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, instant messaging tools, and the like.

The terminal devices 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers and the like.

The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that the decision tree-based product matching method provided in the embodiment of the present application is generally executed by a server, and correspondingly, the decision tree-based product matching device is generally configured in the server.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.

Please refer to FIG. 2 , which shows a specific implementation of a decision tree-based product matching method.

It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the flow sequence shown in Figure 2. The method includes the following steps:

S1: Through the timing task, regularly detect the receiving situation of the target mailbox.

In the embodiment of the present application, in order to understand the technical solution more clearly, the terminal involved in the present application is introduced in detail below.

One is the server. The server can monitor the mailbox of the client through periodic monitoring tasks. If an unread email is received, the server judges whether the unread email is related to product sales. If it is related, the server extracts the product-related information from the email, matches that information based on the preset matching rules to obtain product sales information, outputs the product sales information, generates a data analysis report, and returns the data analysis report to the client.

The second is the client terminal. The user of the client terminal can be the operator of the product sales, who can receive the data analysis report returned by the server.

Specifically, in the implementation of this application, the receiving status of a specific mailbox is monitored through periodically executed tasks. If an unread email is detected, it is judged whether the unread email is related to product sales. Further, whether an email has been read is judged from the read flag of the email. The read flag is obtained from the generated log file and is used to distinguish the read or unread status of an email.

S2: If it is detected that there are unread emails in the target mailbox, obtain the subject and body content of the unread emails.

Specifically, when an unread email is detected, the unread email is accurately identified from the target mailbox through the generated log file. The subject and body content of the unread email are then captured as a basis for subsequently judging whether the unread email is related to product sales.
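Steps S1 and S2 can be illustrated with a minimal sketch. It assumes the mailbox is available as a mapping from message id to raw message text and that the read flags from the log file are available as a set of ids; the names `find_unread`, `extract_subject_and_body` and `poll_once` are hypothetical and do not come from the application. A scheduler would call `poll_once` at a fixed interval.

```python
from email import message_from_string
from email.policy import default


def find_unread(mailbox: dict, read_ids: set) -> list:
    """Return ids of messages whose read flag is absent from the log."""
    return [msg_id for msg_id in mailbox if msg_id not in read_ids]


def extract_subject_and_body(raw_message: str) -> tuple:
    """Parse a raw message and return (subject, plain-text body)."""
    msg = message_from_string(raw_message, policy=default)
    subject = str(msg["Subject"] or "")
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content().strip() if part is not None else ""
    return subject, body


def poll_once(mailbox: dict, read_ids: set) -> list:
    """One run of the scheduled task: collect unread mail, then mark it read."""
    results = []
    for msg_id in find_unread(mailbox, read_ids):
        results.append((msg_id, *extract_subject_and_body(mailbox[msg_id])))
        read_ids.add(msg_id)
    return results


# Illustrative data only (assumed, not from the application).
MAILBOX = {
    "m1": "Subject: Weekly lunch\n\nSee you at noon.\n",
    "m2": "Subject: Product sales plan\n\nThis sale product: xxx snack\n",
}
READ_LOG = {"m1"}
print(poll_once(MAILBOX, READ_LOG))
```

Marking a message as read only after its subject and body have been captured keeps a failed run from silently dropping an email.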

S3: Input the subject and text content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result.

Specifically, in this embodiment of the present application, a decision tree is trained, and the trained decision tree is used to determine whether unread emails are related to product sales. A decision tree algorithm processes data by induction to generate classification rules and a tree, which are then used to predict and analyze new data. A terminal node of the tree (leaf node) represents the class of a classification result, each internal node represents a test on a variable, and each branch is an output of that test, representing one possible value of the variable. To classify a record, the variable values are tested against the data, and each path from the root to a leaf represents one classification rule. The decision tree model maximizes the difference in the dependent variable by repeatedly partitioning the data. The ultimate goal is to sort the data into different groups or branches so that the partition separates the values of the dependent variable as strongly as possible. Machine learning tools are relied on to classify the subject and content of emails. In the embodiment of this application, the decision tree is implemented with the Sklearn machine learning module for Python.

Specifically, a decision tree is a predictive model that represents a mapping between object attributes and object values. Each node in the tree represents an object, each branching path represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf node. For example, consider a decision tree that predicts whether a person will buy a computer. New records are classified starting from the root node (age): if the person is middle-aged, it is directly judged that the person will buy a computer; if the person is a teenager, it is further judged whether the person is a student; if the person is elderly, the person's credit rating is further judged, and so on until a leaf node determines the category of the record. Similarly, in this embodiment of the application, a decision tree obtained through training can be used to classify new unread emails as related or unrelated to product sales. Classification starts from the root node (email subject and body content), which may have three classes: "spam mail", "daily mail" and "mail including the word 'product'". If the email is classified as "spam mail" or "daily mail", it is directly determined that the email has nothing to do with product sales. If it is classified as "mail including the word 'product'", classification continues from that node, where the email may fall into one of two classes for mail containing a sales-related word (such as 'sales') or into a third class for mail containing no such word. For either of the first two classes, the email is determined to be related to product sales; for the third class, further judgment is made until a leaf node determines the category of the email.
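The root-to-leaf classification described above can be sketched as follows. The tree here is written by hand and simplified to yes/no keyword tests, which is an assumption made for illustration; in the application the tree is obtained through training, and the names `TREE` and `classify` are hypothetical.

```python
# Hypothetical hand-written tree, simplified from the example in the text.
# Internal nodes test whether a word occurs; leaves carry the final label.
TREE = {
    "test": "product",
    "yes": {
        "test": "sales",
        "yes": {"label": "related to product sales"},
        "no": {"label": "needs further judgment"},
    },
    "no": {"label": "not related to product sales"},
}


def classify(tree: dict, subject: str, body: str) -> str:
    """Walk from the root to a leaf and return the leaf's label."""
    text = f"{subject} {body}".lower()
    node = tree
    while "label" not in node:
        branch = "yes" if node["test"] in text else "no"
        node = node[branch]
    return node["label"]


print(classify(TREE, "Product sales plan", "This sale product: xxx snack"))
print(classify(TREE, "Weekly lunch", "See you at noon."))
```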

Please refer to Fig. 3, which shows a specific implementation of step S3, described in detail as follows:

S31: Input the subject and body content into the trained decision tree.

S32: Use the trained decision tree to perform node classification on the subject and body content, and obtain the current node classification result.

Specifically, the trained decision tree performs node classification on the input subject and body content, predicting the next node starting from the root node, so as to obtain the classification result of the current node. Node classification means that the decision tree selects an optimal feature according to the input data and divides the input data set into subsets according to this feature, so that each subset has the best classification under the current conditions.

S33: Determine the final leaf node of the unread mail based on the classification result of the current node as the predicted leaf node.

Specifically, predict the next leaf node of the classification result of the current node through the trained decision tree to obtain the final leaf node in the unread mail, and obtain the classification result of the unread mail by judging the final leaf node. In this embodiment, the classification results of unread emails are divided into "related to product sales" and "not related to product sales".

S34: Obtain the first judgment result from the prediction result of the predicted leaf node.

Specifically, each record in the subject and body content is classified and predicted through the decision tree. At each node, the classification result of the current node determines which child node the record should enter, until a leaf node is reached. A prediction value is obtained from the current leaf node, the classification result with the highest probability among all prediction results of the trained decision tree is selected, and the classification results of all records are output to obtain the first judgment result. The first judgment result is one of two results: the unread email is related to product sales, or the unread email is not related to product sales.

In this embodiment, the subject and body content are classified through the trained decision tree to obtain the current node classification result. Based on that result, the final leaf node of the unread email is determined as the predicted leaf node, and the first judgment result is obtained from the prediction result of the predicted leaf node. Unread emails related to product sales can thus be quickly filtered out from many emails, which speeds up email analysis and helps to improve product matching efficiency.
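As a small sketch of step S34, assume that a leaf node stores the labels of the training emails that reached it. The predicted class is then the most frequent label at that leaf, and its share of the leaf serves as the probability. The function name `predict_from_leaf` and the label strings are hypothetical.

```python
from collections import Counter


def predict_from_leaf(leaf_labels: list) -> tuple:
    """Return the most frequent label at a leaf and its share of the leaf."""
    counts = Counter(leaf_labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(leaf_labels)


label, probability = predict_from_leaf(
    ["related", "related", "related", "unrelated"]
)
first_judgment_result = label == "related"
print(label, probability, first_judgment_result)
```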

Please refer to FIG. 4, which shows a specific implementation before step S3, which is described in detail as follows:

S3A: Obtain sample emails, and grab the content in the sample emails to obtain sample training data.

Specifically, sample emails are divided into emails related to product sales and emails not related to product sales, which are used to train the decision tree. Grab the content in the sample email, including the email subject, body content, and attachment content, and use the content as sample training data.

S3B: identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of products in the sample training data.

Specifically, the product characteristic attribute is an inherent attribute of the product in the sample training data, such as product name, product category, product sales amount, product sales time, and the like.

S3C: Perform feature selection on the data in the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data.

Specifically, feature selection is performed on the sample training data using the product feature attributes, and the data with product feature attributes is selected to generate the target training set. Feature selection here refers to selecting the data related to the product feature attributes. For example, if the sample training data set has 50 columns and only 10 columns remain after feature selection, those 10 columns are used as the target training set. Feature selection not only reduces the size of the data set but can also improve the prediction performance of the decision tree model. It is generally carried out by combining algorithmic selection (for example, a feature selection algorithm based on information gain) with manual selection.

S3D: Use the decision tree algorithm to train the target training set to obtain a trained decision tree.

Specifically, attribute data useful for the prediction process is selected from the input sample emails according to the feature attributes. That is, when training the decision tree in the embodiment of the present application, product-related information such as product name, product category, sales amount and sales time is used as the product feature attributes. These feature attributes are used to filter the training data and select emails related to product sales, yielding the target training set, which is then divided into training data and test data according to a preset ratio. The training data is used to train the decision tree, and the test data is used to verify the trained decision tree. The preset ratio is set according to the actual situation and is not limited here. In a specific embodiment, the preset ratio is 8:2.

In this embodiment, sample emails are obtained and their content is captured to obtain the sample training data, the product feature attributes in the sample training data are identified, and feature selection is performed on the sample training data according to those attributes to generate the target training set. The decision tree algorithm is then trained on the target training set to obtain a trained decision tree. Training the decision tree on email information makes it possible to quickly judge afterwards whether an unread email is related to product sales, which improves the identification of product-related information in emails and helps to improve product matching efficiency.
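The feature selection of step S3C and the 8:2 split into training data and test data can be sketched as below. The row layout, the column names and the helper names `select_features` and `split_dataset` are assumptions made for illustration only.

```python
import random


def select_features(rows: list, keep: list) -> list:
    """Keep only the columns that correspond to product feature attributes."""
    return [{key: row[key] for key in keep if key in row} for row in rows]


def split_dataset(rows: list, train_ratio: float = 0.8, seed: int = 0) -> tuple:
    """Shuffle a copy of the rows and split it by the preset ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Illustrative rows (assumed): one dict per sample email.
SAMPLES = [
    {"id": i, "product_name": f"item{i}", "sales_amount": 100 * i, "sender": "x"}
    for i in range(10)
]
target_set = select_features(SAMPLES, ["product_name", "sales_amount"])
training_data, test_data = split_dataset(target_set)
print(len(training_data), len(test_data))
```

A fixed seed keeps the split reproducible, so the error value measured on the test data can be compared between training runs.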

Please refer to FIG. 5, which shows a specific implementation of step S3D, which is described in detail as follows:

S3D1: Using the ID3 algorithm, perform node calculation on the decision tree with the training data to obtain the node features.

The ID3 algorithm is a classification prediction algorithm first proposed by J. Ross Quinlan at the University of Sydney in 1975. The core of the algorithm is information entropy. The ID3 algorithm calculates the information gain of each attribute and treats attributes with high information gain as good attributes. At each split it selects the attribute with the highest information gain as the splitting criterion, and it repeats this process until a decision tree that perfectly classifies the training samples is generated. In this embodiment, the ID3 algorithm is used to train the decision tree corresponding to emails related to product sales. Node calculation refers to starting from the root node, computing the information gain of all candidate features, and selecting the feature with the largest information gain as the feature of the node.
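The node calculation can be made concrete with a short sketch of entropy and information gain. The feature names `has_product` and `has_sales`, the labels and the toy rows are assumptions chosen so that the result is easy to check by hand.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: list, labels: list, feature: str) -> float:
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows: list, labels: list, features: list) -> str:
    """Select the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


ROWS = [
    {"has_product": True, "has_sales": True},
    {"has_product": True, "has_sales": False},
    {"has_product": False, "has_sales": True},
    {"has_product": False, "has_sales": False},
]
LABELS = ["related", "related", "unrelated", "unrelated"]
print(best_feature(ROWS, LABELS, ["has_sales", "has_product"]))
```

In this toy data the label is fully determined by `has_product`, so that feature has a gain of 1 bit while `has_sales` has a gain of 0.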

S3D2: Based on node characteristics, recursively calculate the decision tree, where each recursive calculation obtains a basic decision tree.

Specifically, by performing recursive calculations on the decision tree, each recursive calculation selects the feature with the largest information gain as the node feature for the next calculation, and at the same time, each recursive calculation obtains a decision tree.

S3D3: Test and calculate the basic decision tree through the test data to obtain the error value.

S3D4: When the error value is less than the preset threshold, stop the recursive calculation and get the trained decision tree.

Specifically, in decision tree model training there are generally two methods of model testing. One is to divide the data in the training set into two parts, using one part to train and generate the decision tree (the training data) and the other part for testing (the test data), with test cases generally selected from the test data. The other is n-fold cross-validation, which divides the data in the training set into n folds. For example, if the data is divided into 10 folds, 8 of them are used for training to generate a decision tree and the remaining 2 are used as test cases; this is repeated until all 10 folds have been used as test cases, at which point the entire testing process is complete. In this embodiment, each time a basic decision tree is obtained, it is tested on the test data to obtain an error value, and the recursive calculation stops once the error value is less than the preset threshold, yielding the trained decision tree. The error value refers to the difference between the classification result of the basic decision tree and the actual classification result.

It should be noted that the preset threshold is set according to actual conditions, and is not limited here. In a specific embodiment, the preset threshold is 0.05.
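The cross-validation method mentioned above can be sketched with a plain fold generator. This shows the standard scheme in which each fold is held out exactly once, which is an assumption about what the description intends; with 10 samples and 5 folds, every round trains on 8 samples and tests on 2. The name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) so that each fold is tested once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


for train_idx, test_idx in k_fold_indices(10, 5):
    print(len(train_idx), test_idx)
```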

In this embodiment, the ID3 algorithm performs node calculation on the decision tree with the training data to obtain the node features, and the decision tree is calculated recursively based on those features. Each basic decision tree is then tested on the test data to obtain an error value, and when the error value is less than the preset threshold the recursive calculation stops and the trained decision tree is obtained. This trains a decision tree related to product sales and helps to improve the accuracy of identifying whether unread emails are related to products.
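Steps S3D1 to S3D4 can be sketched as follows. The sketch interprets "each recursive calculation obtains a basic decision tree" as growing trees of increasing depth and stopping once the test error falls below the threshold; that interpretation, the toy data and all function names are assumptions, not the applicant's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, labels, feature):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return entropy(labels) - sum(
        len(g) / len(labels) * entropy(g) for g in groups.values()
    )


def build(rows, labels, features, depth):
    """Recursively grow an ID3 tree that is at most `depth` levels deep."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or not features or len(set(labels)) == 1:
        return majority  # a leaf is just a class label
    best = max(features, key=lambda f: gain(rows, labels, f))
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*pairs)
        node["children"][value] = build(
            list(sub_rows), list(sub_labels), rest, depth - 1
        )
    return node


def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def error_rate(tree, rows, labels):
    wrong = sum(predict(tree, r) != l for r, l in zip(rows, labels))
    return wrong / len(labels)


def train(train_rows, train_labels, test_rows, test_labels, features, threshold=0.05):
    """Grow deeper basic trees until the test error is below the threshold."""
    tree = None
    for depth in range(len(features) + 1):
        tree = build(train_rows, train_labels, features, depth)
        if error_rate(tree, test_rows, test_labels) < threshold:
            break
    return tree


FEATURES = ["has_product", "has_sales"]
TRAIN_ROWS = [
    {"has_product": True, "has_sales": True},
    {"has_product": True, "has_sales": False},
    {"has_product": False, "has_sales": True},
    {"has_product": False, "has_sales": False},
]
TRAIN_LABELS = ["related", "unrelated", "unrelated", "unrelated"]
TEST_ROWS, TEST_LABELS = TRAIN_ROWS[:3], TRAIN_LABELS[:3]
tree = train(TRAIN_ROWS, TRAIN_LABELS, TEST_ROWS, TEST_LABELS, FEATURES)
print(error_rate(tree, TEST_ROWS, TEST_LABELS))
```

If no depth reaches the threshold, the sketch returns the deepest tree it built; a production version would need to report that case explicitly.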

S4: If the first judgment result is that the unread emails are related to product sales, identify the content in the unread emails as the target content.

Specifically, if the first judgment result is that the unread email is related to product sales, the unread email needs to be analyzed for product sales, so all content in the unread email needs to be identified as the target content. The above steps only identify the subject and body content of unread emails, but in actual situations, emails may have attachments, and the attachments may also include information about product sales. Therefore, it is necessary to determine whether the unread email includes an attachment, and if it includes an attachment, further identify and analyze the attachment, obtain the content in the attachment, and use the content in the attachment together with the subject and body content as the target content.

Referring to Fig. 6, Fig. 6 shows a specific implementation of step S4, which is described in detail as follows:

S41: If the first judgment result is that the unread emails are related to product sales, set the subject and body content as target content, and detect whether the unread emails include attachments to obtain a detection result.

Specifically, if the first judgment result is that the unread email is related to product sales, the subject and body content are taken as the target content first, and then it is judged whether the unread email includes attachments.

S42: If the detection result is that the unread email includes an attachment, judge the text type of the attachment, and obtain a second judgment result.

Specifically, if the unread email includes an attachment, then determine the text type of the attachment, so as to obtain the target content in the attachment in a corresponding manner according to the text type of the attachment.

S43: If the second determination result is that the text type of the attachment is a word document, read the text content corresponding to the word document as the target content.

Specifically, if the text type of the attachment is a word document, the text content corresponding to the word document can be directly read and used as the target content.

S44: If the second judgment result is that the text type of the attachment is a text-type PDF, read out the text content of the text-type PDF as the target content by parsing it with a Java class library.

Specifically, the Java class libraries include PDFBox, iText, and XPDF. PDFBox (an open source project, currently distributed by Apache under the Apache License 2.0) is a pure Java class library for developers to read and create PDF documents, and it can extract text. iText is a Java class library that can quickly generate PDF documents; it can not only generate PDF or RTF documents, but also convert XML and HTML files into PDF files. XPDF is an open source project whose native methods can be called to extract text from Chinese PDF files.

S45: If the second judging result is that the text type of the attachment is a picture-type PDF, read out the text content of the picture-type PDF as the target content by way of OCR recognition.

Among them, the OCR (optical character recognition) recognition method refers to the method of scanning text data, and then analyzing and processing image files to obtain text and layout information. In this embodiment, the text content of the picture-type PDF is read out through OCR recognition.

In this embodiment, if the unread email is related to product sales, the subject and body content are used as the target content, and whether the unread email includes an attachment is detected to obtain the detection result. If there is an attachment, the target content is acquired with the parsing method that corresponds to the text type of the attachment. This allows the target content to be acquired accurately, which helps the subsequent identification of product-related information and, in turn, improves the efficiency of product matching.
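The branching of steps S41 to S45 can be sketched as follows. This is only a minimal illustration under stated assumptions: the type labels, the function names `classify_attachment` and `collect_target_content`, the use of the file extension to recognize the attachment type, and the `has_text_layer` flag are all hypothetical and do not come from the application. The actual readers (Word parsing, Java class package parsing, OCR) are passed in as callables and are replaced by stand-ins here.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the three attachment types named in S43 to S45.
WORD, TEXT_PDF, IMAGE_PDF = "word", "text_pdf", "image_pdf"


def classify_attachment(filename: str, has_text_layer: bool) -> str:
    """Return the text type of an attachment (the second judgment result).

    Assumption: the type is decided from the file extension, and a PDF is
    treated as text-type when an earlier check found a text layer in it.
    """
    name = filename.lower()
    if name.endswith((".doc", ".docx")):
        return WORD
    if name.endswith(".pdf"):
        return TEXT_PDF if has_text_layer else IMAGE_PDF
    raise ValueError(f"unsupported attachment: {filename}")


def collect_target_content(subject: str, body: str, attachments: List[dict],
                           readers: Dict[str, Callable[[bytes], str]]) -> List[str]:
    """Subject and body are always target content (S41); each attachment is
    read with the reader registered for its text type (S43 to S45)."""
    target = [subject, body]
    for att in attachments:
        kind = classify_attachment(att["filename"], att["has_text_layer"])
        target.append(readers[kind](att["data"]))
    return target


if __name__ == "__main__":
    # Stand-in readers; real ones would parse Word files, PDFs, or run OCR.
    demo_readers = {
        WORD: lambda data: data.decode("utf-8"),
        TEXT_PDF: lambda data: data.decode("utf-8"),
        IMAGE_PDF: lambda data: "<OCR text>",
    }
    mail_attachments = [{"filename": "offer.pdf", "has_text_layer": False, "data": b""}]
    print(collect_target_content("Sale", "See attachment", mail_attachments, demo_readers))
```

Keeping the readers in a lookup table means a further attachment type could be supported by registering one more reader, without changing the dispatch logic.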

S5: Traversing the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount.

Specifically, words such as product, product name, amount, sales amount, and sales time are used as preset keywords, and the target content is traversed with these preset keywords to extract the product-related information, which includes the product name, sales time, sales amount, and so on. For example, if traversing the target content finds "this sale product: xxx snack", the part of the target content about the product is located and the corresponding product name is obtained. Likewise, for the sales time and sales amount, the corresponding preset keywords are located, and the number and time that follow them are taken as the sales amount and sales time. The preset keywords are set according to information related to product sales and may be words such as product, amount, and sales.

Please refer to Figure 7, which shows a specific implementation of step S5, described in detail as follows:

S51: Traversing the target content through preset keywords to acquire sentences including preset keywords in the target content as key sentences.

Specifically, by identifying the preset keywords in the target content, the sentences containing the preset keywords are obtained as key sentences for the subsequent identification of the corresponding product-related information.

S52: Obtain data corresponding to preset keywords in key sentences, and obtain sales time and sales amount.

Specifically, if the email is related to product sales, the target content often includes the amount and the time of the product sale. The sales time and sales amount are obtained from the data corresponding to the preset keywords in the key sentence. For example, if the key sentence is "The target sales amount in this shopping mall activity is 30,000 yuan, and the sales time is 2021-5-3", identifying the data corresponding to the preset keywords yields a sales amount of 30,000 and a sales time of 2021-5-3.

S53: Split the key sentence by a delimiter to obtain a product name.

Specifically, the key sentence is split by a delimiter to obtain the information that follows a preset keyword such as "product", thereby obtaining the product name. For example, if the key sentence is "This time our sale product: XXX item", splitting at the colon yields the product name "XXX item". Further, the product-related information in the target content can also be obtained through regular expression matching.

In this embodiment, the target content is traversed with the preset keywords to obtain the sentences that include the preset keywords as key sentences, the data corresponding to the preset keywords in the key sentences is obtained to yield the sales time and the sales amount, and the key sentences are split by delimiters to obtain the product name. The corresponding product-related information is thus accurately identified from the target content, which is conducive to improving the matching efficiency of products.
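Steps S51 to S53 can be illustrated with the short sketch below. It is an assumption-laden example rather than the method of the application: the keyword list, the two regular expressions, and the function names `key_sentences`, `product_name`, and `extract_product_info` are hypothetical, and the patterns are written for English sentences shaped like the examples given above.

```python
import re

# Assumed preset keywords; the text only says they relate to product sales.
PRESET_KEYWORDS = ("product", "sales amount", "sales time")

# Assumed patterns: the first number after "sales amount", the first date after "sales time".
AMOUNT_RE = re.compile(r"sales amount[^\d]*([\d,]+(?:\.\d+)?)", re.IGNORECASE)
TIME_RE = re.compile(r"sales time[^\d]*(\d{4}-\d{1,2}-\d{1,2})", re.IGNORECASE)


def key_sentences(text):
    """S51: keep the sentences that contain at least one preset keyword."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    return [s for s in sentences if any(k in s.lower() for k in PRESET_KEYWORDS)]


def product_name(sentence):
    """S53: split on a delimiter and take what follows the 'product' keyword."""
    for delimiter in (":", "："):
        head, sep, tail = sentence.partition(delimiter)
        if sep and "product" in head.lower():
            return tail.strip(" .")
    return None


def extract_product_info(text):
    """Combine S51 to S53 into one record; missing fields stay None."""
    info = {"product_name": None, "sales_amount": None, "sales_time": None}
    for sentence in key_sentences(text):
        amount = AMOUNT_RE.search(sentence)
        if amount and info["sales_amount"] is None:
            info["sales_amount"] = float(amount.group(1).replace(",", ""))  # S52
        when = TIME_RE.search(sentence)
        if when and info["sales_time"] is None:
            info["sales_time"] = when.group(1)  # S52
        name = product_name(sentence)
        if name and info["product_name"] is None:
            info["product_name"] = name
    return info


if __name__ == "__main__":
    mail = ("The target sales amount in this shopping mall activity is 30,000 yuan, "
            "and the sales time is 2021-5-3. This time our sale product: XXX item. "
            "See you soon.")
    print(extract_product_info(mail))
```

Only the first value found for each field is kept in this sketch; an email that lists several products would need one record per product, which the table handling described next addresses more naturally.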

Referring to FIG. 8, which shows a specific implementation after step S53, described in detail as follows:

S54: Judging whether there is a table in the target content, and obtaining a third judging result.

S55: If the third determination result is that there is a table in the target content, then parse the table to obtain header information corresponding to the table.

Specifically, the body content and attachment content of the email may contain a table, and the table may also hold product-related information. Therefore, it is necessary to determine whether there is a table in the target content and, if so, to parse the table to obtain the header information and then judge, based on the header information, whether the table is related to product sales.

S56: Judging whether the header information matches a preset keyword, and obtaining a fourth judging result.

S57: If the fourth determination result is that the header information matches the preset keyword, then obtain product-related information based on the header information.

Specifically, if the preset keywords include a keyword that matches the header information, the fourth judgment result is that the header information matches the preset keywords; otherwise, the fourth judgment result is that the header information does not match the preset keywords. When the fourth judgment result is that the header information matches the preset keywords, the data corresponding to the header information in the table is obtained, which yields the product-related information.

In this implementation, if there is a table in the target content, the header information of the table is obtained, and it is judged whether the header information matches the preset keywords. If so, the corresponding product-related information is obtained from the table, so that product-related information is not missed and the subsequent matching of the corresponding product sales information is facilitated.
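The header check of steps S54 to S57 might look like the following sketch. The table is assumed to have already been parsed into a list of rows whose first row is the header; the mapping `HEADER_KEYWORDS` and the function name `extract_from_table` are hypothetical, and matching is assumed to be an exact, case-insensitive comparison.

```python
# Assumed mapping from header text (lower case) to product-related fields.
HEADER_KEYWORDS = {
    "product name": "product_name",
    "sales amount": "sales_amount",
    "sales time": "sales_time",
}


def extract_from_table(rows):
    """Return one record per data row, limited to columns whose header
    matches a preset keyword; return [] when no header matches (S56)."""
    if not rows:
        return []
    header = [cell.strip().lower() for cell in rows[0]]  # S55: header information
    columns = {i: HEADER_KEYWORDS[h] for i, h in enumerate(header) if h in HEADER_KEYWORDS}
    if not columns:  # fourth judgment result: header does not match
        return []
    records = []
    for row in rows[1:]:  # S57: read the data under the matching headers
        records.append({field: row[i].strip() for i, field in columns.items() if i < len(row)})
    return records


if __name__ == "__main__":
    table = [
        ["Product Name", "Sales Amount", "Remark"],
        ["XXX snack", "30,000", "n/a"],
    ]
    print(extract_from_table(table))
```

Columns whose headers match no preset keyword, such as the "Remark" column in the usage example, are simply ignored.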

S6: Match the product name with the corresponding sales amount through the preset matching rules to obtain product sales information, and output the product sales information based on the sales time.

Specifically, the obtained product names are matched with the corresponding sales amounts according to the preset matching rules. Then, according to the sales time, Selenium is combined with the requests library to set the product status configuration in either a timed effective mode or an immediate restart mode. Furthermore, after the user group and the channel are configured, the specific user group and the specific channel can accurately obtain the sales volume. In addition, after the product assignment task is completed, an analysis report can be generated and sent to the operator.
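Since the preset matching rules are not spelled out, the sketch below assumes one simple rule for illustration: a product name matches an amount when the two names are equal after trimming spaces and ignoring case. The function name `match_sales`, the record layout, and the rule for choosing between the two modes (timed when the sales time lies in the future, immediate otherwise) are all assumptions; the Selenium and requests configuration step is not shown.

```python
from datetime import datetime


def match_sales(names, amounts_by_name, sales_time, now):
    """Pair each product name with its sales amount and attach an output mode.

    Assumed rule: names match when equal ignoring case and surrounding spaces.
    Assumed mode: 'timed' if the sales time is still in the future, else 'immediate'.
    """
    normalized = {key.strip().lower(): value for key, value in amounts_by_name.items()}
    mode = "timed" if sales_time > now else "immediate"
    matched = []
    for name in names:
        key = name.strip().lower()
        if key in normalized:
            matched.append({
                "product_name": name.strip(),
                "sales_amount": normalized[key],
                "sales_time": sales_time,
                "mode": mode,
            })
    return matched


if __name__ == "__main__":
    result = match_sales(
        names=["XXX snack", "Unknown item"],
        amounts_by_name={"xxx snack": 30000.0},
        sales_time=datetime(2021, 5, 3),
        now=datetime(2021, 5, 1),
    )
    print(result)
```

Names without a matching amount are dropped here; a real deployment might instead report them to the operator.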

In this implementation, the receiving status of the mailbox is regularly detected through a scheduled task, and if an unread email is detected, the subject and text content of the unread email are obtained. The subject and text content are input into the trained decision tree to judge whether the unread email is related to product sales and to obtain the judgment result. If the judgment result is that the unread email is related to product sales, the content in the unread email is identified as the target content. The target content is traversed through preset keywords to extract the product-related information in the target content. The product name is matched with the corresponding sales amount through the preset matching rules to obtain product sales information, and the product sales information is output based on the sales time. The trained decision tree thus automatically determines whether unread emails are related to product sales information; if so, product matching is carried out by extracting the product-related information in the emails, which is conducive to improving product matching efficiency. In addition, this application uses timing tasks to detect emails and process product-related emails in a timely manner, and it adopts different parsing methods according to the format of the email content to obtain the corresponding product information, which is also conducive to improving product matching efficiency.

It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned product sales information, the product sales information can also be stored in a node of a blockchain.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware through computer-readable instructions. The computer-readable instructions can be stored in a computer-readable storage medium, and when executed, they may include the processes of the embodiments of the above-mentioned methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).

Please refer to FIG. 9. As an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of a decision tree-based product matching device, which corresponds to the method embodiment shown in FIG. 2. The device can be specifically applied to various electronic devices.

As shown in Figure 9, the decision tree-based product matching device of this embodiment includes a receiving situation detection module 71, an unread mail acquisition module 72, an unread mail judgment module 73, a target content identification module 74, a product-related information extraction module 75, and a product sales information output module 76, wherein:

The receiving situation detection module 71 is used to regularly detect the receiving situation of the target mailbox through a timing task;

The unread mail acquisition module 72 is used to obtain the subject and text content of the unread mail if it is detected that there are unread mails in the target mailbox;

The unread email judging module 73 is used to input the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result;

The target content recognition module 74 is used to identify the content in the unread mail as the target content if the first judgment result is that the unread mail is related to product sales;

The product-related information extraction module 75 is used to traverse the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

The product sales information output module 76 is configured to match the product name with the corresponding sales amount through preset matching rules to obtain product sales information, and output the product sales information based on the sales time.

Further, the unread mail judging module 73 includes:

The content input unit is used to input the subject and text content into the trained decision tree;

The node classification unit is used to perform node classification on the subject and text content through the trained decision tree to obtain the current node classification result;

The node prediction unit is used to judge the final leaf node of the unread mail based on the current node classification result, as the predicted leaf node;

The prediction result inferring unit is configured to obtain the first judgment result based on the prediction result of the prediction leaf node.

Further, before the unread mail judging module 73, it also includes:

The sample training data module is used to obtain sample emails, and grab the content in the sample emails to obtain sample training data;

The product characteristic attribute identification module is used to identify the product characteristic attribute in the sample training data, wherein the product characteristic attribute is the inherent attribute of the product in the sample training data;

The target training set generation module is used to perform feature selection on the data in the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;

The target training set training module is used to train the target training set using the decision tree algorithm to obtain a trained decision tree.

Further, the target training set training module includes:

The node feature acquisition unit is used to adopt the ID3 algorithm to perform node calculation on the decision tree with the training data to obtain the node feature;

A recursive calculation unit, configured to recursively calculate the decision tree based on node characteristics, wherein each recursive calculation obtains a basic decision tree;

An error value generation unit is used to test and calculate the basic decision tree through the test data to obtain an error value;

The recursive calculation end unit is used to stop the recursive calculation when the error value is less than a preset threshold, and obtain a trained decision tree.
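The four training units above can be illustrated with a compact ID3 sketch. Several points are assumptions made only for this example: each email is represented as a dictionary of discrete features (the feature names below are invented), "each recursive calculation obtains a basic decision tree" is interpreted as growing the tree one level deeper per round, and the error value is the fraction of misclassified test samples. The names `entropy`, `information_gain`, `build_id3`, `predict`, and `train_until` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """ID3 criterion: entropy before the split minus weighted entropy after it."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_id3(rows, labels, features, max_depth):
    """Recursively build a tree; leaves are class labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_id3(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, max_depth - 1)
    return node


def predict(tree, row):
    """Walk from the root to a leaf; unseen feature values fall back to the majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def train_until(rows, labels, test_rows, test_labels, features, threshold):
    """Grow deeper trees until the test error drops below the preset threshold."""
    for depth in range(len(features) + 1):
        tree = build_id3(rows, labels, features, depth)
        wrong = sum(predict(tree, r) != l for r, l in zip(test_rows, test_labels))
        error = wrong / len(test_labels)
        if error < threshold:
            break
    return tree, error


if __name__ == "__main__":
    X = [{"sales_kw": 1, "attach": 1}, {"sales_kw": 1, "attach": 0},
         {"sales_kw": 0, "attach": 1}, {"sales_kw": 0, "attach": 0}]
    y = ["sales", "sales", "other", "other"]
    tree, err = train_until(X, y, [{"sales_kw": 0, "attach": 1}], ["other"],
                            ["sales_kw", "attach"], threshold=0.1)
    print(tree, err)
```

The `predict` function also mirrors the node prediction unit described earlier: the email is classified node by node until a leaf is reached, and the label of that leaf gives the first judgment result.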

Further, the target content identification module 74 includes:

The detection result acquisition unit is used to take the subject and body content as the target content if the first judgment result is that the unread email is related to product sales, and to detect whether the unread email includes attachments to obtain the detection result;

The second judgment result generation unit is used to judge the text type of the attachment if the detection result is that the unread mail includes an attachment, and obtain a second judgment result;

The first result generation unit is used to read the text content corresponding to the word document as the target content if the second judgment result is that the text type of the attachment is a word document;

The second result generation unit is used to read out the text content of the text-type PDF as the target content if the second judgment result is that the text type of the attachment is a text-type PDF;

The third result generation unit is used to read out the text content of the picture-type PDF as the target content by means of OCR recognition if the second judgment result is that the text type of the attachment is a picture-type PDF.

Further, the product-related information extraction module 75 includes:

A key sentence acquisition unit, configured to traverse the target content through preset keywords, so as to obtain sentences including preset keywords in the target content as key sentences;

The data acquisition unit is used to acquire the data corresponding to the preset keywords in the key sentence, and obtain the sales time and sales amount;

The product name obtaining unit is used to split the key sentence by a delimiter to obtain the product name.

Further, after the product name obtaining unit, it also includes:

a third judging result acquisition unit, configured to judge whether there is a table in the target content, and obtain a third judging result;

A header information acquisition unit configured to parse the table to obtain header information corresponding to the table if the third judgment result is that there is a table in the target content;

The header information matching unit is used to judge whether the header information matches the preset keywords to obtain the fourth judgment result;

The fourth judging result display unit is configured to obtain product-related information based on the header information if the fourth judging result is that the header information matches a preset keyword.

In order to solve the above technical problems, the embodiment of the present application further provides computer equipment. Please refer to FIG. 10 for details. FIG. 10 is a block diagram of the basic structure of the computer device in this embodiment.

The computer device 8 includes a memory 81, a processor 82, and a network interface 83 that are communicatively connected to each other through a system bus. It should be pointed out that the figure only shows a computer device 8 with the three components memory 81, processor 82, and network interface 83, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.

Wherein, the memory 81 stores computer-readable instructions, and when the processor 82 executes the computer-readable instructions, all the steps of any embodiment of the above-mentioned decision tree-based product matching method can be implemented.

The computer equipment may be computing equipment such as desktop computers, notebooks, palmtop computers, and cloud servers. Computer equipment can interact with users through keyboards, mice, remote controls, touch pads, or voice-activated devices.

The memory 81 includes at least one type of readable storage medium, which can be non-volatile or volatile. The readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as a hard disk or internal memory of the computer device 8. In other embodiments, the memory 81 can also be an external storage device of the computer device 8, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the computer device 8. Of course, the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device. In this embodiment, the memory 81 is generally used to store the operating system and various application software installed in the computer device 8, such as the computer-readable instructions of the decision tree-based product matching method. In addition, the memory 81 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 82 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 82 is generally used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is used to run the computer-readable instructions stored in the memory 81 or to process data, for example, to run the above-mentioned computer-readable instructions of the decision tree-based product matching method, so as to realize the various embodiments of the decision tree-based product matching method.

The network interface 83 may include a wireless network interface or a wired network interface, and the network interface 83 is generally used to establish a communication connection between the computer device 8 and other electronic devices.

The present application also provides another implementation, namely a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes all the steps of any embodiment of the decision tree-based product matching method described above.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and contains several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods of the embodiments of the present application.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
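The idea of data blocks "associated with each other using cryptographic methods" can be shown with a minimal hash-chain sketch. It only illustrates the linking and the validity check; consensus, distribution across nodes, and encryption are outside its scope. The block layout and the names `make_block` and `verify_chain` are hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def make_block(prev_hash, records):
    """Create a block whose hash covers both its records and the previous hash."""
    payload = json.dumps({"prev_hash": prev_hash, "records": records}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"prev_hash": prev_hash, "records": records, "hash": digest}


def verify_chain(chain):
    """Check that every block links to its predecessor and has not been altered."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["prev_hash"], block["records"])["hash"]
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    first = make_block(GENESIS_HASH, [{"product_name": "XXX snack", "sales_amount": 30000}])
    second = make_block(first["hash"], [{"product_name": "YYY drink", "sales_amount": 5000}])
    print(verify_chain([first, second]))
```

Because each hash depends on the previous one, changing stored product sales information in an earlier block invalidates every later block, which is what makes tampering detectable.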

Obviously, the embodiments described above are only some of the embodiments of the present application, not all of them. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; rather, these embodiments are provided to make the understanding of the disclosure of the present application more thorough and comprehensive. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of the technical features. All equivalent structures made using the contents of the description and drawings of this application, directly or indirectly used in other related technical fields, are also within the scope of protection of this application.

Claims

A decision tree-based product matching method, including:

Through timing tasks, regularly detect the receiving status of the target mailbox;

If it is detected that there are unread emails in the target mailbox, then obtain the subject and body content of the unread emails;

Inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result;

If the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as the target content;

Traversing the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

The product name is matched with the corresponding sales amount through a preset matching rule to obtain product sales information, and based on the sales time, the product sales information is output.
The decision tree-based product matching method according to claim 1, wherein inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result comprises:

Input the subject and text content into the trained decision tree;

Carrying out node classification to the subject and text content through the trained decision tree to obtain the current node classification result;

judging the final leaf node of the unread mail based on the classification result of the current node as a predicted leaf node;

The first judgment result is obtained through the prediction result of the prediction leaf node.
The decision tree-based product matching method according to claim 1, wherein, before inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result, the method further comprises:

Obtaining a sample email, and grabbing the content of the sample email to obtain sample training data;

identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of products in the sample training data;

Carry out feature selection to the data in the sample training data according to the product characteristic attribute, generate the target training set, wherein, the target training set is divided into training data and test data;

Using a decision tree algorithm to train the target training set to obtain the trained decision tree.
The product matching method based on decision tree according to claim 3, wherein said adopting decision tree algorithm to train said target training set to obtain said trained decision tree comprises:

Using the ID3 algorithm, the training data is used to calculate the nodes of the decision tree to obtain node features;

performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation obtains a basic decision tree;

performing test calculation on the basic decision tree through test data to obtain an error value;

When the error value is smaller than the preset threshold, the recursive calculation is stopped to obtain the trained decision tree.
The decision tree-based product matching method according to claim 1, wherein identifying the content in the unread email as the target content if the first judgment result is that the unread email is related to product sales comprises:

If the first judgment result is that the unread email is related to product sales, then use the subject and body content as the target content, and detect whether the unread email includes attachments to obtain a detection result;

If the detection result is that the unread email includes an attachment, then judging the text type of the attachment to obtain a second judgment result;

If the second judgment result is that the text type of the attachment is a word document, then read the text content corresponding to the word document as the target content;

If the second judgment result is that the text type of the attachment is a text-type PDF, the text content of the text-type PDF is read out as the target content by means of java class package parsing;

If the second judging result is that the text type of the attachment is a picture-type PDF, the text content of the picture-type PDF is read out as the target content by way of OCR identification.
The decision tree-based product matching method according to claim 1, wherein said traversing said target content through preset keywords to extract product-related information in said target content comprises:

traversing the target content through preset keywords to obtain sentences including the preset keywords in the target content as key sentences;

Obtain the data corresponding to the preset keyword in the key sentence, and obtain the sales time and the sales amount;

The key sentence is split by a delimiter to obtain the product name.
The method for product matching based on a decision tree according to claim 6, wherein, after the key sentence is split by a delimiter to obtain the product name, further comprising:

judging whether there is a table in the target content, and obtaining a third judging result;

If the third judgment result is that there is a table in the target content, then parsing the table to obtain header information corresponding to the table;

judging whether the header information matches the preset keyword to obtain a fourth judging result;

If the fourth determination result is that the header information matches a preset keyword, the product-related information is acquired based on the header information.
A product matching device based on a decision tree, comprising:

The receiving situation detection module is used to regularly detect the receiving situation of the target mailbox through the timing task;

An unread email acquisition module, configured to acquire the subject and body content of the unread email if it is detected that there is an unread email in the target mailbox;

An unread mail judging module, which is used to input the subject and text content into the trained decision tree to judge whether the unread mail is related to product sales, and obtain the first judgment result;

A target content identification module, configured to identify the content in the unread email as the target content if the first judgment result is that the unread email is related to product sales;

A product-related information extraction module, configured to traverse the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

The product sales information output module is used to match the product name with the corresponding sales amount through preset matching rules to obtain product sales information, and output the product sales information based on the sales time.
A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, wherein the processor implements the following steps when executing the computer-readable instructions:

Through timing tasks, regularly detect the receiving status of the target mailbox;

If it is detected that there are unread emails in the target mailbox, then obtain the subject and body content of the unread emails;

Inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result;

If the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as the target content;

Traversing the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;

The product name is matched with the corresponding sales amount through a preset matching rule to obtain product sales information, and based on the sales time, the product sales information is output.
The computer device according to claim 9, wherein inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result comprises:

Input the subject and text content into the trained decision tree;

Carrying out node classification to the subject and text content through the trained decision tree to obtain the current node classification result;

judging the final leaf node of the unread mail based on the classification result of the current node as a predicted leaf node;

The first judgment result is obtained through the prediction result of the prediction leaf node.
The computer device according to claim 9, wherein, before inputting the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result, the method further comprises:

Obtaining a sample email, and grabbing the content of the sample email to obtain sample training data;

identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of products in the sample training data;

Carry out feature selection to the data in the sample training data according to the product characteristic attribute, generate the target training set, wherein, the target training set is divided into training data and test data;

Using a decision tree algorithm to train the target training set to obtain the trained decision tree.
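
The division of the target training set into training data and test data might look like the sketch below. The 80/20 ratio, the fixed seed, and the function name `split_dataset` are assumptions; the claim does not state how the split is made.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples reproducibly and split it into
    (training data, test data); at least one sample goes to the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```
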
The computer device according to claim 11, wherein training on the target training set using a decision tree algorithm to obtain the trained decision tree comprises:

Computing the nodes of the decision tree from the training data using the ID3 algorithm to obtain node features;

Performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;

Testing the basic decision tree on the test data to obtain an error value;

When the error value is smaller than a preset threshold, stopping the recursive calculation to obtain the trained decision tree.
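
ID3 chooses the feature for each node by information gain, that is, the reduction in label entropy obtained by splitting on that feature. A minimal sketch of that selection step follows; the row representation (dictionaries of feature values) and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    total = len(rows)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 node selection: the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))
```

Applying `best_feature` recursively to each resulting subset, with the chosen feature removed, grows the tree one level per recursion.
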
The computer device according to claim 9, wherein, if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as the target content comprises:

If the first judgment result is that the unread email is related to product sales, using the subject and body content as the target content, and detecting whether the unread email includes an attachment to obtain a detection result;

If the detection result is that the unread email includes an attachment, judging the text type of the attachment to obtain a second judgment result;

If the second judgment result is that the text type of the attachment is a Word document, reading the text content of the Word document as the target content;

If the second judgment result is that the text type of the attachment is a text-type PDF, reading the text content of the text-type PDF as the target content by means of Java class package parsing;

If the second judgment result is that the text type of the attachment is a picture-type PDF, reading the text content of the picture-type PDF as the target content by means of OCR recognition.
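
The three-way handling of attachments can be sketched as a small dispatch. The sketch is written in Python for brevity although the claim mentions Java class packages; the actual readers (Word parser, PDF text parser, OCR engine) are passed in as callables rather than tied to any specific library, and the `has_text_layer` flag, the file-extension test, and both function names are assumptions.

```python
def attachment_kind(filename, has_text_layer=False):
    """Classify an attachment by extension; PDFs are split by whether
    they carry an extractable text layer (flag supplied by the caller)."""
    name = filename.lower()
    if name.endswith((".doc", ".docx")):
        return "word"
    if name.endswith(".pdf"):
        return "text_pdf" if has_text_layer else "image_pdf"
    return "unsupported"


def read_attachment(filename, data, readers, has_text_layer=False):
    """Route the attachment bytes to the reader registered for its kind.
    `readers` maps "word", "text_pdf", "image_pdf" to callables returning text."""
    reader = readers.get(attachment_kind(filename, has_text_layer))
    return reader(data) if reader else ""
```

Keeping the readers injectable means the OCR path and the text-parsing path can be swapped or tested independently.
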
The computer device according to claim 9, wherein traversing the target content using preset keywords to extract product-related information from the target content comprises:

Traversing the target content using the preset keywords to obtain, as key sentences, the sentences in the target content that include the preset keywords;

Obtaining the data corresponding to the preset keywords in the key sentences to obtain the sales time and the sales amount;

Splitting the key sentences by a delimiter to obtain the product name.
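
A minimal sketch of the keyword traversal follows. It assumes, purely for illustration, that each key sentence sits on its own line, uses `|` as the delimiter, puts the product name first, and labels the other fields as `sales time: …` and `sales amount: …`; none of these layout details come from the claim.

```python
def parse_key_sentence(sentence, delimiter="|"):
    """Split a key sentence by the delimiter: the first part is taken as the
    product name, the remaining 'keyword: value' parts fill time and amount."""
    parts = [p.strip() for p in sentence.split(delimiter)]
    record = {"name": parts[0]}
    for part in parts[1:]:
        key, _, value = part.partition(":")
        key = key.strip().lower()
        if key == "sales time":
            record["time"] = value.strip()
        elif key == "sales amount":
            record["amount"] = float(value.strip())
    return record


def extract_records(text, keywords=("sales time", "sales amount")):
    """Keep only the lines containing a preset keyword and parse each one."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    key_lines = [ln for ln in lines if any(k in ln.lower() for k in keywords)]
    return [parse_key_sentence(ln) for ln in key_lines]
```
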
The computer device according to claim 14, wherein, after splitting the key sentences by a delimiter to obtain the product name, the steps further include:

Judging whether there is a table in the target content to obtain a third judgment result;

If the third judgment result is that there is a table in the target content, parsing the table to obtain header information corresponding to the table;

Judging whether the header information matches the preset keywords to obtain a fourth judgment result;

If the fourth judgment result is that the header information matches the preset keywords, acquiring the product-related information based on the header information.
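
For tabular content, the header-matching step could be sketched as below. The table is assumed to be already parsed into a list of rows with the header first; the `HEADER_KEYWORDS` mapping and the function name `records_from_table` are hypothetical.

```python
# Hypothetical mapping from preset header keywords to output field names.
HEADER_KEYWORDS = {
    "product name": "name",
    "sales time": "time",
    "sales amount": "amount",
}


def records_from_table(table, header_keywords=HEADER_KEYWORDS):
    """Match header cells against the preset keywords and, for every data row,
    collect the cells under the matching columns. Returns [] if nothing matches."""
    if not table:
        return []
    header = [h.strip().lower() for h in table[0]]
    columns = {i: header_keywords[h] for i, h in enumerate(header) if h in header_keywords}
    if not columns:
        return []
    return [
        {field: row[i] for i, field in columns.items() if i < len(row)}
        for row in table[1:]
    ]
```
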
A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor performs the following steps:

Regularly detecting, through a timing task, the receiving status of a target mailbox;

If an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;

Inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, thereby obtaining a first judgment result;

If the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as target content;

Traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;

Matching the product name with the corresponding sales amount according to a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
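
The timing task that opens these steps can be pictured as a simple polling loop. In this sketch the mailbox access is abstracted behind a caller-supplied `fetch_unread` callable, so no particular mail protocol or library is implied; `poll_mailbox`, its parameters, and the 60-second default interval are assumptions.

```python
import time


def poll_mailbox(fetch_unread, handle, interval_seconds=60.0, max_cycles=None, sleep=time.sleep):
    """Call `fetch_unread()` at fixed intervals and pass every unread message
    to `handle`. Runs forever unless `max_cycles` is given; returns the number
    of messages processed."""
    cycles = 0
    processed = 0
    while max_cycles is None or cycles < max_cycles:
        for message in fetch_unread():
            handle(message)
            processed += 1
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return processed
```

In production the loop would more likely be driven by a scheduler or cron-style job; the injectable `sleep` exists only to make the loop easy to exercise.
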
The computer-readable storage medium according to claim 16, wherein inputting the subject and body content into the trained decision tree to judge whether the unread email is related to product sales, thereby obtaining the first judgment result, comprises:

Inputting the subject and body content into the trained decision tree;

Performing node classification on the subject and body content through the trained decision tree to obtain a current node classification result;

Determining, based on the current node classification result, the final leaf node reached by the unread email as a predicted leaf node;

Obtaining the first judgment result from the prediction result of the predicted leaf node.
The computer-readable storage medium according to claim 16, wherein, before inputting the subject and body content into the trained decision tree to judge whether the unread email is related to product sales and obtaining the first judgment result, the steps further include:

Obtaining a sample email, and capturing the content of the sample email to obtain sample training data;

Identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of the products in the sample training data;

Performing feature selection on the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;

Training on the target training set using a decision tree algorithm to obtain the trained decision tree.
The computer-readable storage medium according to claim 18, wherein training on the target training set using a decision tree algorithm to obtain the trained decision tree comprises:

Computing the nodes of the decision tree from the training data using the ID3 algorithm to obtain node features;

Performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;

Testing the basic decision tree on the test data to obtain an error value;

When the error value is smaller than a preset threshold, stopping the recursive calculation to obtain the trained decision tree.
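
The stopping rule (test each intermediate tree and stop once its error falls below the threshold) can be sketched independently of how the trees are grown. Here each "basic decision tree" is represented simply as a prediction function, and the error value is taken to be the misclassification rate on the test data; both choices, and the names `error_rate` and `select_tree`, are assumptions.

```python
def error_rate(predict, rows, labels):
    """Fraction of test rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)


def select_tree(candidates, rows, labels, threshold):
    """Evaluate successive candidate trees and stop at the first one whose
    test error is below `threshold`. Returns (tree, error) for that tree, or
    for the last candidate if none qualifies; None if there are no candidates."""
    last = None
    for predict in candidates:
        err = error_rate(predict, rows, labels)
        last = (predict, err)
        if err < threshold:
            break
    return last
```

Because `candidates` may be a generator, later trees are only built if the earlier ones fail the threshold, which mirrors stopping the recursion early.
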
The computer-readable storage medium according to claim 16, wherein, if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as the target content comprises:

If the first judgment result is that the unread email is related to product sales, using the subject and body content as the target content, and detecting whether the unread email includes an attachment to obtain a detection result;

If the detection result is that the unread email includes an attachment, judging the text type of the attachment to obtain a second judgment result;

If the second judgment result is that the text type of the attachment is a Word document, reading the text content of the Word document as the target content;

If the second judgment result is that the text type of the attachment is a text-type PDF, reading the text content of the text-type PDF as the target content by means of Java class package parsing;

If the second judgment result is that the text type of the attachment is a picture-type PDF, reading the text content of the picture-type PDF as the target content by means of OCR recognition.