WO2023272850A1 - Decision tree-based product matching method, apparatus and device, and storage medium

Info

Publication number
WO2023272850A1
WO2023272850A1 PCT/CN2021/108777 CN2021108777W WO2023272850A1 WO 2023272850 A1 WO2023272850 A1 WO 2023272850A1 CN 2021108777 W CN2021108777 W CN 2021108777W WO 2023272850 A1 WO2023272850 A1 WO 2023272850A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
decision tree
content
target
sales
Prior art date
Application number
PCT/CN2021/108777
Other languages
French (fr)
Chinese (zh)
Inventor
平高明
Original Assignee
未鲲(上海)科技服务有限公司
Priority date
Filing date
Publication date
Application filed by 未鲲(上海)科技服务有限公司
Publication of WO2023272850A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a decision tree-based product matching method, apparatus, device, and storage medium.
  • the purpose of the embodiments of the present application is to propose a decision tree-based product matching method, apparatus, device, and storage medium, so as to improve product matching efficiency.
  • the embodiment of the present application provides a decision tree-based product matching method, including:
  • if the unread email is related to product sales, identifying the content in the unread email as the target content;
  • matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
  • the embodiment of the present application provides a decision tree-based product matching device, including:
  • the receiving situation detection module is used to regularly detect the receiving situation of the target mailbox through the timing task;
  • An unread email acquisition module configured to acquire the subject and body content of the unread email if it is detected that there is an unread email in the target mailbox;
  • An unread mail judging module which is used to input the subject and text content into the trained decision tree to judge whether the unread mail is related to product sales, and obtain the first judgment result;
  • a target content identification module configured to identify the content in the unread email as the target content if the first judgment result is that the unread email is related to product sales;
  • a product-related information extraction module configured to traverse the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;
  • the product sales information output module is used to match the product name with the corresponding sales amount through preset matching rules to obtain product sales information, and output the product sales information based on the sales time.
  • the embodiment of the present application also provides a computer device, which includes a memory and a processor, where computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
  • if the unread email is related to product sales, identifying the content in the unread email as the target content;
  • matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
  • the embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor executes the following steps:
  • if the unread email is related to product sales, identifying the content in the unread email as the target content;
  • matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
  • Embodiments of the present application provide a decision tree-based product matching method, apparatus, device, and storage medium.
  • the embodiment of the present application realizes automatically judging whether the unread email is related to the product sales information through the trained decision tree, and if so, extracts the product-related information in the email for product matching, which is beneficial to improve the matching efficiency of the product.
  • Fig. 1 is the application environment diagram of the product matching method based on the decision tree provided by the embodiment of the present application;
  • Fig. 2 is an implementation flowchart of a decision tree-based product matching method provided according to an embodiment of the present application
  • Fig. 3 is an implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;
  • Fig. 4 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application;
  • Fig. 5 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application.
  • Fig. 6 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application.
  • Fig. 7 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application.
  • Fig. 8 is another implementation flowchart of the sub-process in the decision tree-based product matching method provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of a decision tree-based product matching device provided in an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • a system architecture 100 may include terminal devices 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is used as a medium for providing communication links between the terminal devices 101 , 102 , 103 and the server 105 .
  • Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, instant messaging tools, and the like.
  • the terminal devices 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers and the like.
  • the server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101 , 102 , 103 .
  • decision tree-based product matching method provided in the embodiment of the present application is generally executed by a server, and correspondingly, the decision tree-based product matching device is generally configured in the server.
  • terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.
  • FIG. 2 shows a specific implementation of a decision tree-based product matching method.
  • the server can monitor the client's mailbox through scheduled monitoring tasks. If it receives an unread email, it judges whether the unread email is related to product sales. If it is related, the server extracts the product-related information from the email and, based on the preset matching rules, matches the product-related information to obtain product sales information. It then outputs the product sales information, generates a data analysis report, and returns the data analysis report to the client.
  • the second is the client terminal.
  • the user of the client terminal can be the operator of the product sales, who can receive the data analysis report returned by the server.
  • the receiving status of a specific mailbox is monitored in real time by regularly executing tasks. If an unread email is detected, it is judged whether the unread email is related to product sales. Further, whether an email has been read is determined from its read flag. The read flag is obtained from the generated log file and distinguishes the read and unread states of email processing.
  • when an unread email is detected, it is accurately identified in the target mailbox through the generated log file. The subject and body content of the unread email are then captured as the basis for subsequently judging whether the unread email is related to product sales.
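The detection step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Email` record, the `read_log` set standing in for the log file of read flags, and the `fetch_inbox` and `handle` callbacks are all hypothetical names introduced here.

```python
import time
from dataclasses import dataclass


@dataclass
class Email:
    message_id: str
    subject: str
    body: str


def find_unread(inbox, read_log):
    """Return the emails whose IDs are not yet recorded in the read log."""
    return [mail for mail in inbox if mail.message_id not in read_log]


def poll_mailbox(fetch_inbox, read_log, handle, interval_seconds=60, max_cycles=1):
    """Check the mailbox on a fixed interval and pass unread emails to `handle`."""
    for cycle in range(max_cycles):
        for mail in find_unread(fetch_inbox(), read_log):
            handle(mail.subject, mail.body)
            read_log.add(mail.message_id)  # record the read flag
        if cycle < max_cycles - 1:
            time.sleep(interval_seconds)
```

A production version would more likely hand the interval to a scheduler than sleep inside a loop; the loop is kept here only so the sketch stays self-contained.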
  • S3 Input the subject and text content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result.
  • a decision tree is trained to determine whether unread emails are related to product sales through the trained decision tree.
  • the decision tree processes the data by using the induction algorithm to generate classification rules and decision trees, and then predicts and analyzes new data.
  • the terminal node ("leaf node") of the tree represents the category (class) of the classification result
  • each internal node represents a variable test
  • the branch (Branch) is the test output, representing a possible value of the variable.
  • variable values are tested on the data
  • each path represents a classification rule.
  • the decision tree model maximizes the difference in the dependent variable by continuously partitioning the data. The ultimate goal is to classify the data into different groups or branches and to establish the strongest classification on the value of the dependent variable. This machine learning tool is used to classify the subject and content of emails.
  • the decision tree is implemented with scikit-learn (sklearn), a Python machine learning library.
  • a decision tree is a predictive model, which represents a mapping relationship between object attributes and object values.
  • Each node in the tree represents an object, each branching path represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf node.
  • for example, consider a decision tree that predicts whether a person will buy a computer. The tree can classify new records starting from the root node (age): if the person is middle-aged, it is directly judged that the person will buy a computer; if the person is a teenager, it is further judged whether the person is a student; if the person is elderly, the person's credit rating is further judged, until a leaf node determines the category of the record.
  • a decision tree can be used to determine whether unread emails are related to product sales.
  • a decision tree that predicts whether an unread email is related to product sales can classify new unread emails. Classification starts from the root node (email subject and body content), where there may be three classes: "spam mail", "daily mail", and "mail containing the word 'product'". If the email is classified as "spam mail" or "daily mail", it is directly determined to be unrelated to product sales. If it is classified as "mail containing the word 'product'", classification continues from that node, and the email may be classified as "mail with the word 'sales'", "mail with the word 'sales'", or "mail without the word 'sales'". For the first two classification results, the email is determined to be related to product sales; for the third result, further judgment is made until a leaf node determines the category of the email.
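The example tree above can be written out by hand to show how an email moves from the root to a leaf. This sketch hard-codes the branches rather than learning them; the spam markers and the two sales-related words are placeholder assumptions, since the source does not list them.

```python
def classify_email(subject, body,
                   spam_markers=("lottery", "unsubscribe"),
                   sales_words=("sales", "selling")):
    """Walk a hand-written two-level tree modelled on the example above."""
    text = f"{subject} {body}".lower()

    # Root node: spam mail, daily mail, or mail containing the word "product".
    if any(marker in text for marker in spam_markers):
        return "not related to product sales"   # "spam mail" branch
    if "product" not in text:
        return "not related to product sales"   # "daily mail" branch

    # Second-level node: does the mail also contain a sales-related word?
    if any(word in text for word in sales_words):
        return "related to product sales"
    return "needs further judgment"             # descend further in a real tree
```

In the described method the splits are learned from sample emails rather than written by hand, so this only illustrates the traversal.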
  • Fig. 3 shows a specific implementation of step S3, described in detail as follows:
  • S32 Use the trained decision tree to perform node classification on the subject and text content, and obtain the current node classification result.
  • node classification means that the decision tree selects an optimal feature according to the input data, and divides the input data set into subsets according to this data feature, so that each subset has the best classification under the current conditions.
  • S33 Determine the final leaf node of the unread mail based on the classification result of the current node as the predicted leaf node.
  • the classification results of unread emails are divided into "related to product sales" and "not related to product sales".
  • the first judgment result is divided into two results: the unread emails are related to product sales and the unread emails are not related to product sales.
  • the subject and text content are classified through the trained decision tree to obtain the current node classification result. Based on the current node classification result, the final leaf node of the unread email is determined as the predicted leaf node, and the first judgment result is obtained from the prediction result of the predicted leaf node. In this way, unread emails related to product sales can be quickly filtered out from many emails, which facilitates rapid analysis of emails and thus helps improve product matching efficiency.
  • FIG. 4 shows a specific implementation before step S3, which is described in detail as follows:
  • S3A Obtain sample emails, and grab the content in the sample emails to obtain sample training data.
  • sample emails are divided into emails related to product sales and emails not related to product sales, which are used to train the decision tree. Grab the content in the sample email, including the email subject, body content, and attachment content, and use the content as sample training data.
  • S3B identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of products in the sample training data.
  • the product characteristic attribute is an inherent attribute of the product in the sample training data, such as product name, product category, product sales amount, product sales time, and the like.
  • S3C Perform feature selection on the data in the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data.
  • feature selection is performed on the data in the sample training data through product feature attributes, and data with product feature attributes is selected to generate target training data.
  • feature selection refers to selecting data related to product feature attributes. For example, if the sample training data set is a data set of 50 columns, after feature selection, there are only 10 columns of data remaining, then the 10 columns of data will be used as the target training set.
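A minimal sketch of the column-level selection described above, assuming each sample is a dictionary of column values. The column names are illustrative and not taken from the source.

```python
def select_features(rows, feature_attributes):
    """Keep only the columns that correspond to product feature attributes."""
    return [
        {name: row[name] for name in feature_attributes if name in row}
        for row in rows
    ]


# Illustrative sample with one column that is not a product feature attribute.
samples = [
    {"product_name": "XXX item", "sales_amount": 30000, "sender_ip": "10.0.0.1"},
]
target_rows = select_features(samples, ["product_name", "sales_amount"])
```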
  • attribute selection can not only reduce the size of the data set, but also improve the prediction effect of the decision tree model.
  • the selection is generally carried out by combining algorithmic selection (for example, a feature selection algorithm based on information gain) with manual selection.
  • S3D Use the decision tree algorithm to train the target training set to obtain a trained decision tree.
  • attribute data useful for the prediction process are selected according to the feature attributes. That is, when training the decision tree in the embodiment of the present application, product-related information, such as product name, product category, sales amount, sales time, etc., are used as product feature attributes. Use these feature data to filter the training data, and select emails related to product sales to obtain the target training set, and then divide the target training set into training data and test data according to the preset ratio.
  • the training data is used to train the decision tree
  • the test data is used to verify the trained decision tree
  • the preset ratio is set according to the actual situation, which is not limited here. In a specific embodiment, the preset ratio is 8:2.
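The 8:2 division mentioned above could look like the sketch below. Shuffling before the split and the fixed seed are assumptions added here for reproducibility; the source only specifies the ratio.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into training and test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_data, test_data = split_dataset(range(10))
```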
  • the sample training data is obtained, the product feature attributes in the sample training data are identified, and then feature selection is performed on the data in the sample training data according to the product feature attributes.
  • Fig. 5 shows a specific implementation of step S3D, which is described in detail as follows:
  • S3D1 Using the ID3 algorithm, the training data is used to calculate the nodes of the decision tree to obtain the node characteristics.
  • the ID3 algorithm was first proposed by J. Ross Quinlan at the University of Sydney in 1975 as a classification prediction algorithm.
  • the core of the algorithm is "information entropy".
  • the ID3 algorithm calculates the information gain of each attribute, and considers that the attribute with high information gain is a good attribute.
  • Each division selects the attribute with the highest information gain as the division standard, and repeats this process until a decision tree that can perfectly classify training samples is generated.
  • the ID3 algorithm is used to train the decision tree corresponding to the emails related to product sales.
  • node calculation refers to starting from the root node (root node), and selecting the feature with the largest information gain as the feature of the node for the information gain of all possible features.
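To make the node calculation concrete, the sketch below computes information entropy and information gain for categorical features and picks the attribute with the largest gain. The feature names `has_product` and `has_attachment` are hypothetical binary features invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the samples on `attribute`."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Select the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"has_product": 1, "has_attachment": 1},
    {"has_product": 1, "has_attachment": 0},
    {"has_product": 0, "has_attachment": 1},
    {"has_product": 0, "has_attachment": 0},
]
labels = ["sales", "sales", "other", "other"]
```

Here `has_product` separates the two classes completely, so it carries the full one bit of gain, while `has_attachment` carries none.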
  • S3D2 Based on node characteristics, recursively calculate the decision tree, where each recursive calculation obtains a basic decision tree.
  • each recursive calculation selects the feature with the largest information gain as the node feature for the next calculation, and at the same time, each recursive calculation obtains a decision tree.
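The recursion can be sketched as a compact ID3 builder for categorical features. This is a textbook-style illustration under simplifying assumptions (no pruning, no continuous features, majority vote when attributes run out), not the procedure claimed in the application; the feature names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    return entropy(labels) - sum(len(s) / total * entropy(s) for s in subsets.values())


def build_tree(rows, labels, attributes):
    """Recursively split on the attribute with the largest information gain."""
    if len(set(labels)) == 1:          # pure node: becomes a leaf
        return labels[0]
    if not attributes:                 # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining
        )
    return node


def predict(tree, row, default=None):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], default)
    return tree


rows = [
    {"has_product": 1, "has_sales": 1},
    {"has_product": 1, "has_sales": 0},
    {"has_product": 0, "has_sales": 1},
    {"has_product": 0, "has_sales": 0},
]
labels = ["sales", "other", "other", "other"]
tree = build_tree(rows, labels, ["has_product", "has_sales"])
```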
  • S3D3 Test and calculate the basic decision tree through the test data to obtain the error value.
  • decision tree model training there are generally two methods of model testing.
  • One is to divide the data in the training set into two parts, using one part for training to generate a decision tree (i.e., the training data) and the other part for testing (i.e., the test data), where test cases are generally selected from the test data. The other is n-fold cross-validation, which divides the data in the training set into n folds. For example, if the data is divided into 10 parts, 8 of them are used for training to generate a decision tree and the remaining 2 are used as test cases; this is repeated until all 10 parts have been used as test cases, at which point the entire testing process is complete.
  • the basic decision tree is tested and calculated through the test data to obtain the error value, and the recursive calculation is stopped once the error value is less than the preset threshold, so as to obtain the trained decision tree.
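For the cross-validation variant, the sketch below generates index splits for the standard n-fold scheme, in which each fold serves exactly once as the test set. The round-robin assignment of samples to folds is an assumption made here for simplicity.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, 5))
```

With 10 samples and 5 folds, every split trains on 8 samples and tests on 2, and every sample is tested exactly once.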
  • the error value refers to the difference between the classification result of the basic decision tree and the actual classification result.
  • the preset threshold is set according to actual conditions, and is not limited here. In a specific embodiment, the preset threshold is 0.05.
  • the ID3 algorithm is used to calculate the nodes of the decision tree with the training data to obtain the node characteristics. Based on the node characteristics, the decision tree is recursively calculated, and the basic decision tree is then tested with the test data to obtain the error value. When the error value is less than the preset threshold, the recursive calculation stops and the trained decision tree is obtained. This realizes the training of a decision tree related to product sales and improves the accuracy of identifying whether unread emails are related to product sales.
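The stopping check can be sketched as below, reading the error value as the misclassification rate on the test data. That reading is an assumption; the source defines the error only as the difference between the tree's classification results and the actual ones.

```python
def error_rate(predictions, actual):
    """Fraction of test samples whose predicted label differs from the actual one."""
    wrong = sum(p != a for p, a in zip(predictions, actual))
    return wrong / len(actual)


def should_stop(predictions, actual, threshold=0.05):
    """Stop the recursive calculation once the error is below the preset threshold."""
    return error_rate(predictions, actual) < threshold
```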
  • the unread email needs to be analyzed for product sales, so all content in the unread email needs to be identified as the target content.
  • the above steps only identify the subject and body content of unread emails, but in actual situations, emails may have attachments, and the attachments may also include information about product sales. Therefore, it is necessary to determine whether the unread email includes an attachment, and if it includes an attachment, further identify and analyze the attachment, obtain the content in the attachment, and use the content in the attachment together with the subject and body content as the target content.
  • Fig. 6 shows a specific implementation of step S4, which is described in detail as follows:
  • the first judgment result is that the unread email is related to product sales
  • the subject and body content are taken as the target content first, and then it is judged whether the unread email includes attachments.
  • the unread email includes an attachment
  • determine the text type of the attachment so as to obtain the target content in the attachment in a corresponding manner according to the text type of the attachment.
  • the text type of the attachment is a word document
  • the text content corresponding to the word document can be directly read and used as the target content.
  • the Java class packages that can be used for this include PDFBox, iText, and XPDF.
  • PDFBox is an open source project, originally released under a BSD license and now maintained as Apache PDFBox under the Apache License 2.0.
  • XPDF is an open source project, and its native methods can be called to extract text from Chinese PDF files.
  • the OCR (optical character recognition) recognition method refers to the method of scanning text data, and then analyzing and processing image files to obtain text and layout information.
  • the text content of the picture-type PDF is read out through OCR recognition.
  • in this embodiment, if the unread email is related to product sales, the subject and body content are used as the target content, and whether the unread email includes an attachment is detected to obtain a detection result. If there is an attachment, the corresponding target content is acquired through the parsing method that matches the attachment's text type. This achieves accurate acquisition of the target content and facilitates subsequent identification of product-related information, which in turn helps improve product matching efficiency.
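The choice of parsing method by attachment type can be sketched as a small dispatch function. The strategy names it returns are hypothetical labels, not real library calls, and whether a PDF has a text layer is assumed to be known by the caller.

```python
from pathlib import Path


def choose_parser(filename, pdf_has_text_layer=True):
    """Pick a parsing strategy name from the attachment's file type."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".doc", ".docx"}:
        return "read_word_text"        # Word document: read the text directly
    if suffix == ".pdf":
        # Text-type PDFs can be parsed directly; picture-type PDFs need OCR.
        return "extract_pdf_text" if pdf_has_text_layer else "ocr_pdf_pages"
    return "unsupported"
```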
  • S5 Traversing the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount.
  • words such as product, product name, amount, sales amount, and sales time are used as preset keywords, and the target content is traversed with these keywords to extract the product-related information in the target content, wherein the product-related information includes the product name, sales time, sales amount, etc.
  • the preset keywords are set according to information related to product sales, and may be words such as product, amount, and sales.
  • Figure 7 shows a specific implementation of step S5, described in detail as follows:
  • S51 Traversing the target content through preset keywords to acquire sentences including preset keywords in the target content as key sentences.
  • the sentences of the preset keywords in the target content are obtained as key sentences for subsequent identification of corresponding product-related information.
  • S52 Obtain data corresponding to preset keywords in key sentences, and obtain sales time and sales amount.
  • the target content often includes the relevant amount of product sales and the relevant time of product sales.
  • the sales time and sales amount are obtained.
  • for example, if the key sentence is "The target sales amount in this shopping mall activity is 30,000 yuan, and the sales time is 2021-5-3", identifying the data corresponding to the preset keywords yields a sales amount of 30,000 and a sales time of 2021-5-3.
  • the key sentence is split by a delimiter to obtain the information that follows a preset keyword such as "product", so as to obtain the product name.
  • for example, the key sentence is "This time our sale product: XXX item".
  • the product-related information in the target content can be obtained through regular matching.
  • the preset keywords are used to traverse the target content to obtain sentences including the preset keywords in the target content as key sentences, and then the data corresponding to the preset keywords in the key sentences are obtained to obtain the sales time and the sales amount, and split the key sentences through delimiters to obtain the product name, so as to accurately identify the corresponding product-related information from the target content, which is conducive to improving the matching efficiency of products.
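Steps S51 and S52 and the delimiter split can be sketched with regular expressions as follows. The patterns are hypothetical and tuned only to the two example sentences quoted in the text; real emails would need patterns matched to their actual wording. Treating each line as one sentence is also an assumption.

```python
import re

# Hypothetical patterns keyed on the preset keywords.
AMOUNT_RE = re.compile(r"sales amount\D*?(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
TIME_RE = re.compile(r"sales time\D*?(\d{4}-\d{1,2}-\d{1,2})", re.IGNORECASE)
PRODUCT_RE = re.compile(r"product\s*[:：]\s*(.+)", re.IGNORECASE)


def find_key_sentences(text, keywords):
    """S51: keep the lines that contain at least one preset keyword."""
    return [line.strip() for line in text.splitlines()
            if any(k in line.lower() for k in keywords)]


def extract_product_info(key_sentences):
    """S52 and the delimiter split: pull amount, time, and product name."""
    info = {"product_name": None, "sales_amount": None, "sales_time": None}
    for sentence in key_sentences:
        amount = AMOUNT_RE.search(sentence)
        if amount:
            info["sales_amount"] = float(amount.group(1).replace(",", ""))
        sold_at = TIME_RE.search(sentence)
        if sold_at:
            info["sales_time"] = sold_at.group(1)
        product = PRODUCT_RE.search(sentence)
        if product:
            info["product_name"] = product.group(1).strip()
    return info


text = (
    "Hello team\n"
    "The target sales amount in this shopping mall activity is 30,000 yuan, "
    "and the sales time is 2021-5-3\n"
    "This time our sale product: XXX item"
)
key_sentences = find_key_sentences(text, ("product", "amount", "sales"))
info = extract_product_info(key_sentences)
```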
  • FIG. 8 shows a specific implementation of step S53, which is described in detail as follows:
  • the form since there may be a form in the body content and attachment content of the email, the form may also be related information of the product. Therefore, it is necessary to determine whether there is a table in the target content, and if so, parse the table to obtain the header information, and then judge whether the table information is related to product sales based on the header information.
  • the fourth judgment result is that the header information matches the preset keywords; otherwise, the fourth judgment result is that the header information does not match the preset keywords.
  • the fourth judgment result is that the header information matches the preset keywords, the data corresponding to the header information in the table is obtained, and then the product-related information can be obtained.
  • in this embodiment, if there is a table in the target content, the header information of the table is obtained and it is judged whether the header matches the preset keywords. If so, the corresponding product-related information is obtained from the table, so that no product-related information is missed and the corresponding product sales information can subsequently be matched.
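A minimal sketch of the table check, assuming the table has already been parsed into a list of rows whose first row is the header. The sample table and its values are made up for illustration.

```python
def extract_from_table(table, keywords):
    """Return the columns whose header matches a preset keyword, row by row."""
    if not table:
        return []
    header, *data_rows = table
    matched = [i for i, cell in enumerate(header)
               if any(k in cell.lower() for k in keywords)]
    if not matched:                    # header does not match: table is ignored
        return []
    return [{header[i]: row[i] for i in matched} for row in data_rows]


table = [
    ["Product name", "Sales amount", "Remark"],
    ["XXX item", "30,000", "n/a"],
]
table_info = extract_from_table(table, ("product", "amount", "sales"))
```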
  • S6 Match the product name with the corresponding sales amount through the preset matching rules to obtain product sales information, and output the product sales information based on the sales time.
  • the product names in the obtained product-related information are matched with the corresponding sales amounts according to the preset matching rules and, in combination with the sales time point, Selenium is used together with the requests library to configure the product status in either the timed-effective mode or the immediate-restart mode. Furthermore, after the person group and the channel are configured, the specific person group and the specific channel can accurately obtain the sales volume. After the product assignment task is completed, an analysis report can be generated and sent to the operator.
  • the receiving status of the mailbox is regularly detected through the scheduled task, and if an unread email is detected, the subject and text content of the unread email are obtained; the subject and text content are input into the trained decision tree to judge whether the unread email is related to product sales and obtain the judgment result; if the judgment result is that the unread email is related to product sales, the content in the unread email is identified as the target content; the target content is traversed through preset keywords to extract the product-related information in the target content; the product name is matched with the corresponding sales amount through the preset matching rules to obtain product sales information, and the product sales information is output based on the sales time.
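The source does not spell out the preset matching rules, so the sketch below assumes one simple rule for illustration: records are grouped by product name, their amounts are summed, and the results are output in order of sales time. The record layout is hypothetical.

```python
from datetime import date


def match_sales(records):
    """Group records by product name, sum the amounts, and order by sales time."""
    matched = {}
    for rec in records:
        name = rec["product_name"]
        if name not in matched:
            matched[name] = {"product_name": name, "sales_amount": 0.0,
                             "sales_time": rec["sales_time"]}
        matched[name]["sales_amount"] += rec["sales_amount"]
        # Keep the earliest sales time seen for the product.
        matched[name]["sales_time"] = min(matched[name]["sales_time"],
                                          rec["sales_time"])
    return sorted(matched.values(), key=lambda item: item["sales_time"])


records = [
    {"product_name": "B", "sales_amount": 10.0, "sales_time": date(2021, 5, 4)},
    {"product_name": "A", "sales_amount": 30000.0, "sales_time": date(2021, 5, 3)},
    {"product_name": "B", "sales_amount": 5.0, "sales_time": date(2021, 5, 5)},
]
sales_info = match_sales(records)
```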
  • the trained decision tree is used to automatically determine whether unread emails are related to product sales information.
  • product matching is carried out by extracting product-related information in emails, which is conducive to improving product matching efficiency.
  • this application also uses timing tasks to detect emails and process product-related emails in a timely manner; this application also combines the corresponding format of email content and adopts different analysis methods to obtain corresponding product information, which is conducive to improving product matching efficiency.
  • the above-mentioned product sales information can also be stored in a node of a block chain.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).
  • the present application provides an embodiment of a decision tree-based product matching device, which corresponds to the method embodiment shown in FIG. 2 .
  • the device can be specifically applied to various electronic devices.
  • the product matching device based on the decision tree of the present embodiment includes: a receiving situation detection module 71, an unread mail acquisition module 72, an unread mail judgment module 73, a target content identification module 74, and a product-related information extraction module 75 and product sales information output module 76, wherein:
  • the receiving situation detection module 71 is used to regularly detect the receiving situation of the target mailbox through a timing task
  • the unread mail acquisition module 72 is used to obtain the subject and text content of the unread mail if it is detected that there are unread mails in the target mailbox;
  • the unread email judging module 73 is used to input the subject and text content into the trained decision tree to judge whether the unread email is related to product sales and obtain the first judgment result;
  • the target content recognition module 74 is used to identify the content in the unread mail as the target content if the first judgment result is that the unread mail is related to product sales;
  • the product-related information extraction module 75 is used to traverse the target content through preset keywords to extract product-related information in the target content, wherein the product-related information includes product name, sales time, and sales amount;
  • the product sales information output module 76 is configured to match the product name with the corresponding sales amount through preset matching rules to obtain product sales information, and output the product sales information based on the sales time.
  • the unread mail judging module 73 includes:
  • the content input unit is used to input the subject and body content into the trained decision tree;
  • the node classification unit is used to classify the subject and body content through the trained decision tree to obtain the current node classification result;
  • the node prediction unit is used to determine the final leaf node of the unread mail based on the current node classification result, as the predicted leaf node;
  • the prediction result inferring unit is configured to obtain the first judgment result based on the prediction result of the predicted leaf node.
  • the device further includes the following modules, which supply the trained decision tree used by the unread mail judging module 73:
  • the sample training data module is used to obtain sample emails, and extract the content of the sample emails to obtain sample training data;
  • the product characteristic attribute identification module is used to identify the product characteristic attribute in the sample training data, wherein the product characteristic attribute is the inherent attribute of the product in the sample training data;
  • the target training set generation module is used to perform feature selection on the data in the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;
  • the target training set training module is used to train the target training set using the decision tree algorithm to obtain a trained decision tree.
  • the target training set training module includes:
  • the node feature acquisition unit is used to apply the ID3 algorithm to the training data to perform node calculation on the decision tree and obtain node features;
  • the recursive calculation unit is configured to recursively calculate the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;
  • the error value generation unit is used to test the basic decision tree with the test data to obtain an error value;
  • the recursive calculation end unit is used to stop the recursive calculation when the error value is less than a preset threshold, and obtain the trained decision tree.
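The training units above can be illustrated with a minimal, standard-library sketch of ID3. It is not the applicant's implementation. The boolean features (`has_product`, `has_sales`), the toy data, and the reading of "each recursive calculation yields a basic decision tree" as "one extra level of depth per round" are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting the rows on one categorical feature."""
    groups = {}
    for r, label in zip(rows, labels):
        groups.setdefault(r[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def build_id3(rows, labels, features, max_depth):
    """Grow an ID3 tree to at most max_depth levels; leaves hold a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority
    # ID3 picks the feature with the highest information gain at each node.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in {r[best] for r in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_id3(
            [r for r, _ in pairs], [l for _, l in pairs], rest, max_depth - 1)
    return node

def predict(tree, sample):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(sample[tree["feature"]], tree["default"])
    return tree

def error_rate(tree, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    return sum(predict(tree, r) != l for r, l in zip(rows, labels)) / len(labels)

def train_until_threshold(train, test, features, threshold):
    """Deepen the tree one level per round; stop once test error < threshold."""
    tree = None
    for depth in range(len(features) + 1):
        tree = build_id3(train[0], train[1], features, depth)  # one "basic" tree
        if error_rate(tree, test[0], test[1]) < threshold:
            break
    return tree

def make_row(has_product, has_sales):
    """Hypothetical feature vector derived from an email's subject and body."""
    return {"has_product": has_product, "has_sales": has_sales}

FEATURES = ["has_product", "has_sales"]
LABELS = ["sales", "other", "other", "other"]
COMBOS = [(True, True), (True, False), (False, True), (False, False)]
TRAIN = ([make_row(p, s) for p, s in COMBOS], LABELS)
TEST = ([make_row(p, s) for p, s in COMBOS], LABELS)  # toy held-out set
TREE = train_until_threshold(TRAIN, TEST, FEATURES, threshold=0.1)
```

On this toy data the loop stops at depth 2, the first depth at which the test error falls below the threshold. A real training set would need many more features and samples.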
  • the target content identification module 74 includes:
  • the detection result acquisition unit is used to, if the first judgment result is that the unread email is related to product sales, take the subject and body content as the target content and detect whether the unread email includes an attachment, to obtain a detection result;
  • the second judgment result generation unit is used to judge the text type of the attachment if the detection result is that the unread mail includes an attachment, and obtain a second judgment result;
  • the first result generation unit is used to read the text content of the Word document as the target content if the second judgment result is that the text type of the attachment is a Word document;
  • the second result generation unit is used to read the text content of the text PDF as the target content if the second judgment result is that the text type of the attachment is a text PDF;
  • the third result generation unit is configured to read the text content of the image PDF as the target content by means of OCR recognition if the second judgment result is that the text type of the attachment is an image PDF.
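The attachment handling described above amounts to a dispatch on the attachment's text type. The sketch below shows only that dispatch. How a text PDF is told apart from an image PDF is not stated in the source, so the `has_text_layer` flag is an assumption, and the reader functions are placeholder stubs rather than calls into any real document or OCR library.

```python
from pathlib import Path

def classify_attachment(filename, has_text_layer=False):
    """Return 'word', 'text_pdf', 'image_pdf', or 'unsupported'."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".doc", ".docx"}:
        return "word"
    if suffix == ".pdf":
        # Assumption: the caller has already checked for an embedded text layer.
        return "text_pdf" if has_text_layer else "image_pdf"
    return "unsupported"

def read_target_content(filename, has_text_layer, readers):
    """Pick the reader for the attachment type; return '' if none applies."""
    reader = readers.get(classify_attachment(filename, has_text_layer))
    return reader(filename) if reader else ""

# Placeholder readers (hypothetical): a real system would plug in a Word
# parser, a PDF text extractor, and an OCR engine here.
READERS = {
    "word": lambda name: "word:" + name,
    "text_pdf": lambda name: "pdf-text:" + name,
    "image_pdf": lambda name: "ocr:" + name,
}
```

Keeping the readers in a dictionary means that supporting a new attachment type requires only a new classification branch and one new entry.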
  • the product-related information extraction module 75 includes:
  • a key sentence acquisition unit configured to traverse the target content through preset keywords, so as to obtain sentences including preset keywords in the target content as key sentences;
  • the data acquisition unit is used to acquire the data corresponding to the preset keywords in the key sentence, and obtain the sales time and sales amount;
  • the product name obtaining unit is used to split the key sentence by a delimiter to obtain the product name.
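As a rough illustration of the three extraction units above, the following sketch finds key sentences by preset keyword, reads the sales time and sales amount that follow the keywords, and splits each key sentence on a delimiter to get the product name. The keyword list, the `:` delimiter, the date format, and the sample email are all assumptions; the source does not specify them.

```python
import re

PRESET_KEYWORDS = ("sales time", "sales amount")  # assumed keyword list
DELIMITER = ":"                                   # assumed delimiter

def find_key_sentences(content):
    """Return the sentences (here: lines or ';'-separated parts) with a keyword."""
    sentences = [s.strip() for s in re.split(r"[;\n]+", content)]
    return [s for s in sentences if any(k in s.lower() for k in PRESET_KEYWORDS)]

def extract_product_info(sentence):
    """Extract product name, sales time, and sales amount from one key sentence."""
    name = sentence.split(DELIMITER, 1)[0].strip()
    time_match = re.search(r"sales time\s+(\d{4}-\d{2}-\d{2})", sentence, re.IGNORECASE)
    amount_match = re.search(r"sales amount\s+(\d+(?:\.\d+)?)", sentence, re.IGNORECASE)
    return {
        "product_name": name,
        "sales_time": time_match.group(1) if time_match else None,
        "sales_amount": float(amount_match.group(1)) if amount_match else None,
    }

CONTENT = (
    "Dear team,\n"
    "Widget A: sales time 2021-06-01, sales amount 1200.50\n"
    "Gadget B: sales time 2021-05-15, sales amount 300\n"
    "Best regards"
)
INFO = [extract_product_info(s) for s in find_key_sentences(CONTENT)]
```

Each entry in `INFO` pairs one product name with its own amount and time, which is one simple way to realize a matching rule when all three values appear in the same sentence.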
  • in addition to the product name obtaining unit, the product-related information extraction module 75 further includes:
  • a third judgment result acquisition unit configured to judge whether there is a table in the target content, and obtain a third judgment result;
  • a header information acquisition unit configured to parse the table to obtain header information corresponding to the table if the third judgment result is that there is a table in the target content;
  • the header information matching unit is used to judge whether the header information matches the preset keywords to obtain a fourth judgment result;
  • the fourth judgment result display unit is configured to obtain product-related information based on the header information if the fourth judgment result is that the header information matches a preset keyword.
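The table branch above can be sketched as follows: parse the table, compare its header with the preset keywords, and read the matching columns only when the header matches. Representing the table as comma-separated text and the exact keyword strings are assumptions made for this example.

```python
import csv
import io

PRESET_KEYWORDS = ("product name", "sales time", "sales amount")  # assumed

def parse_table(table_text):
    """Split a comma-separated table into a normalized header and body rows."""
    rows = list(csv.reader(io.StringIO(table_text)))
    if not rows:
        return [], []
    header = [cell.strip().lower() for cell in rows[0]]
    return header, rows[1:]

def header_matches(header):
    """True when every preset keyword appears as a column name."""
    return all(keyword in header for keyword in PRESET_KEYWORDS)

def extract_from_table(table_text):
    """Return one record per body row, or [] if the header does not match."""
    header, body = parse_table(table_text)
    if not header_matches(header):
        return []
    columns = {keyword: header.index(keyword) for keyword in PRESET_KEYWORDS}
    return [
        {keyword: row[i].strip() for keyword, i in columns.items()}
        for row in body
        if len(row) == len(header)  # skip malformed rows
    ]

TABLE = (
    "Product Name,Sales Time,Sales Amount\n"
    "Widget A,2021-06-01,1200.50\n"
    "Gadget B,2021-05-15,300\n"
)
RECORDS = extract_from_table(TABLE)
```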
  • FIG. 10 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 8 includes a memory 81 , a processor 82 , and a network interface 83 connected to each other through a system bus for communication. It should be pointed out that the figure only shows a computer device 8 with three components (memory 81, processor 82, and network interface 83), but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
  • the memory 81 stores computer-readable instructions, and when the processor 82 executes the computer-readable instructions, all the steps of any embodiment of the above-mentioned decision tree-based product matching method can be implemented.
  • the computer equipment may be computing equipment such as desktop computers, notebooks, palmtop computers, and cloud servers. Computer equipment can interact with users through keyboards, mice, remote controls, touch pads, or voice-activated devices.
  • the memory 81 includes at least one type of readable storage medium, which can be non-volatile or volatile; the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic storage, magnetic disks, optical disks, etc.
  • the memory 81 may be an internal storage unit of the computer device 8 , such as a hard disk or a memory of the computer device 8 .
  • the memory 81 can also be an external storage device of the computer device 8, such as a plug-in hard disk equipped on the computer device 8, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device.
  • the memory 81 is generally used to store the operating system installed in the computer device 8 and various application software, such as computer-readable instructions of a decision tree-based product matching method, and the like.
  • the memory 81 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 82 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 82 is generally used to control the overall operation of the computer device 8 .
  • the processor 82 is used to run the computer-readable instructions stored in the memory 81 or process data, for example, to run the above-mentioned computer-readable instructions of the decision tree-based product matching method, so as to realize the decision tree-based product matching method.
  • the network interface 83 may include a wireless network interface or a wired network interface, and the network interface 83 is generally used to establish a communication connection between the computer device 8 and other electronic devices.
  • the present application also provides another implementation manner, namely a computer-readable storage medium that stores computer-readable instructions, the computer-readable instructions being executable by at least one processor so that the at least one processor executes all the steps of any embodiment of the decision tree-based product matching method described above.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and contains several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods of the embodiments of the present application.
  • a blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
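To make concrete the idea of data blocks linked by cryptographic methods, the sketch below chains blocks of product sales records with SHA-256 hashes. It is a single-process illustration under stated assumptions: the block layout and field names are hypothetical, and there is no network, consensus, or decentralization, all of which a real blockchain platform would provide.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block

def block_hash(index, prev_hash, records):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, records):
    """Append a block whose prev_hash points at the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Recompute every hash and check every link; any tampering breaks one."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["records"]):
            return False
        prev_hash = block["hash"]
    return True

CHAIN = []
append_block(CHAIN, [{"product_name": "Widget A", "sales_amount": 1200.5}])
append_block(CHAIN, [{"product_name": "Gadget B", "sales_amount": 300.0}])
```

Because each block's hash covers the previous block's hash, changing a stored sales record invalidates that block and every block after it.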

Abstract

The present application relates to the technical field of artificial intelligence, and discloses a decision tree-based product matching method, apparatus and device, and a storage medium. The method comprises: when an unread email is detected at fixed time periods, obtaining a subject and body content of the unread email; then inputting the subject and the body content into a trained decision tree for decision determination; if the unread email is related to product sales, then identifying the content in the unread email as target content; extracting product related information in the target content; and then matching the product name with a corresponding sales amount to obtain product sales information, and outputting the product sales information. The present application further relates to blockchain technology, and the product sales information is stored in a blockchain. The present application achieves the automatic determination, by means of a trained decision tree, of whether unread email is related to product sales information, and product related information in the email is extracted for product matching, helping to improve product matching efficiency.

Description

Decision tree-based product matching method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202110725709.8, filed with the China Patent Office on June 29, 2021 and entitled "Decision Tree-Based Product Matching Method, Apparatus, Device, and Storage Medium", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a decision tree-based product matching method, apparatus, device, and storage medium.
Background
As businesses continue to expand, day-to-day operations have become an essential part of the normal running of every company. Sales operations, as one part of daily operations, aim to establish and maintain a product sales ecosystem, forming a closed loop in which users contribute and consume product content. Once a product sales strategy has been settled, operations staff must mechanically enter the sales strategy instructions.
In existing product sales operation methods, a person usually has to identify the sales plan in an email and then split and match that plan to obtain a matching plan for the related products. However, the inventor realized that this requires staff to monitor incoming email manually and to judge whether each email is related to product sales. Emails may therefore be missed, subsequent product analysis takes too long, and product matching efficiency is low. A method that can improve product matching efficiency is urgently needed.
Summary of the Invention
The purpose of the embodiments of the present application is to propose a decision tree-based product matching method, apparatus, device, and storage medium, so as to improve product matching efficiency.
In a first aspect, an embodiment of the present application provides a decision tree-based product matching method, including:
periodically checking the receiving status of a target mailbox through a scheduled task;
if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;
if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as target content;
traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
In a second aspect, an embodiment of the present application provides a decision tree-based product matching apparatus, including:
a receiving situation detection module, configured to periodically check the receiving status of a target mailbox through a scheduled task;
an unread mail acquisition module, configured to obtain the subject and body content of an unread email if the unread email is detected in the target mailbox;
an unread mail judging module, configured to input the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtain a first judgment result;
a target content identification module, configured to identify the content in the unread email as target content if the first judgment result is that the unread email is related to product sales;
a product-related information extraction module, configured to traverse the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
a product sales information output module, configured to match the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and output the product sales information based on the sales time.
In a third aspect, an embodiment of the present application further provides a computer device including a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the following steps when executing the computer-readable instructions:
periodically checking the receiving status of a target mailbox through a scheduled task;
if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;
if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as target content;
traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
periodically checking the receiving status of a target mailbox through a scheduled task;
if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;
if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as target content;
traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
The embodiments of the present application provide a decision tree-based product matching method, apparatus, device, and storage medium. The embodiments use a trained decision tree to judge automatically whether an unread email is related to product sales information and, if so, extract the product-related information from the email for product matching, which helps improve product matching efficiency.
Brief Description of the Drawings
To illustrate the solutions in the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the application environment of the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 2 is a flowchart of one implementation of the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 3 is a flowchart of one implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 4 is a flowchart of another implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 5 is a flowchart of another implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 6 is a flowchart of another implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 7 is a flowchart of another implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 8 is a flowchart of another implementation of a sub-process in the decision tree-based product matching method provided by an embodiment of the present application;
Fig. 9 is a schematic diagram of the decision tree-based product matching apparatus provided by an embodiment of the present application;
Fig. 10 is a schematic diagram of the computer device provided by an embodiment of the present application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are intended only to describe specific embodiments and not to limit the present application. The terms "comprising" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.
The present application is described in detail below with reference to the drawings and embodiments.
Referring to FIG. 1, a system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, and instant messaging tools.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the decision tree-based product matching method provided in the embodiments of the present application is generally executed by the server; correspondingly, the decision tree-based product matching apparatus is generally configured in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers, according to implementation needs.
Please refer to FIG. 2, which shows a specific implementation of the decision tree-based product matching method.
It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the flow sequence shown in FIG. 2. The method includes the following steps:
S1: Periodically check the receiving status of the target mailbox through a scheduled task.
In the embodiments of the present application, for a clearer understanding of the technical solution, the terminals involved in the present application are described in detail below.
The first is the server. Through a scheduled monitoring task, the server can monitor the mailbox of the client. If an unread email is received, the server judges whether it is related to product sales. If so, the server extracts the product-related information from the email, matches that information based on preset matching rules to obtain product sales information, outputs the product sales information, generates a data analysis report, and returns the report to the client.
The second is the client. The user of the client may be a product sales operator, who can receive the data analysis report returned by the server.
Specifically, in this embodiment, scheduled task execution is used to monitor the receiving status of a specific mailbox in real time. If an unread email is detected, it is judged whether that email is related to product sales. Further, whether an email has been read is determined from its read flag. The read flag is obtained from the generated log file and distinguishes whether an email is in the read or unread state.
S2: If an unread email is detected in the target mailbox, obtain the subject and body content of the unread email.
Specifically, when an unread email is detected, the generated log file is used to accurately identify that email in the target mailbox. The subject and body content of the unread email are then extracted as the basis for subsequently judging whether it is related to product sales.
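The polling behavior of step S1 can be sketched with Python's standard `sched` module. This is only an illustration under stated assumptions: the mailbox is an in-memory list, a dictionary stands in for the read flags that the text says come from a log file, and all function names are hypothetical.

```python
import sched
import time

def find_unread(messages, read_flags):
    """Return the messages whose read flag is missing or False."""
    return [m for m in messages if not read_flags.get(m["id"], False)]

def poll_mailbox(fetch_messages, read_flags, handle):
    """One polling round: handle each unread message, then mark it as read."""
    unread = find_unread(fetch_messages(), read_flags)
    for message in unread:
        handle(message)
        read_flags[message["id"]] = True
    return len(unread)

def run_periodically(scheduler, interval, rounds, task):
    """Schedule `task` to run `rounds` times, `interval` seconds apart."""
    for i in range(rounds):
        scheduler.enter(interval * i, 1, task)
    scheduler.run()  # blocks until every scheduled round has run

# Stand-in data (hypothetical): message m1 was already read, m2 is new.
INBOX = [
    {"id": "m1", "subject": "Team lunch"},
    {"id": "m2", "subject": "Q2 product sales plan"},
]
READ_FLAGS = {"m1": True}
HANDLED = []

scheduler = sched.scheduler(time.monotonic, time.sleep)
run_periodically(
    scheduler, 0, 2,
    lambda: poll_mailbox(lambda: INBOX, READ_FLAGS, lambda m: HANDLED.append(m["id"])),
)
```

The second round finds nothing to do because the first round set the read flag for `m2`, which is the property that prevents an email from being processed twice. A deployment would more likely use a long-running scheduler or a cron job with a non-zero interval.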
S3: Input the subject and body content into the trained decision tree to judge whether the unread email is related to product sales, and obtain the first judgment result.
Specifically, in this embodiment a decision tree is trained, and the trained decision tree is used to judge whether an unread email is related to product sales. A decision tree processes data by using an induction algorithm to generate classification rules and the tree itself, and then performs predictive analysis on new data. The terminal nodes of the tree, the leaf nodes, represent the classes of the classification result; each internal node represents a test on one variable; and each branch is a test outcome, representing one possible value of the variable. To classify a record, its variable values are tested against the tree, and each path represents one classification rule. By repeatedly partitioning the data, the decision tree model maximizes the difference in the dependent variable between partitions; the ultimate goal is to sort the data into different groups or branches so as to establish the strongest classification on the value of the dependent variable. Relying on these machine-learning tools, the subject and content of the email are classified. In this embodiment, the decision tree is implemented with the sklearn (scikit-learn) machine-learning module for Python.
Specifically, a decision tree is a predictive model that represents a mapping between object attributes and object values. Each node in the tree represents an object, each branching path represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf node. For example, consider a decision tree that predicts whether a person will buy a computer. The tree can be used to classify new records starting from the root node (age): if the person is middle-aged, it is directly judged that the person will buy a computer; if the person is a teenager, it must further be judged whether the person is a student; if the person is elderly, the person's credit rating must further be judged; and so on until a leaf node determines the class of the record. Similarly, in this embodiment of the present application, a decision tree can be used to judge whether an unread email is related to product sales.
For example, a decision tree that predicts whether an unread email is related to product sales (the tree can be obtained through training) can be used to classify new unread emails. Classification starts from the root node (email subject and body content), where three classes are possible: "spam", "daily email", and "email containing the word 'product'". If the email is classified as "spam" or "daily email", it is directly judged to be unrelated to product sales. If it is classified as "email containing the word 'product'", classification continues from that node, with three possible results: "email containing the phrase 'product sales'", "email containing the phrase 'sales amount'", and "email not containing the word 'sales'". For either of the first two results, the email is judged to be related to product sales; for the third result, further judgment is made until a leaf node determines the class of the email.
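The example tree for emails can be written out by hand as nested tests. This is only an illustrative sketch: in the embodiment the tree is obtained through training, and the spam word list, the function name `classify_email`, and the returned labels are assumptions made for the example.

```python
def classify_email(subject: str, body: str, spam_words=("lottery", "winner")) -> str:
    """Walk a hand-written two-level tree over the subject and body text."""
    text = f"{subject} {body}".lower()

    # Root node: spam / daily email / email containing the word "product".
    if any(word in text for word in spam_words):
        return "unrelated"  # "spam" leaf
    if "product" not in text:
        return "unrelated"  # "daily email" leaf

    # Second level, reached only by emails containing "product".
    if "product sales" in text or "sales amount" in text:
        return "related"

    # Remaining emails would be passed to deeper nodes; the sketch stops here.
    return "needs further judgment"


print(classify_email("Q2 product sales", "figures attached"))
print(classify_email("Lunch", "see you at noon"))
print(classify_email("New product", "brochure attached"))
```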
Please refer to FIG. 3, which shows a specific implementation of step S3, described in detail as follows:
S31: Input the subject and body content into the trained decision tree.
S32: Perform node classification on the subject and body content through the trained decision tree to obtain a current node classification result.
Specifically, the trained decision tree performs node classification on the input subject and body content, predicting the next node starting from the root node, so as to obtain the current node classification result. Here, node classification means that the decision tree selects an optimal feature according to the input data and partitions the input data set into subsets according to that feature, so that each subset has the best classification available under the current conditions.
S33: Based on the current node classification result, determine the final leaf node of the unread email as the predicted leaf node.
Specifically, the trained decision tree predicts the next node from the current node classification result until the final leaf node for the unread email is reached; the classification result of the unread email is obtained from that final leaf node. In this embodiment, the classification result of an unread email is either "related to product sales" or "unrelated to product sales".
S34: Obtain the first judgment result from the prediction result of the predicted leaf node.
Specifically, each record in the subject and body content is classified by the decision tree: at every node, the classification result set of the current node determines which child node the record should enter, until a leaf node is reached. A predicted value is obtained from that leaf node; according to the prediction results of the trained decision tree, the classification result with the highest probability among all prediction results is taken, and the classification results of all records are output, giving the first judgment result. The first judgment result is one of two outcomes: the unread email is related to product sales, or the unread email is unrelated to product sales.
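Steps S31 to S34 amount to walking a record from the root to a leaf and reporting the most probable class stored at that leaf. The sketch below assumes a tree stored as nested dictionaries; the features `has_product` and `has_sales`, the leaf class counts, and the function names are hypothetical values chosen for illustration.

```python
# Hypothetical trained tree: internal nodes test one feature, leaves hold
# the class counts of the training emails that ended up there.
TREE = {
    "test": "has_product",
    "children": {
        False: {"counts": {"unrelated": 40, "related": 2}},
        True: {
            "test": "has_sales",
            "children": {
                True: {"counts": {"related": 30, "unrelated": 3}},
                False: {"counts": {"related": 5, "unrelated": 9}},
            },
        },
    },
}


def featurize(subject, body):
    """Turn subject and body text into the features tested by the tree."""
    text = f"{subject} {body}".lower()
    return {"has_product": "product" in text, "has_sales": "sales" in text}


def predict(tree, record):
    """Descend to a leaf and return (most probable class, its probability)."""
    node = tree
    while "children" in node:
        node = node["children"][record[node["test"]]]
    counts = node["counts"]
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


label, prob = predict(TREE, featurize("Product sales", "amount 30000"))
print(label, round(prob, 3))
```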
In this embodiment, the trained decision tree performs node classification on the subject and body content to obtain the current node classification result; the final leaf node of the unread email is determined from that result and taken as the predicted leaf node; and the first judgment result is obtained from the prediction result of the predicted leaf node. Unread emails related to product sales can thus be quickly screened out of a large number of emails, which speeds up email analysis and helps improve product matching efficiency.
Please refer to FIG. 4, which shows a specific implementation before step S3, described in detail as follows:
S3A: Obtain sample emails and capture the content of the sample emails to obtain sample training data.
Specifically, the sample emails comprise emails related to product sales and emails unrelated to product sales, and are used to train the decision tree. The content captured from each sample email includes the email subject, the body content, and the attachment content, and this content is used as the sample training data.
S3B: Identify product feature attributes in the sample training data, where a product feature attribute is an inherent attribute of a product in the sample training data.
Specifically, a product feature attribute is an inherent attribute of a product in the sample training data, such as the product name, product category, product sales amount, or product sales time.
S3C: Perform feature selection on the sample training data according to the product feature attributes to generate a target training set, where the target training set is divided into training data and test data.
Specifically, feature selection is performed on the sample training data according to the product feature attributes: the data related to the product feature attributes is selected, generating the target training set. Feature selection here means selecting the data related to the product feature attributes. For example, if the sample training data set has 50 columns and only 10 columns remain after feature selection, those 10 columns are used as the target training set. Attribute selection both reduces the size of the data set and can improve the prediction performance of the decision tree model. Feature selection is generally carried out by combining algorithmic selection (for example, a feature selection algorithm based on information gain) with manual selection.
S3D: Train on the target training set using a decision tree algorithm to obtain the trained decision tree.
Specifically, attribute data useful for prediction is selected from the input sample emails according to the feature attributes. That is, when training the decision tree in this embodiment of the present application, product-related information such as the product name, product category, sales amount, and sales time is used as the product feature attributes. These feature attributes are used to filter the training data, selecting the emails related to product sales, so as to obtain the target training set. The target training set is then divided into training data and test data according to a preset ratio: the training data is used to train the decision tree, and the test data is used to validate the trained decision tree. The preset ratio is set according to the actual situation and is not limited here; in a specific embodiment, the preset ratio is 8:2.
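Feature selection followed by the 8:2 division can be sketched as follows. The column names in `PRODUCT_FEATURES`, the `label` field, the fixed shuffle seed, and both function names are assumptions made for illustration; only the 8:2 ratio comes from the specific embodiment.

```python
import random

# Hypothetical column names for the product feature attributes.
PRODUCT_FEATURES = ("product_name", "product_category", "sales_amount", "sales_time")


def select_features(rows, keep=PRODUCT_FEATURES, label="label"):
    """Keep only the product feature columns and the class label."""
    return [{k: row[k] for k in (*keep, label) if k in row} for row in rows]


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and divide them by the preset ratio (8:2 here)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = [
    {"product_name": f"item{i}", "sales_amount": 100 * i, "sender": "a@b.c", "label": i % 2}
    for i in range(10)
]
target_set = select_features(rows)
train_data, test_data = split_dataset(target_set)
print(len(train_data), len(test_data))
```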
In this embodiment, sample emails are obtained and their content is captured to obtain sample training data; the product feature attributes in the sample training data are identified; feature selection is performed on the sample training data according to the product feature attributes to generate the target training set; and the target training set is trained with a decision tree algorithm to obtain the trained decision tree. The decision tree is thus trained on email information, so that it can later be used to judge quickly whether an unread email is related to product sales. This improves the identification of product-related information in emails and in turn helps improve product matching efficiency.
Please refer to FIG. 5, which shows a specific implementation of step S3D, described in detail as follows:
S3D1: Using the ID3 algorithm, perform node calculation for the decision tree on the training data to obtain a node feature.
The ID3 algorithm is a classification and prediction algorithm first proposed by J. Ross Quinlan at the University of Sydney in 1975; its core concept is information entropy. ID3 computes the information gain of each attribute and regards attributes with high information gain as good attributes. At each split, the attribute with the highest information gain is selected as the splitting criterion, and this process is repeated until a decision tree that classifies the training examples perfectly is generated. In this embodiment, the ID3 algorithm is used to train the decision tree for emails related to product sales. Node calculation means that, starting from the root node, the information gain of every candidate feature is computed and the feature with the largest information gain is selected as the feature of the node.
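The node calculation can be illustrated with the standard definitions of entropy and information gain. The four sample records and the feature names `has_product` and `has_sales` are hypothetical; the sketch only shows how the feature with the largest gain would be chosen for a node.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 node calculation: pick the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


rows = [
    {"has_product": True, "has_sales": True},
    {"has_product": True, "has_sales": False},
    {"has_product": False, "has_sales": False},
    {"has_product": False, "has_sales": True},
]
labels = ["related", "unrelated", "unrelated", "related"]
print(best_feature(rows, labels, ["has_product", "has_sales"]))
```

In this toy data `has_sales` separates the two classes completely, so it carries the whole entropy of the labels as gain, while `has_product` carries none.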
S3D2: Based on the node feature, perform recursive calculation on the decision tree, where each recursive calculation yields a basic decision tree.
Specifically, the decision tree is computed recursively: in each recursion, the feature with the largest information gain is selected as the node feature for the next calculation, and each recursion yields a decision tree.
S3D3: Test the basic decision tree with the test data to obtain an error value.
S3D4: When the error value is less than a preset threshold, stop the recursive calculation to obtain the trained decision tree.
Specifically, there are generally two ways to test a model during decision tree training. One is to divide the data in the training set into two parts, using one part to train and generate the decision tree (the training data) and the other part for testing (the test data), with test cases generally selected from the test data. The other is n-fold cross-validation, which divides the data in the training set into n folds. For example, if the data is divided into 10 parts, 8 parts are used to train and generate the decision tree and the remaining 2 parts are used as test cases; the procedure is repeated until all 10 parts have each been used as test cases, which completes the testing process. In this embodiment, each basic decision tree obtained is tested with the test data to obtain an error value, and the recursive calculation stops once the error value is less than the preset threshold, giving the trained decision tree. The error value is the difference between the classification results of the basic decision tree and the actual classification results.
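The cross-validation scheme can be sketched as an index generator. This follows the common convention of holding out one fold per round, which is an assumption about the intended procedure; with 5 folds over 10 samples it reproduces the division into 8 training parts and 2 test parts described above. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) so each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```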
It should be noted that the preset threshold is set according to the actual situation and is not limited here. In a specific embodiment, the preset threshold is 0.05.
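One way to read steps S3D1 to S3D4 is to grow an ID3 tree one level at a time and stop as soon as the error on the test data falls below the preset threshold. The sketch below implements that reading; treating each depth as one "basic decision tree" is an assumption, as are the toy data, the feature names, and all function names. Only the 0.05 threshold comes from the specific embodiment.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, labels, feature):
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    return entropy(labels) - sum(len(g) / len(labels) * entropy(g) for g in groups.values())


def build(rows, labels, features, max_depth):
    """Recursive ID3 limited to max_depth levels; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or not features or len(set(labels)) == 1:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    rest = [f for f in features if f != best]
    node = {"feature": best, "default": majority, "children": {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, max_depth - 1)
    return node


def predict(node, row):
    while isinstance(node, dict):
        node = node["children"].get(row[node["feature"]], node["default"])
    return node


def error_rate(tree, rows, labels):
    return sum(predict(tree, r) != y for r, y in zip(rows, labels)) / len(labels)


def train(train_rows, train_labels, test_rows, test_labels, features, threshold=0.05):
    """Grow deeper trees until the test error drops below the threshold."""
    for depth in range(len(features) + 1):
        tree = build(train_rows, train_labels, features, depth)
        err = error_rate(tree, test_rows, test_labels)
        if err < threshold:
            break
    return tree, err


combos = [(p, s) for p in (True, False) for s in (True, False)]
rows = [{"has_product": p, "has_sales": s} for p, s in combos]
labels = ["related" if p and s else "unrelated" for p, s in combos]
tree, err = train(rows * 2, labels * 2, rows, labels, ["has_product", "has_sales"])
print(err)
```

If the threshold is never reached, the loop simply returns the deepest tree together with its last error value, so the caller can decide whether that tree is acceptable.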
In this embodiment, the ID3 algorithm is used to perform node calculation for the decision tree on the training data to obtain the node feature; the decision tree is computed recursively based on the node feature; the basic decision tree is tested with the test data to obtain an error value; and when the error value is less than the preset threshold, the recursive calculation stops and the trained decision tree is obtained. The decision tree for product sales is thus trained, which improves the accuracy of identifying whether an unread email is related to products.
S4: If the first judgment result is that the unread email is related to product sales, identify the content of the unread email as the target content.
Specifically, if the first judgment result is that the unread email is related to product sales, the product sales in the unread email need to be analyzed, so all content in the unread email must be identified as the target content. The preceding steps identify only the subject and body content of the unread email, but in practice an email may carry an attachment, and the attachment may also contain information about product sales. It is therefore necessary to judge whether the unread email includes an attachment; if it does, the attachment is further identified and parsed to obtain its content, and the attachment content is used together with the subject and body content as the target content.
Please refer to FIG. 6, which shows a specific implementation of step S4, described in detail as follows:
S41: If the first judgment result is that the unread email is related to product sales, take the subject and body content as target content, and detect whether the unread email includes an attachment to obtain a detection result.
Specifically, if the first judgment result is that the unread email is related to product sales, the subject and body content are first taken as target content, and it is then judged whether the unread email includes an attachment.
S42: If the detection result is that the unread email includes an attachment, judge the text type of the attachment to obtain a second judgment result.
Specifically, if the unread email includes an attachment, the text type of the attachment is judged so that the target content in the attachment can be obtained in the manner appropriate to that type.
S43: If the second judgment result is that the text type of the attachment is a Word document, read the text content of the Word document as target content.
Specifically, if the text type of the attachment is a Word document, the text content of the Word document can be read directly and used as target content.
S44: If the second judgment result is that the text type of the attachment is a text-based PDF, read the text content of the text-based PDF by parsing it with a Java class library, as target content.
Specifically, the Java class libraries include PDFBox, iText, and XPDF. PDFBox (an open-source project under a BSD license) is a pure Java class library for reading and creating PDF documents, and it can extract text. iText is a Java class library for quickly generating PDF documents; it can generate PDF or RTF documents and can also convert XML and HTML files into PDF files. XPDF is an open-source project whose native methods can be called to extract text from Chinese PDF files.
S45: If the second judgment result is that the text type of the attachment is an image-based PDF, read the text content of the image-based PDF by OCR, as target content.
Here, OCR (optical character recognition) refers to scanning text material and then analyzing and processing the image file to obtain the text and layout information. In this embodiment, the text content of the image-based PDF is read out by OCR.
In this embodiment, if the unread email is related to product sales, the subject and body content are taken as target content, and it is detected whether the unread email includes an attachment to obtain a detection result. If an attachment exists, the target content is obtained with the parsing method that corresponds to the text type of the attachment. The target content is thus obtained accurately, which improves the subsequent identification of product-related information and in turn helps improve product matching efficiency.
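The branching in steps S41 to S45 can be sketched as a dispatch over the attachments of a message, using Python's standard `email` package. The sample message, the handler descriptions, and the function names are hypothetical. In particular, telling a text-based PDF from an image-based one by looking for a `/Font` marker in the raw bytes is a crude heuristic assumed for this example only, not a method stated in the text; the actual extraction (for example with PDFBox or an OCR engine) is left out.

```python
from email.message import EmailMessage


def attachment_kind(filename: str, data: bytes) -> str:
    """Classify an attachment as word, text_pdf, image_pdf, or other."""
    name = filename.lower()
    if name.endswith((".doc", ".docx")):
        return "word"
    if name.endswith(".pdf"):
        # Assumed heuristic: text-based PDFs usually declare font resources.
        return "text_pdf" if b"/Font" in data else "image_pdf"
    return "other"


def plan_extraction(msg: EmailMessage):
    """Return (filename, action) for every attachment of the message."""
    actions = {"word": "read document text",
               "text_pdf": "parse PDF text",
               "image_pdf": "run OCR"}
    plan = []
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        data = part.get_payload(decode=True) or b""
        plan.append((filename, actions.get(attachment_kind(filename, data), "skip")))
    return plan


# Hypothetical message with three attachments, for illustration only.
msg = EmailMessage()
msg["Subject"] = "Product sales"
msg.set_content("See attachments.")
msg.add_attachment(b"%PDF-1.4 /Font /F1", maintype="application",
                   subtype="pdf", filename="report.pdf")
msg.add_attachment(b"%PDF-1.4 /XObject /Image", maintype="application",
                   subtype="pdf", filename="scan.pdf")
msg.add_attachment(b"PK\x03\x04", maintype="application",
                   subtype="octet-stream", filename="notes.docx")
plan = plan_extraction(msg)
print(plan)
```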
S5: Traverse the target content with preset keywords to extract the product-related information in the target content, where the product-related information includes the product name, sales time, and sales amount.
Specifically, words such as product, product name, amount, sales amount, and sales time are used as preset keywords, and the target content is traversed with these keywords to extract the product-related information, which includes the product name, sales time, sales amount, and so on. For example, if traversal finds the text "Product sold this time: xxx snack" in the target content, the part of the target content concerning the product is located, and the content after the delimiter that follows the preset keyword "product" is taken as the product name. Likewise, for the sales time and sales amount, once the corresponding preset keyword is located, the number and the time that follow it are taken as the sales amount and the sales time. The preset keywords are set according to information related to product sales and may be words such as product, amount, and sales.
Please refer to FIG. 7, which shows a specific implementation of step S5, described in detail as follows:
S51: Traverse the target content with the preset keywords to obtain the sentences in the target content that contain a preset keyword, as key sentences.
Specifically, the preset keywords are identified in the target content, and the sentences that contain them are obtained as key sentences for the subsequent identification of the corresponding product-related information.
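Step S51 can be sketched as splitting the target content into sentences and keeping those that contain a preset keyword. The English keyword list, the sentence-splitting rule, and the sample text are assumptions for illustration; real target content would need a splitter suited to its language.

```python
import re

# Hypothetical preset keywords, in English for the purpose of the example.
PRESET_KEYWORDS = ("product", "sales amount", "sales time")


def key_sentences(text, keywords=PRESET_KEYWORDS):
    """Return the sentences of text that contain at least one preset keyword."""
    # Split after ., ! or ? followed by whitespace, and on line breaks.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


content = "Hello team. This time our sales product: XXX snack. See you on Monday."
found = key_sentences(content)
print(found)
```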
S52: Obtain the data in the key sentences that corresponds to the preset keywords to obtain the sales time and the sales amount.
Specifically, if an email is related to product sales, the target content usually includes the amount and the time of the product sales. The sales time and sales amount are obtained from the data in the key sentence that corresponds to the preset keywords. For example, if the key sentence is "the target sales amount in this shopping mall event is 30000 yuan, and the sales time is 2021-5-3", identifying the data that corresponds to the preset keywords gives a sales amount of 30000 and a sales time of 2021-5-3.
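For the example sentence, the number and the date that follow the keywords can be picked out with regular expressions. The English keyword spellings, the date format, and the function name `amount_and_time` are assumptions made for this sketch.

```python
import re
from datetime import date

# The first number after "sales amount" and the first Y-M-D date after "sales time".
AMOUNT_RE = re.compile(r"sales amount\D*?(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
TIME_RE = re.compile(r"sales time\D*?(\d{4})-(\d{1,2})-(\d{1,2})", re.IGNORECASE)


def amount_and_time(sentence):
    """Extract the sales amount and sales time that follow their keywords."""
    result = {}
    match = AMOUNT_RE.search(sentence)
    if match:
        digits = match.group(1).replace(",", "")  # tolerate "30,000"
        result["sales_amount"] = float(digits) if "." in digits else int(digits)
    match = TIME_RE.search(sentence)
    if match:
        year, month, day = map(int, match.groups())
        result["sales_time"] = date(year, month, day)
    return result


sentence = ("The target sales amount in this shopping mall activity is 30000 yuan, "
            "and the sales time is 2021-5-3")
info = amount_and_time(sentence)
print(info)
```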
S53: Split the key sentence at the delimiter to obtain the product name.
Specifically, the key sentence is split at the delimiter, and the information after a preset keyword such as "product" is taken as the product name. For example, the key sentence may be "This time our sales product: XXX item". Further, the product-related information in the target content can also be obtained by regular-expression matching.
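A minimal sketch of the delimiter split, assuming the delimiter is an ASCII or full-width colon and the keyword appears before it. The function name `product_name` and the trailing-punctuation cleanup are choices made for the example.

```python
import re


def product_name(sentence, keyword="product"):
    """Return the text after the first colon if the keyword precedes it."""
    parts = re.split(r"[::]", sentence, maxsplit=1)
    if len(parts) == 2 and keyword in parts[0].lower():
        # Drop surrounding spaces and a trailing full stop, if any.
        return parts[1].strip(" .。") or None
    return None


name = product_name("This time our sales product: XXX item")
print(name)
```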
In this embodiment, the target content is traversed with the preset keywords to obtain the sentences that contain them as key sentences; the data in the key sentences that corresponds to the preset keywords is obtained as the sales time and sales amount; and the key sentences are split at the delimiter to obtain the product name. The product-related information is thus identified precisely from the target content, which helps improve product matching efficiency.
Please refer to FIG. 8, which shows a specific implementation of step S53, described in detail as follows:
S54: Judge whether a table exists in the target content to obtain a third judgment result.
S55: If the third judgment result is that a table exists in the target content, parse the table to obtain the header information of the table.
Specifically, the body content and attachment content of an email may contain a table, and the table may also hold product-related information. It is therefore necessary to judge whether a table exists in the target content; if one exists, the table is parsed to obtain its header information, and the header information is then used to judge whether the table is related to product sales.
S56: Judge whether the header information matches the preset keywords to obtain a fourth judgment result.
S57: If the fourth judgment result is that the header information matches the preset keywords, obtain the product-related information based on the header information.
Specifically, if any of the preset keywords matches the header information, the fourth judgment result is that the header information matches the preset keywords; otherwise, the fourth judgment result is that the header information does not match the preset keywords. When the header information matches the preset keywords, the data under the matching headers in the table is obtained, which gives the product-related information.
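Steps S54 to S57 can be sketched for a table that has already been parsed into rows of cells. The table contents, the English keywords, and the substring matching rule are assumptions for illustration.

```python
# Hypothetical preset keywords used to match table headers.
PRESET_KEYWORDS = ("product", "amount", "time")


def extract_from_table(table, keywords=PRESET_KEYWORDS):
    """Return one dict per data row, restricted to the matching header columns."""
    header, *rows = table
    matched = {i: h for i, h in enumerate(header)
               if any(k in h.lower() for k in keywords)}
    if not matched:
        return []  # header does not match any preset keyword
    return [{h: row[i] for i, h in matched.items()} for row in rows]


table = [
    ["Product name", "Sales amount", "Sales time", "Remark"],
    ["XXX snack", "30000", "2021-5-3", "promo"],
]
records = extract_from_table(table)
print(records)
```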
In this embodiment, if a table exists in the target content, the header information of the table is obtained and it is judged whether the header information matches the preset keywords; if so, the corresponding product-related information is obtained. Product-related information is thus also obtained from tables, so that none of it is missed, which facilitates the subsequent matching of the corresponding product sales information.
S6: Match each product name with its corresponding sales amount according to preset matching rules to obtain product sales information, and output the product sales information based on the sales time.
Specifically, the obtained product-related information is processed according to the preset matching rules so that each product name is matched with its corresponding sales amount; then, in combination with the sales time, Selenium together with the requests library is used to configure the product status in either a scheduled-effect mode or an immediate-restart mode. Further, once person groups and channels have been configured, specific person groups and specific channels can obtain accurate sales quotas. After the product allocation task is completed, an analysis report can be generated and sent to the operations staff.
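The preset matching rules are not spelled out, so the sketch below assumes the simplest possible rule: names, amounts, and times extracted in the same order are paired by position, and the resulting records are output in order of sales time. The function name and the sample values are hypothetical.

```python
from datetime import date


def match_and_sort(names, amounts, times):
    """Pair each product name with its amount and time, ordered by sales time."""
    if not (len(names) == len(amounts) == len(times)):
        raise ValueError("names, amounts and times must have the same length")
    records = [{"product": n, "amount": a, "sales_time": t}
               for n, a, t in zip(names, amounts, times)]
    return sorted(records, key=lambda r: r["sales_time"])


sales = match_and_sort(
    ["XXX snack", "YYY drink"],
    [30000, 12000],
    [date(2021, 5, 3), date(2021, 4, 28)],
)
for record in sales:
    print(record["sales_time"], record["product"], record["amount"])
```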
In this embodiment, a scheduled task periodically checks the receiving status of the mailbox; if an unread email is detected, its subject and body content are obtained; the subject and body content are input into the trained decision tree to judge whether the unread email is related to product sales, giving a judgment result; if the judgment result is that the unread email is related to product sales, the content of the unread email is identified as the target content; the target content is traversed with preset keywords to extract the product-related information; and each product name is matched with its corresponding sales amount according to the preset matching rules to obtain product sales information, which is output based on the sales time. The trained decision tree thus automatically judges whether an unread email is related to product sales information, and if so, the product-related information is extracted from the email for product matching, which helps improve product matching efficiency. In addition, the present application uses a scheduled task to monitor incoming email so that product-related emails are processed promptly, and it applies different parsing methods according to the format of the email content to obtain the corresponding product information, which also helps improve product matching efficiency.
It should be emphasized that, to further ensure the privacy and security of the above product sales information, the product sales information may also be stored in a node of a blockchain.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), and so on.
Referring to FIG. 9, as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of a decision tree-based product matching apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 9, the decision tree-based product matching apparatus of this embodiment includes: a receiving status detection module 71, an unread email acquisition module 72, an unread email judgment module 73, a target content identification module 74, a product-related information extraction module 75, and a product sales information output module 76, wherein:
the receiving status detection module 71 is configured to periodically check the receiving status of the target mailbox through a scheduled task;
the unread email acquisition module 72 is configured to obtain the subject and body content of an unread email if an unread email is detected in the target mailbox;
the unread email judgment module 73 is configured to input the subject and body content into the trained decision tree to judge whether the unread email is related to product sales, and to obtain a first judgment result;
the target content identification module 74 is configured to identify the content of the unread email as the target content if the first judgment result is that the unread email is related to product sales;
the product-related information extraction module 75 is configured to traverse the target content using preset keywords to extract the product-related information in the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
the product sales information output module 76 is configured to match the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and to output the product sales information based on the sales time.
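How modules 72 to 76 chain together can be sketched as follows. This is only an illustrative outline: `Email`, `run_pipeline`, `is_sales_related`, and `extract_records` are hypothetical names, the classifier and extractor are injected as callables, and ordering the output by sales time is one assumed reading of "output based on the sales time".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Email:
    subject: str
    body: str
    unread: bool = True


def run_pipeline(inbox: Iterable[Email],
                 is_sales_related: Callable[[str, str], bool],
                 extract_records: Callable[[str], List[Dict[str, object]]]
                 ) -> List[Dict[str, object]]:
    """Return the extracted sales records ordered by their 'sale_time' field."""
    records: List[Dict[str, object]] = []
    for mail in inbox:
        if not mail.unread:
            continue  # only unread emails are examined
        if not is_sales_related(mail.subject, mail.body):
            continue  # first judgment result: not related to product sales
        # Subject and body together form the target content.
        target_content = mail.subject + "\n" + mail.body
        records.extend(extract_records(target_content))
    # Assumes ISO-formatted dates, so string order equals chronological order.
    return sorted(records, key=lambda r: r["sale_time"])
```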
Further, the unread email judgment module 73 includes:
a content input unit, configured to input the subject and body content into the trained decision tree;
a node classification unit, configured to perform node classification on the subject and body content through the trained decision tree to obtain a current node classification result;
a node prediction unit, configured to determine, based on the current node classification result, the final leaf node of the unread email as a predicted leaf node;
a prediction result inference unit, configured to obtain the first judgment result from the prediction result of the predicted leaf node.
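The walk from the root to a predicted leaf can be sketched as below. The application does not specify how the subject and body are turned into features, so the keyword-presence features in `email_features` are an assumption made for illustration, and `Node`, `predict`, and the example keywords are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    feature: Optional[str] = None          # None marks a leaf node
    children: Dict[bool, "Node"] = field(default_factory=dict)
    label: Optional[bool] = None           # prediction stored on a leaf


def email_features(subject: str, body: str, keywords: List[str]) -> Dict[str, bool]:
    """Assumed featurization: whether each keyword occurs in subject or body."""
    text = (subject + " " + body).lower()
    return {k: k in text for k in keywords}


def predict(root: Node, features: Dict[str, bool]) -> bool:
    """Classify node by node until a leaf is reached, then return its label."""
    node = root
    while node.feature is not None:
        node = node.children[features[node.feature]]
    return bool(node.label)
```

For example, a tree whose root tests "invoice" and whose negative branch tests "sales" would classify an email with the subject "Q2 sales summary" as related to product sales.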
Further, before the unread email judgment module 73, the apparatus also includes:
a sample training data module, configured to obtain sample emails and capture the content of the sample emails to obtain sample training data;
a product feature attribute identification module, configured to identify product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of the products in the sample training data;
a target training set generation module, configured to perform feature selection on the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;
a target training set training module, configured to train on the target training set using a decision tree algorithm to obtain the trained decision tree.
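Generating the target training set can be sketched as a column selection followed by a split. The application does not state how the data is divided, so the shuffled split and the `test_ratio` and `seed` parameters are assumptions, and `build_target_training_set` is a hypothetical name.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, object], bool]  # (features, label)


def build_target_training_set(samples: List[Sample],
                              feature_names: List[str],
                              test_ratio: float = 0.2,
                              seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Keep only the selected features, then split into (training, test) data."""
    selected = [({name: features[name] for name in feature_names}, label)
                for features, label in samples]
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    rng.shuffle(selected)
    n_test = max(1, int(len(selected) * test_ratio))
    return selected[n_test:], selected[:n_test]
```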
Further, the target training set training module includes:
a node feature acquisition unit, configured to use the ID3 algorithm to perform node calculation on the decision tree with the training data to obtain node features;
a recursive calculation unit, configured to perform recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;
an error value generation unit, configured to test the basic decision tree with the test data to obtain an error value;
a recursive calculation end unit, configured to stop the recursive calculation when the error value is less than a preset threshold, to obtain the trained decision tree.
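The training loop described by these units can be sketched with a small ID3 implementation that picks the split feature by information gain. Reading "each recursive calculation yields a basic decision tree" as growing the tree one level deeper per round is an assumption, as are all function names and the dictionary representation of the tree.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features, max_depth):
    """Build a tree; leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority
    # ID3 splits on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx], rest, max_depth - 1)
    return node


def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def error_rate(tree, rows, labels):
    return sum(predict(tree, r) != l for r, l in zip(rows, labels)) / len(rows)


def train_until_threshold(train_rows, train_labels, test_rows, test_labels,
                          features, threshold):
    """Grow deeper trees until the test error drops below the threshold."""
    tree = None
    for depth in range(len(features) + 1):
        tree = id3(train_rows, train_labels, features, depth)
        if error_rate(tree, test_rows, test_labels) < threshold:
            break
    return tree
```

If no depth reaches the threshold, the sketch returns the deepest tree it built; the application does not say what should happen in that case.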
Further, the target content identification module 74 includes:
a detection result acquisition unit, configured to, if the first judgment result is that the unread email is related to product sales, take the subject and body content as the target content and detect whether the unread email includes an attachment, to obtain a detection result;
a second judgment result generation unit, configured to determine the text type of the attachment if the detection result is that the unread email includes an attachment, to obtain a second judgment result;
a first result generation unit, configured to read the text content of the Word document as the target content if the second judgment result is that the text type of the attachment is a Word document;
a second result generation unit, configured to read the text content of the text-based PDF as the target content by parsing it with a Java class package if the second judgment result is that the text type of the attachment is a text-based PDF;
a third result generation unit, configured to read the text content of the image-based PDF as the target content by means of OCR if the second judgment result is that the text type of the attachment is an image-based PDF.
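The three-way choice of parsing method can be sketched as a dispatch on attachment type. The actual Word, PDF, and OCR readers are left as injected callables because the application names no specific libraries; `attachment_kind`, `extract_attachment_text`, and the `pdf_has_text_layer` flag (which the caller is assumed to determine) are hypothetical.

```python
from typing import Callable, Dict, Optional


def attachment_kind(filename: str, pdf_has_text_layer: bool) -> str:
    """Map an attachment to 'word', 'text_pdf', 'image_pdf', or 'unsupported'."""
    name = filename.lower()
    if name.endswith((".doc", ".docx")):
        return "word"
    if name.endswith(".pdf"):
        return "text_pdf" if pdf_has_text_layer else "image_pdf"
    return "unsupported"


def extract_attachment_text(filename: str,
                            data: bytes,
                            pdf_has_text_layer: bool,
                            readers: Dict[str, Callable[[bytes], str]]
                            ) -> Optional[str]:
    """Pick the reader for this attachment type; return None if none applies."""
    reader = readers.get(attachment_kind(filename, pdf_has_text_layer))
    return reader(data) if reader else None
```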
Further, the product-related information extraction module 75 includes:
a key sentence acquisition unit, configured to traverse the target content using the preset keywords to obtain the sentences in the target content that contain the preset keywords, as key sentences;
a data acquisition unit, configured to obtain the data in the key sentences that corresponds to the preset keywords, to obtain the sales time and the sales amount;
a product name acquisition unit, configured to split the key sentences by a delimiter to obtain the product name.
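A keyword-driven extraction along these lines might look like the sketch below. The English keywords, the one-sentence-per-line convention, and the ";" and ":" delimiters are all assumptions chosen for illustration; the application does not disclose the actual keywords or delimiters.

```python
from typing import Dict, List

# Hypothetical preset keywords mapped to the field each one identifies.
KEYWORDS = {
    "product": "product_name",
    "sales time": "sale_time",
    "sales amount": "amount",
}


def key_sentences(content: str) -> List[str]:
    """Return the lines of the content that contain any preset keyword."""
    lines = [line.strip() for line in content.splitlines() if line.strip()]
    return [line for line in lines if any(k in line.lower() for k in KEYWORDS)]


def parse_sentence(sentence: str) -> Dict[str, object]:
    """Split a key sentence on ';' and read each 'keyword: value' part."""
    record: Dict[str, object] = {}
    for part in sentence.split(";"):
        if ":" not in part:
            continue
        key, value = part.split(":", 1)
        field = KEYWORDS.get(key.strip().lower())
        if field is not None:
            record[field] = value.strip()
    if "amount" in record:
        # Drop thousands separators before converting to a number.
        record["amount"] = float(str(record["amount"]).replace(",", ""))
    return record
```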
Further, after the product name acquisition unit, the apparatus also includes:
a third judgment result acquisition unit, configured to determine whether a table exists in the target content, to obtain a third judgment result;
a header information acquisition unit, configured to parse the table to obtain the header information of the table if the third judgment result is that a table exists in the target content;
a header information matching unit, configured to determine whether the header information matches the preset keywords, to obtain a fourth judgment result;
a fourth judgment result display unit, configured to obtain the product-related information based on the header information if the fourth judgment result is that the header information matches the preset keywords.
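The table branch can be sketched as follows, assuming the table has already been parsed into a list of rows whose first row is the header. `HEADER_KEYWORDS` and `extract_from_table` are hypothetical, and requiring all three headers to match is one possible reading of "the header information matches the preset keywords".

```python
from typing import Dict, List

# Hypothetical preset keywords mapped to the field each header identifies.
HEADER_KEYWORDS = {
    "product name": "product_name",
    "sales time": "sale_time",
    "sales amount": "amount",
}


def extract_from_table(table: List[List[str]]) -> List[Dict[str, str]]:
    """Return one record per data row if the header matches the keywords."""
    if not table:
        return []
    header = [cell.strip().lower() for cell in table[0]]
    columns = {HEADER_KEYWORDS[h]: i for i, h in enumerate(header)
               if h in HEADER_KEYWORDS}
    # Fourth judgment: every expected field must have a matching header cell.
    if set(columns) != set(HEADER_KEYWORDS.values()):
        return []
    return [{name: row[i].strip() for name, i in columns.items()}
            for row in table[1:]]
```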
It should be emphasized that, to further ensure the privacy and security of the above product sales information, the product sales information may also be stored in a node of a blockchain.
To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer to FIG. 10, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 8 includes a memory 81, a processor 82, and a network interface 83 that are communicatively connected to one another through a system bus. It should be noted that the figure shows only a computer device 8 with the three components memory 81, processor 82, and network interface 83, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The memory 81 stores computer-readable instructions, and the processor 82, when executing the computer-readable instructions, can implement all the steps of any embodiment of the above decision tree-based product matching method.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
The memory 81 includes at least one type of readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as a hard disk or internal memory of the computer device 8. In other embodiments, the memory 81 may be an external storage device of the computer device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 8. Of course, the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device. In this embodiment, the memory 81 is generally used to store the operating system and the various application software installed on the computer device 8, such as the computer-readable instructions of the decision tree-based product matching method. In addition, the memory 81 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 82 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 82 is generally used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is used to run the computer-readable instructions stored in the memory 81 or to process data, for example to run the computer-readable instructions of the above decision tree-based product matching method, so as to implement the various embodiments of the decision tree-based product matching method.
The network interface 83 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 8 and other electronic devices.
The present application also provides another implementation, namely a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs all the steps of any embodiment of the decision tree-based product matching method described above.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
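The idea of blocks linked by cryptographic hashes can be illustrated with a toy hash chain. This is not a blockchain in the full sense described above: it has no distribution, peer-to-peer transmission, or consensus mechanism, and all names (`GENESIS_HASH`, `block_hash`, `append_block`, `verify_chain`) are hypothetical. It only shows how altering stored sales records becomes detectable.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(records: List[Dict[str, object]], prev_hash: str) -> str:
    """Hash a block's records together with the hash of the previous block."""
    payload = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, object]], records: List[Dict[str, object]]) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"records": records, "prev_hash": prev_hash,
                  "hash": block_hash(records, prev_hash)})


def verify_chain(chain: List[Dict[str, object]]) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["records"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True
```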
Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations, or replace some of their technical features with equivalents. Any equivalent structure made using the content of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A decision tree-based product matching method, comprising:
    periodically checking the receiving status of a target mailbox through a scheduled task;
    if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
    inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;
    if the first judgment result is that the unread email is related to product sales, identifying the content of the unread email as target content;
    traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
    matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
  2. The decision tree-based product matching method according to claim 1, wherein the inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result, comprises:
    inputting the subject and body content into the trained decision tree;
    performing node classification on the subject and body content through the trained decision tree to obtain a current node classification result;
    determining, based on the current node classification result, the final leaf node of the unread email as a predicted leaf node;
    obtaining the first judgment result from the prediction result of the predicted leaf node.
  3. The decision tree-based product matching method according to claim 1, wherein, before the inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result, the method further comprises:
    obtaining sample emails and capturing the content of the sample emails to obtain sample training data;
    identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of the products in the sample training data;
    performing feature selection on the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;
    training on the target training set using a decision tree algorithm to obtain the trained decision tree.
  4. The decision tree-based product matching method according to claim 3, wherein the training on the target training set using a decision tree algorithm to obtain the trained decision tree comprises:
    using the ID3 algorithm to perform node calculation on a decision tree with the training data to obtain node features;
    performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;
    testing the basic decision tree with the test data to obtain an error value;
    when the error value is less than a preset threshold, stopping the recursive calculation to obtain the trained decision tree.
  5. The decision tree-based product matching method according to claim 1, wherein the identifying the content of the unread email as target content if the first judgment result is that the unread email is related to product sales comprises:
    if the first judgment result is that the unread email is related to product sales, taking the subject and body content as the target content, and detecting whether the unread email includes an attachment, to obtain a detection result;
    if the detection result is that the unread email includes an attachment, determining the text type of the attachment to obtain a second judgment result;
    if the second judgment result is that the text type of the attachment is a Word document, reading the text content of the Word document as the target content;
    if the second judgment result is that the text type of the attachment is a text-based PDF, reading the text content of the text-based PDF as the target content by parsing it with a Java class package;
    if the second judgment result is that the text type of the attachment is an image-based PDF, reading the text content of the image-based PDF as the target content by means of OCR.
  6. The decision tree-based product matching method according to claim 1, wherein the traversing the target content using preset keywords to extract product-related information from the target content comprises:
    traversing the target content using the preset keywords to obtain the sentences in the target content that contain the preset keywords, as key sentences;
    obtaining the data in the key sentences that corresponds to the preset keywords, to obtain the sales time and the sales amount;
    splitting the key sentences by a delimiter to obtain the product name.
  7. The decision tree-based product matching method according to claim 6, wherein, after the splitting the key sentences by a delimiter to obtain the product name, the method further comprises:
    determining whether a table exists in the target content, to obtain a third judgment result;
    if the third judgment result is that a table exists in the target content, parsing the table to obtain the header information of the table;
    determining whether the header information matches the preset keywords, to obtain a fourth judgment result;
    if the fourth judgment result is that the header information matches the preset keywords, obtaining the product-related information based on the header information.
  8. A decision tree-based product matching apparatus, comprising:
    a receiving status detection module, configured to periodically check the receiving status of a target mailbox through a scheduled task;
    an unread email acquisition module, configured to obtain the subject and body content of an unread email if an unread email is detected in the target mailbox;
    an unread email judgment module, configured to input the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and to obtain a first judgment result;
    a target content identification module, configured to identify the content of the unread email as target content if the first judgment result is that the unread email is related to product sales;
    a product-related information extraction module, configured to traverse the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
    a product sales information output module, configured to match the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and to output the product sales information based on the sales time.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    periodically checking the receiving status of a target mailbox through a scheduled task;
    if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
    inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result;
    if the first judgment result is that the unread email is related to product sales, identifying the content of the unread email as target content;
    traversing the target content using preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time, and a sales amount;
    matching the product name with the corresponding sales amount through a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
  10. The computer device according to claim 9, wherein the inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result, comprises:
    inputting the subject and body content into the trained decision tree;
    performing node classification on the subject and body content through the trained decision tree to obtain a current node classification result;
    determining, based on the current node classification result, the final leaf node of the unread email as a predicted leaf node;
    obtaining the first judgment result from the prediction result of the predicted leaf node.
  11. The computer device according to claim 9, wherein, before the inputting the subject and body content into a trained decision tree to judge whether the unread email is related to product sales, and obtaining a first judgment result, the method further comprises:
    obtaining sample emails and capturing the content of the sample emails to obtain sample training data;
    identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of the products in the sample training data;
    performing feature selection on the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;
    training on the target training set using a decision tree algorithm to obtain the trained decision tree.
  12. The computer device according to claim 11, wherein the training on the target training set using a decision tree algorithm to obtain the trained decision tree comprises:
    using the ID3 algorithm to perform node calculation on a decision tree with the training data to obtain node features;
    performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a basic decision tree;
    testing the basic decision tree with the test data to obtain an error value;
    when the error value is less than a preset threshold, stopping the recursive calculation to obtain the trained decision tree.
  13. The computer device according to claim 9, wherein the identifying, if the first judgment result is that the unread email is related to product sales, the content in the unread email as the target content comprises:
    if the first judgment result is that the unread email is related to product sales, taking the subject and body content as the target content, and detecting whether the unread email includes an attachment to obtain a detection result;
    if the detection result is that the unread email includes an attachment, determining the text type of the attachment to obtain a second judgment result;
    if the second judgment result is that the text type of the attachment is a Word document, reading the text content of the Word document as the target content;
    if the second judgment result is that the text type of the attachment is a text-based PDF, reading out the text content of the text-based PDF by parsing it with a Java class package, as the target content;
    if the second judgment result is that the text type of the attachment is an image-based PDF, reading out the text content of the image-based PDF by OCR, as the target content.
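The attachment handling of claim 13 amounts to a dispatch on attachment type. The sketch below shows only that control flow. The file-extension test, the `has_text_layer` flag and the `readers` mapping are hypothetical stand-ins for the Word reader, the PDF parser and the OCR engine, none of which the application specifies in detail.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

def attachment_kind(filename: str, has_text_layer: bool) -> Optional[str]:
    """Classify an attachment as 'word', 'text_pdf', 'image_pdf', or None."""
    name = filename.lower()
    if name.endswith((".doc", ".docx")):
        return "word"
    if name.endswith(".pdf"):
        # Assumption: the caller already knows whether the PDF carries a text layer.
        return "text_pdf" if has_text_layer else "image_pdf"
    return None

def collect_target_content(
    subject: str,
    body: str,
    attachments: Iterable[Tuple[str, bool, object]],
    readers: Dict[str, Callable[[object], str]],
) -> str:
    """Subject and body always count; each supported attachment adds its text."""
    parts = [subject, body]
    for filename, has_text_layer, payload in attachments:
        kind = attachment_kind(filename, has_text_layer)
        if kind in readers:
            # Delegate to the reader registered for this attachment type.
            parts.append(readers[kind](payload))
    return "\n".join(parts)
```

Passing the readers in as callables keeps the dispatch testable without a real document parser or OCR engine; unsupported attachment types are skipped.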
  14. The computer device according to claim 9, wherein the traversing the target content with preset keywords to extract product-related information from the target content comprises:
    traversing the target content with the preset keywords to obtain the sentences in the target content that include the preset keywords, as key sentences;
    obtaining, from the key sentences, the data corresponding to the preset keywords to obtain the sales time and the sales amount;
    splitting the key sentences by a delimiter to obtain the product name.
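The keyword traversal of claim 14 can be sketched as follows. The example assumes English-style sentences, a comma as the delimiter and a `label: value` layout inside each key sentence. These conventions and the function names are illustrative assumptions, since the application does not fix the sentence format.

```python
import re

# Split after sentence-ending punctuation followed by whitespace, or on newlines.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|\n+")

def key_sentences(text, keywords):
    """Return the sentences that contain at least one preset keyword."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]

def split_fields(sentence, delimiter=","):
    """Split a key sentence by the delimiter into {label: value} pairs."""
    fields = {}
    for part in sentence.split(delimiter):
        label, sep, value = part.partition(":")
        if sep:  # keep only parts that look like "label: value"
            fields[label.strip().lower()] = value.strip().rstrip(".")
    return fields

def extract_product_info(text, keywords, delimiter=","):
    """Extract one field dictionary per key sentence."""
    return [split_fields(s, delimiter) for s in key_sentences(text, keywords)]
```

Amounts written with a thousands separator (for example `1,200.50`) would be cut by the comma delimiter, so a real rule set would need a delimiter that cannot occur inside values.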
  15. The computer device according to claim 14, wherein, after the splitting the key sentences by a delimiter to obtain the product name, the steps further comprise:
    determining whether a table exists in the target content to obtain a third judgment result;
    if the third judgment result is that a table exists in the target content, parsing the table to obtain the header information of the table;
    determining whether the header information matches the preset keywords to obtain a fourth judgment result;
    if the fourth judgment result is that the header information matches the preset keywords, obtaining the product-related information based on the header information.
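The table branch of claim 15 can be sketched under the assumption that the table has already been parsed into a list of rows whose first row is the header. The `header_keywords` mapping and the function name are hypothetical.

```python
def extract_from_table(table, header_keywords):
    """Read product records from a parsed table.

    table: list of rows (lists of strings); the first row is the header.
    header_keywords: field name -> set of accepted header labels (lower case).
    Returns one dict per data row, or [] if the header does not match.
    """
    if not table:
        return []
    header = [cell.strip().lower() for cell in table[0]]
    columns = {}
    for field, labels in header_keywords.items():
        for index, cell in enumerate(header):
            if cell in labels:
                columns[field] = index
                break
    if len(columns) < len(header_keywords):
        return []  # the header does not match the preset keywords
    return [
        {field: row[index].strip() for field, index in columns.items()}
        for row in table[1:]
    ]
```

Requiring every field to be present in the header is one possible meaning of "matches"; a looser rule could accept a partial match.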
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, cause the processor to perform the following steps:
    periodically checking, by means of a scheduled task, the receiving status of a target mailbox;
    if an unread email is detected in the target mailbox, obtaining the subject and body content of the unread email;
    inputting the subject and body content into a trained decision tree to determine whether the unread email is related to product sales, to obtain a first judgment result;
    if the first judgment result is that the unread email is related to product sales, identifying the content in the unread email as target content;
    traversing the target content with preset keywords to extract product-related information from the target content, wherein the product-related information includes a product name, a sales time and a sales amount;
    matching the product name to the corresponding sales amount by a preset matching rule to obtain product sales information, and outputting the product sales information based on the sales time.
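The final matching and output step of claim 16 might look like the sketch below. The application does not define the preset matching rule, so this example assumes the simplest one, pairing names, amounts and times by position, and assumes ISO-formatted dates. The function name `match_sales` is hypothetical.

```python
from datetime import date
from decimal import Decimal

def match_sales(names, amounts, times):
    """Pair each product name with its amount and time, ordered by sales time.

    Assumed matching rule: the i-th name belongs to the i-th amount and time.
    """
    if not (len(names) == len(amounts) == len(times)):
        raise ValueError("names, amounts and times must have the same length")
    records = [
        {"product": n, "amount": Decimal(a), "time": date.fromisoformat(t)}
        for n, a, t in zip(names, amounts, times)
    ]
    # Output is driven by the sales time: earliest sale first.
    return sorted(records, key=lambda r: r["time"])
```

`Decimal` is used for the amounts to avoid binary floating-point rounding on currency values.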
  17. The computer-readable storage medium according to claim 16, wherein the inputting the subject and body content into a trained decision tree to determine whether the unread email is related to product sales, to obtain a first judgment result, comprises:
    inputting the subject and body content into the trained decision tree;
    classifying the subject and body content node by node through the trained decision tree to obtain a current node classification result;
    determining, based on the current node classification result, the final leaf node of the unread email as a predicted leaf node;
    obtaining the first judgment result from the prediction result of the predicted leaf node.
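The node-by-node classification of claim 17 can be pictured as a walk from the root to a leaf. In this sketch each internal node tests whether a keyword occurs in the subject or body. That feature choice and the nested-dictionary tree layout are assumptions made for illustration only.

```python
def classify_email(tree, subject, body):
    """Walk the tree to a leaf and return (label, path).

    Internal nodes: {"keyword": str, "yes": node, "no": node}
    Leaf nodes:     {"label": bool}
    """
    text = f"{subject}\n{body}".lower()
    node, path = tree, []
    while "label" not in node:
        hit = node["keyword"] in text           # current node classification result
        path.append((node["keyword"], hit))
        node = node["yes"] if hit else node["no"]
    return node["label"], path                  # prediction of the reached leaf
```

The returned `path` records the decision taken at each node, which makes it easy to see why a given email was or was not treated as sales-related.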
  18. The computer-readable storage medium according to claim 16, wherein, before the inputting the subject and body content into a trained decision tree to determine whether the unread email is related to product sales, to obtain a first judgment result, the method further comprises:
    obtaining sample emails, and capturing the content of the sample emails to obtain sample training data;
    identifying product feature attributes in the sample training data, wherein the product feature attributes are inherent attributes of the products in the sample training data;
    performing feature selection on the sample training data according to the product feature attributes to generate a target training set, wherein the target training set is divided into training data and test data;
    training on the target training set using a decision tree algorithm to obtain the trained decision tree.
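The preparation of the target training set in claim 18 can be sketched as turning each sample email into a small feature dictionary and then splitting the result. The keyword-presence features, the 25% test share and the fixed random seed are assumptions of this example, as is the name `build_training_set`.

```python
import random

def build_training_set(samples, attribute_keywords, test_ratio=0.25, seed=0):
    """Turn (text, label) samples into feature rows and split them.

    attribute_keywords: attribute name -> keywords that signal the attribute.
    Returns (training_data, test_data), each a list of (features, label).
    """
    rows = []
    for text, label in samples:
        lowered = text.lower()
        # Feature selection: keep only the chosen product attributes.
        features = {
            attr: any(k in lowered for k in keywords)
            for attr, keywords in attribute_keywords.items()
        }
        rows.append((features, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]
```

The two returned lists correspond to the training data and test data into which the claim divides the target training set.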
  19. The computer-readable storage medium according to claim 18, wherein the training on the target training set using a decision tree algorithm to obtain the trained decision tree comprises:
    performing node calculation on a decision tree with the training data, using the ID3 algorithm, to obtain node features;
    performing recursive calculation on the decision tree based on the node features, wherein each recursive calculation yields a base decision tree;
    testing the base decision tree with the test data to obtain an error value;
    stopping the recursive calculation when the error value is smaller than a preset threshold, to obtain the trained decision tree.
  20. The computer-readable storage medium according to claim 16, wherein the identifying, if the first judgment result is that the unread email is related to product sales, the content in the unread email as the target content comprises:
    if the first judgment result is that the unread email is related to product sales, taking the subject and body content as the target content, and detecting whether the unread email includes an attachment to obtain a detection result;
    if the detection result is that the unread email includes an attachment, determining the text type of the attachment to obtain a second judgment result;
    if the second judgment result is that the text type of the attachment is a Word document, reading the text content of the Word document as the target content;
    if the second judgment result is that the text type of the attachment is a text-based PDF, reading out the text content of the text-based PDF by parsing it with a Java class package, as the target content;
    if the second judgment result is that the text type of the attachment is an image-based PDF, reading out the text content of the image-based PDF by OCR, as the target content.
PCT/CN2021/108777 2021-06-29 2021-07-28 Decision tree-based product matching method, apparatus and device, and storage medium WO2023272850A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110725709.8 2021-06-29
CN202110725709.8A CN113450147B (en) 2021-06-29 2021-06-29 Product matching method, device, equipment and storage medium based on decision tree

Publications (1)

Publication Number Publication Date
WO2023272850A1 true WO2023272850A1 (en) 2023-01-05

Family

ID=77813818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108777 WO2023272850A1 (en) 2021-06-29 2021-07-28 Decision tree-based product matching method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN113450147B (en)
WO (1) WO2023272850A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117408787A (en) * 2023-12-15 2024-01-16 江西求是高等研究院 Root cause mining analysis method and system based on decision tree
CN117575300A (en) * 2024-01-19 2024-02-20 德阳凯达门业有限公司 Task allocation method and device for workshops
CN117575300B (en) * 2024-01-19 2024-05-14 德阳凯达门业有限公司 Task allocation method and device for workshops

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511822B (en) * 2022-04-19 2022-07-12 广州市方连科技有限公司 Daily sundries sales system capable of predicting sales volume according to monitoring picture

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768819A (en) * 2018-03-08 2018-11-06 成都美美臣科技有限公司 A kind of marketing mail is sent to effect preventive inspection method and device
US20190095866A1 (en) * 2016-03-01 2019-03-28 Beijing Gridsum Technology Co., Ltd. Method and device for processing mail data
CN111680161A (en) * 2020-07-07 2020-09-18 腾讯科技(深圳)有限公司 Text processing method and device and computer readable storage medium
CN112016321A (en) * 2020-10-13 2020-12-01 上海一嗨成山汽车租赁南京有限公司 Method, electronic device and storage medium for mail processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5465695B2 (en) * 2011-05-31 2014-04-09 楽天株式会社 Progress presenting apparatus, progress presenting system, progress presenting program, and progress presenting method
JP2015170026A (en) * 2014-03-05 2015-09-28 東芝テック株式会社 Selling article settlement terminal and commodity consumption limit date notification system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117408787A (en) * 2023-12-15 2024-01-16 江西求是高等研究院 Root cause mining analysis method and system based on decision tree
CN117408787B (en) * 2023-12-15 2024-03-05 江西求是高等研究院 Root cause mining analysis method and system based on decision tree
CN117575300A (en) * 2024-01-19 2024-02-20 德阳凯达门业有限公司 Task allocation method and device for workshops
CN117575300B (en) * 2024-01-19 2024-05-14 德阳凯达门业有限公司 Task allocation method and device for workshops

Also Published As

Publication number Publication date
CN113450147A (en) 2021-09-28
CN113450147B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
US7899769B2 (en) Method for identifying emerging issues from textual customer feedback
US11810070B2 (en) Classifying digital documents in multi-document transactions based on embedded dates
US7853544B2 (en) Systems and methods for automatically categorizing unstructured text
US20150120583A1 (en) Process and mechanism for identifying large scale misuse of social media networks
WO2018040068A1 (en) Knowledge graph-based semantic analysis system and method
US9104709B2 (en) Cleansing a database system to improve data quality
WO2023272850A1 (en) Decision tree-based product matching method, apparatus and device, and storage medium
Kiefer Assessing the Quality of Unstructured Data: An Initial Overview.
AU2014343044B2 (en) Method and system for document data extraction template management
JP2004078512A (en) Document management method and document management device
Gaglani et al. Unsupervised WhatsApp fake news detection using semantic search
US20220121668A1 (en) Method for recommending document, electronic device and storage medium
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
CN112925911A (en) Complaint classification method based on multi-modal data and related equipment thereof
CN117195319A (en) Verification method and device for electronic part of file, electronic equipment and medium
CN112200465A (en) Electric power AI method and system based on multimedia information intelligent analysis
US20220358293A1 (en) Alignment of values and opinions between two distinct entities
US20220329556A1 (en) Detect and alert user when sending message to incorrect recipient or sending inappropriate content to a recipient
CN113420042A (en) Data statistics method, device, equipment and storage medium based on presentation
CN114117047A (en) Method and system for classifying illegal voice based on C4.5 algorithm
CN110851346A (en) Method, device and equipment for detecting boundary problem of query statement and storage medium
CN111027296A (en) Report generation method and system based on knowledge base
KR102565960B1 (en) Box electronic documentation system capable of creating, storing, transmitting, and deriving statistics using an input user interface, and providing method thereof
US20230134796A1 (en) Named entity recognition system for sentiment labeling
US20230214679A1 (en) Extracting and classifying entities from digital content items

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21947792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE