CN116306506A

CN116306506A - Intelligent mail template method based on content identification

Info

Publication number: CN116306506A
Application number: CN202211103395.9A
Authority: CN
Inventors: 严峻; 孟祥磊; 侯颖; 张威
Original assignee: Wuhan Best Information Technology Co ltd
Current assignee: Wuhan Best Information Technology Co ltd
Priority date: 2022-09-09
Filing date: 2022-09-09
Publication date: 2023-06-23

Abstract

The invention discloses an intelligent mail template method based on content identification, and relates to the technical field of computer applications. The method comprises the following steps: collecting the basic structure of HTML mails from Web pages and extracting the text content of the Web pages; performing labeling training and feature training to obtain labeling training results and feature training results, and establishing a corresponding mail feature model; creating a mail template file to be output so as to generate a preset mail template file; identifying the user's text content and inputting it into the mail feature model; outputting a mail template file from the mail feature model; and performing program format conversion on the mail template file to generate a target mail template. The method trains the mail feature model on HTML mails collected from Web pages and creates the mail template file to be output to generate a preset mail template file. After the user's text content is identified, it is input into the mail feature model to generate the target mail template. This improves the efficiency and accuracy of mail generation.

Description

Intelligent mail template method based on content identification

Technical Field

The invention belongs to the technical field of computer applications, and particularly relates to an intelligent mail template method based on content identification.

Background

As computer application service scenarios become more diverse and fine-grained, each scenario has to monitor a large number of service metrics. The key metrics are read and summarized systematically, and daily data-analysis reports are finally pushed to mail clients by e-mail.

At present, personalized mails contain elements such as pictures, tables and titles, and the presentation of each element must be adjusted to the mail's display requirements. These adjustments include the size and brightness of a picture; the attributes of a table, such as the thickness and color of its borders and shading; and the font size, font color and line spacing of a title. A developer therefore has to write separate JAVA code for each case and splice the pieces together in order to assemble the final mail that meets the business requirement. In other words, traditional mail sending is implemented by splicing HTML mail code with Jmail.

Splicing HTML mail code with Jmail requires developing different JAVA code for each case. The result is highly redundant, non-modular code that is hard to manage. Whenever a mail with a new display layout is needed, the entire HTML code must be rewritten, so mail generation is inefficient.

Disclosure of Invention

The invention aims to provide an intelligent mail template method based on content identification. In this method, a mail feature model is trained on HTML mails collected from Web pages, and a mail template file to be output is created to generate a preset mail template file. After the user's text content is identified, it is input into the mail feature model to generate a target mail template. This solves the prior-art problems of low efficiency and poor accuracy in mail generation.

In order to solve the technical problems, the invention is realized by the following technical scheme:

The intelligent mail template method based on content identification according to the invention comprises the following steps:

step S1: collecting the basic structure of an HTML mail of a Web page, and extracting the text content of the Web page;

step S2: preprocessing the extracted mail information;

step S3: labeling, training and recognizing the preprocessed mail information with a deep learning algorithm;

step S4: the deep learning algorithm performs labeling training and feature training to obtain labeling training results and feature training results;

step S5: establishing a corresponding mail feature model according to the labeling training result and the feature training result;

step S6: creating a mail template file to be output so as to generate a preset mail template file;

step S7: identifying the user's text content and inputting it into the mail feature model;

step S8: outputting a mail template file from the mail feature model;

step S9: performing program format conversion on the mail template file to generate a target mail template.
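Step S1 does not say how the "basic structure" of an HTML mail is represented or how its text is extracted. The following is a minimal sketch that assumes the structure can be summarized as a histogram of HTML tags and that visible text is everything outside `<script>` and `<style>`. It uses only Python's standard `html.parser`. The names `MailTextExtractor` and `extract_mail` are hypothetical.

```python
from html.parser import HTMLParser


class MailTextExtractor(HTMLParser):
    """Hypothetical step S1 helper: collects visible text and a tag histogram.

    Assumption: the mail's 'basic structure' is approximated by counting tags.
    """

    SKIP = {"script", "style"}  # tags whose content is not visible text

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.tag_counts: dict[str, int] = {}
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_mail(html: str) -> tuple[str, dict[str, int]]:
    """Return (visible text, tag histogram) for one HTML mail."""
    parser = MailTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.text_parts), parser.tag_counts
```

For example, `extract_mail("<h1>Daily Report</h1><table><tr><td>PV</td><td>120</td></tr></table>")` yields the text `"Daily Report PV 120"` and a histogram with two `td` tags. A real implementation would probably record the nesting of elements rather than a flat count.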

As a preferable technical solution, in the step S2, preprocessing the mail information includes the following steps:

step S21: collecting and training mail information;

step S22: discretizing the mail text and establishing a mail element library;

step S23: text feature extraction and text vectorization representation;

step S24: weighting the words and elements corresponding to the text, and representing the text in vector form.
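Steps S22 to S24 leave the tokenizer and the vector layout open. The following minimal sketch discretizes texts into terms, builds a term index, and represents each text as a vector of raw term counts. It assumes regex word tokenization; Chinese mail text would need a word segmenter instead. The function names are hypothetical. The weighting of step S24, for example TF-IDF, would then be applied on top of these counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lower-cased regex word tokens; not suitable for unsegmented Chinese.
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts: list[str]) -> dict[str, int]:
    """Map each distinct term in the corpus to a column index (sorted for determinism)."""
    terms = sorted({tok for t in texts for tok in tokenize(t)})
    return {term: i for i, term in enumerate(terms)}


def to_count_vector(text: str, vocab: dict[str, int]) -> list[int]:
    """Represent one text as a dense vector of raw term counts over the vocabulary."""
    vec = [0] * len(vocab)
    for term, n in Counter(tokenize(text)).items():
        if term in vocab:  # terms not seen during vocabulary building are ignored
            vec[vocab[term]] += n
    return vec
```

With `vocab = build_vocabulary(["Daily report: PV up", "Weekly report"])` the columns are `daily, pv, report, up, weekly`, and `to_count_vector("report report pv", vocab)` gives `[0, 1, 2, 0, 0]`.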

As a preferable technical solution, in step S21, the mail information must be cleaned when it is collected and trained. During cleaning, features are selected according to the magnitude of the χ² statistic, which is computed as:

$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$

where t is a feature item, c is a text category, N is the total number of texts in the training set, A is the number of texts in category c that contain the feature item t, B is the number of texts that contain t but do not belong to c, C is the number of texts in c that do not contain t, and D is the number of texts that neither contain t nor belong to c.
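The χ² statistic above needs only the four counts of the 2×2 contingency table. The following minimal sketch computes it. Returning 0 for a degenerate table, where one row or column is empty, is an assumption; the patent does not cover that case.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of feature item t against one text category.

    a: texts in the category that contain t
    b: texts outside the category that contain t
    c: texts in the category that do not contain t
    d: texts outside the category that do not contain t
    """
    n = a + b + c + d  # N, the total number of training texts
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # assumption: a degenerate table carries no association
    return n * (a * d - b * c) ** 2 / denom
```

For instance, `chi_square(40, 10, 10, 40)` evaluates to `100 * 1500**2 / 50**4 = 36.0`, while a term spread evenly across categories, `chi_square(25, 25, 25, 25)`, scores `0.0`.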

As a preferred embodiment, in step S22, the mail elements include one or more of a picture, a table and a title. When the mail element is a picture, its attribute parameters include the pixels, resolution, size, color, hue, saturation, brightness and gray value of the picture. When the mail element is a table, its attribute parameters include the margin, border and shading of the table, and the colors and line thicknesses of the border and shading. When the mail element is a title, its attribute parameters include the title font, font size, line spacing and font color.
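One way to hold the mail element library is as typed records whose attribute parameters map onto inline CSS, which is the usual way to style HTML mail. The following is a hypothetical sketch that covers only a subset of the listed attributes. The class names, defaults and CSS mapping are assumptions. Note that many mail clients ignore CSS `filter`, so the brightness mapping is illustrative only.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TitleElement:
    text: str
    font: str = "Arial"
    font_size_px: int = 20
    line_height: float = 1.5  # line spacing
    color: str = "#333333"

    def to_html(self) -> str:
        style = (f"font-family:{self.font};font-size:{self.font_size_px}px;"
                 f"line-height:{self.line_height};color:{self.color}")
        return f'<h2 style="{style}">{escape(self.text)}</h2>'


@dataclass
class TableElement:
    rows: list[list[str]]
    margin_px: int = 0
    border_px: int = 1          # border line thickness
    border_color: str = "#999999"
    shading: str = "#f5f5f5"    # background shading color

    def to_html(self) -> str:
        cell = f"border:{self.border_px}px solid {self.border_color};padding:4px"
        body = "".join(
            "<tr>" + "".join(f'<td style="{cell}">{escape(v)}</td>' for v in row) + "</tr>"
            for row in self.rows
        )
        table_style = f"margin:{self.margin_px}px;border-collapse:collapse;background:{self.shading}"
        return f'<table style="{table_style}">{body}</table>'


@dataclass
class PictureElement:
    src: str
    width_px: int
    height_px: int
    brightness: float = 1.0  # CSS filter factor; often ignored by mail clients

    def to_html(self) -> str:
        style = (f"width:{self.width_px}px;height:{self.height_px}px;"
                 f"filter:brightness({self.brightness})")
        return f'<img src="{escape(self.src)}" style="{style}">'
```

Keeping the attribute parameters as typed fields means a new display style is a new set of field values rather than new splicing code, which is the problem the background section describes.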

As a preferable technical solution, in the step S23, a TF-IDF method is adopted for text feature extraction and text vectorization; the weighting function of the TF-IDF is expressed as:

$$w_{ij}=tf_{ij}\cdot idf_{i}$$

Expanding the IDF term gives:

$$w_{ij}=tf_{ij}\times\log\left(\frac{N}{n_i}\right)$$

where $tf_{ij}$ is the term frequency of the feature word $t_i$ in text $j$, $idf_i$ is the inverse document frequency of $t_i$, N is the number of texts in the training set, and $n_i$ is the number of texts in which the feature term $t_i$ appears.
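The weighting formula can be computed directly over tokenized texts. The following is a minimal sketch. It assumes that $tf_{ij}$ is the raw count of term $t_i$ in text $j$ and that the logarithm is natural, since the patent fixes neither. Libraries such as scikit-learn add smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """w_ij = tf_ij * log(N / n_i) for every term i of every tokenized text j.

    Assumptions: tf_ij is a raw count and log is the natural logarithm.
    """
    n_docs = len(docs)           # N: number of texts in the training set
    doc_freq = Counter()         # n_i: number of texts containing term i
    for doc in docs:
        doc_freq.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: cnt * math.log(n_docs / doc_freq[t]) for t, cnt in tf.items()})
    return weights
```

For `docs = [["report", "pv", "pv"], ["report", "uv"]]`, "report" occurs in every text and gets weight 0, while "pv" in the first text gets `2 * log(2)`.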

In step S6, the mail feature model performs matching against a preset information recommendation library, determines the element information in the recommendation library that corresponds to the information element library, and sends the recommended information to the preset mail template file.
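The patent does not say how matching against the information recommendation library is performed. As one possible reading, the following hypothetical sketch represents the text features and each library entry as sparse term-weight vectors and recommends the entries with the highest cosine similarity. Cosine similarity is a swapped-in choice here, and `recommend` and the library layout are assumptions.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(features: dict[str, float],
              library: dict[str, dict[str, float]],
              top_k: int = 1) -> list[str]:
    """Hypothetical step S6 matcher: names of the top_k most similar library entries."""
    ranked = sorted(library, key=lambda name: cosine(features, library[name]), reverse=True)
    return ranked[:top_k]
```

With a library of `{"table": {"rate": 1.0, "count": 1.0}, "picture": {"trend": 1.0, "chart": 1.0}}` and text features `{"trend": 0.8, "chart": 0.5}`, the sketch recommends `["picture"]`.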

As a preferable technical solution, in step S9, the element data input by the user is processed during mail generation to generate element display information.
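Step S9 turns user element data into display information inside the preset template, but no template syntax is specified. The following is a minimal sketch using Python's standard `string.Template` with hypothetical placeholders `$title`, `$title_color` and `$rows`. User text is HTML-escaped before substitution. The template content and `render_mail` are assumptions.

```python
from html import escape
from string import Template

# Hypothetical preset mail template file content with named placeholders.
PRESET_TEMPLATE = Template(
    "<html><body>"
    '<h2 style="color:$title_color">$title</h2>'
    "<table>$rows</table>"
    "</body></html>"
)


def render_mail(title: str, rows: list[list[str]], title_color: str = "#333333") -> str:
    """Turn user-supplied element data into display HTML and fill the preset template."""
    rows_html = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    # substitute() raises KeyError if a placeholder is left unfilled.
    return PRESET_TEMPLATE.substitute(
        title=escape(title), rows=rows_html, title_color=title_color
    )
```

Using `substitute` rather than `safe_substitute` makes a missing element fail loudly instead of leaving a literal `$placeholder` in the delivered mail.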

The invention has the following beneficial effects:

The method trains a mail feature model on HTML mails collected from Web pages and creates a mail template file to be output to generate a preset mail template file. After the user's text content is identified, it is input into the mail feature model to generate the target mail template. This improves the efficiency and accuracy of mail generation.

Of course, a product implementing the invention does not necessarily achieve all of the advantages described above at the same time.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for intelligent mail templates based on content identification according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, the invention discloses an intelligent mail template method based on content identification, which comprises the following steps:

step S2: preprocessing the extracted mail information;

step S6: creating a mail template file to be output so as to generate a preset mail template file; the mail feature model performs matching against a preset information recommendation library, determines the element information in the recommendation library that corresponds to the information element library, and sends the recommended information to the preset mail template file;

step S8: outputting a mail template file by the mail characteristic model;

step S9: performing program format conversion on the mail template file to generate a target mail template; during mail generation, the element data input by the user is processed to generate element display information, and the mail template is generated at the terminal. The user can then make minor modifications to the generated template as needed until the mail achieves the desired effect.

In step S2, preprocessing the mail information includes the steps of:

step S21: collecting and training mail information;

step S22: discretizing the mail text and establishing a mail element library;

step S23: text feature extraction and text vectorization representation;

In step S21, the mail information must be cleaned when it is collected and trained. During cleaning, features are selected according to the magnitude of the χ² statistic, which is computed as:

$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$

where t is a feature item, c is a text category, N is the total number of texts in the training set, A is the number of texts in category c that contain the feature item t, B is the number of texts that contain t but do not belong to c, C is the number of texts in c that do not contain t, and D is the number of texts that neither contain t nor belong to c. It follows that N = A + B + C + D.

The strength of the association between a feature item and a category depends on the magnitude of the χ² statistic. The higher the χ² value of a feature item with respect to a category, the stronger their association and the more category-discriminating information the feature item carries; the lower the value, the weaker the association. The χ² statistic explicitly models the relationship between feature items and text categories, which improves the precision of feature extraction, and the algorithm is simple and easy to implement.
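Used for cleaning, the statistic is computed for every term against a category, and only the highest-scoring terms are kept. The following minimal sketch counts A, B, C and D from a labeled corpus in which each text is a set of terms. The function name, the top-k cutoff and the alphabetical tie-break are assumptions; the patent gives no selection threshold.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score from the 2x2 counts (A, B, C, D); 0 for a degenerate table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs: list[set[str]], labels: list[str],
                    category: str, k: int) -> list[tuple[str, float]]:
    """Rank every term by its chi-square score against `category` and keep the top k."""
    vocab = set().union(*docs)
    scores = []
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == category)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != category)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == category)
        d = len(docs) - a - b - c
        scores.append((term, chi_square(a, b, c, d)))
    scores.sort(key=lambda item: (-item[1], item[0]))  # highest score first, ties by term
    return scores[:k]
```

On the four texts `{"pv", "report"}`, `{"pv", "uv"}` (category "metrics") and `{"weather", "report"}`, `{"weather", "rain"}` (category "other"), both "pv" and "weather" score 4.0 for "metrics", while "report" scores 0. Because of the squared term, χ² rewards a term that is strongly absent from a category just as much as one that is strongly present.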

In step S22, the mail elements include one or more of a picture, a table and a title. When the mail element is a picture, its attribute parameters include the pixels, resolution, size, color, hue, saturation, brightness and gray-scale value of the picture. When the mail element is a table, its attribute parameters include the margin, border and shading of the table, and the colors and line thicknesses of the border and shading. When the mail element is a title, its attribute parameters include the title font, font size, line spacing and font color.

In step S23, a TF-IDF method is adopted for text feature extraction and text vectorization; the weighting function of TF-IDF is expressed as:

$$w_{ij}=tf_{ij}\cdot idf_{i}$$

Expanding the IDF term gives:

$$w_{ij}=tf_{ij}\times\log\left(\frac{N}{n_i}\right)$$

where $tf_{ij}$ is the term frequency of the feature word $t_i$ in text $j$, $idf_i$ is the inverse document frequency of $t_i$, N is the number of texts in the training set, and $n_i$ is the number of texts in which the feature item $t_i$ appears;

The weighting rule of the TF-IDF method is as follows: a word that appears many times in one text but rarely in other texts is well able to distinguish between text categories and receives a high weight, whereas a word that appears in many texts receives a low weight. For this reason, the weighting function is called term frequency-inverse document frequency.
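To illustrate the weighting rule with hypothetical numbers not taken from the patent, assume a training set of N = 1000 texts and a base-10 logarithm. A term that occurs 5 times in text j and appears in only $n_i = 10$ texts receives a high weight. A term $t_k$ that occurs 8 times in the same text but appears in all 1000 texts receives zero weight, despite its higher term frequency:

$$w_{ij}=5\times\log_{10}\frac{1000}{10}=5\times 2=10$$

$$w_{kj}=8\times\log_{10}\frac{1000}{1000}=8\times 0=0$$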

It should be noted that, in the above system embodiment, each unit included is only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.

In addition, those skilled in the art will appreciate that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.

The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims

1. An intelligent mail template method based on content identification, characterized by comprising the following steps:

step S2: preprocessing the extracted mail information;

step S8: outputting a mail template file by the mail characteristic model;

2. The intelligent mail template method based on content recognition according to claim 1, wherein the preprocessing of the mail information in step S2 comprises the following steps:

step S21: collecting and training mail information;

step S22: discretizing the mail text and establishing a mail element library;

step S23: text feature extraction and text vectorization representation;

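3. The intelligent mail template method based on content recognition according to claim 2, wherein in step S21, the mail information is cleaned when it is collected and trained, and during cleaning features are selected according to the magnitude of the χ² statistic, computed as:

$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$

where t is a feature item, c is a text category, N = A + B + C + D is the total number of texts in the training set, A is the number of texts in c that contain t, B is the number of texts outside c that contain t, C is the number of texts in c that do not contain t, and D is the number of texts outside c that do not contain t.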

4. The intelligent mail template method based on content recognition according to claim 2, wherein in step S22, the mail elements include one or more of a picture, a table and a title; when the mail element is a picture, its attribute parameters include the pixels, resolution, size, color, hue, saturation, brightness and gray value of the picture; when the mail element is a table, its attribute parameters include the margin, border and shading of the table, and the colors and line thicknesses of the border and shading; and when the mail element is a title, its attribute parameters include the title font, font size, line spacing and font color.

5. The intelligent mail template method based on content recognition according to claim 2, wherein in step S23, the TF-IDF method is used for text feature extraction and text vectorization; the weighting function of the TF-IDF is expressed as:

$$w_{ij}=tf_{ij}\cdot idf_{i}$$

Expanding the IDF term gives:

$$w_{ij}=tf_{ij}\times\log\left(\frac{N}{n_i}\right)$$

6. The intelligent mail template method based on content recognition according to claim 1, wherein in step S6, the mail feature model performs corresponding matching in a preset information recommendation library, determines element information corresponding to the information element library in the information recommendation library, and sends the recommendation information to a preset mail template file.

7. The intelligent mail template method based on content recognition according to claim 1, wherein in step S9, the element data input by the user is processed in the mail generation process to generate element presentation information.