Disclosure of Invention
The main purpose of the application is to provide a text auditing method, a text auditing device, a computer device and a storage medium, so as to solve the problem that the traditional auditing process is cumbersome.
The application provides a text auditing method, which comprises the following steps:
acquiring a text to be checked, and detecting the category to which the text to be checked belongs;
acquiring, from a text database, a plurality of pre-stored texts corresponding to the category and the satisfaction corresponding to each pre-stored text;
performing data preprocessing on the pre-stored texts based on a professional word library and a special character identification library, so as to obtain temporary texts corresponding to the pre-stored texts;
performing word segmentation on each temporary text through a text classifier;
calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database;
calculating the predicted satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
and judging whether the text to be checked meets the push requirement according to the predicted satisfaction.
Further, the step of calculating the predicted satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database includes:
performing word segmentation on the text to be checked through the text classifier to obtain initial words;
deleting, from the initial words, the words that do not exist in the word weight database to obtain target words, and obtaining the number of occurrences of each target word after word segmentation;
and calculating the predicted satisfaction through a preset predicted satisfaction calculation formula according to the number of target words and the number of occurrences and the weight of each target word.
Further, the step of calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database includes:
acquiring the text score of each word;
calculating the matching value of each word in each temporary text through a preset matching value calculation formula according to the text score of each word and the satisfaction of each temporary text;
counting the number of each word in each temporary text, and calculating the weight of each word according to the matching value of each word in each temporary text by a preset weight calculation formula;
and constructing a word weight database according to the weights of the words.
Further, the step of detecting the category to which the text to be checked belongs includes:
acquiring text information of the text to be checked, and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
calculating the similarity between the first vector and the second vector corresponding to each class according to a preset similarity calculation formula;
and judging the category to which the text to be checked belongs according to the similarity.
Further, the step of obtaining a plurality of pre-stored texts corresponding to the category and satisfaction corresponding to the pre-stored texts from a text database includes:
acquiring a plurality of pre-stored texts from the text database, and acquiring service conversion rate and text information of the pre-stored texts;
and calculating the satisfaction of each pre-stored text according to the service conversion rate and the text information.
Further, after the step of determining whether the text to be checked meets the push requirement according to the predicted satisfaction, the method further includes:
if the pushing requirement is met, inputting the text to be checked into an n-gram model for calculation to obtain a legal value of the text to be checked;
judging whether the legal value reaches a preset legal value or not;
and if the preset legal value is reached, pushing the text to be checked.
The application also provides a text auditing device, which comprises:
the to-be-audited text acquisition module is used for acquiring a text to be audited and detecting the category to which the text to be audited belongs;
the pre-stored text acquisition module is used for acquiring, from a text database, a plurality of pre-stored texts corresponding to the category and the satisfaction corresponding to each pre-stored text;
the preprocessing module is used for performing data preprocessing on the pre-stored texts based on a professional word library and a special character identification library, so as to obtain temporary texts corresponding to the pre-stored texts;
the word segmentation module is used for segmenting each temporary text through a text classifier;
the weight calculation module is used for calculating the weight of each word according to the satisfaction corresponding to each temporary text so as to obtain a word weight database;
the prediction satisfaction calculation module is used for calculating the prediction satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
and the auditing module is used for judging whether the text to be audited meets the pushing requirement according to the prediction satisfaction.
Further, the prediction satisfaction calculation module includes:
the word segmentation sub-module is used for segmenting the text to be checked through the text classifier to obtain each initial word;
the deleting sub-module is used for deleting, from the initial words, the words that do not exist in the word weight database to obtain target words, and obtaining the number of occurrences of each target word after word segmentation;
and the predicted satisfaction calculation sub-module is used for calculating the predicted satisfaction through a preset predicted satisfaction calculation formula according to the number of target words and the number of occurrences and the weight of each target word.
The application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
The application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.
The beneficial effects of the application are as follows: with the text auditing method, the predicted satisfaction of the text to be audited can be calculated based on the satisfaction of other pre-stored texts in the same category, so that the text to be audited is audited automatically and the company's human resources are saved.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, in the embodiments of the present application, all directional indicators (such as up, down, left, right, front, and back) are merely used to explain the relative positional relationship, movement conditions, and the like between the components in a specific posture (as shown in the drawings), if the specific posture is changed, the directional indicators correspondingly change, and the connection may be a direct connection or an indirect connection.
The term "and/or" herein merely describes an association between associated objects and indicates that three relations may exist; for example, "A and/or B" may represent three cases: A exists alone, A and B exist together, and B exists alone.
Furthermore, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but only where the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and does not fall within the scope of protection claimed by the present application.
Referring to fig. 1, the application provides a text auditing method, which comprises the following steps:
S1: acquiring a text to be checked, and detecting the category to which the text to be checked belongs;
S2: acquiring, from a text database, a plurality of pre-stored texts corresponding to the category and the satisfaction corresponding to each pre-stored text;
S3: performing data preprocessing on the pre-stored texts based on a professional word library and a special character identification library, so as to obtain temporary texts corresponding to the pre-stored texts;
S4: performing word segmentation on each temporary text through a text classifier;
S5: calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database;
S6: calculating the predicted satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
S7: judging whether the text to be checked meets the push requirement according to the predicted satisfaction.
As described in step S1 above, the text to be checked is acquired and the category to which it belongs is detected. The text to be checked is edited and uploaded by a user. To detect its category, the text information in the text to be checked is extracted and identified semantically to determine the category of the text to be checked; alternatively, the extracted text information can be vectorized, the similarity between this vector and the vector corresponding to each category can be calculated, and the category with the highest similarity is taken as the category of the text to be checked.
As described in step S2 above, a plurality of pre-stored texts corresponding to the category and the satisfaction corresponding to each pre-stored text are obtained from a text database. Since the category of the text to be checked is known, the pre-stored texts of that category can be obtained from the database. The pre-stored texts are texts that have already passed the audit and contain corresponding service information; after a pre-stored text is sent to clients, corresponding feedback can be collected, such as the number of clicks on the pre-stored text or the increase in the corresponding service. This feedback is then converted into satisfaction according to a preset correspondence.
As described in step S3 above, the pre-stored texts are preprocessed based on a professional word library and a special character identification library. The professional word library contains professional words such as "birthday blessing", "marketing" and "service", and the words and sentences in the pre-stored texts are preprocessed based on these professional words. The preprocessing includes converting words in the words and sentences, cleaning and filtering dirty words, de-duplicating repeated words, replacing synonyms, completing incomplete sentences according to their meaning, and the like. The special character identification library contains characters such as "@", "#" and "%", which are deleted during preprocessing.
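The following Python sketch illustrates part of the step S3 preprocessing: deleting special characters, filtering dirty words, replacing synonyms and collapsing immediately repeated professional words. The library contents (`PROFESSIONAL_WORDS`, `SPECIAL_CHARS`, `DIRTY_WORDS`, `SYNONYMS`) and the function name `preprocess` are hypothetical placeholders, English sample words are used for readability, and word conversion and sentence completion are not shown.

```python
import re

# Hypothetical library contents; a real system would load these from the
# professional word library and the special character identification library.
PROFESSIONAL_WORDS = ["birthday blessing", "marketing", "service"]
SPECIAL_CHARS = "@#%"
DIRTY_WORDS = ["spam"]
SYNONYMS = {"promotion": "marketing"}  # variant -> canonical professional word


def preprocess(text: str) -> str:
    """Turn a pre-stored text into a temporary text (illustrative only)."""
    # 1. Delete every character listed in the special character library.
    text = text.translate({ord(c): None for c in SPECIAL_CHARS})
    # 2. Clean and filter dirty words.
    for dirty in DIRTY_WORDS:
        text = text.replace(dirty, "")
    # 3. Replace synonyms with the canonical professional word.
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    # 4. Collapse immediately repeated professional words into one occurrence.
    for word in PROFESSIONAL_WORDS:
        pattern = re.escape(word) + r"(?:\s*" + re.escape(word) + r")+"
        text = re.sub(pattern, lambda _m: word, text)
    # Normalize the whitespace left behind by the deletions.
    return " ".join(text.split())
```

For example, `preprocess("#Great @service service, 50% off spam promotion!")` yields `"Great service, 50 off marketing!"`. Plain substring replacement is used only to keep the sketch short; a real implementation would apply these rules to segmented words.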
As described in step S4 above, the Python LibShortText text classifier may be adopted, with its word segmenter replaced by a custom Chinese word segmenter, and the words in the temporary texts may be obtained through any algorithm such as a decision tree, a multi-layer perceptron, naive Bayes (including Bernoulli, Gaussian and multinomial naive Bayes), a random forest, AdaBoost, a feedforward neural network or an LSTM.
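Step S4 leaves the custom Chinese word segmenter unspecified. As one possible stand-in, the sketch below uses forward maximum matching against a word dictionary; the function `fmm_segment`, its `max_len` parameter and the sample dictionary are hypothetical, and the sketch does not show how a segmenter is plugged into LibShortText.

```python
from __future__ import annotations


def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching segmentation (illustrative stand-in).

    At each position, take the longest dictionary word of at most max_len
    characters; if none matches, emit a single character and move on.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

For example, with a dictionary containing 生日祝福 (birthday blessing), 营销 (marketing) and 活动 (campaign), the string 生日祝福营销活动 is split into those three words. Forward maximum matching is chosen here only because it is short and dependency-free; practical segmenters usually rely on statistical models.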
As described in step S5 above, the weight of each word is calculated according to the satisfaction corresponding to each temporary text. Each temporary text corresponds to a pre-stored text that has already been sent, so its satisfaction reflects readers' interest, and the words are weighted according to the satisfaction of each pre-stored text. The specific weighting method is described in detail later.
As described in step S6 above, the predicted satisfaction of the text to be checked is calculated according to the weight of each word. For example, the predicted satisfaction is calculated by a preset predicted satisfaction calculation formula from the number of target words and the number of occurrences and the weight of each target word. The formula may accumulate the products of the number of occurrences of each target word and its weight, may further take into account the influence of text length on satisfaction, and may multiply each effect by a corresponding coefficient. Because the weights are derived from the satisfaction of other temporary texts, a value computed from the weights is itself related to satisfaction, so the preset formula yields the predicted satisfaction of the text to be audited. The specific calculation formula is described in detail later.
As described in step S7 above, whether the text to be checked meets the push requirement is judged according to the predicted satisfaction. Specifically, a satisfaction threshold may be set: when the predicted satisfaction reaches the threshold, the text to be checked is determined to pass the audit, and when it is lower than the threshold, the text to be checked is determined to fail the audit.
In one embodiment, the step S6 of calculating the predicted satisfaction of the text to be checked according to the weight of each word includes:
S601: performing word segmentation on the text to be checked through the text classifier to obtain initial words;
S602: deleting, from the initial words, the words that do not exist in the word weight database to obtain target words, and obtaining the number of occurrences of each target word after word segmentation;
S603: calculating the predicted satisfaction through a preset predicted satisfaction calculation formula according to the number of target words and the number of occurrences and the weight of each target word.
As described in steps S601 to S603 above, the predicted satisfaction of the text to be checked is calculated. Specifically, the text to be checked is segmented by the text classifier and the number of occurrences of each resulting word is obtained. Because some words of the text to be checked may not exist in the word weight database, and therefore have no corresponding weight, these words are deleted. The preset predicted satisfaction calculation formula is

$$S = k\sum_{i=1}^{n} m_i w_i - f(x),$$

where $S$ is the predicted satisfaction, $x$ is the number of initial words, $n$ is the number of target words, $m_i$ is the number of occurrences of the $i$-th target word, $w_i$ is the weight of the $i$-th target word, $x_0$ is the set standard number, and $k$, $a$ and $b$ are all set parameters. The satisfaction of the text to be checked depends on both the content and the length of the text, so a coefficient $k$ can be set in advance and the influence of text length on reading, $f(x)$, is then subtracted. A text that is too long or too short lowers the corresponding satisfaction, so $f(x)$ is chosen to take its minimum value $b$ when the text length is standard, that is, when the number of words $x$ equals the standard number $x_0$; for example, $f(x) = a(x - x_0)^2 + b$ with $a > 0$.
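A minimal sketch of steps S601 to S603 and the threshold check of step S7 follows, assuming the preset formula takes the form S = k * sum(m_i * w_i) - f(x) with f(x) = a * (x - x0)**2 + b, an assumed form consistent with the description of step S6. The values of `k`, `x0`, `a`, `b` and the threshold are operator-chosen, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def length_effect(x: int, x0: int, a: float, b: float) -> float:
    """Assumed f(x) = a * (x - x0)**2 + b; minimal (equal to b) at x == x0 when a > 0."""
    return a * (x - x0) ** 2 + b


def predicted_satisfaction(initial_words: List[str], weights: Dict[str, float],
                           k: float, x0: int, a: float, b: float) -> float:
    """S = k * sum(m_i * w_i) - f(x) under the assumed form of the preset formula."""
    x = len(initial_words)  # number of initial words
    # S602: keep only words present in the word weight database (target words);
    # Counter gives m_i, the number of occurrences of each target word.
    counts = Counter(word for word in initial_words if word in weights)
    content_term = sum(m * weights[word] for word, m in counts.items())
    return k * content_term - length_effect(x, x0, a, b)


def meets_push_requirement(satisfaction: float, threshold: float) -> bool:
    """S7: the text passes when its predicted satisfaction reaches the threshold."""
    return satisfaction >= threshold
```

With the initial words `["birthday", "blessing", "birthday", "discount", "hello"]`, weights `{"birthday": 0.5, "discount": 0.8}`, `k=1.0`, `x0=5`, `a=0.1` and `b=0.2`, the content term is 2 * 0.5 + 0.8 = 1.8 and f(5) = 0.2, giving a predicted satisfaction of about 1.6.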
In one embodiment, the step S5 of calculating the weight of each word according to the satisfaction degree corresponding to each temporary text to obtain a word weight database includes:
S501: acquiring the text score of each word;
S502: calculating the matching value of each word in each temporary text through a preset matching value calculation formula according to the text score of each word and the satisfaction of each temporary text;
S503: counting the occurrences of each word in each temporary text, and calculating the weight of each word from its matching values in the temporary texts through a preset weight calculation formula;
S504: constructing a word weight database according to the weights of the words.
As described in steps S501 to S504 above, the weight of each word is calculated. Specifically, it cannot be determined in advance which words attract customers, but the satisfaction of each pre-stored text and the text score of each word are known, so the words can be extracted individually and a matching value can be calculated for each word in each pre-stored text according to its values in the different pre-stored texts. Some words carry more emotion than others, so the text scores of the words can be obtained in advance and then used to calculate the matching values in the temporary texts. In a specific embodiment, the matching value $P_{ij}$ of the $i$-th word in the $j$-th temporary text is calculated by the preset matching value calculation formula from $S_j$, the satisfaction of the $j$-th text, and $T_i$, the text score of the $i$-th word. The average of the matching values of a word is then taken as its final weight, that is, the weight calculation formula is

$$W_i = \frac{1}{N_i}\sum_{r=1}^{N_i} P_i^{(r)},$$

where $N_i$ is the number of occurrences of the $i$-th word in the temporary texts and $P_i^{(r)}$ is its matching value at the $r$-th occurrence. The weight of each word is calculated in this way, and the word weight database is constructed from these weights.
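The sketch below builds a word weight database along the lines of steps S501 to S504. The matching value formula is not recoverable here, so the sketch ASSUMES P_ij = S_j * T_i, and the weight of a word is the average of its matching values over all of its occurrences. `build_word_weight_db` and the `default_score` fallback for words without a text score are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_word_weight_db(temp_texts: List[List[str]], satisfactions: List[float],
                         text_scores: Dict[str, float],
                         default_score: float = 1.0) -> Dict[str, float]:
    """Build {word: weight} from segmented temporary texts (illustrative sketch).

    temp_texts[j] is the token list of the j-th temporary text and
    satisfactions[j] is its satisfaction S_j; text_scores maps a word to T_i.
    """
    total: Dict[str, float] = defaultdict(float)    # sum of matching values
    occurrences: Dict[str, int] = defaultdict(int)  # N_i, occurrences of each word
    for tokens, s_j in zip(temp_texts, satisfactions):
        for word in tokens:
            t_i = text_scores.get(word, default_score)
            # ASSUMPTION: matching value P_ij = S_j * T_i.
            total[word] += s_j * t_i
            occurrences[word] += 1
    # Weight W_i = average matching value over all occurrences of the word.
    return {word: total[word] / occurrences[word] for word in total}
```

For instance, if "sale" appears once in a text with satisfaction 0.8 and once in a text with satisfaction 0.4, and its text score is 1.0, its weight is (0.8 + 0.4) / 2 = 0.6.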
In one embodiment, the step S1 of detecting the category to which the text to be checked belongs includes:
S101: acquiring text information of the text to be checked, and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
S102: calculating the similarity between the first vector and the second vector corresponding to each class according to a preset similarity calculation formula;
S103: judging the category to which the text to be checked belongs according to the similarity.
As described in steps S101 to S103 above, the category to which the text to be checked belongs is detected. Specifically, the similarity between the first vector obtained from the text information of the text to be checked and the second vector corresponding to each class may be calculated: the closer the similarity is to 1, the more similar the first vector is to the second vector, and the closer it is to -1, the more dissimilar they are. The category whose second vector gives the largest similarity is then selected as the category of the text to be checked. The preset similarity calculation formula may be

$$\cos\theta = \frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert} = \frac{\sum_{k} A_k B_k}{\sqrt{\sum_{k} A_k^2}\,\sqrt{\sum_{k} B_k^2}},$$

where $A$ denotes the first vector, $B$ denotes the second vector, and $\cos\theta$ denotes the similarity of the first vector and the second vector.
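Steps S101 to S103 can be sketched as follows, assuming the preset similarity formula is cosine similarity (which has the -1 to 1 range described above) and that the first and second vectors have the same dimension. How the text information is vectorized is not shown; `cosine_similarity` and `detect_category` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (|A| * |B|), a value between -1 and 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to every category
    return dot / (norm_a * norm_b)


def detect_category(first_vector: Sequence[float],
                    category_vectors: Dict[str, Sequence[float]]) -> str:
    """S103: return the category whose second vector is most similar to first_vector."""
    return max(category_vectors,
               key=lambda c: cosine_similarity(first_vector, category_vectors[c]))
```

For example, `detect_category([1.0, 0.2], {"greeting": [1.0, 0.0], "marketing": [0.0, 1.0]})` returns `"greeting"`, since the similarity to the greeting vector (about 0.98) exceeds that to the marketing vector (about 0.20).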
In one embodiment, the step S2 of obtaining a plurality of pre-stored texts corresponding to the category from the text database and satisfaction corresponding to the pre-stored texts includes:
S201: acquiring a plurality of pre-stored texts from the text database, and acquiring the service conversion rate and text information of each pre-stored text;
S202: calculating the satisfaction of each pre-stored text according to the service conversion rate and the text information.
As described in steps S201 to S202 above, each pre-stored text and its satisfaction are acquired. Specifically, since the category of the text to be checked is known, a plurality of pre-stored texts can be acquired from that category in the database; because the information received by clients may differ between periods of time, pre-stored texts sent closer to the current time are preferably acquired. The satisfaction of each pre-stored text is calculated from its service conversion rate and text information, both of which are also stored in the text database: after each text is sent out, these data are collected and stored in the text database. The satisfaction is then calculated according to a preset correspondence.
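A sketch of steps S201 and S202 under stated assumptions: pre-stored texts are held as hypothetical `PreStoredText` records, the most recently sent texts of the category are preferred, and the preset correspondence that turns the collected data into satisfaction is passed in as a function, because it is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class PreStoredText:
    """Hypothetical record of a sent text and the data collected afterwards."""
    category: str
    content: str
    sent_at: datetime
    conversion_rate: float


def recent_texts_with_satisfaction(
    records: List[PreStoredText],
    category: str,
    limit: int,
    to_satisfaction: Callable[[PreStoredText], float],
) -> List[Tuple[PreStoredText, float]]:
    """Select the most recently sent texts of a category and attach satisfaction.

    to_satisfaction stands in for the preset correspondence, which is left
    to the caller.
    """
    same_category = [r for r in records if r.category == category]
    same_category.sort(key=lambda r: r.sent_at, reverse=True)  # newest first
    return [(r, to_satisfaction(r)) for r in same_category[:limit]]
```

A caller might pass `lambda r: r.conversion_rate` as a placeholder mapping; the real correspondence would also draw on the stored text information.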
In one embodiment, after the step S7 of determining whether the text to be checked meets the push requirement according to the predicted satisfaction, the method further includes:
S801: if the push requirement is met, inputting the text to be checked into an n-gram model for calculation to obtain a legal value of the text to be checked;
S802: judging whether the legal value reaches a preset legal value;
S803: if the preset legal value is reached, pushing the text to be checked.
As described in steps S801 to S803 above, the legal value of the text to be audited is calculated so that the text is audited further. The text to be checked is input into an n-gram model to obtain its legal value. The n-gram model is trained on standard sentences with complete meaning and no grammatical errors. The calculated legal value is then compared with a preset legal value: if it is lower than the preset legal value, the text to be checked can be considered to contain grammatical errors and must be confirmed by the relevant personnel; if the preset legal value is reached, the text to be checked can be pushed.
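How the legal value is computed from the n-gram model is not specified. The sketch below assumes a bigram model (n = 2) with add-one smoothing trained on standard sentences, and takes the legal value to be the average log-probability per bigram, so text resembling the training sentences scores higher. `BigramModel`, `legal_value` and `passes_legal_check` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


class BigramModel:
    """Add-one smoothed bigram model trained on standard (fluent) sentences."""

    def __init__(self, sentences: List[List[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {"</s>"}
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.context_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded[:-1], padded[1:]))
            vocab.update(tokens)
        self.vocab_size = len(vocab)  # possible next tokens, including </s>

    def legal_value(self, tokens: List[str]) -> float:
        """Average log-probability per bigram; higher means more fluent."""
        padded = ["<s>"] + tokens + ["</s>"]
        pairs = list(zip(padded[:-1], padded[1:]))
        total = 0.0
        for prev, cur in pairs:
            prob = (self.bigram_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.vocab_size)
            total += math.log(prob)
        return total / len(pairs)


def passes_legal_check(model: BigramModel, tokens: List[str], preset: float) -> bool:
    """S802/S803: push only if the legal value reaches the preset legal value."""
    return model.legal_value(tokens) >= preset
```

Averaging per bigram keeps the legal value comparable across texts of different lengths; the preset legal value would be tuned on texts already judged acceptable.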
Referring to fig. 2, the present application proposes a text auditing apparatus, including:
the text to be audited acquisition module 10 is used for acquiring the text to be audited and detecting the category to which the text to be audited belongs;
a pre-stored text obtaining module 20, configured to obtain, from a text database, a plurality of pre-stored texts corresponding to the category and the satisfaction corresponding to each pre-stored text;
the preprocessing module 30 is configured to perform data preprocessing on the pre-stored texts based on a professional word library and a special character identification library, so as to obtain temporary texts corresponding to each pre-stored text;
a word segmentation module 40, configured to segment each of the temporary texts through a text classifier;
the weight calculation module 50 is configured to calculate a weight of each word according to the satisfaction corresponding to each temporary text, so as to obtain a word weight database;
a predicted satisfaction calculating module 60, configured to calculate a predicted satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
and the auditing module 70 is used for judging whether the text to be audited meets the pushing requirement according to the prediction satisfaction.
In one embodiment, the predictive satisfaction calculation module 60 includes:
the word segmentation sub-module is used for segmenting the text to be checked through the text classifier to obtain each initial word;
the deleting sub-module is used for deleting, from the initial words, the words that do not exist in the word weight database to obtain target words, and obtaining the number of occurrences of each target word after word segmentation;
and the predicted satisfaction calculation sub-module is used for calculating the predicted satisfaction through a preset predicted satisfaction calculation formula according to the number of target words and the number of occurrences and the weight of each target word.
In one embodiment, the weight calculation module 50 includes:
the text score obtaining sub-module is used for obtaining the text score of each word;
a matching value calculation sub-module, configured to calculate the matching value of each word in each temporary text through a preset matching value calculation formula according to the text score of each word and the satisfaction of each temporary text;
the weight calculation sub-module is used for counting the number of each word in each temporary text and calculating the weight of each word according to the matching value of each word in each temporary text through a preset weight calculation formula;
the word weight database construction module is used for constructing the word weight database according to the weights of the words.
In one embodiment, the text to be checked acquisition module 10 includes:
the text information acquisition sub-module is used for acquiring the text information of the text to be checked and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
the similarity calculation submodule is used for calculating the similarity of the first vector and the second vector corresponding to each class according to a preset similarity calculation formula;
and the category judging module is used for judging the category to which the text to be checked belongs according to the similarity.
In one embodiment, the pre-stored text retrieval module 20 includes:
a pre-stored text obtaining sub-module, configured to obtain a plurality of pre-stored texts from the text database, and obtain service conversion rate and text information of the pre-stored texts;
and the satisfaction calculation sub-module is used for calculating the satisfaction of each pre-stored text according to the service conversion rate and the text information.
In one embodiment, the text auditing apparatus further includes:
the legal value calculation module is used for inputting the text to be checked into the n-gram model for calculation if the pushing requirement is met, so as to obtain the legal value of the text to be checked;
the legal value judging module is used for judging whether the legal value reaches a preset legal value or not;
and the pushing module is used for pushing the text to be checked if the preset legal value is reached.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the pre-stored texts and related data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the text auditing method according to any of the above embodiments.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, can implement the text auditing method according to any of the above embodiments.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or elements inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article or method that comprises the element.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.