CN112163585B - Text auditing method and device, computer equipment and storage medium - Google Patents

Info

Publication number: CN112163585B
Authority: CN (China)
Prior art keywords: text, word, satisfaction, weight, checked
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202011247736.0A
Other languages: Chinese (zh)
Other versions: CN112163585A (en)
Inventor: 宋晓薇
Current Assignee: Shanghai Seven Cats Culture Media Co ltd; Shenzhen Lian Intellectual Property Service Center (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shanghai Seven Cats Culture Media Co ltd
Application filed by Shanghai Seven Cats Culture Media Co ltd
Priority to CN202011247736.0A
Publication of CN112163585A
Publication of CN112163585B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3349 Reuse of stored results of previous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text auditing method, a text auditing device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a text to be checked; acquiring a plurality of prestored texts and corresponding satisfaction from a text database; performing data preprocessing on the pre-stored texts to obtain temporary texts corresponding to the pre-stored texts; word segmentation is carried out on each temporary text; calculating the weight of each word according to the satisfaction corresponding to each temporary text; calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word; and judging whether the text to be checked meets the pushing requirement according to the prediction satisfaction. The application has the beneficial effects that: according to the text auditing method, the predicted satisfaction degree of the text to be audited can be calculated based on the satisfaction degree of other prestored texts of the category, so that the text to be audited is audited automatically, and the human resources of a company are saved.

Description

Text auditing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a text auditing method, apparatus, computer device, and storage medium.
Background
After a text has been edited, the business party must apply for mass-sending approval on the corresponding platform. At present, this audit is performed mainly by hand, and some texts must even be audited from several angles. For example, an existing OPR (Optical Pattern Recognition) system requires that the corresponding texts first be exported from the business department and that the text information in them be extracted; after the auditing personnel confirm the texts, they are maintained in the OPR database. If the text information is not in the OPR database, the text cannot be mass-sent. The auditing process is cumbersome, greatly reduces the company's efficiency, and wastes its human resources.
Disclosure of Invention
The application mainly aims to provide a text auditing method, a text auditing device, computer equipment and a storage medium, and aims to solve the problem that the traditional auditing process is complicated.
The application provides a text auditing method, which comprises the following steps:
acquiring a text to be checked, and detecting the category to which the text to be checked belongs;
acquiring a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
performing data preprocessing on the pre-stored texts based on a professional word stock and a special character identification stock, so as to obtain temporary texts corresponding to the pre-stored texts;
word segmentation is carried out on each temporary text through a text classifier;
calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database;
calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word in the word weight database;
and judging whether the text to be checked meets the pushing requirement according to the prediction satisfaction.
Further, the step of calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word in the word weight database includes:
word segmentation is carried out on the text to be checked through the text classifier, and initial words are obtained;
deleting the words which do not exist in the word weight database in the initial words to obtain target words, and obtaining the number corresponding to the target words after word segmentation;
and calculating the prediction satisfaction through a preset prediction satisfaction calculation formula according to the number of the target words, the corresponding number and weight of each target word.
Further, the step of calculating the weight of each word according to the satisfaction corresponding to the temporary text to obtain a word weight database includes:
acquiring text scores of the words;
calculating the matching value of each word in each temporary text through a preset matching value calculation formula according to the text scores of the words and the satisfaction of the temporary texts;
counting the number of each word in each temporary text, and calculating the weight of each word according to the matching value of each word in each temporary text by a preset weight calculation formula;
and constructing a word weight database according to the weights of the words.
Further, the step of detecting the category to which the text to be checked belongs includes:
acquiring text information of the text to be checked, and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
calculating the similarity between the first vector and the second vector corresponding to each class according to a preset similarity calculation formula;
and judging the category to which the text to be checked belongs according to the similarity.
Further, the step of obtaining a plurality of pre-stored texts corresponding to the category and satisfaction corresponding to the pre-stored texts from a text database includes:
acquiring a plurality of pre-stored texts from the text database, and acquiring service conversion rate and text information of the pre-stored texts;
and calculating the satisfaction degree of the pre-stored text according to the service conversion rate and the text information rate.
Further, after the step of determining whether the text to be checked meets the push requirement according to the predicted satisfaction, the method further includes:
if the pushing requirement is met, inputting the text to be checked into an n-gram model for calculation to obtain a legal value of the text to be checked;
judging whether the legal value reaches a preset legal value or not;
and if the preset legal value is reached, pushing the text to be checked.
The application also provides a text auditing device, which comprises:
the to-be-audited text acquisition module is used for acquiring a text to be checked and detecting the category to which the text to be checked belongs;
the pre-stored text acquisition module is used for acquiring a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
the preprocessing module is used for preprocessing data of the pre-stored texts based on a professional word stock and a special character identification stock so as to obtain temporary texts corresponding to the pre-stored texts;
the word segmentation module is used for segmenting each temporary text through a text classifier;
the weight calculation module is used for calculating the weight of each word according to the satisfaction corresponding to each temporary text so as to obtain a word weight database;
the prediction satisfaction calculation module is used for calculating the prediction satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
and the auditing module is used for judging whether the text to be audited meets the pushing requirement according to the prediction satisfaction.
Further, the prediction satisfaction calculation module includes:
the word segmentation sub-module is used for segmenting the text to be checked through the text classifier to obtain each initial word;
the deleting sub-module is used for deleting the words which do not exist in the word weight database in the initial words to obtain target words, and obtaining the number corresponding to the target words after word segmentation;
and the prediction satisfaction calculating sub-module is used for calculating the prediction satisfaction through a preset prediction satisfaction calculating formula according to the number of the target words, the number and the weight corresponding to each target word.
The application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
The application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.
The application has the beneficial effects that: according to the text auditing method, the predicted satisfaction degree of the text to be audited can be calculated based on the satisfaction degree of other prestored texts of the category, so that the text to be audited is audited automatically, and the human resources of a company are saved.
Drawings
FIG. 1 is a flow chart of a text auditing method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a text auditing device according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a computer device according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, in the embodiments of the present application, all directional indicators (such as up, down, left, right, front, and back) are merely used to explain the relative positional relationship, movement conditions, and the like between the components in a specific posture (as shown in the drawings), if the specific posture is changed, the directional indicators correspondingly change, and the connection may be a direct connection or an indirect connection.
The term "and/or" herein merely describes an association between associated objects and indicates that three relations may exist. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone.
Furthermore, descriptions such as those referred to as "first," "second," and the like, are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implying an order of magnitude of the indicated technical features in the present disclosure. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present application.
Referring to FIG. 1, the application provides a text auditing method, which comprises the following steps:
S1: acquiring a text to be checked, and detecting the category to which the text to be checked belongs;
S2: acquiring a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
S3: performing data preprocessing on the pre-stored texts based on a professional word stock and a special character identification stock, so as to obtain temporary texts corresponding to the pre-stored texts;
S4: word segmentation is carried out on each temporary text through a text classifier;
S5: calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database;
S6: calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word in the word weight database;
S7: and judging whether the text to be checked meets the pushing requirement according to the prediction satisfaction.
As described in step S1 above, the text to be checked is acquired and the category to which it belongs is detected. The text to be checked is edited and uploaded by a user. To detect its category, the text information in the text to be checked is extracted and identified according to its semantics. Alternatively, the extracted text information can be vectorized, its similarity to the vector corresponding to each category calculated, and the category with the highest similarity set as the category of the text to be checked.
As described in the above step S2, a plurality of pre-stored texts corresponding to the category and satisfaction corresponding to the pre-stored texts are obtained from a text database. Since the category of the text to be checked is known, a plurality of pre-stored texts corresponding to the category can be obtained from a database, wherein the pre-stored texts are texts after the checking is passed and contain corresponding service information, and after the pre-stored texts are sent to clients, corresponding feedback messages can be obtained, and the feedback messages can be the click quantity of the pre-stored texts or the increment quantity of the corresponding service. And converting the feedback information into satisfaction according to the corresponding relation.
As described in step S3 above, the pre-stored texts are preprocessed based on a professional word library and a special character identification library. The professional word library contains professional words such as "birthday blessing", "marketing", and "service", and the words and sentences in the pre-stored texts are preprocessed based on these professional words. The preprocessing includes converting words in the words and sentences, cleaning and filtering dirty words, de-duplicating repeated words, replacing synonyms, supplementing incomplete sentences according to sentence meaning, and the like. The characters in the special character identification library, such as "@", "#", and "%", are deleted during preprocessing.
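The preprocessing in step S3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the names `SPECIAL_CHARS`, `SYNONYMS`, and `preprocess` are hypothetical, the synonym table holds a made-up entry, and whitespace splitting stands in for a real word segmenter. Only special-character deletion, synonym replacement, and de-duplication of repeated words are shown.

```python
# Hypothetical stand-ins for the special character identification library
# and a synonym table; the source does not list their full contents.
SPECIAL_CHARS = {"@", "#", "%"}
SYNONYMS = {"promo": "marketing"}


def preprocess(text: str) -> str:
    """Turn a pre-stored text into a 'temporary text' (illustrative only)."""
    # 1. Delete every character found in the special character library.
    cleaned = "".join(ch for ch in text if ch not in SPECIAL_CHARS)
    # 2. Replace synonyms with their canonical professional word.
    #    Whitespace splitting is an assumption made for this sketch.
    words = [SYNONYMS.get(w, w) for w in cleaned.split()]
    # 3. Collapse immediately repeated words ("service service" -> "service").
    deduped = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
    return " ".join(deduped)
```

For instance, `preprocess("big #promo promo @service")` strips the two special characters, maps both occurrences of "promo" to "marketing", and collapses the repetition.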
As described in step S4 above, the LibShortText text classifier for Python may be adopted, with its word segmenter replaced by a custom Chinese word segmenter. The words in the temporary text may be obtained through any algorithm such as a decision tree, a multi-layer perceptron, naive Bayes (including Bernoulli Bayes, Gaussian Bayes, and multinomial Bayes), a random forest, AdaBoost, a feedforward neural network, or an LSTM.
As described in step S5 above, the weight of each word is calculated according to the satisfaction corresponding to each temporary text. Each temporary text corresponds to a pre-stored text, and each pre-stored text has already been sent, so its satisfaction can be obtained from readers' interest; the words are then assigned weights according to the satisfaction of each pre-stored text. The specific assignment method is described in detail later.
As described in step S6 above, the predicted satisfaction of the text to be checked is calculated according to the weight of each word. For example, the predicted satisfaction is calculated through a preset predicted-satisfaction calculation formula from the number of target words and the count and weight of each target word. The formula may accumulate the products of each target word's count and its weight; the influence of text length on satisfaction may also be taken into account, with each effect multiplied by a corresponding coefficient. Because the weights are derived from the satisfaction of the temporary texts, a value calculated from the weights is itself related to satisfaction, so the satisfaction of the text to be checked can be calculated through the preset formula to obtain the predicted satisfaction. The specific formula is described in detail later.
As described in step S7 above, whether the text to be checked meets the push requirement is judged according to the predicted satisfaction. Specifically, a satisfaction threshold may be set: when the predicted satisfaction reaches the threshold, the text to be checked passes the audit; when it is below the threshold, the text does not pass.
In one embodiment, the step S6 of calculating the predicted satisfaction of the text to be checked according to the weight of each word includes:
S601: word segmentation is carried out on the text to be checked through the text classifier, and initial words are obtained;
S602: deleting the words which do not exist in the word weight database in the initial words to obtain target words, and obtaining the number corresponding to the target words after word segmentation;
S603: and calculating the prediction satisfaction through a preset prediction satisfaction calculation formula according to the number of the target words, the corresponding number and weight of each target word.
As described in steps S601 to S603 above, the predicted satisfaction of the text to be checked is calculated. Specifically, the text to be checked is segmented by the text classifier and the count of each resulting word is obtained. The text to be checked may contain words that are absent from the word weight database; such words have no corresponding weight and can therefore be deleted. The preset predicted-satisfaction calculation formula is [formula missing in the source text]; its variables are the number of initial words, the number of target words, the count of the i-th target word, the weight of the i-th target word, a set standard number, and a group of set parameters. In addition, the satisfaction of the text to be checked depends on both its content and its length. A standard number can therefore be set in advance as a coefficient, and the influence of text length on reading is subtracted through a length function [formula missing in the source text]. A text that is too long or too short affects the corresponding satisfaction, so the function is set such that, when the text length is standard, i.e. the number of words x in the text equals the standard number, the function takes its minimum value b.
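Steps S602 and S603 can be sketched as follows. The exact formula is not available in this text, so this Python sketch rests on assumptions: the content score is the accumulated product of each target word's count and weight (as stated for step S6), and the length function is a hypothetical quadratic `a * (x - standard) ** 2 + b`, chosen only because it takes its minimum value b when the word count x equals the standard number. The function names and the parameter `a` are illustrative.

```python
from collections import Counter


def length_penalty(x: int, standard: int, a: float = 0.01, b: float = 0.0) -> float:
    """Hypothetical length function: minimal (equal to b) when x == standard."""
    return a * (x - standard) ** 2 + b


def predicted_satisfaction(initial_words, weights, standard, a=0.01, b=0.0):
    """initial_words: segmented text to be checked; weights: word weight database."""
    # S602: drop words that have no entry in the word weight database
    # and count the remaining target words.
    target_counts = Counter(w for w in initial_words if w in weights)
    # S603: accumulate count * weight over the target words.
    content_score = sum(n * weights[w] for w, n in target_counts.items())
    # Subtract the (assumed) influence of text length on reading.
    return content_score - length_penalty(len(initial_words), standard, a, b)


def meets_push_requirement(score: float, threshold: float) -> bool:
    """S7: the text passes the audit when the score reaches the threshold."""
    return score >= threshold
```

With weights `{"service": 0.5, "birthday": 1.0}` and the segmented text `["birthday", "service", "service", "xyz"]`, the unknown word "xyz" is dropped from the content score but still counts toward the text length.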
In one embodiment, the step S5 of calculating the weight of each word according to the satisfaction degree corresponding to each temporary text to obtain a word weight database includes:
S501: acquiring text scores of the words;
S502: calculating the matching value of each word in each temporary text through a preset matching value calculation formula according to the text scores of the words and the satisfaction of the temporary texts;
S503: counting the number of each word in each temporary text, and calculating the weight of each word according to the matching value of each word in each temporary text by a preset weight calculation formula;
S504: and constructing a word weight database according to the weights of the words.
As described in steps S501 to S504 above, the weight of each word is calculated. Specifically, it cannot be determined directly which words attract customers more, but the satisfaction of each pre-stored text and the text score of each word are known. Each word can therefore be extracted individually, and its matching value in each pre-stored text can be calculated from its values in the different pre-stored texts. Some words carry more emotion than others, so the text score of each word can be obtained in advance and then used to calculate the word's matching value in each temporary text. In a specific embodiment, the matching value is calculated by a formula [formula missing in the source text] whose variables are the matching value of a given word in a given text, the satisfaction of that text, and the text score corresponding to that word. The average of the matching values of each word is then taken as the final weight of that word through the weight calculation formula [formula missing in the source text]. Once the weight of each word is calculated, the word weight database is constructed from those weights.
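Steps S501 to S504 can be sketched as follows. Because the matching value formula is not available in this text, the sketch assumes a hypothetical form, the product of a text's satisfaction and a word's text score; only the averaging step comes from the description above. The function name `build_word_weights` and its argument layout are illustrative, and the per-text word counts of step S503 are not used in this simplified version.

```python
from collections import defaultdict


def build_word_weights(temp_texts, satisfactions, text_scores):
    """Build a word weight database (illustrative only).

    temp_texts: list of segmented temporary texts (each a list of words)
    satisfactions: one satisfaction value per temporary text
    text_scores: dict mapping a word to its pre-obtained text score
    """
    matches = defaultdict(list)
    for words, satisfaction in zip(temp_texts, satisfactions):
        for word in set(words):
            if word in text_scores:
                # Assumed matching value: satisfaction scaled by text score.
                matches[word].append(satisfaction * text_scores[word])
    # Weight = average matching value over the texts that contain the word.
    return {word: sum(vals) / len(vals) for word, vals in matches.items()}
```

A word that appears in both a high-satisfaction and a low-satisfaction text ends up with a weight between the two matching values, which is the averaging behaviour the description calls for.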
In one embodiment, the step S1 of detecting the category to which the text to be checked belongs includes:
S101: acquiring text information of the text to be checked, and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
S102: calculating the similarity between the first vector and the second vector corresponding to each category according to a preset similarity calculation formula;
S103: and judging the category to which the text to be checked belongs according to the similarity.
As described in steps S101 to S103 above, the category to which the text to be checked belongs is detected. Specifically, the similarity is calculated between the vector of the text information of the text to be checked (the first vector) and the vector corresponding to each category (the second vector). The closer the calculated similarity is to 1, the more similar the first vector is to the second vector; the closer it is to -1, the more dissimilar they are. The category corresponding to the second vector with the largest similarity is then selected as the category of the text to be checked. The preset similarity calculation formula [formula missing in the source text] takes the first vector and the second vector as inputs and yields the similarity of the first vector and the second vector.
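Steps S102 and S103 can be sketched as follows. The similarity formula itself is not available in this text; cosine similarity is used here as an assumption because its range of -1 to 1 matches the behaviour described above. The function names and the dictionary of category vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vector matches nothing
    return dot / (norm_a * norm_b)


def detect_category(first_vector, category_vectors):
    """Return the category whose second vector is most similar to first_vector."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(first_vector, category_vectors[name]),
    )
```

Given category vectors `{"marketing": [1, 0], "service": [0, 1]}`, a first vector of `[0.9, 0.1]` lies closest in direction to the "marketing" vector, so that category is selected.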
In one embodiment, the step S2 of obtaining a plurality of pre-stored texts corresponding to the category from the text database and satisfaction corresponding to the pre-stored texts includes:
S201: acquiring a plurality of pre-stored texts from the text database, and acquiring service conversion rate and text information of the pre-stored texts;
S202: and calculating the satisfaction degree of the pre-stored text according to the service conversion rate and the text information rate.
As described in steps S201 to S202 above, each pre-stored text and its corresponding satisfaction are acquired. Specifically, since the category of the text to be checked is known, a plurality of pre-stored texts can be acquired from the corresponding category in the database. Because the information received by clients may differ between time periods, pre-stored texts closer to the current time are preferably acquired. The satisfaction corresponding to each pre-stored text is calculated from its service conversion rate and text information, which are also stored in the text database: after each text is sent out, the data are collected and summarized into the text database for storage. The corresponding satisfaction is then calculated according to a preset correspondence.
In one embodiment, after the step S7 of determining whether the text to be checked meets the push requirement according to the predicted satisfaction, the method further includes:
S801: if the pushing requirement is met, inputting the text to be checked into an n-gram model for calculation to obtain a legal value of the text to be checked;
S802: judging whether the legal value reaches a preset legal value or not;
S803: and if the preset legal value is reached, pushing the text to be checked.
As described in steps S801 to S803 above, the legal value of the text to be checked is calculated and the text is audited further. The text to be checked is input into an n-gram model, which is trained on standard sentences that have complete meaning and no grammatical errors, to obtain its legal value. The calculated legal value is then compared with a preset legal value. If it is lower than the preset legal value, the text to be checked can be considered to contain grammatical errors and must be confirmed by the corresponding personnel. If it reaches the preset legal value, the text to be checked can be pushed.
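Steps S801 to S803 can be sketched as follows. The source does not say how the legal value is derived from the n-gram model, so this Python sketch makes several assumptions: n is 2 (a bigram model), probabilities use add-one smoothing, and the legal value is the geometric mean of the bigram probabilities of the sentence. The class `BigramModel` and the function `should_push` are hypothetical names.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram model trained on segmented, grammatically correct sentences."""

    def __init__(self, sentences):
        self.contexts = Counter()  # how often each word starts a bigram
        self.bigrams = Counter()   # how often each word pair occurs
        for words in sentences:
            padded = ["<s>"] + list(words) + ["</s>"]
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(set(self.contexts) | {"</s>"})

    def legal_value(self, words):
        """Assumed legal value: geometric mean of smoothed bigram probabilities."""
        padded = ["<s>"] + list(words) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(padded, padded[1:]):
            # Add-one (Laplace) smoothing keeps unseen pairs above zero.
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(log_prob / (len(padded) - 1))


def should_push(model, words, preset_legal_value):
    """S802/S803: push only when the legal value reaches the preset value."""
    return model.legal_value(words) >= preset_legal_value
```

A sentence whose word order matches the training sentences receives a higher legal value than a scrambled version of the same words, which is the property the comparison against the preset legal value relies on.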
Referring to fig. 2, the present application proposes a text auditing apparatus, including:
the text to be audited acquisition module 10 is used for acquiring the text to be audited and detecting the category to which the text to be audited belongs;
a pre-stored text obtaining module 20, configured to obtain a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
the preprocessing module 30 is configured to perform data preprocessing on the pre-stored texts based on a professional word stock and a special character identifier stock, so as to obtain temporary texts corresponding to each pre-stored text;
a word segmentation module 40, configured to segment each of the temporary texts through a text classifier;
the weight calculation module 50 is configured to calculate a weight of each word according to the satisfaction corresponding to each temporary text, so as to obtain a word weight database;
a predicted satisfaction calculating module 60, configured to calculate a predicted satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
and the auditing module 70 is used for judging whether the text to be audited meets the pushing requirement according to the prediction satisfaction.
In one embodiment, the predictive satisfaction calculation module 60 includes:
the word segmentation sub-module is used for segmenting the text to be checked through the text classifier to obtain each initial word;
the deleting sub-module is used for deleting the words which do not exist in the word weight database in the initial words to obtain target words, and obtaining the number corresponding to the target words after word segmentation;
and the prediction satisfaction calculating sub-module is used for calculating the prediction satisfaction through a preset prediction satisfaction calculating formula according to the number of the target words, the number and the weight corresponding to each target word.
In one embodiment, the weight calculation module 50 includes:
a text score obtaining sub-module, configured to obtain the text score of each word;
a matching value calculation sub-module, configured to calculate the matching value of each word in each temporary text through a preset matching value calculation formula, according to the text score of each word and the satisfaction of each temporary text;
a weight calculation sub-module, configured to count the occurrences of each word in each temporary text and to calculate the weight of each word through a preset weight calculation formula according to the matching value of each word in each temporary text;
and a word weight database construction sub-module, configured to construct the word weight database according to the weights of the words.
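A minimal sketch of how the word weight database might be built from the segmented temporary texts is given below. The preset matching value and weight formulas are not reproduced in this text, so two assumptions are made and should be read as hypothetical: the matching value of a word in a text is taken as the satisfaction of that text multiplied by the text score of the word, and the weight of a word is taken as the count-weighted average of its matching values over all texts. The name build_word_weights and the default text score of 1.0 are also illustrative only.

```python
from collections import Counter, defaultdict


def build_word_weights(segmented_texts, satisfactions, text_scores):
    """Build a {word: weight} database from segmented temporary texts.

    ASSUMPTION: matching value = satisfaction * text score, and
    weight = count-weighted mean of matching values. Both rules are
    hypothetical stand-ins for the preset formulas.
    """
    weighted_sum = defaultdict(float)
    total_count = Counter()
    for words, satisfaction in zip(segmented_texts, satisfactions):
        # Count the occurrences of each word in this temporary text.
        for word, count in Counter(words).items():
            # Matching value of this word in this text (assumed rule).
            match = satisfaction * text_scores.get(word, 1.0)
            weighted_sum[word] += count * match
            total_count[word] += count
    # Weight of each word: count-weighted mean of its matching values.
    return {word: weighted_sum[word] / total_count[word] for word in weighted_sum}


texts = [["loan", "fast", "loan"], ["loan", "slow"]]
weights = build_word_weights(texts, [0.9, 0.3], {"loan": 1.0, "fast": 1.0, "slow": 1.0})
```

Under these assumptions a word that appears mainly in high-satisfaction texts receives a high weight, which matches the stated intent of deriving word weights from the satisfaction of the pre-stored texts.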
In one embodiment, the text to be audited acquisition module 10 includes:
a text information acquisition sub-module, configured to acquire the text information of the text to be audited and vectorize the text information to obtain a first vector corresponding to the text to be audited;
a similarity calculation sub-module, configured to calculate the similarity between the first vector and the second vector corresponding to each category according to a preset similarity calculation formula;
and a category judging sub-module, configured to judge the category to which the text to be audited belongs according to the similarity.
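The category detection described above can be illustrated with the sketch below. Two points are assumptions rather than statements of the application: the text is vectorized as a simple bag-of-words count vector, and cosine similarity is used as the similarity measure, since the preset similarity formula is not reproduced in this text. The names cosine_similarity and detect_category are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (dict-like)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_category(words, category_vectors):
    """Return the category whose second vector is most similar to the text.

    ASSUMPTION: bag-of-words vectorization and cosine similarity stand in
    for the unspecified vectorization and preset similarity formula.
    """
    first_vector = Counter(words)  # first vector of the text to be audited
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(first_vector, category_vectors[name]),
    )


category_vectors = {
    "finance": Counter(["loan", "rate", "bank"]),
    "travel": Counter(["flight", "hotel", "beach"]),
}
category = detect_category(["loan", "bank", "today"], category_vectors)
```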
In one embodiment, the pre-stored text obtaining module 20 includes:
a pre-stored text obtaining sub-module, configured to obtain a plurality of pre-stored texts from the text database, and obtain the service conversion rate and text information of the pre-stored texts;
and a satisfaction calculating sub-module, configured to calculate the satisfaction of the pre-stored text according to the service conversion rate and the text information rate.
In one embodiment, the text auditing apparatus further includes:
a legal value calculation module, configured to, if the pushing requirement is met, input the text to be audited into an n-gram model for calculation, so as to obtain the legal value of the text to be audited;
a legal value judging module, configured to judge whether the legal value reaches a preset legal value;
and a pushing module, configured to push the text to be audited if the preset legal value is reached.
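The legal value check can be sketched with a small bigram model, as below. The application only states that an n-gram model yields a legal value that is compared with a preset legal value; the choice of n = 2, add-one smoothing, and the geometric mean of the bigram probabilities as the legal value are all assumptions of this sketch, and the class name BigramModel and the threshold of 0.2 are hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            vocab.update(tokens)
            # Every token except the last one acts as a bigram context.
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def legal_value(self, words):
        """Geometric mean of smoothed bigram probabilities, in (0, 1].

        ASSUMPTION: this definition of the legal value is hypothetical.
        """
        tokens = ["<s>"] + list(words) + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        log_prob = 0.0
        for prev, cur in pairs:
            prob = (self.bigram_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.vocab_size
            )
            log_prob += math.log(prob)
        return math.exp(log_prob / len(pairs))


model = BigramModel([["apply", "for", "loan"], ["apply", "for", "card"]])
PRESET_LEGAL_VALUE = 0.2  # hypothetical threshold
should_push = model.legal_value(["apply", "for", "loan"]) >= PRESET_LEGAL_VALUE
```

A word order seen in the training corpus scores higher than the same words in an unseen order, which is the property the legal value check relies on to reject ill-formed text.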
The beneficial effects of the application are as follows: the predicted satisfaction of the text to be audited can be calculated from the satisfaction of other pre-stored texts of the same category, so that the text to be audited is audited automatically and the company's human resources are saved.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the pre-stored texts and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, may implement the text auditing method according to any of the above embodiments.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, can implement the text auditing method according to any of the above embodiments.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-transitory computer readable storage medium and, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium provided by the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that comprises the element.
The above description covers only the preferred embodiments of the present application and is not intended to limit the present application; various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A method for auditing text, comprising:
acquiring a text to be checked, and detecting the category to which the text to be checked belongs;
acquiring a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
performing data preprocessing on the pre-stored texts based on a professional word stock and a special character identification stock, so as to obtain temporary texts corresponding to the pre-stored texts;
performing word segmentation on each temporary text through a text classifier;
calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database;
calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word in the word weight database;
judging whether the text to be checked meets the pushing requirement according to the prediction satisfaction;
the step of calculating the prediction satisfaction degree of the text to be checked according to the weight corresponding to each word in the word weight database comprises the following steps:
performing word segmentation on the text to be checked through the text classifier to obtain initial words;
deleting, from the initial words, the words which do not exist in the word weight database to obtain target words, and obtaining the count of each target word after word segmentation;
calculating the prediction satisfaction through a preset prediction satisfaction calculation formula according to the number of the target words and the count and weight corresponding to each target word;
wherein the preset prediction satisfaction calculation formula (rendered as an image in the original publication and not reproduced in this text) is defined in terms of the number of the initial words, the number of target words, the number of occurrences of the i-th target word, the weight of the i-th target word, a set standard number, and three set parameters.
2. The method for auditing text according to claim 1, wherein the step of calculating the weight of each word according to the satisfaction corresponding to each temporary text to obtain a word weight database comprises the steps of:
acquiring text scores of the words;
calculating the matching value of each word in each temporary text through a preset matching value calculation formula, according to the text scores of the words and the satisfaction of the temporary texts;
counting the number of each word in each temporary text, and calculating the weight of each word according to the matching value of each word in each temporary text by a preset weight calculation formula;
constructing a word weight database according to the weights of the words;
wherein the matching value calculation formula (rendered as an image in the original publication and not reproduced in this text) relates the matching value corresponding to a given word in a given temporary text to the satisfaction of that text and the text score corresponding to that word;
the weight of each word is calculated by the weight calculation formula (likewise not reproduced in this text), and the word weight database is constructed according to the weight of each word.
3. The text review method of claim 1, wherein the step of detecting the category to which the text to be reviewed belongs comprises:
acquiring text information of the text to be checked, and vectorizing the text information to obtain a first vector corresponding to the text to be checked;
calculating the similarity between the first vector and the second vector corresponding to each category according to a preset similarity calculation formula;
judging the category of the text to be checked according to the similarity;
wherein the preset similarity calculation formula (rendered as an image in the original publication and not reproduced in this text) gives the similarity of the first vector and the second vector as a function of the first vector and the second vector.
4. The text auditing method of claim 1, wherein the step of obtaining a plurality of pre-stored texts corresponding to the category from a text database, and satisfaction corresponding to the pre-stored texts, comprises:
acquiring a plurality of pre-stored texts from the text database, and acquiring service conversion rate and text information of the pre-stored texts;
and calculating the satisfaction degree of the pre-stored text according to the service conversion rate and the text information rate.
5. The text auditing method of claim 1, further comprising, after the step of determining whether the text to be audited meets push requirements according to the predicted satisfaction:
if the pushing requirement is met, inputting the text to be checked into an n-gram model for calculation to obtain a legal value of the text to be checked;
judging whether the legal value reaches a preset legal value or not;
and if the preset legal value is reached, pushing the text to be checked.
6. A text auditing apparatus, comprising:
the text to be audited acquisition module is used for acquiring a text to be audited and detecting the category to which the text to be audited belongs;
the pre-stored text acquisition module is used for acquiring a plurality of pre-stored texts corresponding to the categories and satisfaction corresponding to the pre-stored texts from a text database;
the preprocessing module is used for preprocessing data of the pre-stored texts based on a professional word stock and a special character identification stock so as to obtain temporary texts corresponding to the pre-stored texts;
the word segmentation module is used for segmenting each temporary text through a text classifier;
the weight calculation module is used for calculating the weight of each word according to the satisfaction corresponding to each temporary text so as to obtain a word weight database;
the prediction satisfaction calculation module is used for calculating the prediction satisfaction of the text to be checked according to the weight corresponding to each word in the word weight database;
the auditing module is used for judging whether the text to be audited meets the pushing requirement according to the prediction satisfaction;
the prediction satisfaction calculation module comprises:
the word segmentation sub-module is used for segmenting the text to be checked through the text classifier to obtain each initial word;
the deleting sub-module is used for deleting, from the initial words, the words which do not exist in the word weight database to obtain target words, and obtaining the count of each target word after word segmentation;
the prediction satisfaction calculating sub-module is used for calculating the prediction satisfaction through a preset prediction satisfaction calculation formula according to the number of the target words and the count and weight corresponding to each target word;
wherein the preset prediction satisfaction calculation formula (rendered as an image in the original publication and not reproduced in this text) is defined in terms of the number of the initial words, the number of target words, the number of occurrences of the i-th target word, the weight of the i-th target word, a set standard number, and three set parameters.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202011247736.0A 2020-11-10 2020-11-10 Text auditing method and device, computer equipment and storage medium Active CN112163585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011247736.0A CN112163585B (en) 2020-11-10 2020-11-10 Text auditing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112163585A CN112163585A (en) 2021-01-01
CN112163585B true CN112163585B (en) 2023-11-10

Family

ID=73865708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011247736.0A Active CN112163585B (en) 2020-11-10 2020-11-10 Text auditing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112163585B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807081B (en) * 2021-09-18 2024-06-14 北京云上曲率科技有限公司 Chat text content error correction method and device based on context

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6845374B1 (en) * 2000-11-27 2005-01-18 Mailfrontier, Inc System and method for adaptive text recommendation
CN101000627A (en) * 2007-01-15 2007-07-18 北京搜狗科技发展有限公司 Method and device for issuing correlation information
CN106445998A (en) * 2016-05-26 2017-02-22 达而观信息科技(上海)有限公司 Text content auditing method and system based on sensitive word
CN109543516A (en) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 Signing intention judgment method, device, computer equipment and storage medium
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN110737818A (en) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Network release data processing method and device, computer equipment and storage medium
CN111046142A (en) * 2019-12-13 2020-04-21 深圳前海环融联易信息科技服务有限公司 Text examination method and device, electronic equipment and computer storage medium
CN111126928A (en) * 2018-10-29 2020-05-08 阿里巴巴集团控股有限公司 Method and device for auditing release content
CN111275410A (en) * 2020-02-29 2020-06-12 重庆百事得大牛机器人有限公司 Remote interaction method for remote counselor of enterprise
CN111274782A (en) * 2020-02-25 2020-06-12 平安科技(深圳)有限公司 Text auditing method and device, computer equipment and readable storage medium
CN111832300A (en) * 2020-07-24 2020-10-27 中国联合网络通信集团有限公司 Contract auditing method and device based on deep learning
CN111859958A (en) * 2020-07-23 2020-10-30 中国平安财产保险股份有限公司 High-complaint-risk user identification method, complaint early warning method and related equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898511B2 (en) * 2015-01-22 2018-02-20 International Business Machines Corporation Method of manipulating vocabulary depending on the audience

Also Published As

Publication number Publication date
CN112163585A (en) 2021-01-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231016

Address after: Room 114, No.128 Shexin Road, Sheshan Town, Songjiang District, Shanghai, 201600

Applicant after: Shanghai Seven Cats Culture Media Co.,Ltd.

Address before: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen LIAN intellectual property service center

Effective date of registration: 20231016

Address after: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen LIAN intellectual property service center

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: PING AN PUHUI ENTERPRISE MANAGEMENT Co.,Ltd.

GR01 Patent grant