WO2020253350A1 - Method and apparatus for reviewing network content publishing, computer device and storage medium - Google Patents

Method and apparatus for reviewing network content publishing, computer device and storage medium

Publication number
WO2020253350A1
Authority
WO
WIPO (PCT)
Prior art keywords
basic
content
preset
word segmentation
published
Prior art date
Application number
PCT/CN2020/085582
Other languages
English (en)
Chinese (zh)
Inventor
夏新
Original Assignee
深圳壹账通智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020253350A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • This application relates to the field of natural language processing, and in particular to a review method, device, computer equipment, and storage medium for publishing network content.
  • Keyword detection is mainly used for reviewing.
  • The inventor realized that this review method can only match against preset keywords and then judge whether the published content complies with the rules, so it is limited by how the keywords are set. Moreover, users can easily avoid the keywords to publish bad content, which makes the intelligence and accuracy of reviewing published network content low.
  • The embodiments of the present application provide a method, device, computer equipment and storage medium for reviewing network content publishing, to solve the problems of low intelligence and low accuracy caused by the current keyword matching approach to reviewing network content publishing.
  • A review method for network content publishing, including:
  • matching the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information, wherein the list type database includes each user information and the user type corresponding to that user information;
  • comparing the comprehensive score with a preset score threshold, and if the comprehensive score is greater than the preset score threshold, confirming that the content to be published is legal, publishing the content to be published, and sending a message that the review has passed to the client.
  • A review device for network content publishing, including:
  • a request receiving module, configured to obtain the current user information and the content to be published contained in the review request if a review request for network content publishing sent by the client is received;
  • a type matching module, configured to match the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information, wherein the list type database includes each user information and the user type corresponding to that user information;
  • a content analysis module, configured to, if the user type corresponding to the current user information is an ordinary user, analyze the content to be published according to a preset sentence division method to obtain each basic sentence contained in the content to be published;
  • a semantic recognition module, configured to perform semantic recognition on each of the basic sentences by a natural language semantic recognition method and obtain the semantic score corresponding to each of the basic sentences;
  • a comprehensive scoring module, configured to determine the comprehensive score of the content to be published according to the semantic score of each basic sentence;
  • a result determination module, configured to compare the comprehensive score with a preset score threshold, and if the comprehensive score is greater than the preset score threshold, confirm that the content to be published is legal, publish the content to be published, and send a message that the review has passed to the client.
  • A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, a method for reviewing network content publishing is implemented, including: when a review request for network content publishing sent by a client is received, obtaining the current user information and the content to be published contained in the review request; matching the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information; if the user type corresponding to the current user information is an ordinary user, analyzing the content to be published according to the preset sentence division method to obtain each basic sentence contained in the content to be published; performing semantic recognition on each basic sentence by natural language semantic recognition to obtain the semantic score corresponding to each basic sentence; determining the comprehensive score of the content to be published according to the semantic score of each basic sentence; and comparing the comprehensive score with the preset score threshold, and when the comprehensive score is greater than the preset score threshold, confirming that the content to be published is legal, publishing the content to be published, and sending a message that the review has passed to the client.
  • A computer-readable storage medium stores a computer program. When the computer program is executed by a processor, a method for reviewing network content publishing is implemented, including: when a review request for network content publishing sent by a client is received, obtaining the current user information and the content to be published contained in the review request; matching the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information; if the user type corresponding to the current user information is an ordinary user, analyzing the content to be published according to the preset sentence division method to obtain each basic sentence contained in the content to be published; performing semantic recognition on each basic sentence by natural language semantic recognition to obtain the semantic score corresponding to each basic sentence; determining the comprehensive score of the content to be published according to the semantic score of each basic sentence; and comparing the comprehensive score with the preset score threshold, and when the comprehensive score is greater than the preset score threshold, confirming that the content to be published is legal, publishing the content to be published, and sending a message that the review has passed to the client.
  • The method, device, computer equipment, and storage medium for reviewing network content publishing provided by the embodiments of this application perform intelligent semantic recognition of network content and review whether the network content publishing is reasonable according to the recognized semantics, thereby improving the intelligence and accuracy of reviewing network content publishing.
  • FIG. 1 is a schematic diagram of an application environment of a method for reviewing network content publishing provided by an embodiment of the present application;
  • FIG. 2 is an implementation flowchart of a method for reviewing network content publishing provided by an embodiment of the present application;
  • FIG. 3 is a flowchart of reviewing non-ordinary users in the method for reviewing network content publishing provided by an embodiment of the present application;
  • FIG. 4 is a flowchart of the implementation of step S40 in the method for reviewing network content publishing provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of the implementation of step S41 in the method for reviewing network content publishing provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a review device for network content publishing provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 shows an application environment of a method for reviewing network content publishing provided by an embodiment of the present application.
  • the method for reviewing network content publishing is applied in review scenarios of network content publishing in network forums, network live broadcasts or other types of network communities.
  • The review scenario includes a client, a server and a management terminal. The server and the client, and the server and the management terminal, are connected through a network. The client sends a review request for network content publishing to the server. After obtaining the review request, the server determines the user type and selects the review method according to the user type.
  • When the user type is an ordinary user, the server obtains the content to be published and performs semantic analysis to obtain the semantic score of the content to be published, and then determines the legality of the content to be published. When the content is illegal, the corresponding prompt information is sent to the management terminal.
  • The client and the management terminal may specifically be, but are not limited to, smart terminal devices such as mobile phones, tablet computers, and personal computers (PCs), and the server may specifically be implemented by an independent server or a server cluster composed of multiple servers.
  • FIG. 2 shows a method for reviewing network content publishing provided by an embodiment of the present application. The method is applied to the server in FIG. 1 as an example for description. The details are as follows:
  • When the user communicates on the forum through the client, the user first edits the content to be published. After the user clicks the submit button of the client, the client sends a review request containing the user information and the content to be published to the server, and the server receives the user information and the content to be published contained in the review request via a network transmission protocol.
  • The user information includes but is not limited to user account information.
  • The server determines the user type through the user account information.
  • The review method corresponding to the user type is then used to review the content to be published, in order to improve the review efficiency of network content publishing.
  • The content to be published is text information, link information, image information, and video information that the user edits on the client to upload to forums or other online communities and to interact with other online users.
  • The network transmission protocol includes but is not limited to: Internet Control Message Protocol (ICMP), Address Resolution Protocol (ARP), File Transfer Protocol (FTP), etc.
  • S20 Match the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information, where the list type database includes each user information and the user type corresponding to the user information.
  • The server stores a preset list type database.
  • The preset list type database contains the user information of all registered users and the user type corresponding to each user information.
  • The preset list type database is queried by traversal to judge the user type of the user information obtained in step S10, and the user type corresponding to the user information is obtained.
  • The user types contained in the preset list type database may include: whitelisted users, blacklisted users, and ordinary users.
  • Different user types are distinguished by the credit rating of the user. For example, management personnel have a relatively high credit rating and are generally classified as whitelisted users. Users who have repeatedly been suspected of illegal operations that disturb the normal order of the online community have a low credit rating; when the credit rating drops to a certain level, they are classified as blacklisted users.
  • When the user type of the user information is ordinary user, the corresponding review request needs to be further intelligently evaluated, and the review result is determined according to the evaluation result.
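  • The lookup in step S20 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `UserType`, `LIST_TYPE_DB` and `match_user_type` are hypothetical, an in-memory dictionary stands in for the preset list type database, and a keyed lookup replaces the traversal query described above. Treating unknown accounts as ordinary users is also an assumption of this sketch.

```python
from enum import Enum


class UserType(Enum):
    WHITELIST = "whitelist"
    BLACKLIST = "blacklist"
    ORDINARY = "ordinary"


# Hypothetical stand-in for the preset list type database: account -> user type.
LIST_TYPE_DB = {
    "admin_01": UserType.WHITELIST,
    "spammer_42": UserType.BLACKLIST,
    "alice": UserType.ORDINARY,
}


def match_user_type(account, db=LIST_TYPE_DB):
    """Return the user type recorded for the given account.

    Accounts missing from the database are treated as ordinary users,
    which is an assumption of this sketch rather than something the text states.
    """
    return db.get(account, UserType.ORDINARY)
```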
  • The content to be published is analyzed according to a preset sentence division method to obtain each basic sentence contained in the content to be published.
  • The preset sentence division method may use regular-expression matching of preset delimiters: each position where a preset delimiter is matched is taken as a division point, and the content to be published is split at these points to obtain each basic sentence it contains.
  • The preset delimiters include but are not limited to: paragraph marks, line breaks, punctuation marks, etc., which can be set according to actual needs and are not limited here.
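  • The delimiter-based sentence division can be sketched with a regular expression. The delimiter set below (line breaks plus a few Chinese and English punctuation marks) and the function name `split_basic_sentences` are assumptions chosen for illustration; the text leaves the actual delimiters to be set as needed.

```python
import re

# Assumed delimiter set: line breaks plus common Chinese and English
# sentence-ending punctuation. A run of delimiters counts as one split point.
DELIMITERS = re.compile(r"[\n\r。！？!?；;.]+")


def split_basic_sentences(content):
    """Split the content to be published into basic sentences."""
    parts = DELIMITERS.split(content)
    # Drop empty fragments left by leading or trailing delimiters.
    return [part.strip() for part in parts if part.strip()]
```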
  • Each of the basic sentences is semantically recognized, and the semantics corresponding to each basic sentence is scored according to preset scoring conditions to obtain the semantic score of each basic sentence.
  • Natural language semantic recognition relies on natural language processing (NLP), a subfield of artificial intelligence (AI). NLP tasks include, among others: text to speech / speech synthesis, speech recognition, Chinese word segmentation, part-of-speech tagging, syntactic parsing, text classification (text categorization), information retrieval, automatic summarization, and text proofing.
  • S50 Determine the comprehensive score of the content to be published according to the semantic score of each basic sentence.
  • the semantic score of each basic sentence is weighted and summarized through a preset weighting method to obtain a comprehensive score of the content to be published.
  • the preset weighting method can be set according to actual needs, for example, different weighting coefficients are set for semantic scores in different ranges.
  • S60 Compare the comprehensive score with the preset score threshold. If the comprehensive score is greater than the preset score threshold, confirm that the content to be published is legal, publish the content to be published, and send an approved message to the client.
  • The server presets a score threshold and compares the comprehensive score with it. When the comprehensive score is greater than the preset score threshold, the server confirms that the content to be published is legal, publishes the content to be published, and sends a message that the review has passed to the client.
  • When the comprehensive score is less than or equal to the preset score threshold, the server confirms that the content to be published may violate regulations, rejects the content to be published, and notifies the client that the review has not passed. The review request for the content to be published is recorded for subsequent handling by management personnel.
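  • The comparison in step S60 reduces to a single threshold test. The sketch below is illustrative only: `ReviewResult`, `decide` and the message strings are hypothetical names, and the threshold value is whatever the server presets.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    approved: bool
    message: str


def decide(comprehensive_score, threshold):
    """Approve only when the comprehensive score is strictly greater than the threshold."""
    if comprehensive_score > threshold:
        return ReviewResult(True, "review passed")
    # A score equal to or below the threshold is treated as a suspected violation.
    return ReviewResult(False, "review not passed")
```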
  • In this embodiment, the current user information and the content to be published contained in the review request are obtained, and the current user information is matched with each user information in the preset list type database to determine the user type. When the comprehensive score is greater than the preset score threshold, the content to be published is confirmed to be legal and is published, and a message that the review has passed is sent to the client. This realizes intelligent semantic recognition of network content and reviews whether the network content publishing is reasonable based on the recognized semantics, which improves the intelligence and accuracy of reviewing network content publishing.
  • The method for reviewing network content publishing further includes:
  • If the user type corresponding to the current user information is a whitelisted user, the content to be published is published directly.
  • If querying the preset list type database by traversal determines that the user type corresponding to the current user information is a blacklisted user, no semantic review of the content to be published is needed; the content to be published is deleted directly, and a message that the review has not passed is sent to the client.
  • Step S70 and step S80 are not necessarily executed sequentially; they can be executed in parallel, which is not limited here.
  • For whitelisted and blacklisted users, quick review operations are performed in a preset manner, without semantic recognition of the content to be published for these two user types, which improves the review efficiency of network content publishing.
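  • The fast path for non-ordinary users can be sketched as a dispatch that skips semantic recognition entirely. The function name `review`, the string labels for user types, and the `semantic_review` callback are all hypothetical; the callback stands in for steps S30 to S60.

```python
from typing import Callable


def review(user_type: str, content: str,
           semantic_review: Callable[[str], bool]) -> bool:
    """Return True if the content may be published.

    Whitelisted users are approved and blacklisted users are rejected
    without invoking the (comparatively expensive) semantic review.
    """
    if user_type == "whitelist":
        return True
    if user_type == "blacklist":
        return False
    # Ordinary users go through the full semantic review.
    return semantic_review(content)
```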
  • The following uses a specific embodiment to describe in detail how, as mentioned in step S40, semantic recognition is performed on each basic sentence by the natural language semantic recognition method to obtain the semantic score corresponding to each basic sentence.
  • FIG. 4 shows a specific implementation process of step S40 provided in an embodiment of the present application, which is detailed as follows:
  • S41 Perform word segmentation processing on the basic sentence through a preset word segmentation method to obtain the basic word segmentation contained in the basic sentence.
  • each basic sentence obtained in step S30 is subjected to word segmentation processing to obtain the basic word segmentation contained in each basic sentence.
  • The preset word segmentation methods include but are not limited to: third-party word segmentation tools, word segmentation algorithms, etc.
  • Common third-party word segmentation tools include, but are not limited to: Stanford NLP word segmentation, the ICTCLAS word segmentation system, the ansj word segmentation tool and the HanLP Chinese word segmentation tool.
  • Word segmentation algorithms include but are not limited to: the Maximum Matching (MM) algorithm, the Reverse Direction Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, the Hidden Markov Model (HMM) and the N-gram model.
  • Extracting the basic word segments by word segmentation filters out some meaningless words in the basic sentence on the one hand, and on the other hand makes it easier to generate word vectors from these basic word segments.
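  • As an illustration of one of the listed algorithms, the forward Maximum Matching (MM) approach can be sketched as below. The vocabulary, the `max_len` window and the function name are assumptions for the example; this is not the segmentation procedure of the embodiment, which is described later using an N-gram model.

```python
def forward_max_match(sentence, vocab, max_len=4):
    """Greedy forward maximum matching against a vocabulary.

    At each position the longest vocabulary entry (up to max_len characters)
    is taken; a single character is emitted when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens
```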
  • S42 Convert the basic word segmentation into a word vector, and cluster the word vector through a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence.
  • language representation mainly refers to the formal or mathematical description of language, so that language can be expressed in a computer and can be processed automatically by computer programs.
  • the word vector referred to in the embodiment of this application is to express a basic word segmentation in the form of a vector.
  • Each basic word segment is converted into its corresponding word vector, and the word vectors are then clustered through a preset clustering algorithm to obtain the cluster center of the word vectors corresponding to each basic word segment. The cluster centers corresponding to the basic word segments in the same basic sentence are then clustered again to obtain the cluster center corresponding to the basic sentence.
  • A clustering algorithm, also called cluster analysis, is a statistical analysis method for classifying samples or indicators, and is also an important algorithm in data mining.
  • Clustering algorithms include but are not limited to: the K-Means clustering algorithm, the mean shift clustering algorithm, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method, expectation-maximization clustering based on Gaussian mixture models, agglomerative hierarchical clustering, and the Graph Community Detection algorithm.
  • In this embodiment, the K-Means clustering algorithm is adopted to cluster the word vectors corresponding to each basic word segment to determine the classification corresponding to each basic word segment, and the basic sentences are then clustered to obtain the cluster center corresponding to each basic sentence.
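  • The K-Means step can be sketched in plain Python as follows. This is a simplified illustration: the points stand in for word vectors, the first k points are used as initial centers (a deterministic choice made for the example, not something the text specifies), and a fixed number of iterations replaces a convergence test.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups.

    Returns (centers, assignment), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    # Assumed initialisation: the first k points serve as initial centers.
    centers = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment
```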
  • The server pre-stores preset semantic vectors representing designated semantics, and each preset semantic vector corresponds to a preset semantic score.
  • The distance between the cluster center corresponding to the basic sentence and each of these preset semantic vectors is calculated. The preset semantic vector with the minimum distance is used as the target vector, and the semantic score corresponding to the target vector is used as the semantic score of the basic sentence.
  • Alternatively, a scoring parameter can be calculated according to the distance between the basic sentence and the target vector, and the semantic score of the basic sentence can be determined according to the scoring parameter and the semantic score corresponding to the target vector.
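  • Selecting the target vector by minimum distance can be sketched as below. The preset vectors, their scores, and the use of Euclidean distance are assumptions made for the example; the text does not fix a particular distance measure.

```python
import math

# Hypothetical preset semantic vectors, each paired with a preset semantic score.
PRESET_SEMANTICS = [
    ((1.0, 0.0), 0.8),
    ((0.0, 1.0), -0.6),
]


def semantic_score(cluster_center, presets=PRESET_SEMANTICS):
    """Return the score of the preset semantic vector nearest to the cluster center."""
    # Euclidean distance is an assumption of this sketch.
    _target_vector, score = min(
        presets, key=lambda item: math.dist(cluster_center, item[0])
    )
    return score
```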
  • In this embodiment, the basic sentence is segmented through a preset word segmentation method to obtain the basic word segments contained in the basic sentence; the basic word segments are converted into word vectors; and the word vectors are clustered through a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence. The target vector is then determined, and the semantic score corresponding to the target vector is used as the semantic score of the basic sentence. This realizes semantic scoring of the basic sentence and improves the intelligence and efficiency of the review.
  • The following uses a specific embodiment to describe in detail how, as mentioned in step S41, word segmentation processing is performed on the basic sentence through the preset word segmentation method to obtain the basic word segments contained in the basic sentence.
  • FIG. 5 shows a specific implementation process of step S41 provided by an embodiment of the present application, which is detailed as follows:
  • S411 Obtain a preset training corpus, and use the N-gram model to analyze the preset training corpus to obtain word sequence data of the preset training corpus.
  • The training corpus is a corpus, built from related language material, that is used to evaluate basic sentences in natural language. By using the N-gram model to perform statistical analysis on each language material in the preset training corpus, the number of times that one language material H appears after another language material I in the preset training corpus is obtained, which gives the word sequence data of the word sequence composed of "language material I + language material H".
  • the content of the training corpus in the embodiment of the present application includes, but is not limited to: professional information corresponding to topics of forums or online communities, online corpus, general corpus, etc.
  • Corpus refers to a large-scale electronic text library that has been scientifically sampled and processed.
  • Corpus is the basic resource for linguistic research and the main resource for empirical language research methods. It is used in lexicography, language teaching, traditional language research, and statistical or case-based research in natural language processing.
  • Language material is the content of linguistic research and the basic unit of a corpus.
  • The preset training corpus is a corpus in the field of "current affairs", obtained by crawling popular web topics and current affairs news with a web crawler.
  • A word sequence refers to a sequence formed by combining at least two language materials in a certain order.
  • The word sequence frequency refers to the ratio of the number of occurrences of the word sequence to the number of occurrences of word segments in the entire corpus.
  • A word segment here refers to a word sequence obtained by combining consecutive word sequences according to a preset combination method. For example, if a certain word sequence "I love tomatoes" occurs 100 times in the entire corpus, and the number of occurrences of all word segments in the entire corpus sums to 100000, then the word sequence frequency of "I love tomatoes" is 100 / 100000 = 0.001.
  • The N-gram model is a commonly used language model in the semantic recognition of large-vocabulary continuous text.
  • With it, the sentence with the greatest probability can be calculated, so that automatic conversion to Chinese characters is achieved without the user's manual selection, which improves the accuracy of word sequence determination.
  • The process of obtaining a preset training corpus and analyzing it with the N-gram model to obtain the word sequence data of the preset training corpus can be carried out before the review, and the obtained word sequence data can be stored.
  • During the review, the stored word sequence data can then be called directly.
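  • Step S411 can be sketched for the N = 2 case by counting adjacent pairs in a pre-segmented corpus. The toy corpus and the function names `bigram_counts` and `word_sequence_frequency` are hypothetical, and the frequency here divides by the total number of counted pairs, which is one possible reading of the definition given in the text.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count how often one item directly follows another.

    corpus is a list of sentences, each a list of already segmented items.
    """
    counts = Counter()
    for sentence in corpus:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def word_sequence_frequency(sequence, counts):
    """Ratio of the occurrences of one word sequence to all counted sequences."""
    total = sum(counts.values())
    return counts[sequence] / total if total else 0.0
```

  • Because the counts depend only on the training corpus, they can be computed once ahead of time and stored, which matches the precomputation described above.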
  • S412 Perform word segmentation analysis on the basic sentence to obtain M word segmentation sequences.
  • Each basic sentence can be segmented in different ways, and the sentence may be understood differently depending on the segmentation.
  • After obtaining the basic sentence, the server analyzes it to obtain the M word segmentation sequences of the basic sentence.
  • M is the total number of all possible word segmentation sequences.
  • Each word segmentation sequence is one result of dividing a basic sentence, and each resulting sequence contains at least two word segments.
  • For example, for the basic sentence "Today is really hot" (presumably "今天真热" in the original Chinese), analysis yields word segmentation sequence A: "today" (今天), "really" (真), "hot" (热); word segmentation sequence B: "now" (今), "innocent" (天真), "hot" (热); and so on.
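  • Enumerating the candidate word segmentation sequences of step S412 can be sketched by trying every cut position recursively. The function names are hypothetical, and the example input in the usage note is an assumed Chinese rendering of the sentence above. The number of candidates grows exponentially with sentence length, so a practical system would prune with a vocabulary; this sketch does not.

```python
def all_segmentations(sentence):
    """Return every way of cutting the sentence into contiguous pieces."""
    if not sentence:
        return [[]]
    result = []
    for cut in range(1, len(sentence) + 1):
        head = sentence[:cut]
        for tail in all_segmentations(sentence[cut:]):
            result.append([head] + tail)
    return result


def candidate_sequences(sentence):
    """Keep only segmentations with at least two word segments, as the text requires."""
    return [seq for seq in all_segmentations(sentence) if len(seq) >= 2]
```

  • For a four-character input such as "今天真热" this produces seven candidates, including both ["今天", "真", "热"] and ["今", "天真", "热"].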
  • S413 For each word segmentation sequence, calculate the occurrence probability of each word segmentation sequence according to the word sequence data of the preset training corpus to obtain the occurrence probability of M word segmentation sequences.
  • The occurrence probability of each word segmentation sequence is calculated to obtain the occurrence probabilities of the M word segmentation sequences.
  • The Markov assumption can be used to calculate the occurrence probability of a word segmentation sequence: the appearance of the Y-th word is related only to the previous Y-1 words and not to any other words.
  • The probability of the entire sentence is the product of the occurrence probabilities of its words:
  • $P(T) = P(W_1 W_2 \dots W_Y) = P(W_1)\,P(W_2 \mid W_1) \cdots P(W_Y \mid W_1 W_2 \dots W_{Y-1})$
  • where $P(T)$ is the probability of the entire sentence, and $P(W_Y \mid W_1 W_2 \dots W_{Y-1})$ is the probability that the Y-th word segment appears after the word sequence composed of the preceding Y-1 word segments.
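  • The probability calculation of step S413 can be sketched with a first-order simplification in which each word segment is conditioned only on the one before it. That bigram truncation, the maximum-likelihood estimates from raw counts, and the names used below are assumptions of the sketch; the text conditions on all Y-1 preceding segments.

```python
from collections import Counter


def sequence_probability(tokens, unigrams, bigrams):
    """Estimate P(W1) * P(W2|W1) * ... * P(WY|W(Y-1)) from counts.

    unigrams maps a token to its count; bigrams maps (prev, cur) to the
    number of times cur directly follows prev in the training corpus.
    """
    if not tokens:
        return 0.0
    total = sum(unigrams.values())
    prob = unigrams[tokens[0]] / total
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            # An unseen context gives the whole sequence zero probability.
            return 0.0
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob
```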
  • S414 From the occurrence probabilities of M word segmentation sequences, select the word segmentation sequence corresponding to the occurrence probability that reaches the preset probability threshold as the target word segmentation sequence, and use each word segmentation in the target word segmentation sequence as the basic word segmentation contained in the basic sentence .
  • For each word segmentation sequence, an occurrence probability is obtained through the calculation in step S413, giving the occurrence probabilities of M word segmentation sequences in total.
  • The occurrence probabilities of the M word segmentation sequences are each compared with the preset probability threshold. The occurrence probabilities greater than or equal to the preset probability threshold are selected as effective occurrence probabilities, and the word segmentation sequences corresponding to the effective occurrence probabilities are used as the target word segmentation sequences.
  • Word segmentation sequences whose occurrence probability does not meet the requirement are filtered out, so that the selected target word segmentation sequences are closer to the meaning expressed in natural language, which improves the accuracy of semantic recognition.
  • When the content to be published is determined not to conform to the specification, review failure is taken as the review result, and a reminder message of "Please abide by the rules of online speech and be a civilized netizen" is sent to the client.
  • For example, with a preset number of 5, the effective occurrence probabilities are sorted, the first 5 effective occurrence probabilities in the ranking are selected, and the word segmentation sequences corresponding to these 5 occurrence probabilities are taken as the target word segmentation sequences.
  • Alternatively, the word segmentation sequence corresponding to the maximum occurrence probability is selected as the target word segmentation sequence, to reduce the amount of subsequent calculation and improve the review efficiency of network content publishing.
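  • The selection in step S414 can be sketched as a filter followed by an optional cut-off. The function name `select_targets` and its parameters are hypothetical, and sorting in descending order of probability is an assumption; passing `top_n=1` corresponds to keeping only the sequence with the maximum occurrence probability.

```python
from typing import Dict, List, Optional, Tuple


def select_targets(candidates: Dict[Tuple[str, ...], float],
                   threshold: float,
                   top_n: Optional[int] = None) -> List[Tuple[str, ...]]:
    """Return the word segmentation sequences whose probability reaches the threshold.

    The result is ordered from most to least probable and, if top_n is given,
    truncated to the first top_n sequences.
    """
    effective = [(seq, p) for seq, p in candidates.items() if p >= threshold]
    # Assumed ordering: highest occurrence probability first.
    effective.sort(key=lambda item: item[1], reverse=True)
    if top_n is not None:
        effective = effective[:top_n]
    return [seq for seq, _ in effective]
```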
  • the word sequence data of the preset training corpus is obtained, so that words can be used directly when calculating the probability of occurrence.
  • Sequence data which saves time for calculating probability and helps improve audit efficiency.
  • the basic sentence is analyzed by word segmentation to obtain M word segmentation sequences, and then, for each word segmentation sequence, its occurrence probability is calculated based on the word sequence data of the preset training corpus, giving the occurrence probabilities of the M word segmentation sequences.
  • each segmented word in the target word segmentation sequence is used as a basic word segmentation contained in the basic sentence to ensure the accuracy of word segmentation, which helps improve the accuracy of the subsequent clustering and semantic evaluation based on the basic word segmentations.
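As an illustration of how word sequence data from a training corpus can yield an occurrence probability for each candidate sequence, here is a small bigram sketch. The text only specifies an N-gram model; the choice of N = 2, the add-one smoothing, the `<s>` start marker, and all function names are assumptions made for this example.

```python
from collections import Counter


def train_bigram_counts(corpus):
    """Count unigrams and bigrams over a corpus of already segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sequence_probability(sequence, unigrams, bigrams):
    """Occurrence probability of a word sequence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    prob = 1.0
    prev = "<s>"
    for word in sequence:
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        prev = word
    return prob


# Tiny hypothetical training corpus.
corpus = [["network", "content"], ["network", "content", "review"]]
unigrams, bigrams = train_bigram_counts(corpus)

p_good = sequence_probability(["network", "content"], unigrams, bigrams)
p_bad = sequence_probability(["net", "work", "content"], unigrams, bigrams)
```

Because the counts are computed once from the corpus, scoring each of the M candidate sequences only requires dictionary lookups, which is the time saving the text refers to.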
  • in step S50, the specific implementation of determining the comprehensive score of the content to be published according to the semantic score of each basic sentence is detailed as follows:
  • M_i is the semantic score of the i-th basic sentence
  • a and b are preset parameters
  • S_i is the weighted score of the i-th basic sentence
  • W is the comprehensive score of the content to be published
  • i and n are positive integers, and i ≤ n.
  • the semantic score can be used to express the degree of semantic specification.
  • a semantic score less than 0 indicates that the semantics of the basic sentence is not standardized.
  • the preset parameter a is set to a larger value than the preset parameter b, so that a non-standard basic sentence has a greater impact on the entire content to be published.
  • the values of the preset parameters a and b can be selected according to the actual situation, and there is no specific limitation here.
  • the semantic scores of different ranges are weighted and summarized by the preset formula to obtain the comprehensive score of the content to be published, which is beneficial to improve the rationality of the comprehensive score evaluation.
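The weighting idea can be illustrated with a short sketch. The preset formula itself is not reproduced in this text, so the rule below is an assumption inferred from the surrounding description: a negative semantic score is multiplied by a, a non-negative one by b, and the weighted scores are summed. The function name and the default values a = 2.0 and b = 1.0 are hypothetical.

```python
def comprehensive_score(semantic_scores, a=2.0, b=1.0):
    """Weight each semantic score and sum the results.

    Assumed rule (not confirmed by the text): S_i = a * M_i when M_i < 0,
    otherwise S_i = b * M_i, and W is the sum of all S_i.
    """
    weighted = [a * m if m < 0 else b * m for m in semantic_scores]
    return sum(weighted)


# Three basic sentences, one of which is scored as non-standard (negative).
scores = [1.0, -0.5, 0.5]
w = comprehensive_score(scores)
```

With a larger than b, a single non-standard sentence pulls the comprehensive score down more than a standard sentence of equal magnitude raises it.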
  • Fig. 6 shows a schematic block diagram of a device for reviewing network content publishing in one-to-one correspondence with the method for reviewing network content publishing in the foregoing embodiment.
  • the review device for publishing network content includes a request receiving module 10, a type matching module 20, a content analysis module 30, a semantic recognition module 40, a comprehensive scoring module 50 and a result determination module 60.
  • the detailed description of each functional module is as follows:
  • the request receiving module 10 is configured to obtain the current user information and the content to be published included in the review request if a review request for network content publishing sent by the client is received;
  • the type matching module 20 is used to match the current user information with each user information in the preset list type database to determine the user type corresponding to the current user information, where the list type database includes each user information and the user type corresponding to that user information;
  • the content parsing module 30 is configured to, if the user type corresponding to the current user information is an ordinary user, analyze the content to be published according to a preset sentence division method to obtain each basic sentence contained in the content to be published;
  • the semantic recognition module 40 is used to perform semantic recognition on each basic sentence by using natural language semantic recognition, and obtain the semantic score corresponding to each basic sentence;
  • the comprehensive scoring module 50 is used to determine the comprehensive score of the content to be published according to the semantic score of each basic sentence;
  • the result determination module 60 is used to compare the comprehensive score with a preset score threshold. If the comprehensive score is greater than the preset score threshold, confirm that the content to be published is legal, publish the content to be published, and send a message of approval to the client.
  • the verification device for publishing network content further includes:
  • the first review module 70 is configured to publish content to be published if the user type corresponding to the current user information is a whitelist user;
  • the second review module 80 is configured to remove the content to be published if the user type corresponding to the current user information is a blacklisted user, and send a message that the review failed to the client.
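The way the modules above route a request can be sketched as a single function. This is a simplified illustration: the dictionary standing in for the list type database, the string labels, the `score_fn` callback, and the choice to treat unknown users as ordinary users are all assumptions.

```python
def review(user_id, content, list_db, score_fn, score_threshold):
    """Route a review request by user type, then by comprehensive score."""
    # Assumption: a user missing from the list database is treated as ordinary.
    user_type = list_db.get(user_id, "ordinary")
    if user_type == "whitelist":
        return "approved"   # whitelist users publish directly
    if user_type == "blacklist":
        return "rejected"   # blacklist users are refused without scoring
    # Ordinary users: compare the comprehensive score with the preset threshold.
    return "approved" if score_fn(content) > score_threshold else "rejected"


# Hypothetical list database and a stand-in scoring function.
list_db = {"alice": "whitelist", "mallory": "blacklist"}
fake_score = lambda text: -1.0 if "spam" in text else 1.0

result_white = review("alice", "spam spam", list_db, fake_score, 0.0)
result_black = review("mallory", "hello", list_db, fake_score, 0.0)
result_ok = review("bob", "hello", list_db, fake_score, 0.0)
result_bad = review("bob", "spam", list_db, fake_score, 0.0)
```

Checking the user type first means the comparatively expensive semantic scoring only runs for ordinary users.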
  • semantic recognition module 40 includes:
  • the word segmentation unit 41 is used to perform word segmentation processing on the basic sentence through a preset word segmentation method to obtain the basic word segmentation contained in the basic sentence;
  • the clustering unit 42 is configured to convert the basic word segmentation into a word vector, and cluster the word vector through a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence;
  • the scoring unit 43 is used to calculate, for each basic sentence, the distance between the cluster center corresponding to the basic sentence and each preset word meaning vector, use the preset word meaning vector corresponding to the minimum distance as the target vector, and use the semantic score corresponding to the target vector as the semantic score corresponding to the basic sentence.
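The nearest-vector scoring performed by the scoring unit can be sketched as below. Several simplifications are assumed: the cluster center is taken as the plain mean of the word vectors (the text only says a preset clustering algorithm is used), the distance is Euclidean, and the preset word meaning vectors and their scores are made-up values.

```python
import math


def centroid(vectors):
    """Mean of the word vectors, used here as a stand-in cluster center."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def semantic_score(center, preset_vectors):
    """Return the score attached to the preset vector closest to the center.

    preset_vectors is a list of (vector, score) pairs.
    """
    _, score = min(preset_vectors, key=lambda item: math.dist(center, item[0]))
    return score


# Hypothetical 2-dimensional word vectors for one basic sentence.
word_vectors = [[1.0, 0.0], [0.0, 1.0]]
center = centroid(word_vectors)

# Hypothetical preset word meaning vectors with their semantic scores.
presets = [([1.0, 1.0], 0.8), ([-1.0, -1.0], -0.9)]
score = semantic_score(center, presets)
```

`math.dist` requires Python 3.8 or later; any other distance measure could be substituted by changing the key function.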
  • word segmentation unit 41 includes:
  • the training subunit 411 is used to obtain a preset training corpus, and use the N-gram model to analyze the preset training corpus to obtain word sequence data of the preset training corpus;
  • the parsing subunit 412 is used to perform word segmentation analysis on the basic sentence to obtain M word segmentation sequences;
  • the calculation subunit 413 is used to calculate, for each word segmentation sequence, the occurrence probability of that sequence according to the word sequence data of the preset training corpus, to obtain the occurrence probabilities of the M word segmentation sequences;
  • the selection subunit 414 is used to select, from the occurrence probabilities of the M word segmentation sequences, the word segmentation sequence corresponding to an occurrence probability that reaches the preset probability threshold as the target word segmentation sequence, and use each segmented word in the target word segmentation sequence as a basic word segmentation contained in the basic sentence.
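The text does not say how the parsing subunit produces its M candidate word segmentation sequences. One common approach, shown here purely as an assumed illustration, is to enumerate every way of splitting the sentence into words from a known vocabulary; the function name and the sample vocabulary are hypothetical.

```python
def enumerate_segmentations(text, vocabulary):
    """Return every split of text into consecutive words found in vocabulary."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            for rest in enumerate_segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results


# Hypothetical vocabulary; unspaced input stands in for text without word boundaries.
vocabulary = {"net", "work", "network", "content"}
candidates = enumerate_segmentations("networkcontent", vocabulary)
```

Each candidate can then be scored by the calculation subunit and filtered by the selection subunit. The exhaustive recursion grows quickly with sentence length, so it is only suitable for short sentences as written.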
  • the comprehensive scoring module 50 includes:
  • the score calculation unit 51 is used to calculate the comprehensive score of the content to be published using the following formula:
  • M_i is the semantic score of the i-th basic sentence
  • a and b are preset parameters
  • S_i is the weighted score of the i-th basic sentence
  • W is the comprehensive score of the content to be published
  • i and n are positive integers, and i ≤ n.
  • Each module in the above-mentioned network content publishing review device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above-mentioned modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • Fig. 7 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a server, and its internal structure diagram may be as shown in Figure 7.
  • the computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store the preset corpus and the preset word meaning vector.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, any one or more sets of steps of the above-disclosed method for reviewing network content publishing are realized.
  • a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • when the processor executes the computer program, the method for reviewing network content publishing in the foregoing embodiment is implemented, for example, steps S10 to S60 shown in FIG. 2.
  • alternatively, when the processor executes the computer program, the functions of the modules/units of the review device for publishing network content in the foregoing embodiment are implemented, for example, the functions of the modules 10 to 60 shown in FIG. 6. To avoid repetition, details are not repeated here.
  • a computer-readable storage medium is provided. The computer-readable storage medium is a volatile storage medium or a non-volatile storage medium and stores a computer program. When executed by a processor, the computer program implements the steps of the review method for network content publishing in the foregoing embodiment, or implements the functions of each module/unit in the review device for network content publishing in the foregoing embodiment. To avoid repetition, details are not repeated here.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and apparatus for reviewing network content publication, a computer device, and a storage medium. The method comprises: upon receiving a review request for network content publication, acquiring the current user information and the content to be published contained in the review request, and determining the user type corresponding to the current user information; if the user type corresponding to the current user information is an ordinary user, parsing the content to obtain basic sentences, then using natural language semantic recognition to perform semantic recognition on the basic sentences to obtain the semantic scores corresponding to the basic sentences, and then determining a comprehensive score of the content according to the semantic scores of the basic sentences; and determining whether the content is legal according to the comprehensive score and a preset score threshold. The invention achieves intelligent semantic recognition of network content; in addition, network content publication is reviewed according to the recognized semantic meaning to determine whether the publication is reasonable, which improves the degree of intelligence and the accuracy of network content publication review.
PCT/CN2020/085582 2019-06-17 2020-04-20 Method and apparatus for reviewing network content publication, computer device, and storage medium WO2020253350A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910522440.6A CN110377900A (zh) 2019-06-17 2019-06-17 网络内容发布的审核方法、装置、计算机设备及存储介质
CN201910522440.6 2019-06-17

Publications (1)

Publication Number Publication Date
WO2020253350A1 true WO2020253350A1 (fr) 2020-12-24

Family

ID=68248961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085582 WO2020253350A1 (fr) 2019-06-17 2020-04-20 Method and apparatus for reviewing network content publication, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110377900A (fr)
WO (1) WO2020253350A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783917A (zh) * 2021-01-04 2021-05-11 广州海量数据库技术有限公司 工单审核方法及装置、存储介质及电子设备
CN113835730A (zh) * 2021-09-24 2021-12-24 支付宝(杭州)信息技术有限公司 一种更新审核程序的方法、装置、设备及介质

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377900A (zh) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 网络内容发布的审核方法、装置、计算机设备及存储介质
CN110929055B (zh) * 2019-11-15 2023-05-02 北京达佳互联信息技术有限公司 多媒体质量检测方法、装置、电子设备及存储介质
CN111125023A (zh) * 2019-11-15 2020-05-08 北京十分科技有限公司 文件的审核、审核控制、发布方法及对应装置
CN111209363B (zh) * 2019-12-25 2024-02-09 华为技术有限公司 语料数据处理方法、装置、服务器和存储介质
CN111309938A (zh) * 2020-01-22 2020-06-19 恒大新能源汽车科技(广东)有限公司 一种多媒体文件处理方法及装置
CN111414515A (zh) * 2020-03-17 2020-07-14 中国建设银行股份有限公司 一种资源审核方法、装置、设备及存储介质
CN113761182A (zh) * 2020-06-17 2021-12-07 北京沃东天骏信息技术有限公司 一种确定业务问题的方法和装置
CN112163585B (zh) * 2020-11-10 2023-11-10 上海七猫文化传媒有限公司 文本的审核方法、装置、计算机设备及存储介质
CN112464036B (zh) * 2020-11-24 2023-06-16 行吟信息科技(武汉)有限公司 一种违规数据的审核方法及装置
CN112906387B (zh) * 2020-12-25 2023-08-04 北京百度网讯科技有限公司 风险内容识别方法、装置、设备、介质和计算机程序产品
CN113010708B (zh) * 2021-03-11 2023-08-25 上海麦糖信息科技有限公司 针对违规朋友圈内容以及违规聊天内容的审核方法及系统
CN114245160A (zh) * 2021-12-07 2022-03-25 北京达佳互联信息技术有限公司 信息处理方法、装置、电子设备及存储介质
CN116822494B (zh) * 2023-08-28 2023-12-08 深圳有咖互动科技有限公司 广播剧信息处理方法、装置、电子设备和计算机可读介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446970A (zh) * 2008-12-15 2009-06-03 腾讯科技(深圳)有限公司 一种对用户发布的文本内容审核处理的方法及其装置
CN102096680A (zh) * 2009-12-15 2011-06-15 北京大学 信息有效性分析的方法和装置
CN102098332A (zh) * 2010-12-30 2011-06-15 北京新媒传信科技有限公司 一种内容审核方法和装置
US8224950B2 (en) * 1997-03-25 2012-07-17 Symantec Corporation System and method for filtering data received by a computer system
CN109635073A (zh) * 2018-10-18 2019-04-16 深圳壹账通智能科技有限公司 论坛社区应用管理方法、装置、设备及计算机可读存储介质
CN110377900A (zh) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 网络内容发布的审核方法、装置、计算机设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6334697B2 (ja) * 2013-11-08 2018-05-30 グーグル エルエルシー ディスプレイコンテンツのイメージを抽出し、生成するシステムおよび方法
CN109800307B (zh) * 2019-01-18 2022-08-02 深圳壹账通智能科技有限公司 产品评价的分析方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN110377900A (zh) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2020253350A1 (fr) Method and apparatus for reviewing network content publication, computer device, and storage medium
CN110765244B (zh) 获取应答话术的方法、装置、计算机设备及存储介质
CN108304375B (zh) 一种信息识别方法及其设备、存储介质、终端
US11023478B2 (en) Determining temporal categories for a domain of content for natural language processing
US7783476B2 (en) Word extraction method and system for use in word-breaking using statistical information
US20190347571A1 (en) Classifier training
US20070192309A1 (en) Method and system for identifying sentence boundaries
US9483582B2 (en) Identification and verification of factual assertions in natural language
CN110929125B (zh) 搜索召回方法、装置、设备及其存储介质
CN110928994A (zh) 相似案例检索方法、相似案例检索装置和电子设备
WO2021114841A1 (fr) Procédé de génération de rapport d'utilisateur, et dispositif terminal
CN112328742A (zh) 基于人工智能的培训方法、装置、计算机设备及存储介质
WO2020077825A1 (fr) Procédé, appareil et dispositif de gestion d'application de forum/communauté, ainsi que support de stockage lisible
CN111767393A (zh) 一种文本核心内容提取方法及装置
CN111985228A (zh) 文本关键词提取方法、装置、计算机设备和存储介质
CN114896305A (zh) 一种基于大数据技术的智慧互联网安全平台
WO2023240878A1 (fr) Procédé et appareil de reconnaissance de ressource, et dispositif et support d'enregistrement
CN111552798B (zh) 基于名称预测模型的名称信息处理方法、装置、电子设备
CN113343108A (zh) 推荐信息处理方法、装置、设备及存储介质
WO2022134834A1 (fr) Procédé, appareil et dispositif de prédiction d'événement potentiel, et support de stockage
CN111930949B (zh) 搜索串处理方法、装置、计算机可读介质及电子设备
TWI734085B (zh) 使用意圖偵測集成學習之對話系統及其方法
WO2023035529A1 (fr) Procédé et appareil d'interrogation intelligente d'informations basés sur la reconnaissance d'intention, dispositif et support
CN110941713A (zh) 基于主题模型的自优化金融资讯版块分类方法
CN113177164B (zh) 基于大数据的多平台协同新媒体内容监控管理系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20827002

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20827002

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29/03/2022)
