CN110377900A - Checking method, device, computer equipment and the storage medium of Web content publication - Google Patents

Publication number
CN110377900A
Authority
CN
China
Prior art keywords: content, released, sentence, user information, basic sentence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910522440.6A
Other languages
Chinese (zh)
Inventor
夏新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910522440.6A
Publication of CN110377900A
Priority to PCT/CN2020/085582 (WO2020253350A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an auditing method, apparatus, computer device and storage medium for web content publication. The method includes: on receiving an audit request for web content publication, obtaining the current user information and the content to be released contained in the audit request, and determining the user type corresponding to the current user information; if that user type is ordinary user, parsing the content to be released to obtain basic sentences; performing semantic recognition on the basic sentences by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence; determining the comprehensive score of the content to be released from the semantic scores of the basic sentences; and confirming whether the content to be released is legal according to the comprehensive score and a preset score threshold. The method thereby performs semantic recognition on web content intelligently and audits whether its publication is acceptable according to the recognized semantics, which improves the degree of intelligence and the accuracy of auditing web content publication.

Description

Checking method, device, computer equipment and the storage medium of Web content publication
Technical field
The present invention relates to the field of natural language processing, and in particular to an auditing method, apparatus, computer device and storage medium for web content publication.
Background
With the rapid development of science and technology and the steady improvement in quality of life, more and more people use the network to interact and learn, and forums of all kinds have become a popular way to communicate online. At present, tens of thousands of forum users post and reply every day, which makes communication increasingly convenient. Inevitably, however, a small number of people, driven by personal emotions, post vulgar, violent, superstitious or reactionary remarks on forums, and these remarks hinder normal communication among the wider community of users. It is therefore necessary to audit the content a forum user submits when posting or replying, so as to maintain a positive and healthy environment in the forum.
In the prior art, auditing relies mainly on keyword search. This approach can only match against preset keywords to judge whether the published content is compliant. It is limited by the keyword configuration, and users can easily avoid the keywords to publish harmful content, so the degree of intelligence and the accuracy of auditing web content publication are both low.
Summary of the invention
Embodiments of the present invention provide an auditing method, apparatus, computer device and storage medium for web content publication, to solve the problem that auditing web content publication by keyword matching yields a low degree of intelligence and low accuracy.
An auditing method for web content publication, comprising:
if an audit request for web content publication sent by a client is received, obtaining the current user information and the content to be released contained in the audit request;
matching the current user information against each piece of user information in a preset list-type database to determine the user type corresponding to the current user information, wherein the list-type database contains each piece of user information and the user type corresponding to that user information;
if the user type corresponding to the current user information is ordinary user, parsing the content to be released according to a preset sentence division mode to obtain each basic sentence contained in the content to be released;
performing semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence;
determining the comprehensive score of the content to be released according to the semantic score of each basic sentence;
comparing the comprehensive score with a preset score threshold and, if the comprehensive score is greater than the preset score threshold, confirming that the content to be released is legal, publishing the content to be released, and sending an audit-passed message to the client.
An auditing apparatus for web content publication, comprising:
a request receiving module, configured to, if an audit request for web content publication sent by a client is received, obtain the current user information and the content to be released contained in the audit request;
a type matching module, configured to match the current user information against each piece of user information in a preset list-type database to determine the user type corresponding to the current user information, wherein the list-type database contains each piece of user information and the user type corresponding to that user information;
a content parsing module, configured to, if the user type corresponding to the current user information is ordinary user, parse the content to be released according to a preset sentence division mode to obtain each basic sentence contained in the content to be released;
a semantic recognition module, configured to perform semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence;
a comprehensive scoring module, configured to determine the comprehensive score of the content to be released according to the semantic score of each basic sentence;
a result determining module, configured to compare the comprehensive score with a preset score threshold and, if the comprehensive score is greater than the preset score threshold, confirm that the content to be released is legal, publish the content to be released, and send an audit-passed message to the client.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above auditing method for web content publication when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above auditing method for web content publication when executed by a processor.
With the auditing method, apparatus, computer device and storage medium for web content publication provided by the embodiments of the present invention, when an audit request for web content publication sent by a client is received, the current user information and the content to be released contained in the audit request are obtained, and the current user information is compared with each piece of user information in a preset list-type database to determine the corresponding user type. If the user type is ordinary user, the content to be released is parsed according to a preset sentence division mode to obtain each basic sentence it contains. Semantic recognition is then performed on each basic sentence by means of natural language semantic recognition to obtain the semantic score of each basic sentence, and the comprehensive score of the content to be released is determined from those semantic scores. Finally, the comprehensive score is compared with a preset score threshold; when the comprehensive score is greater than the threshold, the content to be released is confirmed to be legal, it is published, and an audit-passed message is sent to the client. Web content is thus subjected to semantic recognition intelligently and its publication is audited according to the recognized semantics, which improves the degree of intelligence and the accuracy of auditing web content publication.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the application environment of the auditing method for web content publication provided by an embodiment of the present invention;
Fig. 2 is an implementation flowchart of the auditing method for web content publication provided by an embodiment of the present invention;
Fig. 3 is a flowchart of auditing non-ordinary users in the auditing method for web content publication provided by an embodiment of the present invention;
Fig. 4 is an implementation flowchart of step S40 in the auditing method for web content publication provided by an embodiment of the present invention;
Fig. 5 is an implementation flowchart of step S41 in the auditing method for web content publication provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the auditing apparatus for web content publication provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the computer device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 shows the application environment of the auditing method for web content publication provided by an embodiment of the present invention. The method is applied to auditing web content publication in network forums, live streaming platforms or other kinds of web community. The audit scenario includes a client, a server and a management terminal; the server is connected to the client and to the management terminal over a network. The client sends an audit request for web content publication to the server. After receiving the audit request, the server determines the user type and selects the audit mode according to it. When the user type is ordinary user, the server obtains the content to be released, performs semantic analysis to obtain the semantic score of the content, and then determines whether the content is legal; when it is not, the server sends a corresponding prompt to the management terminal. The client and the management terminal may be, but are not limited to, smart terminal devices such as mobile phones, tablet computers and personal computers (PCs). The server may be implemented as an independent server or as a cluster of multiple servers.
Referring to Fig. 2, Fig. 2 shows an auditing method for web content publication provided by an embodiment of the present invention. The method is described as applied to the server in Fig. 1, in detail as follows:
S10: if an audit request for web content publication sent by a client is received, obtain the current user information and the content to be released contained in the audit request.
Specifically, when communicating on a forum through the client, the user first edits the content to be released and then clicks the submit button of the client. The client sends an audit request containing the user information and the content to be released to the server, and the server receives the user information and the content to be released contained in the audit request over a network transport protocol.
The user information includes, but is not limited to, user account information. The server determines the user type from the user account information. In this embodiment, the content to be released by users of different user types is audited in the audit mode corresponding to the user type, which improves the efficiency of auditing web content publication.
The content to be released is what the user has edited on the client for uploading to a forum or other web community in order to interact with other network users, such as text, links, images and video.
The network transport protocol includes, but is not limited to, the Internet Control Message Protocol (ICMP), the Address Resolution Protocol (ARP) and the File Transfer Protocol (FTP).
S20: match the current user information against each piece of user information in the preset list-type database to determine the user type corresponding to the current user information, wherein the list-type database contains each piece of user information and the user type corresponding to that user information.
Specifically, the server stores a preset list-type database containing the user information of all registered users and the user type corresponding to each piece of user information. The server queries this database by traversal to determine the user type of the user information obtained in step S10.
The user types in the preset list-type database may include whitelist user, blacklist user and ordinary user. User types are distinguished by the user's credit grade. For example, users on the administrator list have a relatively high credit grade and are generally classed as whitelist users. Users who have repeatedly been suspected of violating the rules of the web community have a relatively low credit grade; once the credit grade falls to a certain level, the user is placed on the list of blacklist users.
For user information whose user type is ordinary user, the corresponding audit request needs a further intelligent assessment, and the audit result is determined from the assessment result.
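The traversal query in step S20 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the in-memory list standing in for the list-type database, the field names `account` and `user_type`, and the fallback to "ordinary" for unknown accounts are all assumptions.

```python
from typing import Dict, List

# Hypothetical contents of the preset list-type database: one record per
# registered user. Field names and values are illustrative assumptions.
LIST_TYPE_DB: List[Dict[str, str]] = [
    {"account": "admin01", "user_type": "whitelist"},
    {"account": "spammer9", "user_type": "blacklist"},
    {"account": "alice", "user_type": "ordinary"},
]


def lookup_user_type(account: str, db: List[Dict[str, str]] = LIST_TYPE_DB) -> str:
    """Traverse the database and return the user type recorded for `account`.

    Accounts that are not found are treated as ordinary users; this default
    is an assumption of the sketch.
    """
    for record in db:
        if record["account"] == account:
            return record["user_type"]
    return "ordinary"
```

A production system would more likely use an indexed lookup than a linear scan; the loop is kept here only because the text describes a traversal query.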
S30: if the user type corresponding to the current user information is ordinary user, parse the content to be released according to the preset sentence division mode to obtain each basic sentence contained in the content to be released.
Specifically, when the user type corresponding to the user information is ordinary user, the content to be released is parsed according to the preset sentence division mode to obtain each basic sentence it contains.
In this embodiment, the preset sentence division mode may use regular-expression matching against preset separators: every position where a preset separator is matched is taken as a split point, and the content to be released is cut at these points to obtain each basic sentence it contains.
The preset separators include, but are not limited to, paragraph marks, line breaks and punctuation marks. They can be configured according to actual needs and are not limited here.
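A minimal sketch of separator-based sentence division using Python's `re` module is given below. The separator set (line breaks plus common Chinese and Western sentence-ending punctuation) is an assumption; the patent leaves the separators configurable.

```python
import re
from typing import List

# Assumed separator set: line breaks plus common Chinese and Western
# sentence-ending punctuation. A run of separators counts as one split point.
SEPARATORS = re.compile(r"[\n。！？!?;；.]+")


def split_basic_sentences(content: str) -> List[str]:
    """Cut `content` at every run of separators and drop empty pieces."""
    pieces = SEPARATORS.split(content)
    return [p.strip() for p in pieces if p.strip()]
```

Treating "." as a separator also splits decimals and abbreviations, so a real deployment would need a more careful pattern.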
S40: perform semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence.
Specifically, semantic recognition is performed on each basic sentence by means of natural language semantic recognition, and the semantics of each basic sentence are scored according to preset scoring conditions to obtain the semantic score of each basic sentence.
Natural language semantic recognition, i.e. natural language processing (NLP), is a subfield of artificial intelligence (AI) that uses machine learning to understand and parse natural language in order to solve problems in the natural language domain. The main applications of NLP include, but are not limited to: text-to-speech / speech synthesis, speech recognition, Chinese word segmentation, part-of-speech tagging, syntactic parsing, text categorization, information retrieval, automatic summarization and text proofreading.
S50: determine the comprehensive score of the content to be released according to the semantic score of each basic sentence.
Specifically, the semantic scores of the basic sentences are weighted and aggregated according to a preset weighting scheme to obtain the comprehensive score of the content to be released.
The preset weighting scheme can be set according to actual needs, for example by assigning different weighting coefficients to semantic scores in different ranges.
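One possible range-based weighting scheme is sketched below as a weighted average. The 0 to 100 score scale, the range boundaries and the weights are all hypothetical; the patent only states that different score ranges may receive different coefficients.

```python
from typing import List


def weight_for(score: float) -> float:
    """Hypothetical range-based weights: low scores count more heavily."""
    if score < 40:
        return 3.0
    if score < 70:
        return 2.0
    return 1.0


def comprehensive_score(sentence_scores: List[float]) -> float:
    """Weighted average of the per-sentence semantic scores."""
    if not sentence_scores:
        raise ValueError("at least one basic sentence is required")
    weights = [weight_for(s) for s in sentence_scores]
    total = sum(w * s for w, s in zip(weights, sentence_scores))
    return total / sum(weights)
```

Weighting low scores more heavily means a single problematic sentence pulls the comprehensive score down further than a plain average would, which is one plausible reason to weight by range.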
S60: compare the comprehensive score with the preset score threshold; if the comprehensive score is greater than the preset score threshold, confirm that the content to be released is legal, publish the content to be released, and send an audit-passed message to the client.
Specifically, the server is preconfigured with a score threshold. The comprehensive score is compared with this preset score threshold; when the comprehensive score is greater than the threshold, the content to be released is confirmed to be legal, the content is published, and an audit-passed message is sent to the client.
Note that when the comprehensive score is less than or equal to the preset score threshold, the content to be released is considered a possible violation. The server refuses to publish the content, sends an audit-failed prompt to the client, and records the audit request for the content so that administrators can handle it later.
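The comparison in step S60, including the record kept for rejected requests, can be sketched as follows. The threshold value, the message strings and the list used as a record are assumptions made for illustration.

```python
from typing import List

DEFAULT_THRESHOLD = 60.0  # hypothetical value; the patent leaves it configurable


def audit_decision(content: str, score: float, rejected_log: List[str],
                   threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return the message for the client and record rejected content."""
    if score > threshold:
        return "audit passed"
    # A score equal to or below the threshold is treated as a possible
    # violation and kept for later handling by administrators.
    rejected_log.append(content)
    return "audit failed"
```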
In this embodiment, when an audit request for web content publication sent by a client is received, the current user information and the content to be released contained in the audit request are obtained, and the current user information is compared with each piece of user information in the preset list-type database to determine the corresponding user type. If the user type is ordinary user, the content to be released is parsed according to the preset sentence division mode to obtain each basic sentence it contains. Semantic recognition is then performed on each basic sentence by means of natural language semantic recognition to obtain the semantic score of each basic sentence, and the comprehensive score of the content to be released is determined from those semantic scores. Finally, the comprehensive score is compared with the preset score threshold; when the comprehensive score is greater than the threshold, the content to be released is confirmed to be legal, it is published, and an audit-passed message is sent to the client. Web content is thus subjected to semantic recognition intelligently and its publication is audited according to the recognized semantics, which improves the degree of intelligence and the accuracy of auditing web content publication.
In one embodiment, referring to Fig. 3, after step S20 the auditing method for web content publication further includes:
S70: if the user type corresponding to the current user information is whitelist user, publish the content to be released.
Specifically, when the traversal query of the preset list-type database determines that the user type corresponding to the current user information is whitelist user, the content to be released is published directly.
S80: if the user type corresponding to the current user information is blacklist user, remove the content to be released and send an audit-failed message to the client.
Specifically, when the traversal query of the preset list-type database determines that the user type corresponding to the current user information is blacklist user, it is determined that the semantic information in the content to be released need not be audited; the content to be released is deleted directly and an audit-failed message is sent to the client.
Note that steps S70 and S80 have no required order of execution; they may be executed in parallel, and no limitation is imposed here.
In this embodiment, whitelist users and blacklist users are audited quickly in a preset manner, without semantic recognition of the content to be released by users of these two types, which improves the efficiency of auditing web content publication.
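The three audit paths can be sketched as a single dispatch function. The user type strings and return values are assumptions, and `semantic_audit` is a hypothetical callable standing in for steps S30 to S60.

```python
from typing import Callable


def route_audit(user_type: str, content: str,
                semantic_audit: Callable[[str], bool]) -> str:
    """Choose the audit path from the user type.

    `semantic_audit` stands in for steps S30 to S60 and returns True when
    the content passes.
    """
    if user_type == "whitelist":
        return "published"   # S70: publish without semantic audit
    if user_type == "blacklist":
        return "rejected"    # S80: discard without semantic audit
    return "published" if semantic_audit(content) else "rejected"
```

Only ordinary users reach the semantic audit, which is where the efficiency gain described above comes from.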
On the basis of the embodiment corresponding to Fig. 2, a specific embodiment is used below to describe in detail how step S40 performs semantic recognition on each basic sentence by means of natural language semantic recognition and obtains the semantic score corresponding to each basic sentence.
Referring to Fig. 4, Fig. 4 shows the specific implementation flow of step S40 provided by an embodiment of the present invention, in detail as follows:
S41: perform word segmentation on the basic sentence using a preset segmentation mode to obtain the basic word segments contained in the basic sentence.
Specifically, each basic sentence obtained in step S30 is segmented using the preset segmentation mode to obtain the basic word segments contained in each basic sentence.
The preset segmentation mode includes, but is not limited to, using a third-party segmentation tool or a segmentation algorithm.
Common third-party segmentation tools include, but are not limited to, the Stanford NLP segmenter, the ICTCLAS segmentation system, the ansj segmentation tool and the HanLP Chinese segmentation tool.
Segmentation algorithms include, but are not limited to, the forward Maximum Matching (MM) algorithm, the Reverse Direction Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, the Hidden Markov Model (HMM) and the N-gram model.
Extracting basic word segments by segmentation filters out meaningless words in the basic sentences and also makes it easier to generate word vectors from the basic word segments later.
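Of the segmentation algorithms listed above, forward maximum matching is the simplest to illustrate. The sketch below is a generic version of that algorithm, not the patent's implementation; the vocabulary and the maximum word length are hypothetical parameters.

```python
from typing import List, Set


def forward_max_match(sentence: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation using the longest dictionary match."""
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a word matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

Because the match is greedy, the result depends heavily on the vocabulary: with the words 研究, 研究生, 生命 and 起源, the sentence 研究生命起源 is cut as 研究生 / 命 / 起源 rather than 研究 / 生命 / 起源, which is why bidirectional and statistical methods are also listed.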
S42: convert the basic word segments into word vectors and cluster the word vectors using a preset clustering algorithm to obtain the cluster centre corresponding to each basic sentence.
In artificial intelligence, language representation mainly refers to a formal or mathematical description of language, so that language can be represented in a computer and processed automatically by computer programs. A word vector in the embodiments of the present invention is the representation of a basic word segment in vector form.
Specifically, each basic word segment is first converted into its corresponding word vector. The word vectors are then clustered using the preset clustering algorithm to obtain the cluster centre of the word vector for each basic word segment, and clustering continues over the cluster centres of the basic word segments in the same basic sentence to obtain the cluster centre corresponding to that basic sentence.
Clustering, also called cluster analysis, is a statistical method for classifying samples or indicators and an important algorithm in data mining. Clustering algorithms include, but are not limited to, K-Means clustering, mean-shift clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), expectation-maximization clustering based on Gaussian mixture models, agglomerative hierarchical clustering and graph community detection.
Preferably, this embodiment uses the K-Means clustering algorithm: the word vectors of the basic word segments are clustered to determine the class of each basic word segment, and the basic sentence is then clustered to obtain its cluster centre.
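A plain K-Means routine over word vectors is sketched below. How the word vectors are produced is not specified in the text, so the input is simply a list of numeric vectors; seeding the centres with the first k vectors is an assumption made to keep the sketch deterministic.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def k_means(vectors: List[Vector], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain K-Means; the first k vectors seed the centres."""
    centres = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_distance(v, centres[j]))
            groups[nearest].append(v)
        # Keep the old centre when a group ends up empty.
        centres = [mean(g) if g else centres[j] for j, g in enumerate(groups)]
    return centres
```

With k set to 1 the routine returns the mean of all vectors, which is one simple way to read "the cluster centre of a basic sentence"; the patent does not fix the number of clusters.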
S43: for each basic sentence, calculate the distance between the cluster centre of the basic sentence and each preset semantic vector, take the preset semantic vector with the smallest distance as the target vector, and take the semantic score corresponding to the target vector as the semantic score of the basic sentence.
Specifically, the server stores in advance preset semantic vectors that represent specified semantics, and each preset semantic vector has a corresponding preset semantic score. For each basic sentence, the distance between the cluster centre of the basic sentence and each of these preset semantic vectors is calculated, the preset semantic vector with the smallest distance is taken as the target vector, and the semantic score corresponding to the target vector is taken as the semantic score of the basic sentence.
Preferably, in this embodiment, after the target vector is determined, a scoring parameter may also be calculated from the distance between the basic sentence and the target vector, and the semantic score of the basic sentence is then determined from the scoring parameter and the semantic score corresponding to the target vector.
In this embodiment, the basic sentence is segmented using the preset segmentation mode to obtain the basic word segments it contains; the basic word segments are converted into word vectors, and the word vectors are clustered using the preset clustering algorithm to obtain the cluster centre of each basic sentence. For each basic sentence, the distance between its cluster centre and each preset semantic vector is calculated, the preset semantic vector with the smallest distance is taken as the target vector, and the semantic score corresponding to the target vector is taken as the semantic score of the basic sentence. This realizes semantic scoring of basic sentences and improves the degree of intelligence and the efficiency of auditing.
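The nearest-vector lookup of step S43 can be sketched as follows. The preset vectors, their scores and the use of Euclidean distance are assumptions; the patent does not name a distance metric.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical preset semantic vectors, each paired with a preset score.
PRESET_SEMANTICS: List[Tuple[List[float], float]] = [
    ([1.0, 0.0], 90.0),  # e.g. neutral discussion
    ([0.0, 1.0], 20.0),  # e.g. abusive language
]


def semantic_score(centre: Sequence[float]) -> float:
    """Score of the preset vector closest (Euclidean) to `centre`."""
    _, score = min(PRESET_SEMANTICS, key=lambda item: math.dist(centre, item[0]))
    return score
```

The optional scoring parameter mentioned above could be layered on top by scaling the returned score according to the minimum distance, but its form is not given in the text.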
On the basis of the embodiment corresponding to Fig. 2, a specific embodiment is used below to describe in detail how the word segmentation mentioned in step S41 is implemented, that is, how a basic sentence is segmented by the preset word segmentation method to obtain the basic word segments it contains.
Referring to Fig. 5, Fig. 5 shows the specific implementation flow of step S41 provided in an embodiment of the present invention, detailed as follows:
S411: Obtain a preset training corpus, and analyze the preset training corpus using an N-gram model to obtain the word sequence data of the preset training corpus.
Specifically, the training corpus is a corpus used to train on related language material so that basic sentences in natural language can be evaluated. Each item of language material in the preset training corpus is statistically analyzed with the N-gram model to obtain the number of times an item H appears after another item I, and hence the word sequence data for the word sequence "item I + item H". In the embodiments of the present invention, the content of the training corpus includes but is not limited to: specialized information corresponding to the topics of forums or online communities, network corpora, general corpora and the like.
Here, a corpus refers to a large-scale electronic text library that has been scientifically sampled and processed. Corpora are a basic resource for linguistic research and the main resource of empirical language research methods; they are applied in lexicography, language teaching, traditional language research, and statistics-based or example-based research in natural language processing. Language material, i.e. linguistic data, is the object of linguistic research and the basic unit of which a corpus is composed.
For example, in a specific embodiment, the preset training corpus is obtained by crawling popular online topics and current-affairs news with a web crawler, yielding a corpus for the "current affairs" field.
A word sequence is a sequence composed of at least two items of language material in a certain order. The word sequence frequency is the ratio of the number of occurrences of the word sequence to the total number of occurrences of word segments (Word Segmentation) in the entire corpus, where a word segment here refers to a word sequence obtained by combining consecutive words according to a preset combination method. For example, if the word sequence "love to eat tomatoes" occurs 100 times in the entire corpus and the occurrences of all word segments in the corpus total 100000, the word sequence frequency of "love to eat tomatoes" is 100 / 100000 = 0.001.
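A minimal sketch of how such word sequence data might be gathered is shown below. It counts runs of adjacent word segments in an already segmented corpus and computes the frequency ratio defined above. The function names, the choice of n = 2 and the toy corpus are assumptions for illustration, not details taken from the source.

```python
from collections import Counter

def count_word_sequences(segmented_corpus, n=2):
    """Count every run of n adjacent segments in a corpus of segmented sentences."""
    counts = Counter()
    for sentence in segmented_corpus:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts

def word_sequence_frequency(sequence_count, total_segment_count):
    """Occurrences of one word sequence divided by occurrences of all segments."""
    return sequence_count / total_segment_count

corpus = [["love", "eat", "tomato"], ["love", "eat", "rice"]]  # toy data
bigrams = count_word_sequences(corpus)
frequency = word_sequence_frequency(100, 100000)  # the figures from the example above
```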
The N-gram model is a language model commonly used in large-vocabulary continuous text semantic recognition. It uses the collocation information between adjacent words in the context: when continuous text without spaces needs to be converted into a Chinese character string (i.e. a sentence), the sentence with the maximum probability can be calculated, so that the conversion to Chinese characters is automatic and requires no manual selection by the user, which improves the accuracy with which word sequences are determined.
It is worth noting that, to improve the audit efficiency of network content publication, in this embodiment the process of obtaining the preset training corpus and analyzing it with the N-gram model to obtain the word sequence data may be carried out before the audit. The resulting word sequence data is stored and is called directly when semantic recognition needs to be performed on content to be published.
S412: Perform segmentation parsing on the basic sentence to obtain M segmentation sequences.
Specifically, a basic sentence may be understood differently depending on where it is divided. To ensure that the sentence is understood correctly, after obtaining a basic sentence the server obtains the M segmentation sequences of that sentence, where M is the total number of all segmentation sequences that may occur.
Each segmentation sequence is one result of dividing a basic sentence, namely a word sequence containing at least two word segments.
For example, in a specific embodiment, a basic sentence is "It is really hot today". Parsing this basic sentence gives segmentation sequence A: "today", "really", "hot", and segmentation sequence B: "present", "naive", "hot", and so on. The two sequences divide the same characters of the original Chinese sentence at different positions.
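The enumeration of candidate segmentation sequences can be sketched with a small recursive function that tries every dictionary word at the start of the text. The vocabulary below is hypothetical, and the Chinese sentence "今天真热" is an assumed reconstruction of the example sentence behind the translation; neither is given in the source.

```python
def segmentations(text, vocabulary):
    """Return every way to split `text` into words drawn from `vocabulary`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocabulary:
            for rest in segmentations(text[end:], vocabulary):
                results.append([word] + rest)
    return results

# Hypothetical vocabulary; the sentence is an assumed reconstruction of the example.
vocabulary = {"今", "今天", "天真", "真", "热"}
candidates = segmentations("今天真热", vocabulary)
```

With this vocabulary the function yields two candidates, which play the role of segmentation sequences A and B, so M would be 2 here.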
S413: For each segmentation sequence, calculate its occurrence probability according to the word sequence data of the preset training corpus, obtaining the occurrence probabilities of the M segmentation sequences.
Specifically, the occurrence probability of each segmentation sequence is calculated from the word sequence data obtained in step S411, giving the occurrence probabilities of the M segmentation sequences.
The occurrence probability of a segmentation sequence can be calculated using the Markov assumption: the occurrence of the Y-th word depends only on the preceding Y-1 words and is independent of any other word, and the probability of the whole sentence is the product of the occurrence probabilities of the individual words. These probabilities can be obtained directly from the corpus by counting the number of times Y words occur together. That is:
P(T) = P(W_1 W_2 ... W_Y) = P(W_1) P(W_2 | W_1) ... P(W_Y | W_1 W_2 ... W_{Y-1})    Formula (1)
Here, P(T) is the probability that the whole sentence occurs, and P(W_Y | W_1 W_2 ... W_{Y-1}) is the probability that the Y-th word segment appears after the word sequence formed by the preceding Y-1 word segments.
For example, after the sentence "The Chinese nation is a nation with a long history of civilization" is recognized, one of the resulting segmentation sequences is: "Chinese nation", "is", "a", "having", "long", "civilization", "history", "of", "nation". There are 9 word segments in total, so Y = 9, and the last factor is the probability that the segment "nation" appears after the word sequence formed by the preceding 8 segments, "The Chinese nation is a ... having a long history of civilization".
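The product of conditional probabilities can be estimated from counts. The sketch below uses the bigram simplification, in which each segment is conditioned only on the one before it; this choice of N = 2 is an assumption, since the source does not fix N. The counts are toy values.

```python
from collections import Counter

def sequence_probability(segments, unigram_counts, bigram_counts):
    """Approximate P(W_1 ... W_Y) with the bigram form of the chain rule."""
    total = sum(unigram_counts.values())
    probability = unigram_counts[segments[0]] / total  # P(W_1)
    for previous, current in zip(segments, segments[1:]):
        if unigram_counts[previous] == 0:
            return 0.0  # unseen history: no estimate available
        # P(W_k | W_{k-1}) estimated as count(W_{k-1} W_k) / count(W_{k-1})
        probability *= bigram_counts[(previous, current)] / unigram_counts[previous]
    return probability

# Toy counts standing in for the stored word sequence data.
unigrams = Counter({"today": 4, "really": 2, "hot": 2})
bigrams = Counter({("today", "really"): 2, ("really", "hot"): 1})
p = sequence_probability(["today", "really", "hot"], unigrams, bigrams)
```

A production system would normally add smoothing so that a single unseen pair does not force the whole product to zero; that refinement is outside what the source describes.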
S414: From the occurrence probabilities of the M segmentation sequences, select the segmentation sequences whose occurrence probability reaches a preset probability threshold as target segmentation sequences, and take each word segment in the target segmentation sequences as a basic word segment contained in the basic sentence.
Specifically, step S413 yields one occurrence probability for each segmentation sequence, M in total. Each of these is compared with the preset probability threshold; the occurrence probabilities greater than or equal to the threshold are taken as valid occurrence probabilities, and the segmentation sequences corresponding to the valid occurrence probabilities are taken as target segmentation sequences.
Comparison with the preset probability threshold filters out segmentation sequences whose occurrence probability does not meet the requirement, so that the selected target segmentation sequences are closer to the meaning expressed in natural language, which improves the accuracy of semantic recognition.
It should be noted that if the calculated occurrence probabilities of all M segmentation sequences are less than the preset probability threshold, the content to be published is determined not to comply with the specification. In this case, "audit failed" is taken as the audit result, and a reminder message such as "Please abide by the norms of online speech and be a civilized netizen" is sent to the client. If the number of target segmentation sequences is greater than a preset number, they are sorted by their occurrence probabilities and the preset number of top-ranked segmentation sequences are selected as the target segmentation sequences. For example, if the preset number is 5, the valid occurrence probabilities are sorted, the top 5 are selected, and the 5 segmentation sequences corresponding to them are taken as the target segmentation sequences.
Preferably, in this embodiment, the segmentation sequence with the maximum occurrence probability is selected as the target segmentation sequence, which reduces the subsequent amount of computation and improves the audit efficiency of network content publication.
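The selection rules in step S414 can be summarized in a short sketch: filter by the threshold, sort by probability, and keep at most a preset number of sequences. The function name, the sample probabilities and the threshold value are hypothetical.

```python
def select_target_sequences(scored_sequences, threshold, max_count):
    """Keep sequences whose probability reaches `threshold`, best first, at most `max_count`.

    `scored_sequences` is a list of (segmentation_sequence, probability) pairs.
    An empty result means no sequence reached the threshold, i.e. the audit fails.
    """
    valid = [pair for pair in scored_sequences if pair[1] >= threshold]
    valid.sort(key=lambda pair: pair[1], reverse=True)
    return [sequence for sequence, _ in valid[:max_count]]

scored = [(["now", "naive", "hot"], 0.002), (["today", "really", "hot"], 0.125)]
targets = select_target_sequences(scored, threshold=0.01, max_count=5)
best_only = select_target_sequences(scored, threshold=0.001, max_count=1)
```

Setting `max_count` to 1 corresponds to the preferred variant that keeps only the sequence with the maximum occurrence probability.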
In this embodiment, a preset training corpus is obtained and analyzed with an N-gram model to obtain its word sequence data, so that the word sequence data can be used directly when occurrence probabilities are calculated later, which saves computation time and helps improve audit efficiency. In addition, the basic sentence is parsed into M segmentation sequences; for each segmentation sequence, its occurrence probability is calculated from the word sequence data of the preset training corpus; and from the M occurrence probabilities, the segmentation sequences whose occurrence probability reaches the preset probability threshold are selected as target segmentation sequences, with each word segment in the target segmentation sequences taken as a basic word segment contained in the basic sentence. This ensures the accuracy of the segmentation and helps improve the accuracy of the subsequent clustering and semantic evaluation based on the basic word segments.
In one embodiment, the specific implementation flow of step S50, in which the comprehensive score of the content to be published is determined according to the semantic score of each basic sentence, is detailed as follows:
The comprehensive score of the content to be published is calculated by the following formula:
Here, M_i is the semantic score of the i-th basic sentence, a and b are preset parameters, S_i is the weighted score of the i-th basic sentence, n is the number of basic sentences, W is the comprehensive score of the content to be published, i and n are positive integers, and i ≤ n.
It is worth noting that in this embodiment the semantic score expresses how well the semantics conform to the norms: a semantic score less than 0 indicates that the semantics of the basic sentence are non-standard. The preset parameter a is set to a larger value than the preset parameter b, so that non-standard basic sentences have a greater influence on the content to be published as a whole. The values of the preset parameters a and b can be chosen according to the actual situation and are not specifically limited here.
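The formula itself does not survive in this text, so the following Python sketch is only one reading that is consistent with the symbol definitions above, not the formula of the patent. It assumes that S_i = a·M_i when M_i < 0 and S_i = b·M_i otherwise, and that W is the mean of the S_i. The piecewise form, the averaging and the default values of a and b are all assumptions.

```python
def comprehensive_score(semantic_scores, a=2.0, b=1.0):
    """Hypothetical weighting: negative scores are scaled by a, the others by b (a > b)."""
    if not semantic_scores:
        raise ValueError("at least one basic sentence is required")
    weighted = [a * m if m < 0 else b * m for m in semantic_scores]  # S_i (assumed form)
    return sum(weighted) / len(weighted)  # W as the mean of S_i (assumed)

w = comprehensive_score([0.8, 0.6, -0.5])
```

Under these assumptions a single non-standard sentence pulls the overall score down more strongly than a compliant sentence of the same magnitude raises it, which matches the stated purpose of choosing a larger than b.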
In this embodiment, semantic scores in different ranges are weighted and aggregated by a preset formula to obtain the comprehensive score of the content to be published, which helps make the comprehensive score more reasonable.
It should be understood that the serial numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 6 shows a functional block diagram of an auditing apparatus for network content publication that corresponds one-to-one to the auditing method for network content publication of the above embodiments. As shown in Fig. 6, the auditing apparatus includes a request receiving module 10, a type matching module 20, a content parsing module 30, a semantic recognition module 40, a comprehensive scoring module 50 and a result determining module 60. The functional modules are described in detail as follows:
Request receiving module 10, configured to, if an audit request for network content publication sent by a client is received, obtain the current user information and the content to be published contained in the audit request;
Type matching module 20, configured to match the current user information against each item of user information in a preset list-type database and determine the user type corresponding to the current user information, where the list-type database contains each item of user information and the user type corresponding to that user information;
Content parsing module 30, configured to, if the user type corresponding to the current user information is ordinary user, parse the content to be published according to a preset sentence division method to obtain each basic sentence contained in the content to be published;
Semantic recognition module 40, configured to perform semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence;
Comprehensive scoring module 50, configured to determine the comprehensive score of the content to be published according to the semantic score of each basic sentence;
Result determining module 60, configured to compare the comprehensive score with a preset score threshold and, if the comprehensive score is greater than the preset score threshold, confirm that the content to be published is legitimate, publish the content to be published, and send a message to the client that the audit has passed.
Further, the auditing apparatus for network content publication also includes:
First auditing module 70, configured to publish the content to be published if the user type corresponding to the current user information is whitelisted user;
Second auditing module 80, configured to remove the content to be published and send a message to the client that the audit has failed if the user type corresponding to the current user information is blacklisted user.
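How these modules might cooperate can be sketched as a single function. The user-type labels, the default of treating unknown users as ordinary users, the punctuation-based sentence split and the failure result for scores at or below the threshold are assumptions made for illustration; the scoring functions are passed in by the caller.

```python
import re

def audit(user_id, content, user_types, score_sentence, combine, score_threshold):
    """Return (published, message) for one audit request.

    `user_types` maps user ids to "white", "black" or "ordinary" (hypothetical labels);
    `score_sentence` and `combine` are caller-supplied scoring functions.
    """
    user_type = user_types.get(user_id, "ordinary")  # unknown users: assumed ordinary
    if user_type == "white":
        return True, "published"
    if user_type == "black":
        return False, "audit failed"
    # Assumed sentence division: split after common sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", content) if s]
    total = combine([score_sentence(s) for s in sentences])
    if total > score_threshold:
        return True, "audit passed"
    return False, "audit failed"  # assumed outcome when the threshold is not exceeded

users = {"alice": "white", "bob": "black"}
result = audit("carol", "Nice day. Bad words!", users,
               score_sentence=lambda s: -1.0 if "Bad" in s else 1.0,
               combine=lambda scores: sum(scores) / len(scores),
               score_threshold=0.5)
```

Passing the scoring functions as parameters keeps the dispatch logic separate from the semantic recognition and comprehensive scoring steps, mirroring the module split in Fig. 6.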
Further, the semantic recognition module 40 includes:
Segmentation unit 41, configured to segment the basic sentence by a preset word segmentation method to obtain the basic word segments contained in the basic sentence;
Clustering unit 42, configured to convert the basic word segments into word vectors and cluster the word vectors by a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence;
Scoring unit 43, configured to, for each basic sentence, calculate the distance between the cluster center corresponding to the basic sentence and each preset semantic vector, take the preset semantic vector at the minimum distance as the target vector, and use the semantic score corresponding to the target vector as the semantic score corresponding to the basic sentence.
Further, the segmentation unit 41 includes:
Training subunit 411, configured to obtain a preset training corpus and analyze the preset training corpus using an N-gram model to obtain the word sequence data of the preset training corpus;
Parsing subunit 412, configured to perform segmentation parsing on the basic sentence to obtain M segmentation sequences;
Calculation subunit 413, configured to, for each segmentation sequence, calculate the occurrence probability of the segmentation sequence according to the word sequence data of the preset training corpus, obtaining the occurrence probabilities of the M segmentation sequences;
Selection subunit 414, configured to select, from the occurrence probabilities of the M segmentation sequences, the segmentation sequences whose occurrence probability reaches a preset probability threshold as target segmentation sequences, and take each word segment in the target segmentation sequences as a basic word segment contained in the basic sentence.
Further, the comprehensive scoring module 50 includes:
Score calculation unit 51, configured to calculate the comprehensive score of the content to be published by the following formula:
Here, M_i is the semantic score of the i-th basic sentence, a and b are preset parameters, S_i is the weighted score of the i-th basic sentence, n is the number of basic sentences, W is the comprehensive score of the content to be published, i and n are positive integers, and i ≤ n.
For specific limitations on the auditing apparatus for network content publication, refer to the limitations on the auditing method for network content publication above, which are not repeated here. Each module in the auditing apparatus may be implemented wholly or partly by software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
Fig. 7 is a schematic diagram of a computer device provided by an embodiment of the present invention. The computer device may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the preset corpus and the preset semantic vectors. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements an auditing method for network content publication.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the auditing method for network content publication of the above embodiments, for example steps S10 to S60 shown in Fig. 2. Alternatively, when executing the computer program, the processor implements the functions of the modules/units of the auditing apparatus for network content publication of the above embodiments, for example the functions of modules 10 to 60 shown in Fig. 6. To avoid repetition, details are not described here again.
It will be clear to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the auditing method for network content publication of the above embodiments, or implements the functions of the modules/units in the auditing apparatus for network content publication of the above embodiments. To avoid repetition, details are not described here again.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims (10)

1. An auditing method for network content publication, characterized in that the auditing method for network content publication comprises:
if an audit request for network content publication sent by a client is received, obtaining the current user information and the content to be published contained in the audit request;
matching the current user information against each item of user information in a preset list-type database to determine the user type corresponding to the current user information, wherein the list-type database contains each item of user information and the user type corresponding to the user information;
if the user type corresponding to the current user information is ordinary user, parsing the content to be published according to a preset sentence division method to obtain each basic sentence contained in the content to be published;
performing semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence;
determining the comprehensive score of the content to be published according to the semantic score of each basic sentence;
comparing the comprehensive score with a preset score threshold, and if the comprehensive score is greater than the preset score threshold, confirming that the content to be published is legitimate, publishing the content to be published, and sending a message to the client that the audit has passed.
2. The auditing method for network content publication according to claim 1, characterized in that, after the matching of the current user information against each item of user information in the preset list-type database to determine the user type corresponding to the current user information, the auditing method for network content publication further comprises:
if the user type corresponding to the current user information is whitelisted user, publishing the content to be published;
if the user type corresponding to the current user information is blacklisted user, removing the content to be published and sending a message to the client that the audit has failed.
3. The auditing method for network content publication according to claim 1, characterized in that the performing of semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence comprises:
segmenting the basic sentence by a preset word segmentation method to obtain the basic word segments contained in the basic sentence;
converting the basic word segments into word vectors, and clustering the word vectors by a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence;
for each basic sentence, calculating the distance between the cluster center corresponding to the basic sentence and each preset semantic vector, taking the preset semantic vector at the minimum distance as the target vector, and using the semantic score corresponding to the target vector as the semantic score corresponding to the basic sentence.
4. The auditing method for network content publication according to claim 3, characterized in that, before the performing of semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence, the auditing method for network content publication further comprises:
obtaining a preset training corpus, and analyzing the preset training corpus using an N-gram model to obtain the word sequence data of the preset training corpus;
the segmenting of the basic sentence by a preset word segmentation method to obtain the basic word segments contained in the basic sentence comprises:
performing segmentation parsing on the basic sentence to obtain M segmentation sequences;
for each segmentation sequence, calculating the occurrence probability of the segmentation sequence according to the word sequence data of the preset training corpus, obtaining the occurrence probabilities of the M segmentation sequences;
selecting, from the occurrence probabilities of the M segmentation sequences, the segmentation sequences whose occurrence probability reaches a preset probability threshold as target segmentation sequences, and taking each word segment in the target segmentation sequences as a basic word segment contained in the basic sentence.
5. The auditing method for network content publication according to any one of claims 1 to 4, characterized in that the determining of the comprehensive score of the content to be published according to the semantic score of each basic sentence comprises:
calculating the comprehensive score of the content to be published by the following formula:
wherein M_i is the semantic score of the i-th basic sentence, a and b are preset parameters, S_i is the weighted score of the i-th basic sentence, n is the number of basic sentences, W is the comprehensive score of the content to be published, i and n are positive integers, and i ≤ n.
6. An auditing apparatus for network content publication, characterized in that the auditing apparatus for network content publication comprises:
a request receiving module, configured to, if an audit request for network content publication sent by a client is received, obtain the current user information and the content to be published contained in the audit request;
a type matching module, configured to match the current user information against each item of user information in a preset list-type database and determine the user type corresponding to the current user information, wherein the list-type database contains each item of user information and the user type corresponding to the user information;
a content parsing module, configured to, if the user type corresponding to the current user information is ordinary user, parse the content to be published according to a preset sentence division method to obtain each basic sentence contained in the content to be published;
a semantic recognition module, configured to perform semantic recognition on each basic sentence by means of natural language semantic recognition to obtain the semantic score corresponding to each basic sentence;
a comprehensive scoring module, configured to determine the comprehensive score of the content to be published according to the semantic score of each basic sentence;
a result determining module, configured to compare the comprehensive score with a preset score threshold and, if the comprehensive score is greater than the preset score threshold, confirm that the content to be published is legitimate, publish the content to be published, and send a message to the client that the audit has passed.
7. The auditing apparatus for network content publication according to claim 6, characterized in that the auditing apparatus for network content publication further comprises:
a first auditing module, configured to publish the content to be published if the user type corresponding to the current user information is whitelisted user;
a second auditing module, configured to remove the content to be published and send a message to the client that the audit has failed if the user type corresponding to the current user information is blacklisted user.
8. The auditing apparatus for network content publication according to claim 6, characterized in that the semantic recognition module comprises:
a segmentation unit, configured to segment the basic sentence by a preset word segmentation method to obtain the basic word segments contained in the basic sentence;
a clustering unit, configured to convert the basic word segments into word vectors and cluster the word vectors by a preset clustering algorithm to obtain the cluster center corresponding to each basic sentence;
a scoring unit, configured to, for each basic sentence, calculate the distance between the cluster center corresponding to the basic sentence and each preset semantic vector, take the preset semantic vector at the minimum distance as the target vector, and use the semantic score corresponding to the target vector as the semantic score corresponding to the basic sentence.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the auditing method for network content publication according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the auditing method for network content publication according to any one of claims 1 to 5.
CN201910522440.6A 2019-06-17 2019-06-17 Checking method, device, computer equipment and the storage medium of Web content publication Pending CN110377900A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910522440.6A CN110377900A (en) 2019-06-17 2019-06-17 Checking method, device, computer equipment and the storage medium of Web content publication
PCT/CN2020/085582 WO2020253350A1 (en) 2019-06-17 2020-04-20 Network content publication auditing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910522440.6A CN110377900A (en) 2019-06-17 2019-06-17 Checking method, device, computer equipment and the storage medium of Web content publication

Publications (1)

Publication Number Publication Date
CN110377900A true CN110377900A (en) 2019-10-25

Family

ID=68248961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910522440.6A Pending CN110377900A (en) 2019-06-17 2019-06-17 Checking method, device, computer equipment and the storage medium of Web content publication

Country Status (2)

Country Link
CN (1) CN110377900A (en)
WO (1) WO2020253350A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929055A (en) * 2019-11-15 2020-03-27 北京达佳互联信息技术有限公司 Multimedia quality detection method and device, electronic equipment and storage medium
CN111125023A (en) * 2019-11-15 2020-05-08 北京十分科技有限公司 File auditing, auditing control and publishing method and corresponding device
CN111209363A (en) * 2019-12-25 2020-05-29 华为技术有限公司 Corpus data processing method, apparatus, server and storage medium
CN111309938A (en) * 2020-01-22 2020-06-19 恒大新能源汽车科技(广东)有限公司 Multimedia file processing method and device
CN111414515A (en) * 2020-03-17 2020-07-14 中国建设银行股份有限公司 Resource auditing method, device, equipment and storage medium
WO2020253350A1 (en) * 2019-06-17 2020-12-24 深圳壹账通智能科技有限公司 Network content publication auditing method and apparatus, computer device and storage medium
CN112163585A (en) * 2020-11-10 2021-01-01 平安普惠企业管理有限公司 Text auditing method and device, computer equipment and storage medium
CN112464036A (en) * 2020-11-24 2021-03-09 行吟信息科技(武汉)有限公司 Method and device for auditing violation data
CN112906387A (en) * 2020-12-25 2021-06-04 北京百度网讯科技有限公司 Risk content identification method, apparatus, device, medium, and computer program product
CN113010708A (en) * 2021-03-11 2021-06-22 上海麦糖信息科技有限公司 Verification method and system for illegal friend circle content and illegal chat content
CN113761182A (en) * 2020-06-17 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for determining service problem
CN114245160A (en) * 2021-12-07 2022-03-25 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116822494A (en) * 2023-08-28 2023-09-29 深圳有咖互动科技有限公司 Broadcast play information processing method, apparatus, electronic device and computer readable medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783917A (en) * 2021-01-04 2021-05-11 广州海量数据库技术有限公司 Work order auditing method and device, storage medium and electronic equipment
CN113835730A (en) * 2021-09-24 2021-12-24 支付宝(杭州)信息技术有限公司 Method, device, equipment and medium for updating audit program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446970A (en) * 2008-12-15 2009-06-03 腾讯科技(深圳)有限公司 Method for censoring and process text contents issued by user and device thereof
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity
WO2015066891A1 (en) * 2013-11-08 2015-05-14 Google Inc. Systems and methods for extracting and generating images for display content
CN109800307A (en) * 2019-01-18 2019-05-24 深圳壹账通智能科技有限公司 Analysis method, device, computer equipment and the storage medium of product evaluation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6539430B1 (en) * 1997-03-25 2003-03-25 Symantec Corporation System and method for filtering data received by a computer system
CN102098332B (en) * 2010-12-30 2014-04-16 北京新媒传信科技有限公司 Method and device for examining and verifying contents
CN109635073A (en) * 2018-10-18 2019-04-16 深圳壹账通智能科技有限公司 Forum's community application management method, device, equipment and computer readable storage medium
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
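Yang Zhiming; Wang Laiqi; Wang Yong: "Research on Question Intent Classification Based on Dual-Channel Convolutional Neural Networks", Journal of Chinese Information Processing (中文信息学报), no. 05 *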

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253350A1 (en) * 2019-06-17 2020-12-24 深圳壹账通智能科技有限公司 Network content publication auditing method and apparatus, computer device and storage medium
CN111125023A (en) * 2019-11-15 2020-05-08 北京十分科技有限公司 File auditing, auditing control and publishing method and corresponding device
CN110929055A (en) * 2019-11-15 2020-03-27 北京达佳互联信息技术有限公司 Multimedia quality detection method and device, electronic equipment and storage medium
CN111209363A (en) * 2019-12-25 2020-05-29 华为技术有限公司 Corpus data processing method, apparatus, server and storage medium
CN111209363B (en) * 2019-12-25 2024-02-09 华为技术有限公司 Corpus data processing method, corpus data processing device, server and storage medium
CN111309938A (en) * 2020-01-22 2020-06-19 恒大新能源汽车科技(广东)有限公司 Multimedia file processing method and device
CN111414515A (en) * 2020-03-17 2020-07-14 中国建设银行股份有限公司 Resource auditing method, device, equipment and storage medium
CN113761182A (en) * 2020-06-17 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for determining service problem
CN112163585A (en) * 2020-11-10 2021-01-01 平安普惠企业管理有限公司 Text auditing method and device, computer equipment and storage medium
CN112163585B (en) * 2020-11-10 2023-11-10 上海七猫文化传媒有限公司 Text auditing method and device, computer equipment and storage medium
CN112464036B (en) * 2020-11-24 2023-06-16 行吟信息科技(武汉)有限公司 Method and device for auditing violation data
CN112464036A (en) * 2020-11-24 2021-03-09 行吟信息科技(武汉)有限公司 Method and device for auditing violation data
CN112906387B (en) * 2020-12-25 2023-08-04 北京百度网讯科技有限公司 Risk content identification method, apparatus, device, medium and computer program product
CN112906387A (en) * 2020-12-25 2021-06-04 北京百度网讯科技有限公司 Risk content identification method, apparatus, device, medium, and computer program product
CN113010708A (en) * 2021-03-11 2021-06-22 上海麦糖信息科技有限公司 Verification method and system for illegal friend circle content and illegal chat content
CN113010708B (en) * 2021-03-11 2023-08-25 上海麦糖信息科技有限公司 Method and system for auditing illegal friend circle content and illegal chat content
CN114245160A (en) * 2021-12-07 2022-03-25 北京达佳互联信息技术有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116822494A (en) * 2023-08-28 2023-09-29 深圳有咖互动科技有限公司 Broadcast play information processing method, apparatus, electronic device and computer readable medium
CN116822494B (en) * 2023-08-28 2023-12-08 深圳有咖互动科技有限公司 Broadcast play information processing method, apparatus, electronic device and computer readable medium

Also Published As

Publication number Publication date
WO2020253350A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110377900A (en) Checking method, device, computer equipment and the storage medium of Web content publication
US20210232762A1 (en) Architectures for natural language processing
US11301637B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN111274365B (en) Intelligent inquiry method and device based on semantic understanding, storage medium and server
KR101605430B1 (en) SYSTEM AND METHOD FOR BUINDING QAs DATABASE AND SEARCH SYSTEM AND METHOD USING THE SAME
Kumar et al. Sanative chatbot for health seekers
CN113704451B (en) Power user appeal screening method and system, electronic device and storage medium
WO2021218028A1 (en) Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN109299271A (en) Training sample generation, text data, public sentiment event category method and relevant device
CN110909531B (en) Information security screening method, device, equipment and storage medium
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN109947934A (en) For the data digging method and system of short text
US11734360B2 (en) Methods and systems for facilitating classification of documents
RU61442U1 (en) SYSTEM OF AUTOMATED ORDERING OF UNSTRUCTURED INFORMATION FLOW OF INPUT DATA
US20210192125A1 (en) Methods and systems for facilitating summarization of a document
WO2023137918A1 (en) Text data analysis method and apparatus, model training method, and computer device
CN113961811B (en) Event map-based conversation recommendation method, device, equipment and medium
Voronov et al. Forecasting popularity of news article by title analyzing with BN-LSTM network
CN114417827A (en) Text context processing method and device, electronic equipment and storage medium
Harshvardhan et al. Topic modelling Twitterati sentiments using Latent Dirichlet allocation during demonetization
CN110276001A (en) Make an inventory a page recognition methods, device, calculate equipment and medium
DeVille et al. Text as Data: Computational Methods of Understanding Written Expression Using SAS
CN117291192B (en) Government affair text semantic understanding analysis method and system
Zaruba Using natural language processing to measure the consistency of opinions expressed by politicians

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination