WO2021151333A1 - Artificial intelligence-based sensitive word recognition method and device, and computer device - Google Patents
- Publication number
- WO2021151333A1 (international application PCT/CN2020/124684)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text information
- word
- combination
- target
- word slot
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to an artificial intelligence-based sensitive word recognition method and device, and a computer device.
- this application provides an artificial intelligence-based method, device, and computer device for identifying sensitive words, the main purpose of which is to solve the technical problem that current traditional sensitive word filtering methods identify sensitive words with low accuracy.
- a method for identifying sensitive words based on artificial intelligence includes: acquiring text information to be recognized; identifying a target word slot combination contained in the text information, wherein the target word slot combination is composed of at least one preset word slot; determining, according to the target word slot combination and the intermediate word information of the target word slot combination in the text information, whether the text information contains sensitive words; and if the text information contains sensitive words, performing restriction processing on the text information.
- an artificial intelligence-based sensitive word recognition device includes: an acquisition module for acquiring text information to be recognized; a recognition module for recognizing the target word slot combination contained in the text information, wherein the target word slot combination is composed of at least one preset word slot; a judgment module for determining, according to the target word slot combination and the intermediate word information of the target word slot combination in the text information, whether the text information contains sensitive words; and a processing module for performing restriction processing on the text information if it is determined that the text information contains sensitive words.
- a readable storage medium has computer-readable instructions stored thereon which, when executed by a processor, implement the following method: acquiring text information to be recognized; identifying the target word slot combination contained in the text information, wherein the target word slot combination is composed of at least one preset word slot; determining, according to the target word slot combination and the intermediate word information of the target word slot combination in the text information, whether the text information contains sensitive words; and if it is determined that the text information contains sensitive words, performing restriction processing on the text information.
- a computer device includes a readable storage medium, a processor, and computer-readable instructions stored on the readable storage medium and executable on the processor; when executing the computer-readable instructions, the processor implements the following method: acquiring text information to be recognized; identifying a target word slot combination contained in the text information, wherein the target word slot combination is composed of at least one preset word slot; determining, according to the target word slot combination and the intermediate word information of the target word slot combination in the text information, whether the text information contains sensitive words; and if it is determined that the text information contains sensitive words, performing restriction processing on the text information.
- This application can accurately identify whether the text information contains sensitive words, thereby improving both the accuracy of sensitive word recognition and the efficiency of sensitive word processing.
- FIG. 1 shows a schematic flowchart of an artificial intelligence-based sensitive word recognition method provided by an embodiment of the present application.
- FIG. 2 shows a schematic flowchart of another method for identifying sensitive words based on artificial intelligence according to an embodiment of the present application.
- FIG. 3 shows a schematic structural diagram of an artificial intelligence-based sensitive word recognition device provided by an embodiment of the present application.
- the technical solution of this application can be applied to the fields of artificial intelligence, blockchain and/or big data technology to realize sensitive word recognition.
- the data involved in this application, such as the text information, can be stored in a database or in a blockchain; this application does not limit this.
- this embodiment provides an artificial intelligence-based sensitive word recognition method. As shown in FIG. 1, the method includes the following steps.
- the text information to be recognized can be the text of a communication message to be published, such as a message sent in instant messaging software, online chat text between platform customer service staff and users, or text published on a public platform (such as web comments, product reviews, or video bullet-screen comments).
- the text information to be recognized can also be text within a specified range (such as a specified portion of a publicly published e-book or of a publicly issued notification message).
- the execution subject of this embodiment may be a device or equipment for sensitive word recognition and processing, deployed on the client side or the server side.
- the target word slot combination is composed of at least one preset word slot.
- word slots can be set in advance according to the different sensitive words. Specifically, they can include word slots of sensitive words (such as "reduction of principal", "reduction of rent" and "personal loan", as well as word slots for bank card numbers, ID numbers, account password formats, strings of digits and the like); word slots of non-sensitive words (such as "no" and "must", as well as single digits, single characters and the like); word slots of sensitive word synonyms (words that are essentially synonymous with a sensitive word but do not fall within the scope of the sensitive words); and word slots for each segment obtained by splitting a sensitive word (for example, the three word slots "go to your", "unit" and "investigation" obtained by splitting the sensitive phrase "go to your unit to investigate"). These word slots are then combined and matched according to the corresponding sensitive word recognition to obtain the word slot combinations.
- the pre-compiled word slot combinations can be stored in a predetermined storage location (such as a database or a mapping table).
- each segmented word in the text information can be matched against each word slot combination in the predetermined storage location, and the matching word slot combination is taken as the target word slot combination contained in the text information.
- the intermediate word information may be word information that appears between each word slot included in the word slot combination in the text information.
- for example, the text message is "XX finds someone to go to your unit, and after doing a background investigation on you finds out XX", where XX stands for words omitted from the message. The target word slot combination contained in the message is "go to your" + "unit" + "investigation"; the particle "de" between "go to your" and "unit", and "doing a background (investigation) on you" between "unit" and "investigation", are the intermediate words.
- the word slot combination corresponding to a sensitive word has, to a certain extent, the same meaning as the sensitive word: it can be a word slot combination composed of the sensitive word itself, or a combination of word slots that are not sensitive words themselves but together carry the meaning of a sensitive word.
- in practice, published text that contains sensitive words is often interspersed with spaces or symbols, padded with extra words, or rewritten with other text of the same meaning, all of which affect the accuracy of determining whether the text information contains sensitive words.
- because this method judges not only the word slot combination but also the intermediate word information of the word slot combination in the text information, it can accurately identify whether the text information contains sensitive words in these cases, which improves the accuracy of sensitive word recognition.
- when it is determined that the text information contains sensitive words, the text information can be marked to indicate the presence of sensitive word information, for example by emphasizing the text part containing the target word slot combination (highlighting, bolding, underlining, etc.), or the sending of communication messages containing the text information can be restricted.
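To make the notion of intermediate words concrete, here is a minimal Python sketch, assuming a greedy left-to-right scan in which each word slot is matched at its first occurrence after the previous slot ends. The function name `intermediate_words` and the English rendering of the example message are illustrative, not taken from the patent.

```python
from typing import List, Optional


def intermediate_words(text: str, slots: List[str]) -> Optional[List[str]]:
    """Return the text found between consecutive word slots.

    Assumption: slots are matched greedily left to right, each at its first
    occurrence after the previous slot ends. Returns None if any slot is
    missing or appears out of the preset order.
    """
    start = text.find(slots[0])
    if start < 0:
        return None
    end = start + len(slots[0])
    gaps: List[str] = []
    for slot in slots[1:]:
        nxt = text.find(slot, end)
        if nxt < 0:
            return None  # slot missing, or only present before the previous slot
        gaps.append(text[end:nxt])  # intermediate words before this slot
        end = nxt + len(slot)
    return gaps


message = ("XX finds someone to go to your unit, and after doing a "
           "background investigation on you finds out XX")
print(intermediate_words(message, ["go to your", "unit", "investigation"]))
# [' ', ', and after doing a background ']
```

Returning `None` for out-of-order slots folds the word slot arrangement check into the same scan.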
- in this embodiment, the target word slot combination contained in the text information can be identified, where the word slot combination is composed of at least one preset word slot, and then whether the text information contains sensitive words is determined according to the target word slot combination and the intermediate word information of the target word slot combination in the text information.
- because this embodiment judges both the word slot combination and the intermediate words between its word slots, it can accurately identify whether the text information contains sensitive words even when symbols or spaces are inserted into sensitive words, extra words are added, or the same meaning is rewritten in other text, which improves the accuracy of sensitive word recognition. If the text information is determined to contain sensitive words, it can also be restricted in time. The entire process of sensitive word recognition plus restriction processing can be automated, which improves the efficiency of sensitive word processing.
- step 201 can specifically include: obtaining the text information to be recognized from the blockchain.
- the text information to be recognized can be obtained from the target node of the blockchain, and then sensitive word recognition can be performed on the text information.
- the blockchain referred to in this embodiment is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
- a blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
- the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
- the preset special symbols can be symbols such as "@", "#", "/" and "*".
- clearing the character spaces and preset special symbols from the text information effectively reduces noise interference and improves the accuracy of matching word slot combinations against the corresponding detection rules.
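The cleaning step can be sketched as follows, assuming a regular expression that strips all whitespace plus a hypothetical `SPECIAL_SYMBOLS` set limited to the symbols listed above; the names are illustrative only.

```python
import re

# Assumption: only the symbols listed above; a deployment would configure its own set.
SPECIAL_SYMBOLS = "@#/*"
# Character class of any whitespace plus the escaped special symbols.
_NOISE = re.compile("[\\s" + re.escape(SPECIAL_SYMBOLS) + "]")


def clean_text(text: str) -> str:
    """Remove whitespace and preset special symbols before slot matching."""
    return _NOISE.sub("", text)


print(clean_text("pr in@ci#pal"))  # principal
```

Removing every space is harmless for Chinese text, which does not separate words with spaces; for space-delimited languages the same step would merge words, so the matching strategy would need adjusting.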
- the configuration of the sensitive word recognition rule can be performed first, and the rule configuration can be divided into three layers: slot, rule, and model.
- the word slot contains some preset keywords such as sensitive words, non-sensitive words, sensitive word synonyms, etc.
- the rule is a combination of word slots (equivalent to a preset verification rule, that is, a criterion the text information must meet for sensitive words to be considered present).
- the model is a combination of rules (equivalent to a combination of multiple verification rules).
- rules and models can be freely combined, and a sensitive word filtering strategy that meets the requirements can be formulated according to the business scenario. For example, after establishing word slots, rules, and models, each participle in the text information after clearing character spaces and preset special symbols is matched with the word slot combination in the rule, and then the target word slot combination contained in it is found.
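The three configuration layers can be modelled roughly as below. This is a hypothetical data layout, not the patent's schema: each slot maps to keywords, a `Rule` holds an ordered slot combination with a criterion label and gap threshold, and a `Model` groups rules. `target_rules` returns the rules whose slots all occur in the cleaned text, i.e. the target word slot combinations.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Slot layer: each word slot maps to the keywords that fill it.
# All slot names, keywords and rules below are hypothetical examples.
SLOTS: Dict[str, List[str]] = {
    "go_to_your": ["go to your"],
    "unit": ["unit", "workplace"],
    "investigation": ["investigation"],
    "not": ["will not"],
    "principal_cut": ["reduction of principal"],
}


@dataclass
class Rule:
    """Rule layer: an ordered word slot combination plus its criterion."""
    name: str
    slots: List[str]
    criterion: str  # e.g. "and" / "non"; evaluated elsewhere
    max_gap: int    # preset intermediate-character threshold


@dataclass
class Model:
    """Model layer: a combination of rules."""
    name: str
    rules: List[Rule] = field(default_factory=list)


def slot_present(text: str, slot: str) -> bool:
    return any(keyword in text for keyword in SLOTS[slot])


def target_rules(text: str, model: Model) -> List[Rule]:
    """Rules whose word slots all occur in the (already cleaned) text."""
    return [r for r in model.rules if all(slot_present(text, s) for s in r.slots)]


MODEL = Model("finance_chat", [
    Rule("workplace_threat", ["go_to_your", "unit", "investigation"], "and", 8),
    Rule("principal_promise", ["not", "principal_cut"], "non", 3),
])
print([r.name for r in target_rules("someone will go to your workplace for an investigation", MODEL)])
```

Keeping the criterion as data on the rule is what lets rules and models be recombined per business scenario without changing the matching code.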
- a single word slot combination may correspond to at least one target verification rule, and each verification rule is equivalent to a preset criterion for containing sensitive words.
- if a single word slot combination corresponds to at least two verification rules, this is equivalent to a combination of verification rules.
- this embodiment can determine in advance, according to actual needs, whether a word slot combination corresponds to a single verification rule or to a verification rule combination containing at least two verification rules; that is, the word slot combination can be applied at the rule layer or at the model layer.
- the scope of sensitive words, that is, the content to be detected, can be limited.
- the scope of action is the rule layer or the model layer. Detection is performed within the specified scope, so verification rules can be used flexibly for sensitive word recognition; in the case of a rule combination, multiple verification rules can make accurate judgments from different angles, which improves the accuracy of sensitive word recognition.
- At least one target verification rule corresponding to each target word slot combination is combined to obtain a target verification rule combination, and the target verification rule combination includes at least one preset sensitive word determination criterion.
- according to the word slot arrangement information of the target word slot combination in the text information and the intermediate word information between the word slots, it is determined whether the text information meets each of the multiple preset sensitive word determination criteria in the target verification rule combination.
- step 205 may specifically include: if the sensitive word determination criterion is that each word slot in the target word slot combination appears in the text information and the number of intermediate words within a limited range meets the criterion, then, when the word slot arrangement information (such as the order in which the word slots appear in the text) matches the preset word slot sequence corresponding to the target word slot combination and the number of intermediate words is less than or equal to the preset number threshold, it is determined that the text information contains sensitive words; if the sensitive word determination criterion is that each word slot in the target word slot combination appears in the text information and the number of intermediate words within a limited range does not meet the criterion, then, when the word slot arrangement information matches the preset word slot sequence corresponding to the target word slot combination and the number of intermediate words is greater than or equal to the preset number threshold, it is determined that the text information contains sensitive words.
- each word slot combination has its own corresponding preset word slot sequence, which is used to determine whether it has the meaning of a sensitive word.
- the word slot combination can correspond to at least one preset word slot sequence according to actual conditions.
- the intermediate words modify the semantics of the word slot combination, and the preset number threshold is used to determine whether the modified word slot combination still carries the meaning of a sensitive word.
- the threshold size can be preset according to the actual situation.
- for example, a word slot combination composed of the word slots "go to your", "unit" and "investigation" can be matched to obtain the corresponding preset verification rule, an [and] criterion, which allows 8 editable characters among the three word slots. If a message sent by the user contains all three words and there are fewer than 8 characters between them, it is judged to meet the [and] criterion, that is, the message is determined to contain sensitive words. If there are more than 8 edited characters among the three words, it is judged not to meet the [and] criterion, that is, the message is determined not to contain sensitive words.
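A minimal sketch of the [and] criterion, under two assumptions: the character threshold applies to the total number of intermediate characters across all gaps, and a text exactly at the threshold still matches (following the "less than or equal to" wording in step 205). The Chinese strings are our own illustrative rendering of the "go to your" + "unit" + "investigation" example, used because the threshold counts characters.

```python
from typing import List, Optional


def gap_lengths(text: str, slots: List[str]) -> Optional[List[int]]:
    """Lengths of the intermediate text between consecutive slots, scanned
    left to right; None if a slot is missing or out of the preset order."""
    pos = text.find(slots[0])
    if pos < 0:
        return None
    end = pos + len(slots[0])
    lengths: List[int] = []
    for slot in slots[1:]:
        nxt = text.find(slot, end)
        if nxt < 0:
            return None
        lengths.append(nxt - end)
        end = nxt + len(slot)
    return lengths


def and_rule_hit(text: str, slots: List[str], max_gap: int) -> bool:
    """[and] criterion: all slots in preset order and total intermediate
    characters <= max_gap (assumed: threshold applies to the total)."""
    lengths = gap_lengths(text, slots)
    return lengths is not None and sum(lengths) <= max_gap


WORKPLACE_SLOTS = ["去你", "单位", "调查"]  # "go to your", "unit", "investigation"
print(and_rule_hit("找人去你的单位，对你做背景调查后发现", WORKPLACE_SLOTS, 8))  # True (7 chars)
print(and_rule_hit("去你的单位" + "这" * 10 + "调查", WORKPLACE_SLOTS, 8))   # False (11 chars)
```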
- for example, a word slot combination consisting of the sensitive word slot "reduction of principal" and the non-sensitive word slot "will not" is matched to the corresponding preset verification rule, a [Non] criterion, which allows 3 editable characters between the two word slots. If the information sent by the user contains both words and there are fewer than 3 characters between them, it is judged to meet the [Non] criterion, that is, the information is determined not to contain sensitive words.
- the [Non] verification rule sets a word slot of a sensitive word and a word slot of a non-sensitive word: if the two appear together, the rule is not hit, and the number of intermediate characters allowed can be configured.
- for example, if the number of intermediate characters between the two word slots "will not" and "reduction of principal" is 0, the text is considered not to hit the corresponding verification rule criterion and is determined not to contain sensitive words.
- conversely, for a text stating that the principal will be reduced, if the intermediate words between the two word slots "will not" and "reduction of principal" (", you can rest assured, will definitely") number more than 3, the text is considered to hit the corresponding verification rule criterion and is determined to contain sensitive words.
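The [Non] criterion can be sketched in the same style. This assumes the negation slot must appear before the sensitive slot and treats a gap of at most `max_gap` characters as the negation still applying, per the worked example. As a further assumption not stated in the text, a sensitive slot with no preceding negation counts as a hit. The Chinese strings are illustrative renderings of "will not" and "reduction of principal".

```python
def non_rule_hit(text: str, negation: str, sensitive: str, max_gap: int) -> bool:
    """[Non] criterion sketch: True means the text is judged sensitive."""
    s = text.find(sensitive)
    if s < 0:
        return False  # sensitive slot absent: nothing to hit
    n = text.rfind(negation, 0, s)  # nearest negation lying entirely before the slot
    if n < 0:
        return True   # assumption: no negation at all counts as a hit
    gap = s - (n + len(negation))
    return gap > max_gap  # a short gap means the negation still applies


print(non_rule_hit("不会减免本金", "不会", "减免本金", 3))                # False (gap 0)
print(non_rule_hit("不会，您放心，一定会减免本金", "不会", "减免本金", 3))  # True (gap 8)
```

Note that the general criterion in step 205 says "greater than or equal to" the threshold while the worked example says "more than 3"; the sketch follows the example, so the boundary case is a choice, not something the text settles.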
- the combination of verification rules may include at least two verification rules, which is equivalent to identifying sensitive words through the model in step 203.
- the combination of verification rules contains three verification rules.
- the first verification rule is ID verification
- the second verification rule is sensitive word + [and] verification
- the third verification rule is sensitive word + [Non] verification.
- when verification rule 1 is used to identify sensitive words, it is recognized whether the text information (after removing character spaces, preset special symbols, rare characters and other noisy text) contains a word slot consisting of a string of digits; if it does, it can be determined whether the digit string corresponding to that word slot conforms to the ID card format.
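The text does not define "ID card format". The sketch below assumes it means the 18-character PRC resident identity number, whose last character is a check code computed with the GB 11643-1999 weights (an ISO 7064 MOD 11-2 check). It validates shape and check code only, not the embedded region code or birth date.

```python
import re

# GB 11643-1999 weights for the first 17 digits, and the check-code table
# indexed by (weighted sum mod 11).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CODES = "10X98765432"
# 18-character candidates not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<![0-9])[0-9]{17}[0-9Xx](?![0-9])")


def is_id_number(s: str) -> bool:
    """Check shape and check code of an 18-character resident ID number."""
    if not re.fullmatch(r"[0-9]{17}[0-9Xx]", s):
        return False
    total = sum(int(c) * w for c, w in zip(s[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == s[17].upper()


def contains_id_number(text: str) -> bool:
    """Verification rule 1 sketch: any digit-string slot that is a valid ID."""
    return any(is_id_number(m) for m in CANDIDATE.findall(text))


print(is_id_number("11010519491231002X"))  # True
print(is_id_number("110105194912310021"))  # False
```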
- step 205 may further specifically include: if the target verification rule combination includes at least one preset sensitive word determination criterion with different execution priorities, the text information is judged against each criterion in turn, in descending order of execution priority; during this sequential judgment, once the text information is found to meet a criterion, judgment of the text information stops, and the result obtained so far is used as the result of judging the text information with the target verification rule combination.
- for example, the target verification rule combination contains five verification rules with preset execution priorities (for example, ordered by sensitive word recognition success rate from high to low): verification rule one > verification rule three > verification rule four > verification rule five > verification rule two. The rules are applied to the text information in this order; if verification rule three determines that the text information contains sensitive words, the subsequent checks by verification rules four, five and two are skipped. With this optional approach there is no need to check every rule one by one, and the result is obtained as quickly as possible, which improves the efficiency of sensitive word recognition.
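Priority-ordered evaluation with early stopping can be sketched as below. The `(name, priority, predicate)` tuple layout and the stub predicates are hypothetical; the call log shows that once rule three hits, rules four, five and two are never evaluated.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical layout: (rule name, execution priority, predicate on text).
PrioritizedRule = Tuple[str, int, Callable[[str], bool]]


def judge_by_priority(text: str, rules: List[PrioritizedRule]) -> Optional[str]:
    """Apply rules from highest to lowest priority and stop at the first hit.

    Returns the name of the rule that found sensitive words, or None.
    """
    for name, _priority, check in sorted(rules, key=lambda r: r[1], reverse=True):
        if check(text):
            return name  # remaining lower-priority rules are skipped
    return None


calls: List[str] = []


def stub(name: str, hit: bool) -> Callable[[str], bool]:
    """Stub predicate that logs its call and returns a fixed result."""
    def check(text: str) -> bool:
        calls.append(name)
        return hit
    return check


rules = [
    ("rule two", 1, stub("two", False)),
    ("rule one", 5, stub("one", False)),
    ("rule three", 4, stub("three", True)),
    ("rule four", 3, stub("four", False)),
    ("rule five", 2, stub("five", False)),
]
print(judge_by_priority("any text", rules))  # rule three
print(calls)  # ['one', 'three']
```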
- the calculation priority range can also be limited, similar to the four arithmetic operations: verification rules within the priority range are executed first. If the verification rules use specific regular symbols with different meanings, the rules belonging to the priority range can be enclosed in brackets "()"; when the verification rules are executed, the rules inside the brackets are executed first, followed by the other verification rules.
- in step 206a, if the text information meets at least one set of sensitive word determination criteria in the target verification rule combination, it is determined that the text information contains sensitive words.
- a set of sensitive word determination criteria may include one, two or more sensitive word determination criteria, depending on the required accuracy of sensitive word determination.
- in step 206b, which is parallel to step 206a, if the text information meets none of the sensitive word determination criteria in the target verification rule combination, it is determined that the text information does not contain sensitive words.
- in this way, the multiple preset sensitive word determination criteria in the verification rule combination are each applied to the text information, and if the text information meets at least one of them, it is determined to contain sensitive words, which improves the accuracy of sensitive word recognition.
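Steps 206a and 206b amount to an OR over criteria sets and an AND within each set. A minimal sketch, with hypothetical substring predicates standing in for real determination criteria:

```python
from typing import Callable, List

Criterion = Callable[[str], bool]


def contains_sensitive(text: str, criteria_sets: List[List[Criterion]]) -> bool:
    """Step 206a: True if every criterion in at least one set holds.
    Step 206b: False if no set is fully met."""
    return any(all(check(text) for check in group) for group in criteria_sets)


# Hypothetical criteria sets: one two-criterion set, one single-criterion set.
CRITERIA: List[List[Criterion]] = [
    [lambda t: "go to your" in t, lambda t: "investigation" in t],
    [lambda t: "password" in t],
]
print(contains_sensitive("send me your password", CRITERIA))  # True
print(contains_sensitive("go to your room", CRITERIA))        # False
```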
- restricting the text information may specifically include: preventing publication of the text information; or replacing the text part containing the target word slot combination with preset characters (such as "*" or "-"), which has a desensitizing effect, before publishing; or sending the text information to the review module for review and publishing it if it passes.
- for example, the system may prevent the user from publishing sensitive words, or directly delete content containing sensitive words sent by the user; less sensitive words are not deleted immediately after being sent, but are passed to reviewers for a second, manual review.
- the method of this embodiment may further include: recording the text part containing the target word slot combination as sample data; periodically analyzing the recorded sample data to count word combinations that occur more often than a preset frequency threshold and differ from the existing word slot combinations; calculating the semantic similarity between these word combinations and preset sensitive words and/or preset sensitive sentences; taking each target word combination whose semantic similarity exceeds a preset similarity threshold as a new word slot combination, and updating the verification rule corresponding to it according to the sample data containing it; the new word slot combination and its verification rule can then be used to determine whether other text information contains sensitive words.
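The restriction options described above can be sketched as one dispatch function. The mode strings, the `(start, end)` span format and the `review` callback standing in for the review module are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) of text containing the target combination


def mask_spans(text: str, spans: List[Span], mask: str = "*") -> str:
    """Replace every character inside the given spans with the mask character."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


def restrict(text: str, spans: List[Span], mode: str,
             review: Optional[Callable[[str], bool]] = None) -> Optional[str]:
    """Return the text to publish, or None if publication is blocked."""
    if mode == "block":
        return None
    if mode == "mask":
        return mask_spans(text, spans)
    if mode == "review":
        # publish only if the (hypothetical) review module approves
        return text if review is not None and review(text) else None
    raise ValueError(f"unknown mode: {mode}")


print(restrict("call me, 12345 now", [(9, 14)], "mask"))  # call me, ***** now
```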
- the automatic update of the sensitive word recognition system can be realized, so as to further improve the accuracy of subsequent sensitive word recognition.
- the entire sensitive word recognition system is equivalent to having the function of machine learning, which can realize the accurate recognition of sensitive words by artificial intelligence.
- text data determined to contain sensitive words may sometimes include other word combinations with the meaning of sensitive words.
- This embodiment collects these text data as sample data and periodically analyzes them to find word combinations that occur more often than a certain threshold and differ from the existing word slot combinations; it calculates their semantic similarity with the preset sensitive words and/or preset sensitive sentences to find new word slot combinations that had not been discovered before but also carry the meaning of sensitive words, and formulates the corresponding verification rules for them.
- the new word slot combination and corresponding inspection rules can be used to determine whether other text information contains sensitive words, so as to find more text data that actually has the meaning of sensitive words.
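The sample-mining loop can be sketched as below under several assumptions: samples arrive pre-tokenized, a "word combination" is an ordered pair of distinct tokens from one sample, and semantic similarity is replaced by a simple character-bigram Jaccard score. That substitute is a stand-in, not a semantic model, and updating the verification rules from the samples is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def char_bigrams(s: str) -> Set[str]:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_jaccard(a: str, b: str) -> float:
    """Stand-in for semantic similarity: Jaccard overlap of character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def mine_new_combinations(samples: Iterable[Sequence[str]], existing: Set[Pair],
                          sensitive_phrases: List[str], min_count: int,
                          min_similarity: float) -> List[Pair]:
    counts: Counter = Counter()
    for tokens in samples:
        unique = list(dict.fromkeys(tokens))     # dedupe, keep order
        counts.update(combinations(unique, 2))   # ordered token pairs
    new: List[Pair] = []
    for pair, n in counts.items():
        if n <= min_count or pair in existing:   # frequency and novelty filters
            continue
        phrase = " ".join(pair)
        best = max((bigram_jaccard(phrase, p) for p in sensitive_phrases), default=0.0)
        if best > min_similarity:
            new.append(pair)
    return sorted(new)


samples = [["lower", "the", "principal"], ["lower", "your", "principal"], ["lower", "principal"]]
print(mine_new_combinations(samples, {("reduce", "principal")}, ["lower the principal"], 2, 0.5))
# [('lower', 'principal')]
```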
- the method of this embodiment can also be applied to a system for intelligent sensitive word quality inspection.
- Algorithms can be used to match entries, and specific rules and strategies can be set to reduce noise interference, so that sensitive words can be matched across text and filtered accurately. After the sensitive vocabulary is constructed, an algorithm traverses the text and matches it against the sensitive word tree to identify and filter sensitive vocabulary. Intelligent strategies can be customized to customer needs to efficiently filter prohibited messages, malicious promotion, vulgar abuse, low-quality spam flooding, and other sensitive words and their prohibited variants.
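Matching against a sensitive word tree can be sketched with a plain trie scanned from every start position. This is a generic technique, not necessarily the algorithm the patent uses; an Aho-Corasick automaton would avoid restarting the scan at each position.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class TrieNode:
    __slots__ = ("children", "word")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a sensitive word ends here


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def find_sensitive(text: str, root: TrieNode) -> List[Tuple[int, str]]:
    """Return (start index, word) for every dictionary word found in text."""
    hits: List[Tuple[int, str]] = []
    for start in range(len(text)):
        node: Optional[TrieNode] = root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break  # no dictionary word continues with this character
            if node.word is not None:
                hits.append((start, node.word))
    return hits


ROOT = build_trie(["spam", "spa", "scam"])
print(find_sensitive("a spam scam", ROOT))  # [(2, 'spa'), (2, 'spam'), (7, 'scam')]
```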
- the intelligent quality inspection system has a high accuracy of content review and recognition, which can quickly process text, greatly reduce the workload of manual review, eliminate online risks, improve content output quality, purify the network environment, and ensure a good user experience.
- this embodiment provides an artificial intelligence-based sensitive word recognition device.
- the device includes: an acquisition module 31, a recognition module 32, a judgment module 33 and a processing module 34.
- the obtaining module 31 is used to obtain the text information to be recognized.
- the recognition module 32 is configured to recognize a target word slot combination contained in the text information, wherein the target word slot combination is composed of at least one predetermined word slot.
- the judgment module 33 is configured to judge whether the text information contains sensitive words according to the target word slot combination and the intermediate word information of the target word slot combination in the text information.
- the processing module 34 is configured to perform restriction processing on the text information if it is determined that the text information contains sensitive words.
- the judgment module 33 is specifically configured to: obtain a target verification rule combination according to at least one target verification rule corresponding to each target word slot combination; determine, according to the word slot arrangement information of the target word slot combination in the text information and the intermediate word information between the word slots, whether the text information meets each of the multiple preset sensitive word determination criteria in the target verification rule combination; if the text information meets at least one set of sensitive word determination criteria in the target verification rule combination, determine that the text information contains sensitive words; and if the text information meets none of the sensitive word determination criteria in the target verification rule combination, determine that the text information does not contain sensitive words.
- the judgment module 33 is specifically configured to: if the sensitive word determination criterion is that each word slot in the target word slot combination appears in the text information and the number of intermediate words within a limited range meets the criterion, determine that the text information contains sensitive words when the word slot arrangement information matches the preset word slot sequence corresponding to the target word slot combination and the number of intermediate words is less than or equal to the preset number threshold; and if the sensitive word determination criterion is that each word slot in the target word slot combination appears in the text information and the number of intermediate words within a limited range does not meet the criterion, determine that the text information contains sensitive words when the word slot arrangement information matches the preset word slot sequence corresponding to the target word slot combination and the number of intermediate words is greater than or equal to the preset number threshold.
- the judgment module 33 is further specifically configured to: if the target verification rule combination includes at least one preset sensitive word determination criterion with different execution priorities, judge the text information against each criterion in turn, in descending order of execution priority; during this sequential judgment, once the text information is found to meet a criterion, stop further judgment of the text information and use the result obtained so far as the result of judging the text information with the target verification rule combination.
- the device also includes: a recording module and an analysis module.
- the recording module is configured to record the text part of the text information containing the target word slot combination as sample data after the restriction processing is performed on the text information.
- the analysis module is used to periodically analyze the recorded sample data: count the word combinations that appear in the sample data more often than a preset frequency threshold and differ from the existing word slot combinations; calculate the semantic similarity between each such word combination and the preset sensitive words and/or preset sensitive sentences; take each target word combination whose semantic similarity is greater than a preset similarity threshold as a new word slot combination, and update the verification rule corresponding to the new word slot combination based on the sample data of that combination. The new word slot combination and its verification rule are then used to determine whether other text information contains sensitive words.
- the processing module 34 is specifically configured to: block publication of the text information; or replace the text part of the text information containing the target word slot combination with preset characters before publishing it; or send the text information to the review module for review, and publish it only if it passes the review.
- the text information is pre-stored in a blockchain; correspondingly, the obtaining module 31 is specifically configured to obtain the text information from the blockchain, and the recognition module 32 is specifically configured to remove the spaces between characters and the preset special symbols from the text information, and then identify the target word slot combination contained in the cleaned text information.
- this embodiment also provides a readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the artificial intelligence-based sensitive word recognition method shown in Figure 1 and Figure 2 is implemented.
- the readable storage medium involved in this application may be a computer readable storage medium.
- the storage medium involved in this application, such as a readable storage medium, may be non-volatile (for example, a non-volatile readable storage medium) or volatile (for example, a volatile readable storage medium).
- the technical solution of this application can be embodied in the form of a software product.
- the software product can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods in each implementation scenario of the present application.
- this embodiment also provides a computer device, which may be a personal computer, a notebook computer, or a server.
- the physical device includes a storage medium and a processor; the storage medium stores computer-readable instructions, and the processor executes them to implement the aforementioned artificial intelligence-based method of identifying sensitive words.
- the storage medium may be a readable storage medium, such as a nonvolatile readable storage medium or a volatile readable storage medium.
- the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on.
- the user interface may include a display screen and an input unit such as a keyboard; optionally, the user interface may also include a USB interface, a card reader interface, and the like.
- optionally, the network interface may include a standard wired interface and a wireless interface (such as a Bluetooth or Wi-Fi interface).
- the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange components differently.
- the storage medium may also include an operating system and a network communication module.
- the operating system is a program that manages the hardware and software resources of the aforementioned physical devices, and supports the operation of information processing programs and other software and/or programs.
- the network communication module is used to realize the communication between the various components in the storage medium and the communication with other hardware and software in the physical device.
- after the text information to be recognized is acquired, the target word slot combination contained in the text information can be identified, where the word slot combination is composed of at least one preset word slot; whether the text information contains sensitive words is then judged according to the target word slot combination and the intermediate word information of the target word slot combination in the text information.
- by combining word slot combinations with a check on the intermediate words between the slots of a combination, this embodiment can accurately identify whether the text information contains sensitive words even if symbols or spaces are inserted into a sensitive word, extra words are added, or the same meaning is rewritten with other text, which improves the accuracy of sensitive word recognition. If the text information is determined to contain sensitive words, it can also be restricted promptly. The entire process of sensitive word recognition plus restriction processing can be automated, which improves the efficiency of sensitive word processing.
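
The recognition and judgment steps described above (removing spaces and preset special symbols, locating the word slots of a combination in order, and checking the number of intermediate words between them) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `WordSlot`, `SPECIAL_SYMBOLS`, `match_combination` and `contains_sensitive` are hypothetical names, a word slot is modeled as a set of interchangeable surface words, and intermediate words are counted as characters, which suits unsegmented text such as Chinese. The `min_gap`/`max_gap` bounds cover both a limited-range criterion and the criterion in which the count must reach a preset threshold.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of "preset special symbols"; the source does not list them.
SPECIAL_SYMBOLS = "*#@!~._-|/\\"


def clean_text(text: str, symbols: str = SPECIAL_SYMBOLS) -> str:
    """Remove whitespace and the preset special symbols before recognition."""
    return re.sub("[" + re.escape(symbols) + r"\s]", "", text)


@dataclass(frozen=True)
class WordSlot:
    name: str
    words: tuple[str, ...]  # interchangeable words that can fill this slot


def occurrences(text: str, slot: WordSlot) -> list[tuple[int, int]]:
    """All (start, end) spans where any word of the slot occurs in text."""
    spans = []
    for word in slot.words:
        start = text.find(word)
        while start != -1:
            spans.append((start, start + len(word)))
            start = text.find(word, start + 1)
    return sorted(spans)


def match_combination(
    text: str,
    combo: list[WordSlot],
    min_gap: int = 0,
    max_gap: Optional[int] = None,
) -> Optional[list[tuple[int, int]]]:
    """Find the slots of `combo` in their preset order such that the number of
    intermediate characters between consecutive slots lies in [min_gap, max_gap].
    Returns the matched spans, or None if no such arrangement exists."""
    candidates = [occurrences(text, slot) for slot in combo]

    def search(i: int, prev_end: Optional[int]) -> Optional[list[tuple[int, int]]]:
        if i == len(candidates):
            return []
        for start, end in candidates[i]:
            if prev_end is not None:
                # A negative gap means overlap or wrong order; rejected when min_gap >= 0.
                gap = start - prev_end
                if gap < min_gap or (max_gap is not None and gap > max_gap):
                    continue
            rest = search(i + 1, end)
            if rest is not None:
                return [(start, end)] + rest
        return None

    return search(0, None)


def contains_sensitive(text: str, combo: list[WordSlot], gap_threshold: int) -> bool:
    """Criterion with no upper limit on intermediate words: every slot appears in
    the preset order and each gap is at least `gap_threshold` characters."""
    return match_combination(clean_text(text), combo, min_gap=gap_threshold) is not None
```

For example, `clean_text("buy * cheap # pills")` yields `"buycheappills"`, in which a two-slot combination for "buy" and "pills" matches at spans `(0, 3)` and `(8, 13)` with five intermediate characters. The backtracking search tries every occurrence of each slot rather than only the first, so an in-order arrangement satisfying the gap bounds is found whenever one exists; for long texts with many repeated slot words, a dynamic-programming variant may scale better.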
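
The priority-ordered judgment performed by the judging module 33 and the three restriction options of the processing module 34 can be sketched as below. This is an illustrative sketch only: the `Criterion` record, its numeric `priority` field (a higher value runs first), the `judge` and `restrict` functions, and the mode strings `"block"`, `"mask"` and `"review"` are all hypothetical, and the review module is represented only by a returned status. The `span` passed to `restrict` is assumed to index the same string that will be published.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Criterion:
    name: str
    priority: int                  # execution priority: higher runs first
    check: Callable[[str], bool]   # True if the text meets this criterion


def judge(text: str, rule_combination: list[Criterion]) -> Optional[str]:
    """Apply criteria from highest to lowest priority and stop at the first one
    the text meets; return its name, or None if no criterion is met."""
    for criterion in sorted(rule_combination, key=lambda c: c.priority, reverse=True):
        if criterion.check(text):
            return criterion.name  # remaining lower-priority criteria are skipped
    return None


def restrict(text: str, span: tuple[int, int], mode: str, mask_char: str = "*"):
    """Hypothetical restriction step returning (action, text to publish or None)."""
    if mode == "block":
        return "blocked", None
    if mode == "mask":
        start, end = span
        return "published", text[:start] + mask_char * (end - start) + text[end:]
    if mode == "review":
        return "pending_review", text  # hand-off to a review module (not modelled)
    raise ValueError(f"unknown restriction mode: {mode}")


if __name__ == "__main__":
    calls = []
    rules = [
        Criterion("loose", 1, lambda t: calls.append("loose") or "f" in t),
        Criterion("exact", 2, lambda t: calls.append("exact") or "foo" in t),
    ]
    print(judge("xfoo", rules), calls)  # exact ['exact']
```

The demo shows the short-circuit: once the higher-priority criterion is met, the lower-priority check is never called. Sorting is stable, so criteria with equal priority keep their listed order.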
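
The periodic analysis performed by the analysis module can be sketched as follows, under several labelled assumptions: each recorded sample is already segmented into tokens; a "word combination" is taken to be an ordered pair of words that co-occur in one sample; and character-bigram Jaccard similarity is swapped in as a simple stand-in for the semantic similarity the text describes, which in practice might instead come from sentence embeddings. All function names are hypothetical, and updating the verification rule for each new combination is not shown.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def char_bigrams(s: str) -> set[str]:
    """Character bigrams of s (the string itself if it is shorter than 2)."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of character bigrams: a crude stand-in for the
    semantic similarity described in the text."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def mine_new_combinations(samples, existing, sensitive_terms, freq_threshold, sim_threshold):
    """Return candidate new word slot combinations mined from recorded samples.

    samples: list of token lists, one per recorded sample
    existing: set of word-combination tuples already used as slot combinations
    sensitive_terms: preset sensitive words and/or sentences
    """
    counts = Counter()
    for tokens in samples:
        # Every ordered word pair in the sample, counted once per sample.
        counts.update(set(combinations(tokens, 2)))
    new = []
    for combo, freq in counts.items():
        # Keep only combinations more frequent than the threshold and not yet known.
        if freq <= freq_threshold or combo in existing:
            continue
        phrase = "".join(combo)
        best = max((similarity(phrase, term) for term in sensitive_terms), default=0.0)
        if best > sim_threshold:
            new.append(combo)
    return sorted(new)
```

Counting each pair once per sample measures how many samples contain it, rather than how often it repeats inside one sample; either reading fits the description, and the choice here is an assumption.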
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Disclosed are a sensitive word recognition method and apparatus, and a computer device, relating to the technical field of artificial intelligence. The method comprises: first, acquiring text information to be recognized (101); then identifying a target word slot combination contained in the text information (102), the target word slot combination being composed of at least one preset word slot; next, determining, according to the target word slot combination and the intermediate word and character information of the target word slot combination in the text information, whether the text information contains a sensitive word (103); and if so, performing restriction processing on the text information (104). The method can improve the accuracy of sensitive word recognition. In addition, the method also relates to blockchain technology, and the text data may be stored in a blockchain to ensure data privacy and security.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010927419.7A CN112016317A (zh) | 2020-09-07 | 2020-09-07 | Artificial intelligence-based sensitive word recognition method and apparatus, and computer device |
CN202010927419.7 | 2020-09-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021151333A1 (fr) | 2021-08-05 |
Family
ID=73515434
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/124684 WO2021151333A1 (fr) | 2020-09-07 | 2020-10-29 | Artificial intelligence-based sensitive word recognition method and apparatus, and computer device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112016317A (fr) |
WO (1) | WO2021151333A1 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
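CN112580344A (zh) * | 2020-12-22 | 2021-03-30 | 京东数字科技控股股份有限公司 | Information supervision method, apparatus, device, storage medium, and program product |
CN113705211B (zh) * | 2021-10-29 | 2022-01-18 | 云账户技术(天津)有限公司 | Method and apparatus for automatically generating business license trade names, and readable storage medium |
CN114357511A (zh) * | 2021-12-30 | 2022-04-15 | 北京鼎普科技股份有限公司 | Method, apparatus, and user terminal for marking key content of a document |
CN117436437A (zh) * | 2022-07-11 | 2024-01-23 | 华为云计算技术有限公司 | Combined sensitive word detection method, apparatus, device, and cluster |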
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107992471A (zh) * | 2017-11-10 | 2018-05-04 | 北京光年无限科技有限公司 | Information filtering method and apparatus in a human-computer interaction process |
CN108519970A (zh) * | 2018-02-06 | 2018-09-11 | 平安科技(深圳)有限公司 | Method for identifying sensitive information in text, electronic apparatus, and readable storage medium |
CN110096585A (zh) * | 2019-03-26 | 2019-08-06 | 珠海鹏游网络科技有限公司 | Intelligent sensitive word filtering system |
US20190295533A1 (en) * | 2018-01-26 | 2019-09-26 | Shanghai Xiaoi Robot Technology Co., Ltd. | Intelligent interactive method and apparatus, computer device and computer readable storage medium |
CN111339760A (zh) * | 2018-12-18 | 2020-06-26 | 北京京东尚科信息技术有限公司 | Training method and apparatus for a lexical analysis model, electronic device, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
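CN102779176A (zh) * | 2012-06-27 | 2012-11-14 | 北京奇虎科技有限公司 | Keyword filtering system and method |
CN107193804B (zh) * | 2017-06-02 | 2019-03-29 | 河海大学 | Spam SMS text feature selection method oriented to words and compound words |
CN111191443A (zh) * | 2019-12-19 | 2020-05-22 | 深圳壹账通智能科技有限公司 | Blockchain-based sensitive word detection method and apparatus, computer device, and storage medium |
CN111539206B (zh) * | 2020-04-27 | 2023-07-25 | 中国银行股份有限公司 | Method, apparatus, device, and storage medium for determining sensitive information |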
2020
- 2020-09-07 CN CN202010927419.7A patent/CN112016317A/zh active Pending
- 2020-10-29 WO PCT/CN2020/124684 patent/WO2021151333A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112016317A (zh) | 2020-12-01 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20916505; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20916505; Country of ref document: EP; Kind code of ref document: A1 |