CN105183761A - Sensitive word replacement method and apparatus - Google Patents

Info

Publication number
CN105183761A
Authority
CN
China
Prior art keywords
sensitive word
word
sensitive
replacement
lexicon
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510446574.6A
Other languages
Chinese (zh)
Other versions
CN105183761B (en
Inventor
张琦
刘锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Media Technology Beijing Co Ltd
Original Assignee
Netease Media Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Media Technology Beijing Co Ltd
Priority to CN201510446574.6A
Publication of CN105183761A
Application granted
Publication of CN105183761B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the invention provide a sensitive word replacement method and apparatus. The method comprises: receiving a target text; searching for a sensitive word in the target text according to a sensitive word library; determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word; and replacing the sensitive word with the non-sensitive word. With the method and the apparatus, when a user publishes content to the Internet, even if the text contains a sensitive word, the user's enthusiasm for publishing can be fully protected by processing the sensitive word appropriately, thereby improving the user's sense of participation.

Description

Sensitive word replacement method and apparatus
Technical field
Embodiments of the present invention relate to the field of communication technology and, more specifically, to a sensitive word replacement method and apparatus.
Background
This section is intended to provide background or context for the embodiments of the present invention recited in the claims. The description herein is not admitted to be prior art merely by its inclusion in this section.
The advent of the Internet has greatly facilitated the publication and dissemination of all kinds of information among users. For example, network instant messaging systems are used by more and more people because they allow clients to communicate quickly and easily. Microblogs and forums likewise have large user bases, make it convenient to publish and view information, and have a wide reach. As a result, people often use various Internet applications to send large numbers of text messages that contain "sensitive words". For example, sensitive words may include uncivil vocabulary, sensitive vocabulary relating to national security, and so on.
At present, the sensitivity of a target text is mostly identified manually, or a sensitive vocabulary is built manually and a machine matches the target text against it to determine the sensitivity of the target text. In this case, when a user publishes content to the Internet and the text contains a sensitive word, one of the following two situations generally occurs.
In one situation, the system directly forbids the user from submitting the target text and notifies the user that the target text contains a sensitive word.
In the other situation, the system allows the user to submit the target text, but before the target text is actually displayed on the Internet it enters a "manual review" stage, in which a reviewer confirms whether the words that the system judged to be sensitive really are sensitive words. If the manual review concludes that the target text contains a sensitive word, the text is not allowed to be published to the Internet; conversely, if the manual review concludes that it does not contain a sensitive word, the text is displayed on the Internet.
Summary of the invention
However, in the prior art, once a target text is judged to contain a sensitive word, the user is categorically forbidden from submitting the text to the system, or the system is forbidden from publishing the text to the Internet. In other words, the user has no way to publish their own ideas, which dampens the user's enthusiasm for publishing and reduces the user's sense of participation.
Therefore, in the prior art, improving the user experience when publishing content is a very troublesome problem.
For this reason, an improved sensitive word replacement method and apparatus is greatly needed, so that when a user publishes content to the Internet, even if the text contains a sensitive word, the user's enthusiasm for publishing can be fully protected by processing the sensitive word appropriately, thereby improving the user's sense of participation.
In this context, embodiments of the present invention are intended to provide a sensitive word replacement method and apparatus.
In a first aspect of embodiments of the present invention, a sensitive word replacement method is provided, comprising: receiving a target text; searching for a sensitive word in the target text according to a sensitive word library; determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word; and replacing the sensitive word with the non-sensitive word.
In a second aspect of embodiments of the present invention, a sensitive word replacement apparatus is provided, comprising: a target text receiving unit, configured to receive a target text; a sensitive word search unit, configured to search for a sensitive word in the target text according to a sensitive word library; a non-sensitive word determining unit, configured to determine a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word; and a non-sensitive word replacement unit, configured to replace the sensitive word with the non-sensitive word.
In a third aspect of embodiments of the present invention, a sensitive word replacement apparatus is provided, comprising a storage unit and a processing unit, wherein the storage unit stores computer instructions and, when the processing unit executes the computer instructions, the following steps are performed: receiving a target text; searching for a sensitive word in the target text according to a sensitive word library; determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word; and replacing the sensitive word with the non-sensitive word.
In a fourth aspect of embodiments of the present invention, a computer program product is provided, comprising program code which, when executed on one or more computing devices, performs the following steps: receiving a target text; searching for a sensitive word in the target text according to a sensitive word library; determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word; and replacing the sensitive word with the non-sensitive word.
With the sensitive word replacement method and apparatus according to embodiments of the present invention, a sensitive word in a text can be processed in a meaningful way so as to desensitize it. The benefits are as follows: for the user, it reduces the user's negative energy, which is conducive to social harmony; for the system, it reduces the workload of tasks such as "manual review"; and in cultural terms, it reflects the humanistic care and social harmony of the software. Therefore, with the method of the present invention, when a user publishes content to the Internet, even if the text contains a sensitive word, the user's enthusiasm for publishing can be fully protected by processing the sensitive word appropriately, thereby improving the user's sense of participation.
Brief description of the drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not by way of limitation, in which:
Fig. 1 schematically shows a framework diagram of an exemplary application scenario of embodiments of the present invention;
Fig. 2 schematically shows a flowchart of an embodiment of the sensitive word replacement method in embodiments of the present invention;
Fig. 3 schematically shows a flowchart of an embodiment of the step of determining the non-sensitive word in embodiments of the present invention;
Fig. 4 schematically shows a flowchart of a first example of the step of determining the non-sensitive word in embodiments of the present invention;
Fig. 5 schematically shows a flowchart of a second example of the step of determining the non-sensitive word in embodiments of the present invention;
Fig. 6 schematically shows a flowchart of a third example of the step of determining the non-sensitive word in embodiments of the present invention;
Fig. 7 schematically shows a flowchart of another embodiment of the step of determining the non-sensitive word in embodiments of the present invention;
Fig. 8 schematically shows a diagram of the sensitive word replacement apparatus according to embodiments of the present invention.
In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.
Detailed description
The principle and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and then implement the present invention, and not to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure can be implemented in the following forms: entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a sensitive word replacement method and apparatus are proposed.
In this document, it should be understood that the number of any element in the drawings is given by way of example and not limitation, and that any naming is used only for distinction and carries no limiting meaning.
The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments of the present invention.
Overview of the invention
The inventors have found that, in the prior art, once a target text is judged to contain a sensitive word, the user is categorically forbidden from submitting the text to the system, or the system is forbidden from publishing the text to the Internet. In other words, the user has no way to express their own ideas. This clearly dampens the user's enthusiasm for publishing and reduces the user's sense of participation.
Based on this finding, the basic design idea of the present invention is as follows: after a target text submitted by a user is received, once a sensitive word is detected in the target text, the sensitive word can be replaced with a non-sensitive word having an identical or similar meaning, and content publication then proceeds.
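The design idea above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the function name `replace_sensitive_words`, the flat word list standing in for the sensitive word library, and the one-to-one mapping standing in for the replacement rule are all assumptions, and the sample words are English stand-ins for the translated examples.

```python
from typing import Dict, List


def replace_sensitive_words(text: str,
                            library: List[str],
                            replacements: Dict[str, str]) -> str:
    """Replace each library word found in text with its non-sensitive counterpart.

    `library` plays the role of the sensitive word library and `replacements`
    the role of a (hypothetical) one-to-one replacement rule.
    """
    # Try longer entries first so a short entry cannot split a longer one.
    for word in sorted(library, key=len, reverse=True):
        if word in text and word in replacements:
            text = text.replace(word, replacements[word])
    return text


# Hypothetical sample data.
library = ["idiotic"]
replacements = {"idiotic": "terrible"}
print(replace_sensitive_words("The traffic today is idiotic.", library, replacements))
```

A text without any library word passes through unchanged, which corresponds to publishing it directly.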
Having described the basic principle of the present invention, various non-limiting embodiments of the present invention are introduced in detail below.
Application scenario overview
Fig. 1 schematically shows a framework diagram of an exemplary application scenario of embodiments of the present invention.
With reference to Fig. 1, embodiments of the present invention can be applied in the content publishing system shown in Fig. 1, which comprises a server 101, a client 102, and so on.
For example, a user can interact with the server 101 used for content publishing through a user interface interaction device (for example, the client 102) on the user equipment. Those skilled in the art will understand that the framework shown in Fig. 1 is only one example in which embodiments of the present invention can be implemented. The scope of application of embodiments of the present invention is not limited by any aspect of this framework. For example, embodiments of the present invention can equally be applied in a stand-alone scenario, that is, relying only on the client 102 and without interacting with the server 101.
It should be noted that the user equipment herein can be any device, whether existing, under development, or developed in the future, that can interact with the server 101 through any form of wired or wireless connection (for example, Wi-Fi, LAN, coaxial cable, cellular network, etc.). Such devices include but are not limited to desktop computers, laptop computers, and mobile terminals (including smartphones, non-smart mobile phones, and various tablet computers), whether existing, under development, or developed in the future.
It should also be noted that the server 101 herein is only one example of a device, whether existing, under development, or developed in the future, that can provide a web publishing application to the user. Embodiments of the present invention are not limited in this respect.
It should be noted that the method of embodiments of the present invention can be performed by the client 102, by the server 101, or partly by the client 102 and partly by the server 101. The present invention is not limited as to the executing entity, as long as the method disclosed in embodiments of the present invention is performed.
Exemplary method
The sensitive word replacement method according to an exemplary embodiment of the present invention is described below with reference to Fig. 2 and in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principle of the present invention, and embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention can be applied to any applicable scenario.
Fig. 2 schematically shows a flowchart of an embodiment of the sensitive word replacement method in embodiments of the present invention.
As shown in Fig. 2, the sensitive word replacement method of this embodiment specifically comprises:
In step S210, a target text is received.
In one example, it can be assumed that the executing entity of the sensitive word replacement method of this embodiment is the client 102 shown in Fig. 1.
For example, the target text can be text content that the user enters through an input unit equipped on the user equipment (for example, a keyboard, mouse, trackball, touchpad, touchscreen, microphone, etc.). The user can then issue a command to publish the target text to the Internet through the same or a different input unit (for example, by pressing the shortcut Ctrl+Enter on the keyboard or by clicking a "Send" or "Confirm" button with the mouse), so that the client 102 receives the target text and starts performing the sensitive word replacement method of this embodiment.
In another example, it can be assumed that the executing entity of the sensitive word replacement method of this embodiment is the server 101 shown in Fig. 1.
In this case, after the client 102 receives the target text entered by the user, the client 102 can interact with the server 101 through a wired or wireless connection and send the target text entered by the user to the server 101, so that the server 101 receives the target text and starts performing the sensitive word replacement method of this embodiment.
In step S220, a sensitive word is searched for in the target text according to a sensitive word library.
For example, a sensitive word library can be preset in the executing entity of the sensitive word replacement method of this embodiment.
To this end, a large amount of information can be analyzed in advance, and the sensitive words commonly used in that information can be summarized to form a sensitive word library, which is stored in the client 102 or the server 101. For example, sensitive words may include uncivil vocabulary, sensitive vocabulary relating to national security, and even vocabulary used for publicity or advertising purposes.
Of course, presetting the sensitive word library as described above is only an example, and embodiments of the present invention are not limited to this. For example, the sensitive word library can also be preset in a cloud server and downloaded to the client 102 or the server 101 only when it is used. In addition, the sensitive word library in the client 102 or the server 101 can be continually updated by the cloud server. Updating through the cloud server allows the sensitive word list to be maintained dynamically, which keeps the sensitive vocabulary rich, correct, and up to date. The sensitive word library can also have a self-learning capability, which further improves the ability to recognize "sensitive words".
Next, the executing entity of the method can extract the information content from the target text for examination. The executing entity can then check the information content against the sensitive word library to see whether it contains any sensitive word stored in the library.
Once it is judged that a sensitive word exists in the target text, the method proceeds to step S230. Conversely, if it is judged that no sensitive word exists in the target text, the executing entity allows the target text to be published to the Internet, and the method ends.
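As a rough illustration of step S220 and the branch that follows it, the sketch below scans a text for library entries by plain substring search and reports where they occur. The names `find_sensitive_words` and `next_step` are hypothetical, and substring search is only one possible matching strategy; the patent does not prescribe a particular one.

```python
from typing import List, Tuple


def find_sensitive_words(text: str, library: List[str]) -> List[Tuple[int, str]]:
    """Return (start_index, word) for every occurrence of a library entry in text."""
    hits = []
    for word in library:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)
    return sorted(hits)


def next_step(text: str, library: List[str]) -> str:
    """Mirror the branch after S220: go on to S230, or publish directly."""
    return "S230" if find_sensitive_words(text, library) else "publish"


# Hypothetical sample, using an English stand-in for the translated example.
print(find_sensitive_words("too idiot, relevant department too idiot", ["idiot"]))
```

For large libraries a multi-pattern matcher (a trie or an Aho-Corasick automaton, for instance) would likely scale better than one `str.find` pass per entry.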
In step S230, a non-sensitive word corresponding to the sensitive word is determined according to a sensitive word replacement rule, wherein the non-sensitive word has a lower sensitivity than the sensitive word and expresses a meaning identical or similar to that of the sensitive word.
In response to judging that a sensitive word exists in the target text, the executing entity of the method can determine the non-sensitive word corresponding to the sensitive word according to a system default or user-defined sensitive word replacement rule.
Fig. 3 schematically shows a flowchart of an embodiment of the step of determining the non-sensitive word in embodiments of the present invention.
As shown in Fig. 3, step S230 can comprise:
In step S310, a replacement lexicon is obtained according to the sensitive word replacement rule.
In embodiments of the present invention, multiple sensitive word replacement rules can be provided to the executing entity of the method. For example, the sensitive word replacement rules can include a semantic replacement rule based on semantic judgment, a spelling replacement rule based on spelling processing, a dialect replacement rule based on region matching, and so on.
Correspondingly, different sensitive word replacement rules can correspond to different replacement lexicons. However, embodiments of the present invention are not limited in this respect. For example, one or more different replacement lexicons can also be defined for multiple sensitive word replacement rules. Furthermore, processing such as deduplication can be performed on the replacement lexicon to save system storage space.
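A minimal sketch of step S310, assuming the rules are identified by simple string names and each replacement lexicon is an in-memory mapping from a sensitive word to its candidate non-sensitive words. The rule names, the lexicon contents, and the function `get_replacement_lexicon` are hypothetical and serve only to illustrate the rule-to-lexicon lookup and the deduplication mentioned above.

```python
from typing import Dict, List

# Hypothetical rule names and lexicon contents, for illustration only.
REPLACEMENT_LEXICONS: Dict[str, Dict[str, List[str]]] = {
    "semantic": {"idiotic": ["terrible", "incompetent", "terrible"]},
    "spelling": {},
    "dialect": {},
}


def get_replacement_lexicon(rule: str) -> Dict[str, List[str]]:
    """Return the replacement lexicon for a rule, with duplicate candidates removed."""
    if rule not in REPLACEMENT_LEXICONS:
        raise ValueError(f"unknown replacement rule: {rule!r}")
    # dict.fromkeys keeps the first occurrence of each candidate, in order.
    return {word: list(dict.fromkeys(candidates))
            for word, candidates in REPLACEMENT_LEXICONS[rule].items()}


print(get_replacement_lexicon("semantic"))
```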
Advantageously, in embodiments of the present invention, a well-designed replacement lexicon can be provided to support the above replacement rules based on semantic judgment, spelling processing, and region matching. In addition, the replacement lexicon can also have a self-learning capability, which further improves the result of processing "sensitive words".
In step S320, the non-sensitive word is looked up in the replacement lexicon according to the sensitive word.
After the replacement lexicon is obtained, it can be used to look up the non-sensitive word corresponding to the sensitive word.
The correspondence between sensitive words and non-sensitive words can be one-to-one, one-to-many, or many-to-one. When several non-sensitive words exist for one sensitive word, the non-sensitive word to use can be selected automatically according to the user's usage habits, selected at random, or all candidates can be presented to the user so that the user makes the choice.
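The one-to-many case can be sketched as follows. The function `choose_candidate` and the idea of representing usage habits as a list of previously used words are assumptions made for illustration; the patent only states that selection may follow the user's habits, be random, or be left to the user.

```python
import random
from collections import Counter
from typing import List, Optional


def choose_candidate(candidates: List[str],
                     usage_history: Optional[List[str]] = None,
                     rng: Optional[random.Random] = None) -> str:
    """Pick one non-sensitive word from several candidates.

    Prefers the candidate the user has used most often; if there is no
    relevant history, falls back to a random choice.
    """
    if not candidates:
        raise ValueError("no candidate non-sensitive words")
    if usage_history:
        counts = Counter(w for w in usage_history if w in candidates)
        if counts:
            return counts.most_common(1)[0][0]
    return (rng or random).choice(candidates)


# Hypothetical history: the user has picked "incompetent" more often.
print(choose_candidate(["terrible", "incompetent"],
                       usage_history=["incompetent", "terrible", "incompetent"]))
```

The third strategy, letting the user decide, would simply return the whole `candidates` list to the user interface instead of calling this function.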
The step of determining the non-sensitive word in the sensitive word replacement method according to embodiments of the present invention is described in further detail below in three different examples.
In the first example, it can be assumed that the sensitive word replacement rule is a semantic replacement rule based on semantic judgment.
The meaning of a linguistic symbol is its semantics. Semantics can be regarded simply as the meanings of the concepts represented by the real-world things to which the symbols correspond, together with the relationships among those meanings; it is the interpretation and logical representation of the symbols in a given domain.
In this example, semantic analysis can be performed on the sensitive word to find a non-sensitive word with an identical or similar meaning, which is then used to replace the sensitive word.
Fig. 4 schematically shows a flowchart of the first example of the step of determining the non-sensitive word in embodiments of the present invention.
As shown in Fig. 4, step S310 can comprise:
In step S410, in response to the sensitive word replacement rule being a semantic replacement rule, a semantic lexicon is obtained. The semantic lexicon defines correspondences between sensitive words and non-sensitive words, wherein a sensitive word and its corresponding non-sensitive word serve as the same sentence element in a sentence.
For example, the semantic lexicon defines correspondences between sensitive words and non-sensitive words, and in each pair the sensitive word and the non-sensitive word have an identical or similar meaning.
In step S420, the semantic lexicon is determined to be the replacement lexicon.
Continuing with reference to Fig. 4, step S320 can comprise:
In step S430, semantic analysis is performed on the target text.
For example, a semantic model can be preset or stored in the device of the executing entity so that the executing entity can judge the semantics of the target text according to this model. Specifically, the executing entity can learn and train a semantic model according to the application scenario of the current topic and the like, and then store the semantic model locally and/or in the cloud. After receiving the target text, the executing entity can retrieve the corresponding semantic model from local storage and/or the cloud and, according to the model, determine the organizational rules and structural relationships among the symbols in the target text.
In step S440, the sentence element of the sensitive word itself is determined according to the result of the semantic analysis.
Next, the sentence element that the sensitive word serves as in the target text can be determined from the organizational rules and structural relationships among the symbols in the target text. For example, the sentence element can be a subject, predicate, object, attributive, adverbial, complement, and so on.
In step S450, the non-sensitive word is selected in the semantic lexicon according to the sentence element of the sensitive word itself.
Once the sentence element that the sensitive word serves as in the target text has been determined, a suitable non-sensitive word can be looked up in the semantic lexicon accordingly.
Specifically, the text object on which the sensitive word acts in the target text can first be determined according to the result of the semantic analysis; the non-sensitive word is then selected in the semantic lexicon according to the sentence element of the sensitive word itself and the meaning of the text object.
This example is illustrated below with two concrete cases.
In the first case, suppose that the target text received in step S210 is "today's traffic is too idiotic, and the relevant department is too idiotic". This target text contains two occurrences of the sensitive word "idiotic". Semantic analysis shows that the first "idiotic" is an adjective serving as the predicate of its clause and describing the subject "traffic", and that the second "idiotic" is also an adjective serving as the predicate of its clause and describing the subject "department". Given this result, for the first sensitive word a non-sensitive adjective suited to the noun "traffic" can be selected in the semantic lexicon, for example "terrible". However, if the adjective "terrible", which suits the noun "traffic", were also applied to the noun "department", the sentence would not read smoothly and the audience could not understand what the user means. For this reason, a non-sensitive adjective suited to the noun "department" can further be selected in the semantic lexicon, for example "incompetent". In a subsequent step, the target text can then be converted to "today's traffic is too terrible, and the relevant department is too incompetent".
In the second case, suppose that the target texts received in step S210 are "why is the flight delayed again, bastard" and "why is there waste oil again, bastard". Each of these target texts contains the sensitive word "bastard". Semantic analysis shows that the first "bastard" is a noun serving as an independent element of the sentence and expressing the user's feeling about the delay, and that the second "bastard" is also a noun serving as an independent element of the sentence and expressing the user's feeling about the waste oil. Given this result, for the first sensitive word a non-sensitive noun suited to describing a delay can be selected in the semantic lexicon, for example "heavens". However, if the noun "heavens" were also used to describe the waste oil, it would not fit well and the audience could not accurately understand what the user means. For this reason, a non-sensitive noun or noun phrase suited to describing the waste oil can further be selected in the semantic lexicon, for example "life is hard". In a subsequent step, the target texts can then be converted to "why is the flight delayed again, heavens" and "why is there waste oil again, life is hard", respectively.
Comparing the two cases shows that the semantic analysis in the first case is closer to an analysis based on sentence structure itself, whereas the semantic analysis in the second case is closer to an analysis based on the user's situation. The difference between the two is determined by the result of the semantic analysis.
In this example, semantic replacement turns the original sensitive word into a non-sensitive word with an identical or similar meaning, so that the original meaning of the target text is fully retained while its sensitivity is removed, and the user can express their meaning well without any negative effect.
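The selection in steps S440 and S450 can be sketched as a lookup keyed on the sensitive word, its sentence element, and the text object it acts on. The sketch assumes that semantic analysis has already produced the sentence element and text object; the key layout, the table contents, and the function `select_non_sensitive_word` are hypothetical, and the words are English stand-ins for the translated examples above.

```python
from typing import Dict, Optional, Tuple

# (sensitive word, sentence element, text object) -> non-sensitive word.
# Hypothetical semantic lexicon built from the two cases above.
SemanticKey = Tuple[str, str, str]
SEMANTIC_LEXICON: Dict[SemanticKey, str] = {
    ("idiotic", "predicate", "traffic"): "terrible",
    ("idiotic", "predicate", "department"): "incompetent",
    ("bastard", "independent element", "flight delay"): "heavens",
    ("bastard", "independent element", "waste oil"): "life is hard",
}


def select_non_sensitive_word(word: str, element: str, text_object: str,
                              lexicon: Dict[SemanticKey, str]) -> Optional[str]:
    """Return the non-sensitive word for this usage, or None if the lexicon has none."""
    return lexicon.get((word, element, text_object))


# The analysis results are supplied by hand here; a real system would
# obtain them from the semantic model described in step S430.
for obj in ("traffic", "department"):
    print(obj, "->", select_non_sensitive_word("idiotic", "predicate", obj, SEMANTIC_LEXICON))
```

Keying on the text object is what lets the same sensitive word map to different replacements within one sentence.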
In a second example, the sensitive word replacement rule may be a spelling replacement rule based on spelling processing.
Spelling here means representing written language with letters and digits. Letter representation mainly refers to pinyin, in which a syllable is spelled out by combining, according to the rules of Mandarin syllable structure, an initial, a medial, and a final in quick succession and adding a tone. Digit representation mainly means that when a word contains a Chinese numeral, the corresponding Arabic numeral can be used directly in its place.
In this example, the sensitive word is desensitized by finding the combination of letters and/or digits that corresponds to it.
Fig. 5 schematically shows a flowchart of the second example of the non-sensitive word determination step in an embodiment of the present invention.
As shown in Fig. 5, step S310 may comprise:
In step S510, in response to the sensitive word replacement rule being a spelling replacement rule, a spelling lexicon is obtained. The spelling lexicon defines correspondences between sensitive words and non-sensitive words, where each non-sensitive word is a letter-digit set corresponding to a sensitive word.
For example, in each pair of sensitive word and non-sensitive word defined in the spelling lexicon, the non-sensitive word indicates the meaning of the sensitive word through a set of letters and digits.
For example, a non-sensitive word may be formed from the first letter of the pinyin of each character in the sensitive word, with these first letters combined into an initial-letter set in the order in which the characters appear in the sensitive word.
Alternatively, a non-sensitive word may be formed from the first letter of the pinyin of each character in the sensitive word that is not a Chinese numeral, together with the Arabic numeral for each Chinese numeral in the sensitive word, with these letters and numerals combined into a letter-digit set in the order in which the characters appear.
In step S520, the spelling lexicon is determined to be the replacement lexicon.
With continued reference to Fig. 5, step S320 may comprise:
In step S530, a specific letter-digit set is looked up in the spelling lexicon according to the sensitive word.
In step S540, the specific letter-digit set is determined to be the non-sensitive word.
An example is given below.
For example, suppose that the target text received in step S210 is "this person is too abnormal". This target text contains the sensitive word "abnormal" (biantai in pinyin). A lookup in the spelling lexicon finds that the non-sensitive letter-digit set corresponding to the sensitive word "abnormal" is "BT". In a subsequent step, the target text can then be converted to "this person is too BT".
In this example, spelling replacement turns the original sensitive word into a non-sensitive word that expresses the same or a similar meaning, so the sensitivity is removed while the original meaning of the target text is largely preserved.
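Steps S510 to S540 amount to a dictionary lookup followed by a substitution. The following is a minimal sketch under that reading; the lexicon contents and the function names are hypothetical, and the English key stands in for the Chinese sensitive word of the example.

```python
# Hypothetical spelling lexicon (step S510): sensitive word -> letter-digit set.
SPELLING_LEXICON = {
    "abnormal": "BT",  # stands in for the Chinese word whose pinyin initials are B, T
}


def lookup_spelling(sensitive_word, lexicon=SPELLING_LEXICON):
    """Steps S530/S540: return the letter-digit set, or None if there is no entry."""
    return lexicon.get(sensitive_word)


def replace_by_spelling(text, sensitive_word, lexicon=SPELLING_LEXICON):
    """Substitute the letter-digit set for the sensitive word, if one is defined."""
    code = lookup_spelling(sensitive_word, lexicon)
    return text if code is None else text.replace(sensitive_word, code)
```

Returning the text unchanged when no entry exists is a design choice of this sketch; the source does not say what happens in that case.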
In a third example, the sensitive word replacement rule may be a dialect replacement rule.
A dialect, or regional dialect, is a variety of a language formed through regional differences. It is a regional branch of a national common language and reflects the uneven development of the language across geography. A regional branch that has differentiated from a language is called a "dialect" if it exists under social conditions of incomplete differentiation and its speakers still identify with the same language.
In this example, replacing the sensitive word with a corresponding dialect word weakens its sensitivity for most of the public, who are in regions different from the user's. In addition, a non-sensitive dialect word corresponding to that sensitive dialect word can be looked up further, so that other members of the public in the same region as the user are also spared the negative effect and can take in the user's meaning more favorably.
Fig. 6 schematically shows a flowchart of the third example of the non-sensitive word determination step in an embodiment of the present invention.
As shown in Fig. 6, step S310 may comprise:
In step S610, in response to the sensitive word replacement rule being a dialect replacement rule, the Internet Protocol (IP) address of the user equipment is obtained.
Every host on the Internet must have a unique IP address. The IP protocol uses this address to pass information between hosts, which is the basis on which the Internet operates. An IP address (here, IPv4) is 32 bits long (2^32 addresses in total) and is divided into 4 segments of 8 bits each. Each segment is written as a decimal number in the range 0 to 255, and the segments are separated by periods, for example, 159.226.1.1. An IP address can be regarded as consisting of two parts: a network identifier (the network address) and a host identifier (the host address).
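The address structure described above can be checked with Python's standard `ipaddress` module. This is an illustrative sketch only; in particular, the /16 prefix used to split the network part from the host part is an assumption chosen for the example, since the source does not state where the split falls.

```python
import ipaddress

addr = ipaddress.IPv4Address("159.226.1.1")

# Four 8-bit segments, each in the range 0..255.
octets = [int(part) for part in str(addr).split(".")]

# The same address as a single 32-bit integer.
as_int = int(addr)

# Network/host split; the /16 prefix length is assumed for illustration.
iface = ipaddress.ip_interface("159.226.1.1/16")
network_part = iface.network.network_address
host_part = int(iface.ip) & int(iface.network.hostmask)
```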
In step S620, the geographic area where the user is located is determined according to the IP address.
After the client's IP address is obtained, the client's approximate or more precise location can readily be determined using tools such as an IP information database. From this location it can be determined which province, municipality, or autonomous region the user is in, or even which city, district, or county.
In step S630, a first dialect lexicon corresponding to the geographic area is obtained. The first dialect lexicon defines correspondences between sensitive words and dialect synonyms, where a dialect synonym has lower sensitivity than the sensitive word and is a dialect word used in the geographic area where the user is located to express the same or a similar meaning as the sensitive word.
For example, different dialect lexicons can be defined in advance for different geographic areas. To save storage space, geographic areas that use the same dialect can share one dialect lexicon. Once the user's geographic area has been determined, the dialect lexicon corresponding to that area can then be looked up.
In step S640, the first dialect lexicon is determined to be the replacement lexicon.
With continued reference to Fig. 6, step S320 may comprise:
In step S650, a dialect synonym is looked up in the first dialect lexicon according to the sensitive word and is used as the non-sensitive word.
An example is given below.
For example, suppose that the target text received in step S210 is "this person is too retarded", and that the corresponding client IP address indicates that the client is located in Sichuan Province. This target text contains the sensitive word "retarded". A lookup in the dialect lexicon finds that the dialect synonym corresponding to the sensitive word "retarded" is "ha'er". In a subsequent step, the target text can then be converted to "this person is too ha'er".
In this example, a simple dialect replacement weakens the sensitivity of the sensitive word for most of the public, who are in regions different from this Sichuan user's.
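Steps S610 to S650 can be sketched as a chain of lookups: IP address to region, region to dialect lexicon, sensitive word to dialect synonym. The sketch below is an assumption-laden illustration: the address range (taken from the documentation range 203.0.113.0/24), the region table, and the lexicon contents are all hypothetical, and a real system would query an IP geolocation database instead of a hard-coded table.

```python
import ipaddress

# Hypothetical IP-range-to-region table (stand-in for a geolocation database).
REGION_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): "Sichuan",
}

# Hypothetical first dialect lexicons, one per region (step S630).
FIRST_DIALECT_LEXICONS = {
    "Sichuan": {"retarded": "ha'er"},
}


def region_of(ip):
    """Step S620: map an IP address string to a region name, or None."""
    addr = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return region
    return None


def dialect_replace(text, sensitive_word, ip):
    """Steps S630 to S650, followed by the substitution itself."""
    lexicon = FIRST_DIALECT_LEXICONS.get(region_of(ip), {})
    synonym = lexicon.get(sensitive_word)
    return text if synonym is None else text.replace(sensitive_word, synonym)
```

Regions that share a dialect could point at the same lexicon object, which matches the storage-saving note in the description.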
Alternatively, because the dialect word obtained after weakening may still be sensitive to some extent, a non-sensitive dialect word corresponding to this sensitive dialect word can be looked up further, to further reduce the negative effect of the sensitive dialect word on other members of the public in the same region as the user.
To this end, with continued reference to Fig. 6, step S320 may also comprise:
In step S660, a dialect synonym is looked up in the first dialect lexicon according to the sensitive word;
In step S670, a second dialect lexicon corresponding to the geographic area is obtained. The second dialect lexicon defines correspondences between dialect synonyms and dialect non-sensitive words, where a dialect non-sensitive word has lower sensitivity than the dialect synonym and is a dialect word used in the geographic area where the user is located to express the same or a similar meaning as the dialect synonym; and
In step S680, a dialect non-sensitive word is looked up in the second dialect lexicon according to the dialect synonym and is used as the non-sensitive word that will replace the sensitive word.
An example is given below.
For example, suppose that the target text received in step S210 is "this person is too retarded", and that the corresponding client IP address indicates that the client is located in Sichuan Province. This target text contains the sensitive word "retarded". A lookup in the first dialect lexicon finds that the dialect synonym corresponding to the sensitive word "retarded" is "ha'er". Because "ha'er" still carries a somewhat insulting meaning, however, another, non-sensitive dialect word corresponding to "ha'er", namely "foolish skull", can be looked up further in the second dialect lexicon. In a subsequent step, the target text can then be converted to "this person is too much of a foolish skull".
Compared with the previous example, this example not only weakens the sensitivity of the sensitive word for most of the public, who are in regions different from this Sichuan user's, but also converts the dialect word that still carries some sensitivity into an insensitive dialect word, so that other members of the public in the same region as this Sichuan user can understand in a positive way what the user means.
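The two-stage lookup of steps S660 to S680 can be sketched as two chained dictionary lookups. The lexicon contents and names below are hypothetical. Falling back to the first-stage synonym when the second lexicon has no entry is an assumption of this sketch, not something the source specifies.

```python
# Hypothetical lexicons for one region.
FIRST_DIALECT_LEXICON = {"retarded": "ha'er"}        # sensitive word -> dialect synonym
SECOND_DIALECT_LEXICON = {"ha'er": "foolish skull"}  # dialect synonym -> dialect non-sensitive word


def two_stage_lookup(word, first=FIRST_DIALECT_LEXICON, second=SECOND_DIALECT_LEXICON):
    """Return the dialect non-sensitive word for a sensitive word, or None."""
    synonym = first.get(word)            # step S660
    if synonym is None:
        return None
    # Steps S670/S680; keep the synonym if no milder dialect word is defined (assumed fallback).
    return second.get(synonym, synonym)
```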
Although the non-sensitive word determination step has been described in the embodiment above as obtaining a replacement lexicon, the present invention is not limited thereto. For example, in another embodiment of the present invention, when the sensitive word replacement rule is a spelling replacement rule based on spelling processing, the non-sensitive word can be determined directly, without obtaining a replacement lexicon.
Fig. 7 schematically shows a flowchart of another embodiment of the non-sensitive word determination step in an embodiment of the present invention.
As shown in Fig. 7, step S230 may comprise:
In step S710, in response to the sensitive word replacement rule being a spelling replacement rule, the letter-digit set corresponding to the sensitive word is determined.
In step S720, the letter-digit set is determined to be the non-sensitive word.
As described above, spelling is a direct representation of written characters by letters and/or digits. For this reason, a sensitive word can be expressed in digits and/or letters directly, without any preset lexicon.
In one example, the sensitive word may be represented entirely by a set of letters.
Specifically, in step S710, the first letter of the pinyin of each character in the sensitive word is determined first; then the first letters are combined, in the order in which the characters appear in the sensitive word, into an initial-letter set that serves as the letter-digit set.
For example, suppose that the target text received in step S210 is "this person is too abnormal". This target text contains the sensitive word "abnormal" (biantai in pinyin). A pinyin analysis of this sensitive word finds that the corresponding initial-letter set is "BT". In a subsequent step, the target text can then be converted to "this person is too BT".
In another example, when the sensitive word contains a Chinese numeral, the sensitive word may be represented by a set of letters and digits.
Specifically, in step S710, it is first judged whether the sensitive word contains a Chinese numeral. Then, in response to the sensitive word containing a Chinese numeral, the Arabic numeral corresponding to each Chinese numeral in the sensitive word is determined, and the first letter of the pinyin of each character that is not a Chinese numeral is determined. Finally, the Arabic numerals and the first letters are combined into the letter-digit set in the order in which the characters appear in the sensitive word.
For example, suppose that the target text received in step S210 is "this person really is a two-stupid". This target text contains the sensitive word "two-stupid", which includes the Chinese numeral "two". A pinyin and numeral analysis of this sensitive word finds that the corresponding letter-digit set is "2S". In a subsequent step, the target text can then be converted to "this person really is a 2S".
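Both variants of step S710 can be sketched in one function that walks the characters of the sensitive word in order, emitting an Arabic numeral for a Chinese numeral and the first pinyin letter otherwise. This is a sketch under stated assumptions: the tiny pinyin table is hypothetical (a real system would need a full pinyin dictionary), and the Chinese strings are an assumed reconstruction of the translated examples "abnormal" and "two-stupid".

```python
# Hypothetical pinyin table covering only the characters used in the examples.
PINYIN = {"变": "bian", "态": "tai", "傻": "sha"}

# Chinese numerals and their Arabic equivalents.
CHINESE_DIGITS = {
    "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}


def to_letter_digit_set(sensitive_word):
    """Step S710: build the letter-digit set, character by character, in order."""
    parts = []
    for ch in sensitive_word:
        if ch in CHINESE_DIGITS:
            parts.append(CHINESE_DIGITS[ch])      # Chinese numeral -> Arabic numeral
        else:
            parts.append(PINYIN[ch][0].upper())   # first letter of the pinyin
    return "".join(parts)
```

A character missing from the table raises `KeyError` in this sketch; how such characters should be handled is left open by the source.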
In step S240, the sensitive word is replaced with the non-sensitive word.
Finally, the sensitive word can be replaced with the non-sensitive word determined in any of the ways described above.
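Taken together, steps S210 to S240 can be sketched as a small pipeline in which step S230 is passed in as a function. The names below are hypothetical, and plain substring matching is an assumption of this sketch; the source does not specify how the sensitive dictionary is matched against the text.

```python
def find_sensitive_words(text, sensitive_dictionary):
    """Step S220: return the dictionary words that occur in the text, in dictionary order."""
    return [word for word in sensitive_dictionary if word in text]


def replace_sensitive_words(text, sensitive_dictionary, determine):
    """Steps S220 to S240; `determine` implements step S230 for a single word."""
    for word in find_sensitive_words(text, sensitive_dictionary):
        substitute = determine(word)
        if substitute is not None:
            text = text.replace(word, substitute)   # step S240
    return text
```

For instance, `replace_sensitive_words(text, ["abnormal"], {"abnormal": "BT"}.get)` applies a lexicon-based rule, and any of the other determination methods can be plugged in the same way.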
In an embodiment of the invention, because multiple sensitive word replacement rules may exist, different options can preferably be offered to the user, so that the user can select a sensitive word replacement rule as needed and the user's customization needs are met.
To this end, before step S230, the sensitive word replacement method of the present embodiment may further comprise:
In step S250, multiple candidate replacement rules are provided to the user.
In step S260, the candidate replacement rule that the user selects from among the multiple candidate replacement rules is received.
In step S270, the candidate replacement rule selected by the user is determined to be the sensitive word replacement rule.
For example, multiple sensitive word replacement rules, such as a semantic replacement rule based on semantic judgment, a spelling replacement rule based on spelling processing, and a dialect replacement rule based on region matching, can be presented to the user through a graphical user interface on a user interaction device (for example, client 102). The sensitive word replacement rule that the user wishes to use in step S230 is then determined from the selection operation that the user performs with an input device.
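Steps S250 to S270 can be sketched as a registry of candidate rules from which the user's choice is validated and returned. The rule names, the toy rules, and the function names are all assumptions of this sketch; each toy rule simply maps a sensitive word to a replacement or `None`.

```python
from typing import Callable, Dict, Optional

Rule = Callable[[str], Optional[str]]

# Hypothetical candidate rules, each backed by a one-entry toy lexicon.
CANDIDATE_RULES: Dict[str, Rule] = {
    "semantic": {"bastard": "Heavens"}.get,
    "spelling": {"abnormal": "BT"}.get,
    "dialect": {"retarded": "ha'er"}.get,
}


def offer_candidates(candidates: Dict[str, Rule]):
    """Step S250: the rule names that would be shown to the user."""
    return sorted(candidates)


def select_rule(candidates: Dict[str, Rule], user_choice: str) -> Rule:
    """Steps S260/S270: validate the choice and return the rule to use in step S230."""
    if user_choice not in candidates:
        raise ValueError(f"unknown replacement rule: {user_choice!r}")
    return candidates[user_choice]
```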
It should be noted that steps S250 to S270 have been described above as being performed immediately before step S230. The present invention, however, is not limited thereto. Steps S250 to S270 may also be performed before step S220, or even before step S210.
With the technical solution of the present embodiment, sensitive words in a text can be processed in a meaningful way so as to desensitize them. The benefits are as follows: for users, negative sentiment is reduced, which favors social harmony; for the system, the workload of tasks such as manual review is reduced; culturally, the software reflects humanistic care and social harmony. Therefore, when a user publishes content to the Internet, even if the text contains sensitive words, the method of the present invention can, by processing those sensitive words appropriately, fully protect the user's enthusiasm for publishing and increase the user's sense of participation.
Exemplary apparatus
Having described the method of the exemplary embodiment of the present invention, a sensitive word replacement apparatus according to another exemplary embodiment of the present invention is introduced next.
Fig. 8 schematically shows a sensitive word replacement apparatus according to an embodiment of the present invention. As shown in Fig. 8, the apparatus 800 may comprise:
a target text receiving unit 810, configured to receive a target text;
a sensitive word search unit 820, configured to search for a sensitive word in the target text according to a sensitive dictionary;
a non-sensitive word determining unit 830, configured to determine a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, the non-sensitive word having lower sensitivity than the sensitive word and expressing the same or a similar meaning as the sensitive word; and
a non-sensitive word replacement unit 840, configured to replace the sensitive word with the non-sensitive word.
In one embodiment of the invention, to determine the non-sensitive word corresponding to the sensitive word according to the sensitive word replacement rule, the non-sensitive word determining unit 830 may obtain a replacement lexicon according to the sensitive word replacement rule, and look up the non-sensitive word in the replacement lexicon according to the sensitive word.
In a specific example, to obtain the replacement lexicon according to the sensitive word replacement rule, the non-sensitive word determining unit 830 may, in response to the sensitive word replacement rule being a semantic replacement rule, obtain a semantic lexicon, the semantic lexicon defining correspondences between sensitive words and non-sensitive words, where a sensitive word and a non-sensitive word that correspond to each other serve as the same sentence element in a sentence; and determine the semantic lexicon to be the replacement lexicon.
In this specific example, to look up the non-sensitive word in the replacement lexicon according to the sensitive word, the non-sensitive word determining unit 830 may perform semantic analysis on the target text; determine the sentence element of the sensitive word itself according to the result of the semantic analysis; and select the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself.
Specifically, to select the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself, the non-sensitive word determining unit 830 may determine, according to the result of the semantic analysis, the text object on which the sensitive word acts in the target text; and select the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself and the meaning of the text object.
In another specific example, to obtain the replacement lexicon according to the sensitive word replacement rule, the non-sensitive word determining unit 830 may, in response to the sensitive word replacement rule being a spelling replacement rule, obtain a spelling lexicon, the spelling lexicon defining correspondences between sensitive words and non-sensitive words, where each non-sensitive word is a letter-digit set corresponding to a sensitive word; and determine the spelling lexicon to be the replacement lexicon.
In this specific example, to look up the non-sensitive word in the replacement lexicon according to the sensitive word, the non-sensitive word determining unit 830 may look up a specific letter-digit set in the spelling lexicon according to the sensitive word; and determine the specific letter-digit set to be the non-sensitive word.
In yet another specific example, to obtain the replacement lexicon according to the sensitive word replacement rule, the non-sensitive word determining unit 830 may, in response to the sensitive word replacement rule being a dialect replacement rule, obtain the Internet Protocol (IP) address of the user equipment; determine the geographic area where the user is located according to the IP address; obtain a first dialect lexicon corresponding to the geographic area, the first dialect lexicon defining correspondences between sensitive words and dialect synonyms, where a dialect synonym has lower sensitivity than the sensitive word and is a dialect word used in the geographic area where the user is located to express the same or a similar meaning as the sensitive word; and determine the first dialect lexicon to be the replacement lexicon.
In this specific example, to look up the non-sensitive word in the replacement lexicon according to the sensitive word, the non-sensitive word determining unit 830 may look up a dialect synonym in the first dialect lexicon according to the sensitive word and use it as the non-sensitive word.
Alternatively, to look up the non-sensitive word in the replacement lexicon according to the sensitive word, the non-sensitive word determining unit 830 may look up a dialect synonym in the first dialect lexicon according to the sensitive word; obtain a second dialect lexicon corresponding to the geographic area, the second dialect lexicon defining correspondences between dialect synonyms and dialect non-sensitive words, where a dialect non-sensitive word has lower sensitivity than the dialect synonym and is a dialect word used in the geographic area where the user is located to express the same or a similar meaning as the dialect synonym; and look up a dialect non-sensitive word in the second dialect lexicon according to the dialect synonym and use it as the non-sensitive word that will replace the sensitive word.
In one embodiment of the invention, to determine the non-sensitive word corresponding to the sensitive word according to the sensitive word replacement rule, the non-sensitive word determining unit 830 may, in response to the sensitive word replacement rule being a spelling replacement rule, determine the letter-digit set corresponding to the sensitive word; and determine the letter-digit set to be the non-sensitive word.
In a specific example, to determine the letter-digit set corresponding to the sensitive word, the non-sensitive word determining unit 830 may determine the first letter of the pinyin of each character in the sensitive word; and combine the first letters, in the order in which the characters appear in the sensitive word, into an initial-letter set that serves as the letter-digit set.
In another specific example, to determine the letter-digit set corresponding to the sensitive word, the non-sensitive word determining unit 830 may judge whether the sensitive word contains a Chinese numeral; in response to the sensitive word containing a Chinese numeral, determine the Arabic numeral corresponding to each Chinese numeral in the sensitive word and the first letter of the pinyin of each character that is not a Chinese numeral; and combine the Arabic numerals and the first letters into the letter-digit set in the order in which the characters appear in the sensitive word.
With continued reference to Fig. 8, the apparatus 800 may further comprise:
a candidate rule providing unit 850, configured to provide multiple candidate replacement rules to the user;
a user selection receiving unit 860, configured to receive the candidate replacement rule that the user selects from among the multiple candidate replacement rules; and
a replacement rule determining unit 870, configured to determine the candidate replacement rule selected by the user to be the sensitive word replacement rule.
The specific configuration and operation of each unit in the sensitive word replacement apparatus 800 according to the embodiment of the present application have already been described in detail in the sensitive word replacement method described above with reference to Figs. 1 to 7, and a repeated description is therefore omitted.
Exemplary device
Having described the method and apparatus of the exemplary embodiments of the present invention, a sensitive word replacement device according to another exemplary embodiment of the present invention is introduced next.
Those of ordinary skill in the art will understand that aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".
In some possible embodiments, the sensitive word replacement device according to the embodiment of the present invention may comprise at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the sensitive word replacement method according to the various exemplary embodiments of the present invention described in the "Exemplary method" section of this specification. For example, the processing unit may perform the steps shown in Fig. 2: in step S210, receiving a target text; in step S220, searching for a sensitive word in the target text according to a sensitive dictionary; in step S230, determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, the non-sensitive word having lower sensitivity than the sensitive word and expressing the same or a similar meaning as the sensitive word; and in step S240, replacing the sensitive word with the non-sensitive word.
Exemplary program product
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code. When the program code runs on a user device, it causes the user device to perform the steps of the sensitive word replacement method according to the various exemplary embodiments of the present invention described in the "Exemplary method" section of this specification. For example, the user device may perform the steps shown in Fig. 2: in step S210, receiving a target text; in step S220, searching for a sensitive word in the target text according to a sensitive dictionary; in step S230, determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, the non-sensitive word having lower sensitivity than the sensitive word and expressing the same or a similar meaning as the sensitive word; and in step S240, replacing the sensitive word with the non-sensitive word.
It should be noted that although several units or subunits of the sensitive word replacement apparatus are mentioned in the detailed description above, this division is merely illustrative and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied in multiple units.
In addition, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be merged into one step, and/or one step may be split into multiple steps.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects does not mean that features in those aspects cannot be combined to advantage; this division is merely for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (15)

1. A sensitive word replacement method, comprising:
receiving a target text;
searching for a sensitive word in the target text according to a sensitive dictionary;
determining a non-sensitive word corresponding to the sensitive word according to a sensitive word replacement rule, the non-sensitive word having lower sensitivity than the sensitive word and expressing the same or a similar meaning as the sensitive word; and
replacing the sensitive word with the non-sensitive word.
2. The method according to claim 1, wherein determining the non-sensitive word corresponding to the sensitive word according to the sensitive word replacement rule comprises:
obtaining a replacement lexicon according to the sensitive word replacement rule; and
looking up the non-sensitive word in the replacement lexicon according to the sensitive word.
3. The method according to claim 2, wherein obtaining the replacement lexicon according to the sensitive word replacement rule comprises:
in response to the sensitive word replacement rule being a semantic replacement rule, obtaining a semantic lexicon, the semantic lexicon defining correspondences between sensitive words and non-sensitive words, wherein a sensitive word and a non-sensitive word that correspond to each other serve as the same sentence element in a sentence; and
determining the semantic lexicon to be the replacement lexicon.
4. The method according to claim 3, wherein looking up the non-sensitive word in the replacement lexicon according to the sensitive word comprises:
performing semantic analysis on the target text;
determining the sentence element of the sensitive word itself according to the result of the semantic analysis; and
selecting the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself.
5. The method according to claim 4, wherein selecting the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself comprises:
determining, according to the result of the semantic analysis, the text object on which the sensitive word acts in the target text; and
selecting the non-sensitive word from the semantic lexicon according to the sentence element of the sensitive word itself and the meaning of the text object.
6. method according to claim 2, wherein, obtains replacement lexicon according to described sensitive word Substitution Rules and comprises:
In response to described sensitive word Substitution Rules be spelling Substitution Rules, obtain spelling lexicon, described spelling lexicon defines the corresponding relation between sensitive word and non-sensitive word, and wherein, non-sensitive word is the digital alphabet set corresponding with sensitive word; And
Described spelling lexicon is defined as described replacement lexicon.
7. The method according to claim 6, wherein searching the replacement lexicon for the non-sensitive word according to the sensitive word comprises:
Searching the spelling lexicon for a specific digit-letter set according to the sensitive word; and
Determining the specific digit-letter set to be the non-sensitive word.
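A minimal sketch of the lexicon lookup in claims 6 and 7, assuming the spelling lexicon is a plain in-memory mapping. The name `SPELLING_LEXICON`, the function `replace_with_spelling`, and the two neutral sample entries are assumptions made for illustration and do not come from the patent.

```python
# Hypothetical spelling lexicon: sensitive word -> digit-letter set.
# "敏感词" simply means "sensitive word"; both entries are neutral samples.
SPELLING_LEXICON = {"敏感词": "mgc", "三只羊": "3zy"}


def replace_with_spelling(text: str, lexicon: dict = SPELLING_LEXICON) -> str:
    """Replace every lexicon entry found in `text` with its digit-letter set."""
    # Longer entries are handled first so that a short entry contained in a
    # longer one does not break the longer match.
    for word in sorted(lexicon, key=len, reverse=True):
        text = text.replace(word, lexicon[word])
    return text
```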
8. The method according to claim 2, wherein obtaining the replacement lexicon according to the sensitive word substitution rule comprises:
In response to the sensitive word substitution rule being a dialect substitution rule, obtaining the Internet Protocol (IP) address of the user equipment;
Determining the geographic area where the user is located according to the Internet Protocol address;
Obtaining a first dialect lexicon corresponding to the geographic area, the first dialect lexicon defining correspondences between sensitive words and dialect synonyms, wherein a dialect synonym has a lower sensitivity than the sensitive word and is a dialect word used, in the geographic area where the user is located, to express a meaning identical or similar to that of the sensitive word; and
Determining the first dialect lexicon to be the replacement lexicon.
9. The method according to claim 8, wherein searching the replacement lexicon for the non-sensitive word according to the sensitive word comprises:
Searching the first dialect lexicon for a dialect synonym according to the sensitive word, the dialect synonym serving as the non-sensitive word.
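The flow of claims 8 and 9 (IP address, then geographic area, then dialect lexicon, then dialect synonym) might look like the sketch below. The IP-to-region table, the region names, and the lexicon entries are hypothetical; the address ranges are the reserved documentation ranges, and a real system would presumably consult a geolocation database instead.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-range-to-region table built from reserved documentation ranges.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "region_a",
    ipaddress.ip_network("198.51.100.0/24"): "region_b",
}

# Hypothetical first dialect lexicons, one per region:
# sensitive word -> dialect synonym with lower sensitivity.
FIRST_DIALECT_LEXICONS = {
    "region_a": {"sensitive_x": "dialect_x_a"},
    "region_b": {"sensitive_x": "dialect_x_b"},
}


def region_for_ip(ip: str) -> Optional[str]:
    """Map the user equipment's IP address to a geographic area, if known."""
    addr = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return region
    return None


def dialect_synonym(word: str, user_ip: str) -> Optional[str]:
    """Look up the dialect synonym in the lexicon of the user's region."""
    lexicon = FIRST_DIALECT_LEXICONS.get(region_for_ip(user_ip), {})
    return lexicon.get(word)
```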
10. The method according to claim 8, wherein searching the replacement lexicon for the non-sensitive word according to the sensitive word comprises:
Searching the first dialect lexicon for a dialect synonym according to the sensitive word;
Obtaining a second dialect lexicon corresponding to the geographic area, the second dialect lexicon defining correspondences between dialect synonyms and non-sensitive dialect words, wherein a non-sensitive dialect word has a lower sensitivity than the dialect synonym and is a dialect word used, in the geographic area where the user is located, to express a meaning identical or similar to that of the dialect synonym; and
Searching the second dialect lexicon for a non-sensitive dialect word according to the dialect synonym, the non-sensitive dialect word serving as the non-sensitive word that will replace the sensitive word.
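The two-stage lookup of claim 10 can be illustrated with two chained mappings. Everything below is a sketch: the lexicon contents and the function name are hypothetical, and falling back to the dialect synonym when the second lexicon has no entry is an assumption of this example, since the claim does not say what happens in that case.

```python
from typing import Optional

# Hypothetical lexicons for a single region.
FIRST_DIALECT_LEXICON = {"sensitive_x": "synonym_x", "sensitive_y": "synonym_y"}
SECOND_DIALECT_LEXICON = {"synonym_x": "mild_x"}


def two_stage_dialect_lookup(word: str) -> Optional[str]:
    """Sensitive word -> dialect synonym -> non-sensitive dialect word."""
    synonym = FIRST_DIALECT_LEXICON.get(word)
    if synonym is None:
        return None  # the word has no dialect synonym in this region
    # Assumption: keep the synonym when no milder dialect word is registered.
    return SECOND_DIALECT_LEXICON.get(synonym, synonym)
```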
11. The method according to claim 1, wherein determining the non-sensitive word corresponding to the sensitive word according to the sensitive word substitution rule comprises:
In response to the sensitive word substitution rule being a spelling substitution rule, determining the digit-letter set (a combination of digits and letters) corresponding to the sensitive word; and
Determining the digit-letter set to be the non-sensitive word.
12. The method according to claim 11, wherein determining the digit-letter set corresponding to the sensitive word comprises:
Determining the initial letter of the pinyin of each character in the sensitive word; and
Combining the pinyin initial letters of the characters in sequence, in the order of the characters in the sensitive word, into an initial-letter set that serves as the digit-letter set.
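As a rough illustration of claim 12, the sketch below joins the first pinyin letter of each character. The tiny `PINYIN` table is a hypothetical stand-in for a full character-to-pinyin dictionary, and leaving unlisted characters unchanged is an assumption of this example.

```python
# Hypothetical character-to-pinyin table covering only the sample word
# "敏感词" ("sensitive word"): min, gan, ci.
PINYIN = {"敏": "min", "感": "gan", "词": "ci"}


def pinyin_initials(word: str) -> str:
    """Join the first pinyin letter of each character, in character order."""
    # Characters missing from the table are kept as they are (an assumption).
    return "".join(PINYIN.get(ch, ch)[0] for ch in word)
```

With this table, `pinyin_initials("敏感词")` yields `"mgc"`.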
13. The method according to claim 11, wherein determining the digit-letter set corresponding to the sensitive word comprises:
Determining whether the sensitive word contains Chinese numerals;
In response to the sensitive word containing Chinese numerals, determining the Arabic numeral corresponding to each Chinese numeral in the sensitive word, and determining the initial letter of the pinyin of each character in the sensitive word that is not a Chinese numeral; and
Combining, in the order of the characters in the sensitive word, the Arabic numerals corresponding to the Chinese numerals and the pinyin initial letters of the remaining characters in sequence into the digit-letter set.
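The mixed numeral and pinyin conversion of claim 13 could be sketched as follows. The numeral table covers only the basic numerals zero through nine, and the `PINYIN` table holds just the characters of one neutral sample phrase; both tables and all function names are assumptions made for illustration.

```python
# Basic Chinese numerals and their Arabic counterparts.
CHINESE_NUMERALS = {
    "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}

# Hypothetical pinyin table for the neutral sample phrase "三只羊"
# ("three sheep"): 只 -> zhi, 羊 -> yang.
PINYIN = {"只": "zhi", "羊": "yang"}


def contains_chinese_numeral(word: str) -> bool:
    """First step of the claim: check whether any Chinese numeral is present."""
    return any(ch in CHINESE_NUMERALS for ch in word)


def digit_letter_set(word: str) -> str:
    """Build the digit-letter set character by character, in order."""
    parts = []
    for ch in word:
        if ch in CHINESE_NUMERALS:
            parts.append(CHINESE_NUMERALS[ch])  # numeral -> Arabic digit
        else:
            # Raises KeyError for characters missing from the sample table.
            parts.append(PINYIN[ch][0])  # other character -> pinyin initial
    return "".join(parts)
```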
14. The method according to claim 1, further comprising:
Providing a plurality of candidate substitution rules to the user;
Receiving the candidate substitution rule that the user selects from among the plurality of candidate substitution rules; and
Determining the user-selected candidate substitution rule to be the sensitive word substitution rule.
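One way to picture claim 14 is an enumeration of candidate rules from which the user's choice is validated. The enum members mirror the three rule types named in the claims (semantic, spelling, dialect); the class and function names are hypothetical.

```python
from enum import Enum
from typing import List


class SubstitutionRule(Enum):
    SEMANTIC = "semantic"
    SPELLING = "spelling"
    DIALECT = "dialect"


def candidate_rules() -> List[SubstitutionRule]:
    """The candidate substitution rules offered to the user."""
    return list(SubstitutionRule)


def choose_rule(selection: str) -> SubstitutionRule:
    """Turn the user's selection into the active substitution rule."""
    try:
        return SubstitutionRule(selection)
    except ValueError:
        raise ValueError(f"not a candidate rule: {selection!r}") from None
```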
15. A sensitive word replacement apparatus, comprising:
A target text receiving unit, configured to receive a target text;
A sensitive word searching unit, configured to search the target text for a sensitive word according to a sensitive word dictionary;
A non-sensitive word determining unit, configured to determine, according to a sensitive word substitution rule, a non-sensitive word corresponding to the sensitive word, the non-sensitive word having a lower sensitivity than the sensitive word and expressing a meaning identical or similar to that of the sensitive word; and
A non-sensitive word replacing unit, configured to replace the sensitive word with the non-sensitive word.
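The four units of claim 15 can be pictured as four methods of one class. This is a sketch rather than the patented apparatus: the class name, the method names, and the choice to pass the substitution rule in as a callable are all assumptions of this example.

```python
from typing import Callable, List


class SensitiveWordReplacer:
    """Toy counterpart of the apparatus: one method per claimed unit."""

    def __init__(self, sensitive_words: List[str],
                 rule: Callable[[str], str]) -> None:
        self.sensitive_words = sensitive_words  # the sensitive word dictionary
        self.rule = rule                        # the substitution rule

    def receive_text(self, text: str) -> str:
        # Target text receiving unit.
        return text

    def find_sensitive_words(self, text: str) -> List[str]:
        # Sensitive word searching unit: simple substring matching.
        return [w for w in self.sensitive_words if w in text]

    def determine_non_sensitive(self, word: str) -> str:
        # Non-sensitive word determining unit.
        return self.rule(word)

    def replace(self, text: str, word: str, substitute: str) -> str:
        # Non-sensitive word replacing unit.
        return text.replace(word, substitute)

    def run(self, text: str) -> str:
        text = self.receive_text(text)
        for word in self.find_sensitive_words(text):
            text = self.replace(text, word, self.determine_non_sensitive(word))
        return text
```

A replacer could be built from a plain mapping, for example `SensitiveWordReplacer(list(mapping), mapping.__getitem__)`, where `mapping` is a hypothetical dictionary from sensitive words to their substitutes.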
CN201510446574.6A 2015-07-27 2015-07-27 Sensitive word replacing method and device Active CN105183761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510446574.6A CN105183761B (en) 2015-07-27 2015-07-27 Sensitive word replacing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510446574.6A CN105183761B (en) 2015-07-27 2015-07-27 Sensitive word replacing method and device

Publications (2)

Publication Number Publication Date
CN105183761A (en) 2015-12-23
CN105183761B (en) 2020-04-07

Family

ID=54905845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510446574.6A Active CN105183761B (en) 2015-07-27 2015-07-27 Sensitive word replacing method and device

Country Status (1)

Country Link
CN (1) CN105183761B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138109A1 (en) * 2000-11-13 2005-06-23 Redlich Ron M. Data security system and method with adaptive filter
CN101136867A (en) * 2006-08-30 2008-03-05 腾讯科技(深圳)有限公司 Method and device for transmitting prompt message to client terminal of chat room
CN101470700A (en) * 2007-12-28 2009-07-01 日电(中国)有限公司 Text template generator, text generation equipment, text checking equipment and method thereof
US20100082332A1 (en) * 2008-09-26 2010-04-01 Rite-Solutions, Inc. Methods and apparatus for protecting users from objectionable text
CN101901325A (en) * 2010-07-21 2010-12-01 赵步 Copyright protection method
CN102339361A (en) * 2011-11-03 2012-02-01 厦门市智业软件工程有限公司 Method for monitoring sensitive words in segment quoting of electronic medical record
CN104317781A (en) * 2014-11-14 2015-01-28 移康智能科技(上海)有限公司 Sensitive word editor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ge Benyi (葛本仪): "Modern Chinese Lexicology" (《现代汉语词汇学》), 30 June 2014, Beijing: The Commercial Press (商务印书馆) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574203A (en) * 2016-01-07 2016-05-11 沈文策 Information storage method and device
US10055405B2 (en) 2016-02-24 2018-08-21 Beijing Baidu Netcom Science And Technology Co., Ltd. Computer-implemented directional translation method and apparatus
CN105808527A (en) * 2016-02-24 2016-07-27 北京百度网讯科技有限公司 Oriented translation method and device based on artificial intelligence
CN106372062A (en) * 2016-09-18 2017-02-01 长沙军鸽软件有限公司 Method and device for recognizing non-civilized terms in communication message
CN106453366A (en) * 2016-10-27 2017-02-22 北京锐安科技有限公司 Information transmission method and system, sending terminal and receiving terminal
CN107547513A (en) * 2017-07-14 2018-01-05 新华三信息安全技术有限公司 Message processing method, device, the network equipment and storage medium
CN107547513B (en) * 2017-07-14 2021-02-05 新华三信息安全技术有限公司 Message processing method, device, network equipment and storage medium
CN108228704A (en) * 2017-11-03 2018-06-29 阿里巴巴集团控股有限公司 Identify method and device, the equipment of Risk Content
CN108228704B (en) * 2017-11-03 2021-07-13 创新先进技术有限公司 Method, device and equipment for identifying risk content
CN109962958A (en) * 2017-12-26 2019-07-02 上海全土豆文化传播有限公司 Document processing method and device
CN109962958B (en) * 2017-12-26 2022-05-03 阿里巴巴(中国)有限公司 Document processing method and device
CN108564950A (en) * 2018-02-28 2018-09-21 上海与德科技有限公司 Method, intelligent terminal and the computer storage media of speech-to-text
CN109213468A (en) * 2018-08-23 2019-01-15 阿里巴巴集团控股有限公司 A kind of speech playing method and device
CN110472234A (en) * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 Sensitive text recognition method, device, medium and computer equipment
CN110472234B (en) * 2019-07-19 2024-08-20 平安科技(深圳)有限公司 Sensitive text recognition method, device, medium and computer equipment
CN110874398B (en) * 2020-01-14 2020-06-02 广东博智林机器人有限公司 Forbidden word processing method and device, electronic equipment and storage medium
CN110874398A (en) * 2020-01-14 2020-03-10 广东博智林机器人有限公司 Forbidden word processing method and device, electronic equipment and storage medium
CN111918173B (en) * 2020-07-22 2021-10-29 浙江大丰实业股份有限公司 Protection system of stage sound equipment and use method
CN111918173A (en) * 2020-07-22 2020-11-10 浙江大丰实业股份有限公司 Protection system of stage sound equipment and use method
CN112307770A (en) * 2020-10-13 2021-02-02 深圳前海微众银行股份有限公司 Sensitive information detection method and device, electronic equipment and storage medium
CN112559776A (en) * 2020-12-21 2021-03-26 绿瘦健康产业集团有限公司 Sensitive information positioning method and system
CN112599212A (en) * 2021-02-26 2021-04-02 北京妙医佳健康科技集团有限公司 Data processing method
CN113033217A (en) * 2021-04-19 2021-06-25 广州欢网科技有限责任公司 Method and device for automatically shielding and translating sensitive subtitle information
CN113033217B (en) * 2021-04-19 2023-09-15 广州欢网科技有限责任公司 Automatic shielding translation method and device for subtitle sensitive information
CN114706942A (en) * 2022-03-16 2022-07-05 马上消费金融股份有限公司 Text conversion model training method, text conversion device and electronic equipment
CN114706942B (en) * 2022-03-16 2023-11-24 马上消费金融股份有限公司 Text conversion model training method, text conversion device and electronic equipment
CN115963954A (en) * 2023-03-14 2023-04-14 北京中科智媒融媒体技术有限公司 Information publishing method, device, equipment and medium

Also Published As

Publication number Publication date
CN105183761B (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN105183761A (en) Sensitive word replacement method and apparatus
US10140371B2 (en) Providing multi-lingual searching of mono-lingual content
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
KR102364400B1 (en) Obtaining response information from multiple corpuses
US11482212B2 (en) Electronic device for analyzing meaning of speech, and operation method therefor
JP7108675B2 (en) Semantic matching method, device, electronic device, storage medium and computer program
KR102243536B1 (en) Method and system for controlling user access through content analysis of application
US20220012296A1 (en) Systems and methods to automatically categorize social media posts and recommend social media posts
KR102075505B1 (en) Method and system for extracting topic keyword
US20170177180A1 (en) Dynamic Highlighting of Text in Electronic Documents
TW200900967A (en) Multi-mode input method editor
US10042840B2 (en) Hybrid grammatical and ungrammatical parsing
TW201606750A (en) Speech recognition using a foreign word grammar
TWI588668B (en) Foreign language production support facilities and methods
WO2009026850A1 (en) Domain dictionary creation
CN110020429B (en) Semantic recognition method and device
US20200043074A1 (en) Apparatus and method of recommending items based on areas
RU2595531C2 (en) Method and system for generating definition of word based on multiple sources
Kharb et al. Embedding intelligence through cognitive services
CN112036135B (en) Text processing method and related device
KR102072708B1 (en) A method and computer program for inferring genre of a text contents
Yang et al. The construction of a kind of chat corpus in Chinese word segmentation
KR102501625B1 (en) Method and system for controlling user access through content analysis of application
KR102426079B1 (en) Online advertising method using mobile platform
KR102378565B1 (en) Method and system for controlling user access through content analysis of application

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant