CN110309423A - Sensitive information identification method, apparatus, and electronic device - Google Patents

Sensitive information identification method, apparatus, and electronic device

Info

Publication number
CN110309423A
Authority
CN
China
Prior art keywords
information
search result
sensitive
query information
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910574799.8A
Other languages
Chinese (zh)
Inventor
刘维伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910574799.8A
Publication of CN110309423A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a sensitive information identification method, apparatus, and electronic device. The method comprises: obtaining query information that a user inputs into a search box; looking up the query information in target sensitive information mined in advance, where the target sensitive information comprises historical query information that meets a preset condition, the preset condition being that a first quantity of first search results containing sensitive content is greater than a second quantity of second search results not containing sensitive content; and, when the query information is found in the target sensitive information, determining that the query information is sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information by regular expressions and reduces the labor cost of identifying sensitive information.

Description

Sensitive information identification method, apparatus, and electronic device
Technical field
The invention relates to the technical field of information processing, and in particular to a sensitive information identification method, apparatus, and electronic device.
Background
To maintain a healthy network environment, it is often necessary to identify whether the query information that a user inputs into a search box is sensitive information. If it is, the search suggestions and search results that contain the sensitive information can be blocked. Here, sensitive information typically refers to pornographic information.
At present, regular expressions are commonly used to identify whether the query information input by a user is sensitive information. For example, the regular expression "men and women does" identifies the query information "men and women does" as sensitive information. However, that regular expression cannot identify whether a differently written variant of the query "men and women does" is sensitive information.
That is, this way of identifying sensitive information requires technical staff to write a large number of regular expressions so that the various kinds of sensitive information input by users can be identified. Writing that many regular expressions costs technical staff considerable time and effort, so the labor cost of identifying sensitive information is high.
Summary of the invention
An object of the embodiments of the invention is to provide a sensitive information identification method, apparatus, and electronic device that can identify sensitive information without regular expressions, thereby reducing the labor cost of identifying sensitive information. The specific technical solutions are as follows:
In a first aspect, an embodiment of the invention provides a sensitive information identification method, comprising:
obtaining query information that a user inputs into a search box;
looking up the query information in target sensitive information mined in advance; where the target sensitive information comprises historical query information that meets a preset condition, and the preset condition is that a first quantity of first search results containing sensitive content is greater than a second quantity of second search results not containing sensitive content;
when the query information is found in the target sensitive information, determining that the query information is sensitive information.
Optionally, before the query information is looked up in the target sensitive information mined in advance, the method further comprises:
obtaining historical query information that was input into the search box within a preset historical time period;
determining the search results that correspond to the historical query information and were clicked within the historical time period;
determining, among the clicked search results, the first quantity of first search results containing sensitive content and the second quantity of second search results not containing sensitive content;
when the first quantity is greater than the second quantity, determining that the historical query information is target sensitive information.
Optionally, the step of determining, among the clicked search results, the first quantity of first search results containing sensitive content and the second quantity of second search results not containing sensitive content comprises:
identifying whether the target content of a clicked search result contains sensitive content; where the target content comprises a title and/or a cover image;
if the target content contains sensitive content, determining that the clicked search result is a first search result;
if the target content does not contain sensitive content, determining that the clicked search result is a second search result;
counting the first quantity of first search results and the second quantity of second search results.
Optionally, the step of identifying whether the target content of a clicked search result contains sensitive content comprises:
determining the number of times the clicked search result was clicked within the historical time period;
when that number is greater than or equal to a preset number of clicks, identifying whether the target content of the clicked search result contains sensitive content.
Optionally, the step of obtaining historical query information that was input into the search box within a preset historical time period comprises:
obtaining the user logs of the preset historical time period;
obtaining, from the user logs, the historical query information that was input into the search box within the historical time period.
Optionally, in the embodiments of the invention, sensitive information comprises pornographic information and/or violent information.
In a second aspect, an embodiment of the invention further provides a sensitive information identification apparatus, comprising:
a first acquisition module, configured to obtain query information that a user inputs into a search box;
a lookup module, configured to look up the query information in target sensitive information mined in advance; where the target sensitive information comprises historical query information that meets a preset condition, and the preset condition is that a first quantity of first search results containing sensitive content is greater than a second quantity of second search results not containing sensitive content;
a first determining module, configured to determine that the query information is sensitive information when the query information is found in the target sensitive information.
Optionally, in the embodiments of the invention, the apparatus further comprises:
a second acquisition module, configured to obtain, before the query information is looked up in the target sensitive information mined in advance, historical query information that was input into the search box within a preset historical time period;
a second determining module, configured to determine the search results that correspond to the historical query information and were clicked within the historical time period;
a third determining module, configured to determine, among the clicked search results, the first quantity of first search results containing sensitive content and the second quantity of second search results not containing sensitive content;
a fourth determining module, configured to determine that the historical query information is target sensitive information when the first quantity is greater than the second quantity.
Optionally, in the embodiments of the invention, the third determining module comprises:
an identification unit, configured to identify whether the target content of a clicked search result contains sensitive content; where the target content comprises a title and/or a cover image;
a first determination unit, configured to determine that the clicked search result is a first search result when the target content contains sensitive content;
a second determination unit, configured to determine that the clicked search result is a second search result when the target content does not contain sensitive content;
a counting unit, configured to count the first quantity of first search results and the second quantity of second search results.
Optionally, in the embodiments of the invention, the identification unit is specifically configured to:
determine the number of times the clicked search result was clicked within the historical time period;
when that number is greater than or equal to a preset number of clicks, identify whether the target content of the clicked search result contains sensitive content.
Optionally, in the embodiments of the invention, the first acquisition module is specifically configured to:
obtain the user logs of the preset historical time period;
obtain, from the user logs, the historical query information that was input into the search box within the historical time period.
Optionally, in the embodiments of the invention, sensitive information may comprise pornographic information and/or violent information.
In a third aspect, an embodiment of the invention further provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another over the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any item of the first aspect when executing the program stored in the memory.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the method steps of any item of the first aspect are implemented.
In a fifth aspect, an embodiment of the invention further provides a computer program product comprising instructions which, when run on a computer, cause the computer to execute the method steps of any item of the first aspect.
In the embodiments of the invention, the query information that a user inputs into a search box can be obtained and then looked up in target sensitive information mined in advance. The target sensitive information comprises historical query information that meets a preset condition, namely that the first quantity of first search results containing sensitive content is greater than the second quantity of second search results not containing sensitive content. When the query information is found in the target sensitive information, the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information by regular expressions and reduces the labor cost of identifying sensitive information.
Brief description of the drawings
To describe the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flowchart of a sensitive information identification method provided by an embodiment of the invention;
Fig. 2 is a flowchart of a method for mining target sensitive information provided by an embodiment of the invention;
Fig. 3 is a schematic structural diagram of a sensitive information identification apparatus provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described below with reference to the drawings in the embodiments of the invention.
To solve the problems in the prior art, embodiments of the invention provide a sensitive information identification method, apparatus, and electronic device.
The sensitive information identification method provided by the embodiments of the invention is described first.
The method can be applied to an electronic device, including but not limited to a computer, a mobile phone, a smart wearable device, and a server.
Fig. 1 is a flowchart of a sensitive information identification method provided by an embodiment of the invention. Referring to Fig. 1, the method may include the following steps:
S101: obtain the query information that a user inputs into a search box.
It can be understood that the search box may be a search box provided in a browser or a search box provided on a video website, and is of course not limited to these.
For example, the query information may be "popular TV series" or "men and women does", and is of course not limited to these.
S102: look up the query information in target sensitive information mined in advance; where the target sensitive information comprises historical query information that meets a preset condition, and the preset condition is that a first quantity of first search results containing sensitive content is greater than a second quantity of second search results not containing sensitive content.
It can be understood that the target sensitive information is mined first, and the query information is then looked up in it.
The search results of each historical query may include first search results and second search results. Each first search result contains sensitive content, and each second search result does not.
In one implementation, during mining, when the quantity of first search results of a historical query (the first quantity) is greater than the quantity of its second search results (the second quantity), the search results containing sensitive content outnumber those that do not, and the historical query can be determined to be target sensitive information. In this implementation, target sensitive information is determined from the quantity of first search results containing sensitive content and the quantity of second search results not containing sensitive content among all the search results returned. Sensitive content may include pornographic content and/or violent content.
In another implementation, during mining, when the quantity of clicked first search results of a historical query (the first quantity) is greater than the quantity of its clicked second search results (the second quantity), the clicked search results containing sensitive content likewise outnumber those that do not, and the historical query can be determined to be target sensitive information. In this implementation, target sensitive information is determined from the quantity of first search results containing sensitive content and the quantity of second search results not containing sensitive content among the search results that users clicked.
The search results of a single historical query may include both first search results and second search results, and what the user actually wanted to find may be the second search results, which contain no sensitive content. Determining target sensitive information from the clicked search results therefore uses the clicks to predict whether the content the user is really interested in is sensitive or non-sensitive. Combining users' click behavior in this way yields more accurate target sensitive information.
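The difference between the two implementations can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SearchResult` record and its `is_sensitive` and `clicks` fields are hypothetical names, and the sensitivity flag is assumed to have been set already by some detector.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical record for one search result of a historical query."""
    title: str
    is_sensitive: bool  # assumed to be decided beforehand by some detector
    clicks: int         # times users clicked this result in the time period


def meets_preset_condition(results: Iterable[SearchResult]) -> bool:
    """Preset condition: first quantity (sensitive) > second quantity (non-sensitive)."""
    results = list(results)
    first_quantity = sum(1 for r in results if r.is_sensitive)
    second_quantity = len(results) - first_quantity
    return first_quantity > second_quantity


def by_returned_results(results: Iterable[SearchResult]) -> bool:
    """First implementation: count over all returned search results."""
    return meets_preset_condition(results)


def by_clicked_results(results: Iterable[SearchResult]) -> bool:
    """Second implementation: count only search results that were clicked."""
    return meets_preset_condition(r for r in results if r.clicks > 0)


results = [
    SearchResult("A", is_sensitive=True, clicks=0),
    SearchResult("B", is_sensitive=True, clicks=0),
    SearchResult("C", is_sensitive=False, clicks=12),
]
print(by_returned_results(results))  # True: 2 sensitive vs 1 non-sensitive
print(by_clicked_results(results))   # False: 0 sensitive vs 1 non-sensitive clicked
```

In this made-up example the two implementations disagree: most returned results are sensitive, but users only clicked the non-sensitive one, so the click-based variant does not treat the query as target sensitive information.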
Fig. 2 is a flowchart of a method for mining target sensitive information provided by an embodiment of the invention. The way of mining target sensitive information is described below with reference to Fig. 2. Referring to Fig. 2, mining target sensitive information may include the following steps:
S201: obtain the historical query information that was input into the search box within a preset historical time period.
It can be understood that the electronic device can obtain the user logs of the preset historical time period and then obtain, from those logs, the historical query information that was input into the search box within that period. For example, the historical query information may be "men and women does", and is of course not limited to this.
The preset historical time period can be set by those skilled in the art according to the circumstances. It may be the day before the current time point, or the month before the current time point, and is of course not limited to these.
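Step S201 can be sketched as below. The log format is an assumption made for illustration only: each entry is a mapping with hypothetical `timestamp`, `event`, and `query` keys, none of which are specified in the source. The default window of one day mirrors the "day before the current time point" example.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, Mapping, Set


def historical_queries(
    log_entries: Iterable[Mapping[str, Any]],
    now: datetime,
    window: timedelta = timedelta(days=1),
) -> Set[str]:
    """Collect queries input into the search box during [now - window, now)."""
    start = now - window
    queries: Set[str] = set()
    for entry in log_entries:
        # Only "search" events inside the historical time period are kept.
        if entry.get("event") == "search" and start <= entry["timestamp"] < now:
            queries.add(entry["query"])
    return queries


now = datetime(2019, 6, 28, 12, 0)
log = [
    {"timestamp": datetime(2019, 6, 28, 9, 0), "event": "search", "query": "popular TV series"},
    {"timestamp": datetime(2019, 6, 26, 9, 0), "event": "search", "query": "old query"},
    {"timestamp": datetime(2019, 6, 28, 10, 0), "event": "click", "query": "popular TV series"},
]
print(historical_queries(log, now))  # {'popular TV series'}
```

Passing `window=timedelta(days=30)` would approximate the "previous month" option.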
S202: determine the search results that correspond to the historical query information and were clicked within the historical time period.
Determining which search results users clicked reveals the purpose for which users entered the historical query, that is, the search results users are really interested in. Whether the historical query is target sensitive information can then be determined from those results.
S203: determine, among the clicked search results, the first quantity of first search results containing sensitive content and the second quantity of second search results not containing sensitive content.
After the clicked search results corresponding to the historical query have been determined, the first search results, which contain sensitive content, and the second search results, which do not, can be identified among them. The first quantity of first search results and the second quantity of second search results can then be counted.
Whether each clicked search result contains sensitive content can be determined as follows: identify whether the target content of the clicked search result contains sensitive content, where the target content comprises a title and/or a cover image.
Because the title of a search result usually contains a fair amount of text, whether the search result contains sensitive content can be determined by identifying whether its title contains sensitive content. Specifically, a semantic recognition model built in advance can identify the meaning of the title, and whether the title contains sensitive content can be determined from that meaning; the approach is of course not limited to this. When the title contains sensitive content, the search result contains sensitive content.
The title of a search result may include one or more of the title portion and the brief-introduction portion of the search result that the browser displays after the historical query is searched; either is reasonable.
In addition, the cover image of a search result usually reflects the main content of that search result. Whether the search result contains sensitive content can therefore be determined by identifying whether the cover image is a sensitive picture. Specifically, a picture recognition model can identify the content shown in the cover image (for example, nudity) or the category to which that content belongs (for example, the pornographic category). Whether the cover image contains sensitive content can then be determined from the picture recognition result; the approach is of course not limited to this.
It can be understood that the semantic recognition model may be any model in the related art that can identify the semantics of text, and the picture recognition model may be any model in the related art that can identify the content of a picture or the category to which that content belongs. Neither is specifically limited here.
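One way to wire the two kinds of model together is sketched below. The models themselves are out of scope, so they are passed in as plain callables; `ClickedResult`, `TextModel`, `ImageModel`, and the two stub functions are hypothetical names, and the stubs are trivial placeholders rather than real semantic or picture recognition models. The sketch also assumes that a hit on either the title or the cover image is enough, which is one reading of "title and/or cover image".

```python
from dataclasses import dataclass
from typing import Callable, Optional

TextModel = Callable[[str], bool]     # stand-in for a semantic recognition model
ImageModel = Callable[[bytes], bool]  # stand-in for a picture recognition model


@dataclass(frozen=True)
class ClickedResult:
    """Hypothetical clicked search result; either field may be missing."""
    title: Optional[str] = None
    cover_image: Optional[bytes] = None


def has_sensitive_content(
    result: ClickedResult, text_model: TextModel, image_model: ImageModel
) -> bool:
    """Return True if the title or the cover image is judged sensitive."""
    if result.title is not None and text_model(result.title):
        return True
    if result.cover_image is not None and image_model(result.cover_image):
        return True
    return False


def stub_text_model(title: str) -> bool:
    # Placeholder only: a real system would call a semantic model here.
    return "flagged" in title.casefold()


def stub_image_model(image: bytes) -> bool:
    # Placeholder only: a real system would call a picture recognition model.
    return image.startswith(b"FLAGGED")


result = ClickedResult(title="News", cover_image=b"FLAGGED-bytes")
print(has_sensitive_content(result, stub_text_model, stub_image_model))  # True
```

Keeping the models behind callables means either one can be swapped for any suitable model without touching the counting logic.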
To exclude clicks caused by accidental operation, the number of times a clicked search result was clicked within the historical time period can also be determined. When that number is greater than or equal to a preset number of clicks, the search result was clicked many times, which indicates purposeful clicks by users. In that case, the operation of identifying whether the target content of the clicked search result contains sensitive content can be executed.
Conversely, when the number is less than the preset number of clicks, the clicks were probably caused by accidental operation. In that case, the operation of identifying whether the target content of the clicked search result contains sensitive content can be skipped. Filtering out mis-clicked search results in this way reduces the number of search results that need to be checked for sensitive content and improves the accuracy of the target sensitive information that is determined.
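The click-count filter amounts to a simple threshold test. In the sketch below, `click_counts` (result identifier to number of clicks in the historical time period) and `min_clicks` (the preset number of clicks) are hypothetical names; the source does not say how the counts are stored or what threshold to use.

```python
from typing import Dict, List


def results_to_inspect(click_counts: Dict[str, int], min_clicks: int) -> List[str]:
    """Keep results clicked at least `min_clicks` times; skip likely mis-clicks."""
    return [
        result_id
        for result_id, clicks in click_counts.items()
        if clicks >= min_clicks
    ]


click_counts = {"result-1": 1, "result-2": 5, "result-3": 3}
print(results_to_inspect(click_counts, min_clicks=3))  # ['result-2', 'result-3']
```

Only the identifiers returned here would be handed on to sensitive-content detection.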
S204: when the first quantity is greater than the second quantity, determine that the historical query information is target sensitive information.
For a historical query, when the first quantity of first search results among its clicked search results is greater than the second quantity of second search results, the clicked search results containing sensitive content outnumber those that do not, which indicates that users want to find sensitive content through that historical query. The historical query can then be determined to be target sensitive information.
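Steps S202 to S204 can be combined into one small mining routine, sketched below under stated assumptions: click records are `(query, title)` pairs, a clicked result is identified by its title, and `is_sensitive_title` is a caller-supplied detector. These names and the record format are illustrative and do not come from the source.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

ClickRecord = Tuple[str, str]  # (historical query, title of the clicked result)


def mine_target_sensitive(
    clicks: Iterable[ClickRecord],
    is_sensitive_title: Callable[[str], bool],
) -> Set[str]:
    """Return the historical queries whose clicked results are mostly sensitive."""
    clicked_titles: Dict[str, Set[str]] = defaultdict(set)
    for query, title in clicks:
        # S202: group clicked results by query; each distinct result counts once.
        clicked_titles[query].add(title)

    targets: Set[str] = set()
    for query, titles in clicked_titles.items():
        # S203: count first (sensitive) and second (non-sensitive) results.
        first_quantity = sum(1 for t in titles if is_sensitive_title(t))
        second_quantity = len(titles) - first_quantity
        # S204: strict "greater than", so a tie does not qualify.
        if first_quantity > second_quantity:
            targets.add(query)
    return targets


clicks = [
    ("q1", "flagged A"), ("q1", "flagged B"), ("q1", "news"),
    ("q2", "flagged A"), ("q2", "news"), ("q2", "sports"),
]
print(mine_target_sensitive(clicks, lambda t: t.startswith("flagged")))  # {'q1'}
```

Here `q1` qualifies (2 sensitive vs 1 non-sensitive clicked results) while `q2` does not (1 vs 2).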
S103: when the query information is found in the target sensitive information, determine that the query information is sensitive information.
When the query information is found in the target sensitive information, the user wants to find sensitive content through that query, so the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified simply and quickly from the target sensitive information mined in advance.
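At query time, steps S102 and S103 reduce to a membership test against the mined set, as in the sketch below. The `normalize` helper is an assumption added for illustration: the source describes a plain lookup and does not mention normalizing whitespace or case.

```python
from typing import AbstractSet


def normalize(query: str) -> str:
    """Hypothetical normalization (not in the source): collapse whitespace, case-fold."""
    return " ".join(query.split()).casefold()


def is_sensitive_query(query: str, target_sensitive: AbstractSet[str]) -> bool:
    """S102 and S103: the query is sensitive exactly when it is found in the mined set."""
    return normalize(query) in target_sensitive


# The mined set is assumed to hold queries normalized the same way.
target_sensitive = frozenset(normalize(q) for q in ["Flagged Query"])
print(is_sensitive_query("  flagged   query ", target_sensitive))  # True
print(is_sensitive_query("popular TV series", target_sensitive))   # False
```

Storing the mined queries in a hash-based set keeps each lookup at roughly constant cost, which fits the "simply and quickly" goal stated above.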
After the query information has been determined to be sensitive information, the search suggestions and search results that contain the sensitive information can also be blocked. Sensitive information includes pornographic information and/or violent information.
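Blocking could look roughly like the sketch below. The source only states that suggestions and results containing the sensitive information can be blocked; the substring test for suggestions, the empty result list for a sensitive query, and the names `block_suggestions`, `guarded_search`, and `backend` are all assumptions made for illustration.

```python
from typing import AbstractSet, Callable, List


def block_suggestions(
    suggestions: List[str], target_sensitive: AbstractSet[str]
) -> List[str]:
    """Drop any search suggestion that contains a mined sensitive query."""
    return [
        s for s in suggestions
        if not any(sensitive in s for sensitive in target_sensitive)
    ]


def guarded_search(
    query: str,
    target_sensitive: AbstractSet[str],
    backend: Callable[[str], List[str]],
) -> List[str]:
    """Return no results for a sensitive query; otherwise delegate to the backend."""
    if query in target_sensitive:
        return []
    return backend(query)


target_sensitive = frozenset({"flagged query"})
suggestions = ["popular TV series", "flagged query full video"]
print(block_suggestions(suggestions, target_sensitive))  # ['popular TV series']
```

`backend` stands for whatever function actually runs the search; it is only called when the query is not in the mined set.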
In the embodiments of the invention, the query information that a user inputs into a search box can be obtained and then looked up in target sensitive information mined in advance. The target sensitive information comprises historical query information that meets a preset condition, namely that the first quantity of first search results containing sensitive content is greater than the second quantity of second search results not containing sensitive content. When the query information is found in the target sensitive information, the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information by regular expressions and reduces the labor cost of identifying sensitive information.
In summary, with the sensitive information identification scheme provided by the embodiments of the invention, whether query information is sensitive can be identified simply and quickly from the target sensitive information mined in advance, which speeds up the identification of sensitive information and reduces its labor cost.
Corresponding to above method embodiment, the embodiment of the invention also provides a kind of sensitive information identification devices, referring to figure 3, the apparatus may include:
First obtains module 301, the query information being input in search box for obtaining user;
Searching module 302, for searching query information in excavating obtained target susceptibility information in advance;Wherein, described Target susceptibility information includes the historical query information for meeting preset condition;The preset condition are as follows: there are the first of sensitive content First quantity of search result is greater than the second quantity of the second search result there is no sensitive content;
First determining module 303, for when finding query information in target susceptibility information, determining that query information is Sensitive information.
Using device provided in an embodiment of the present invention, the query information that user is input in search box can be obtained.Then, The query information can be searched in excavating obtained target susceptibility information in advance.Wherein, target susceptibility information includes meeting in advance If the historical query information of condition.The preset condition are as follows: there are the first quantity of the first search result of sensitive content to be greater than not There are the second quantity of the second search result of sensitive content.Also, works as and find the query information in target susceptibility information When, then it can determine that the query information is sensitive information.In this way, can be known by the target susceptibility information excavated in advance Whether other query information is sensitive information, avoids through regular expression and identifies sensitive information, reduces the sensitive letter of identification The human cost of breath.
Optionally, in embodiments of the present invention, can also include:
Second obtains module, for obtaining before searching query information in excavating obtained target susceptibility information in advance The historical query information of search box is input in preset historical time section;
Second determining module, for determining in historical time section, what is be clicked corresponding to historical query information is searched Hitch fruit;
Third determining module, for determining, there are the first search results of sensitive content in the search result being clicked First quantity and there is no the second quantity of the second search result of sensitive content;
4th determining module, for when the first quantity is greater than the second quantity, determining that historical query information is target susceptibility Information.
Optionally, in this embodiment of the present invention, the third determining module may include:
A recognition unit, configured to identify whether the target content in a clicked search result contains sensitive content, where the target content includes a title and/or a cover image;
A first determination unit, configured to determine that the clicked search result is a first search result when the target content contains sensitive content;
A second determination unit, configured to determine that the clicked search result is a second search result when the target content does not contain sensitive content;
A statistics unit, configured to count the first quantity of first search results and the second quantity of second search results.
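The classification and counting performed by these units might look like the sketch below. It assumes that the title detector and the optional cover-image detector are supplied by the caller, since this passage does not specify how sensitive content is recognized. `ClickedResult`, `contains_sensitive_content`, and `count_first_and_second` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class ClickedResult:
    title: str
    cover_image: Optional[bytes] = None  # raw image bytes, if a cover is available


def contains_sensitive_content(
    result: ClickedResult,
    title_detector: Callable[[str], bool],
    cover_detector: Optional[Callable[[bytes], bool]] = None,
) -> bool:
    """Check the target content: the title, and the cover image when a detector is given."""
    if title_detector(result.title):
        return True
    if cover_detector is not None and result.cover_image is not None:
        return cover_detector(result.cover_image)
    return False


def count_first_and_second(
    results: Iterable[ClickedResult],
    title_detector: Callable[[str], bool],
    cover_detector: Optional[Callable[[bytes], bool]] = None,
) -> Tuple[int, int]:
    """Return (first quantity, second quantity) for the clicked search results."""
    first = second = 0
    for result in results:
        if contains_sensitive_content(result, title_detector, cover_detector):
            first += 1
        else:
            second += 1
    return first, second
```

Passing the detectors in as callables keeps the counting logic independent of whichever text or image classifier is actually used.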
Optionally, in this embodiment of the present invention, the recognition unit is specifically configured to:
Determine the number of times the clicked search result was clicked within the historical time period;
When the number of times is greater than or equal to a preset number of clicks, identify whether the target content in the clicked search result contains sensitive content.
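The click-count filter can be sketched as follows, assuming click events are available as a flat sequence of result identifiers. How results below the threshold are treated is not spelled out in this passage; the sketch simply skips them. The name `results_worth_checking` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def results_worth_checking(click_events: Iterable[str], min_clicks: int) -> Set[str]:
    """Return IDs of results clicked at least min_clicks times in the historical period."""
    counts = Counter(click_events)
    return {result_id for result_id, n in counts.items() if n >= min_clicks}
```

Only the returned results would then be passed to the sensitive-content recognition step, which avoids spending recognition effort on results that were clicked only rarely.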
Optionally, in this embodiment of the present invention, the first obtaining module 301 is specifically configured to:
Obtain the user logs for the preset historical time period;
Obtain, from the user logs, the historical query information entered into the search box within the historical time period.
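Extracting historical queries from user logs could be sketched as below. The log schema is an assumption: each entry is taken to carry a timestamp and the query text, and the time period is treated as a half-open interval. `LogEntry` and `historical_queries` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set


@dataclass
class LogEntry:
    timestamp: datetime
    query: str


def historical_queries(log: Iterable[LogEntry], start: datetime, end: datetime) -> Set[str]:
    """Return the distinct queries entered within [start, end)."""
    return {entry.query for entry in log if start <= entry.timestamp < end}
```

Returning a set removes repeated queries, so each historical query is evaluated once in the later mining steps.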
Optionally, in this embodiment of the present invention, the sensitive information may include pornographic information and/or violent information.
Corresponding to the above method embodiments, an embodiment of the present invention further provides an electronic device. As shown in Figure 4, it includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404;
The memory 403 is configured to store a computer program;
The processor 401 is configured to implement the method steps of any of the above sensitive information identification methods when executing the program stored in the memory 403.
In this embodiment of the present invention, the electronic device can obtain the query information that a user enters into a search box. The query information can then be looked up in target sensitive information obtained by mining in advance. The target sensitive information includes historical query information that meets a preset condition. The preset condition is that a first quantity of first search results that contain sensitive content is greater than a second quantity of second search results that do not contain sensitive content. When the query information is found in the target sensitive information, the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information through regular expressions and reduces the labor cost of identifying sensitive information.
Corresponding to the above method embodiments, an embodiment of the present invention further provides a computer-readable storage medium. A computer program is stored in the computer-readable storage medium, and the method steps of any of the above sensitive information identification methods are implemented when the computer program is executed by a processor.
After the computer program stored in the computer-readable storage medium provided in this embodiment of the present invention is executed by the processor of an electronic device, the electronic device can obtain the query information that a user enters into a search box. The query information can then be looked up in target sensitive information obtained by mining in advance. The target sensitive information includes historical query information that meets a preset condition. The preset condition is that a first quantity of first search results that contain sensitive content is greater than a second quantity of second search results that do not contain sensitive content. When the query information is found in the target sensitive information, the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information through regular expressions and reduces the labor cost of identifying sensitive information.
Corresponding to the above method embodiments, another embodiment provided by the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the method steps of the sensitive information identification method of any of the above embodiments.
After the computer program provided in this embodiment of the present invention is executed by the processor of an electronic device, the electronic device can obtain the query information that a user enters into a search box. The query information can then be looked up in target sensitive information obtained by mining in advance. The target sensitive information includes historical query information that meets a preset condition. The preset condition is that a first quantity of first search results that contain sensitive content is greater than a second quantity of second search results that do not contain sensitive content. When the query information is found in the target sensitive information, the query information can be determined to be sensitive information. In this way, whether query information is sensitive can be identified from the target sensitive information mined in advance, which avoids identifying sensitive information through regular expressions and reduces the labor cost of identifying sensitive information.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner. The same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A sensitive information identification method, characterized by comprising:
Obtaining query information that a user enters into a search box;
Looking up the query information in target sensitive information obtained by mining in advance; wherein the target sensitive information includes historical query information that meets a preset condition; the preset condition is that a first quantity of first search results that contain sensitive content is greater than a second quantity of second search results that do not contain sensitive content;
When the query information is found in the target sensitive information, determining that the query information is sensitive information.
2. The method according to claim 1, characterized in that, before the step of looking up the query information in the target sensitive information obtained by mining in advance, the method further comprises:
Obtaining the historical query information entered into the search box within a preset historical time period;
Determining the search results that correspond to the historical query information and were clicked within the historical time period;
Determining, among the clicked search results, the first quantity of first search results that contain sensitive content and the second quantity of second search results that do not contain sensitive content;
When the first quantity is greater than the second quantity, determining that the historical query information is the target sensitive information.
3. The method according to claim 2, characterized in that the step of determining, among the clicked search results, the first quantity of first search results that contain sensitive content and the second quantity of second search results that do not contain sensitive content comprises:
Identifying whether the target content in a clicked search result contains sensitive content; wherein the target content includes a title and/or a cover image;
If the target content contains sensitive content, determining that the clicked search result is a first search result;
If the target content does not contain sensitive content, determining that the clicked search result is a second search result;
Counting the first quantity of the first search results and the second quantity of the second search results.
4. The method according to claim 3, characterized in that the step of identifying whether the target content in a clicked search result contains sensitive content comprises:
Determining the number of times the clicked search result was clicked within the historical time period;
When the number of times is greater than or equal to a preset number of clicks, identifying whether the target content in the clicked search result contains sensitive content.
5. The method according to any one of claims 2 to 4, characterized in that the step of obtaining the historical query information entered into the search box within a preset historical time period comprises:
Obtaining the user logs for the preset historical time period;
Obtaining, from the user logs, the historical query information entered into the search box within the historical time period.
6. The method according to claim 1, characterized in that the sensitive information includes pornographic information and/or violent information.
7. A sensitive information identification apparatus, characterized by comprising:
A first obtaining module, configured to obtain query information that a user enters into a search box;
A lookup module, configured to look up the query information in target sensitive information obtained by mining in advance; wherein the target sensitive information includes historical query information that meets a preset condition; the preset condition is that a first quantity of first search results that contain sensitive content is greater than a second quantity of second search results that do not contain sensitive content;
A first determining module, configured to determine that the query information is sensitive information when the query information is found in the target sensitive information.
8. The apparatus according to claim 7, characterized by further comprising:
A second obtaining module, configured to obtain, before the query information is looked up in the target sensitive information obtained by mining in advance, the historical query information entered into the search box within a preset historical time period;
A second determining module, configured to determine the search results that correspond to the historical query information and were clicked within the historical time period;
A third determining module, configured to determine, among the clicked search results, the first quantity of first search results that contain sensitive content and the second quantity of second search results that do not contain sensitive content;
A fourth determining module, configured to determine that the historical query information is the target sensitive information when the first quantity is greater than the second quantity.
9. The apparatus according to claim 8, characterized in that the third determining module comprises:
A recognition unit, configured to identify whether the target content in a clicked search result contains sensitive content; wherein the target content includes a title and/or a cover image;
A first determination unit, configured to determine that the clicked search result is a first search result when the target content contains sensitive content;
A second determination unit, configured to determine that the clicked search result is a second search result when the target content does not contain sensitive content;
A statistics unit, configured to count the first quantity of the first search results and the second quantity of the second search results.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
CN201910574799.8A 2019-06-28 2019-06-28 A kind of sensitive information recognition methods, device and electronic equipment Pending CN110309423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910574799.8A CN110309423A (en) 2019-06-28 2019-06-28 A kind of sensitive information recognition methods, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110309423A true CN110309423A (en) 2019-10-08

Family

ID=68078597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910574799.8A Pending CN110309423A (en) 2019-06-28 2019-06-28 A kind of sensitive information recognition methods, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110309423A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020123A (en) * 2012-11-16 2013-04-03 中国科学技术大学 Method for searching bad video website
CN107862076A (en) * 2017-11-29 2018-03-30 四川九鼎智远知识产权运营有限公司 A kind of sensitive vocabulary monitor supervision platform
CN108388582A (en) * 2012-02-22 2018-08-10 谷歌有限责任公司 The mthods, systems and devices of related entities for identification

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666317A (en) * 2020-07-06 2020-09-15 腾讯科技(深圳)有限公司 Cheating information mining method and cheating information identification method and device
CN112818249A (en) * 2021-03-04 2021-05-18 中南大学 Multi-dimensional image construction method and system for crowd with specific tendency
CN112818249B (en) * 2021-03-04 2022-06-21 中南大学 Multi-dimensional image construction method and system for crowd with specific tendency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination