CN108647281A - Web page access risk detection and prompting methods, apparatus and computer device - Google Patents

Web page access risk detection and prompting methods, apparatus and computer device

Info

Publication number
CN108647281A
Authority
CN
China
Prior art keywords
document
risk
web
name entity
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810416471.9A
Other languages
Chinese (zh)
Other versions
CN108647281B (en
Inventor
刘健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810416471.9A priority Critical patent/CN108647281B/en
Publication of CN108647281A publication Critical patent/CN108647281A/en
Application granted granted Critical
Publication of CN108647281B publication Critical patent/CN108647281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

This application relates to a web page access risk detection method, a web page access risk prompting method, apparatuses and a computer device. The web page access risk detection method includes: receiving the web page address of an accessed web page reported by a terminal; obtaining the web document corresponding to the web page address; identifying a named entity in the web document; determining a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, the document library including the web documents corresponding to web page addresses historically reported by terminals; and feeding the risk detection result back to the terminal corresponding to the currently received web page address. The scheme provided by this application can improve the efficiency of risk detection for the large number of web pages that publish all kinds of information on the Internet.

Description

Web page access risk detection and prompting methods, apparatus and computer device
Technical field
This application relates to the field of computer technology, and in particular to a web page access risk detection method, a web page access risk prompting method, apparatuses and a computer device.
Background
With the rapid development of Internet technology, people can use web pages on the Internet to obtain information, share updates about their lives, make online payments, and so on. At the same time, however, Internet security problems keep growing: the use of web pages to spread illegal and harmful information is increasingly rampant, causing people substantial property losses and harm to personal safety.
However, the detection of web pages that publish illegal or harmful information currently relies on reports from users. This approach is clearly very inefficient, and the network security of users cannot be guaranteed.
Summary of the invention
In view of this, and to address the technical problem that existing ways of detecting web pages that publish illegal or harmful information are inefficient, it is necessary to provide a web page access risk detection method, a web page access risk prompting method, apparatuses and a computer device.
A web page access risk detection method, including:
receiving the web page address of an accessed web page reported by a terminal;
obtaining the web document corresponding to the web page address;
identifying a named entity in the web document;
determining a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, the document library including the web documents corresponding to web page addresses historically reported by terminals; and
feeding the risk detection result back to the terminal corresponding to the currently received web page address.
A web page access risk prompting method, including:
obtaining the web page address of an accessed web page;
reporting the web page address to a server;
receiving a risk detection result fed back by the server, the risk detection result being determined according to the number of web documents in a document library that include a named entity and negative-sentiment content, the named entity being present in the web document corresponding to the web page address, and the document library including the web documents corresponding to web page addresses historically reported by terminals; and
when the risk detection result indicates that a risk exists, displaying a risk prompt according to the risk detection result.
A web page access risk detection apparatus, the apparatus including:
a web page address receiving module, configured to receive the web page address of an accessed web page reported by a terminal;
a web document obtaining module, configured to obtain the web document corresponding to the web page address;
a named entity recognition module, configured to identify a named entity in the web document;
a determining module, configured to determine a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, the document library including the web documents corresponding to web page addresses historically reported by terminals; and
a feedback module, configured to feed the risk detection result back to the terminal corresponding to the currently received web page address.
A web page access risk prompting apparatus, the apparatus including:
a web page address obtaining module, configured to obtain the web page address of an accessed web page;
a web page address reporting module, configured to report the web page address to a server;
a receiving module, configured to receive a risk detection result fed back by the server, the risk detection result being determined according to the number of web documents in a document library that include a named entity and negative-sentiment content, the named entity being present in the web document corresponding to the web page address, and the document library including the web documents corresponding to web page addresses historically reported by terminals; and
a prompting module, configured to display a risk prompt according to the risk detection result when the risk detection result indicates that a risk exists.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above web page access risk detection method or web page access risk prompting method.
A computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above web page access risk detection method or web page access risk prompting method.
With the above web page access risk detection and prompting methods, apparatuses and computer device, the corresponding web document is obtained from the web page address of the accessed web page reported by a terminal, so a massive number of web documents can be obtained from the web page addresses historically reported by terminals. Compared with manually collecting web documents that spread all kinds of information, this is more accurate and more efficient. After the named entity in the web document corresponding to the web page currently accessed by the terminal has been identified, the risk detection result for that web page can be determined from the number of web documents in the document library that include the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed web page according to the fed-back risk detection result, which improves the efficiency of risk detection for the large number of web pages that publish all kinds of information on the Internet.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the web page access risk detection method in one embodiment;
Fig. 2 is a schematic flowchart of the web page access risk detection method in one embodiment;
Fig. 3 is a schematic diagram of the interface of a web page accessed by a terminal in one embodiment;
Fig. 4 is a schematic diagram of the named entities stored in the named entity library and their corresponding risk indices in one embodiment;
Fig. 5 is an architecture diagram of web page access risk detection performed on the web page address of an accessed web page reported by a terminal in one embodiment;
Fig. 6 is a schematic flowchart of determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content in one embodiment;
Fig. 7 is a schematic flowchart of detecting the access risk of an accessed web page reported by a terminal in one embodiment;
Fig. 8 is a schematic flowchart of the web page access risk detection method in a specific embodiment;
Fig. 9 is a schematic flowchart of the web page access risk prompting method in one embodiment;
Fig. 10 is a structural block diagram of the web page access risk detection apparatus in one embodiment;
Fig. 11 is a structural block diagram of the web page access risk prompting apparatus in one embodiment;
Fig. 12 is a structural block diagram of the computer device in one embodiment;
Fig. 13 is a structural block diagram of the computer device in another embodiment.
Detailed description
To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and are not intended to limit it.
Fig. 1 is a diagram of the application environment of the web page access risk detection method in one embodiment. Referring to Fig. 1, the web page access risk detection method is applied to a web page access risk detection system. The system includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a laptop computer and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
As shown in Fig. 2, in one embodiment, a web page access risk detection method is provided. This embodiment is described mainly with the method applied to the server 120 in Fig. 1. Referring to Fig. 2, the web page access risk detection method specifically includes the following steps:
S202: receive the web page address of an accessed web page reported by a terminal.
Here, the web page accessed by the terminal may be a web page that the terminal is about to access, or a web page that the terminal is currently browsing. The web page address may be the URL (Uniform Resource Locator) corresponding to the web page.
Specifically, after the user triggers an instruction in the browser to open a web page, the terminal can report the web page address corresponding to that web page to the server, and the server receives the web page address reported by the terminal. Alternatively, while the user is browsing the current web page, the terminal can report the address of the web page being browsed to the server, and the server receives the web page address reported by the terminal.
S204: obtain the web document corresponding to the web page address.
Here, a web document is a document made up of the text content of a web page. For example, a web document may contain all the text content displayed by the web page, including advertisement text, text posted by users, news text and so on.
Specifically, after obtaining the web page address, the server can crawl the web document corresponding to that address and store the web document in a database so that the text content of the web document can be analyzed later. In one embodiment, the web page address is a website address. After receiving the website address, the server can crawl all sub-page addresses of the website and obtain the web document corresponding to each sub-page address. In this way, the server crawls multiple web documents related to the website address in one pass and stores them locally.
It can be understood that the web documents obtained by the server are retrieved according to the web page addresses of the web pages accessed by terminals. As terminals access all kinds of web pages, the server therefore collects a large number of web documents, and web page access risk detection can be based on this large body of documents from pages users have actually accessed. This is more efficient and accurate, and the server does not need to collect a large number of web pages from the Internet itself.
In one embodiment, step S204 further includes: when no crawl record corresponding to the web page address exists, crawling the web document corresponding to the web page address and storing the crawled web document in the document library.
Here, the document library is a database that stores the web documents corresponding to the web page addresses historically reported by terminals. After obtaining a web page address, the server can query whether that address has already been crawled. Only when no crawl record corresponding to the web page address exists does the server crawl the corresponding web document and store it in the document library. This avoids repeatedly downloading the web document of the same web page when different terminals access that page.
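The crawl-record check described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: `DocumentLibrary`, `get_or_crawl` and `fake_fetch` are hypothetical names, the library is an in-memory dictionary rather than a real database, and the crawler is replaced by a stub so that the example runs without network access.

```python
from typing import Callable, Dict, List


class DocumentLibrary:
    """In-memory stand-in for the document library (hypothetical).

    The dictionary keys double as the crawl record: an address that is
    present has already been crawled.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def has_crawl_record(self, url: str) -> bool:
        return url in self._docs

    def get_or_crawl(self, url: str, fetch: Callable[[str], str]) -> str:
        # Crawl only when no crawl record exists for this address.
        if url not in self._docs:
            self._docs[url] = fetch(url)
        return self._docs[url]


calls: List[str] = []


def fake_fetch(url: str) -> str:
    """Stub crawler that records which addresses were actually fetched."""
    calls.append(url)
    return f"text of {url}"


library = DocumentLibrary()
library.get_or_crawl("http://example.com/a", fake_fetch)  # first terminal
library.get_or_crawl("http://example.com/a", fake_fetch)  # second terminal, same page
```

With this structure, a second report of the same address is served from the library and the stub crawler is called only once.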
S206: identify the named entity in the web document.
Here, a named entity is a person name, place name, organization name or other entity identified by a name in the text content of the web document. A named entity may also be a temporal expression in the text content, such as a date, a time or a duration, or a numerical expression in the text content, such as an amount of money, a unit of measurement or a percentage.
In one embodiment, the server can use a named entity recognition model to identify the named entities present in the web document. The named entity recognition model may be a machine learning model trained in advance and based on a Hidden Markov Model (HMM), a Maximum Entropy (ME) model, a Support Vector Machine (SVM), Conditional Random Fields (CRF) or a neural network.
In one embodiment, step S206 specifically includes: obtaining the word vector corresponding to each word in the web document; inputting the word vector of each word into the pre-trained named entity recognition model, which outputs the entity type corresponding to each word in the text content; and taking the words that belong to a preset entity type as the named entities identified from the web document.
Specifically, the server can preset a dictionary set and determine the word vector of each word in the web document according to that dictionary set. The word vectors are input into the pre-trained named entity recognition model, the model outputs the entity type corresponding to each word in the text content, and the words that belong to a preset entity type are taken as the named entities identified from the web document.
Here, the entity type corresponding to each word may be the class label of that word. Every word has a corresponding class label, and the server can take the words whose class label is a preset entity type as named entities. For example, when detecting the risk of an accessed web page, usually only named entities of entity types such as person name, place name and organization name are considered, and named entities of the numerical expression and temporal expression types are not.
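The final filtering step, keeping only words whose predicted class label is a preset entity type, can be sketched as below. The recognition model itself is not shown: the `tagged` list is a hand-written stand-in for model output, and the label names in `PRESET_ENTITY_TYPES` are assumptions chosen for illustration rather than labels given in the source.

```python
from typing import List, Tuple

# Hypothetical label set. The text above names person, place and
# organization names as the entity types usually kept.
PRESET_ENTITY_TYPES = {"PERSON", "LOCATION", "ORGANIZATION"}


def select_named_entities(tagged_words: List[Tuple[str, str]]) -> List[str]:
    """Keep the words whose class label is one of the preset entity types."""
    return [word for word, label in tagged_words if label in PRESET_ENTITY_TYPES]


# Stand-in for the output of a named entity recognition model:
# (word, class label) pairs. Temporal and numerical expressions are dropped.
tagged = [
    ("XXX Project", "ORGANIZATION"),
    ("2018", "TIME"),
    ("80%", "NUMBER"),
    ("is", "O"),
]
entities = select_named_entities(tagged)
```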
Fig. 3 shows the interface of a web page accessed by a terminal. In this interface, a netizen, "Mr. Wang", has published the text content "Is the XXX Project actually a pyramid scheme?". The server obtains the web page address of the web page reported by the terminal and downloads the corresponding web document according to that address. It then inputs the text content "Is the XXX Project actually a pyramid scheme?" from the web document into the named entity recognition model, which can identify the named entity "XXX Project" as belonging to the organization name type.
In one embodiment, step S204 further includes: when a crawl record corresponding to the web page address exists, querying the risk index of the named entity included in the web document corresponding to the web page address, and determining the risk detection result according to the queried risk index.
Here, the risk index is a numerical value that quantifies the risk of a named entity in a web document. The higher the risk index of a named entity, the more likely it is that the named entity carries risk.
In one embodiment, the server can set the risk index to a value between 0 and 100, where 0 means the named entity carries no risk at all, 100 means the named entity is extremely dangerous, and 50 is the default value, meaning that the risk of the named entity is unknown, not yet determined, or awaiting manual confirmation or an update.
The risk detection result is the result of performing risk detection on the currently reported accessed web page. It is determined by the risk indices of the named entities included in the web document corresponding to the accessed web page. In one embodiment, when that web document includes multiple named entities, the server can determine the risk detection result according to the named entity with the largest risk index, or according to a weighted sum of the risk indices of the named entities. For example, if the web document corresponding to the accessed web page includes 3 named entities whose risk indices are 10, 40 and 80, the risk detection result of the accessed web page can be determined from the highest risk index, 80, as "high-risk web page" or "malicious web page". As another example, if the risk indices of the 3 named entities are 10, 20 and 30, the risk detection result can be determined from the highest risk index, 30, as "low-risk web page" or "safe web page".
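The two aggregation options mentioned above, the maximum risk index and a weighted sum, can be sketched as follows. The source gives only the examples 80 (high risk) and 30 (low risk) and does not state a cut-off, so `HIGH_RISK_THRESHOLD` and the weights used here are assumptions made purely for illustration.

```python
from typing import Optional, Sequence

# Hypothetical cut-off between "low risk" and "high risk".
HIGH_RISK_THRESHOLD = 60.0


def page_risk_index(indices: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Aggregate per-entity risk indices into one value for the page.

    Without weights, the highest index is used; with weights, a weighted sum.
    """
    if weights is None:
        return max(indices)
    return sum(w * r for w, r in zip(weights, indices))


def detection_result(index: float) -> str:
    """Map an aggregated risk index to a coarse risk detection result."""
    if index >= HIGH_RISK_THRESHOLD:
        return "high-risk web page"
    return "low-risk web page"


result_a = detection_result(page_risk_index([10, 40, 80]))
result_b = detection_result(page_risk_index([10, 20, 30]))
weighted = page_risk_index([10, 40, 80], weights=[0.2, 0.3, 0.5])
```

Using the maximum is the more conservative choice, since a single dangerous entity is enough to flag the page, whereas a weighted sum lets low-risk entities dilute the score.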
In one embodiment, the server can record the web page addresses of web pages it has crawled in the past. Each time it obtains a web page address reported by a terminal, it matches that address against the historically reported web page addresses to determine whether the reported address has already been crawled. When a crawl record corresponding to the currently reported web page address exists, the server queries the risk index of the named entity included in the web document corresponding to the web page address and determines the risk detection result according to the queried risk index.
S208: determine the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content; the document library includes the web documents corresponding to web page addresses historically reported by terminals.
As described above, the web documents stored in the document library are the web documents corresponding to the accessed web page addresses historically reported by terminals. Thus, when any web page is browsed by a terminal for the first time, its web page address is reported to the server, and the server can perform web page risk detection on the text content at that address. Web pages that carry a high risk can therefore be discovered automatically at the earliest opportunity.
Negative-sentiment content is the text content in a web document that is used to judge whether the web document is a negative-sentiment document. The server can classify web documents into negative-sentiment documents and non-negative-sentiment documents according to whether they contain negative-sentiment content.
Specifically, after identifying the named entity in the web document, the server can query the web documents stored in the document library for those that include both the identified named entity and negative-sentiment content, count the documents found, and determine the risk detection result according to that count.
In one embodiment, the server can use a machine learning model to classify the web document corresponding to the obtained web page address, and the output is the type of the web document, that is, negative-sentiment document or non-negative-sentiment document. The machine learning model may specifically be a naive Bayes classifier or a decision-tree-based classifier. Performing semantic analysis on the text content of the web document with the machine learning model makes it possible to determine the type of the web document.
In one embodiment, the server can split the text content of the web document into multiple sentences, select from those sentences the ones that contain the named entity, and, when a preset negative keyword is present in a sentence that contains the named entity, judge that the web document includes negative-sentiment content.
Here, a preset negative keyword is a word set in advance for judging whether a sentence in the text content of a web document is a negative sentence. Preset negative keywords may be collected manually and stored after confirmation, or obtained through sentiment analysis. For example, preset negative keywords may be "pyramid scheme", "illegal fundraising", "Ponzi scheme", "absconding" and so on. Specifically, the server can split the text content of the web document into multiple sentences and determine a sentence that contains both the named entity and a preset negative keyword to be a negative sentence, so that a web document containing a negative sentence is determined to be a web document that includes negative-sentiment content.
It should be noted that the text content may be split into units of text delimited by full stops, question marks or exclamation marks, or into units of paragraphs. In the latter case, a paragraph that contains both the named entity and a preset negative keyword is determined to be a negative paragraph, and a web document containing a negative paragraph is determined to be a web document that includes negative-sentiment content.
For example, the sentence "Is the XXX Project actually a pyramid scheme?" split from the web page interface shown in Fig. 3 contains the named entity "XXX Project" and the preset negative keyword "pyramid scheme". The sentence is therefore a negative sentence, and the web document is judged to be a web document that includes negative-sentiment content.
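The keyword-based judgment can be sketched as below. This is a simplified illustration, not a definitive implementation: the keyword list reuses the examples from the text, the function names are hypothetical, and sentences are split only on full stops, question marks and exclamation marks (in both half-width and full-width forms).

```python
import re
from typing import List, Sequence

# Example keywords taken from the description; a real list would be
# collected manually or derived from sentiment analysis.
NEGATIVE_KEYWORDS = ("pyramid scheme", "illegal fundraising", "Ponzi scheme", "absconding")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences ending in a full stop, question mark or exclamation mark."""
    parts = re.findall(r"[^.?!。？！]+[.?!。？！]?", text)
    return [p.strip() for p in parts if p.strip()]


def has_negative_content(text: str, entity: str,
                         keywords: Sequence[str] = NEGATIVE_KEYWORDS) -> bool:
    """True when one sentence contains both the named entity and a negative keyword."""
    for sentence in split_sentences(text):
        if entity in sentence and any(k in sentence for k in keywords):
            return True
    return False


page_text = "Is the XXX Project actually a pyramid scheme? The weather is nice."
flagged = has_negative_content(page_text, "XXX Project")
```

Requiring the entity and the keyword to share a sentence avoids flagging a document in which a negative keyword appears but refers to something unrelated to the entity.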
In one embodiment, after identifying the named entity in the web document, the server can query whether the identified named entity exists in the named entity library. When the identified named entity does not exist in the named entity library, the server performs the step of determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content.
Here, the named entity library is a database that stores the named entities identified from web documents and their corresponding risk indices. Fig. 4 is a schematic diagram of the named entities stored in the named entity library and the risk index of each named entity in one embodiment. Referring to Fig. 4, the named entity library includes a named entity field and a risk index field. This embodiment is illustrated with risk indices from 0 to 100: the risk index of pyramid scheme organization A is 100, the highest risk index, which confirms a very high risk; the risk index of organization B is 10, a relatively low risk index, which confirms a low risk; the risk index of charity C is 0, which confirms no risk; the risk index of suspected pyramid scheme organization D is 80, which confirms a high risk; the risk index of automobile brand E is 50, possibly because it has only just been added to the named entity library with the default risk index of 50, which means that its risk is not yet determined or awaits further analysis.
Specifically, when the identified named entity does not exist in the named entity library, the server can count the web documents in the document library that include the named entity and negative-sentiment content, determine the risk index of the identified named entity according to that count, and determine the detection result according to the risk index. In one embodiment, the server can add the identified named entity and its corresponding risk index to the named entity library.
In one embodiment, when the identified named entity exists in the named entity library, the risk index of the identified named entity is queried directly, and the risk detection result is determined according to the queried risk index.
Specifically, when the identified named entity exists in the named entity library, the server can query the risk index of the identified named entity directly from the named entity library and determine the risk detection result of the reported web page according to the queried risk index.
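The lookup-then-compute behaviour of the named entity library can be sketched as a simple cache. The dictionary below mirrors the Fig. 4 example values; `risk_index_for` and the `compute` callback (standing in for the document-library statistics) are hypothetical names used only for this sketch.

```python
from typing import Callable, Dict, List

# Example contents following Fig. 4 (risk indices from 0 to 100).
entity_library: Dict[str, float] = {
    "Pyramid scheme organization A": 100,
    "Organization B": 10,
    "Charity C": 0,
    "Suspected pyramid scheme organization D": 80,
    "Automobile brand E": 50,
}


def risk_index_for(entity: str,
                   library: Dict[str, float],
                   compute: Callable[[str], float]) -> float:
    """Return the stored risk index, computing and storing it only when absent."""
    if entity in library:
        return library[entity]
    index = compute(entity)   # e.g. derived from document-library counts
    library[entity] = index   # add the new entity to the named entity library
    return index


computed: List[str] = []


def fake_compute(entity: str) -> float:
    """Stub for the statistics step; records which entities needed computing."""
    computed.append(entity)
    return 25.0


known = risk_index_for("Pyramid scheme organization A", entity_library, fake_compute)
new = risk_index_for("XXX Project", entity_library, fake_compute)
```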
S210: feed the risk detection result back to the terminal corresponding to the currently received web page address.
Specifically, only when no crawl record exists for the web page address reported by the terminal does the server identify the named entity in the web document corresponding to that address. When the identified named entity does not exist in the named entity library, the server determines the risk detection result of the reported accessed web page according to the number of web documents in the document library that include the identified named entity and negative-sentiment content, and feeds the risk detection result back to the terminal that reported the web page address.
In one embodiment, when the identified named entity exists in the named entity library, the server queries the risk index of the identified named entity directly from the named entity library, determines the risk detection result of the reported web page according to the queried risk index, and sends the determined risk detection result to the terminal that reported the web page address.
In one embodiment, when a crawl record corresponding to the reported web page address exists, the server queries the risk index of the named entity included in the web document corresponding to the web page address, determines the risk detection result according to the queried risk index, and sends the determined risk detection result to the terminal that reported the web page address.
In one embodiment, after the terminal has reported the web page address of the web page to be accessed, the server can feed the risk detection result back to the terminal before the terminal opens the web page corresponding to the reported address. The server can also feed the risk detection result back to the terminal after the terminal has opened the web page, while the user is browsing it.
Fig. 5 is an architecture diagram, in one embodiment, of web page access risk detection performed on the web page address of an accessed web page reported by a terminal. Referring to Fig. 5, an application installed on the terminal reports the web page address of the accessed web page to the unified query interface of the server. The unified query interface is responsible for connecting to each application, such as a browser or an instant messaging application. After the web page address reported by the terminal is obtained through the unified query interface, it is sent to the crawler module, which crawls the text content at that address to obtain the web document and stores the crawled web document in the document library. The analysis module analyzes the web document, identifies the named entity in it, calculates the corresponding risk index from the web documents in the document library that include the identified named entity and negative-sentiment content, and updates the named entity library with the identified named entity and its risk index. Finally, the server can feed the risk detection result determined from the risk index back to the terminal through the unified query interface.
With the above web page access risk detection method, the corresponding web document is obtained from the web page address of the accessed web page reported by a terminal, so a massive number of web documents can be obtained from the web page addresses historically reported by terminals. Compared with manually collecting web documents that spread all kinds of information, this is more accurate and more efficient. After the named entity in the web document corresponding to the web page currently accessed by the terminal has been identified, the risk detection result for that web page can be determined from the number of web documents in the document library that include the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed web page according to the fed-back risk detection result, which improves the efficiency of risk detection for the large number of web pages that publish all kinds of information on the Internet.
In one embodiment, as shown in Fig. 6, step S208, determining the risk detection result according to the number of web page documents in the document library that include the named entity and negative-sentiment content, specifically includes the following steps:
S602, count a first quantity of web page documents in the document library that include the named entity, and a second quantity of web page documents that include both the named entity and negative-sentiment content.
Here, the first quantity is the number of web page documents stored in the document library that include the recognized named entity. The second quantity is the number of web page documents stored in the document library that include both the recognized named entity and negative-sentiment content. Specifically, after recognizing a named entity in the current web page document, the server may count, among the web page documents stored in the document library, the first quantity of documents that include the recognized named entity and the second quantity of documents that include both the recognized named entity and negative-sentiment content.
In one embodiment, the server may traverse the web page documents stored in the document library and use regular-expression matching to check whether each document includes the recognized named entity, thereby obtaining the first quantity.
In one embodiment, each time a previously uncrawled web page document reported by a terminal is stored in the document library, the server may judge whether the newly added document includes negative-sentiment content and store the judgment together with that document. In this way, the server only needs to judge the web page document currently reported by the terminal, rather than re-judging every document in the library each time, which makes it easier to count the second quantity.
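The counting in step S602 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `StoredDocument` record, its `is_negative` field (the sentiment verdict cached at storage time), and the sample library are all hypothetical names and data introduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class StoredDocument:
    """One crawled page in the document library (hypothetical record layout)."""
    text: str
    is_negative: bool  # sentiment verdict cached when the page was stored


def count_documents(library, entity):
    """Return (first_quantity, second_quantity) for a named entity."""
    # re.escape keeps punctuation in the entity name from acting as regex syntax.
    pattern = re.compile(re.escape(entity))
    first = second = 0
    for doc in library:
        if pattern.search(doc.text):
            first += 1           # document mentions the entity
            if doc.is_negative:
                second += 1      # ...and was judged negative when stored
    return first, second


# Made-up sample data for illustration only.
library = [
    StoredDocument("Acme Loans froze my account", True),
    StoredDocument("Acme Loans opens a new branch", False),
    StoredDocument("Weather report for Shenzhen", False),
]
first_quantity, second_quantity = count_documents(library, "Acme Loans")
```

Because the negative-sentiment verdict is stored with each document, the loop only reads a flag and never re-runs sentiment analysis.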
S604, determine the risk index of the named entity according to the ratio of the second quantity to the first quantity.
Here, the ratio of the second quantity to the first quantity represents the negative-sentiment document ratio. The higher the ratio, the higher the risk index; that is, the risk index is positively correlated with the negative-sentiment document ratio. The risk index has an upper threshold and a lower threshold. For example, the risk index may be a decimal between 0 and 1, or a value between 0 and 100.
In one embodiment, the risk index is a positively correlated linear function of the negative-sentiment document ratio. For example, the risk index may be a directly proportional function of the ratio, or a first-degree (linear) function of it. The risk index may also be a positively correlated nonlinear function of the ratio.
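As a rough illustration of step S604, the two functions below map the negative-sentiment document ratio to a risk index on a 0 to 100 scale. The function names, the choice of a square root as the nonlinear example, and the handling of an empty first quantity are assumptions made for this sketch; the text only requires that the mapping be positively correlated with the ratio.

```python
import math


def linear_risk_index(first_quantity, second_quantity, scale=100.0):
    """Direct-proportion mapping: risk index = scale * (second / first)."""
    if first_quantity == 0:
        return None  # no document mentions the entity, so the ratio is undefined
    return scale * second_quantity / first_quantity


def sqrt_risk_index(first_quantity, second_quantity, scale=100.0):
    """A nonlinear mapping that still increases with the ratio (illustrative choice)."""
    if first_quantity == 0:
        return None
    return scale * math.sqrt(second_quantity / first_quantity)
```

Both mappings stay within the lower and upper bounds of 0 and `scale` because the second quantity can never exceed the first.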
S606, determine the risk detection result based on the risk index of the named entity.
In one embodiment, when multiple named entities are recognized in the web page document, the server may obtain the risk index corresponding to each named entity and determine the risk detection result based on the highest risk index.
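Taking the highest risk index across several entities might look like the sketch below. The threshold of 70 and the result labels are assumptions chosen for illustration; the text does not specify how a risk index is mapped to a result.

```python
def detection_result(risk_indices, high_risk_threshold=70.0):
    """Pick the riskiest entity on a page and turn its index into a label.

    risk_indices maps each recognized named entity to its risk index (0 to 100).
    The threshold and the label strings are hypothetical.
    """
    if not risk_indices:
        return "no named entity recognized", None
    worst_entity = max(risk_indices, key=risk_indices.get)
    worst = risk_indices[worst_entity]
    if worst >= high_risk_threshold:
        return "high-risk web page", worst_entity
    return "no significant risk", worst_entity
```

Returning the entity along with the label lets the caller explain which name triggered the result.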
In one embodiment, the server may update the risk index of each named entity in the named entity library at preset intervals, using steps S602 and S604 for the update. In this way, when a named entity recognized by the server already exists in the named entity library, the risk detection result can be determined directly from that entity's current risk index. For example, the server may update the risk index of every named entity recorded in the library once a week or once a day.
In this embodiment, the risk index is determined from the negative-sentiment document ratio, counted in the document library, that is associated with the recognized named entity, so that the web page access risk detection result for any web page containing that named entity can be determined from the risk index.
In one embodiment, step S602 specifically further includes: when the recognized named entity does not exist in the named entity library, initializing the first quantity of web page documents that include the named entity and the second quantity of web page documents that include both the named entity and negative-sentiment content; traversing the web page documents in the document library; when a traversed document includes the named entity, incrementing the first quantity by one; and when a traversed document includes both the named entity and negative-sentiment content, incrementing the second quantity by one.
In one embodiment, when the recognized named entity does not exist in the named entity library, the server may add a record for the recognized named entity to the library and initialize the entity's risk index to a default value. The default value may be, for example, 50.
In one embodiment, when the recognized named entity does not exist in the named entity library, the server may initialize to zero the first quantity of web page documents in the document library that include the recognized named entity, and initialize to zero the second quantity of web page documents in the document library that include both the recognized named entity and negative-sentiment content.
In one embodiment, after initializing the first quantity and the second quantity for the recognized named entity and setting its risk index to the default value, the server may traverse the web page documents stored in the document library. When the document currently traversed includes the recognized named entity, the first quantity is incremented by one; when it includes both the recognized named entity and negative-sentiment content, the second quantity is incremented by one. When the traversal ends, the server may update the risk index of the named entity according to the first and second quantities at that point, and replace the default value recorded in the named entity library with the updated risk index.
In one embodiment, when none of the web page documents traversed in the document library includes the recognized named entity, the risk index of that entity in the named entity library may be left un-updated for the time being. That is, the risk index stays at the default value, indicating that the risk of the entity is undetermined and awaits further analysis.
In this embodiment, when the recognized named entity does not exist in the named entity library, its risk index may be set to a default value, or may be determined from the web page documents stored in the document library.
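The handling of a previously unseen named entity can be sketched as below. The dictionary-based record layout and the plain substring match are simplifications assumed for this example, and the final update uses the simple ratio scaled to 0 to 100 as one possible choice of risk index.

```python
DEFAULT_RISK_INDEX = 50.0  # the example default value mentioned in the text


def register_new_entity(entity_library, documents, entity):
    """Add an unseen entity, count matching documents, then update its risk index.

    documents is an iterable of (text, is_negative) pairs; entity_library maps
    entity names to records. Both layouts are hypothetical.
    """
    record = {"risk_index": DEFAULT_RISK_INDEX, "first": 0, "second": 0}
    entity_library[entity] = record
    for text, is_negative in documents:
        if entity in text:
            record["first"] += 1
            if is_negative:
                record["second"] += 1
    # If no stored document mentions the entity, the default is kept,
    # marking the entity as undetermined and awaiting further analysis.
    if record["first"] > 0:
        record["risk_index"] = 100.0 * record["second"] / record["first"]
    return record
```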
In one embodiment, step S604 specifically further includes: obtaining the total number of web page documents in the document library; and determining the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web page documents, the risk index being positively correlated with both the first ratio and the second ratio.
Here, the first ratio represents the negative-sentiment document ratio of the entity, and the second ratio reflects the number of negative-sentiment documents for the entity relative to the size of the document library. The higher the first ratio, the higher the risk index; similarly, the higher the second ratio, the higher the risk index.
In one embodiment, the server may set a weighting coefficient for each of the first ratio and the second ratio, the two coefficients summing to one. The server may compute a weighted sum of the first ratio and the second ratio using these coefficients and determine the risk index from the weighted result.
In one embodiment, determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web page documents includes computing:
$$RI(e) = 100 \times \left[ w \cdot \frac{N(e)}{M(e)} + (1 - w) \cdot \frac{N(e)}{T} \right]$$
Here, $RI(e)$ is the risk index corresponding to the named entity $e$; $w$ and $1-w$ are non-negative constants; $M(e)$ is the number of web page documents in the document library that contain the named entity $e$; $N(e)$ is the number of web page documents in the document library that contain both the named entity $e$ and negative-sentiment content; and $T$ is the total number of web page documents in the document library.
As can be seen, $N(e)/M(e)$ is the first ratio and $N(e)/T$ is the second ratio. The weighted sum of the first ratio and the second ratio under their respective weighting coefficients ranges from 0 to 1; to express the risk index more conveniently, the result is multiplied by the constant 100 so that $RI(e)$ ranges from 0 to 100.
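To make the weighted combination concrete, the sketch below reads the definitions above as RI(e) = 100 · [w · N(e)/M(e) + (1 − w) · N(e)/T]. Pairing w with the first ratio (rather than the second) is an assumption, as are the function name, the default w = 0.5, and the error handling.

```python
def risk_index(n_negative, m_with_entity, total_docs, w=0.5):
    """Weighted risk index on a 0 to 100 scale.

    n_negative    -> N(e): documents with the entity and negative content
    m_with_entity -> M(e): documents with the entity
    total_docs    -> T:    all documents in the library
    w             -> weight of the first ratio; (1 - w) weights the second
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1] so that both weights are non-negative")
    if m_with_entity == 0 or total_docs == 0:
        raise ValueError("M(e) and T must be positive for the ratios to be defined")
    first_ratio = n_negative / m_with_entity
    second_ratio = n_negative / total_docs
    return 100.0 * (w * first_ratio + (1.0 - w) * second_ratio)


# Illustrative numbers: 30 of the 40 documents mentioning the entity are
# negative, in a library of 1000 documents.
example = risk_index(30, 40, 1000, w=0.8)
```

Since N(e) ≤ M(e) ≤ T, both ratios lie in [0, 1], so the result stays within 0 to 100 for any valid w.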
In one embodiment, the server may compute the risk index of the recognized named entity using the current w, that is, the current weighting coefficients of the first ratio and the second ratio, determine the risk detection result from the risk index, and judge whether that result is accurate. The server records the correspondence between the current w and the accuracy rate and adjusts w according to that correspondence, so as to improve the accuracy of the risk detection result.
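One way to picture the adjustment of w is a small search over candidate values against pages whose true risk is already known. Everything specific here is assumed for illustration: the labelled sample format `(N, M, T, is_risky)`, the decision threshold of 50, the candidate grid, and pairing w with the first ratio. The text does not say how accuracy is judged or how w is adjusted.

```python
def accuracy_for_weight(samples, w, threshold=50.0):
    """Fraction of labelled samples whose thresholded risk index matches the label."""
    correct = 0
    for n, m, t, is_risky in samples:
        ri = 100.0 * (w * n / m + (1.0 - w) * n / t)
        if (ri >= threshold) == is_risky:
            correct += 1
    return correct / len(samples)


def pick_weight(samples, candidates=(0.0, 0.25, 0.5, 0.75, 1.0), threshold=50.0):
    """Record the accuracy for each candidate w and return the best one."""
    records = {w: accuracy_for_weight(samples, w, threshold) for w in candidates}
    best = max(records, key=records.get)
    return best, records
```

The `records` dictionary plays the role of the stored correspondence between w and accuracy; a real system would presumably refresh it as new judgments arrive.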
In this embodiment, the risk index of a named entity depends not only on the negative-sentiment document ratio of that entity but also on the number of negative-sentiment documents in the document library that contain it; a risk index computed with both taken into account is more accurate.
In one embodiment, the web page access risk detection method further includes: when the named entity exists in the named entity library, querying the mark type of the named entity in the library. When the queried mark type is non-manual, step S208 specifically includes: updating the risk index of the named entity according to the number of web page documents in the document library that include the named entity and negative-sentiment content, and determining the risk detection result according to the updated risk index. When the queried mark type is manual, the preconfigured risk index of the named entity is queried directly, and the risk detection result is determined according to the queried risk index.
Here, the mark type indicates how the risk index of each named entity in the named entity library was obtained. The manual mark type means that the risk index of the named entity was confirmed manually and entered into the library. The non-manual mark type means that the risk index was entered into the library after being calculated automatically by the server.
In one embodiment, each named entity in the named entity library further has a mark type field, which distinguishes whether the entity's risk index was confirmed and entered manually or calculated automatically by the server.
In general, a manually marked risk index is more accurate than a result calculated automatically by the server, so the manual mark type takes priority over automatic calculation. That is, a risk index whose mark type is manual is updated only by manual modification. A risk index whose mark type is non-manual is less accurate than a manually confirmed one, so it needs to be updated through automatic calculation by the server. It may, of course, also be updated by manual modification, in which case the mark type of that entity's risk index must be changed to manual accordingly.
In one embodiment, when the recognized named entity does not exist in the named entity library, the server adds the named entity, together with a risk index set to the default value, to the library, and may also set the entity's mark type to non-manual.
In this embodiment, a mark type is set for each named entity in the named entity library, so that both manually confirmed results and results calculated automatically by the server are taken into account, which further improves the accuracy of each named entity's risk index.
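The priority of manually confirmed values over automatically computed ones can be sketched with a small record type. The names `MarkType`, `EntityRecord`, `current_risk_index`, and `manual_override` are hypothetical, and `recompute` stands in for whatever automatic calculation the server uses.

```python
from dataclasses import dataclass
from enum import Enum


class MarkType(Enum):
    MANUAL = "manual"        # confirmed and entered by a person
    AUTOMATIC = "automatic"  # calculated by the server


@dataclass
class EntityRecord:
    risk_index: float
    mark_type: MarkType


def current_risk_index(record, recompute):
    """Return the index to use now; only automatic records are recomputed."""
    if record.mark_type is MarkType.MANUAL:
        return record.risk_index      # preconfigured value is used as-is
    record.risk_index = recompute()   # refresh from the document library
    return record.risk_index


def manual_override(record, value):
    """A manual edit also switches the record to the manual mark type."""
    record.risk_index = value
    record.mark_type = MarkType.MANUAL
```

Once `manual_override` has been applied, later calls to `current_risk_index` leave the value untouched, which matches the rule that manual entries change only through manual modification.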
As shown in Fig. 7, a schematic flowchart of detecting the access risk of an accessed web page reported by a terminal in one embodiment. When the flow starts and a user accesses a web page, the terminal reports the web page address of the accessed page to the server, which performs web page access risk detection. After receiving the web page address, the server first judges whether the address has already been crawled. If so, it ends this round of detection and directly returns the stored risk detection result corresponding to the address. If not, it sends the address to a crawler module, which downloads the corresponding web page document; an analysis module then extracts and analyzes the web page text content of the document and recognizes all named entities it contains. The analysis module judges whether the recognized named entities all exist in the named entity library. If so, it ends this round of detection and directly returns the stored risk detection result corresponding to the risk indices of those entities. If not, the recognized named entity is added to the named entity library, and two counters are initialized to zero for the new entity: M, the number of web page documents in the document library that include the entity, and N, the number of those M documents that also include negative-sentiment content. The web page documents in the document library are then traversed: whenever a document includes the recognized named entity, M is updated, and if that document also includes negative-sentiment content, N is updated. Once no further documents including the entity remain in the document library, the risk index of the entity is calculated from the final M and N and updated in the named entity library; after the risk detection result is determined from the risk index, the server feeds it back to the terminal.
As shown in Fig. 8, in a specific embodiment, the web page access risk detection method specifically includes the following steps:
S802, receive the web page address of the accessed web page reported by a terminal.
S804, when a crawl record corresponding to the web page address exists, query the risk index of the named entity included in the web page document corresponding to the web page address, and determine the risk detection result according to the queried risk index.
S806, when no crawl record corresponding to the web page address exists, crawl the web page document corresponding to the web page address and store the crawled document in the document library.
S808, obtain the word vector corresponding to each word in the web page document.
S810, input the word vector of each word into a pre-trained named entity recognition model, which outputs the entity type corresponding to each word in the web page text content.
S812, take the words that belong to a preset entity type as the named entities recognized from the web page document.
S814, when the recognized named entity exists in the named entity library, count the first quantity of web page documents in the document library that include the named entity, and the second quantity of web page documents that include both the named entity and negative-sentiment content.
S816, obtain the total number of web page documents in the document library.
S818, determine the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web page documents; determine the risk detection result based on the risk index of the named entity.
S820, when the recognized named entity does not exist in the named entity library, initialize the first quantity of web page documents that include the named entity, and the second quantity of web page documents that include both the named entity and negative-sentiment content.
S822, traverse the web page documents in the document library.
S824, when a traversed web page document includes the named entity, increment the first quantity by one.
S826, when a traversed web page document includes both the named entity and negative-sentiment content, increment the second quantity by one.
S828, obtain the total number of web page documents in the document library.
S830, determine the risk index of the named entity according to the first ratio of the updated second quantity to the updated first quantity and the second ratio of the second quantity to the total number of web page documents; determine the risk detection result based on the risk index of the named entity.
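The steps S802 to S830 can be strung together roughly as in the sketch below. It is a simplification under several assumptions: the in-memory `state` dictionaries, the callables `crawl`, `recognize_entities` (a stand-in for the word-vector model of S808 to S812), and `is_negative`, the threshold of 60, the result labels, and the pairing of w with the first ratio are all hypothetical. The two counting branches (S814 and S820 to S826) are merged because they produce the same counts.

```python
DEFAULT_RISK_INDEX = 50.0  # assumed placeholder when no document mentions the entity


def count(documents, entity):
    """Return (first, second) quantities over the stored (text, negative) pairs."""
    first = second = 0
    for text, negative in documents.values():
        if entity in text:
            first += 1
            second += int(negative)
    return first, second


def detect(url, state, crawl, recognize_entities, is_negative, w=0.5, threshold=60.0):
    docs = state["documents"]            # url -> (text, negative flag)
    indices = state["risk_indices"]      # entity -> risk index
    page_entities = state["page_entities"]  # url -> recognized entities
    if url not in docs:                  # S806: no crawl record yet
        text = crawl(url)
        docs[url] = (text, is_negative(text))
        page_entities[url] = recognize_entities(text)   # S808 to S812
        total = len(docs)                               # S816 / S828
        for entity in page_entities[url]:
            first, second = count(docs, entity)         # S814 / S820 to S826
            if first == 0:
                indices[entity] = DEFAULT_RISK_INDEX
            else:                                       # S818 / S830
                indices[entity] = 100.0 * (
                    w * second / first + (1.0 - w) * second / total
                )
    # S804: with a crawl record, the stored indices are looked up directly.
    worst = max((indices[e] for e in page_entities[url]), default=0.0)
    return "high-risk web page" if worst >= threshold else "no risk detected"
```

Note that a repeated report of the same address skips crawling and recognition entirely and uses whatever risk index is currently stored for the page's entities.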
In the web page access risk detection method described above, the server obtains the corresponding web page document from the web page address of the accessed web page that a terminal reports. Through the web page addresses historically reported by the terminals, a massive number of web page documents can be gathered, which is more accurate and efficient than manually collecting web page documents that spread various kinds of information. Thus, once the named entity in the web page document corresponding to the page the terminal is currently accessing has been recognized, the risk detection result for that page can be determined from the number of web page documents in the document library that include both the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed page according to the fed-back result, which improves the efficiency of risk detection across the large number of web pages that publish all kinds of information on the Internet.
Fig. 8 is a schematic flowchart of the web page access risk detection method in one embodiment. It should be understood that although the steps in the flowchart of Fig. 8 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 8 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in Fig. 9, in one embodiment, a web page access risk prompting method is provided. This embodiment is mainly illustrated with the method applied to the terminal 110 in Fig. 1 above. Referring to Fig. 9, the web page access risk prompting method specifically includes the following steps:
S902, obtain the web page address of the accessed web page.
S904, report the web page address to the server.
Specifically, after the user triggers an instruction to open a web page in the browser, the terminal may report the corresponding web page address to the server, and the server receives the address reported by the terminal. The terminal may also report the address of the page being browsed while the user is browsing it, and the server receives that address.
S906, receive the risk detection result fed back by the server; the risk detection result is determined according to the number of web page documents in the document library that include the named entity and negative-sentiment content, the named entity being present in the web page document corresponding to the web page address; the document library includes the web page documents corresponding to the web page addresses historically reported by the terminals.
Here, the risk detection result is the result of performing risk detection on the currently reported accessed web page. In one embodiment, when the web page document corresponding to the page accessed by the terminal includes multiple named entities, the server may determine the risk detection result from the largest of the risk indices, each of which is determined from the number of web page documents including the respective named entity and negative-sentiment content. The server may also determine the risk detection result from a weighted sum of the risk indices of these named entities.
The document library is a database that stores the web page documents corresponding to the web page addresses historically reported by the terminals. After obtaining a web page address, the server may query whether the address has been crawled; only when no crawl record corresponding to the address exists does it crawl the corresponding web page document and store the crawled document in the document library. This avoids repeatedly downloading the web page document of the same page when different terminals access it.
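The crawl-record check that prevents repeated downloads can be pictured as follows. The set of crawled addresses, the dictionary used as the document library, and the `crawl` callable are placeholders assumed for this sketch.

```python
def fetch_once(url, crawl_records, document_library, crawl):
    """Crawl and store a page only if its address has no crawl record.

    crawl_records is a set of addresses already crawled, document_library maps
    addresses to stored documents, and crawl(url) downloads a page. All three
    are hypothetical stand-ins. Returns True when a download actually happened.
    """
    if url in crawl_records:
        return False  # another terminal already reported this address
    document_library[url] = crawl(url)
    crawl_records.add(url)
    return True
```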
A named entity is a person name, place name, organization name, or other entity identified by a name in the web page text content of a web page document.
S908, when the risk detection result indicates that a risk exists, display a risk prompt according to the risk detection result.
For example, when the risk detection result fed back by the server is "high-risk web page" or "malicious web page", the terminal may pop up a corresponding prompt box on the page being browsed to warn the user of the risk in that page.
In one embodiment, when the risk detection result received by the terminal indicates that no risk exists or that the risk is low, no risk prompt needs to be displayed.
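On the terminal side, step S908 amounts to mapping the fed-back result to a prompt, or to nothing. In the sketch below the result strings follow the two examples given in the text, while the prompt wording and the function name are made up for illustration.

```python
# Prompt text per risk detection result; the wording is hypothetical.
PROMPTS = {
    "high-risk web page": "Warning: this page may be high risk.",
    "malicious web page": "Warning: this page has been flagged as malicious.",
}


def risk_prompt(detection_result):
    """Return the prompt to display, or None when no prompt should be shown."""
    return PROMPTS.get(detection_result)
```

Any result not listed, such as a low-risk or no-risk verdict, yields `None`, so the terminal displays nothing.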
In one embodiment, the risk detection result is determined according to the risk index of the named entity. The steps for calculating the risk index include: counting the first quantity of web page documents in the document library that include the named entity, and the second quantity of web page documents that include both the named entity and negative-sentiment content; obtaining the total number of web page documents in the document library; and determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web page documents, the risk index being positively correlated with both the first ratio and the second ratio.
In the web page access risk prompting method described above, after the web page address of the page accessed by the user is obtained, the address is reported to the server, and the server can gather a massive number of web page documents through the historically reported addresses, which is more accurate and efficient than manually collecting web page documents that spread various kinds of information. The risk detection result for the page accessed by the terminal can thus be determined from the number of web page documents in the document library that include the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed page according to the fed-back result, which improves the efficiency of risk detection across the large number of web pages that publish all kinds of information on the Internet.
As shown in Fig. 10, in one embodiment, a web page access risk detection apparatus 1000 is provided. The apparatus specifically includes a web page address receiving module 1002, a web page document acquisition module 1004, a named entity recognition module 1006, a determining module 1008, and a feedback module 1010, wherein:
The web page address receiving module 1002 is configured to receive the web page address of the accessed web page reported by a terminal.
The web page document acquisition module 1004 is configured to obtain the web page document corresponding to the web page address.
The named entity recognition module 1006 is configured to recognize the named entity in the web page document.
The determining module 1008 is configured to determine the risk detection result according to the number of web page documents in the document library that include the named entity and negative-sentiment content; the document library includes the web page documents corresponding to the web page addresses historically reported by the terminals.
The feedback module 1010 is configured to feed the risk detection result back to the terminal corresponding to the currently received web page address.
In one embodiment, the web page document acquisition module 1004 is further configured to, when no crawl record corresponding to the web page address exists, crawl the web page document corresponding to the web page address and store the crawled document in the document library.
In one embodiment, the web page access risk detection apparatus 1000 further includes a risk index query module, which is configured to, when a crawl record corresponding to the web page address exists, query the risk index of the named entity included in the web page document corresponding to the web page address, and determine the risk detection result according to the queried risk index.
In one embodiment, the risk index query module is further configured to, when the recognized named entity exists in the named entity library, directly query the risk index of the recognized named entity and determine the risk detection result according to the queried risk index.
In one embodiment, the determining module 1008 is further configured to count the first quantity of web page documents in the document library that include the named entity, and the second quantity of web page documents that include both the named entity and negative-sentiment content; determine the risk index of the named entity according to the ratio of the second quantity to the first quantity; and determine the risk detection result based on the risk index of the named entity.
In one embodiment, the determining module 1008 is further configured to, when the recognized named entity does not exist in the named entity library, initialize the first quantity of web page documents that include the named entity and the second quantity of web page documents that include both the named entity and negative-sentiment content; traverse the web page documents in the document library; when a traversed document includes the named entity, increment the first quantity by one; and when a traversed document includes both the named entity and negative-sentiment content, increment the second quantity by one.
In one embodiment, the determining module 1008 is further configured to obtain the total number of web page documents in the document library, and determine the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web page documents, the risk index being positively correlated with both the first ratio and the second ratio.
In one embodiment, the determining module 1008 is further configured to determine the risk index of the named entity according to the following formula:
$$RI(e) = 100 \times \left[ w \cdot \frac{N(e)}{M(e)} + (1 - w) \cdot \frac{N(e)}{T} \right]$$
Here, $RI(e)$ is the risk index corresponding to the named entity $e$; $w$ and $1-w$ are non-negative constants; $M(e)$ is the number of web page documents in the document library that contain the named entity $e$; $N(e)$ is the number of web page documents in the document library that contain both the named entity $e$ and negative-sentiment content; and $T$ is the total number of web page documents in the document library.
In one embodiment, the named entity recognition module 1006 is further configured to obtain the word vector corresponding to each word in the web page document; input the word vector of each word into a pre-trained named entity recognition model, which outputs the entity type corresponding to each word in the web page text content; and take the words that belong to a preset entity type as the named entities recognized from the web page document.
In one embodiment, the web page access risk detection apparatus 1000 further includes a mark type query module, which is configured to, when the named entity exists in the named entity library, query the mark type of the named entity in the library. When the queried mark type is non-manual, determining the risk detection result according to the number of web page documents in the document library that include the named entity and negative-sentiment content includes: updating the risk index of the named entity according to that number, and determining the risk detection result according to the updated risk index. When the queried mark type is manual, the preconfigured risk index of the named entity is queried directly, and the risk detection result is determined according to the queried risk index.
The web page access risk detection apparatus 1000 described above obtains the corresponding web page document from the web page address of the accessed web page that a terminal reports. Through the web page addresses historically reported by the terminals, a massive number of web page documents can be gathered, which is more accurate and efficient than manually collecting web page documents that spread various kinds of information. Thus, once the named entity in the web page document corresponding to the page the terminal is currently accessing has been recognized, the risk detection result for that page can be determined from the number of web page documents in the document library that include both the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed page according to the fed-back result, which improves the efficiency of risk detection across the large number of web pages that publish all kinds of information on the Internet.
As shown in Fig. 11, in one embodiment, a web page access risk prompting apparatus 1100 is provided. The apparatus specifically includes a web page address acquisition module 1102, a web page address reporting module 1104, a receiving module 1106, and a prompting module 1108, wherein:
The web page address acquisition module 1102 is configured to obtain the web page address of the accessed web page.
The web page address reporting module 1104 is configured to report the web page address to the server.
The receiving module 1106 is configured to receive the risk detection result fed back by the server; the risk detection result is determined according to the number of web page documents in the document library that include the named entity and negative-sentiment content, the named entity being present in the web page document corresponding to the web page address; the document library includes the web page documents corresponding to the web page addresses historically reported by the terminals.
The prompting module 1108 is configured to display a risk prompt according to the risk detection result when the risk detection result indicates that a risk exists.
In one embodiment, the risk detection result is determined according to the risk index of the named entity. The steps for calculating the risk index include: counting the first quantity of web page documents in the document library that include the named entity, and the second quantity of web page documents that include both the named entity and negative-sentiment content; obtaining the total number of web page documents in the document library; and determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web page documents, the risk index being positively correlated with both the first ratio and the second ratio.
With the web page access risk prompting apparatus 1100 described above, after the web page address of the page accessed by the user is obtained, the address is reported to the server, and the server can gather a massive number of web page documents through the historically reported addresses, which is more accurate and efficient than manually collecting web page documents that spread various kinds of information. The risk detection result for the page accessed by the terminal can thus be determined from the number of web page documents in the document library that include the named entity and negative-sentiment content. The terminal can then display a corresponding risk prompt on the accessed page according to the fed-back result, which improves the efficiency of risk detection across the large number of web pages that publish all kinds of information on the Internet.
Fig. 12 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be the server 120 in Fig. 1. As shown in Fig. 12, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the web page access risk detection method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the web page access risk detection method.
Fig. 13 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be the terminal 110 in Fig. 1. As shown in Fig. 13, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the web page access risk prompting method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the web page access risk prompting method. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structures shown in Fig. 12 and Fig. 13 are only block diagrams of the partial structures relevant to the solution of this application and do not limit the computer devices to which the solution is applied. A specific computer device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
In one embodiment, the web page access risk detection apparatus 1000 provided by this application may be implemented in the form of a computer program, and the computer program can run on the computer device shown in Figure 12. The memory of the computer device may store the program modules that make up the web page access risk detection apparatus, for example the web page address receiving module 1002, web document acquisition module 1004, named entity recognition module 1006, determining module 1008, and feedback module 1010 shown in Figure 10. The computer program formed by these program modules causes the processor to perform the steps of the web page access risk detection method of the embodiments of this application described in this specification.
For example, the computer device shown in Figure 12 may perform step S202 through the web page address receiving module 1002 of the web page access risk detection apparatus shown in Figure 10. The computer device may perform step S204 through the web document acquisition module 1004, step S206 through the named entity recognition module 1006, step S208 through the determining module 1008, and step S210 through the feedback module 1010.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: receiving the web page address of an accessed web page reported by a terminal; obtaining the web document corresponding to the web page address; identifying the named entity in the web document; determining a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, where the document library includes the web documents corresponding to the web page addresses historically reported by the terminals; and feeding back the risk detection result to the terminal corresponding to the currently received web page address.
In one embodiment, when the computer program is executed by the processor to perform the step of obtaining the web document corresponding to the web page address, the processor specifically performs the following steps: when no crawl record corresponding to the web page address exists, crawling the web document corresponding to the web page address and storing the crawled web document in the document library. The method further includes: when a crawl record corresponding to the web page address exists, querying the risk index of the named entity included in the web document corresponding to the web page address, and determining the risk detection result according to the queried risk index.
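The crawl-record check described above can be sketched as follows. This is only a minimal illustration, not the patented implementation: the names obtain_document, crawl_records, document_library, and fetch are hypothetical, and the crawl record is assumed to be a simple in-memory set of already crawled addresses.

```python
from typing import Callable, Dict, Set


def obtain_document(url: str,
                    crawl_records: Set[str],
                    document_library: Dict[str, str],
                    fetch: Callable[[str], str]) -> bool:
    """Crawl and store the document only when `url` has no crawl record.

    Returns True when a crawl happened. Returns False when a crawl record
    already existed, in which case the caller can look up the stored risk
    index of the document's named entities instead of crawling again.
    """
    if url in crawl_records:
        return False
    # `fetch` stands in for whatever crawler is used (an assumption).
    document_library[url] = fetch(url)
    crawl_records.add(url)
    return True
```

Passing the crawler in as a function keeps the sketch independent of any particular HTTP library.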
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: when the recognized named entity is present in the named entity library, directly querying the risk index of the recognized named entity; and determining the risk detection result according to the queried risk index.
In one embodiment, when the computer program is executed by the processor to perform the step of determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content, the processor specifically performs the following steps: counting a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include both the named entity and negative-sentiment content; determining the risk index of the named entity according to the ratio of the second quantity to the first quantity; and determining the risk detection result based on the risk index of the named entity.
In one embodiment, when the computer program is executed by the processor to perform the step of counting the first quantity of web documents in the document library that include the named entity and the second quantity of web documents that include both the named entity and negative-sentiment content, the processor specifically performs the following steps: when the recognized named entity is not present in the named entity library, initializing the first quantity and the second quantity; traversing the web documents in the document library; incrementing the first quantity by one when a traversed web document includes the named entity; and incrementing the second quantity by one when a traversed web document includes the named entity and negative-sentiment content.
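The traversal and counting steps above can be sketched roughly as below. The function name count_documents and the has_negative_content callback are hypothetical, and plain substring matching is assumed as a stand-in for "the web document includes the named entity".

```python
from typing import Callable, Iterable, Tuple


def count_documents(entity: str,
                    documents: Iterable[str],
                    has_negative_content: Callable[[str, str], bool]
                    ) -> Tuple[int, int]:
    """Return (first_quantity, second_quantity) for one named entity."""
    first_quantity = 0   # documents that include the entity
    second_quantity = 0  # documents that include the entity and negative content
    for doc in documents:
        if entity in doc:  # substring match is an assumption of this sketch
            first_quantity += 1
            if has_negative_content(doc, entity):
                second_quantity += 1
    return first_quantity, second_quantity
```

By construction the second quantity never exceeds the first, so the ratio of the two stays between 0 and 1.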
In one embodiment, when the computer program is executed by the processor to perform the step of determining the risk index of the named entity according to the ratio of the second quantity to the first quantity, the processor specifically performs the following steps: obtaining the total number of web documents in the document library; and determining the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents. The risk index is positively correlated with both the first ratio and the second ratio.
In one embodiment, determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web documents includes:
$$RI(e) = w \cdot \frac{N(e)}{M(e)} + (1 - w) \cdot \frac{N(e)}{T}$$
where RI(e) is the risk index corresponding to named entity e; w and 1 - w are nonnegative constants; M(e) is the number of web documents in the document library that contain named entity e; N(e) is the number of web documents in the document library that contain both named entity e and negative-sentiment content; and T is the total number of web documents in the document library.
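As a worked illustration, the risk index can be sketched as a weighted sum of the two ratios. The weighted-sum form is an assumption here, chosen because it matches the symbol definitions (w and 1 - w as nonnegative weights, and positive correlation with both ratios). The function name risk_index, the default w = 0.5, and the zero-denominator handling are likewise assumptions.

```python
def risk_index(n_e: int, m_e: int, total: int, w: float = 0.5) -> float:
    """Weighted combination of N(e)/M(e) and N(e)/T (assumed form).

    n_e:   N(e), documents containing entity e and negative content
    m_e:   M(e), documents containing entity e
    total: T, total number of documents in the library
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w and 1 - w must both be nonnegative")
    if m_e == 0 or total == 0:
        # No evidence about the entity: treated as zero risk (an assumption).
        return 0.0
    return w * n_e / m_e + (1.0 - w) * n_e / total
```

For example, with N(e) = 1, M(e) = 2, T = 4, and w = 0.5, the first ratio is 0.5 and the second is 0.25, giving 0.5 * 0.5 + 0.5 * 0.25 = 0.375.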
In one embodiment, when the computer program is executed by the processor to perform the step of identifying the named entity in the web document, the processor specifically performs the following steps: obtaining the word vector corresponding to each word in the web document; inputting the word vector of each word into a pre-trained named entity recognition model, which outputs the entity type corresponding to each word in the web page text content; and taking the words that belong to a preset entity type as the named entities recognized from the web document.
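The three recognition steps can be sketched as a small pipeline. The word-vector lookup and the pre-trained model are not specified in this passage, so both are represented by hypothetical callables (embed and tag) supplied by the caller; only the final filtering by preset entity type is shown concretely.

```python
from typing import Callable, List, Sequence, Set


def extract_named_entities(words: Sequence[str],
                           embed: Callable[[str], List[float]],
                           tag: Callable[[List[List[float]]], List[str]],
                           preset_types: Set[str]) -> List[str]:
    """Keep the words whose predicted entity type is in `preset_types`."""
    vectors = [embed(word) for word in words]  # step 1: word vectors
    entity_types = tag(vectors)                # step 2: model predicts one type per word
    # step 3: filter by the preset entity types
    return [w for w, t in zip(words, entity_types) if t in preset_types]
```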
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: dividing the web page text content of the web document into multiple sentences; selecting, from the multiple sentences, the sentences that contain the named entity; and when a preset negative keyword is present in a sentence containing the named entity, determining that the web document includes negative-sentiment content.
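A minimal sketch of this sentence-level check follows. The sentence splitter is a simple punctuation-based regular expression, which is an assumption (it would also split on the dots in URLs or decimals), and the function name and keyword list are hypothetical.

```python
import re
from typing import Iterable

# Split on Western and Chinese sentence-ending punctuation (an assumption).
_SENTENCE_BREAK = re.compile(r"[.!?。！？]+")


def has_negative_content(text: str,
                         entity: str,
                         negative_keywords: Iterable[str]) -> bool:
    """True if some sentence contains both the entity and a negative keyword."""
    keywords = list(negative_keywords)
    sentences = [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]
    for sentence in sentences:
        if entity in sentence and any(k in sentence for k in keywords):
            return True
    return False
```

Requiring the keyword and the entity to share a sentence avoids flagging a page where the negative wording refers to something else.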
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: when the named entity is present in the named entity library, querying the mark type of the named entity in the named entity library. When a non-manual mark type is found, determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content includes: updating the risk index of the named entity according to that number, and determining the risk detection result according to the updated risk index. When a manual mark type is found, the preconfigured risk index of the named entity is queried directly, and the risk detection result is determined according to the queried risk index.
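The branching on mark type can be sketched as below. The EntityRecord structure, the "manual" and "automatic" labels, and the recompute callback are hypothetical. Storing a newly computed entity as an automatic record is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict

MANUAL = "manual"        # hypothetical label for a manually marked entity
AUTOMATIC = "automatic"  # hypothetical label for a non-manual mark type


@dataclass
class EntityRecord:
    mark_type: str
    risk_index: float


def resolve_risk_index(entity: str,
                       entity_library: Dict[str, EntityRecord],
                       recompute: Callable[[str], float]) -> float:
    """Return the risk index to use for `entity`, updating the library."""
    record = entity_library.get(entity)
    if record is None:
        # Unknown entity: compute from document counts and store it.
        record = EntityRecord(AUTOMATIC, recompute(entity))
        entity_library[entity] = record
    elif record.mark_type != MANUAL:
        # Non-manual mark: refresh the index from current document counts.
        record.risk_index = recompute(entity)
    # Manual mark: the preconfigured index is returned unchanged.
    return record.risk_index
```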
The above computer device obtains the corresponding web document using the web page address of the accessed web page reported by the terminal, and can obtain a massive number of web documents through the web page addresses historically reported by the terminals, which is more accurate and efficient than manually collecting web documents that propagate various kinds of information. After the named entity in the web document corresponding to the web page currently accessed by the terminal is recognized, the risk detection result for that web page can be determined based on the number of web documents in the document library that include the named entity and negative-sentiment content, and the terminal can display a corresponding risk prompt in the accessed web page according to the fed-back risk detection result. This improves the efficiency of risk detection for the large number of web pages that publish various kinds of information on the Internet.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program causes the processor to perform the following steps: receiving the web page address of an accessed web page reported by a terminal; obtaining the web document corresponding to the web page address; identifying the named entity in the web document; determining a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, where the document library includes the web documents corresponding to the web page addresses historically reported by the terminals; and feeding back the risk detection result to the terminal corresponding to the currently received web page address.
In one embodiment, when the computer program is executed by the processor to perform the step of obtaining the web document corresponding to the web page address, the processor specifically performs the following steps: when no crawl record corresponding to the web page address exists, crawling the web document corresponding to the web page address and storing the crawled web document in the document library. The method further includes: when a crawl record corresponding to the web page address exists, querying the risk index of the named entity included in the web document corresponding to the web page address, and determining the risk detection result according to the queried risk index.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: when the recognized named entity is present in the named entity library, directly querying the risk index of the recognized named entity; and determining the risk detection result according to the queried risk index.
In one embodiment, when the computer program is executed by the processor to perform the step of determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content, the processor specifically performs the following steps: counting a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include both the named entity and negative-sentiment content; determining the risk index of the named entity according to the ratio of the second quantity to the first quantity; and determining the risk detection result based on the risk index of the named entity.
In one embodiment, when the computer program is executed by the processor to perform the step of counting the first quantity of web documents in the document library that include the named entity and the second quantity of web documents that include both the named entity and negative-sentiment content, the processor specifically performs the following steps: when the recognized named entity is not present in the named entity library, initializing the first quantity and the second quantity; traversing the web documents in the document library; incrementing the first quantity by one when a traversed web document includes the named entity; and incrementing the second quantity by one when a traversed web document includes the named entity and negative-sentiment content.
In one embodiment, when the computer program is executed by the processor to perform the step of determining the risk index of the named entity according to the ratio of the second quantity to the first quantity, the processor specifically performs the following steps: obtaining the total number of web documents in the document library; and determining the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents. The risk index is positively correlated with both the first ratio and the second ratio.
In one embodiment, determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web documents includes:
$$RI(e) = w \cdot \frac{N(e)}{M(e)} + (1 - w) \cdot \frac{N(e)}{T}$$
where RI(e) is the risk index corresponding to named entity e; w and 1 - w are nonnegative constants; M(e) is the number of web documents in the document library that contain named entity e; N(e) is the number of web documents in the document library that contain both named entity e and negative-sentiment content; and T is the total number of web documents in the document library.
In one embodiment, when the computer program is executed by the processor to perform the step of identifying the named entity in the web document, the processor specifically performs the following steps: obtaining the word vector corresponding to each word in the web document; inputting the word vector of each word into a pre-trained named entity recognition model, which outputs the entity type corresponding to each word in the web page text content; and taking the words that belong to a preset entity type as the named entities recognized from the web document.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: dividing the web page text content of the web document into multiple sentences; selecting, from the multiple sentences, the sentences that contain the named entity; and when a preset negative keyword is present in a sentence containing the named entity, determining that the web document includes negative-sentiment content.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: when the named entity is present in the named entity library, querying the mark type of the named entity in the named entity library. When a non-manual mark type is found, determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content includes: updating the risk index of the named entity according to that number, and determining the risk detection result according to the updated risk index. When a manual mark type is found, the preconfigured risk index of the named entity is queried directly, and the risk detection result is determined according to the queried risk index.
With the above computer-readable storage medium, the corresponding web document is obtained using the web page address of the accessed web page reported by the terminal, and a massive number of web documents can be obtained through the web page addresses historically reported by the terminals, which is more accurate and efficient than manually collecting web documents that propagate various kinds of information. After the named entity in the web document corresponding to the web page currently accessed by the terminal is recognized, the risk detection result for that web page can be determined based on the number of web documents in the document library that include the named entity and negative-sentiment content, and the terminal can display a corresponding risk prompt in the accessed web page according to the fed-back risk detection result. This improves the efficiency of risk detection for the large number of web pages that publish various kinds of information on the Internet.
In one embodiment, the web page access risk prompting apparatus 1100 provided by this application may be implemented in the form of a computer program, and the computer program can run on the computer device shown in Figure 13. The memory of the computer device may store the program modules that make up the web page access risk prompting apparatus, for example the web page address acquisition module 1102, web page address reporting module 1104, receiving module 1106, and prompting module 1108 shown in Figure 11. The computer program formed by these program modules causes the processor to perform the steps of the web page access risk prompting method of the embodiments of this application described in this specification.
For example, the computer device shown in Figure 13 may perform step S902 through the web page address acquisition module 1102 of the web page access risk prompting apparatus shown in Figure 11. The computer device may perform step S904 through the web page address reporting module 1104, step S906 through the receiving module 1106, and step S908 through the prompting module 1108.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining the web page address of an accessed web page; reporting the web page address to a server; receiving the risk detection result fed back by the server, where the risk detection result is determined according to the number of web documents in a document library that include a named entity and negative-sentiment content, the named entity is present in the web document corresponding to the web page address, and the document library includes the web documents corresponding to the web page addresses historically reported by the terminals; and when the risk detection result indicates that a risk exists, displaying a risk prompt according to the risk detection result.
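The terminal-side flow above can be sketched as follows. The transport to the server and the display mechanism are not specified in this passage, so both are hypothetical callables (report and show_prompt). The result is assumed to be a dictionary with a boolean "risky" field, which is purely illustrative.

```python
from typing import Any, Callable, Dict


def prompt_for_page(url: str,
                    report: Callable[[str], Dict[str, Any]],
                    show_prompt: Callable[[Dict[str, Any]], None]) -> bool:
    """Report the accessed address and show a prompt if the server flags risk."""
    result = report(url)      # report the address, receive the detection result
    if result.get("risky"):   # "risky" is an assumed field name
        show_prompt(result)   # display the risk prompt in the accessed page
        return True
    return False
```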
In one embodiment, the risk detection result is determined according to the risk index of the named entity. The risk index is calculated as follows: count a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include both the named entity and negative-sentiment content; obtain the total number of web documents in the document library; and determine the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents. The risk index is positively correlated with both the first ratio and the second ratio.
After obtaining the web page address of the web page accessed by the user, the above computer device reports the web page address to the server. The server can obtain a massive number of web documents through the historically reported web page addresses, which is more accurate and efficient than manually collecting web documents that propagate various kinds of information. The risk detection result for the web page accessed by the terminal can then be determined based on the number of web documents in the document library that include the named entity and negative-sentiment content, and the terminal can display a corresponding risk prompt in the accessed web page according to the fed-back risk detection result. This improves the efficiency of risk detection for the large number of web pages that publish various kinds of information on the Internet.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program causes the processor to perform the following steps: obtaining the web page address of an accessed web page; reporting the web page address to a server; receiving the risk detection result fed back by the server, where the risk detection result is determined according to the number of web documents in a document library that include a named entity and negative-sentiment content, the named entity is present in the web document corresponding to the web page address, and the document library includes the web documents corresponding to the web page addresses historically reported by the terminals; and when the risk detection result indicates that a risk exists, displaying a risk prompt according to the risk detection result.
In one embodiment, the risk detection result is determined according to the risk index of the named entity. The risk index is calculated as follows: count a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include both the named entity and negative-sentiment content; obtain the total number of web documents in the document library; and determine the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents. The risk index is positively correlated with both the first ratio and the second ratio.
With the above computer-readable storage medium, after the web page address of the web page accessed by the user is obtained, the web page address is reported to the server. The server can obtain a massive number of web documents through the historically reported web page addresses, which is more accurate and efficient than manually collecting web documents that propagate various kinds of information. The risk detection result for the web page accessed by the terminal can then be determined based on the number of web documents in the document library that include the named entity and negative-sentiment content, and the terminal can display a corresponding risk prompt in the accessed web page according to the fed-back risk detection result. This improves the efficiency of risk detection for the large number of web pages that publish various kinds of information on the Internet.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not every possible combination of the technical features in the above embodiments is described. However, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (15)

1. A web page access risk detection method, comprising:
receiving the web page address of an accessed web page reported by a terminal;
obtaining the web document corresponding to the web page address;
identifying the named entity in the web document;
determining a risk detection result according to the number of web documents in a document library that include the named entity and negative-sentiment content, wherein the document library includes the web documents corresponding to the web page addresses historically reported by the terminals; and
feeding back the risk detection result to the terminal corresponding to the currently received web page address.
2. The method according to claim 1, wherein obtaining the web document corresponding to the web page address comprises:
when no crawl record corresponding to the web page address exists, crawling the web document corresponding to the web page address and storing the crawled web document in the document library;
and the method further comprises:
when a crawl record corresponding to the web page address exists, querying the risk index of the named entity included in the web document corresponding to the web page address, and determining the risk detection result according to the queried risk index.
3. The method according to claim 1, wherein the method further comprises:
when the recognized named entity is present in a named entity library,
directly querying the risk index of the recognized named entity; and
determining the risk detection result according to the queried risk index.
4. The method according to claim 1, wherein determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content comprises:
counting a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include both the named entity and negative-sentiment content;
determining the risk index of the named entity according to the ratio of the second quantity to the first quantity; and
determining the risk detection result based on the risk index of the named entity.
5. The method according to claim 4, wherein counting the first quantity of web documents in the document library that include the named entity and the second quantity of web documents that include both the named entity and negative-sentiment content comprises:
when the recognized named entity is not present in a named entity library,
initializing the first quantity of web documents that include the named entity and the second quantity of web documents that include both the named entity and negative-sentiment content;
traversing the web documents in the document library;
when a traversed web document includes the named entity, incrementing the first quantity by one; and
when a traversed web document includes the named entity and negative-sentiment content, incrementing the second quantity by one.
6. The method according to claim 4, wherein determining the risk index of the named entity according to the ratio of the second quantity to the first quantity comprises:
obtaining the total number of web documents in the document library; and
determining the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents, wherein the risk index is positively correlated with both the first ratio and the second ratio.
7. The method according to claim 6, wherein determining the risk index of the named entity according to the first ratio of the second quantity to the first quantity and the second ratio of the second quantity to the total number of web documents comprises:
$$RI(e) = w \cdot \frac{N(e)}{M(e)} + (1 - w) \cdot \frac{N(e)}{T}$$
wherein RI(e) is the risk index corresponding to named entity e; w and 1 - w are nonnegative constants; M(e) is the number of web documents in the document library that contain named entity e; N(e) is the number of web documents in the document library that contain both named entity e and negative-sentiment content; and T is the total number of web documents in the document library.
8. The method according to any one of claims 1 to 7, wherein identifying the named entity in the web document comprises:
obtaining the word vector corresponding to each word in the web document;
inputting the word vector of each word into a pre-trained named entity recognition model, and obtaining as output the entity type corresponding to each word in the web page text content; and
taking the words that belong to a preset entity type as the named entities recognized from the web document.
9. The method according to any one of claims 1 to 7, wherein the method further comprises:
dividing the web page text content of the web document into multiple sentences;
selecting, from the multiple sentences, the sentences that contain the named entity; and
when a preset negative keyword is present in a sentence containing the named entity, determining that the web document includes negative-sentiment content.
10. The method according to any one of claims 1 to 7, wherein the method further comprises:
when the named entity is present in a named entity library, querying the mark type of the named entity in the named entity library;
when a non-manual mark type is found, determining the risk detection result according to the number of web documents in the document library that include the named entity and negative-sentiment content comprises: updating the risk index of the named entity according to the number of web documents in the document library that include the named entity and negative-sentiment content, and determining the risk detection result according to the updated risk index; and
when a manual mark type is found, directly querying the preconfigured risk index of the named entity, and determining the risk detection result according to the queried risk index.
11. A web page access risk prompting method, comprising:
obtaining the web page address of an accessed web page;
reporting the web page address to a server;
receiving a risk detection result fed back by the server, wherein the risk detection result is determined according to the number of web documents in a document library that include a named entity and negative sentiment content, the named entity is present in the web document corresponding to the web page address, and the document library includes the web documents corresponding to the web page addresses historically reported by each terminal; and
when the risk detection result indicates that a risk exists, displaying a risk prompt according to the risk detection result.
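The terminal-side flow of claim 11 can be sketched as below. The transport and the result format are not specified in the claim, so the `report` callable and the result fields `risk_exists` and `prompt` are hypothetical, and the server is replaced by a local fake.

```python
from typing import Callable, Dict, List


def check_page_access(
    url: str,
    report: Callable[[str], Dict[str, object]],
    show_prompt: Callable[[str], None],
) -> bool:
    """Report the accessed URL and show a prompt if the server flags a risk."""
    # Report the web page address and receive the fed-back detection result.
    result = report(url)
    if result.get("risk_exists", False):
        # Display the risk prompt according to the detection result.
        show_prompt(str(result.get("prompt", "This page may be risky.")))
        return True
    return False


def fake_report(url: str) -> Dict[str, object]:
    # Stand-in for the server: flags any URL mentioning a hypothetical entity.
    risky = "acmeloan" in url
    return {
        "risk_exists": risky,
        "prompt": "Caution: AcmeLoan has negative reports." if risky else "",
    }


shown: List[str] = []
flagged = check_page_access("http://example.com/acmeloan", fake_report, shown.append)
```

Passing the reporting and display functions in as arguments keeps the sketch independent of any particular browser or network library.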
12. The method according to claim 11, wherein the risk detection result is determined according to the risk index of the named entity, and the step of calculating the risk index comprises:
counting a first quantity of web documents in the document library that include the named entity, and a second quantity of web documents that include the named entity and negative sentiment content;
obtaining the total number of web documents in the document library; and
determining the risk index of the named entity according to a first ratio of the second quantity to the first quantity and a second ratio of the second quantity to the total number of web documents, wherein the risk index is positively correlated with each of the first ratio and the second ratio.
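One way to satisfy the positive-correlation requirement of claim 12 is a weighted sum of the two ratios. The sketch below assumes that form, with a weight `w` between 0 and 1; the default `w = 0.5` and the handling of an entity that appears in no document are illustrative choices, not taken from the claim.

```python
def risk_index(
    first_quantity: int, second_quantity: int, total: int, w: float = 0.5
) -> float:
    """Weighted sum of the two ratios described in the claim (assumed form).

    first_quantity:  documents that include the named entity
    second_quantity: documents that include the entity and negative content
    total:           all documents in the document library
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1] so that w and 1-w are non-negative")
    if total <= 0:
        raise ValueError("the document library must not be empty")
    if not 0 <= second_quantity <= first_quantity <= total:
        raise ValueError("expected 0 <= second_quantity <= first_quantity <= total")
    if first_quantity == 0:
        return 0.0  # assumption: an entity seen in no document carries no risk
    first_ratio = second_quantity / first_quantity  # share of mentions that are negative
    second_ratio = second_quantity / total          # share of the whole library
    return w * first_ratio + (1.0 - w) * second_ratio


example = risk_index(first_quantity=20, second_quantity=5, total=100, w=0.5)
```

With 20 documents mentioning the entity, 5 of them negative, and 100 documents overall, the first ratio is 0.25 and the second is 0.05, so the equal-weight index is 0.15. The first ratio captures how negative the coverage is, while the second discounts entities that are rarely mentioned at all.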
13. A web page access risk detection device, the device comprising:
a web page address receiving module, configured to receive the web page address of an accessed web page reported by a terminal;
a web document acquisition module, configured to obtain the web document corresponding to the web page address;
a named entity recognition module, configured to identify the named entity in the web document;
a determining module, configured to determine a risk detection result according to the number of web documents in a document library that include the named entity and negative sentiment content, wherein the document library includes the web documents corresponding to the web page addresses historically reported by each terminal; and
a feedback module, configured to feed back the risk detection result to the terminal corresponding to the currently received web page address.
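The modules of claim 13 can be pictured as one small server-side pipeline that fetches the reported page, recognizes entities, and counts matching documents in the library. Everything here is a hypothetical sketch: the `fetch`, `recognize`, and `is_negative` callables stand in for the acquisition, recognition, and sentiment steps, the stored-document layout is invented, and adding the current page to the library before counting is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class StoredDocument:
    url: str
    entities: Set[str]
    negative_entities: Set[str]  # entities that appear with negative content


@dataclass
class RiskDetector:
    fetch: Callable[[str], str]               # web document acquisition
    recognize: Callable[[str], Set[str]]      # named entity recognition
    is_negative: Callable[[str, str], bool]   # (text, entity) -> negative?
    library: List[StoredDocument] = field(default_factory=list)

    def handle_report(self, url: str) -> Dict[str, Dict[str, int]]:
        """Process one reported URL and return per-entity document counts."""
        text = self.fetch(url)
        entities = self.recognize(text)
        negative = {e for e in entities if self.is_negative(text, e)}
        # Assumption: the current document joins the library before counting.
        self.library.append(StoredDocument(url, entities, negative))
        total = len(self.library)
        counts: Dict[str, Dict[str, int]] = {}
        for e in entities:
            counts[e] = {
                "with_entity": sum(1 for d in self.library if e in d.entities),
                "with_entity_and_negative": sum(
                    1 for d in self.library if e in d.negative_entities
                ),
                "total": total,
            }
        return counts


PAGES = {"u1": "AcmeLoan scam", "u2": "AcmeLoan office"}  # toy page store
detector = RiskDetector(
    fetch=PAGES.__getitem__,
    recognize=lambda t: {"AcmeLoan"} if "AcmeLoan" in t else set(),
    is_negative=lambda t, e: "scam" in t,
)
first = detector.handle_report("u1")
second = detector.handle_report("u2")
```

The returned counts are the inputs a determining module would turn into a risk detection result before the feedback module sends it to the terminal.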
14. A web page access risk prompting device, the device comprising:
a web page address acquisition module, configured to obtain the web page address of an accessed web page;
a web page address reporting module, configured to report the web page address to a server;
a receiving module, configured to receive a risk detection result fed back by the server, wherein the risk detection result is determined according to the number of web documents in a document library that include a named entity and negative sentiment content, the named entity is present in the web document corresponding to the web page address, and the document library includes the web documents corresponding to the web page addresses historically reported by each terminal; and
a prompting module, configured to display a risk prompt according to the risk detection result when the risk detection result indicates that a risk exists.
15. A computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 12.
CN201810416471.9A 2018-05-03 2018-05-03 Webpage access risk detection and prompting method and device and computer equipment Active CN108647281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810416471.9A CN108647281B (en) 2018-05-03 2018-05-03 Webpage access risk detection and prompting method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810416471.9A CN108647281B (en) 2018-05-03 2018-05-03 Webpage access risk detection and prompting method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN108647281A true CN108647281A (en) 2018-10-12
CN108647281B CN108647281B (en) 2023-11-14

Family

ID=63748892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810416471.9A Active CN108647281B (en) 2018-05-03 2018-05-03 Webpage access risk detection and prompting method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN108647281B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372205A (en) * 2020-02-28 2020-07-03 维沃移动通信有限公司 Information prompting method and electronic equipment
CN111831948A (en) * 2019-04-18 2020-10-27 阿里巴巴集团控股有限公司 Webpage type detection method and device and computer equipment
CN113098859A (en) * 2021-03-30 2021-07-09 深圳市欢太科技有限公司 Webpage page backspacing method, device, terminal and storage medium
CN113254650A (en) * 2021-06-28 2021-08-13 明品云(北京)数据科技有限公司 Knowledge graph-based assessment pushing method, system, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929861B (en) * 2012-10-22 2015-07-22 杭州东信北邮信息技术有限公司 Method and system for calculating text emotion index
CN104980404B (en) * 2014-04-10 2020-04-14 腾讯科技(深圳)有限公司 Method and system for protecting account information security

Also Published As

Publication number Publication date
CN108647281B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN107818344B (en) Method and system for classifying and predicting user behaviors
CN108153901A (en) Information pushing method and device based on knowledge graph
CN108647281A (en) Web page access risk supervision, reminding method, device and computer equipment
US20190179966A1 (en) Method and apparatus for identifying demand
CN108090162A (en) Information-pushing method and device based on artificial intelligence
Yao et al. Service recommendation for mashup composition with implicit correlation regularization
EP3189449A2 (en) Sentiment rating system and method
WO2011080899A1 (en) Information recommendation method
CN111783016B (en) Website classification method, device and equipment
CN105512180B (en) Search recommendation method and device
CN112347367B (en) Information service providing method, apparatus, electronic device and storage medium
CN110287292B (en) Method and device for predicting the degree of sentencing deviation in judgments
CN109947902A (en) Data query method, apparatus and readable medium
CN111612610A (en) Risk early warning method and system, electronic equipment and storage medium
CN112487283A (en) Method and device for training model, electronic equipment and readable storage medium
CN105159898B (en) Search method and apparatus
US11269896B2 (en) System and method for automatic difficulty level estimation
CN113569118B (en) Self-media pushing method, device, computer equipment and storage medium
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN107330705A (en) Anti-fraud method and system based on multiple data sources
CN111931069B (en) User interest determination method and device and computer equipment
CN110634006B (en) Advertisement click rate prediction method, device, equipment and readable storage medium
CN111881007A (en) Operation behavior judgment method, device, equipment and computer readable storage medium
CN114189545B (en) Internet user behavior big data analysis method and system
CN116016365A (en) Webpage identification method based on data packet length information under encrypted flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant