CN101989292A - Sensitive information analysis system and method - Google Patents

Publication number
CN101989292A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2009101622631A
Other languages
Chinese (zh)
Inventor
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chenrui Technology Co Ltd
Original Assignee
Individual
Application filed by Individual
Priority to CN2009101622631A
Publication of CN101989292A
Legal status: Pending (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a sensitive information analysis system and method. The system comprises a core processing module, a core database, a user interface and a sensitive-person interface. The method comprises the following steps: a system engineer installs a transparent gateway, a search engine and the sensitive information analysis system on a server; a user logs into the system, adds sensitive information and schedules automatic jobs; at the scheduled time the system runs the jobs automatically, calculates the sensitivity of each data block from the sensitivity of the known sensitive information, and presents the most sensitive blocks to the user, who judges whether they are truly sensitive; the system then analyzes the confirmed sensitive blocks further, extracts words that may be new sensitive words and data sources that may be new sensitive sources, and presents them to the user, who decides whether they are new sensitive words and new sensitive sources; and these steps are repeated. The system and method have the advantage of finding sensitive information and sensitive persons quickly and accurately.

Description

Sensitive information analysis system and method
Technical field
The present invention relates generally to the fields of information search and data mining, and in particular to a sensitive information analysis system and method.
Background art
After more than a decade of rapid development in computer technology and the internet, the world has broadly entered the information age. According to official data released on April 18, 2009, China had 316 million internet users, the largest online population of any country. Information technology is a double-edged sword: while most internet users use it to improve their quality of life, a small number of sensitive persons at home and abroad, funded by hostile forces, attempt to turn the internet into a tool for endangering national security and creating social unrest.
Existing tools in the information search and data mining fields include statistical analysis tools, full-text search databases, web crawlers, search engines and network public opinion monitoring systems. Network public opinion monitoring systems can automatically collect web pages, process and classify them, and generate reports, which to some extent solves the problem of monitoring online public opinion and harmful content.
The core of a statistical analysis tool is statistical analysis; of a full-text search database, full-text search; of a search engine, web search; and of a public opinion monitoring system, public opinion monitoring. These core technologies are partly related to the needs of users who maintain internet security, but they are fundamentally different things. Public opinion monitoring systems focus on widespread social hot topics and popular opinion; they lack the ability to identify, locate and track the small number of sensitive persons, and the ability to search for and judge the small amount of sensitive information and sensitive events precisely.
Sensitive persons use the internet ever more frequently to pursue hidden aims, far beyond what manual monitoring can cover. There is therefore an urgent need for an internet security monitoring system aimed specifically at this small number of sensitive persons and at sensitive information.
Summary of the invention
The object of the present invention is to provide a sensitive information analysis system and method that analyze sensitive persons and sensitive information more efficiently and protect internet security, overcoming the shortcomings of the prior art.
The sensitive information analysis system of the present invention comprises a core processing module, a core database, a user interface and a sensitive-person interface. The sensitive-person interface is a general-purpose interface for obtaining data about sensitive persons; it comprises a data warehouse interface, which obtains business data through a transparent gateway, and a search engine interface, which collects internet information through a web crawler. The user interface is a user-facing observation and control interface based on a browser/server architecture; it can display the relationships between sensitive persons and their multidimensional event trajectories dynamically and intuitively. The core database stores users, sensitive persons, sensitive words, sensitive sources, sensitive blocks and other sensitive information, together with the relationships among them. The core processing module integrates the sensitive-person interface, the user interface and the core database; it calculates sensitivity automatically, processes sensitive information and controls the whole system.
Another aspect of the present invention is a sensitive information analysis method comprising the following steps:
A system engineer installs and configures a transparent gateway, a search engine and the sensitive information analysis system on a server;
The user logs into the system with the initial administrator password;
The user adds sensitive words, sensitive sources and sensitive persons to the system;
The user sets up automatic jobs, specifying which sensitive words, sensitive sources and sensitive persons to search for, when to run and how often. At the scheduled time the system runs the jobs automatically, fetching data from the data warehouse and pages from the internet as specified, and stores the fetched data in a buffer;
The system calculates the sensitivity of each data block from the sensitivities of the sensitive words and sensitive sources;
The system presents the most sensitive blocks to the user, who judges whether each is a true sensitive block; if it is, the block is stored in the sensitive-block table;
The system further analyzes the true sensitive blocks, extracts words that may be new sensitive words and data sources that may be new sensitive sources, and presents them to the user, who judges whether they are new sensitive words and new sensitive sources; if so, they are added to the sensitive-word table and the sensitive-source table respectively;
By repeating the above steps in a cycle, the sensitive-word table, sensitive-source table and sensitive-person table are continuously updated, and the system keeps capturing the latest and most accurate sensitive information.
Beneficial effects of the present invention: the whole system works like a telescope, with the user interface as the eyepiece and the sensitive-person interface as the objective lens. Just as with a telescope, the user can use the system to observe distant, hidden sensitive persons and find sensitive information and sensitive persons quickly and accurately.
Description of drawings
Fig. 1 is a schematic diagram of the overall framework of the sensitive information analysis system according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the sensitivity architecture in the sensitive information analysis system according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the architecture of sensitive blocks in the sensitive information analysis system according to an embodiment of the invention;
Figs. 4-A and 4-B are schematic diagrams of the algorithm for calculating the sensitivity of a sensitive block according to an embodiment of the invention;
Fig. 5 is a schematic diagram of using sensitivity to identify a sensitive person according to an embodiment of the invention;
Fig. 6 is a schematic diagram of a sensitive person moving in multidimensional space according to an embodiment of the invention;
Fig. 7 is a schematic diagram of the data structure of the core database according to an embodiment of the invention;
Fig. 8 is a flowchart of the sensitive information analysis method according to an embodiment of the invention.
Embodiment
Specific embodiments of the present invention are described below with reference to the drawings.
As shown in Figs. 1-3, the sensitive information analysis system according to an embodiment of the invention comprises a core processing module, a core database, a user interface and a sensitive-person interface. The whole system works like a telescope: the user interface is the eyepiece and the sensitive-person interface is the objective lens. The user uses the system as a telescope to observe distant, hidden sensitive persons.
The sensitive-person interface is a general-purpose interface for obtaining data about sensitive persons. It comprises a data warehouse interface and a search engine interface: the data warehouse interface obtains internal business data of organizations such as public security agencies, banks, airlines and telecom companies through a transparent gateway, and the search engine interface collects internet information such as email, news websites, forums and blogs through a web crawler. Through this unified interface the system can obtain both the internal business data of each organization and internet information, enabling correlated queries and analysis.
The user interface is the user-facing observation and control interface and is based on a browser/server architecture. It uses advanced graphics techniques to display the relationships between sensitive persons and their multidimensional event trajectories dynamically and intuitively.
The core database stores users, sensitive persons, sensitive words, sensitive sources, sensitive blocks and other sensitive information, together with the relationships among them.
The core processing module integrates the sensitive-person interface, the user interface and the core database; it calculates sensitivity automatically, processes sensitive information and controls the whole system.
The main job of the system is to help the user assess the sensitivity of different data and find the most sensitive data. Sensitivity itself is only a quantified attribute; it must be attached to some carrier, namely data. As shown in Fig. 2, sensitivity generally has two concrete carriers: data sources with high sensitivity, called sensitive sources for short, and words with high sensitivity, called sensitive words for short.
There are two kinds of sensitive source: sensitive business databases (sensitive databases) and sensitive websites. A sensitive database contains one or more sensitive tables, each containing one or more sensitive rows; a sensitive website contains one or more sensitive columns (sections), each containing one or more sensitive pages.
Sensitive words fall into several categories, chiefly time (e.g. August 8, 2008), place (e.g. Beijing), person (e.g. XX elements), target (e.g. the Olympic Games), method (e.g. violent riots) and event (composed of a time, place, person, target and method).
Sensitive sources and sensitive words verify and complement each other: a series of new sensitive words can be discovered from a sensitive source, and new sensitive sources can be discovered from certain sensitive words.
The system can help the user find potential sensitive sources and sensitive words, but the final determination is made by the user.
A website may have thousands of pages, of which perhaps only a few are truly sensitive; a database may hold thousands of records, of which perhaps only a few are truly sensitive. Blocking an entire website because it contains one sensitive page is not precise enough. In other words, for a sensitive source, what is truly sensitive is the smallest unit, the row or the page; these are called sensitive blocks. As shown in Fig. 3, a single sensitive word has low sensitivity: "Beijing" is not sensitive, and neither is "weapon". A website such as Baidu may decide whether to block a post according to whether it contains a certain sensitive word, which is also very imprecise. A single sensitive word is not sensitive in itself, but several sensitive words combined are: "Beijing", "weapon" and "2008" together are highly sensitive. A combination of several sensitive words is a sensitive block. The sensitive block is the smallest component unit of a sensitive source and the largest unit that sensitive words can form; it is the real, concrete embodiment of sensitivity.
The sensitivity of a sensitive block comes from two places: the sensitivity of the higher-level sensitive sources in which the block is located, and the sensitivity of the lower-level sensitive words the block contains.
As shown in Fig. 4-A, a sensitive website has a sensitivity of 2 and a column within it has a sensitivity of 3. A sensitive block (a page) in that column inherits a sensitivity of 2 + 3 = 5 from its higher levels.
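The inheritance rule can be sketched as a walk up the source hierarchy. This is a minimal illustration, assuming each source level (website, column, or database, table) is a node that links to its parent; the `SourceNode` class and `inherited_sensitivity` function are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceNode:
    """One level of a sensitive source: a website, a column, a database or a table."""
    name: str
    sensitivity: int
    parent: Optional["SourceNode"] = None


def inherited_sensitivity(container: Optional[SourceNode]) -> int:
    """Sum the sensitivities of the node holding a block and of all its ancestors."""
    total = 0
    node = container
    while node is not None:
        total += node.sensitivity
        node = node.parent
    return total


# Fig. 4-A: website (2) > column (3) > page; the page inherits 2 + 3.
website = SourceNode("sensitive website", 2)
column = SourceNode("column", 3, parent=website)
print(inherited_sensitivity(column))  # 5
```

Because the sum runs over every ancestor, deeper hierarchies (database, table, row) work the same way without special cases.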
As shown in Fig. 4-B, the sensitive block contains four sensitive words, 2008, Beijing, Tibet and weapon, with sensitivities of 2, 2, 4 and 2 respectively. Their sensitivities are not simply added: the words are first classified, the sensitivities of words in the same class are added, and the sums for different classes are multiplied. 2008 belongs to the time dimension, Beijing and Tibet to the place dimension, and weapon to the method dimension. The combined sensitivity of the four words is therefore 2 * (2 + 4) * 2 = 24.
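A short Python sketch of the add-within, multiply-across rule, assuming each word found in a block has already been tagged with a dimension and a sensitivity; the `word_sensitivity` name and the tuple layout are illustrative choices, and returning 0 for a block with no sensitive words is an assumption the source does not address.

```python
from collections import defaultdict
from math import prod


def word_sensitivity(words: list[tuple[str, str, int]]) -> int:
    """Combine (word, dimension, sensitivity) triples found in one block.

    Sensitivities are added within a dimension, and the per-dimension sums
    are multiplied across dimensions. Returning 0 for an empty block is an
    assumption, not part of the source.
    """
    if not words:
        return 0
    per_dimension: dict[str, int] = defaultdict(int)
    for _word, dimension, sensitivity in words:
        per_dimension[dimension] += sensitivity
    return prod(per_dimension.values())


# Fig. 4-B: time {2008: 2}, place {Beijing: 2, Tibet: 4}, method {weapon: 2}
block = [("2008", "time", 2), ("Beijing", "place", 2), ("Tibet", "place", 4), ("weapon", "method", 2)]
print(word_sensitivity(block))  # 2 * (2 + 4) * 2 = 24
```

The same function shows why single-dimension content stays low: a block containing only Beijing (2) and Tibet (4) scores 2 + 4 = 6.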
Because the world is multidimensional, knowledge is multidimensional too, and so is its carrier, the sensitive block. A multidimensional object is like a solid: to compute the volume of a three-dimensional solid, one multiplies the lengths of its edges. For example, a post full of weapon names is at most a military enthusiast's post; a post full of place names such as Beijing and Tibet is at most a travel enthusiast's post; a post full of details of the Olympic Games is at most a sports fan's post. But a post that contains weapon names, place names such as Beijing and Tibet, and details of the Olympic Games all at once is different in nature: a person is unlikely to be a military enthusiast, a travel enthusiast and a sports fan at the same time, so knowledge from several different dimensions appearing together in one post probably serves some special purpose. Hence the sensitivities of words in the same dimension are added, and those of words in different dimensions are multiplied.
Thus, for Figs. 4-A and 4-B, the total sensitivity of the sensitive block is 5 + 24 = 29.
This gives the formula for the sensitivity of a sensitive block:
Sensitivity of a sensitive block = (sum of the sensitivities of the higher levels containing the block) + (product, over word categories, of the summed sensitivities of the lower-level sensitive words of each category contained in the block)
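The same rule can be written as a formula. The notation is introduced here for illustration only: for a block b, A(b) is the set of higher-level sources containing it, D(b) is the set of word dimensions present in it, W_d(b) is its set of sensitive words of dimension d, and s(·) is the sensitivity assigned by the user.

```latex
S(b) = \sum_{a \in A(b)} s(a) \;+\; \prod_{d \in D(b)} \, \sum_{w \in W_d(b)} s(w)

% Worked example from Figs. 4-A and 4-B:
S(b) = (2 + 3) + 2 \times (2 + 4) \times 2 = 5 + 24 = 29
```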
The sensitivity of a sensitive block thus comes from the sensitivity of the website (or other source) containing it and from the sensitivity of the sensitive words it contains. This matters because the numbers of sensitive websites and sensitive words are limited, whereas the number of sensitive blocks is effectively unlimited, or at least far larger. With the help of the system, the user only needs to set the sensitivities of sensitive websites and sensitive words in advance; the system can then compute the sensitivity of the huge number of blocks automatically, find the truly sensitive content and present it to the user.
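Once every block has a score, presenting "the truly sensitive content" amounts to picking the highest scorers. A minimal sketch, assuming scores are already computed and that a review threshold and a top-k limit are used; both parameters and the `top_blocks` name are hypothetical, since the source does not say how many blocks are shown.

```python
import heapq


def top_blocks(scores: dict[str, int], k: int, threshold: int = 0) -> list[tuple[str, int]]:
    """Return at most k (block_id, sensitivity) pairs at or above threshold, highest first."""
    eligible = ((block_id, score) for block_id, score in scores.items() if score >= threshold)
    return heapq.nlargest(k, eligible, key=lambda item: item[1])


# Hypothetical block IDs using the "B" prefix of the sensitive-block table.
scores = {"B001": 29, "B002": 6, "B003": 14, "B004": 3}
print(top_blocks(scores, k=2, threshold=10))  # [('B001', 29), ('B003', 14)]
```

`heapq.nlargest` avoids sorting the full set, which fits the point that blocks vastly outnumber sources and words.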
The user makes the final judgment on whether these blocks are truly sensitive. If they are, the system finds new sensitive sources from the websites containing them and new sensitive words from the vocabulary they contain. As this cycle continues, sensitive words and sensitive sources are updated continuously and automatically, keeping the user on the initiative in the contest with sensitive persons.
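The source does not say how candidate words and sources are extracted from confirmed blocks. One simple possibility, sketched here as an assumption, is to propose any unknown word that recurs across several confirmed blocks and any unknown website hosting a confirmed block. Blocks are assumed to be pre-tokenized dicts, and `propose_candidates` and `min_blocks` are hypothetical names.

```python
from collections import Counter


def propose_candidates(
    confirmed_blocks: list[dict],
    known_words: set[str],
    known_sources: set[str],
    min_blocks: int = 2,
) -> tuple[list[str], list[str]]:
    """Suggest new sensitive words and sources from blocks the user confirmed.

    A word is proposed if it is not yet known and occurs in at least
    `min_blocks` confirmed blocks; a source is proposed if it hosts a
    confirmed block and is not yet known. The user still makes the final call.
    """
    word_counts = Counter()
    for block in confirmed_blocks:
        word_counts.update(set(block["words"]) - known_words)  # count once per block
    new_words = sorted(w for w, n in word_counts.items() if n >= min_blocks)
    new_sources = sorted({b["source"] for b in confirmed_blocks} - known_sources)
    return new_words, new_sources


confirmed = [
    {"source": "forum.example.org", "words": ["Beijing", "weapon", "rally"]},
    {"source": "forum.example.org", "words": ["rally", "2008", "Beijing"]},
    {"source": "news.example.com", "words": ["weather"]},
]
words, sources = propose_candidates(confirmed, {"Beijing", "weapon", "2008"}, {"news.example.com"})
print(words, sources)  # ['rally'] ['forum.example.org']
```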
The ultimate purpose of the system is to let the user observe sensitive persons effectively. This requires understanding each sensitive person's characteristics, history and patterns of movement, so as to pick them out of the ocean of information, track them persistently and predict their next action. The system borrows from human search experience: it records the search history for each sensitive person, so that when the user opens that person's profile, the system can automatically find things of interest even if the user enters no new keyword. A sensitive person may try hard to hide and disguise himself to avoid observation, but however he hides, his nature does not change; that is, his characteristics stay the same and his sensitive words remain relatively stable. As long as the user maintains these sensitive-person records persistently, the user can find them easily.
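The idea of a per-person search memory can be sketched as a profile that accumulates every keyword searched for that person and re-applies them to new material. This is a hypothetical illustration: the `PersonProfile` class is not from the patent, and whitespace tokenization is a simplification (Chinese text would need a word segmenter).

```python
from dataclasses import dataclass, field


@dataclass
class PersonProfile:
    """Hypothetical profile that remembers every keyword searched for one sensitive person."""
    person_id: str
    keywords: set[str] = field(default_factory=set)

    def record_search(self, query: str) -> None:
        # Whitespace tokenization is a simplification for this sketch.
        self.keywords.update(query.split())

    def matches(self, text: str) -> set[str]:
        """Saved keywords found in text, so new material is flagged without a new query."""
        return {kw for kw in self.keywords if kw in text}


profile = PersonProfile("O001")
profile.record_search("Beijing weapon")
profile.record_search("rally")
print(sorted(profile.matches("A rally was planned near Beijing")))  # ['Beijing', 'rally']
```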
Sensitive persons can be established not only from internet searches but also by data mining of business databases. A sensitive person may be an individual, a company, an organization or a loose group. Whether it is an individual or a company, it can equally be identified and observed through sensitive words and sensitivity.
A sensitive person who is an individual has the following attributes:
● General attributes such as name, sex, age, nationality, ID card number, passport number, accent and physical features.
● Social circle such as birthplace, residence, employer, immediate relatives, colleagues and close friends.
● Assets such as real estate, cars, bank account numbers and stocks.
● Communication tools and contacts such as landline, mobile phone, email, QQ, blog accounts and forum accounts.
A sensitive person may use a pseudonym or false documents, and may change mobile phones and online accounts, which makes identification harder. However, these attributes change only so often, and links between them can be found, for example through shared assets or immediate relatives.
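Linking aliases through shared assets or relatives is naturally a transitive grouping problem. The patent names no algorithm; the sketch below swaps in a standard union-find (disjoint-set) structure as an assumption. Which attributes count as links is passed in as `link_keys`, and all record IDs and values are made up for illustration.

```python
def link_identities(records: dict[str, dict[str, str]], link_keys: set[str]) -> list[set[str]]:
    """Group alias records that share a value on any linking attribute, transitively.

    Uses a union-find (disjoint-set) structure with path halving.
    """
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: dict[tuple[str, str], str] = {}
    for rid, attrs in records.items():
        for key, value in attrs.items():
            if key not in link_keys:
                continue  # e.g. phones change too often to be trusted as links
            pair = (key, value)
            if pair in first_owner:
                union(first_owner[pair], rid)
            else:
                first_owner[pair] = rid

    groups: dict[str, set[str]] = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(groups.values(), key=sorted)


records = {
    "alias_a": {"phone": "555-0100", "account": "ACC-1"},
    "alias_b": {"phone": "555-0142", "account": "ACC-1", "relative": "R-77"},
    "alias_c": {"phone": "555-0199", "relative": "R-77"},
    "alias_d": {"phone": "555-0100"},  # same phone as alias_a, but phone is not a link key here
}
print(link_identities(records, link_keys={"account", "relative"}))
```

Here alias_a and alias_b share a bank account and alias_b and alias_c share a relative, so all three end up in one group even though alias_a and alias_c share nothing directly.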
Another attribute that can identify a sensitive person is his way of speaking. If all of a person's statements in the media and online are recorded, his speech habits can be analyzed, and it can then be judged whether an anonymous text was written by him.
A sensitive person has many attributes, and each is associated with the person by a sensitivity of different strength. For example, the ID card number and passport number are highly associated with the person; the name is fairly highly associated, though less so because different people can share the same name; sex, age and nationality have low association. Attributes are also associated with one another: an online nickname with a message-writing habit, an ID card number with assets, and a friend with other friends.
As shown in Fig. 5, if a sensitive block mentions a friend of a certain sensitive person (sensitivity 7) and also shows that person's message-writing habit (sensitivity 7), the block's sensitivity with respect to that person is 7 + 7 = 14. This indicates that the block is probably related to that sensitive person and deserves attention.
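The Fig. 5 calculation can be sketched as summing the association sensitivities of whichever attributes of a person appear in a block. This sketch assumes each attribute can be detected by a literal string match. That is a strong simplification for a writing habit, so treating a habit as a sign-off phrase is hypothetical, as are the names, the ID value and its weight of 10. Only the two weights of 7 come from the source.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    kind: str
    value: str
    sensitivity: int  # strength of association with the sensitive person


def person_relevance(block_text: str, attributes: list[Attribute]) -> int:
    """Add up the sensitivities of the person's attributes that appear in a block."""
    return sum(a.sensitivity for a in attributes if a.value in block_text)


# Hypothetical profile; only the two weights of 7 come from Fig. 5.
attributes = [
    Attribute("friend", "Wang Wu", 7),
    Attribute("message habit", "over and out", 7),
    Attribute("ID card number", "ID-0000", 10),
]
text = "Wang Wu says the plan is set, over and out"
print(person_relevance(text, attributes))  # 7 + 7 = 14
```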
The world of knowledge is a complex multidimensional, multilevel space. For the sensitive information analysis system, the knowledge space can be simplified to a space of six dimensions: time, place, person, target, method and event. Each sensitive person can be regarded as a particle moving in this six-dimensional space. Each appearance of the particle in a sensitive block leaves a coordinate in the space, and linking these coordinates together forms its trajectory. As shown in Fig. 6, the movement of a sensitive person in multidimensional space can be displayed graphically, which gives the user an intuitive picture and helps the user make decisions in the shortest time.
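A trajectory in the six-dimensional space can be sketched as the person's appearances, one coordinate tuple each, ordered by time. This assumes every appearance has already been resolved into values for all six dimensions and that times are ISO dates, so string order equals chronological order. The `Sighting` record, the event labels and all values are hypothetical.

```python
from dataclasses import dataclass

DIMENSIONS = ("time", "place", "person", "target", "method", "event")


@dataclass(frozen=True)
class Sighting:
    """One appearance of a sensitive person in a sensitive block: a point in 6-D space."""
    block_id: str
    time: str  # ISO 8601 date, so string order is chronological order
    place: str
    person: str
    target: str
    method: str
    event: str


def trajectory(sightings: list[Sighting], person: str) -> list[tuple[str, ...]]:
    """Return the person's coordinates in chronological order, one 6-tuple per sighting."""
    own = sorted((s for s in sightings if s.person == person), key=lambda s: s.time)
    return [tuple(getattr(s, dim) for dim in DIMENSIONS) for s in own]


sightings = [
    Sighting("B002", "2008-08-07", "Beijing", "O001", "Olympic Games", "rally", "E2"),
    Sighting("B001", "2008-07-20", "Tibet", "O001", "Olympic Games", "posting", "E1"),
    Sighting("B003", "2008-07-25", "Beijing", "O002", "stadium", "posting", "E3"),
]
for point in trajectory(sightings, "O001"):
    print(point)
```

Connecting consecutive tuples gives the polyline that a graphical view such as Fig. 6 could draw.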
Sensitivity is the foundation of the core database and the core processing module.
As shown in Fig. 7, the core database mainly comprises four classes of tables: entity tables, relation tables, system tables and cache tables.
The entity tables comprise the user table, the sensitive-person table, the sensitive-word table, the sensitive-source table and the sensitive-block table. Each table has a unique ID, which is also its primary key and carries a table-specific prefix: user table (U), sensitive-person table (O), sensitive-word table (W), sensitive-source table (R) and sensitive-block table (B).
The relation tables comprise the permission table, the sensitive-person relation table and the sensitive-block relation table. Each has two fields, which point to the IDs of two entity tables and together form a composite primary key. The "sensitive person / sensitive source ID" field of the permission table can point either to a sensitive-person ID in the sensitive-person table or to a sensitive-source ID in the sensitive-source table; because the two kinds of ID have different prefixes, they cannot be confused. Likewise, the "sensitive person / sensitive source / sensitive word ID" field of the sensitive-block relation table can point to a sensitive-person ID, a sensitive-source ID or a sensitive-word ID without confusion, and the "sensitive word / sensitive source ID" field of the sensitive-person relation table can point to a sensitive-source ID or a sensitive-word ID, also without confusion.
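The prefixed-ID scheme lets a single relation column reference several entity tables. A minimal sketch of that idea, using Python's built-in `sqlite3` in place of the Oracle database the patent names; the English table and column names are illustrative assumptions, not the patent's schema.

```python
import sqlite3

# Prefixes from the entity tables; the English table names are illustrative.
ENTITY_TABLES = {
    "U": "user",
    "O": "sensitive_person",
    "W": "sensitive_word",
    "R": "sensitive_source",
    "B": "sensitive_block",
}


def resolve(entity_id: str) -> str:
    """Map a prefixed ID such as 'W017' to the entity table it belongs to."""
    try:
        return ENTITY_TABLES[entity_id[:1]]
    except KeyError:
        raise ValueError(f"unknown entity prefix in {entity_id!r}") from None


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sensitive_block_relation (
           block_id TEXT NOT NULL,
           ref_id   TEXT NOT NULL,  -- an O..., R... or W... ID
           PRIMARY KEY (block_id, ref_id)
       )"""
)
conn.executemany(
    "INSERT INTO sensitive_block_relation VALUES (?, ?)",
    [("B001", "O001"), ("B001", "R003"), ("B001", "W017"), ("B001", "W020")],
)

# Group everything block B001 refers to by the entity table its prefix names.
by_table: dict[str, list[str]] = {}
rows = conn.execute(
    "SELECT ref_id FROM sensitive_block_relation WHERE block_id = ? ORDER BY ref_id",
    ("B001",),
)
for (ref_id,) in rows:
    by_table.setdefault(resolve(ref_id), []).append(ref_id)
print(by_table)
```

The composite primary key also rejects duplicate links: inserting ("B001", "W017") a second time raises `sqlite3.IntegrityError`.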
In use, the user logs in against the user table with a username and password and obtains access rights to the sensitive-person table and the sensitive-source table through the permission table. The sensitive-person table is linked to the sensitive-word table and the sensitive-source table through the sensitive-person relation table, and sensitive words and sensitive sources are linked through the sensitive-block relation table to the sensitive blocks, which are the main store of sensitive information.
The system tables comprise the metadata table, the job table and the log table. The metadata table describes the structure of the database itself. The jobs recorded in the job table are of two kinds: data-fetching jobs to be executed by the sensitive-person interface programs (the search engine or the data warehouse), and data-processing jobs to be executed by the core database of the system itself. The log table records the execution of each job.
The cache tables temporarily store tables extracted from business databases and data warehouses, and web page content captured by the web crawler and search engine. After analysis, the sensitive information in the cache tables is moved into the sensitive-block table, and the rest is deleted once it expires.
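The cache life cycle can be sketched as a periodic sweep: promote analyzed sensitive rows to the sensitive-block table, keep fresh rows, and drop expired ones. The expiry period (`ttl`), the row fields and the `sweep_cache` name are assumptions; the source only says the rest "will be deleted over time".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CacheRow:
    row_id: str
    fetched_at: datetime
    analyzed: bool = False
    sensitive: bool = False


def sweep_cache(cache: list[CacheRow], block_table: list[str], now: datetime, ttl: timedelta) -> list[CacheRow]:
    """Move analyzed sensitive rows into block_table and drop other rows older than ttl.

    Returns the rows that stay in the cache.
    """
    kept = []
    for row in cache:
        if row.analyzed and row.sensitive:
            block_table.append(row.row_id)  # promoted to the sensitive-block table
        elif now - row.fetched_at < ttl:
            kept.append(row)  # still fresh, or not yet analyzed
        # otherwise: expired and not sensitive, so it is dropped
    return kept


now = datetime(2009, 8, 1, 12, 0)
cache = [
    CacheRow("C1", now - timedelta(hours=1), analyzed=True, sensitive=True),
    CacheRow("C2", now - timedelta(hours=1), analyzed=True),
    CacheRow("C3", now - timedelta(days=10), analyzed=True),
]
blocks: list[str] = []
cache = sweep_cache(cache, blocks, now, ttl=timedelta(days=7))
print(blocks, [row.row_id for row in cache])  # ['C1'] ['C2']
```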
The core database uses the then-latest Oracle 11g Enterprise Edition database.
The core processing module comprises SQL (Structured Query Language) programs in the Oracle database and Java programs for the search engine, the interfaces and the front end.
As shown in Fig. 8, the sensitive information analysis method according to an embodiment of the invention comprises the following steps:
1) A system engineer installs and configures the transparent gateway, the search engine and the sensitive information analysis system on a server. The transparent gateway extracts data from business databases and data warehouses, the search engine fetches data from the internet, and the data they obtain enter the buffer of the system.
2) The user logs into the system with the initial administrator password, and can then change the password, add other users and grant them permissions.
3) The user adds sensitive words, sensitive sources and sensitive persons to the system, either manually or by batch import.
4) The user sets up automatic jobs, specifying which sensitive words, sensitive sources and sensitive persons to search for, when to run and how often.
5) At the scheduled time, the system runs the jobs automatically, fetching data from the data warehouse and pages from the internet as specified. The fetched data are stored in the buffer.
6) The system calculates the sensitivity of each data block from the sensitivities of the sensitive words and sensitive sources.
7) The system presents the most sensitive blocks to the user for judgment.
8) The user judges each block; if it is truly sensitive, it is stored in the sensitive-block table.
9) The system further analyzes the true sensitive blocks, extracting words that may be new sensitive words and data sources (websites) that may be new sensitive sources, and presents them to the user for judgment.
10) The user judges whether they are new sensitive words and new sensitive sources; if so, they are added to the sensitive-word table and the sensitive-source table.
11) The cycle repeats, so the sensitive-word table, sensitive-source table and sensitive-person table are continuously updated, and the system keeps capturing the latest and most accurate sensitive information.
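Steps 5 to 10 form a feedback loop. The sketch below wires one pass of that loop together with injected callables for fetching, user review and user confirmation. It is an illustration under stated assumptions: the scoring is a deliberately toy stand-in (count of known words plus one for a known source), not the patent's sensitivity formula, and every name is hypothetical.

```python
from typing import Callable

Block = dict  # hypothetical shape: {"source": str, "words": list[str]}


def run_cycle(
    fetch: Callable[[], list[Block]],
    review: Callable[[Block], bool],
    confirm: Callable[[str], bool],
    lexicon: set[str],
    sources: set[str],
    threshold: int,
) -> list[Block]:
    """One pass of steps 5-10. `lexicon` and `sources` are updated in place."""
    blocks = fetch()  # step 5: run the scheduled job

    def score(block: Block) -> int:  # step 6: toy score, not the patent's formula
        return sum(w in lexicon for w in block["words"]) + (block["source"] in sources)

    shown = [b for b in blocks if score(b) >= threshold]  # step 7: present high scorers
    confirmed = [b for b in shown if review(b)]  # step 8: user confirms blocks
    new_words = {w for b in confirmed for w in b["words"]} - lexicon  # step 9: candidates
    new_sources = {b["source"] for b in confirmed} - sources
    lexicon.update(w for w in new_words if confirm(w))  # step 10: user accepts
    sources.update(s for s in new_sources if confirm(s))
    return confirmed


crawl = [
    {"source": "forum.example.org", "words": ["Beijing", "weapon", "rally"]},
    {"source": "news.example.com", "words": ["weather", "Beijing"]},
]
lexicon = {"Beijing", "weapon"}
sources: set[str] = set()
confirmed = run_cycle(lambda: crawl, review=lambda b: True, confirm=lambda term: True,
                      lexicon=lexicon, sources=sources, threshold=2)
print(len(confirmed), sorted(lexicon), sorted(sources))
# 1 ['Beijing', 'rally', 'weapon'] ['forum.example.org']
```

Calling `run_cycle` again with the grown `lexicon` and `sources` is step 11: the next pass scores blocks against the updated tables.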
The system can automatically count the occurrences of events (i.e. sensitive blocks) by time and by sensitive person, and generate charts. It can also analyze the relationships between sensitive persons and sensitive words, between sensitive persons and events, and among sensitive persons, and display them graphically.
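Counting events by time and person is a grouped count. A minimal sketch, assuming each occurrence is a (person ID, ISO date) pair and that monthly buckets are wanted; the bucket size, the `monthly_counts` name and the text bars (standing in for a real charting library) are all illustrative choices.

```python
from collections import Counter


def monthly_counts(occurrences: list[tuple[str, str]]) -> Counter:
    """Count sensitive-block occurrences per (person ID, YYYY-MM) from ISO dates."""
    return Counter((person, date[:7]) for person, date in occurrences)


occurrences = [
    ("O001", "2008-07-20"),
    ("O001", "2008-07-25"),
    ("O001", "2008-08-07"),
    ("O002", "2008-07-25"),
]
counts = monthly_counts(occurrences)
for (person, month), n in sorted(counts.items()):
    print(f"{person} {month} {'#' * n}")  # crude text bar chart
```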

Claims (8)

1. A sensitive information analysis system, characterized in that it comprises:
a sensitive-person interface, comprising a data warehouse interface and a search engine interface, wherein the data warehouse interface obtains business data through a transparent gateway and the search engine interface collects internet information through a web crawler;
a user interface, which is a user-facing observation and control interface based on a browser/server architecture;
a core database, mainly comprising entity tables, relation tables, system tables and cache tables, for storing users, sensitive persons, sensitive words, sensitive sources, sensitive blocks and the relationships among them; and
a core processing module, for integrating the sensitive-person interface, the user interface and the core database so as to calculate sensitivity automatically, process sensitive information and control the whole system.
2. sensitive information analytic system as claimed in claim 1 is characterized in that:
Described responsive source comprises responsive Service Database, it is responsive storehouse, there are one or more responsive tables the inside, described responsive storehouse, one or more responsive row are arranged in each responsive table, and responsive website, be responsive website, there are one or more responsive columns the inside, described responsive website, in each responsive column one or more sensitive page is arranged; Described sensitive word comprises time, place, personage, target, method, incident; Described sensitive blocks is the minimum component units in responsive source, be again the largest unit that sensitive word can be formed, the susceptibility of described sensitive blocks comes from the susceptibility in the responsive source of higher level at described sensitive blocks place and the susceptibility of subordinate's sensitive word that described sensitive blocks comprised; Described responsive personage is a people, a company, an organizational structure or a loose crowd.
3. The sensitive information analysis system of claim 1 or 2, characterized in that the sensitivity of a sensitive block is calculated as:
Sensitivity of a sensitive block = (sum of the sensitivities of the higher levels in which the block resides) + (product of the sums of the sensitivities of each category of lower-level sensitive words contained in the block).
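The translated formula is ambiguous. One plausible reading is as follows:

- Add up the sensitivities of the higher-level sources that contain the block.
- For each category of sensitive word, sum the sensitivities of the words of that category found in the block.
- Multiply those per-category sums together and add the result to the first total.

The function below encodes that reading. Its name and parameters are hypothetical. The claim does not settle how missing categories are treated, so skipping them is an assumption.

```python
from collections import defaultdict
from math import prod
from typing import Iterable


def block_sensitivity(
    ancestor_sensitivities: Iterable[float],
    words: Iterable[tuple[str, float]],
) -> float:
    """Sensitivity of one sensitive block under one reading of the claim.

    ancestor_sensitivities: sensitivities of the higher-level sources that
        contain the block (e.g. its database and table, or website and section).
    words: (category, sensitivity) for each sensitive word found in the block,
        where category is time/place/person/target/method/event.
    """
    upper = sum(ancestor_sensitivities)

    # Sum word sensitivities separately for each category.
    per_category: dict[str, float] = defaultdict(float)
    for category, s in words:
        per_category[category] += s

    # Assumption: multiply only the categories that actually occur, and use 0
    # (not prod([]) == 1) when the block contains no sensitive words at all.
    lower = prod(per_category.values()) if per_category else 0.0
    return upper + lower
```

Take a page on a site of sensitivity 2, in a section of sensitivity 1. Suppose it contains one person word of sensitivity 3 and two place words of sensitivity 1 each. This reading gives (2 + 1) + 3 × (1 + 1) = 9.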
4. The sensitive information analysis system of claim 1 or 2, characterized in that: the entity tables comprise a user table, a sensitive-person table, a sensitive-word table, a sensitive-source table, and a sensitive-block table, each with a unique identifier, namely: user table (U), sensitive-person table (O), sensitive-word table (W), sensitive-source table (R), and sensitive-block table (B); the relation tables comprise a permission table, a sensitive-block relation table, and a sensitive-person relation table, each of whose fields points to the IDs of tables among the entity tables, wherein the permission table has the two fields "sensitive person" and "sensitive source ID", pointing to the sensitive-person ID in the sensitive-person table and to the sensitive-source ID in the sensitive-source table; the sensitive-block relation table has the fields "sensitive person, sensitive source, sensitive word ID", pointing to the sensitive-person ID in the sensitive-person table, the sensitive-source ID in the sensitive-source table, and the sensitive-word ID in the sensitive-word table; the sensitive-person relation table has the fields "sensitive word, sensitive source ID", pointing to the sensitive-source ID in the sensitive-source table and the sensitive-word ID in the sensitive-word table; the system tables comprise a metadata table, a job schedule table, and a log table, wherein the metadata table describes the structure of the database itself, the job schedule table records the data-fetching jobs to be executed by the sensitive-person interface programs and the data-processing jobs to be executed by the core database of the sensitive information analysis system itself, and the log table records the execution of each job; the cache table temporarily stores tables extracted from business databases and data warehouses and web page content captured by the web crawler and search engine; sensitive information in the cache table is moved into the sensitive-block table after analysis, and the remaining information in the cache table is deleted after a period of time.
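Claim 4's table layout can be expressed as a relational schema. The sketch below uses Python's built-in `sqlite3` module purely for illustration. Several details are assumptions, not details taken from the claim: using the entity codes U, O, W, R, B directly as table names, every column name, and the time-to-live purge of the cache table.

```python
import sqlite3

# Hypothetical schema: only the entity codes and the table-to-table
# references follow the claim; all column names are illustrative.
SCHEMA = """
-- entity tables
CREATE TABLE U (id INTEGER PRIMARY KEY, name TEXT);             -- users
CREATE TABLE O (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);  -- sensitive persons
CREATE TABLE W (id INTEGER PRIMARY KEY, word TEXT, category TEXT, sensitivity REAL);
CREATE TABLE R (id INTEGER PRIMARY KEY, locator TEXT, sensitivity REAL);
CREATE TABLE B (id INTEGER PRIMARY KEY, content TEXT, sensitivity REAL);
-- relation tables: every field references an entity table
CREATE TABLE permission (person_id INTEGER REFERENCES O(id),
                         source_id INTEGER REFERENCES R(id));
CREATE TABLE block_relation (person_id INTEGER REFERENCES O(id),
                             source_id INTEGER REFERENCES R(id),
                             word_id INTEGER REFERENCES W(id));
CREATE TABLE person_relation (word_id INTEGER REFERENCES W(id),
                              source_id INTEGER REFERENCES R(id));
-- system tables
CREATE TABLE metadata (table_name TEXT, description TEXT);
CREATE TABLE job_schedule (id INTEGER PRIMARY KEY, kind TEXT, next_run REAL, interval_s REAL);
CREATE TABLE job_log (job_id INTEGER REFERENCES job_schedule(id), ran_at REAL, status TEXT);
-- cache table for extracted/crawled content awaiting analysis
CREATE TABLE cache (id INTEGER PRIMARY KEY, source_id INTEGER, content TEXT, fetched_at REAL);
"""


def create_core_db() -> sqlite3.Connection:
    """Create the illustrative core database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def purge_cache(conn: sqlite3.Connection, now: float, ttl_s: float) -> int:
    """Delete cache rows older than ttl_s seconds; return how many were removed."""
    cur = conn.execute("DELETE FROM cache WHERE fetched_at < ?", (now - ttl_s,))
    conn.commit()
    return cur.rowcount
```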
5. A sensitive information analysis method, characterized in that it comprises the following steps:
A system engineer installs and configures the transparent gateway, the search engine, and the sensitive information analysis system on a server;
The user logs in to the sensitive information analysis system with the initial administrator password;
The user adds sensitive words, sensitive sources, and sensitive persons to the system;
The user sets up automatic jobs, specifying which sensitive words, sensitive sources, and sensitive persons to search for, when to run, and how often to run. When the scheduled time arrives, the system executes the job automatically, fetching data from the data warehouse and pages from the Internet according to the job, and stores the fetched data in the cache;
The system calculates the sensitivity of each data block from the sensitivities of the sensitive words and sensitive sources;
The system presents the sensitive blocks with higher sensitivity to the user, who judges whether each is a genuine sensitive block; if it is, the sensitive block is stored in the sensitive-block table;
The system further analyzes the genuine sensitive blocks, extracting terms that may be new sensitive words and data sources that may be new sensitive sources, and presents them to the user for judgment. The user judges whether they are new sensitive words and new sensitive sources; if so, the new sensitive words and new sensitive sources are entered into the sensitive-word table and the sensitive-source table, respectively.
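One pass of the loop in claim 5 might look like the sketch below: score the blocks, present them for review, keep the confirmed ones, and propose new words and sources. It is illustrative only, and several pieces are assumptions:

- the `Block` type, the threshold, and the recurrence count used to propose new words;
- the simplified additive scorer, which stands in for the claim-3 formula rather than reproducing it;
- `confirm`, which is only a placeholder for the user's manual judgement.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Block:
    source: str  # where the block came from (table/row or URL); hypothetical
    text: str


def score(block: Block, words: dict[str, float], sources: dict[str, float]) -> float:
    """Simplified stand-in for the claim-3 formula: the block's source
    sensitivity plus the sensitivity of every known word occurring in it."""
    text = block.text.lower()
    return sources.get(block.source, 0.0) + sum(s for w, s in words.items() if w in text)


def analysis_round(
    blocks: list[Block],
    words: dict[str, float],           # known sensitive words (lower-case) -> sensitivity
    sources: dict[str, float],         # known sensitive sources -> sensitivity
    threshold: float,
    confirm: Callable[[Block], bool],  # placeholder for the user's yes/no judgement
    min_count: int = 2,
) -> tuple[list[Block], list[str], list[str]]:
    """One pass: flag high-scoring blocks, keep confirmed ones, propose new words/sources."""
    scored = [(score(b, words, sources), b) for b in blocks]
    # Present the most sensitive blocks first, as the user would see them.
    flagged = [b for s, b in sorted(scored, key=lambda p: p[0], reverse=True) if s >= threshold]
    confirmed = [b for b in flagged if confirm(b)]

    # Candidate words: unknown tokens recurring across confirmed blocks.
    counts = Counter(
        tok
        for b in confirmed
        for tok in re.findall(r"\w+", b.text.lower())
        if tok not in words
    )
    new_words = [tok for tok, n in counts.most_common() if n >= min_count]

    # Candidate sources: origins of confirmed blocks not yet in the source table.
    new_sources = sorted({b.source for b in confirmed if b.source not in sources})
    return confirmed, new_words, new_sources
```

Words and sources the user accepts would then be added to the sensitive-word and sensitive-source tables before the next scheduled run, so each pass starts from a larger vocabulary.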
6. The sensitive information analysis method of claim 5, characterized in that the transparent gateway is used to extract data from business databases and data warehouses, and the search engine is used to fetch data from the Internet.
7. The sensitive information analysis method of claim 5 or 6, characterized in that:
The sensitive sources comprise sensitive business databases (sensitive libraries), each containing one or more sensitive tables, each sensitive table having one or more sensitive rows, and sensitive websites, each containing one or more sensitive sections, each sensitive section having one or more sensitive pages; the sensitive words comprise time, place, person, target, method, and event words; a sensitive block is the smallest constituent unit of a sensitive source and, at the same time, the largest unit that sensitive words can make up, and the sensitivity of a sensitive block derives from the sensitivity of the higher-level sensitive source in which the block resides and the sensitivities of the lower-level sensitive words the block contains; a sensitive person is an individual, a company, an organization, or a loose group of people.
8. The sensitive information analysis method of claim 5 or 6, characterized in that the sensitivity of a sensitive block is calculated as:
Sensitivity of a sensitive block = (sum of the sensitivities of the higher levels in which the block resides) + (product of the sums of the sensitivities of each category of lower-level sensitive words contained in the block).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101622631A CN101989292A (en) 2009-07-31 2009-07-31 Sensitive information analysis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101622631A CN101989292A (en) 2009-07-31 2009-07-31 Sensitive information analysis system and method

Publications (1)

Publication Number Publication Date
CN101989292A true CN101989292A (en) 2011-03-23

Family

ID=43745829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101622631A Pending CN101989292A (en) 2009-07-31 2009-07-31 Sensitive information analysis system and method

Country Status (1)

Country Link
CN (1) CN101989292A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093154A (en) * 2013-02-06 2013-05-08 杭州电子科技大学 Secret-level setting information management system and secret-level setting information management method
CN103678273A (en) * 2012-09-14 2014-03-26 安徽华贞信息科技有限公司 Internet paragraph level topic recognition system
CN105610637A (en) * 2015-09-24 2016-05-25 百度在线网络技术(北京)有限公司 Sensitive information acquisition method and apparatus thereof
CN109558480A (en) * 2018-11-30 2019-04-02 重庆市千将软件有限公司 For the counter method of crime of laundering behavior
CN109766447A (en) * 2018-12-25 2019-05-17 东软集团股份有限公司 A kind of method and apparatus of determining sensitive information
CN111274149A (en) * 2020-02-06 2020-06-12 中国建设银行股份有限公司 Test data processing method and device
CN111314292A (en) * 2020-01-15 2020-06-19 上海观安信息技术股份有限公司 Data security inspection method based on sensitive data identification

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678273A (en) * 2012-09-14 2014-03-26 安徽华贞信息科技有限公司 Internet paragraph level topic recognition system
CN103093154A (en) * 2013-02-06 2013-05-08 杭州电子科技大学 Secret-level setting information management system and secret-level setting information management method
CN103093154B (en) * 2013-02-06 2016-01-20 杭州电子科技大学 One is determined confidential information management system and determines confidential information management method
CN105610637A (en) * 2015-09-24 2016-05-25 百度在线网络技术(北京)有限公司 Sensitive information acquisition method and apparatus thereof
CN109558480A (en) * 2018-11-30 2019-04-02 重庆市千将软件有限公司 For the counter method of crime of laundering behavior
CN109766447A (en) * 2018-12-25 2019-05-17 东软集团股份有限公司 A kind of method and apparatus of determining sensitive information
CN111314292A (en) * 2020-01-15 2020-06-19 上海观安信息技术股份有限公司 Data security inspection method based on sensitive data identification
CN111274149A (en) * 2020-02-06 2020-06-12 中国建设银行股份有限公司 Test data processing method and device

Similar Documents

Publication Publication Date Title
EP3819792A2 (en) Method, apparatus, device, and storage medium for intention recommendation
Cuzzocrea et al. Big data: a research agenda
Bozarth et al. Toward a better performance evaluation framework for fake news classification
Yu et al. Ring: Real-time emerging anomaly monitoring system over text streams
CN101989292A (en) Sensitive information analysis system and method
CN106033445B (en) The method and apparatus for obtaining article degree of association data
CN106250513A (en) A kind of event personalization sorting technique based on event modeling and system
CN103838785A (en) Vertical search engine in patent field
CN102831234A (en) Personalized news recommendation device and method based on news content and theme feature
CN110362740B (en) Water conservancy portal information hybrid recommendation method
CN108984667A (en) A kind of public sentiment monitoring system
CN113297457B (en) High-precision intelligent information resource pushing system and pushing method
CN106776567A (en) A kind of internet big data analyzes extracting method and system
Miller Automated detection of Chinese government astroturfers using network and social metadata
CN106649498A (en) Network public opinion analysis system based on crawler and text clustering analysis
Wang et al. Early Rumor Detection Based on Deep Recurrent Q‐Learning
Gomes et al. A survey on data stream, big data and real-time
Pandey Challenges of big data to big data mining with their processing framework
Yom-Tov et al. The werther effect revisited: Measuring the effect of news items on user behavior
Shuai et al. Improving news ranking by community tweets
Huang et al. Design a batched information retrieval system based on a concept-lattice-like structure
CN106777395A (en) A kind of topic based on community's text data finds system
Wang et al. Website clustering from query graph using social network analysis
CN113505117A (en) Data quality evaluation method, device, equipment and medium based on data indexes
US11354519B2 (en) Numerical information management device enabling numerical information search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: BEIJING CHENRUI TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: LI CHAO

Effective date: 20120206

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 163517 DAQING, HEILONGJIANG PROVINCE TO: 100036 HAIDIAN, BEIJING

TA01 Transfer of patent application right

Effective date of registration: 20120206

Address after: 100036, Beijing, Haidian District Fuxing Road, No. 65, building -A5, room 1601, room 16

Applicant after: Beijing Chenrui Technology Co., Ltd.

Address before: 163517, room 10, No. 300, Lane 4, Tong Yang Road, Datong District, Heilongjiang, Daqing

Applicant before: Li Chao

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110323