CN109388710A - IP address service attribute labeling method and device - Google Patents
- Publication number
- CN109388710A (application CN201810970182.3A)
- Authority
- CN
- China
- Prior art keywords
- name
- domain name
- subdomain
- learning model
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present invention relates to an IP address service attribute labeling method and device. The method comprises: obtaining the subdomains of a domain name and the page information of the domain name and its subdomains; obtaining classification results for the page information of the domain name and its subdomains using a pre-established text classification machine learning model; and labeling the category attribute of the IP address sets corresponding to the domain name and its subdomains using those classification results. In the technical solution provided by the invention, the page information of a domain name is obtained by a web crawler, the service category of the domain name is determined using a machine learning text classification model, and an "IP-domain name-service category" mapping is established. This completes the labeling of the service category carried on the upper layer of an IP address, expands existing IP address attribute libraries, and improves the timeliness of IP service attributes.
Description
Technical field
The present invention relates to the field of the Internet, and in particular to an IP address service attribute labeling method and device.
Background
The IP address is the core of the Internet and the link that connects people, things and environments. Traditional research on IP address attributes focuses on location attributes. Typical applications include IP address geolocation services, intelligent scheduling of network traffic, intelligent DNS resolution and precisely targeted advertising; the principle is to push personalized services that differ according to the location of the IP address. This approach, however, cannot determine the service attribute carried on the upper layer of an IP address, which hinders network security situation awareness.
Summary of the invention
The present invention provides an IP address service attribute labeling method and device. Its purpose is to obtain the page information of a domain name with a web crawler, determine the service category of the domain name using a machine learning text classification model, and establish an "IP-domain name-service category" mapping, thereby completing the labeling of the service category carried on the upper layer of an IP address, expanding existing IP address attribute libraries and improving the timeliness of IP service attributes.
The purpose of the present invention is achieved by the following technical solution:
An IP address service attribute labeling method, the improvement being that the method comprises:
Obtaining the subdomains of a domain name and the page information of the domain name and its subdomains;
Obtaining classification results for the page information of the domain name and its subdomains using a pre-established text classification machine learning model;
Labeling the category attribute of the IP address sets corresponding to the domain name and its subdomains using the classification results of the page information of the domain name and its subdomains.
Preferably, obtaining the subdomains of the domain name and the page information of the domain name and its subdomains comprises:
a. Judging whether the domain name is valid; if the domain name is valid, executing step b, otherwise ending the operation;
b. Obtaining the homepage information of the domain name using a web crawler; if the page content of the homepage information is empty, ending the operation, otherwise executing step c;
c. Obtaining the subdomains in the homepage information by regular expression matching, and outputting the subdomains;
d. Repeating steps a to c for each subdomain until no nested subdomains remain in the subdomains.
Preferably, the process of establishing the pre-established text classification machine learning model comprises:
A. Using historical page information whose category attribute has already been labeled as the training data and test data of the text classification machine learning model, and training the text classification machine learning model with the training data;
B. Testing the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, outputting the text classification machine learning model; if not, modifying the parameters of the text classification machine learning model and returning to step A;
Wherein the text classification machine learning model is a CNN/RNN-based text classification algorithm, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers.
Preferably, before obtaining the classification results for the page information of the domain name and its subdomains using the pre-established text classification machine learning model, the method comprises:
Removing the code information from the page information of the domain name and its subdomains.
Preferably, the process of obtaining the IP address sets corresponding to the domain name and its subdomains comprises:
Following the principle of DNS resolution, resolving a domain name or one of its subdomains using at least one DNS server to obtain at least one corresponding IP address, and building the IP address set of that domain name or subdomain from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses of the domain name or its subdomain.
An IP address service attribute labeling device, the improvement being that the device comprises:
A first acquisition unit, for obtaining the subdomains of a domain name and the page information of the domain name and its subdomains;
A second acquisition unit, for obtaining classification results for the page information of the domain name and its subdomains using a pre-established text classification machine learning model;
A labeling unit, for labeling the category attribute of the IP address sets corresponding to the domain name and its subdomains using the classification results of the page information of the domain name and its subdomains.
Preferably, the first acquisition unit comprises:
A first judgment module, for judging whether the domain name is valid; if the domain name is valid, the second judgment module is executed, otherwise the operation ends;
A second judgment module, for obtaining the homepage information of the domain name using a web crawler; if the page content of the homepage information is empty, the operation ends, otherwise the obtaining module is executed;
An obtaining module, for obtaining the subdomains in the homepage information by regular expression matching, and outputting the subdomains;
A loop module, for repeating the first judgment module through the obtaining module for each subdomain until no nested subdomains remain in the subdomains.
Preferably, the process of establishing the pre-established text classification machine learning model involves:
A training module, for using historical page information whose category attribute has already been labeled as the training data and test data of the text classification machine learning model, and training the text classification machine learning model with the training data;
A test module, for testing the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, the text classification machine learning model is output; if not, the parameters of the text classification machine learning model are modified and control returns to the training module;
Wherein the text classification machine learning model is a CNN/RNN-based text classification algorithm, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers.
Preferably, before obtaining the classification results for the page information of the domain name and its subdomains using the pre-established text classification machine learning model, the device performs:
Removing the code information from the page information of the domain name and its subdomains.
Preferably, the process of obtaining the IP address sets corresponding to the domain name and its subdomains comprises:
Following the principle of DNS resolution, resolving a domain name or one of its subdomains using at least one DNS server to obtain at least one corresponding IP address, and building the IP address set of that domain name or subdomain from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses of the domain name or its subdomain.
Beneficial effects of the present invention:
The technical solution provided by the invention obtains the subdomains of a domain name and the page information of the domain name and its subdomains, obtains classification results for that page information using a pre-established text classification machine learning model, and labels the category attribute of the IP address sets corresponding to the domain name and its subdomains using those classification results. It thereby realizes a dynamic mapping between the application service space and the IP address space, draws a threat view of cyberspace application services from a national security perspective, serves cyberspace security situation awareness and commercial intelligent network scheduling, expands existing IP address attribute libraries, and improves the timeliness of IP service attributes;
The technical solution provided by the invention uses distributed DNS resolution to determine the IP address set of a domain name as completely as possible, and uses a CNN/RNN-based text classification algorithm as the text classification machine learning model, which improves the accuracy of service classification of web page content.
Brief description of the drawings
Fig. 1 is a flowchart of an IP address service attribute labeling method of the present invention;
Fig. 2 is a structural schematic diagram of an IP address service attribute labeling device of the present invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently some, but not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative work shall fall within the protection scope of the invention.
The IP address service attribute labeling method provided by the invention, as shown in Fig. 1, comprises:
101. Obtaining the subdomains of a domain name and the page information of the domain name and its subdomains;
102. Obtaining classification results for the page information of the domain name and its subdomains using a pre-established text classification machine learning model;
For example, the text classification machine learning model may classify the page information of the domain name www.icbc.com.cn and its subdomains as: finance and economics.
103. Labeling the category attribute of the IP address sets corresponding to the domain name and its subdomains using the classification results of the page information of the domain name and its subdomains.
"IP-domain name-service category" mapping data entries are generated and stored in a database; the database may be MySQL.
It should be noted that IP addresses and domain names may be in a many-to-many relationship: a single IP address may correspond to multiple domain names, and multiple IP addresses may correspond to one domain name. The generated "IP-domain name-service category" data entries therefore need to use the IP address and the domain name together as a composite primary key;
For example, the domain name www.icbc.com.cn resolves to 58 IP addresses in total, including 122.228.86.148, 115.231.14.81, 183.131.168.210, 183.134.10.170 and 218.92.221.7. The data entries finally generated include "122.228.86.148-www.icbc.com.cn-finance and economics", "115.231.14.81-www.icbc.com.cn-finance and economics", "183.131.168.210-www.icbc.com.cn-finance and economics", "183.134.10.170-www.icbc.com.cn-finance and economics" and "218.92.221.7-www.icbc.com.cn-finance and economics". For the subdomain www.sh.icbc.com.cn of the domain name www.icbc.com.cn, the resolved IP address is 59.49.42.248 and the classification result of the page content is finance and economics, so the data entry "59.49.42.248-www.sh.icbc.com.cn-finance and economics" is finally generated.
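The composite key on IP address and domain name can be illustrated with a short sketch. It is only an illustration under stated assumptions: the text names MySQL as one possible database, whereas this sketch uses Python's built-in sqlite3 module so that it runs without a server, and the table name ip_domain_category and the helper names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table keyed jointly on (ip, domain).

    Assumption: sqlite3 stands in for the MySQL database mentioned in the text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ip_domain_category (
            ip       TEXT NOT NULL,
            domain   TEXT NOT NULL,
            category TEXT NOT NULL,
            PRIMARY KEY (ip, domain)  -- composite key handles many-to-many
        )
        """
    )
    return conn


def build_entries(domain, ips, category):
    """Turn one classified domain and its IP set into (ip, domain, category) rows."""
    return [(ip, domain, category) for ip in sorted(set(ips))]


def store_entries(conn, entries):
    """Insert rows; re-labeling an existing (ip, domain) pair overwrites it."""
    conn.executemany(
        "INSERT OR REPLACE INTO ip_domain_category (ip, domain, category) "
        "VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()
```

Because the key is the pair (ip, domain), the same IP address can appear once per domain name it serves, and re-running the labeling for a domain updates its rows instead of duplicating them.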
Further, step 101 comprises:
a. Judging whether the domain name is valid; if the domain name is valid, executing step b, otherwise ending the operation;
b. Obtaining the homepage information of the domain name using a web crawler; if the page content of the homepage information is empty, ending the operation, otherwise executing step c;
c. Obtaining the subdomains in the homepage information by regular expression matching, and outputting the subdomains;
d. Repeating steps a to c for each subdomain until no nested subdomains remain in the subdomains.
For example, the process of obtaining the subdomains of the domain name www.icbc.com.cn and the page information of the domain name and its subdomains may include:
a. Judging whether the domain name www.icbc.com.cn is valid; if the domain name is valid, executing step b, otherwise ending the operation. Here, www.icbc.com.cn is judged to be a valid domain name using a regular expression;
b. Obtaining the homepage information of the domain name www.icbc.com.cn using a web crawler; if the page content of the homepage information is empty, ending the operation, otherwise executing step c. Here, the homepage content of www.icbc.com.cn fetched by the web crawler is not empty;
c. Obtaining the subdomains in the homepage information by regular expression matching, and outputting the subdomains;
d. Repeating steps a to c for each subdomain until no nested subdomains remain in the subdomains.
Steps c and d yield 54 subdomains from the homepage information of the domain name www.icbc.com.cn and 34 nested subdomains within those subdomains, so this embodiment obtains 88 subdomains in total;
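Steps a to d can be sketched as a small worklist loop. This is a minimal sketch under assumptions, not the patent's implementation: the validity pattern DOMAIN_RE, the rule that subdomains share the suffix left after stripping a leading "www.", and the injected fetch callable (standing in for the web crawler) are all hypothetical choices made so the example runs without network access.

```python
import re
from typing import Callable, Set

# Hypothetical validity check: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def is_valid_domain(name: str) -> bool:
    """Step a: judge validity with a regular expression."""
    return DOMAIN_RE.match(name.lower()) is not None


def collect_subdomains(domain: str, fetch: Callable[[str], str]) -> Set[str]:
    """Collect subdomains reachable from the homepage of `domain`.

    `fetch(host)` is an assumed crawler hook returning the homepage HTML of
    `host`, or an empty string when nothing could be fetched.
    """
    root = domain.lower()
    # Assumption: subdomains share the suffix left after dropping "www.".
    suffix = root[4:] if root.startswith("www.") else root
    host_re = re.compile(
        r"(?<![a-z0-9.-])(?:[a-z0-9-]+\.)+" + re.escape(suffix) + r"(?![a-z0-9-])",
        re.IGNORECASE,
    )
    found: Set[str] = set()
    visited: Set[str] = set()
    pending = [root]
    while pending:
        current = pending.pop()
        if current in visited:
            continue
        visited.add(current)
        if not is_valid_domain(current):       # step a
            continue
        page = fetch(current)                  # step b
        if not page:
            continue
        for host in host_re.findall(page):     # step c
            host = host.lower()
            if host != root and host not in found:
                found.add(host)
                pending.append(host)           # step d: recurse into nested subdomains
    return found
```

The visited set keeps the loop from revisiting a host when pages link to each other, which the "until no nested subdomains remain" condition implies but does not spell out.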
Further, the process of establishing the pre-established text classification machine learning model comprises:
A. Using historical page information whose category attribute has already been labeled as the training data and test data of the text classification machine learning model, and training the text classification machine learning model with the training data;
Wherein the training data and the test data may each contain 6500 entries, and the training data and test data cover 14 categories; the 14 categories are: finance and economics, lottery, real estate, stocks, home furnishing, education, science and technology, society, fashion, current politics, sports, horoscopes, games and entertainment;
The text classification machine learning model uses open-source convolutional neural network and recurrent neural network implementations;
B. Testing the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, outputting the text classification machine learning model; if not, modifying the parameters of the text classification machine learning model and returning to step A;
Wherein the text classification machine learning model is a CNN/RNN-based text classification algorithm, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers;
In testing, the accuracy of the text classification machine learning model reached 96.04%;
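The train-then-test loop of steps A and B can be sketched independently of any particular neural network library. In the sketch below, train and evaluate are hypothetical callables standing in for CNN/RNN training and test-set scoring, and the list of candidate parameter settings is an assumption about how "modifying the parameters" might be organized; the text does not specify a search strategy.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

ACCURACY_THRESHOLD = 0.85  # the 85% acceptance threshold stated in the text


def build_classifier(
    train: Callable[[Dict[str, Any]], Any],
    evaluate: Callable[[Any], float],
    candidate_params: Iterable[Dict[str, Any]],
    threshold: float = ACCURACY_THRESHOLD,
) -> Optional[Tuple[Any, Dict[str, Any], float]]:
    """Return the first (model, params, accuracy) that meets the threshold.

    train(params)   -> model     (step A; assumed hook, e.g. CNN/RNN training)
    evaluate(model) -> accuracy  (step B; assumed hook, scored on test data)
    Returns None when no candidate setting reaches the threshold.
    """
    for params in candidate_params:
        model = train(params)           # step A: train with current parameters
        accuracy = evaluate(model)      # step B: measure accuracy on test data
        if accuracy >= threshold:
            return model, params, accuracy
        # Otherwise fall through: try the next parameter setting (back to step A).
    return None
```

A caller might pass settings such as {"learning_rate": 1e-3, "num_layers": 2}, matching the two parameters the text mentions; bounding the candidates keeps the loop from running forever if the threshold is never reached.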
Further, before obtaining the classification results for the page information of the domain name and its subdomains using the pre-established text classification machine learning model, the method comprises:
Removing the code information from the page information of the domain name and its subdomains.
For example, because the web crawler returns the HTML source code of a web page, the page content obtained by the web crawler needs to be cleaned and normalized, and the key information of the page is extracted: title, keywords, description and body text.
For example, the title, keywords, description and body text key information that can be extracted from the domain name www.icbc.com.cn are:
Title: China, Industrial and Commercial Bank of China website;
Keywords: online funds, online stocks, online precious metals, online gold, online wealth management, online insurance, online foreign exchange, online futures, online bonds, expert commentary, financial news, e-banking, online banking, telephone banking, mobile banking, online payment, online fee payment, personal finance, bank cards, corporate business, institutional business, asset custody, supplementary pension, investment banking, asset disposal, online mall, ICBC learning centre, original stage, E move the world, financial consulting, focus, online forum, ICBC style and features, ICBC news flash, ICBC in the media, financial information, important announcements, promotions, customer service, financial supermarket;
Description: a comprehensive introduction to ICBC financial services, with rich and complete investment and wealth management information and convenient, fast online transactions, meeting customers' demand for specialized, diversified and user-friendly financial services and building a comprehensive financial service platform that integrates business, information, transactions, shopping and interaction;
Body text: personal customers, corporate customers, global main site, branches, service outlets, customer service, recruitment, Traditional Chinese, EN, please enter a keyword, account services, deposits and loans, credit cards, investment and wealth management, private banking, financial markets, personal online banking login, registration, business guide, online banking assistant, client download, security zone, guarding against fake websites, corporate online banking login, registration, business guide, online banking assistant, demo, Rong e Gou e-commerce platform, personal store, enterprise store; important announcements: announcement on the sale of the additional over-the-counter issuance of China Development Bank's 2018 first-phase, second-phase and third-phase financial bonds, announcement on the sale of the additional over-the-counter issuance of China Development Bank's 2018 first-phase and second-phase and 2017 ninth-phase financial bonds, etc.;
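The cleaning step described above (dropping code and keeping title, keywords, description and body text) can be sketched with Python's standard html.parser module. The class and function names are hypothetical, and treating the contents of script and style elements as the "code information" to remove is an assumption about what the text means.

```python
from html.parser import HTMLParser
from typing import Dict, List


class PageInfoParser(HTMLParser):
    """Collect title, meta keywords/description and visible text from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.description = ""
        self.text_parts: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").strip()
            if name == "keywords":
                self.keywords = content
            elif name == "description":
                self.description = content

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # drop code: script and style contents
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def extract_page_info(html: str) -> Dict[str, str]:
    """Return the four key fields used as classifier input."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {
        "title": parser.title,
        "keywords": parser.keywords,
        "description": parser.description,
        "text": " ".join(parser.text_parts),
    }
```

The four returned fields could then be concatenated into a single string before being passed to the classifier; how the fields are combined is not stated in the text.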
Further, the process of obtaining the IP address sets corresponding to the domain name and its subdomains comprises:
Following the principle of DNS resolution, resolving a domain name or one of its subdomains using at least one DNS server to obtain at least one corresponding IP address, and building the IP address set of that domain name or subdomain from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses of the domain name or its subdomain.
For example, the process of obtaining the IP address sets corresponding to the domain name www.icbc.com.cn and its subdomains may include:
Performing DNS resolution on the domain name www.icbc.com.cn and on each of its subdomains using 15 DNS servers deployed in China and abroad, which yields 531 IP addresses after deduplication;
The DNS servers may be, for example, 114.114.114.114 and 8.8.8.8.
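Building an IP address set from several DNS servers amounts to taking the union of their answers. The sketch below shows only that aggregation step. The resolve callable is a hypothetical hook: Python's standard library cannot direct a query at a chosen DNS server, so a real deployment would need a DNS client library behind this hook, and the choice to skip servers that raise OSError is likewise an assumption.

```python
from typing import Callable, Iterable, Set


def resolve_ip_set(
    domain: str,
    servers: Iterable[str],
    resolve: Callable[[str, str], Iterable[str]],
) -> Set[str]:
    """Union the A-record answers for `domain` across several DNS servers.

    resolve(domain, server) is an assumed hook that queries one server and
    returns the IP addresses it answered with.
    """
    ips: Set[str] = set()
    for server in servers:
        try:
            ips.update(resolve(domain, server))  # set union deduplicates
        except OSError:
            # Assumption: an unreachable or failing server is simply skipped.
            continue
    return ips
```

Querying servers in different networks matters because a domain behind a CDN or load balancer can return different addresses depending on where the query originates, which is why the union over servers can be much larger than any single answer.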
In embodiments of the invention, steps 101 to 103 may be re-executed with a period of 30 days to update the "IP-domain name-service category" mapping.
The present invention also provides an IP address service attribute labeling device which, as shown in Fig. 2, comprises:
A first acquisition unit, for obtaining the subdomains of a domain name and the page information of the domain name and its subdomains;
A second acquisition unit, for obtaining classification results for the page information of the domain name and its subdomains using a pre-established text classification machine learning model;
A labeling unit, for labeling the category attribute of the IP address sets corresponding to the domain name and its subdomains using the classification results of the page information of the domain name and its subdomains.
Further, the first acquisition unit comprises:
A first judgment module, for judging whether the domain name is valid; if the domain name is valid, the second judgment module is executed, otherwise the operation ends;
A second judgment module, for obtaining the homepage information of the domain name using a web crawler; if the page content of the homepage information is empty, the operation ends, otherwise the obtaining module is executed;
An obtaining module, for obtaining the subdomains in the homepage information by regular expression matching, and outputting the subdomains;
A loop module, for repeating the first judgment module through the obtaining module for each subdomain until no nested subdomains remain in the subdomains.
Further, the process of establishing the pre-established text classification machine learning model involves:
A training module, for using historical page information whose category attribute has already been labeled as the training data and test data of the text classification machine learning model, and training the text classification machine learning model with the training data;
A test module, for testing the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, the text classification machine learning model is output; if not, the parameters of the text classification machine learning model are modified and control returns to the training module;
Wherein the text classification machine learning model is a CNN/RNN-based text classification algorithm, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers.
Further, before obtaining the classification results for the page information of the domain name and its subdomains using the pre-established text classification machine learning model, the device performs:
Removing the code information from the page information of the domain name and its subdomains.
Further, the process of obtaining the IP address sets corresponding to the domain name and its subdomains comprises:
Following the principle of DNS resolution, resolving a domain name or one of its subdomains using at least one DNS server to obtain at least one corresponding IP address, and building the IP address set of that domain name or subdomain from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses of the domain name or its subdomain.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall be covered by the claims of the invention.
Claims (10)
1. A method for calibrating the service attribute of IP addresses, characterized in that the method comprises:
obtaining the subdomain names of a domain name, and the page information of the domain name and its subdomain names;
obtaining the classification results of the page information of the domain name and its subdomain names using a pre-established text classification machine learning model;
calibrating the category attribute of the IP address set corresponding to the domain name and its subdomain names using the classification results of the page information of the domain name and its subdomain names.
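The three steps of claim 1 can be read as a small pipeline. The following is a minimal sketch under stated assumptions, not the applicant's implementation: `calibrate_ip_attributes` and its three callable parameters (`get_pages`, `classify`, `resolve_ips`) are hypothetical names introduced here to stand in for the crawler, the pre-established classifier, and the DNS lookup.

```python
from typing import Callable, Dict, Set


def calibrate_ip_attributes(
    domain: str,
    get_pages: Callable[[str], Dict[str, str]],
    classify: Callable[[str], str],
    resolve_ips: Callable[[str], Set[str]],
) -> Dict[str, str]:
    """Map each IP address of a domain and its subdomains to a category.

    All three callables are hypothetical stand-ins:
    get_pages(domain) -> {name: page text} for the domain and its subdomains,
    classify(text)    -> category label from a pre-established model,
    resolve_ips(name) -> set of IP addresses for that name.
    """
    labels: Dict[str, str] = {}
    for name, page_text in get_pages(domain).items():   # step 1: pages
        category = classify(page_text)                   # step 2: classify
        for ip in resolve_ips(name):                     # step 3: label IP set
            labels[ip] = category
    return labels
```

If one IP address serves several names with different categories, this sketch keeps the last label assigned; the source does not say how such conflicts should be resolved.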
2. The method according to claim 1, characterized in that obtaining the subdomain names of the domain name and the page information of the domain name and its subdomain names comprises:
a. determining whether the domain name is valid; if the domain name is valid, performing step b, otherwise ending the operation;
b. obtaining the home page information of the domain name using a web crawler; if the page content of the home page information is empty, ending the operation, otherwise performing step c;
c. obtaining the subdomain names in the home page information by regular expression matching, and outputting the subdomain names;
d. repeating steps a to c for each subdomain name until no nested subdomain names remain in the subdomain names.
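Steps a to d can be sketched as a worklist loop. This is an illustration only: the validity rule, the regular expression, and the names `is_valid_domain`, `collect_pages`, and `fetch` are assumptions made here, since the claim does not specify them. The crawler is passed in as a callback so the sketch stays independent of any particular HTTP library.

```python
import re
from typing import Callable, Dict, Set

# Assumed validity rule: dot-separated labels of letters, digits and hyphens.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_DOMAIN_RE = re.compile(rf"^(?:{_LABEL}\.)+[a-z]{{2,}}$")


def is_valid_domain(name: str) -> bool:
    return len(name) <= 253 and _DOMAIN_RE.match(name.lower()) is not None


def collect_pages(domain: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Return {name: home page text} for a domain and its nested subdomains.

    `fetch` is a hypothetical crawler callback that returns the home page
    of a name as text ("" when the page is empty or unreachable).
    """
    pages: Dict[str, str] = {}
    seen: Set[str] = set()
    pending = [domain.lower()]
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        if not is_valid_domain(name):        # step a: validity check
            continue
        text = fetch(name)                   # step b: fetch the home page
        if not text:
            continue
        pages[name] = text
        # step c: host names in the page that end with ".<name>"
        pattern = re.compile(rf"\b((?:{_LABEL}\.)+{re.escape(name)})\b", re.I)
        for sub in pattern.findall(text):    # step d: repeat for each subdomain
            pending.append(sub.lower())
    return pages
```

The `seen` set is an addition of this sketch; it keeps the loop from revisiting names when pages link to each other.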
3. The method according to claim 1, characterized in that the process of establishing the pre-established text classification machine learning model comprises:
A. using historical page information whose category attribute has already been calibrated as the training data and test data of the text classification machine learning model, and training the text classification machine learning model with the training data;
B. testing the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, outputting the text classification machine learning model; if not, modifying the parameters of the text classification machine learning model and returning to step A;
wherein the text classification machine learning model is a text classification algorithm based on CNN/RNN, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers.
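The train, test, and retry loop of steps A and B can be sketched without committing to a particular deep learning library. In the sketch below, `train` is a hypothetical callback that would build and fit a CNN or RNN classifier for a given learning rate and layer count and return a prediction function. Trying parameter combinations from a fixed grid and returning `None` when none reaches the threshold are assumptions of this sketch; the claim only says the parameters are modified and training is repeated.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Sequence, Tuple

Sample = Tuple[str, str]                 # (page text, category label)
Predictor = Callable[[str], str]         # page text -> predicted label
Trainer = Callable[[Sequence[Sample], float, int], Predictor]


def accuracy(predict: Predictor, data: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if not data:
        return 0.0
    return sum(predict(text) == label for text, label in data) / len(data)


def train_until_accurate(
    train: Trainer,
    train_data: Sequence[Sample],
    test_data: Sequence[Sample],
    learning_rates: Iterable[float],
    layer_counts: Iterable[int],
    threshold: float = 0.85,
) -> Optional[Predictor]:
    """Repeat step A and step B until test accuracy reaches the threshold."""
    for lr, layers in product(learning_rates, layer_counts):
        model = train(train_data, lr, layers)            # step A: train
        if accuracy(model, test_data) >= threshold:      # step B: test
            return model                                 # accurate enough
    return None                                          # grid exhausted
```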
4. The method according to claim 1, characterized in that, before obtaining the classification results of the page information of the domain name and its subdomain names using the pre-established text classification machine learning model, the method comprises:
removing the code information from the page information of the domain name and its subdomain names.
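The claim does not define "code information". One plausible reading, assumed here, is HTML markup together with the bodies of `<script>` and `<style>` elements. Under that assumption, a minimal sketch using Python's standard `html.parser` module could look as follows; `strip_code` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import List


class _TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_code(page_html: str) -> str:
    """Return the page text with tags, scripts and styles removed."""
    parser = _TextOnly()
    parser.feed(page_html)
    parser.close()
    return parser.text()
```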
5. The method according to claim 1, characterized in that the process of obtaining the IP address set corresponding to the domain name and its subdomain names comprises:
according to the DNS resolution principle, resolving a domain name or its subdomain name using at least one DNS server to obtain at least one IP address corresponding to the domain name or its subdomain name, and building the IP address set corresponding to the domain name or its subdomain name from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses corresponding to the domain name or its subdomain name.
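Because different DNS servers may return different addresses for the same name, the IP address set is the union of what each server answers. The sketch below assumes each DNS server is represented by a hypothetical lookup callback (how a callback queries its server is outside the sketch) and uses the standard `ipaddress` module only to validate and normalise the answers. `build_ip_set` is a name introduced here.

```python
import ipaddress
from typing import Callable, Dict, Iterable, Set

Resolver = Callable[[str], Iterable[str]]  # name -> IP address strings


def build_ip_set(name: str, resolvers: Dict[str, Resolver]) -> Set[str]:
    """Union of the addresses that each DNS server returns for `name`.

    `resolvers` maps a DNS server label to a hypothetical lookup callback.
    """
    ips: Set[str] = set()
    for lookup in resolvers.values():
        try:
            answers = list(lookup(name))
        except OSError:
            continue  # this server failed; move on to the next one
        for answer in answers:
            try:
                ips.add(str(ipaddress.ip_address(answer)))  # validate, normalise
            except ValueError:
                pass  # ignore anything that is not an IP address
    return ips
```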
6. A device for calibrating the service attribute of IP addresses, characterized in that the device comprises:
a first acquisition unit, configured to obtain the subdomain names of a domain name, and the page information of the domain name and its subdomain names;
a second acquisition unit, configured to obtain the classification results of the page information of the domain name and its subdomain names using a pre-established text classification machine learning model;
a calibration unit, configured to calibrate the category attribute of the IP address set corresponding to the domain name and its subdomain names using the classification results of the page information of the domain name and its subdomain names.
7. The device according to claim 6, characterized in that the first acquisition unit comprises:
a first judgment module, configured to determine whether the domain name is valid; if the domain name is valid, the second judgment module is executed, otherwise the operation ends;
a second judgment module, configured to obtain the home page information of the domain name using a web crawler; if the page content of the home page information is empty, the operation ends, otherwise the acquisition module is executed;
an acquisition module, configured to obtain the subdomain names in the home page information by regular expression matching, and to output the subdomain names;
a loop module, configured to repeat the first judgment module through the acquisition module for each subdomain name until no nested subdomain names remain in the subdomain names.
8. The device according to claim 6, characterized in that the process of establishing the pre-established text classification machine learning model comprises:
a training module, configured to use historical page information whose category attribute has already been calibrated as the training data and test data of the text classification machine learning model, and to train the text classification machine learning model with the training data;
a test module, configured to test the accuracy of the text classification machine learning model with the test data; if the accuracy of the text classification machine learning model reaches 85% or more, the text classification machine learning model is output; if not, the parameters of the text classification machine learning model are modified and control returns to the training module;
wherein the text classification machine learning model is a text classification algorithm based on CNN/RNN, and the parameters of the text classification machine learning model may be the learning rate and the number of neural network layers.
9. The device according to claim 6, characterized in that, before the classification results of the page information of the domain name and its subdomain names are obtained using the pre-established text classification machine learning model, the following is performed:
removing the code information from the page information of the domain name and its subdomain names.
10. The device according to claim 6, characterized in that the process of obtaining the IP address set corresponding to the domain name and its subdomain names comprises:
according to the DNS resolution principle, resolving a domain name or its subdomain name using at least one DNS server to obtain at least one IP address corresponding to the domain name or its subdomain name, and building the IP address set corresponding to the domain name or its subdomain name from the at least one IP address, wherein the DNS servers correspond one-to-one with the IP addresses corresponding to the domain name or its subdomain name.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810970182.3A CN109388710A (en) | 2018-08-24 | 2018-08-24 | A kind of IP address service attribute scaling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810970182.3A CN109388710A (en) | 2018-08-24 | 2018-08-24 | A kind of IP address service attribute scaling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109388710A (en) | 2019-02-26 |
Family ID=65417571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810970182.3A (Pending) CN109388710A (en) | A kind of IP address service attribute scaling method and device | 2018-08-24 | 2018-08-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109388710A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110795434A (en) * | 2019-10-30 | 2020-02-14 | 北京邮电大学 | Method and device for constructing service attribute database |
CN112149743A (en) * | 2020-09-25 | 2020-12-29 | 杭州安恒信息技术股份有限公司 | Access control method, device, equipment and medium |
CN112929458A (en) * | 2019-12-06 | 2021-06-08 | 中国电信股份有限公司 | Method and device for determining address of server of APP (application) and storage medium |
CN113076453A (en) * | 2021-03-22 | 2021-07-06 | 鹏城实验室 | Domain name classification method, device and computer readable storage medium |
CN113596194A (en) * | 2021-08-02 | 2021-11-02 | 牙木科技股份有限公司 | Method for DNS traffic classification calibration and DNS server |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103051742A (en) * | 2012-12-20 | 2013-04-17 | 新浪网技术(中国)有限公司 | IP (Internet Protocol) address attribute determining method, page processing method, relevant equipment and system |
CN103404182A (en) * | 2012-12-26 | 2013-11-20 | 华为技术有限公司 | Method and apparatus for preventing illegal access of business |
CN103684856A (en) * | 2013-11-27 | 2014-03-26 | 江苏省未来网络创新研究院 | Video website infrastructure measurement and analysis method |
JP2014230139A (en) * | 2013-05-23 | 2014-12-08 | Kddi株式会社 | Service estimation device and method |
US20150304199A1 (en) * | 2014-04-16 | 2015-10-22 | Jds Uniphase Corporation | Categorizing ip-based network traffic using dns data |
CN105516390A (en) * | 2015-12-23 | 2016-04-20 | 北京奇虎科技有限公司 | Method and device for managing domain name |
CN107404495A (en) * | 2017-09-01 | 2017-11-28 | 北京亚鸿世纪科技发展有限公司 | A kind of device based on IP address portrait |
CN108256104A (en) * | 2018-02-05 | 2018-07-06 | 恒安嘉新(北京)科技股份公司 | Internet site compressive classification method based on multidimensional characteristic |
Non-Patent Citations (1)
Title |
---|
高志强 等: "深度学习从入门到实战", vol. 1, 北京航空航天大学出版社, pages: 204 - 208 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110795434A (en) * | 2019-10-30 | 2020-02-14 | 北京邮电大学 | Method and device for constructing service attribute database |
CN112929458A (en) * | 2019-12-06 | 2021-06-08 | 中国电信股份有限公司 | Method and device for determining address of server of APP (application) and storage medium |
CN112929458B (en) * | 2019-12-06 | 2023-04-07 | 中国电信股份有限公司 | Method and device for determining address of server of APP (application) and storage medium |
CN112149743A (en) * | 2020-09-25 | 2020-12-29 | 杭州安恒信息技术股份有限公司 | Access control method, device, equipment and medium |
CN113076453A (en) * | 2021-03-22 | 2021-07-06 | 鹏城实验室 | Domain name classification method, device and computer readable storage medium |
CN113076453B (en) * | 2021-03-22 | 2024-10-18 | 鹏城实验室 | Domain name classification method, device and computer readable storage medium |
CN113596194A (en) * | 2021-08-02 | 2021-11-02 | 牙木科技股份有限公司 | Method for DNS traffic classification calibration and DNS server |
CN113596194B (en) * | 2021-08-02 | 2023-07-21 | 牙木科技股份有限公司 | Method for classifying and calibrating DNS traffic and DNS server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226 |