CN108959289A - Website category acquisition method and device - Google Patents

Publication number
CN108959289A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710351636.4A
Other languages
Chinese (zh)
Other versions
CN108959289B (en)
Inventor
林霞霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710351636.4A
Publication of CN108959289A
Application granted
Publication of CN108959289B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/07 Guided tours

Abstract

This application discloses a website category acquisition method and device. One specific embodiment of the method includes: obtaining an order data set and an access data set of a target website within a first preset time period; analyzing the order data set and the access data set, selecting order data from the order data set to generate a target order data set, and selecting access data from the access data set to generate a target access data set; extracting a feature vector from the target order data set and the target access data set; and inputting the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website, where the website classification model characterizes the correspondence between a website's feature vector and the website's second-level category. This embodiment improves website classification efficiency.

Description

Website category acquisition method and device
Technical field
This application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to a website category acquisition method and device.
Background
With the spread of the Internet, the advantages of online shopping have become more prominent. The number of users who shop online keeps rising, and websites of various types (such as online stores) keep emerging.
Websites of the same type may have different business models. According to their business models, websites of the same type can be further divided into different categories.
However, existing website classification is usually performed by technicians through manual analysis, so classification efficiency is low.
Summary of the invention
The purpose of the embodiments of this application is to propose an improved website category acquisition method and device that solve the technical problem mentioned in the Background section above.
In a first aspect, an embodiment of this application provides a website category acquisition method. The method includes: obtaining an order data set and an access data set of a target website within a first preset time period; analyzing the order data set and the access data set, selecting order data from the order data set to generate a target order data set, and selecting access data from the access data set to generate a target access data set; extracting a feature vector from the target order data set and the target access data set; and inputting the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website, where the website classification model characterizes the correspondence between a website's feature vector and the website's second-level category.
In some embodiments, the feature vector includes at least one of the following: the order volume of the target website, the order amount of the target website, the number of visits to the target website, and the page views of the target website.
In some embodiments, after the feature vector is input into the pre-trained website classification model for classification and the second-level category of the target website is obtained, the method further includes: querying a first mapping table to obtain the first-level category to which the second-level category of the target website belongs, where the first mapping table stores second-level categories and the first-level categories to which they belong; obtaining the initial first-level category that the target website submitted at registration; determining whether the first-level category to which the second-level category of the target website belongs is the same as the initial first-level category; and, if they differ, outputting abnormality prompt information.
In some embodiments, after the feature vector is input into the pre-trained website classification model for classification and the second-level category of the target website is obtained, the method further includes: querying a second mapping table to obtain the peak ordering period corresponding to the second-level category of the target website, where the second mapping table stores second-level categories and their corresponding peak ordering periods; and outputting the peak ordering period corresponding to the second-level category of the target website.
In some embodiments, the method further includes a step of establishing the website classification model, which includes: separately obtaining order data sets and access data sets of multiple websites within a second preset time period; analyzing the order data sets and access data sets of the multiple websites, selecting order data from the order data sets of the multiple websites to generate multiple sample order data sets, and selecting access data from the access data sets of the multiple websites to generate multiple sample access data sets; extracting multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets respectively; and clustering the multiple sample feature vectors to obtain the website classification model.
In some embodiments, analyzing the order data sets and access data sets of the multiple websites, selecting order data to generate multiple sample order data sets, and selecting access data to generate multiple sample access data sets includes: deleting order data and access data with missing fields from the order data sets and access data sets of the multiple websites to obtain first order data sets and first access data sets of the multiple websites; deduplicating the first order data sets and the first access data sets respectively to obtain second order data sets and second access data sets of the multiple websites; and denoising the second order data sets and second access data sets based on a preset first number of clusters to obtain the multiple sample order data sets and the multiple sample access data sets.
In some embodiments, extracting multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets respectively includes: normalizing the multiple sample order data sets and the multiple sample access data sets respectively to obtain multiple normalized sample order data sets and multiple normalized sample access data sets; and generating the first-derivative sets corresponding to the normalized sample order data sets and the first-derivative sets corresponding to the normalized sample access data sets, and using them as the multiple sample feature vectors.
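The normalization and first-derivative step can be illustrated with a minimal sketch. The source does not say which normalization is used or how the derivative is computed, so min-max scaling and consecutive differences are assumptions here, as are the function names and the sample daily counts.

```python
def min_max_normalize(series):
    """Scale a numeric series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def first_difference(series):
    """Discrete stand-in for the first derivative: consecutive differences."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_feature_vector(order_counts, visit_counts):
    """Concatenate the first differences of the normalized order and visit series."""
    orders = first_difference(min_max_normalize(order_counts))
    visits = first_difference(min_max_normalize(visit_counts))
    return orders + visits


# Hypothetical daily order and visit counts for one website.
vector = sample_feature_vector([10, 20, 40, 30], [100, 300, 200, 500])
```

Using differences of normalized series makes the vector describe the shape of a website's activity over time rather than its absolute scale, which is one plausible reason for taking derivatives before clustering.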
In some embodiments, clustering the multiple sample feature vectors to obtain the website classification model includes: performing hierarchical clustering on the multiple sample feature vectors with a hierarchical clustering method, based on a preset second number of clusters and a preset distance parameter, to obtain the website classification model.
In some embodiments, the hierarchical clustering method includes at least one of the following: the shortest-distance method, the longest-distance method, the average-distance method, and the centroid-distance method.
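A minimal sketch of agglomerative hierarchical clustering with the four linkage rules listed above follows. It is a plain-Python illustration, not the patented implementation: the Euclidean metric stands in for the "preset distance parameter", `n_clusters` stands in for the "preset second number of clusters", and all names and sample vectors are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters of points under the chosen linkage."""
    if linkage == "centroid":        # centroid-distance method
        return euclidean(centroid(c1), centroid(c2))
    pairwise = [euclidean(p, q) for p in c1 for q in c2]
    if linkage == "single":          # shortest-distance method
        return min(pairwise)
    if linkage == "complete":        # longest-distance method
        return max(pairwise)
    if linkage == "average":         # average-distance method
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(vectors, n_clusters, linkage="average"):
    """Merge the closest pair of clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[v] for v in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional sample feature vectors.
vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [9.0, 0.0]]
clusters = agglomerative(vectors, n_clusters=3, linkage="single")
```

Each resulting cluster would then be treated as one second-level category. The quadratic search over cluster pairs keeps the sketch short; a production system would more likely rely on an existing clustering library.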
In a second aspect, an embodiment of this application provides a website category acquisition device. The device includes: an acquiring unit, configured to obtain an order data set and an access data set of a target website within a first preset time period; a selection unit, configured to analyze the order data set and the access data set, select order data from the order data set to generate a target order data set, and select access data from the access data set to generate a target access data set; an extraction unit, configured to extract a feature vector from the target order data set and the target access data set; and a classification unit, configured to input the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website, where the website classification model characterizes the correspondence between a website's feature vector and the website's second-level category.
In some embodiments, the feature vector includes at least one of the following: the order volume of the target website, the order amount of the target website, the number of visits to the target website, and the page views of the target website.
In some embodiments, the device further includes: a first query unit, configured to query a first mapping table to obtain the first-level category to which the second-level category of the target website belongs, where the first mapping table stores second-level categories and the first-level categories to which they belong; a category acquiring unit, configured to obtain the initial first-level category that the target website submitted at registration; a determination unit, configured to determine whether the first-level category to which the second-level category of the target website belongs is the same as the initial first-level category; and a first output unit, configured to output abnormality prompt information if they differ.
In some embodiments, the device further includes: a second query unit, configured to query a second mapping table to obtain the peak ordering period corresponding to the second-level category of the target website, where the second mapping table stores second-level categories and their corresponding peak ordering periods; and a second output unit, configured to output the peak ordering period corresponding to the second-level category of the target website.
In some embodiments, the device further includes a website classification model establishing unit, which includes: an acquiring subunit, configured to separately obtain order data sets and access data sets of multiple websites within a second preset time period; a selection subunit, configured to analyze the order data sets and access data sets of the multiple websites, select order data from the order data sets of the multiple websites to generate multiple sample order data sets, and select access data from the access data sets of the multiple websites to generate multiple sample access data sets; an extraction subunit, configured to extract multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets respectively; and a clustering subunit, configured to cluster the multiple sample feature vectors to obtain the website classification model.
In some embodiments, the selection subunit includes: a deletion module, configured to delete order data and access data with missing fields from the order data sets and access data sets of the multiple websites to obtain first order data sets and first access data sets of the multiple websites; a deduplication module, configured to deduplicate the first order data sets and the first access data sets respectively to obtain second order data sets and second access data sets of the multiple websites; and a denoising module, configured to denoise the second order data sets and second access data sets based on a preset first number of clusters to obtain the multiple sample order data sets and the multiple sample access data sets.
In some embodiments, the extraction subunit includes: a normalization module, configured to normalize the multiple sample order data sets and the multiple sample access data sets respectively to obtain multiple normalized sample order data sets and multiple normalized sample access data sets; and a derivation module, configured to generate the first-derivative sets corresponding to the normalized sample order data sets and the first-derivative sets corresponding to the normalized sample access data sets, and to use them as the multiple sample feature vectors.
In some embodiments, the clustering subunit is further configured to: perform hierarchical clustering on the multiple sample feature vectors with a hierarchical clustering method, based on a preset second number of clusters and a preset distance parameter, to obtain the website classification model.
In some embodiments, the hierarchical clustering method includes at least one of the following: the shortest-distance method, the longest-distance method, the average-distance method, and the centroid-distance method.
In a third aspect, an embodiment of this application provides a server. The server includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the method described in any implementation of the first aspect.
The website category acquisition method and device provided by the embodiments of this application obtain an order data set and an access data set of a target website within a first preset time period and analyze them to generate a target order data set and a target access data set; then extract a feature vector from the target order data set and the target access data set; and finally input the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website. Classifying websites with the website classification model improves website classification efficiency.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which this application can be applied;
Fig. 2 is a flowchart of one embodiment of the website category acquisition method according to this application;
Fig. 3 is a flowchart of one embodiment of the method for establishing the website classification model according to this application;
Fig. 4 is a structural schematic diagram of one embodiment of the website category acquisition device according to this application;
Fig. 5 is a structural schematic diagram of a computer system suitable for implementing the server of the embodiments of this application.
Detailed description of the embodiments
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of this application and the features in them can be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the website category acquisition method or the website category acquisition device of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include a terminal device 101, a database server 102, a network 103, and a server 104. The network 103 is the medium that provides communication links between the terminal device 101, the database server 102, and the server 104. The network 103 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user can use the terminal device 101 to interact with the server 104 through the network 103 to receive or send messages. For example, the user can use the terminal device 101 to send the order data set and the access data set of the target website within the first preset time period to the server 104 through the network 103. The terminal device 101 can be any of various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers.
The database server 102 can store the order data set and the access data set of the target website within the first preset time period, so that the server 104 can obtain them from the database server 102 through the network 103.
The server 104 can be a server that provides various services. For example, the server 104 can obtain the order data set and the access data set of the target website within the first preset time period from the terminal device 101 or the database server 102, analyze and otherwise process them, and output the processing result (such as the second-level category of the target website).
It should be noted that the website category acquisition method provided by the embodiments of this application is generally executed by the server 104; correspondingly, the website category acquisition device is generally located in the server 104.
It should be understood that the numbers of terminal devices, database servers, networks, and servers in Fig. 1 are only illustrative. There can be any number of each, as the implementation requires. If the server 104 already stores the order data set and the access data set of the target website within the first preset time period, the system architecture 100 need not include the terminal device 101 and the database server 102.
With continued reference to Fig. 2, it shows a flow 200 of one embodiment of the website category acquisition method according to this application. The website category acquisition method includes the following steps:
Step 201, obtain the order data set and the access data set of the target website within the first preset time period.
In this embodiment, the electronic device on which the website category acquisition method runs (such as the server 104 shown in Fig. 1) can obtain the order data set and the access data set of the target website within the first preset time period (for example, within a certain day, week, or month). A website generally refers to a set of related web pages that are built on the Internet according to certain rules, using tools such as HTML (HyperText Markup Language), to present specific content. For example, a website can be an online store on an e-commerce platform, and the target website can be a particular online store on an e-commerce platform.
In this embodiment, the order data set can be the set of data related to users' orders on the target website. Each piece of order data can include, but is not limited to: information about the target website (for example, its name, telephone number, and address), information about the ordering user (for example, the user's account name, telephone number, and address), and information about the ordered item (for example, its name, SKU (Stock Keeping Unit) number, category, and price). The access data set can be the set of data related to users' visits to the target website. Each piece of access data can include, but is not limited to: information about the target website (for example, its name, telephone number, and address), information about the visiting user (for example, the user's account name, telephone number, and address), and information about the visited item (for example, its name, SKU number, category, and price).
It should be noted that the electronic device can obtain the order data set and the access data set of the target website within the first preset time period locally, from a terminal with which it has a communication connection (such as the terminal device 101 shown in Fig. 1), or from a database server with which it has a communication connection (such as the database server 102 shown in Fig. 1). This embodiment does not limit where the electronic device obtains these data sets.
Step 202, analyze the order data set and the access data set, select order data from the order data set to generate the target order data set, and select access data from the access data set to generate the target access data set.
In this embodiment, based on the order data set and the access data set obtained in step 201, the electronic device can analyze the two sets, obtain the target order data set from the order data set, and obtain the target access data set from the access data set.
In this embodiment, the electronic device can obtain the target order data set and the target access data set in several ways.
In some optional implementations of this embodiment, the electronic device can randomly select several pieces of order data from the order data set to generate the target order data set, and randomly select several pieces of access data from the access data set to generate the target access data set.
In some optional implementations of this embodiment, the electronic device can first delete the order data and access data with missing fields from the order data set and the access data set, and then deduplicate the order data set and the access data set respectively, to obtain the target order data set and the target access data set.
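The delete-then-deduplicate variant can be sketched as follows. The field names, the rule that an empty string counts as missing, and the sample records are all assumptions made for illustration; the source only states that records with missing fields are deleted and duplicates removed.

```python
REQUIRED_FIELDS = ("site", "user", "sku", "price")  # hypothetical field names


def drop_incomplete(records, required=REQUIRED_FIELDS):
    """Keep only records where every required field is present and non-empty."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in required)]


def deduplicate(records):
    """Remove exact duplicate records while preserving the original order."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def build_target_set(records):
    """Apply the two cleaning steps in the order the text describes."""
    return deduplicate(drop_incomplete(records))


orders = [
    {"site": "shop-a", "user": "u1", "sku": "S1", "price": 9.9},
    {"site": "shop-a", "user": "u1", "sku": "S1", "price": 9.9},   # duplicate
    {"site": "shop-a", "user": "u2", "sku": None, "price": 5.0},   # missing field
    {"site": "shop-a", "user": "u3", "sku": "S2", "price": 12.0},
]
target_orders = build_target_set(orders)
```

The same two functions would apply unchanged to access records, since both steps depend only on the record fields.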
Step 203, extract a feature vector from the target order data set and the target access data set.
In this embodiment, based on the target order data set and the target access data set generated in step 202, the electronic device can extract a feature vector from the two sets. As an example, the electronic device can perform statistical analysis on the target order data set to obtain the order volume of the target website, and on the target access data set to obtain the number of visits to the target website. The electronic device can then use the order volume and the number of visits as the feature vector; it can also normalize the two values and use the normalized order volume and number of visits as the feature vector.
In some optional implementations of this embodiment, the feature vector can include, but is not limited to, at least one of the following: the order volume of the target website, the order amount of the target website, the number of visits to the target website, and the page views of the target website.
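As a rough sketch, the four listed features could be computed from the cleaned records as shown below. The record fields (`price`, `quantity`, `session_id`) and the rule that each access record counts as one page view are assumptions; the source does not define how the statistics are derived.

```python
def extract_feature_vector(orders, visits):
    """Build [order volume, order amount, visit count, page views] for one website."""
    order_volume = len(orders)
    # Assumed: order amount is price times quantity, with quantity defaulting to 1.
    order_amount = sum(o["price"] * o.get("quantity", 1) for o in orders)
    # Assumed: one visit per distinct session, one page view per access record.
    visit_count = len({v["session_id"] for v in visits})
    page_views = len(visits)
    return [order_volume, order_amount, visit_count, page_views]


# Hypothetical target order and access records.
orders = [{"price": 10.0, "quantity": 2}, {"price": 5.0}]
visits = [{"session_id": "a"}, {"session_id": "a"}, {"session_id": "b"}]
features = extract_feature_vector(orders, visits)
```

Because these four values sit on very different scales, normalizing them before classification, as the text suggests, keeps one feature from dominating distance calculations.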
Step 204, input the feature vector into the pre-trained website classification model for classification to obtain the second-level category of the target website.
In this embodiment, based on the feature vector extracted in step 203, the electronic device can input the feature vector into the pre-trained website classification model for classification to obtain the second-level category of the target website. The second-level category can be the business-model category of the website. For example, second-level categories can include, but are not limited to: the wholesale-and-retail model, the physical-store-plus-online-store model, the distribution model, and the purchasing-agent model.
In this embodiment, the website classification model can characterize the correspondence between a website's feature vector and the website's second-level category. The electronic device can establish the website classification model in several ways. For example, based on statistics over the feature vectors and second-level categories of a large number of websites, the electronic device can generate a mapping table that stores the correspondence between multiple feature vectors and second-level categories, and use that mapping table as the website classification model.
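A mapping-table model of this kind can be sketched as a nearest-entry lookup. The source does not say how a new feature vector is matched against the table, so choosing the closest stored vector by Euclidean distance is an assumption, and the table contents are hypothetical.

```python
import math

# Hypothetical mapping table: representative feature vector -> second-level category.
MAPPING_TABLE = [
    ([0.9, 0.2], "wholesale-and-retail"),
    ([0.3, 0.8], "distribution"),
    ([0.1, 0.1], "purchasing-agent"),
]


def classify(feature_vector, table=MAPPING_TABLE):
    """Return the category of the stored vector closest to feature_vector."""
    _, category = min(table, key=lambda entry: math.dist(entry[0], feature_vector))
    return category


category = classify([0.85, 0.25])
```

Here `category` is `"wholesale-and-retail"`, because the first table entry is the closest to the query vector.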
In some optional implementations of this embodiment, after obtaining the second-level category of the target website, the electronic device can first query the first mapping table to obtain the first-level category to which the second-level category of the target website belongs; then obtain the initial first-level category that the target website submitted at registration; then determine whether the first-level category to which the second-level category belongs is the same as the initial first-level category; and finally, if the two differ, output abnormality prompt information. The first mapping table can store second-level categories and the first-level categories to which they belong. The first-level category can be the type of the website: according to the category of goods a website trades in, websites can be divided into multiple types, for example electronics websites, book websites, food websites, medicine websites, and clothing websites. As an example, if the first-level category to which the second-level category of the target website belongs is medicine, and the initial first-level category that the target website submitted at registration is clothing, the electronic device can output abnormality prompt information indicating that the target website may have registered under false information.
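The consistency check can be sketched with a dictionary standing in for the first mapping table. The table entries, the function name, and the message format are hypothetical; only the compare-and-warn logic comes from the text.

```python
# Hypothetical first mapping table: second-level category -> first-level category.
FIRST_MAPPING = {
    "wholesale-and-retail": "clothing",
    "purchasing-agent": "medicine",
}


def check_registration(second_level, registered_first_level, mapping=FIRST_MAPPING):
    """Return an anomaly message if the inferred and registered categories differ."""
    inferred = mapping.get(second_level)
    if inferred is None:
        return None  # unknown second-level category: nothing to compare
    if inferred != registered_first_level:
        return (f"anomaly: inferred first-level category '{inferred}' differs from "
                f"registered '{registered_first_level}'")
    return None


warning = check_registration("purchasing-agent", "clothing")
```

A non-`None` result corresponds to the abnormality prompt information in the text; what happens next (manual review, for instance) is outside the scope of the source.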
In some optional implementations of this embodiment, after obtaining the second-level category of the target website, the electronic device can first query the second mapping table to obtain the peak ordering period corresponding to the second-level category of the target website, and then output that peak ordering period. The second mapping table can store second-level categories and their corresponding peak ordering periods. For each second-level category, a person skilled in the art can perform statistical analysis on the ordering times of a large number of websites to obtain the peak ordering period corresponding to that category.
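One simple way to build and query such a second mapping table is sketched below. Treating the peak period as the single hour of day with the most orders is an assumption, as are the category names and the sample ordering hours.

```python
from collections import Counter


def peak_hour(order_hours):
    """Return the hour of day (0-23) with the most orders."""
    hour, _ = Counter(order_hours).most_common(1)[0]
    return hour


def build_second_mapping(hours_by_category):
    """Map each second-level category to its peak ordering hour."""
    return {cat: peak_hour(hours) for cat, hours in hours_by_category.items()}


# Hypothetical ordering hours pooled across many websites per category.
second_mapping = build_second_mapping({
    "wholesale-and-retail": [9, 10, 10, 10, 15],
    "purchasing-agent": [20, 21, 21, 22],
})
peak = second_mapping["purchasing-agent"]
```

Once the table is built offline, the online step reduces to a dictionary lookup keyed by the second-level category returned by the classifier.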
The website category acquisition method provided by the embodiments of this application obtains the order data set and the access data set of the target website within the first preset time period and analyzes them to generate the target order data set and the target access data set; then extracts a feature vector from the target order data set and the target access data set; and finally inputs the feature vector into the pre-trained website classification model for classification to obtain the second-level category of the target website. Classifying websites with the website classification model improves website classification efficiency.
With further reference to Fig. 3, it illustrates the processes 300 of one embodiment of the method for the disaggregated model that sets up a web site.It should Set up a web site disaggregated model method process 300, comprising the following steps:
Step 301: obtain the order data sets and access data sets of multiple websites within a second preset time period.
In this embodiment, the electronic device (for example, the server 104 shown in Fig. 1) may obtain the order data sets and access data sets of multiple websites within a second preset time period (for example, a certain day, a certain week, or a certain month). A website may be an online store on an e-commerce platform.
Step 302: analyze the order data sets and access data sets of the multiple websites, select order data from the order data sets of the multiple websites to generate multiple sample order data sets, and select access data from the access data sets of the multiple websites to generate multiple sample access data sets.
In this embodiment, based on the order data sets and access data sets of the multiple websites obtained in step 301, the electronic device may analyze these sets, select order data from the order data sets of the multiple websites to generate multiple sample order data sets, and select access data from the access data sets of the multiple websites to generate multiple sample access data sets.
In this embodiment, the electronic device may obtain the multiple sample order data sets and sample access data sets in several ways.
In some optional implementations of this embodiment, for each of the multiple websites, the electronic device may randomly select several pieces of order data from the order data set of the website to generate the sample order data set of that website, and may randomly select several pieces of access data from the access data set of the website to generate the sample access data set of that website.
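The random selection described above can be sketched with the standard library. The function name `sample_records` and the `seed` parameter are assumptions added for reproducibility; the text does not say how many records are drawn or how.

```python
import random


def sample_records(records, k, seed=None):
    """Randomly pick up to k records without replacement."""
    rng = random.Random(seed)  # a seeded generator keeps runs repeatable
    return rng.sample(records, min(k, len(records)))
```

The same helper would be applied once to a website's order data set and once to its access data set.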
In some optional implementations of this embodiment, the electronic device may obtain the multiple sample order data sets and sample access data sets through the following steps.
First, the electronic device may delete the order data and access data with missing fields from the order data sets and access data sets of the multiple websites, obtaining first order data sets and first access data sets of the multiple websites. Specifically, for each piece of order data or access data of each website, the electronic device may determine whether the fields in that piece of data are complete and, if not, delete it.
Then, the electronic device may deduplicate the first order data sets and first access data sets of the multiple websites, obtaining second order data sets and second access data sets of the multiple websites. Specifically, for the first order data set or first access data set of each website, the electronic device may deduplicate the set to remove duplicate first order data from the first order data set or duplicate first access data from the first access data set.
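The two cleaning steps above (dropping records with missing fields, then removing duplicates) can be sketched as follows. The record layout (flat dictionaries with hashable values) and the field names in `REQUIRED_ORDER_FIELDS` are hypothetical; the application does not specify the schema.

```python
# Hypothetical schema for an order record.
REQUIRED_ORDER_FIELDS = ("order_id", "site_id", "amount", "timestamp")


def drop_incomplete(records, required_fields):
    """Keep only records in which every required field is present and non-empty."""
    return [
        r for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    ]


def deduplicate(records):
    """Remove exact duplicate records while preserving first-seen order."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))  # order-independent fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

Running `drop_incomplete` first yields the "first" data set, and `deduplicate` on that result yields the "second" data set in the terminology used above.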
Finally, the electronic device may denoise the second order data sets and second access data sets of the multiple websites based on a preset first cluster number (for example, a value between 12 and 17), obtaining multiple sample order data sets and multiple sample access data sets. Specifically, based on the first cluster number, the electronic device may apply a hierarchical clustering method to the second order data sets and second access data sets of the multiple websites, so as to remove the second order data sets and second access data sets of websites that may have been falsely registered, and use the second order data sets and second access data sets of the remaining websites as the multiple sample order data sets and multiple sample access data sets.
Step 303: extract multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets.
In this embodiment, based on the multiple sample order data sets and multiple sample access data sets generated in step 302, the electronic device may extract multiple sample feature vectors from them. A sample feature vector may include, but is not limited to, at least one of the following: the order volume of a website, the order amount of a website, the visit volume of a website, and the page views of a website. As an example, for each sample order data set, the electronic device may perform statistical analysis on the set to obtain its corresponding order volume; for each sample access data set, the electronic device may likewise perform statistical analysis to obtain its corresponding visit volume. The electronic device may then use the order volume corresponding to the sample order data set and the visit volume corresponding to the sample access data set as a sample feature vector.
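The statistical extraction described above might look like the sketch below. It assumes each order record carries an `amount` field and each access record represents one page view tagged with a `session_id`; these field names, and the choice to count distinct sessions as the visit volume, are assumptions for illustration.

```python
def feature_vector(orders, visits):
    """Build [order volume, order amount, visit volume, page views] for one website."""
    order_volume = len(orders)
    order_amount = sum(o["amount"] for o in orders)
    # Assumption: one access record per page view, grouped into sessions.
    visit_volume = len({v["session_id"] for v in visits})
    page_views = len(visits)
    return [order_volume, order_amount, visit_volume, page_views]
```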
In some optional implementations of this embodiment, the electronic device may extract the multiple sample feature vectors through the following steps.
First, the electronic device may normalize the multiple sample order data sets and the multiple sample access data sets, obtaining multiple normalized sample order data sets and multiple normalized sample access data sets. Here, the electronic device may use the min-max standardization method for the normalization. Specifically, the electronic device may first set a minimum value (min) and a maximum value (max), and then map each original value x to a value x* in the interval [min, max] using the min-max standardization formula.
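For reference, min-max scaling into a chosen interval is commonly written in the form below. This is the standard textbook expression, offered as an assumption about the formula intended here; the symbols x_min and x_max (the smallest and largest observed original values) are notation introduced for this sketch.

```latex
x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\,(\mathit{max} - \mathit{min}) + \mathit{min}
```

With min = 0 and max = 1 this reduces to x* = (x - x_min) / (x_max - x_min), which maps every value into the interval [0, 1].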
As an example, the order volume of a certain website during hours 7 to 12 of a certain day is shown in Table 1 below:
Table 1
If the order volume in each hour in Table 1 is mapped by the min-max standardization formula to a normalized value in the interval [0, 1], the normalized order volume of the website during hours 7 to 12 of that day is as shown in Table 2 below:
Table 2
Then, the electronic device may generate the first-derivative sets corresponding to the multiple normalized sample order data sets and the first-derivative sets corresponding to the multiple normalized sample access data sets, and use them as the multiple sample feature vectors. Continuing the example above, the electronic device may compute the first derivative f'(x*_i) corresponding to the normalized order volume in each hour.
Here, i is a positive integer with 7 ≤ i ≤ 12, x* is the normalized order volume, x*_i is the normalized order volume in the i-th hour, f'(x*) is the first derivative corresponding to the normalized order volume, and f'(x*_i) is the first derivative corresponding to the normalized order volume in the i-th hour.
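The normalization and derivative steps can be illustrated together. The sketch below is hedged in two ways: the hourly order volumes are made-up numbers (the values of Table 1 are not available here), and the first derivative is approximated by a forward difference between consecutive hours, which is an assumption rather than the formula used in the application.

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Map values linearly into [lo, hi] using the observed minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # All values are equal; map everything to the lower bound.
        return [lo for _ in values]
    return [lo + (v - x_min) / (x_max - x_min) * (hi - lo) for v in values]


def first_differences(values):
    """Forward differences between consecutive values, a discrete stand-in
    for the first derivative of a time series sampled once per hour."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical order volumes for hours 7 to 12 (not the values of Table 1).
hourly_orders = [10, 30, 50, 40, 20, 10]
normalized = min_max_normalize(hourly_orders)
slopes = first_differences(normalized)
```

A forward difference over six hourly values yields five slopes; how the boundary hour is handled in the original formula is not known from this text.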
Step 304: cluster the multiple sample feature vectors to obtain a website classification model.
In this embodiment, based on the multiple sample feature vectors extracted in step 303, the electronic device may cluster the sample feature vectors, thereby establishing a trained website classification model that accurately maps the feature vector of a website to the second-level category of the website. Clustering is generally the process of dividing a set of physical or abstract objects into multiple classes composed of similar objects. A class produced by clustering is a set of data objects that are similar to one another within the same class and dissimilar to objects in other classes. Here, clustering the multiple sample feature vectors can produce multiple classes, each corresponding to one second-level category.
In some optional implementations of this embodiment, the electronic device may, based on a preset second cluster number (in general, the second cluster number is smaller than the first cluster number; for example, the second cluster number is a value between 2 and 5) and a preset distance parameter, apply a hierarchical clustering method to the multiple sample feature vectors to obtain the website classification model. Hierarchical clustering is a major clustering method that completes clustering by generating a series of nested cluster trees. Single-point clusters sit at the bottom of the tree, and a root-node cluster sits at the top; the root-node cluster covers all data points. Hierarchical clustering can be divided into agglomerative (bottom-up) clustering and divisive (top-down) clustering; agglomerative clustering is used here. The distance parameter may include a distance value between two classes and a distance value between two objects of the same class. The distance indicated by the distance parameter may be the Euclidean distance or the Manhattan distance. The hierarchical clustering terminates when the distance between two classes and the distance between two objects of the same class reach the distances indicated by the distance parameter, or when the number of classes reaches the second cluster number.
In some optional implementations of this embodiment, the hierarchical clustering method may include, but is not limited to, at least one of the following: the shortest distance method (SL, single-linkage), the longest distance method (CL, complete-linkage), the average distance method (AL, average-linkage), and the centroid distance method (centroid-linkage). In the shortest distance method, the inter-class distance equals the minimum distance between objects of the two classes. In the longest distance method, it equals the maximum distance between objects of the two classes. In the average distance method, it equals the average distance between objects of the two classes. In the centroid distance method, it equals the distance between the centroids of the two classes.
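The agglomerative procedure and the four linkage rules described above can be sketched in plain Python. This is an illustrative, unoptimized implementation, not the application's own code: the function names, the `max_distance` parameter, and the simplification of the termination rule to a single between-class threshold are assumptions.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def cluster_distance(c1, c2, linkage, dist):
    """Inter-class distance under the chosen linkage rule."""
    if linkage == "centroid":
        return dist(centroid(c1), centroid(c2))
    pairwise = [dist(a, b) for a in c1 for b in c2]
    if linkage == "single":    # shortest distance method
        return min(pairwise)
    if linkage == "complete":  # longest distance method
        return max(pairwise)
    if linkage == "average":   # average distance method
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, n_clusters, max_distance=math.inf,
                  linkage="average", dist=euclidean):
    """Merge the two closest clusters repeatedly until n_clusters remain
    or the closest pair is farther apart than max_distance."""
    n_clusters = max(1, n_clusters)
    clusters = [[list(p)] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage, dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # distance-based termination
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

For example, `agglomerative(vectors, n_clusters=3, linkage="single", dist=manhattan)` would stop at three classes, each of which could then be labeled with a second-level category. The pairwise search costs roughly cubic time overall, so a library implementation would be preferable for large sample sets.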
The method for establishing a website classification model provided by the embodiments of the present application obtains the order data sets and access data sets of multiple websites within a second preset time period and analyzes them to generate multiple sample order data sets and sample access data sets; it then extracts multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets; finally, it clusters the multiple sample feature vectors to obtain a website classification model. This makes it possible to quickly establish a website classification model that accurately maps the feature vector of a website to the second-level category of the website.
With further reference to Fig. 4, as an implementation of the methods shown in the figures above, the present application provides one embodiment of a website category acquisition device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 4, the website category acquisition device 400 of this embodiment may include an acquiring unit 401, a selection unit 402, an extraction unit 403, and a classification unit 404. The acquiring unit 401 is configured to obtain the order data set and access data set of a target website within a first preset time period; the selection unit 402 is configured to analyze the order data set and the access data set, select order data from the order data set to generate a target order data set, and select access data from the access data set to generate a target access data set; the extraction unit 403 is configured to extract a feature vector from the target order data set and the target access data set; the classification unit 404 is configured to input the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website, wherein the website classification model characterizes the correspondence between the feature vector of a website and the second-level category of the website.
In this embodiment, for the specific processing of the acquiring unit 401, the selection unit 402, the extraction unit 403, and the classification unit 404 in the website category acquisition device 400, and the technical effects they bring, reference may be made to the descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to Fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the feature vector includes at least one of the following: the order volume of the target website, the order amount of the target website, the visit volume of the target website, and the page views of the target website.
In some optional implementations of this embodiment, the website category acquisition device 400 may further include: a first query unit (not shown), configured to query a first mapping table to obtain the first-level category to which the second-level category of the target website belongs, wherein the first mapping table is used to store second-level categories and the first-level category to which each second-level category belongs; a category acquiring unit (not shown), configured to obtain the initial first-level category that the target website submitted at registration; a determination unit (not shown), configured to determine whether the first-level category to which the second-level category of the target website belongs is the same as the initial first-level category; and a first output unit (not shown), configured to output abnormality prompt information if they are not the same.
In some optional implementations of this embodiment, the website category acquisition device 400 may further include: a second query unit (not shown), configured to query a second mapping table to obtain the peak ordering period corresponding to the second-level category of the target website, wherein the second mapping table is used to store second-level categories and the peak ordering period corresponding to each second-level category; and a second output unit (not shown), configured to output the peak ordering period corresponding to the second-level category of the target website.
In some optional implementations of this embodiment, the website category acquisition device 400 may further include a website classification model establishing unit (not shown), which may include: an acquiring subunit (not shown), configured to obtain the order data sets and access data sets of multiple websites within a second preset time period; a selection subunit (not shown), configured to analyze the order data sets and access data sets of the multiple websites, select order data from the order data sets of the multiple websites to generate multiple sample order data sets, and select access data from the access data sets of the multiple websites to generate multiple sample access data sets; an extraction subunit (not shown), configured to extract multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets; and a clustering subunit (not shown), configured to cluster the multiple sample feature vectors to obtain the website classification model.
In some optional implementations of this embodiment, the selection subunit may include: a deletion module (not shown), configured to delete the order data and access data with missing fields from the order data sets and access data sets of the multiple websites, obtaining first order data sets and first access data sets of the multiple websites; a deduplication module (not shown), configured to deduplicate the first order data sets and first access data sets of the multiple websites, obtaining second order data sets and second access data sets of the multiple websites; and a denoising module (not shown), configured to denoise the second order data sets and second access data sets of the multiple websites based on a preset first cluster number, obtaining multiple sample order data sets and multiple sample access data sets.
In some optional implementations of this embodiment, the extraction subunit may include: a normalization module (not shown), configured to normalize the multiple sample order data sets and the multiple sample access data sets, obtaining multiple normalized sample order data sets and multiple normalized sample access data sets; and a derivation module (not shown), configured to generate the first-derivative sets corresponding to the multiple normalized sample order data sets and the first-derivative sets corresponding to the multiple normalized sample access data sets, and use them as the multiple sample feature vectors.
In some optional implementations of this embodiment, the clustering subunit is further configured to: based on a preset second cluster number and a preset distance parameter, apply a hierarchical clustering method to the multiple sample feature vectors to obtain the website classification model.
In some optional implementations of this embodiment, the hierarchical clustering method includes at least one of the following: the shortest distance method, the longest distance method, the average distance method, and the centroid distance method.
Referring now to Fig. 5, a structural schematic diagram of a computer system 500 of a server suitable for implementing the embodiments of the present application is shown. The server shown in Fig. 5 is only an example and should not impose any restriction on the function or scope of use of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flow charts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of boxes in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including an acquiring unit, a selection unit, an extraction unit, and a classification unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit that obtains the order data set and access data set of a target website within a first preset time period".
As another aspect, the present application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: obtain the order data set and access data set of a target website within a first preset time period; analyze the order data set and the access data set, select order data from the order data set to generate a target order data set, and select access data from the access data set to generate a target access data set; extract a feature vector from the target order data set and the target access data set; and input the feature vector into a pre-trained website classification model for classification to obtain the second-level category of the target website, wherein the website classification model characterizes the correspondence between the feature vector of a website and the second-level category of the website.
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (13)

1. A website category acquisition method, characterized in that the method comprises:
obtaining an order data set and an access data set of a target website within a first preset time period;
analyzing the order data set and the access data set, selecting order data from the order data set to generate a target order data set, and selecting access data from the access data set to generate a target access data set;
extracting a feature vector from the target order data set and the target access data set;
inputting the feature vector into a pre-trained website classification model for classification to obtain a second-level category of the target website, wherein the website classification model is used to characterize the correspondence between the feature vector of a website and the second-level category of the website.
2. The method according to claim 1, characterized in that the feature vector includes at least one of the following: the order volume of the target website, the order amount of the target website, the visit volume of the target website, and the page views of the target website.
3. The method according to claim 1, characterized in that, after inputting the feature vector into the pre-trained website classification model for classification to obtain the second-level category of the target website, the method further comprises:
querying a first mapping table to obtain the first-level category to which the second-level category of the target website belongs, wherein the first mapping table is used to store second-level categories and the first-level category to which each second-level category belongs;
obtaining the initial first-level category that the target website submitted at registration;
determining whether the first-level category to which the second-level category of the target website belongs is the same as the initial first-level category;
if not the same, outputting abnormality prompt information.
4. The method according to claim 1, characterized in that, after inputting the feature vector into the pre-trained website classification model for classification to obtain the second-level category of the target website, the method further comprises:
querying a second mapping table to obtain the peak ordering period corresponding to the second-level category of the target website, wherein the second mapping table is used to store second-level categories and the peak ordering period corresponding to each second-level category;
outputting the peak ordering period corresponding to the second-level category of the target website.
5. The method according to any one of claims 1-4, characterized in that the method further comprises a step of establishing a website classification model, the step of establishing a website classification model comprising:
obtaining order data sets and access data sets of multiple websites within a second preset time period;
analyzing the order data sets and access data sets of the multiple websites, selecting order data from the order data sets of the multiple websites to generate multiple sample order data sets, and selecting access data from the access data sets of the multiple websites to generate multiple sample access data sets;
extracting multiple sample feature vectors from the multiple sample order data sets and the multiple sample access data sets;
clustering the multiple sample feature vectors to obtain the website classification model.
6. The method according to claim 5, characterized in that the analyzing of the order data sets and the access data sets of the plurality of websites, the selecting of order data from the order data sets of the plurality of websites to generate a plurality of sample order data sets, and the selecting of access data from the access data sets of the plurality of websites to generate a plurality of sample access data sets comprise:
deleting order data and access data with missing fields from the order data sets and the access data sets of the plurality of websites to obtain first order data sets and first access data sets of the plurality of websites;
performing deduplication on the first order data sets and the first access data sets of the plurality of websites, respectively, to obtain second order data sets and second access data sets of the plurality of websites;
denoising the second order data sets and the second access data sets of the plurality of websites based on a preset first cluster number to obtain the plurality of sample order data sets and the plurality of sample access data sets.
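The first two cleaning steps of claim 6 (deleting records with missing fields, then deduplicating) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record layout, the field names in `REQUIRED_FIELDS`, and the helper names are all hypothetical. The cluster-based denoising step is left out because the claim does not say which clustering technique it uses.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical field names; the claims do not list the actual fields.
REQUIRED_FIELDS: Tuple[str, ...] = ("site_id", "timestamp", "amount")


def drop_incomplete(records: Iterable[Record],
                    required: Tuple[str, ...] = REQUIRED_FIELDS) -> List[Record]:
    """Keep only records in which every required field is present and non-empty."""
    return [r for r in records
            if all(r.get(field) not in (None, "") for field in required)]


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Remove exact duplicate records while preserving first-seen order."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Applying `drop_incomplete` and then `deduplicate` to a website's raw order records would yield what the claim calls the first and second order data sets; the same two calls apply to access records.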
7. The method according to claim 5, characterized in that the extracting of a plurality of sample feature vectors from the plurality of sample order data sets and the plurality of sample access data sets, respectively, comprises:
normalizing the plurality of sample order data sets and the plurality of sample access data sets, respectively, to obtain a plurality of normalized sample order data sets and a plurality of normalized sample access data sets;
generating first-derivative sets corresponding to the plurality of normalized sample order data sets and first-derivative sets corresponding to the plurality of normalized sample access data sets, respectively, and taking them as the plurality of sample feature vectors.
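One way to read the feature extraction of claim 7 is sketched below. It rests on several assumptions: each data set has already been summarized as a numeric time series (for example, daily order counts), normalization is min-max scaling, the first derivative is approximated by a forward difference, and the order and access features are concatenated into one vector. None of these choices is stated in the claims, and the function names are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def first_difference(series: Sequence[float]) -> List[float]:
    """Discrete first derivative: the difference between consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]


def feature_vector(orders: Sequence[float], visits: Sequence[float]) -> List[float]:
    """Concatenate the first differences of the normalized order and access series."""
    return (first_difference(min_max_normalize(orders))
            + first_difference(min_max_normalize(visits)))
```

Differencing the normalized series makes the feature describe the shape of a site's activity over time rather than its absolute volume, which is one plausible motivation for using derivatives here.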
8. The method according to claim 5, characterized in that the clustering of the plurality of sample feature vectors to obtain the website classification model comprises:
performing hierarchical clustering on the plurality of sample feature vectors using a hierarchical clustering method, based on a preset second cluster number and a preset distance parameter, to obtain the website classification model.
9. The method according to claim 8, characterized in that the hierarchical clustering method comprises at least one of: the shortest-distance method, the longest-distance method, the average-distance method, and the centroid-distance method.
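As an illustration of claims 8 and 9, the following pure-Python sketch performs agglomerative hierarchical clustering. It reads the shortest-, longest-, average- and centroid-distance methods as single, complete, average and centroid linkage. The function name `agglomerate`, the Euclidean metric, and the use of the preset distance parameter as a merge cutoff are assumptions, not details taken from the patent.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Cluster = List[Point]


def _centroid(cluster: Cluster) -> List[float]:
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def _pairwise(a: Cluster, b: Cluster) -> List[float]:
    """All Euclidean distances between a point of a and a point of b."""
    return [math.dist(p, q) for p in a for q in b]


LINKAGES: Dict[str, Callable[[Cluster, Cluster], float]] = {
    "single": lambda a, b: min(_pairwise(a, b)),      # shortest distance
    "complete": lambda a, b: max(_pairwise(a, b)),    # longest distance
    "average": lambda a, b: sum(_pairwise(a, b)) / (len(a) * len(b)),
    "centroid": lambda a, b: math.dist(_centroid(a), _centroid(b)),
}


def agglomerate(points: Sequence[Point], n_clusters: int,
                max_distance: float = math.inf,
                linkage: str = "average") -> List[Cluster]:
    """Merge the closest pair of clusters until n_clusters remain or the
    closest pair is farther apart than max_distance (assumed cutoff)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    dist = LINKAGES[linkage]
    clusters: List[Cluster] = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        if dist(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters
```

The quadratic pair search is kept for readability; a production system would more likely rely on an existing library implementation.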
10. A website category acquisition device, characterized in that the device comprises:
an acquiring unit configured to obtain an order data set and an access data set of a target website within a first preset time period;
a selection unit configured to analyze the order data set and the access data set, select order data from the order data set to generate a target order data set, and select access data from the access data set to generate a target access data set;
an extraction unit configured to extract a feature vector from the target order data set and the target access data set;
a classification unit configured to input the feature vector into a pre-trained website classification model for classification to obtain a secondary category of the target website, wherein the website classification model characterizes the correspondence between feature vectors of websites and secondary categories of websites.
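The claims do not say how the trained model maps a feature vector to a secondary category. Purely as an assumed stand-in, the sketch below treats the model as one centroid per labelled cluster and assigns a target website to the nearest centroid. The category labels and the names `build_model` and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def build_model(labelled_clusters: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Reduce each labelled cluster of sample feature vectors to its centroid."""
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in labelled_clusters.items()
    }


def classify(model: Dict[str, List[float]], feature_vector: Sequence[float]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the feature vector."""
    return min(model, key=lambda label: math.dist(model[label], feature_vector))
```

For example, with a model built from a cluster labelled "apparel" and another labelled "electronics", `classify(model, vector)` returns whichever label has the nearer centroid.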
11. The device according to claim 10, characterized in that the device further comprises a website classification model establishing unit, the website classification model establishing unit comprising:
an acquiring subunit configured to obtain, for each of a plurality of websites, an order data set and an access data set within a second preset time period;
a selection subunit configured to analyze the order data sets and the access data sets of the plurality of websites, select order data from the order data sets of the plurality of websites to generate a plurality of sample order data sets, and select access data from the access data sets of the plurality of websites to generate a plurality of sample access data sets;
an extraction subunit configured to extract a plurality of sample feature vectors from the plurality of sample order data sets and the plurality of sample access data sets, respectively;
a clustering subunit configured to cluster the plurality of sample feature vectors to obtain the website classification model.
12. A server, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
13. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-9.
CN201710351636.4A 2017-05-18 2017-05-18 Website category acquisition method and device Active CN108959289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710351636.4A CN108959289B (en) 2017-05-18 2017-05-18 Website category acquisition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710351636.4A CN108959289B (en) 2017-05-18 2017-05-18 Website category acquisition method and device

Publications (2)

Publication Number Publication Date
CN108959289A (en) 2018-12-07
CN108959289B (en) 2022-04-26

Family

ID=64462802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710351636.4A Active CN108959289B (en) 2017-05-18 2017-05-18 Website category acquisition method and device

Country Status (1)

Country Link
CN (1) CN108959289B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882265A (en) * 2020-06-29 2020-11-03 深圳市法本信息技术股份有限公司 Cross-border e-commerce automatic customs declaration method and automatic customs declaration robot
CN112417893A (en) * 2020-12-16 2021-02-26 江苏徐工工程机械研究院有限公司 Software function demand classification method and system based on semantic hierarchical clustering
CN114615262A (en) * 2022-01-30 2022-06-10 阿里巴巴(中国)有限公司 Network aggregation method, storage medium, processor and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324628A (en) * 2012-03-21 2013-09-25 腾讯科技(深圳)有限公司 Industry classification method and system for text publishing
CN103605794A (en) * 2013-12-05 2014-02-26 国家计算机网络与信息安全管理中心 Website classifying method
CN103744981A (en) * 2014-01-14 2014-04-23 南京汇吉递特网络科技有限公司 System for automatic classification analysis for website based on website content
CN104809125A (en) * 2014-01-24 2015-07-29 腾讯科技(深圳)有限公司 Method and device for identifying webpage categories
CN105184574A (en) * 2015-06-30 2015-12-23 电子科技大学 Method for detecting fraud behavior of merchant category code cloning
US9262646B1 (en) * 2013-05-31 2016-02-16 Symantec Corporation Systems and methods for managing web browser histories
CN105556557A (en) * 2013-09-20 2016-05-04 日本电气株式会社 Shipment-volume prediction device, shipment-volume prediction method, recording medium, and shipment-volume prediction system
CN106682217A (en) * 2016-12-31 2017-05-17 成都数联铭品科技有限公司 Method for enterprise second-grade industry classification based on automatic screening and learning of information

Also Published As

Publication number Publication date
CN108959289B (en) 2022-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant