CN114117292B - Internet big data analysis and extraction method - Google Patents

Info

Publication number
CN114117292B
Authority
CN
China
Prior art keywords
data
distance
similarity
domain name
following
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111298638.4A
Other languages
Chinese (zh)
Other versions
CN114117292A (en)
Inventor
陈大海
张冰
徐浩
葛卫春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Information Consulting and Designing Institute Co Ltd
Original Assignee
China Information Consulting and Designing Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Information Consulting and Designing Institute Co Ltd filed Critical China Information Consulting and Designing Institute Co Ltd
Priority to CN202111298638.4A priority Critical patent/CN114117292B/en
Publication of CN114117292A publication Critical patent/CN114117292A/en
Application granted granted Critical
Publication of CN114117292B publication Critical patent/CN114117292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an Internet big data analysis and extraction method comprising the following steps: step 1, dividing a data object into different parts and types according to the characteristics of the data to obtain the data range to be extracted; step 2, establishing a regression model, solving the parameters of the model from measured data, evaluating whether the regression model fits the measured data and, if so, further narrowing the data range to be extracted according to the independent variables; step 3, dividing the data into two or more aggregation classes according to the characteristic attributes of the data, the elements in each aggregation class sharing the same characteristics, and grouping the data to be crawled; step 4, calculating the degree of similarity between two data items using a similarity matching method; step 5, using word frequency as a statistical index to indicate the data segment information fed back by the data; and step 6, obtaining a data analysis result. The method is completed automatically by a representation learning algorithm based on embedding mapping and is computationally efficient.

Description

Internet big data analysis and extraction method
Technical Field
The invention belongs to the technical field of big data, and particularly relates to an Internet big data analysis and extraction method.
Background
Big data refers to a data set that cannot be captured, managed and processed by conventional software tools within a given time frame; it is a massive, fast-growing and diversified information asset that requires new processing modes to deliver stronger decision-making, insight-discovery and process-optimization capabilities.
At present, many approaches use web crawlers to grab relevant information from public websites and then structure and store it. Such approaches are disturbed by large amounts of useless information, such as expired information and phishing-website information, so the accuracy and practicality of the data are low. An internet data extraction method therefore needs to be studied in depth to improve the reliability and accuracy of the data.
Existing intelligent processing systems for big data have at least the following disadvantages: they lack analysis of unstructured data, so a large amount of effective information is lost and the analysis results of the service are affected; existing data analysis and extraction depend excessively on manual feature extraction, so accuracy is low, computational efficiency is poor, responses to user requests are slow and the user experience suffers; and different services typically use different data processing and feature extraction methods, which leads to a large amount of redundant data processing and to data-unit features that are not compatible across services.
Disclosure of Invention
Purpose of the invention: to overcome the defects in the prior art, the invention provides an Internet big data analysis and extraction method that eliminates data of low accuracy and reliability to obtain validated data of higher credibility and reliability.
The method specifically comprises the following steps:
step 1, dividing a data object into different parts and types according to the characteristics of the data, and further analyzing to obtain a data range to be extracted;
step 2, determining causal relation between variables by defining dependent variables and independent variables, establishing a regression model, solving each parameter of the model according to measured data, evaluating whether the regression model can fit the measured data, and if so, further reducing the range of the data to be extracted according to the independent variables;
step 3, dividing the data into two or more aggregation classes according to the characteristic attributes of the data, so that the elements in each aggregation class share the same characteristics, and grouping the data to be crawled (the characteristic attributes represent the data and can be obtained by statistical analysis; for the internet text data used in the invention they include the source website, topic, words, word-frequency statistics and the like; step 3 performs a preliminary grouping, equivalent to an initialization, which is then further refined and extracted);
step 4, calculating the similarity degree of the two data by adopting a similarity matching method;
step 5, extracting the data that appear frequently in steps 1 to 4 (those whose word-frequency statistics rank in the top 20 percent) and, according to the attribute characteristics of the data, using word frequency as a statistical index to indicate the data segment information fed back by the data;
step 6, decomposing the data segments in the data to be analyzed according to a data segment decomposition regular expression to generate data item values (regular expressions are a text-processing technique; internet text contains many format symbols, such as HTML markup, that need to be processed and filtered with regular expressions), associating the data item values with the data item name list corresponding to the data segment decomposition regular expression to form intermediate data pairs of data item name and data item value, and performing statistical analysis on the intermediate data pairs according to data statistical rules to obtain a data analysis result.
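The word-frequency selection of step 5 can be sketched as follows. This is only an illustration under stated assumptions: the function name `top_terms_by_frequency`, the whitespace/punctuation tokenizer and the sample texts are hypothetical, and "top 20 percent" is read here as the top fifth of distinct terms ranked by count, which the source does not define precisely.

```python
import math
import re
from collections import Counter


def top_terms_by_frequency(texts, fraction=0.2):
    """Count terms across texts and keep the top `fraction` of distinct terms."""
    counts = Counter()
    for text in texts:
        # Assumption: split on non-word characters; Chinese text would need
        # a proper word segmenter instead.
        counts.update(t for t in re.split(r"\W+", text.lower()) if t)
    keep = max(1, math.ceil(len(counts) * fraction))
    return counts.most_common(keep)


# Hypothetical crawled text fragments.
docs = [
    "python engineer nanjing",
    "python engineer shanghai",
    "java engineer nanjing",
    "python analyst beijing",
    "data engineer hangzhou",
]
top = top_terms_by_frequency(docs)
print(top)
```

With nine distinct terms in the sample, the top fifth rounds up to two terms, so only the two most frequent terms are kept as the statistical index.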
Preferably, in step 2, the similarity matching algorithm can be applied to fields such as data cleaning, user input error correction, recommendation statistics, plagiarism detection systems, automatic scoring systems, web page searching and DNA sequence matching. In step 2, the measured data are the data from actual experimental tests, that is, the input data, and the independent variables are derived from the measured data.
Step 2 comprises the following. Let the independent-variable data object be $X=\{x_1,x_2,\dots,x_m\}$ and the corresponding dependent variable be $Y=\{y_1,y_2,\dots,y_m\}$. The regression model is:
$$y=w_0+w_1x_1+w_2x_2+\dots+w_mx_m$$
where $x_m$ and $y_m$ denote the $m$-th independent variable and the $m$-th dependent variable respectively; $w=\{w_0,w_1,w_2,\dots,w_m\}$ is the set of regression coefficients, $w_m$ is the $m$-th regression coefficient, and $\mu$ is the random error. The fitting error $L(X)$ is measured by the squared error:
$$L(X)=\sum_{i=1}^{m}\left(y_i-x_i^{T}w\right)^2$$
Setting $\partial L(X)/\partial w=0$ gives:
$$\hat{w}=(X^{T}X)^{-1}X^{T}Y$$
where $\hat{w}$ is the parameter estimate of $w$ (the regression coefficients).
Under-fitting is addressed by locally weighted linear regression, which adds a weight $w_i$ to each error term, so that the error becomes:
$$L(X)=\sum_{i=1}^{m}w_i\left(y_i-x_i^{T}w\right)^2$$
where $W$ is the diagonal matrix of these weights. A Gaussian kernel is used, and the corresponding weight function $W(j,j)$ is:
$$W(j,j)=\exp\left(-\frac{\lVert x_j-x\rVert^2}{2k^2}\right)$$
where $x$ is the point at which the prediction is made and $k$ represents the variance of the Gaussian function. The new regression coefficient estimate $\hat{w}$ is:
$$\hat{w}=(X^{T}WX)^{-1}X^{T}WY$$
where $w=w^{T}W$.
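The locally weighted linear regression described above can be illustrated with a small sketch. It is not the patent's implementation: it assumes a single independent variable plus an intercept, a Gaussian kernel of the form exp(-(x_j - x)^2 / (2k^2)), and hypothetical names (`lwlr_predict`, `xs`, `ys`). The weighted normal equations are solved in closed form for the 2x2 case so that no external library is needed.

```python
import math


def lwlr_predict(query, xs, ys, k=1.0):
    """Predict y at `query` by locally weighted linear regression (one feature)."""
    # Gaussian kernel weights: samples close to `query` get larger weights.
    weights = [math.exp(-((x - query) ** 2) / (2.0 * k ** 2)) for x in xs]
    # Entries of X^T W X and X^T W y for the design matrix rows [1, x].
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_xy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s_w * s_xx - s_x * s_x
    if abs(det) < 1e-12:
        raise ValueError("weighted normal equations are singular")
    # Solve the 2x2 system for the intercept w0 and slope w1.
    w0 = (s_xx * s_y - s_x * s_xy) / det
    w1 = (s_w * s_xy - s_x * s_y) / det
    return w0 + w1 * query


# Hypothetical measured data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
pred = lwlr_predict(2.5, xs, ys, k=1.0)
print(pred)
```

A smaller `k` makes the fit more local and helps with under-fitting on curved data; a very large `k` weights all samples almost equally and approaches ordinary least squares.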
In step 4, the similarity between data objects in different groups is required to be low and the similarity between data objects in the same group is required to be high. The objective function $J$ is calculated as:
$$J=\sum_{j=1}^{K}\sum_{x_i\in C_j}\lVert x_i-u_j\rVert^2$$
where $J$ is the sum of squared errors of all objects in the measured dataset, $x_i$ is any object in the dataset, $u_j$ is the center of the $j$-th aggregation class (cluster) $C_j$, and $K$ is the number of aggregation classes. The goal is for the objective function to converge.
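The sum-of-squared-errors objective J can be computed for a given grouping as in the sketch below. It assumes that u_j is the mean (centroid) of the points in aggregation class C_j; the function names and the sample points are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sse_objective(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        u = centroid(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, u))
    return total


# Two hypothetical aggregation classes in the plane.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
J = sse_objective(clusters)
print(J)
```

An iterative clustering procedure would reassign points and recompute centroids until J stops decreasing, which is one way to read the requirement that the objective function converge.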
In step 4, the similarity matching method includes calculating average and variation indices and graphically representing the data distribution. The similarity between two data items is measured by the distance between them, combining the Euclidean distance, the Manhattan distance, the Minkowski distance and the included-angle cosine distance, calculated as follows.
The Euclidean distance $D_1(X_i,X_j)$ is calculated as:
$$D_1(X_i,X_j)=\sqrt{\sum_{t=1}^{d}(x_{it}-x_{jt})^2}$$
The Manhattan distance $D_2(X_i,X_j)$ is calculated as:
$$D_2(X_i,X_j)=|x_{i1}-x_{j1}|+|x_{i2}-x_{j2}|+\dots+|x_{id}-x_{jd}|$$
The Minkowski distance $D_3(X_i,X_j)$, of order $p$, is calculated as:
$$D_3(X_i,X_j)=\left(\sum_{t=1}^{d}|x_{it}-x_{jt}|^{p}\right)^{1/p}$$
The included-angle cosine distance $D_4(X_i,X_j)$ is calculated as:
$$D_4(X_i,X_j)=1-\frac{\sum_{t=1}^{d}x_{it}x_{jt}}{\sqrt{\sum_{t=1}^{d}x_{it}^2}\,\sqrt{\sum_{t=1}^{d}x_{jt}^2}}$$
where $X_i=\{x_{i1},x_{i2},\dots,x_{id}\}\in R^d$ and $X_j=\{x_{j1},x_{j2},\dots,x_{jd}\}\in R^d$ are two data item samples in the data item collection; a smaller distance means the samples are more similar and a larger distance means they are less similar; $i,j=1,2,3,\dots,N$; $x_{id}$ is the $d$-th value of the $i$-th data item sample $X_i$; and $R^d$ is the $d$-dimensional real space.
The weighted sum distance $D(X_i,X_j)$ is calculated as:
$$D(X_i,X_j)=a_1\cdot D_1(X_i,X_j)+a_2\cdot D_2(X_i,X_j)+a_3\cdot D_3(X_i,X_j)+a_4\cdot D_4(X_i,X_j)$$
where $a_1,a_2,a_3,a_4$ are the weights of the Euclidean distance, the Manhattan distance, the Minkowski distance and the included-angle cosine distance respectively, each in the range $[0,1]$, with $a_1+a_2+a_3+a_4=1$.
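The weighted combination of the four distances can be sketched as follows. Several points are assumptions rather than statements from the source: the Minkowski order `p` defaults to 3, the cosine term is taken as 1 - cos(theta) so that, like the other three terms, a smaller value means greater similarity, and the equal default weights and function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    # Assumption: distance = 1 - cosine of the included angle.
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(a, b, weights=(0.25, 0.25, 0.25, 0.25), p=3):
    """Weighted sum a1*D1 + a2*D2 + a3*D3 + a4*D4 with weights summing to 1."""
    if any(w < 0.0 or w > 1.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    a1, a2, a3, a4 = weights
    return (a1 * euclidean(a, b) + a2 * manhattan(a, b)
            + a3 * minkowski(a, b, p) + a4 * cosine_distance(a, b))


# Two hypothetical data item samples.
d = combined_distance((1.0, 0.0), (0.0, 1.0))
print(d)
```

Because the four terms have different scales (the cosine term is bounded while the others grow with the data), the weights may need tuning or the inputs may need normalizing before the sum is meaningful for a given dataset.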
The invention also comprises a method for acquiring the site home page, which comprises the following steps: extracting the domain name address from the URL of a web page and performing jump (redirect) processing on the domain name address to obtain the site home page corresponding to the web page; extracting the domain name addresses one by one from the URLs of web pages across the whole network, de-duplicating them and adding them to a domain name address set, and performing jump processing on all domain name addresses in the set to obtain the corresponding site home pages; or statistically analyzing the link anchor text and website style characteristics of a sample set of site home pages to construct a home page classifier, and using the home page classifier to analyze web pages to obtain the home pages of all sites.
The invention also includes a method for acquiring the contact page corresponding to a web page, which specifically comprises: statistically analyzing the link anchor text, page title and website style characteristics of a sample set of site contact pages to construct a contact page classifier, and using the contact page classifier to analyze web pages to obtain the contact information pages of all sites.
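The domain-extraction and de-duplication part of the home page method can be sketched with the standard library. The function name `home_page_candidates` and the example URLs are hypothetical, and the sketch stops short of the jump processing itself: a real system would still request each candidate and follow its redirects to reach the actual home page.

```python
from urllib.parse import urlsplit


def home_page_candidates(page_urls):
    """Map page URLs to de-duplicated scheme://host/ home page candidates."""
    seen = set()
    candidates = []
    for url in page_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if not host:
            continue  # not an absolute URL, so no domain name to extract
        if host in seen:
            continue  # de-duplicate by domain name
        seen.add(host)
        scheme = parts.scheme or "http"
        # Ports, paths and query strings are dropped in this sketch.
        candidates.append(f"{scheme}://{host}/")
    return candidates


# Hypothetical crawled page URLs.
pages = [
    "https://www.example.com/jobs/123?ref=abc",
    "https://www.example.com/about",
    "http://careers.example.org/list",
]
homes = home_page_candidates(pages)
print(homes)
```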
The invention has the following beneficial effects:
1. The data structuring module preprocesses and networks the original big data, converting it into network data or structured data, so that the representation learning module can apply a representation learning algorithm for network data to extract features from the data quickly and uniformly and express them as high-dimensional vectors; the whole feature extraction process is completed automatically by a representation learning algorithm based on embedding mapping, without human participation, and is computationally efficient.
2. Structural information (that is, effective information) in the original big data is largely preserved during feature extraction, which improves the accuracy of tasks such as classification or prediction that use the feature information; furthermore, because the representation learning algorithm based on embedding mapping is used, the data features mined from the original big data can be represented uniformly as high-dimensional vectors, so the intelligent big data processing system is not limited to a specific application service and can provide a unified and effective processing method for various application services.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic representation of the results of an embodiment of the present invention.
Detailed Description
Referring to fig. 1, the invention provides an internet big data analysis and extraction method, which comprises the following steps:
S1, dividing a data object into different parts and types according to the characteristics of the data, and further analyzing them to obtain the data range to be extracted;
S2, determining the causal relation between variables by specifying dependent and independent variables, establishing a regression model, solving the parameters of the model from the measured data, then evaluating whether the regression model fits the measured data well and, if so, further narrowing the data range to be extracted according to the independent variables; the similarity matching algorithm can be applied to fields such as data cleaning, user input error correction, recommendation statistics, plagiarism detection systems, automatic scoring systems, web page searching and DNA sequence matching;
S3, dividing the data into a plurality of aggregation classes according to the characteristic attributes of the data, so that elements within each aggregation class share the same characteristics as far as possible and the characteristics of different aggregation classes differ as much as possible, and grouping the data to be crawled;
S4, calculating the degree of similarity between two data items using a similarity matching method, the degree of similarity usually being expressed as a percentage; the similarity matching method includes calculating average and variation indices and graphically representing the data distribution;
S5, extracting the datasets that appear frequently in the above steps and, according to the attribute characteristics of the data, using word frequency as a statistical index to indicate the data segment information fed back by the data;
S6, decomposing the data segments in the data to be analyzed according to a data segment decomposition regular expression to generate data item values, associating the data item values with the data item name list corresponding to the data segment decomposition regular expression to form intermediate data pairs of data item name and data item value, and performing statistical analysis on the intermediate data pairs according to data statistical rules to obtain a data analysis result.
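The decomposition in S6 can be illustrated with a short sketch: a regular expression splits a data segment into data item values, the values are paired with a data item name list, and simple statistics are computed over the pairs. The pattern, the item names, the HTML-stripping rule and the sample segments are all hypothetical and only show the shape of the technique.

```python
import re
from collections import Counter

# Hypothetical data segment decomposition regular expression and the
# data item name list that corresponds to its capture groups.
SEGMENT_PATTERN = re.compile(r"Post:\s*(.+?);\s*City:\s*(.+?);\s*Salary:\s*(\d+)")
ITEM_NAMES = ["post_name", "working_city", "salary"]
TAG_PATTERN = re.compile(r"<[^>]+>")  # crude filter for HTML markup


def decompose(segment):
    """Return {data item name: data item value} for one segment, or None."""
    text = TAG_PATTERN.sub("", segment)
    match = SEGMENT_PATTERN.search(text)
    if match is None:
        return None
    return dict(zip(ITEM_NAMES, match.groups()))


segments = [
    "<li>Post: Data Engineer; City: Nanjing; Salary: 15000</li>",
    "<li>Post: Analyst; City: Nanjing; Salary: 12000</li>",
    "<li>Post: Data Engineer; City: Shanghai; Salary: 18000</li>",
    "<li>malformed row</li>",
]
# Intermediate data pairs; segments that do not match are skipped.
pairs = [p for p in map(decompose, segments) if p is not None]

# Example data statistical rules: count per city and mean salary.
city_counts = Counter(p["working_city"] for p in pairs)
mean_salary = sum(int(p["salary"]) for p in pairs) / len(pairs)
print(city_counts, mean_salary)
```

Stripping tags with a regular expression is adequate for this toy input; real pages are usually better handled by an HTML parser before the segment pattern is applied.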
The method for acquiring the site home page comprises the following steps: extracting the domain name address from the URL of a web page and performing jump (redirect) processing on the domain name address to obtain the site home page corresponding to the web page; extracting the domain name addresses one by one from the URLs of web pages across the whole network, de-duplicating them and adding them to a domain name address set, and performing jump processing on all domain name addresses in the set to obtain the corresponding site home pages; or statistically analyzing the link anchor text and website style characteristics of a sample set of site home pages to construct a home page classifier, and using the home page classifier to analyze web pages to obtain the home pages of all sites.
The method for acquiring the contact page corresponding to a web page comprises the following steps: statistically analyzing the link anchor text, page title and website style characteristics of a sample set of site contact pages to construct a contact page classifier, and using the contact page classifier to analyze web pages to obtain the contact information pages of all sites.
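One possible shape for a contact page classifier is sketched below. The source does not name a classification algorithm, so a small multinomial naive Bayes model over anchor-text and title tokens is used here purely as a stand-in; website style characteristics are left out, and the class name, tokenizer and training samples are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; an assumption suited to English text only."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class ContactPageClassifier:
    """Tiny multinomial naive Bayes over anchor-text and page-title tokens."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def fit(self, samples):
        # samples: iterable of (anchor_text, page_title, is_contact_page)
        for anchor, title, label in samples:
            self.counts[label].update(tokens(anchor + " " + title))
            self.docs[label] += 1
        return self

    def _log_score(self, words, label):
        vocab = set(self.counts[True]) | set(self.counts[False])
        total = sum(self.counts[label].values())
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + 2)
        score = math.log(prior)
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((self.counts[label][w] + 1) / (total + len(vocab) + 1))
        return score

    def predict(self, anchor, title):
        words = tokens(anchor + " " + title)
        return self._log_score(words, True) > self._log_score(words, False)


# Hypothetical labelled sample set.
samples = [
    ("Contact us", "Contact - Example Co", True),
    ("Get in touch", "Contact information", True),
    ("Products", "Product catalogue", False),
    ("News", "Company news", False),
]
clf = ContactPageClassifier().fit(samples)
print(clf.predict("Contact", "Contact our team"))
print(clf.predict("News", "Latest news"))
```

The same structure could serve as a home page classifier by training on a home page sample set instead; in either case a realistic sample set would need to be far larger than the four rows shown.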
The data structuring module preprocesses and networks the original big data, converting it into network data or structured data, so that the representation learning module can apply a representation learning algorithm for network data to extract features from the data quickly and uniformly and express them as high-dimensional vectors; the whole feature extraction process is completed automatically by a representation learning algorithm based on embedding mapping, without human participation, and is computationally efficient.
Structural information (that is, effective information) in the original big data is largely preserved during feature extraction, which improves the accuracy of tasks such as classification or prediction that use the feature information; furthermore, because the representation learning algorithm based on embedding mapping is used, the data features mined from the original big data can be represented uniformly as high-dimensional vectors, so the intelligent big data processing system is not limited to a specific application service and can provide a unified and effective processing method for various application services.
Using the method, data such as recruitment postings and job seeker information from a recruitment website were extracted and analyzed. After extraction, the recruitment information is separated into the following data dimensions: post name, salary requirements, working city, working years, nature of work, academic requirements, recruiters, post description, post responsibilities, post benefits, detailed work location, post publisher name, company name, industry to which the company belongs, company personnel, company type, company description, official company website and the like. After extraction, the job seeker information is separated into the following data dimensions: job seeker's name, gender, date of birth, political affiliation, working years, graduation institution, education, job position, desired salary, technical skills, work experience, honors and certificates, mobile phone number, email address, address and the like.
Experiments were carried out following the technical steps of the invention. The crawled internet datasets were screened using the above data dimensions as effective features, and the screened results (the effective data) were used as the datasets for the subsequent experiment. The accuracy of the experimental results was verified on 0.1% of the extracted datasets. The datasets are shown in Table 1 and the experimental results are shown in FIG. 2.
TABLE 1
The invention provides an Internet big data analysis and extraction method, and there are many ways to implement this technical scheme; the above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art can make improvements and modifications without departing from the principle of the invention, and such improvements and modifications also fall within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented using the prior art.

Claims (4)

1. The Internet big data analysis and extraction method is characterized by comprising the following steps:
step 1, dividing a data object into different parts and types according to the characteristics of the data, and further analyzing to obtain a data range to be extracted;
step 2, determining causal relation between variables by defining dependent variables and independent variables, establishing a regression model, solving each parameter of the model according to measured data, evaluating whether the regression model can fit the measured data, and if so, further reducing the range of the data to be extracted according to the independent variables;
step 3, dividing the data into more than two aggregation classes according to the characteristic attribute of the data, wherein elements in each aggregation class have the same characteristics, and grouping the data to be grabbed;
step 4, calculating the similarity degree of the two data by adopting a similarity matching method;
step 5, extracting the frequently occurring data in the steps 1 to 4, and according to the attribute characteristics of the data, using word frequency as a statistical index to indicate the data segment information fed back by the data;
step 6, carrying out data decomposition on the data segment in the data to be analyzed according to the data segment decomposition regular expression, generating a data item value, associating the data item value with a data item name list corresponding to the data segment decomposition regular expression, forming an intermediate data pair corresponding to the data item name and the data item value, and carrying out statistical analysis on the intermediate data pair according to a data statistical rule to obtain a data analysis result;
in step 4, the similarity matching method includes calculating average and variation indices and graphically representing the data distribution; the similarity between two data items is measured by the distance between them, combining the Euclidean distance, the Manhattan distance, the Minkowski distance and the included-angle cosine distance, calculated as follows:
the Euclidean distance $D_1(X_i,X_j)$ is calculated as:
$$D_1(X_i,X_j)=\sqrt{\sum_{t=1}^{d}(x_{it}-x_{jt})^2}$$
the Manhattan distance $D_2(X_i,X_j)$ is calculated as:
$$D_2(X_i,X_j)=|x_{i1}-x_{j1}|+|x_{i2}-x_{j2}|+\dots+|x_{id}-x_{jd}|$$
the Minkowski distance $D_3(X_i,X_j)$, of order $p$, is calculated as:
$$D_3(X_i,X_j)=\left(\sum_{t=1}^{d}|x_{it}-x_{jt}|^{p}\right)^{1/p}$$
the included-angle cosine distance $D_4(X_i,X_j)$ is calculated as:
$$D_4(X_i,X_j)=1-\frac{\sum_{t=1}^{d}x_{it}x_{jt}}{\sqrt{\sum_{t=1}^{d}x_{it}^2}\,\sqrt{\sum_{t=1}^{d}x_{jt}^2}}$$
where $X_i=\{x_{i1},x_{i2},\dots,x_{id}\}\in R^d$ and $X_j=\{x_{j1},x_{j2},\dots,x_{jd}\}\in R^d$ are two data item samples in the data item collection; a smaller distance means the samples are more similar and a larger distance means they are less similar; $i,j=1,2,3,\dots,N$; $x_{id}$ is the $d$-th value of the $i$-th data item sample $X_i$; and $R^d$ is the $d$-dimensional real space;
the weighted sum distance $D(X_i,X_j)$ is calculated as:
$$D(X_i,X_j)=a_1\cdot D_1(X_i,X_j)+a_2\cdot D_2(X_i,X_j)+a_3\cdot D_3(X_i,X_j)+a_4\cdot D_4(X_i,X_j)$$
where $a_1,a_2,a_3,a_4$ are the weights of the Euclidean distance, the Manhattan distance, the Minkowski distance and the included-angle cosine distance respectively, each in the range $[0,1]$, with $a_1+a_2+a_3+a_4=1$;
the method further comprises acquiring the site home page, comprising the following steps: extracting the domain name address from the URL of a web page and performing jump (redirect) processing on the domain name address to obtain the site home page corresponding to the web page; extracting the domain name addresses one by one from the URLs of web pages across the whole network, de-duplicating them and adding them to a domain name address set, and performing jump processing on all domain name addresses in the set to obtain the corresponding site home pages; or statistically analyzing the link anchor text and website style characteristics of a sample set of site home pages to construct a home page classifier, and using the home page classifier to analyze web pages to obtain the home pages of all sites.
2. The method of claim 1, wherein step 2 comprises: letting the independent-variable data object be $X=\{x_1,x_2,\dots,x_m\}$ and the corresponding dependent variable be $Y=\{y_1,y_2,\dots,y_m\}$, the regression model being:
$$y=w_0+w_1x_1+w_2x_2+\dots+w_mx_m$$
where $x_m$ and $y_m$ denote the $m$-th independent variable and the $m$-th dependent variable respectively; $w=\{w_0,w_1,w_2,\dots,w_m\}$ is the set of regression coefficients, $w_m$ is the $m$-th regression coefficient, and $\mu$ is the random error; the fitting error $L(X)$ is measured by the squared error:
$$L(X)=\sum_{i=1}^{m}\left(y_i-x_i^{T}w\right)^2$$
setting $\partial L(X)/\partial w=0$ gives:
$$\hat{w}=(X^{T}X)^{-1}X^{T}Y$$
where $\hat{w}$ is the parameter estimate of $w$ (the regression coefficients);
under-fitting is addressed by locally weighted linear regression, which adds a weight $w_i$ to each error term, so that the error becomes:
$$L(X)=\sum_{i=1}^{m}w_i\left(y_i-x_i^{T}w\right)^2$$
where $W$ is the diagonal matrix of these weights; a Gaussian kernel is used, and the corresponding weight function $W(j,j)$ is:
$$W(j,j)=\exp\left(-\frac{\lVert x_j-x\rVert^2}{2k^2}\right)$$
where $x$ is the point at which the prediction is made and $k$ represents the variance of the Gaussian function; the new regression coefficient estimate $\hat{w}$ is:
$$\hat{w}=(X^{T}WX)^{-1}X^{T}WY$$
where $w=w^{T}W$.
3. The method according to claim 2, wherein in step 4 the similarity between data objects in different groups is required to be low and the similarity between data objects in the same group is required to be high, and the objective function $J$ is calculated as:
$$J=\sum_{j=1}^{K}\sum_{x_i\in C_j}\lVert x_i-u_j\rVert^2$$
where $J$ is the sum of squared errors of all objects in the measured dataset, $x_i$ is any object in the dataset, $u_j$ is the center of the $j$-th aggregation class $C_j$, and $K$ is the number of aggregation classes; the goal is for the objective function to converge.
4. The method according to claim 3, further comprising a method for acquiring the contact page corresponding to a web page, which specifically comprises: statistically analyzing the link anchor text, page title and website style characteristics of a sample set of site contact pages to construct a contact page classifier, and using the contact page classifier to analyze web pages to obtain the contact information pages of all sites.
CN202111298638.4A 2021-11-04 2021-11-04 Internet big data analysis and extraction method Active CN114117292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111298638.4A CN114117292B (en) 2021-11-04 2021-11-04 Internet big data analysis and extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111298638.4A CN114117292B (en) 2021-11-04 2021-11-04 Internet big data analysis and extraction method

Publications (2)

Publication Number Publication Date
CN114117292A (en) 2022-03-01
CN114117292B (en) 2024-04-16

Family

ID=80381256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111298638.4A Active CN114117292B (en) 2021-11-04 2021-11-04 Internet big data analysis and extraction method

Country Status (1)

Country Link
CN (1) CN114117292B (en)

Also Published As

Publication number Publication date
CN114117292A (en) 2022-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant