CN113032561A - Data reliability evaluation algorithm based on online big data intelligent aggregation mode - Google Patents


Info

Publication number
CN113032561A
Authority
CN
China
Prior art keywords
data
reliability
trust
data source
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110287067.8A
Other languages
Chinese (zh)
Inventor
谭继军
李阳
蒋华东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Digital Data Technology Co ltd
Original Assignee
Shanghai Digital Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-17
Filing date
2021-03-17
Publication date
2021-06-25
Application filed by Shanghai Digital Data Technology Co ltd
Priority to CN202110287067.8A
Publication of CN113032561A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data reliability evaluation algorithm based on an online big data intelligent aggregation mode, comprising the following steps. Step 1, assigning the data source credibility weight: the data source is given a credit rating, which determines the data source credibility weight z_i. Step 2, assigning data source reliability: values are assigned to data of the same type based on the number of occurrences found by keyword clustering, that is, a reliability score S_i is assigned based on the number of clustered occurrences. Step 3, calculating the reliability evaluation score: the reliability score S_i is obtained from the clustering of the different results; taking the weight coefficient z_i corresponding to the highest credit rating among all data sources of a result, the reliability evaluation score of that result is calculated as Y_i = z_i * S_i. The invention automatically judges the reliability of different data results and automatically performs reliability evaluation and screening that eliminates the false and retains the true.

Description

Data reliability evaluation algorithm based on online big data intelligent aggregation mode
Technical Field
The invention belongs to the technical field of big data processing, and particularly relates to a data reliability evaluation algorithm based on an online big data intelligent aggregation mode.
Background
At present, the technical focus of online big data intelligent aggregation service companies on the market is on building general capabilities, such as intelligent aggregation of online big data, efficient acquisition and management of online data, and data visualization, while their data processing focuses on the correlations among data.
Because we have long focused on the business scenario of big data risk control services for small and micro finance, and data reliability is the key to risk control, the data reliability evaluation algorithm based on the online big data intelligent aggregation mode was developed on top of traditional online big data intelligent aggregation services, around the characteristics of that scenario.
Only when the aggregated data are more reliable can a risk evaluation system built on online big data be more reliable, which in turn makes the risk of small and micro finance more controllable and better supports its development.
Disclosure of Invention
The invention aims to provide a data reliability evaluation algorithm based on an online big data intelligent aggregation mode that automatically judges the reliability of different data results through a data reliability algorithm, and thereby automatically performs reliability evaluation and screening that eliminates the false and retains the true.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A data reliability evaluation algorithm based on an online big data intelligent aggregation mode, which uses a data reliability evaluation terminal based on the online big data intelligent aggregation mode and specifically comprises the following steps:
Step 1, assigning the data source credibility weight: the data source is given a credit rating, which determines the data source credibility weight z_i.
Step 2, assigning data source reliability: values are assigned to data of the same type based on the number of occurrences found by keyword clustering, that is, a reliability score S_i is assigned based on the number of clustered occurrences.
Step 3, calculating the reliability evaluation score: the reliability score S_i is obtained from the clustering of the different results; taking the weight coefficient z_i corresponding to the highest credit rating among all data sources of a result, the reliability evaluation score of that result is calculated as Y_i = z_i * S_i.
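In symbols, the three steps can be summarized as below. This is only a sketch: the symbols D_i (the set of data sources reporting result i), w(d) (the weight of source d), n_i (the number of occurrences of result i), and b (the base score per occurrence) are introduced here for illustration and do not appear in the original. The linear form S_i = b * n_i is an assumption that is consistent with the worked examples, where b = 5.

```latex
\begin{aligned}
z_i   &= \max_{d \in D_i} w(d)   && \text{(weight of the highest-rated source reporting result } i\text{)} \\
S_i   &= b \cdot n_i             && \text{(assumed: base score times number of occurrences)} \\
Y_i   &= z_i \cdot S_i           && \text{(reliability evaluation score)} \\
i^{*} &= \arg\max_{i} Y_i        && \text{(result that is finally adopted)}
\end{aligned}
```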
Further, the specific process of step 1 is as follows: based on the website's ranking within its subdivided field, the background of its operating company, the authority of its data, the way its data are produced, and the activity of the website, each data source website is classified into one of five levels: fully trusted, relatively trusted, generally trusted, relatively untrusted, and fully untrusted.
Further, in step 1, a fully trusted data source is a whitelist website data source; data crawled from a whitelist data source are fully trusted, and the weight is infinite. Whitelist website data sources include the Xuexin academic credential network and the court judgment document website.
Further, in step 1, a fully untrusted data source is a blacklist website data source, and such sources are automatically excluded during data aggregation; fully untrusted data sources include stranger social networking sites.
Further, in step 1, relatively trusted, generally trusted, and relatively untrusted data sources are each given a weight coefficient z_i:
for a relatively untrusted website, the weight coefficient z_i is set to 0.8;
for a generally trusted website, the weight coefficient z_i is set to 1.0;
for a relatively trusted website, the weight coefficient z_i is set to 1.2.
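The five-level weighting can be sketched as a simple lookup. This is a minimal Python sketch, not the patent's implementation: the names `TrustLevel`, `WEIGHTS`, and `is_excluded` are hypothetical, and representing the infinite weight with `math.inf` and the blacklist weight with 0 is an assumption made for illustration.

```python
import math
from enum import Enum


class TrustLevel(Enum):
    """Five credit levels for a data source website (names are illustrative)."""
    FULLY_TRUSTED = "fully trusted"              # whitelist
    RELATIVELY_TRUSTED = "relatively trusted"
    GENERALLY_TRUSTED = "generally trusted"
    RELATIVELY_UNTRUSTED = "relatively untrusted"
    FULLY_UNTRUSTED = "fully untrusted"          # blacklist


# Weight coefficient z_i per level. 0.8, 1.0 and 1.2 are the values given in
# the text; inf and 0 are an assumed encoding of "infinite" and "excluded".
WEIGHTS = {
    TrustLevel.FULLY_TRUSTED: math.inf,
    TrustLevel.RELATIVELY_TRUSTED: 1.2,
    TrustLevel.GENERALLY_TRUSTED: 1.0,
    TrustLevel.RELATIVELY_UNTRUSTED: 0.8,
    TrustLevel.FULLY_UNTRUSTED: 0.0,
}


def is_excluded(level: TrustLevel) -> bool:
    """Blacklisted sources are dropped during aggregation."""
    return level is TrustLevel.FULLY_UNTRUSTED
```

Keeping the levels in an enumeration rather than as free-form strings makes it harder to misspell a level when rating a new website.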
Compared with the prior art, the invention has the following beneficial effects:
Focusing on risk control scenarios, the method addresses the problem that, after online big data are intelligently aggregated, results for the same type of data may differ (that is, contradict one another). It automatically judges the reliability of the different data results through a data reliability algorithm and automatically performs reliability evaluation and screening that eliminates the false and retains the true. The algorithm targets big data risk control business scenarios, chiefly the specific case in which the same data item has conflicting results: it evaluates and screens the reliability of the conflicting results, solves the problem of data noise that may exist in intelligently aggregated online big data, and is better suited to scenarios such as risk control that place high demands on data reliability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of the algorithm of the present invention.
Fig. 2 is an overall schematic diagram of the evaluation terminal according to the present invention.
Reference numerals: 1 - terminal body, 2 - touch screen, 3 - retrieval module, 4 - data storage module, 5 - data processing module, 6 - voice input device, 7 - voice output device, 8 - control circuit board.
Detailed Description
The present invention will be further described with reference to the following examples, which illustrate only some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Example one
Referring to FIG. 1 and FIG. 2, this embodiment discloses a data reliability evaluation algorithm based on an online big data intelligent aggregation mode, which uses a data reliability evaluation terminal based on the online big data intelligent aggregation mode. The structure of the terminal is as follows:
The evaluation terminal comprises a terminal body 1, a touch screen 2, a retrieval module 3, a data storage module 4, and a data processing module 5. The touch screen 2 is mounted on the terminal body 1, and the retrieval module 3, the data storage module 4, and the data processing module 5 are integrated on a control circuit board 8. The touch screen 2 serves as the input device: the field to be retrieved is entered by touch. On receiving the field, the retrieval module 3 automatically searches the network, and information containing the field is automatically cached in the data storage module 4. The data processing module 5 analyzes and processes the cached data, and the result of the processing is displayed on the touch screen 2, so that reliable information can be obtained quickly.
In actual use, the processed data are uploaded to a cloud platform; if the same field is searched again within a preset period, the result stored in the cloud is returned directly. The preset period is measured in days, weeks, or months.
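The reuse of cloud-stored results within a preset period can be sketched as a freshness check. This is a hedged illustration only: the function names, the dictionary-based cache, and the approximation of a month as 30 days are all assumptions, since the text states only that the period is measured in days, weeks, or months.

```python
from datetime import datetime, timedelta

# Hypothetical period lengths; the text only names the units.
PERIODS = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),  # assumption: a month is approximated as 30 days
}


def is_cache_fresh(stored_at: datetime, now: datetime, unit: str, count: int = 1) -> bool:
    """Return True if a stored result is still within the preset period."""
    return now - stored_at <= PERIODS[unit] * count


def lookup(field: str, cache: dict, now: datetime, unit: str = "week"):
    """Return the cached result for a field if it is fresh, else None.

    A None return means the field has to be retrieved and processed again.
    """
    entry = cache.get(field)
    if entry is not None and is_cache_fresh(entry["stored_at"], now, unit):
        return entry["result"]
    return None
```

For example, a result stored on 1 March is returned for a search on 5 March under a one-week period, but a search on 20 March falls outside that period and triggers a fresh retrieval.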
The algorithm of the data processing module 5 specifically comprises the following steps:
Step 1, assigning the data source credibility weight: the data source is given a credit rating, which determines the data source credibility weight z_i.
Step 2, assigning data source reliability: values are assigned to data of the same type based on the number of occurrences found by keyword clustering, that is, a reliability score S_i is assigned based on the number of clustered occurrences.
Step 3, calculating the reliability evaluation score: the reliability score S_i is obtained from the clustering of the different results; taking the weight coefficient z_i corresponding to the highest credit rating among all data sources of a result, the reliability evaluation score of that result is calculated as Y_i = z_i * S_i.
Further, the specific process of step 1 is as follows: based on the website's ranking within its subdivided field, the background of its operating company, the authority of its data, the way its data are produced, and the activity of the website, each data source website is classified into one of five levels: fully trusted, relatively trusted, generally trusted, relatively untrusted, and fully untrusted.
Further, in step 1, a fully trusted data source is a whitelist website data source; data crawled from a whitelist data source are fully trusted, and the weight is infinite. If several fully trusted websites yield different data, the most recently updated data are taken. Whitelist website data sources include the Xuexin academic credential network and the court judgment document website.
Further, in step 1, a fully untrusted data source is a blacklist website data source, and such sources are automatically excluded during data aggregation; fully untrusted data sources include stranger social networking sites.
Further, in step 1, relatively trusted, generally trusted, and relatively untrusted data sources are each given a weight coefficient z_i:
for a relatively untrusted website, the weight coefficient z_i is set to 0.8;
for a generally trusted website, the weight coefficient z_i is set to 1.0;
for a relatively trusted website, the weight coefficient z_i is set to 1.2.
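The rule for conflicting fully trusted sources (take the most recently updated data) can be sketched as follows. The `Record` type, its field names, and the availability of an update date for each crawled record are assumptions made for this illustration; the text does not say how the update time is obtained.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    value: str           # the data result, e.g. an education level
    source: str          # hypothetical source name
    fully_trusted: bool  # True if the source is on the whitelist
    updated: date        # assumed to be available from the crawl


def resolve_fully_trusted(records: List[Record]) -> Optional[Record]:
    """Among records from fully trusted sources, return the latest one.

    Returns None when no fully trusted source reported anything, in which
    case the weighted scoring of the remaining sources applies instead.
    """
    trusted = [r for r in records if r.fully_trusted]
    if not trusted:
        return None
    return max(trusted, key=lambda r: r.updated)
```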
In order to further illustrate the present invention, the following description further illustrates the present invention in conjunction with specific embodiments.
Suppose the education records of a small or micro business owner are aggregated from across the web. The Xuexin academic credential network is naturally a fully trusted data source, so if data from it were collected, that data would be taken as authoritative.
However, personal education records on the Xuexin academic credential network are not public, so they cannot be obtained.
Instead, from the XX court judgment document website, a social networking site, a wedding website, a job hunting website, and similar sources, we obtain three different results: bachelor's-level education, university-level education, and high-school education,
wherein:
the XX court judgment document website is a fully trusted website: its weight coefficient is infinite, that is, it is fully trusted;
the XX social networking site is a relatively untrusted website, and its weight coefficient z_i is set to 0.8;
the XX wedding website is a generally trusted website, and its weight coefficient z_i is set to 1.0;
the XX job hunting website is a relatively trusted website, and its weight coefficient z_i is set to 1.2;
the XX stranger friend-making website is a fully untrusted website, and its weight is 0, that is, its data are not accepted.
The details are as follows:
Data source               | XX social networking site | XX wedding website | XX job hunting website
Data source credit rating | Relatively untrusted      | Generally trusted  | Relatively trusted
Weight coefficient z_i    | 0.8                       | 1.0                | 1.2
The base value of the reliability score S_i is set to 5 points, and the number of times each data result appears in the crawled data is counted, as shown in the following table:
Data result                    | Number of occurrences | Reliability score S_i | Highest rating among data sources
University-level education Y_1 | 2                     | 10                    | Relatively trusted
Bachelor's-level education Y_2 | 1                     | 5                     | Relatively untrusted
High-school education Y_3      | 2                     | 10                    | Generally trusted
The calculation from the table above gives:
University-level education: Y_1 = 1.2 * 10 = 12
Bachelor's-level education: Y_2 = 0.8 * 5 = 4
High-school education: Y_3 = 1.0 * 10 = 10
Since Y_1 > Y_3 > Y_2, the university-level education result Y_1 is adopted as the final data.
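The calculation above can be reproduced with a short script. This is a sketch under stated assumptions: the list `observations` is hypothetical and was chosen only so that the occurrence counts and highest-rated sources match the table (the original does not list which site reported which result), and the function name `evaluate` and the rule S_i = 5 * occurrences are illustrative readings of the example.

```python
from collections import defaultdict

BASE_SCORE = 5  # base value of S_i, as set in the example

# Weight coefficients z_i of the three sources in the example.
SOURCE_WEIGHT = {
    "XX social networking site": 0.8,
    "XX wedding website": 1.0,
    "XX job hunting website": 1.2,
}

# Hypothetical (source, result) pairs chosen to reproduce the table.
observations = [
    ("XX job hunting website", "university"),
    ("XX social networking site", "university"),
    ("XX social networking site", "bachelor"),
    ("XX wedding website", "high school"),
    ("XX wedding website", "high school"),
]


def evaluate(observations, source_weight, base_score=BASE_SCORE):
    """Return {result: Y_i} with Y_i = z_i * S_i."""
    counts = defaultdict(int)          # occurrences of each result
    best_weight = defaultdict(float)   # weight of the highest-rated source
    for source, result in observations:
        counts[result] += 1
        best_weight[result] = max(best_weight[result], source_weight[source])
    scores = {}
    for result, n in counts.items():
        s_i = base_score * n           # reliability score S_i
        scores[result] = best_weight[result] * s_i
    return scores


scores = evaluate(observations, SOURCE_WEIGHT)
best = max(scores, key=scores.get)     # result with the highest Y_i
```

With these inputs, `scores` holds 12 for the university result, 4 for the bachelor's result, and 10 for the high-school result, so `best` is the university result, matching the ranking in the text.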
As a further optimization, in this embodiment a network interface and a wireless signal transceiver are integrated on the control circuit board 8, so that in actual use the networking mode can be chosen according to actual needs, which is more convenient.
As a further optimization, the terminal body 1 is provided with a voice input device 6 and a voice output device 7, and an intelligent voice module is provided inside the terminal body 1; the voice input device 6 is connected to the intelligent voice module, and the voice output device 7 and the intelligent voice module are both connected to the data processing module 5.
In actual use, the field can be entered by voice as well as manually, and results can be played back through the voice output device 7, which greatly improves the convenience of the device.
Example two
Under the same conditions as in Example one, the results when fully trusted and fully untrusted website data are included are as follows:
Data source               | XX court judgment document website | XX wedding website | Stranger social networking site
Data source credit rating | Fully trusted                      | Generally trusted  | Fully untrusted
Weight coefficient z_i    | ∞                                  | 1.0                | 0
The base value of the reliability score S_i is set to 5 points, and the number of times each data result appears in the crawled data is counted, as shown in the following table:
Data result                    | Number of occurrences | Reliability score S_i | Highest rating among data sources
University-level education Y_1 | 2                     | 10                    | Fully trusted
Bachelor's-level education Y_2 | 1                     | 5                     | Fully untrusted
High-school education Y_3      | 2                     | 10                    | Generally trusted
The calculation from the table above gives:
University-level education: Y_1 = ∞ * 10 = ∞, that is, the result is accepted directly;
Bachelor's-level education: Y_2 = 0 * 5 = 0, that is, the result is not accepted;
High-school education: Y_3 = 1.0 * 10 = 10;
Since Y_1 > Y_3 > Y_2, the university-level education result Y_1 is adopted as the final data.
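The two boundary cases in this example (infinite weight and zero weight) can be handled with explicit branches rather than plain multiplication. This is a minimal sketch, assuming the infinite weight is encoded as `math.inf`; the function name `reliability` and the result labels are hypothetical.

```python
import math


def reliability(weight: float, occurrences: int, base_score: int = 5) -> float:
    """Return Y_i = z_i * S_i, treating the two boundary levels explicitly."""
    if weight == 0:
        return 0.0        # fully untrusted: the result is not accepted
    if math.isinf(weight):
        return math.inf   # fully trusted: the result is accepted directly
    return weight * base_score * occurrences


results = {
    "university": reliability(math.inf, 2),
    "bachelor": reliability(0, 1),
    "high school": reliability(1.0, 2),
}
# Results ordered from most to least reliable.
ranking = sorted(results, key=results.get, reverse=True)
```

The explicit branches are a design choice for the sketch: in IEEE 754 floating-point arithmetic, multiplying infinity by zero yields NaN, so an implementation that relied on bare multiplication would need to rule out that combination separately.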
While preferred embodiments of the present invention have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. The above description only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the invention. Any modifications, equivalents, and improvements made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims (5)

1. A data reliability evaluation algorithm based on an online big data intelligent aggregation mode, characterized in that a data reliability evaluation terminal based on the online big data intelligent aggregation mode is used, and the algorithm comprises the following steps:
Step 1, assigning the data source credibility weight:
the data source is given a credit rating, which determines the data source credibility weight z_i;
Step 2, assigning data source reliability:
values are assigned to data of the same type based on the number of occurrences found by keyword clustering, that is, a reliability score S_i is assigned based on the number of clustered occurrences;
Step 3, calculating the reliability evaluation score:
the reliability score S_i is obtained from the clustering of the different results; taking the weight coefficient z_i corresponding to the highest credit rating among all data sources of a result, the reliability evaluation score of that result is calculated as Y_i = z_i * S_i.
2. The data reliability evaluation algorithm based on the online big data intelligent aggregation mode as claimed in claim 1, wherein the specific process of step 1 is as follows: based on the website's ranking within its subdivided field, the background of its operating company, the authority of its data, the way its data are produced, and the activity of the website, each data source website is classified into one of five levels: fully trusted, relatively trusted, generally trusted, relatively untrusted, and fully untrusted.
3. The data reliability evaluation algorithm based on the online big data intelligent aggregation mode as claimed in claim 2, wherein in step 1 a fully trusted data source is a whitelist website data source; data crawled from a whitelist data source are fully trusted, and the weight is infinite.
4. The data reliability evaluation algorithm based on the online big data intelligent aggregation mode as claimed in claim 2, wherein in step 1 a fully untrusted data source is a blacklist website data source.
5. The data reliability evaluation algorithm based on the online big data intelligent aggregation mode as claimed in claim 2, wherein in step 1 relatively trusted, generally trusted, and relatively untrusted data sources are each given a weight coefficient z_i:
for a relatively untrusted website, the weight coefficient z_i is set to 0.8;
for a generally trusted website, the weight coefficient z_i is set to 1.0;
for a relatively trusted website, the weight coefficient z_i is set to 1.2.
CN202110287067.8A 2021-03-17 2021-03-17 Data reliability evaluation algorithm based on online big data intelligent aggregation mode Pending CN113032561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110287067.8A CN113032561A (en) 2021-03-17 2021-03-17 Data reliability evaluation algorithm based on online big data intelligent aggregation mode

Publications (1)

Publication Number Publication Date
CN113032561A true CN113032561A (en) 2021-06-25

Family

ID=76471408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110287067.8A Pending CN113032561A (en) 2021-03-17 2021-03-17 Data reliability evaluation algorithm based on online big data intelligent aggregation mode

Country Status (1)

Country Link
CN (1) CN113032561A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102142A (en) * 2018-06-15 2018-12-28 山东鲁能软件技术有限公司 A kind of personnel evaluation methods and system based on evaluation criterion tree
CN110633996A (en) * 2019-08-05 2019-12-31 长春市万易科技有限公司 Credibility measuring method for enterprise credit evaluation data
CN111046087A (en) * 2019-12-20 2020-04-21 北京锐安科技有限公司 Data processing method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117082006A (en) * 2023-08-22 2023-11-17 广东中山网传媒信息科技有限公司 Data source switching method of client based on big data
CN117082006B (en) * 2023-08-22 2024-03-19 广东中山网传媒信息科技有限公司 Data source switching method of client based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625