CN116860720A - Multi-source heterogeneous data model modeling system oriented to big data analysis - Google Patents

Multi-source heterogeneous data model modeling system oriented to big data analysis

Info

Publication number
CN116860720A
Authority
CN
China
Prior art keywords
data
source heterogeneous
target
heterogeneous data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310859780.4A
Other languages
Chinese (zh)
Inventor
徐俊山
孔小强
马廷
吕太轩
宋磊
姬廷
董临治
徐生明
常河
周超
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yulin High Tech Zone Xinhui New Energy Co ltd
Original Assignee
Yulin High Tech Zone Xinhui New Energy Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yulin High Tech Zone Xinhui New Energy Co ltd
Priority to CN202310859780.4A
Publication of CN116860720A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Geometry (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-source heterogeneous data model modeling system oriented to big data analysis, which belongs to the technical field of multi-source data processing and comprises an information module, an analysis module and a modeling module. The information module allows users to compile and upload enterprise demand information and determines the corresponding target classes based on that information. The analysis module analyzes each target class and determines a target multi-source heterogeneous data initial processing model. The modeling module establishes the multi-source heterogeneous data processing model required by the user: it acquires the target multi-source heterogeneous data initial processing model and adjusts it to obtain the corresponding multi-source heterogeneous data processing model. Through the cooperation of the information module, the analysis module and the modeling module, a multi-source heterogeneous data processing model that meets the user's requirements is established in a personalized manner, and the information module allows the actual modeling requirements of the enterprise user to be analyzed.

Description

Multi-source heterogeneous data model modeling system oriented to big data analysis
Technical Field
The invention belongs to the technical field of multi-source data processing, and particularly relates to a multi-source heterogeneous data model modeling system for big data analysis.
Background
With the advent of the big data era, enormous volumes of data are produced every moment. People need to extract useful information from this massive data to understand, and even guide, daily life and work. Big data analysis has therefore emerged and is becoming an increasingly popular field.
For a big data analysis task, however, acquiring the data set the task needs is a critical issue. In many data analysis algorithms, especially most machine learning algorithms, the data largely determines the quality of the analysis results. Yet it is often simply assumed that the data set is already available, while in practice the data sets for most data analysis tasks are still collected manually by experts or institutions in the field. Manual collection can ensure data quality and is feasible when the amount of data is small, but once the amount of data grows, relying on domain experts or institutions becomes impractical: it consumes large amounts of manpower, materials and money, and is therefore expensive.
This is especially true in the new energy power generation industry. Departments within an enterprise often use a variety of auxiliary software for their work, so a large amount of multi-source heterogeneous data is generated within the enterprise. The coding systems of individual power stations may be inconsistent, so the data cannot be combined effectively to view the complete information related to a piece of equipment. In addition, business and operational needs may call for analysis based on big data, which requires external data as well. Together these create a large demand for multi-source heterogeneous data processing. For the micro, small and medium-sized enterprises concerned, however, factors such as technology and cost currently prevent this data from being used to full effect. Therefore, to meet each such enterprise's need to model multi-source heterogeneous data, the invention provides a multi-source heterogeneous data model modeling system oriented to big data analysis.
Disclosure of Invention
To solve the above problems, the invention provides a multi-source heterogeneous data model modeling system oriented to big data analysis.
The aim of the invention can be achieved by the following technical scheme:
A multi-source heterogeneous data model modeling system oriented to big data analysis comprises an information module, an analysis module and a modeling module.
The information module allows users to compile and upload enterprise demand information and determines the corresponding target classes based on the enterprise demand information.
Further, the working method of the information module comprises the following steps:
identifying the enterprise demand information uploaded by a user, and acquiring the corresponding target ends and modeling demands; and determining the corresponding data classes according to the target ends, and screening the data classes to obtain the corresponding target classes.
Further, a demand information template is preset, and the user fills in the enterprise demand information according to the demand information template.
Further, the method for determining the data class according to the target end comprises the following steps:
gradually establishing and improving a target end information base, wherein the target end information base stores the data types corresponding to each target end;
matching the corresponding data classes from the target end information base according to the identified target ends;
identifying any target end for which no data class is matched in the target end information base, and marking it as an end to be supplemented; searching for the data types corresponding to the end to be supplemented, and organizing them into corresponding data classes;
and adding the end to be supplemented and its corresponding data classes to the target end information base for storage.
Further, the method for screening each data class comprises the following steps:
establishing a demand analysis model, analyzing the data classes and the enterprise demand information through the demand analysis model to obtain a basic score and a correction score for each data class, calculating the corresponding evaluation score from the obtained basic score and correction score, and marking each data class whose evaluation score is greater than a threshold X1 as a target class.
Further, the evaluation score is calculated as follows:
the obtained basic score and correction score are denoted JF and XF respectively, and the corresponding evaluation score PGL is calculated according to the evaluation formula PGL = b1 × JF + b2 × XF, wherein b1 and b2 are proportionality coefficients with value ranges 0 < b1 ≤ 1 and 0 < b2 ≤ 1.
The analysis module is used for analyzing each target class and determining a target multi-source heterogeneous data initial processing model.
Further, the working method of the analysis module comprises the following steps:
establishing a model library, wherein the model library stores each multi-source heterogeneous data initial processing model and its corresponding data processing range;
and identifying each target class to form a corresponding target class set; matching the target class set against the model library to obtain the corresponding to-be-selected multi-source heterogeneous data initial processing models and their similarities; and screening the to-be-selected multi-source heterogeneous data initial processing models to obtain the target multi-source heterogeneous data initial processing model.
Further, the method for screening the initial processing model of each multi-source heterogeneous data to be selected comprises the following steps:
identifying the redundant data classes corresponding to each to-be-selected multi-source heterogeneous data initial processing model, and carrying out similarity correction according to the identified redundant data classes and the enterprise demand data to obtain the corresponding similar value and foreground value; removing each to-be-selected multi-source heterogeneous data initial processing model whose similar value is lower than a threshold X2; identifying the cost value corresponding to each to-be-selected multi-source heterogeneous data initial processing model, denoting the obtained cost value, foreground value and similar value as CBZ, QJZ and XSZ respectively, and calculating the corresponding priority value KPL according to the priority formula KPL = QJZ + XSZ - c × CBZ, wherein c is a cost value adjustment coefficient; and selecting the to-be-selected multi-source heterogeneous data initial processing model with the highest priority value as the target multi-source heterogeneous data initial processing model.
The modeling module is used for establishing the multi-source heterogeneous data processing model required by a user: it acquires the target multi-source heterogeneous data initial processing model and adjusts it to obtain the corresponding multi-source heterogeneous data processing model.
Compared with the prior art, the invention has the beneficial effects that:
through the cooperation of the information module, the analysis module and the modeling module, a multi-source heterogeneous data processing model that meets the user's requirements is established in a personalized manner; the information module allows the actual modeling requirements of enterprise users to be analyzed, so that the types of multi-source heterogeneous data processing an enterprise user needs are determined accurately, which facilitates accurate processing; meanwhile, the personalized service helps enterprise users minimize the cost of establishing the multi-source heterogeneous data processing model, which facilitates adoption of the system and improves its competitiveness among small and medium-sized enterprises, and avoids the situation in which enterprises keep using their previous processing methods for reasons such as cost, leaving a large amount of enterprise data underused; the similarity correction removes some of the to-be-selected multi-source heterogeneous data initial processing models in advance, which reduces the amount of data in subsequent analysis; and the screening takes into account both the possible future development of the enterprise and the degree of importance the enterprise attaches to cost.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a functional block diagram of the present invention.
Detailed Description
The technical solutions of the present invention are described clearly and completely below in connection with the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
As shown in Fig. 1, a multi-source heterogeneous data model modeling system oriented to big data analysis comprises an information module, an analysis module and a modeling module.
The information module allows users to compile and upload enterprise demand information, which includes the software and systems used by each department in the enterprise, the modeling demands, enterprise information and other data. A modeling demand is a demand for a processing model that processes data for a particular purpose, such as a cost analysis demand or a business progress demand; several demands can be set according to the actual situation of the enterprise, and the specific demands are set by the enterprise according to its own needs. Enterprises must weigh factors such as cost and actual processing needs, so some enterprises may have only one demand while others have several; building every subsequent model to the same demand is therefore avoided, which prevents wasted resources and higher costs for users and allows personalized settings for each enterprise user. This reduces the enterprise's cost of use, which is an important selection factor, especially for micro, small and medium-sized enterprises. The enterprise demand information uploaded by the user is then processed to obtain the target demand data. The specific process is as follows:
setting a demand information template; the user fills in the enterprise's relevant data according to the demand information template, and the completed enterprise demand information is then uploaded;
identifying the uploaded enterprise demand information to obtain the corresponding target ends and modeling demands, wherein a target end is a piece of software, a system or the like used by a department in the enterprise; determining the corresponding data classes according to the target ends, wherein a data class is a data type corresponding to a target end and carries the label of each target end it belongs to; one data class may carry several target end labels, because different target ends may hold data of the same type (the method for determining the data classes is described further below); analyzing each data class against the corresponding modeling demand, determining each data class that is relevant to the modeling demand, and marking it as a target class; that is, analyzing which data classes are needed to fulfil the modeling demand and treating those data classes as target classes. The analysis mainly considers the modeling demand, the enterprise information and the enterprise's historical related project data together, with the enterprise information and historical related project data further constraining and confirming the modeling demand: even when the modeling demand and the available data classes are the same, the target classes an enterprise needs may differ because enterprises differ in management mode, scale and so on. The characteristics of the enterprise therefore need to be analyzed in combination with the enterprise information and the enterprise's historical related project data.
Specifically, a corresponding demand analysis model can be established based on a neural network, such as a CNN or a DNN. The model is trained on an established training set, which comprises enterprise demand information, data classes, and the basic score and correction score set for each data class. The basic score is set without reference to the enterprise information, the enterprise's historical related project data and the like; that is, it is set only according to the modeling demand and the data class. The correction score is a correcting score derived from the actual situation of the enterprise, such as the enterprise information and the enterprise's historical related project data. After successful training, the demand analysis model is used to obtain the basic score and correction score corresponding to each data class. The obtained basic score and correction score are denoted JF and XF respectively, and the corresponding evaluation score PGL is calculated according to the evaluation formula PGL = b1 × JF + b2 × XF, wherein b1 and b2 are proportionality coefficients that enterprise users adjust according to their own needs, with value ranges 0 < b1 ≤ 1 and 0 < b2 ≤ 1. Each data class whose evaluation score is greater than the threshold X1 is marked as a target class.
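The scoring and thresholding step can be illustrated with a minimal Python sketch. It assumes the basic score JF and correction score XF have already been produced by the trained demand analysis model; the names ScoredClass, evaluation_score and select_target_classes, as well as all numeric values, are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ScoredClass:
    name: str
    base_score: float        # JF: set from the modeling demand and data class only
    correction_score: float  # XF: correction reflecting the enterprise's actual situation


def evaluation_score(jf: float, xf: float, b1: float = 1.0, b2: float = 1.0) -> float:
    """PGL = b1 * JF + b2 * XF, with 0 < b1 <= 1 and 0 < b2 <= 1."""
    if not (0 < b1 <= 1 and 0 < b2 <= 1):
        raise ValueError("b1 and b2 must lie in (0, 1]")
    return b1 * jf + b2 * xf


def select_target_classes(classes, x1, b1=1.0, b2=1.0):
    """Return the names of data classes whose evaluation score exceeds threshold X1."""
    return [
        c.name
        for c in classes
        if evaluation_score(c.base_score, c.correction_score, b1, b2) > x1
    ]


# Hypothetical scores, chosen only to illustrate the arithmetic.
scored = [
    ScoredClass("equipment_ledger", base_score=8.0, correction_score=2.0),
    ScoredClass("attendance_records", base_score=3.0, correction_score=0.5),
]
targets = select_target_classes(scored, x1=5.0, b1=0.5, b2=1.0)
print(targets)
```

With b1 = 0.5 and b2 = 1, the first class scores 0.5 × 8 + 2 = 6 and passes the threshold of 5, while the second scores 0.5 × 3 + 0.5 = 2 and is dropped.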
The method for determining the data class according to the target end comprises the following steps:
gradually establishing and improving a target end information base, wherein the target end information base stores the data types corresponding to each target end;
matching the corresponding data classes from the target end information base according to the identified target ends;
identifying any target end for which no data class is matched in the target end information base, and marking it as an end to be supplemented; because many kinds of related software and systems are available on the market, the target end information base, when first established, can essentially cover only the better-known ones; for each end to be supplemented, acquiring the data types it may generate from the Internet or other existing channels, and organizing them into corresponding data classes;
and adding the end to be supplemented and its corresponding data classes to the target end information base for storage.
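The lookup-and-supplement steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the information base is modeled as an in-memory dictionary, and the `discover` callback is a hypothetical stand-in for the search of the Internet or other channels, which the patent does not specify. All function names and sample data are invented for the example.

```python
def resolve_data_classes(target_ends, info_base, discover):
    """Map each target end to its data types, supplementing the base on a miss.

    info_base: dict of target-end name -> set of data type names (mutated in place).
    discover:  hypothetical callback returning the data types of an unknown target end.
    """
    resolved = {}
    to_supplement = []
    for end in target_ends:
        if end in info_base:
            resolved[end] = set(info_base[end])
        else:
            to_supplement.append(end)  # mark as an end to be supplemented
    for end in to_supplement:
        found = set(discover(end))
        info_base[end] = found         # store for later requests
        resolved[end] = set(found)
    return resolved, to_supplement


def label_data_classes(resolved):
    """Invert the mapping: data class -> set of target-end labels it carries."""
    labels = {}
    for end, types in resolved.items():
        for data_type in types:
            labels.setdefault(data_type, set()).add(end)
    return labels


# Hypothetical target ends and data types.
info_base = {"scada": {"telemetry", "alarms"}}
resolved, pending = resolve_data_classes(
    ["scada", "maintenance_app"],
    info_base,
    discover=lambda end: {"alarms", "work_orders"},
)
labels = label_data_classes(resolved)
```

Here `labels["alarms"]` carries both target-end labels, which mirrors the statement that one data class may belong to several target ends.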
The information module allows the actual modeling requirements of enterprise users to be analyzed, so that the types of multi-source heterogeneous data processing an enterprise user needs are determined accurately, which facilitates accurate processing. Meanwhile, the personalized service helps enterprise users minimize the cost of establishing the multi-source heterogeneous data processing model, which facilitates adoption of the system and improves its competitiveness among micro, small and medium-sized enterprises, and avoids the situation in which enterprises keep using their previous processing methods for reasons such as cost, leaving a large amount of enterprise data underused.
The analysis module is used for analyzing each target class and determining the multi-source heterogeneous data initial processing model closest to the requirements; that is, for the target class set of each enterprise, it determines the multi-source heterogeneous data initial processing model in the existing model library that best fits the target class set. The specific process comprises the following steps:
manually establishing multiple multi-source heterogeneous data initial processing models according to the service scope and market demand, setting for each multi-source heterogeneous data initial processing model the range of multi-source heterogeneous data types it processes, and then establishing the corresponding model library;
identifying each target class to form a corresponding target class set, wherein the target class set is the set formed by the target classes; matching the obtained target class set against the range of multi-source heterogeneous data types processed by each multi-source heterogeneous data initial processing model in the model library to obtain the corresponding to-be-selected multi-source heterogeneous data initial processing models and their similarities, wherein a to-be-selected multi-source heterogeneous data initial processing model is one whose range of multi-source heterogeneous data types covers the target class set, that is, whose types equal or exceed the target class set; if a model's range cannot fully include the target class set, the match fails and that model is not taken as a to-be-selected multi-source heterogeneous data initial processing model; the similarity is calculated as the ratio of the number of target classes to the number of data classes in the corresponding range; and screening the to-be-selected multi-source heterogeneous data initial processing models to obtain the target multi-source heterogeneous data initial processing model, that is, the multi-source heterogeneous data initial processing model closest to the user's demand.
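The coverage test and similarity ratio described above can be expressed as a short Python sketch. It assumes the model library is a dictionary from a model name to the set of data classes the model can process; the function name match_candidates and the sample library are hypothetical.

```python
def match_candidates(target_classes, model_library):
    """Return {model_name: similarity} for models whose range covers the target class set.

    model_library: dict of model name -> set of data classes the model processes.
    Similarity = number of target classes / number of data classes in the model's range.
    """
    targets = set(target_classes)
    candidates = {}
    for name, supported in model_library.items():
        # The model's range must fully include the target class set.
        if targets and targets <= supported:
            candidates[name] = len(targets) / len(supported)
    return candidates


# Hypothetical library: A fits exactly, B covers with two redundant classes, C is too narrow.
library = {
    "A": {"x", "y"},
    "B": {"x", "y", "z", "w"},
    "C": {"x"},
}
candidates = match_candidates({"x", "y"}, library)
print(candidates)
```

Model A matches with similarity 2/2 = 1.0, model B with 2/4 = 0.5, and model C is excluded because its range does not include class "y". The classes in a model's range that are not target classes ("z" and "w" for B) are the redundant data classes used in the subsequent screening.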
The method for screening the initial processing model of each multi-source heterogeneous data to be selected comprises the following steps:
identifying the redundant data classes corresponding to each to-be-selected multi-source heterogeneous data initial processing model, and carrying out similarity correction according to the identified redundant data classes and the enterprise demand data to obtain the corresponding similar value and foreground value; removing each to-be-selected multi-source heterogeneous data initial processing model whose similar value is lower than the threshold X2; identifying the cost value corresponding to each to-be-selected multi-source heterogeneous data initial processing model, wherein the cost value is converted from the estimated cost to the user enterprise of establishing the model, so that it can be used in the calculation after unit conversion; the cost value is set from the preset cost of each multi-source heterogeneous data initial processing model and is then obtained by matching, and it can be adjusted accordingly when labor prices and the like change; the cost covers all costs, including subsequent manual adjustment costs, that is, the total estimated expenditure of the enterprise user; the obtained cost value, foreground value and similar value are denoted CBZ, QJZ and XSZ respectively, and the corresponding priority value KPL is calculated according to the priority formula KPL = QJZ + XSZ - c × CBZ, wherein c is a cost value adjustment coefficient set by the enterprise user as needed: c is set greater than 1 if the enterprise places more emphasis on cost and less than 1 otherwise, and defaults to 1 if not set; and selecting the to-be-selected multi-source heterogeneous data initial processing model with the highest priority value as the target multi-source heterogeneous data initial processing model.
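A minimal Python sketch of this screening step follows. It assumes XSZ, QJZ and CBZ have already been obtained for each to-be-selected model (the correction model and the cost conversion are not reproduced here); the Candidate type, the function names and every numeric value are hypothetical and serve only to show how the coefficient c shifts the choice.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    similar_value: float     # XSZ: similarity after correction
    foreground_value: float  # QJZ
    cost_value: float        # CBZ: unit-converted estimated cost


def priority(item, c=1.0):
    """KPL = QJZ + XSZ - c * CBZ."""
    return item.foreground_value + item.similar_value - c * item.cost_value


def choose_target_model(candidates, x2, c=1.0):
    """Drop candidates with XSZ below X2, then return the one with the highest KPL."""
    kept = [m for m in candidates if m.similar_value >= x2]
    if not kept:
        return None
    return max(kept, key=lambda m: priority(m, c))


# Hypothetical values, chosen only to illustrate the effect of c.
pool = [
    Candidate("lean", similar_value=0.75, foreground_value=0.25, cost_value=0.25),
    Candidate("broad", similar_value=0.5, foreground_value=1.0, cost_value=0.5),
    Candidate("sparse", similar_value=0.25, foreground_value=2.0, cost_value=0.0),
]
default_choice = choose_target_model(pool, x2=0.5)           # c defaults to 1
cost_focused = choose_target_model(pool, x2=0.5, c=4.0)      # cost weighted heavily
print(default_choice.name, cost_focused.name)
```

In this example "sparse" is removed by the X2 threshold regardless of its foreground value. With the default c = 1, "broad" wins (KPL 1.0 against 0.75); with c = 4, the cheaper "lean" model wins (KPL 0.0 against -0.5).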
The similarity correction removes some of the to-be-selected multi-source heterogeneous data initial processing models in advance, which reduces the amount of data in subsequent analysis; and the screening takes into account both the possible future development of the enterprise and the degree of importance the enterprise attaches to cost.
The similarity correction according to the identified redundant data classes and the enterprise demand data corrects the similarity according to how relevant the redundant data classes are to the enterprise's development and future demands; it takes into account the enterprise's next stage of development and the resulting changes in model demand, and therefore the target classes that may be added later, to obtain the corresponding similar value and foreground value. Specifically, a corresponding correction model is established based on a CNN (convolutional neural network) or a DNN (deep neural network), and a corresponding training set is established manually for training, wherein the training set comprises redundant data classes, enterprise demand data, similarities, and the correspondingly set corrected similar values and foreground values; after successful training, the correction model is used to obtain the corresponding similar value and foreground value.
The modeling module is used for establishing the multi-source heterogeneous data processing model the user demands: it obtains the target multi-source heterogeneous data initial processing model, and the model is then adjusted manually according to the user enterprise's demands to obtain the corresponding multi-source heterogeneous data processing model.
The above formulas are dimensionless and operate on numerical values only; each was obtained by collecting a large amount of data and performing software simulation to approximate the actual situation as closely as possible. The preset parameters and preset thresholds in the formulas are set by a person skilled in the art according to the actual situation or are obtained by simulation on a large amount of data.
The above embodiments only illustrate the technical method of the present invention and do not limit it; those skilled in the art should understand that the technical method of the present invention may be modified or substituted without departing from its spirit and scope.

Claims (8)

1. A multi-source heterogeneous data model modeling system oriented to big data analysis, characterized by comprising an information module, an analysis module and a modeling module;
the information module is used for users to compile and upload enterprise demand information and for determining the corresponding target classes based on the enterprise demand information;
the analysis module is used for analyzing each target class and determining a target multi-source heterogeneous data initial processing model;
the modeling module is used for establishing the multi-source heterogeneous data processing model required by a user, acquiring the target multi-source heterogeneous data initial processing model, and adjusting the target multi-source heterogeneous data initial processing model to obtain the corresponding multi-source heterogeneous data processing model.
2. The multi-source heterogeneous data model modeling system for big data analysis according to claim 1, wherein the working method of the information module comprises:
identifying enterprise demand information uploaded by a user, and acquiring a corresponding target end and modeling demand; and determining corresponding data classes according to the target end, and screening each data class to obtain the corresponding target class.
3. The modeling system of a multi-source heterogeneous data model for big data analysis according to claim 2, wherein when the user fills in the enterprise demand information, a corresponding demand information template is preset, and the user fills in the corresponding data according to the demand information template.
4. A multi-source heterogeneous data model modeling system for big data analysis according to claim 3, wherein the method for determining the data class according to the target end comprises:
gradually establishing and perfecting a target end information base, wherein the target end information base is used for storing various data types corresponding to various target ends;
matching corresponding data classes from a target end information base according to the identified target end;
identifying any target end for which no data class is matched in the target end information base, and marking that target end as an end to be supplemented; searching for the various data types corresponding to the end to be supplemented, and sorting those data types into corresponding data classes;
and supplementing the end to be supplemented and its corresponding data classes into the target end information base for storage.
5. The multi-source heterogeneous data model modeling system for big data analysis of claim 4, wherein the method for screening each data class comprises:
a demand analysis model is established, the data class and the enterprise demand information are analyzed through the demand analysis model, basic scores and correction scores corresponding to the data classes are obtained, corresponding evaluation scores are calculated according to the obtained basic scores and correction scores, and the data class with the evaluation score larger than a threshold value X1 is marked as a target class.
6. The multi-source heterogeneous data model modeling system for big data analysis according to claim 5, wherein the evaluation score calculating method comprises:
respectively marking the obtained basic score and correction score as JF and XF, and calculating the corresponding evaluation score PGL according to the evaluation formula PGL = b1 × JF + b2 × XF, wherein b1 and b2 are proportionality coefficients with value ranges 0 < b1 ≤ 1 and 0 < b2 ≤ 1.
7. The multi-source heterogeneous data model modeling system for big data analysis according to claim 1, wherein the working method of the analysis module comprises:
establishing a model library, wherein each multi-source heterogeneous data initial processing model and a corresponding data processing range are stored in the model library;
and identifying each target class to form a corresponding target class set; matching, from the model library and based on the target class set, the corresponding to-be-selected multi-source heterogeneous data initial processing models and their corresponding similarities; and screening the to-be-selected multi-source heterogeneous data initial processing models to obtain the corresponding target multi-source heterogeneous data initial processing model.
8. The modeling system of multi-source heterogeneous data model for big data analysis according to claim 7, wherein the method for screening each of the candidate multi-source heterogeneous data initial processing models comprises:
identifying the redundant data classes corresponding to each to-be-selected multi-source heterogeneous data initial processing model, and carrying out similarity correction according to the identified redundant data classes and the enterprise demand data to obtain the corresponding similar values and foreground values; removing each to-be-selected multi-source heterogeneous data initial processing model whose similar value is lower than the threshold value X2; identifying the cost value corresponding to each remaining to-be-selected multi-source heterogeneous data initial processing model, respectively marking the obtained cost value, foreground value and similar value as CBZ, QJZ and XSZ, and calculating the corresponding priority value KPL according to the priority formula KPL = QJZ + XSZ - c × CBZ, wherein c is a cost value adjustment coefficient; and selecting the to-be-selected multi-source heterogeneous data initial processing model with the highest priority value as the target multi-source heterogeneous data initial processing model.
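For illustration, the evaluation score of claims 5 and 6 (PGL = b1 × JF + b2 × XF, with data classes scoring above the threshold X1 marked as target classes) could be computed along the following lines. This is a minimal Python sketch: the basic scores and correction scores are assumed to have been produced already by the demand analysis model, and the function names, data class names and numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluation_score(jf: float, xf: float, b1: float, b2: float) -> float:
    """Evaluation formula PGL = b1 * JF + b2 * XF, with 0 < b1 <= 1 and 0 < b2 <= 1."""
    if not (0 < b1 <= 1 and 0 < b2 <= 1):
        raise ValueError("b1 and b2 must each lie in the interval (0, 1]")
    return b1 * jf + b2 * xf


def mark_target_classes(scores: Dict[str, Tuple[float, float]],
                        x1: float, b1: float, b2: float) -> List[str]:
    """Return the data classes whose evaluation score is greater than X1.

    `scores` maps a data class name to its (basic score JF, correction score XF).
    """
    return [
        name
        for name, (jf, xf) in scores.items()
        if evaluation_score(jf, xf, b1, b2) > x1
    ]


if __name__ == "__main__":
    # Hypothetical data classes and scores, for demonstration only.
    example_scores = {
        "voltage": (80.0, 10.0),
        "weather": (40.0, 5.0),
        "billing": (60.0, 30.0),
    }
    print(mark_target_classes(example_scores, x1=40.0, b1=0.5, b2=0.5))
```

The comparison is strict (greater than X1), following the wording of claim 5; a data class scoring exactly X1 is not marked as a target class.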
CN202310859780.4A 2023-07-13 2023-07-13 Multi-source heterogeneous data model modeling system oriented to big data analysis Pending CN116860720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310859780.4A CN116860720A (en) 2023-07-13 2023-07-13 Multi-source heterogeneous data model modeling system oriented to big data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310859780.4A CN116860720A (en) 2023-07-13 2023-07-13 Multi-source heterogeneous data model modeling system oriented to big data analysis

Publications (1)

Publication Number Publication Date
CN116860720A true CN116860720A (en) 2023-10-10

Family

ID=88221268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310859780.4A Pending CN116860720A (en) 2023-07-13 2023-07-13 Multi-source heterogeneous data model modeling system oriented to big data analysis

Country Status (1)

Country Link
CN (1) CN116860720A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172729A (en) * 2023-11-03 2023-12-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data
CN117453721A (en) * 2023-10-29 2024-01-26 江苏信而泰智能装备有限公司 Production management data acquisition system based on big data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453721A (en) * 2023-10-29 2024-01-26 江苏信而泰智能装备有限公司 Production management data acquisition system based on big data
CN117172729A (en) * 2023-11-03 2023-12-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data
CN117172729B (en) * 2023-11-03 2024-04-05 南通进宝机械制造有限公司 Labor affair subcontracting personnel management system based on big data

Similar Documents

Publication Publication Date Title
CN116860720A (en) Multi-source heterogeneous data model modeling system oriented to big data analysis
CN110517130A (en) A kind of intelligence bookkeeping methods and its system
CN112163424A (en) Data labeling method, device, equipment and medium
CN112488507B (en) Expert classification portrait method and device based on clustering and storage medium
CN110489749B (en) Business process optimization method of intelligent office automation system
CN111949795A (en) Work order automatic classification method and device
CN112214508B (en) Data processing method and device
WO2023071127A1 (en) Policy recommended method and apparatus, device, and storage medium
CN109492859A (en) Employees classification method and device based on neural network model, equipment, medium
CN116423003A (en) Tin soldering intelligent evaluation method and system based on data mining
CN112685374B (en) Log classification method and device and electronic equipment
CN112800219B (en) Method and system for feeding back customer service log to return database
CN116362247A (en) Entity extraction method based on MRC framework
CN115936389A (en) Big data technology-based method for matching evaluation experts with evaluation materials
CN115905470A (en) Method and device for generating financial article, computer equipment and storage medium
CN115375965A (en) Preprocessing method for target scene recognition and target scene recognition method
CN114580348A (en) Method, device, terminal and storage medium for acquiring bidding document by combining RPA and AI
CN113449923A (en) Multi-model object market quotation prediction method and device
CN112182211A (en) Text classification method and device
CN113822272A (en) Data processing method and device
CN110096257B (en) Design graph automatic evaluation system and method based on intelligent recognition
CN117798654B (en) Intelligent adjusting system for center of steam turbine shafting
CN112232352B (en) Automatic pricing system and method for intelligent recognition of PCB drawing
CN118014703A (en) Visual intelligent decision system and method based on digital platform
CN116662882A (en) Mail labeling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination