CN116628451B - High-speed analysis method for information to be processed - Google Patents

Info

Publication number
CN116628451B
Authority
CN
China
Prior art keywords
analysis
information
data
processed
scheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310631953.7A
Other languages
Chinese (zh)
Other versions
CN116628451A (en)
Inventor
宋远岑
陈育鸣
庄健民
蔡定国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huacun Electronic Technology Co Ltd
Original Assignee
Jiangsu Huacun Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Huacun Electronic Technology Co Ltd
Priority to CN202310631953.7A
Publication of CN116628451A
Application granted
Publication of CN116628451B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a high-speed analysis method for information to be processed, relating to the technical field of information processing. The method comprises: interactively obtaining the information to be processed and reading its basic attribute information, including information size and information data type; obtaining information source data and reading source identification data; performing source characteristic analysis to generate analysis adaptation constraints, including analysis tool constraints and analysis mode constraints; setting traversal control parameters and traversing the information to complete aggregation and segmentation of the data; inputting the cluster identification and the analysis adaptation constraints into an intelligent analysis model and outputting an analysis matching scheme; and executing data analysis of the aggregation and segmentation result through the analysis matching scheme. The application addresses the technical problem in the prior art that information to be processed comes from different sources and has different types and structures, which makes it difficult for an analysis method to cope and leads to low analysis efficiency and poor accuracy.

Description

High-speed analysis method for information to be processed
Technical Field
The application relates to the technical field of information processing, in particular to a high-speed analysis method of information to be processed.
Background
With the rapid development of big data, cloud computing and artificial intelligence, data of all types continues to grow, and data analysis has become a key technology in many fields. Traditional data analysis methods usually target specific data types or formats and struggle to adapt to diversified and complex data processing requirements. Providing a high-speed analysis method applicable to different data types and sources, so as to improve analysis efficiency and accuracy, is therefore of practical significance.
Disclosure of Invention
The embodiments of the application provide a high-speed analysis method for information to be processed, which addresses the technical problem in the prior art that information to be processed comes from different sources and has different types and structures, making it difficult for an analysis method to cope and resulting in low analysis efficiency and poor accuracy.
In view of the above problems, embodiments of the present application provide a high-speed analysis method for information to be processed.
In a first aspect, an embodiment of the present application provides a high-speed analysis method for information to be processed, the method comprising: interactively obtaining information to be processed, and reading basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type; obtaining information source data of the information to be processed, and reading source identification data; performing source characteristic analysis based on the information source data to generate an analysis adaptation constraint, wherein the analysis adaptation constraint comprises an analysis tool constraint and an analysis mode constraint; setting traversal control parameters through the identification data and the basic attribute information, and traversing the information to be processed through the traversal control parameters to complete aggregation and segmentation of the data, wherein the aggregation and segmentation result carries a cluster identification; inputting the cluster identification and the analysis adaptation constraint into an intelligent analysis model, and outputting an analysis matching scheme; and executing data analysis of the aggregation and segmentation result through the analysis matching scheme.
In a second aspect, an embodiment of the present application provides a high-speed analysis system for information to be processed, the system comprising: a to-be-processed information acquisition module, configured to interactively obtain information to be processed and read basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type; an information source acquisition module, configured to obtain information source data of the information to be processed and read source identification data; a source characteristic analysis module, configured to perform source characteristic analysis based on the information source data and generate an analysis adaptation constraint, wherein the analysis adaptation constraint comprises an analysis tool constraint and an analysis mode constraint; a to-be-processed information traversing module, configured to set traversal control parameters through the identification data and the basic attribute information, and to traverse the information to be processed through the traversal control parameters to complete aggregation and segmentation of the data, wherein the aggregation and segmentation result carries a cluster identification; a matching scheme acquisition module, configured to input the cluster identification and the analysis adaptation constraint into an intelligent analysis model and output an analysis matching scheme; and a data analysis module, configured to execute data analysis of the aggregation and segmentation result through the analysis matching scheme.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
The method interactively obtains the information to be processed and reads its basic attribute information, including information size and information data type; obtains the information source data and reads the source identification data; performs source characteristic analysis to generate analysis adaptation constraints, including analysis tool constraints and analysis mode constraints; sets traversal control parameters and traverses the information to complete aggregation and segmentation of the data; inputs the cluster identification and the analysis adaptation constraints into an intelligent analysis model, which outputs an analysis matching scheme; and executes data analysis of the aggregation and segmentation result through the analysis matching scheme. This solves the technical problem in the prior art that information to be processed comes from different sources and has different types and structures, which leads to low analysis efficiency and poor accuracy. A suitable analysis tool and analysis mode are selected automatically according to the characteristics of the information to be processed, so that the data analysis approach can be applied widely to data of different sources, types and structures and meets diversified and complex data processing requirements.
The foregoing is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments are set out below.
Drawings
Fig. 1 is a schematic flow chart of a high-speed analysis method for information to be processed according to an embodiment of the application;
Fig. 2 is a schematic flow chart of outputting the analysis matching scheme in a high-speed analysis method for information to be processed according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of performing data analysis on the aggregation and segmentation result in a high-speed analysis method for information to be processed according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a high-speed analysis system for information to be processed according to an embodiment of the present application.
Description of reference numerals: to-be-processed information acquisition module 10, information source acquisition module 20, source characteristic analysis module 30, to-be-processed information traversing module 40, matching scheme acquisition module 50, data analysis module 60.
Description of the embodiments
The embodiments of the application provide a high-speed analysis method for information to be processed, which addresses the technical problem in the prior art that information to be processed comes from different sources and has different types and structures, making it difficult for an analysis method to cope and resulting in low analysis efficiency and poor accuracy.
Examples
As shown in Fig. 1, an embodiment of the present application provides a high-speed analysis method for information to be processed, the method comprising:
Step S100: interactively obtaining information to be processed, and reading basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type;
Specifically, the information to be processed is received through interaction with a user or a system, for example by API call, file upload or input box. The data type of the information, such as text, image or audio, is identified from the file extension, the MIME type and the like. The size of the information is obtained by calculating the file size in bytes or by counting the records of the data set. The acquired data type and information size are integrated into the basic attribute information, which is used to determine the analysis tools and methods in subsequent steps.
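Reading the basic attribute information of step S100 can be sketched as follows for the case where the information arrives as a file. This is a minimal illustration rather than the method of the application: the names `BasicAttributes` and `read_basic_attributes` are hypothetical, and the data type is guessed from the file extension only, using Python's standard `mimetypes` module.

```python
import mimetypes
import os
from dataclasses import dataclass


@dataclass
class BasicAttributes:
    size_bytes: int  # information size, in bytes
    data_type: str   # coarse data type such as "text", "image" or "audio"


def read_basic_attributes(path: str) -> BasicAttributes:
    """Read the size and a coarse data type for a file on disk (illustrative)."""
    size_bytes = os.path.getsize(path)
    # guess_type maps the file extension to a MIME type, or None if unknown.
    mime, _encoding = mimetypes.guess_type(path)
    # Keep only the top-level MIME type, e.g. "text" from "text/plain".
    data_type = mime.split("/")[0] if mime else "unknown"
    return BasicAttributes(size_bytes=size_bytes, data_type=data_type)
```

A record-count based size for data sets, which the description also mentions, would need a format-specific reader and is left out of this sketch.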
Step S200: obtaining information source data of the information to be processed, and reading source identification data;
Specifically, according to the characteristics of the information to be processed, the source of the information is determined from attributes such as metadata, file path and creation time. Information sources include data acquired by a web crawler, data exported from a database, data uploaded by a user and the like. Information related to the determined source, such as the website, the database name or the upload information, is then acquired and used to further understand the context and nature of the information to be processed, so that suitable analysis tools and methods can be selected. The information source and the source-related information are integrated into source identification data, which distinguishes information from different sources so that data from each source can be analyzed in an optimized way in subsequent steps.
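One possible shape for the source identification data of step S200 is sketched below. The metadata keys (`url`, `database`, `uploader`) and the names `SourceIdentification` and `identify_source` are assumptions made for illustration; the application does not specify a data format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SourceIdentification:
    kind: str    # "web_crawler", "database" or "user_upload"
    origin: str  # website host, database name or uploader name


def identify_source(metadata: dict) -> SourceIdentification:
    """Derive source identification data from metadata (hypothetical keys)."""
    if "url" in metadata:
        # Crawled data: identify the source by the website host.
        return SourceIdentification("web_crawler", urlparse(metadata["url"]).netloc)
    if "database" in metadata:
        # Exported data: identify the source by the database name.
        return SourceIdentification("database", metadata["database"])
    # Otherwise treat the information as a user upload.
    return SourceIdentification("user_upload", metadata.get("uploader", "unknown"))
```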
Step S300: performing source characteristic analysis based on the information source data to generate analysis adaptation constraint, wherein the analysis adaptation constraint comprises analysis tool constraint and analysis mode constraint;
Specifically, the source identification data is analyzed and key features related to the information to be processed are extracted, such as the degree of structuring, the data sparsity and the data volume. A suitable analysis tool is selected for the information to be processed according to the extracted source features; analysis tools include open source libraries, custom algorithms, cloud services and the like, and the analysis tool constraint ensures that the selected tool can meet the analysis requirements of the information to be processed. Once the analysis tool has been selected, a suitable analysis mode is determined, such as sequential, parallel or distributed analysis; the analysis mode constraint ensures that the analysis process makes full use of computing resources and improves analysis efficiency.
The selected analysis tool and analysis mode are integrated into the analysis adaptation constraint, which comprises the analysis tool constraint and the analysis mode constraint and guides intelligent analysis model selection and the data analysis process in subsequent steps.
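The mapping from source features to analysis adaptation constraints in step S300 can be illustrated with a simple rule-based sketch. The feature names, the tool names (`tabular_parser`, `text_parser`, `sparse_parser`) and every numeric cut-off below are hypothetical placeholders; the application does not state how the constraints are derived.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceFeatures:
    structured_ratio: float  # 0.0 (free-form) to 1.0 (fully structured)
    sparsity: float          # fraction of empty or missing fields
    volume_bytes: int        # data volume


@dataclass
class AdaptationConstraints:
    allowed_tools: List[str]  # analysis tool constraint
    allowed_modes: List[str]  # analysis mode constraint


def derive_constraints(features: SourceFeatures) -> AdaptationConstraints:
    """Map source features to tool and mode constraints (assumed rules)."""
    tools = ["tabular_parser"] if features.structured_ratio >= 0.8 else ["text_parser"]
    if features.sparsity >= 0.5:
        tools.append("sparse_parser")
    # Larger volumes permit modes that use more computing resources.
    if features.volume_bytes < 10 * 1024**2:
        modes = ["sequential"]
    elif features.volume_bytes < 10 * 1024**3:
        modes = ["sequential", "parallel"]
    else:
        modes = ["parallel", "distributed"]
    return AdaptationConstraints(allowed_tools=tools, allowed_modes=modes)
```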
Step S400: setting a traversing control parameter through the identification data and the basic attribute information, and carrying out information traversing of the information to be processed through the traversing control parameter to complete aggregation and segmentation of data, wherein an aggregation and segmentation result is provided with a clustering identification;
Specifically, traversal control parameters, including traversal speed, traversal range and traversal depth, are set according to the source identification data and the basic attribute information, and control the efficiency and scope of the information traversal. The information to be processed is traversed using these parameters, and key structures, features and patterns in the data are identified during traversal so that the data can be aggregated and segmented effectively. Based on the traversal result, the information to be processed is aggregated and segmented: similar data is grouped together and unrelated data is separated, organizing the data into a structure that is easy to analyze and process. A cluster identification is assigned to each aggregation and segmentation result; the cluster identifications distinguish the data sets so that a suitable analysis scheme can be selected according to each data set's characteristics in subsequent steps.
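The traversal and aggregation-segmentation of step S400 can be sketched as a single bounded pass that groups records by a similarity key. This is a simplified stand-in: only the traversal range is modelled (as `max_records`), the similarity criterion is supplied by the caller as `key`, and the `cluster-N` identifier format is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Iterable, List


@dataclass
class TraversalParams:
    max_records: int  # traversal range: stop after this many records


def aggregate_and_segment(
    records: Iterable[Any],
    params: TraversalParams,
    key: Callable[[Any], Hashable],
) -> Dict[str, List[Any]]:
    """Group similar records together and label each group with a cluster id."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for index, record in enumerate(records):
        if index >= params.max_records:
            break  # outside the configured traversal range
        groups[key(record)].append(record)
    # Cluster ids follow the order in which each group was first seen.
    return {f"cluster-{i}": members for i, members in enumerate(groups.values())}
```

Grouping by an exact key is the simplest choice; a real system might instead use a clustering technique over extracted features.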
Step S500: inputting the cluster identification and the analysis adaptation constraint into an intelligent analysis model, and outputting an analysis matching scheme;
Specifically, a large number of processed data samples are collected, covering data of different sources, types and structures. For each sample, the cluster identification, the analysis adaptation constraint and the corresponding analysis matching scheme are recorded; these data serve as training and validation data for the machine learning model. The training data is preprocessed to extract features related to the analysis matching scheme, including data size, data type, information source and degree of structuring.
The network structure of the intelligent analysis model is built on a neural network, and the model is trained so that it learns to generate suitable analysis matching schemes for different data sets from the input cluster identifications and analysis adaptation constraints. The trained model is evaluated on the validation data set, and adjusted and optimized according to the evaluation result to improve analysis effect and accuracy, yielding an intelligent analysis model whose accuracy meets the requirements.
The cluster identification and the analysis adaptation constraint are combined as the input data of the intelligent analysis model; this data contains the key features of the information to be processed and helps the model select a suitable analysis scheme for each data set. The input data is fed into the intelligent analysis model, which generates a suitable analysis matching scheme for each data set according to the input cluster identification and analysis adaptation constraint and outputs it. The analysis matching scheme contains information such as the analysis tool and analysis mode for each data set and guides the subsequent data analysis process.
Further, as shown in Fig. 2, step S500 of the present application further includes:
Step S510: setting a screening matching threshold for schemes;
Step S520: when scheme matching is performed by the intelligent analysis model, retaining a matching scheme if its matching value meets the screening matching threshold;
Step S530: integrating the retained matching schemes and outputting the analysis matching scheme.
Specifically, the screening matching threshold is set according to actual application requirements and analysis effect, and is used to evaluate the quality of the matching schemes generated by the intelligent analysis model. The model generates analysis matching schemes for the different data sets and calculates a matching value for each; each matching value is checked against the screening matching threshold, and only schemes that meet the threshold are retained, which safeguards the effect and accuracy of the analysis process. The retained matching schemes are then integrated, through operations such as sorting, de-duplication and weight assignment, into a comprehensive analysis matching scheme, which is output to guide the subsequent data analysis process.
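The screening, de-duplication and sorting described for steps S510 to S530 can be sketched as follows. The `CandidateScheme` fields and the rule that two schemes are duplicates when they share the same tool and mode are assumptions for illustration; "meets the threshold" is read here as a matching value greater than or equal to the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CandidateScheme:
    tool: str
    mode: str
    match_value: float  # matching value produced by the model


def screen_schemes(candidates: List[CandidateScheme], threshold: float) -> List[CandidateScheme]:
    """Keep schemes at or above the threshold, de-duplicate, sort best first."""
    best: Dict[Tuple[str, str], CandidateScheme] = {}
    for scheme in candidates:
        if scheme.match_value < threshold:
            continue  # fails the screening matching threshold
        key = (scheme.tool, scheme.mode)
        # De-duplicate: keep only the best-scoring copy of each tool/mode pair.
        if key not in best or scheme.match_value > best[key].match_value:
            best[key] = scheme
    return sorted(best.values(), key=lambda s: s.match_value, reverse=True)
```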
Further, step S530 of the present application further includes:
Step S531: reading real-time tool call data through a tool call module, wherein the tool call module is a tool constraint module of the intelligent analysis model;
Step S532: generating tool constraint data based on the tool call data reading result;
Step S533: performing scheme execution analysis on the retained matching schemes through the tool constraint data;
Step S534: obtaining the analysis matching scheme according to the analysis result and the matching value.
Specifically, a communication connection is established with the tool call module, which monitors the usage of the analysis tools in real time, such as performance metrics, load conditions and availability, and stores this data in a database. A query statement is constructed according to the type of database used by the tool call module, and the real-time tool call data is read from the database.
The converted tool call data is aggregated, and the indicators of the same analysis tool are combined to form tool constraint data; for example, the performance metrics, load conditions and availability of a given analysis tool are combined into one data object.
For each retained matching scheme, the expected effect and resource consumption are estimated in combination with the tool constraint data. The retained matching schemes are optimized according to the evaluation result, for example by adjusting the analysis tool or the analysis parameters, to improve the execution effect and resource utilization of the scheme.
The retained matching schemes are sorted according to the scheme execution analysis result and the matching value, with preference given to schemes that have a high matching value and perform well. Each retained matching scheme is assigned a weight, calculated comprehensively from factors such as its matching value and execution effect, so that the contributions of the different schemes are balanced during execution. The sorted and weighted matching schemes are integrated into the analysis matching scheme using a weighted average method.
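The weight assignment can be illustrated with a short sketch. The application does not give a formula, so the choice below, multiplying the matching value by an execution score and normalizing so that the weights sum to one, is purely an assumption; `assign_weights` is a hypothetical name.

```python
from typing import List, Tuple


def assign_weights(schemes: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Turn (name, match_value, execution_score) triples into normalized weights.

    Assumed rule: raw score = match_value * execution_score.
    The result is sorted with the highest weight first.
    """
    raw = [(name, match * execution) for name, match, execution in schemes]
    total = sum(score for _name, score in raw)
    if total <= 0:
        raise ValueError("at least one scheme needs a positive score")
    weighted = [(name, score / total) for name, score in raw]
    return sorted(weighted, key=lambda item: item[1], reverse=True)
```

For example, two schemes with equal matching values but execution scores of 1.0 and 3.0 receive weights of 0.25 and 0.75 under this rule.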
Further, step S532 of the present application further includes:
Step S5321: obtaining the analysis time node of the information to be processed;
Step S5322: performing historical tool occupancy analysis for the time node to generate initialization information;
Step S5323: initializing the tool call module through the initialization information;
Step S5324: performing constraint analysis on the tool call data reading result based on the module initialization result, and generating the tool constraint data.
Specifically, the current system time is obtained and recorded as the analysis time node of the information to be processed, that is, the time at which analysis starts.
According to the analysis time node, historical tool usage records are queried from log files, databases or other storage; the collected data includes the usage frequency, occupancy and performance of each tool. The usage frequency and occupancy of each tool are calculated for different time periods, and the overall occupancy of each tool over the whole period is then analyzed, including indicators such as average occupancy rate and peak occupancy rate.
Initialization information for the tool call module is generated from the occupancy analysis result and includes the suggested initial tool configuration, tool usage weights and the like. The initial tool configuration covers the tool versions, performance settings and so on best suited to the current occupancy; the tool usage weights indicate the relative importance of each tool in the intelligent analysis model.
According to the suggested configuration in the initialization information, the parameter settings, resource allocation and the like of each analysis tool are adjusted to better suit the current occupancy and performance requirements. According to the tool usage weights in the initialization information, the weight of each analysis tool in the intelligent analysis model is set; tools with higher weights have higher priority in the model and are therefore more likely to be selected as part of the analysis scheme.
The initialization result of the tool call module, including the tool configuration and tool weights, is analyzed, and constraint conditions are generated from it, such as limits on the usage frequency or resource occupancy of certain tools. The constraint conditions are applied to the tool call data reading result, which is screened and adjusted so that the analysis scheme satisfies them. The final tool constraint data is generated from the constraint analysis result; it includes the list of available tools, the tool weights and the like, and serves as one of the inputs of the intelligent analysis model.
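The average and peak occupancy indicators mentioned above can be computed from historical samples as in the sketch below. The input format, a sequence of `(tool, occupancy)` pairs with occupancy expressed as a fraction between 0 and 1, is an assumption; the application only says the records come from logs, databases or other storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def occupancy_stats(records: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Compute the average and peak occupancy per tool from historical samples."""
    samples: Dict[str, List[float]] = defaultdict(list)
    for tool, occupancy in records:
        samples[tool].append(occupancy)
    return {
        tool: {"average": sum(values) / len(values), "peak": max(values)}
        for tool, values in samples.items()
    }
```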
Step S600: executing data analysis of the aggregation and segmentation result through the analysis matching scheme.
Specifically, the analysis tools and analysis modes for the different data sets are obtained from the analysis matching scheme output by the intelligent analysis model. According to the analysis matching scheme, the corresponding analysis tool and analysis mode are assigned to each data set, ensuring that the most suitable analysis method is chosen for the characteristics of each data set. According to the assigned analysis tasks, data analysis is performed on the aggregation and segmentation result using the corresponding analysis tool and analysis mode.
Further, as shown in Fig. 3, step S600 of the present application further includes:
Step S610: connecting to a data processing system, and reading the system occupancy of the data processing system to generate real-time system occupancy information;
Step S620: performing fitting analysis on the analysis matching scheme, and generating estimated system occupancy information according to the real-time system occupancy information and the fitting analysis result;
Step S630: judging whether the estimated system occupancy information meets a preset occupancy constraint threshold;
Step S640: when the estimated system occupancy information meets the preset occupancy constraint threshold, generating a delay processing instruction;
Step S650: controlling, through the delay processing instruction, the data analysis performed on the aggregation and segmentation result by the analysis matching scheme.
Specifically, a connection is established with the data processing system through a communication interface, and system resource occupancy information is obtained from the data processing system, including CPU utilization, memory usage, disk I/O and the like. The collected information is organized into real-time system occupancy information.
Fitting analysis is performed on the analysis matching scheme using historical or simulated data to evaluate the system occupancy that the scheme is likely to produce during actual execution, including estimates for resources such as CPU, memory and disk I/O. The fitting analysis result is combined with the real-time system occupancy information, so that the current actual occupancy of system resources is taken into account, to generate estimated system occupancy information. The estimated system occupancy information is used to judge the impact of the analysis matching scheme on system resources and whether delay processing is needed.
An occupancy constraint threshold, such as maximum CPU utilization or maximum memory usage, is set according to the capacity and performance requirements of the system. The estimated system occupancy information is compared with the preset occupancy constraint threshold to judge whether the analysis matching scheme would put excessive pressure on the system during execution.
When the estimated system occupancy information reaches the preset occupancy constraint threshold, this indicates that the analysis matching scheme would put excessive pressure on the system during actual execution. A corresponding delay processing instruction is generated according to the difference between the estimated system occupancy information and the preset occupancy constraint threshold, for example reducing the analysis speed or adjusting the analysis priority, so as to relieve the pressure on system resources.
The analysis matching scheme is adjusted according to the delay processing instruction, for example by reducing the analysis speed, adjusting the analysis priority or scheduling the execution time of analysis tasks. Data analysis is then performed on the aggregation and segmentation result according to the adjusted analysis matching scheme, within the range that system resources allow, to ensure stable operation of the whole system.
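The decision logic of steps S610 to S650 can be sketched as follows. Several choices here are assumptions rather than details from the application: the estimate is formed by adding the fitted occupancy to the real-time occupancy (capped at 100 percent), only CPU and memory are modelled, and the delay grows linearly with the overshoot at a hypothetical rate `seconds_per_point`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occupancy:
    cpu_pct: int     # CPU utilization, 0..100
    memory_pct: int  # memory usage, 0..100


@dataclass
class DelayInstruction:
    delay_seconds: float
    estimated: Occupancy


def plan_delay(
    realtime: Occupancy,
    fitted: Occupancy,
    threshold: Occupancy,
    seconds_per_point: float = 2.0,
) -> Optional[DelayInstruction]:
    """Return a delay instruction if the estimated occupancy reaches the threshold."""
    # Assumed estimate: current load plus the load the scheme is fitted to add.
    estimated = Occupancy(
        cpu_pct=min(100, realtime.cpu_pct + fitted.cpu_pct),
        memory_pct=min(100, realtime.memory_pct + fitted.memory_pct),
    )
    overshoot = max(
        estimated.cpu_pct - threshold.cpu_pct,
        estimated.memory_pct - threshold.memory_pct,
    )
    if overshoot < 0:
        return None  # below the threshold on every resource: run immediately
    # The delay scales with how far the estimate sits at or above the threshold.
    return DelayInstruction((overshoot + 1) * seconds_per_point, estimated)
```

A caller could sleep for `delay_seconds`, lower the task priority, or reschedule the task; the description lists all three as possible responses.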
Further, the application also comprises:
Step S710: recording analysis identification information, and generating analysis feedback through the analysis identification information;
Step S720: performing source characteristic adjustment on the information source data through the analysis feedback to obtain a source characteristic adjustment result;
Step S730: performing data processing on the information source data according to the source characteristic adjustment result.
Specifically, during analysis, identification information related to the analysis is recorded, such as whether the analysis succeeded, the time it took and the system resources it consumed, and analysis feedback containing these indicators is generated from the recorded analysis identification information. The effect of the analysis process, such as analysis speed and analysis accuracy, is evaluated from the indicators in the analysis feedback, and the characteristics of the information source data, such as the data format and data structure, are adjusted accordingly to obtain adjusted information source data. The analysis adaptation constraint and the analysis matching scheme are updated according to the adjusted source characteristics, and the adjusted information source data is analyzed with the updated analysis matching scheme to improve the analysis effect.
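The feedback of step S710 can be sketched as a small aggregation over recorded analysis runs. The record fields, the two indicators and the default limits in `needs_source_adjustment` are hypothetical; the application names success, elapsed time and resource consumption as examples but does not define thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnalysisRecord:
    success: bool   # whether the analysis succeeded
    seconds: float  # time the analysis took


def build_feedback(records: List[AnalysisRecord]) -> Dict[str, float]:
    """Summarize recorded runs into feedback indicators."""
    if not records:
        raise ValueError("no analysis records to summarize")
    count = len(records)
    return {
        "success_rate": sum(1 for r in records if r.success) / count,
        "mean_seconds": sum(r.seconds for r in records) / count,
    }


def needs_source_adjustment(
    feedback: Dict[str, float],
    min_success_rate: float = 0.9,   # assumed limit
    max_mean_seconds: float = 5.0,   # assumed limit
) -> bool:
    """Decide whether the source characteristics should be adjusted."""
    return (
        feedback["success_rate"] < min_success_rate
        or feedback["mean_seconds"] > max_mean_seconds
    )
```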
Further, the application also comprises:
step S810: judging whether the information to be processed has the analysis stability requirement of the information or not;
step S820: when the analysis stability requirement exists, calling an analysis stability identifier;
step S830: performing stability analysis scheme matching on the data information with the analysis stability mark to obtain a mark analysis scheme;
step S840: and carrying out scheme optimization of the analysis matching scheme through the identification analysis scheme.
Specifically, the analysis stability requirement is assessed according to the basic attribute information and the application scenario of the information to be processed, and the assessment result determines whether the information to be processed has an analysis stability requirement, for example a requirement for high real-time performance or high accuracy. If it does, an analysis stability identifier is added to the information to be processed to indicate that the information requires stable analysis.
The information to be processed that carries the analysis stability identifier is extracted, and the intelligent analysis model, in combination with the analysis stability identifier, screens out a stability analysis scheme that matches the analysis stability requirement, namely the identification analysis scheme. The analysis tool constraint, the analysis mode constraint and other related information in the identification analysis scheme are analyzed, and the original analysis matching scheme is adjusted according to the result: parameters such as the analysis tool and the analysis mode are optimized so that they meet the analysis stability requirement, and an optimized analysis matching scheme is generated for the subsequent data analysis process.
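The flow of steps S810 to S840 can be sketched roughly as below. This is an assumption-laden illustration: the scenario labels in `STABILITY_SCENARIOS`, the dictionary layout of a scheme, and the rule that the identification analysis scheme simply overrides the `tool` and `mode` entries are all hypothetical, and the intelligent analysis model itself is not modeled.

```python
# Hypothetical scenario labels that imply an analysis stability requirement.
STABILITY_SCENARIOS = {"real_time", "high_accuracy"}


def tag_stability(info: dict) -> dict:
    """Return a copy of the information with an analysis stability identifier."""
    tagged = dict(info)
    tagged["stable"] = info.get("scenario") in STABILITY_SCENARIOS
    return tagged


def optimize_scheme(base_scheme: dict, stable_scheme: dict, info: dict) -> dict:
    """Adjust the original matching scheme using the identification analysis scheme.

    Only the analysis tool and analysis mode are overridden, and only when the
    information carries the stability identifier.
    """
    optimized = dict(base_scheme)
    if not info.get("stable"):
        return optimized
    for key in ("tool", "mode"):
        if key in stable_scheme:
            optimized[key] = stable_scheme[key]
    return optimized
```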
In summary, the method for high-speed analysis of information to be processed provided by the embodiment of the application has the following technical effects:
the method interactively obtains the information to be processed and reads its basic attribute information, including the information size and the information data type; obtains the information source data and reads the source identification data; performs source characteristic analysis to generate analysis adaptation constraints, including an analysis tool constraint and an analysis mode constraint; sets traversal control parameters and traverses the information to complete the aggregation and segmentation of the data; inputs the cluster identifier and the analysis adaptation constraints into the intelligent analysis model, which outputs an analysis matching scheme; and performs data analysis of the aggregation segmentation result through the analysis matching scheme. This solves the technical problem in the prior art of low analysis efficiency and poor accuracy caused by information to be processed coming from different sources and having different types and structures. A suitable analysis tool and analysis mode are selected automatically according to the characteristics of the information to be processed, so that the analysis approach can be applied broadly to data of different sources, types and structures, achieving the technical effect of meeting diversified and complex data processing requirements.
Examples
Based on the same inventive concept as the high-speed analysis method of the information to be processed in the foregoing embodiments, as shown in fig. 4, the present application provides a high-speed analysis system of the information to be processed, the system comprising:
the to-be-processed information acquisition module 10 is used for interactively acquiring information to be processed and reading basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type;
the information source acquisition module 20 is used for acquiring information source data of the information to be processed and reading identification data of a source;
the source characteristic analysis module 30 is configured to perform source characteristic analysis based on the information source data, and generate an analysis adaptation constraint, where the analysis adaptation constraint includes an analysis tool constraint and an analysis mode constraint;
the information to be processed traversing module 40 is configured to set a traversing control parameter according to the identification data and the basic attribute information, and perform information traversing of the information to be processed according to the traversing control parameter, so as to complete aggregation and segmentation of data, where an aggregation and segmentation result has a cluster identifier;
the matching scheme acquisition module 50 is used for inputting the cluster identification and the analysis adaptation constraint into an intelligent analysis model and outputting an analysis matching scheme;
the data analysis module 60 is configured to perform data analysis of the aggregated segmentation result according to the analysis matching scheme.
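To make the role of the traversing module 40 more concrete, the following sketch shows one possible form of aggregation and segmentation with cluster identifiers. It assumes that the information arrives as `(data_type, payload)` pairs, that the only traversal control parameter is a chunk size, and that a cluster identifier has the form `type-index`; none of these details come from the application.

```python
from collections import defaultdict


def aggregate_and_segment(records, chunk_size):
    """Traverse records once, aggregate them by data type, then cut each group
    into segments of at most chunk_size items.

    Returns a list of (cluster_id, payloads) pairs.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    groups = defaultdict(list)
    for data_type, payload in records:          # single traversal (aggregation)
        groups[data_type].append(payload)
    segments = []
    for data_type, payloads in groups.items():  # segmentation per aggregate
        for start in range(0, len(payloads), chunk_size):
            cluster_id = f"{data_type}-{start // chunk_size}"
            segments.append((cluster_id, payloads[start:start + chunk_size]))
    return segments
```

Each segment carries its own cluster identifier, so a later stage can select an analysis scheme per segment without inspecting the payloads again.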
Further, the system further comprises:
the matching threshold acquisition module is used for setting a screening matching threshold of the scheme;
the matching scheme reserving module is used for reserving a corresponding matching scheme if the matching value of the matching scheme can meet the screening matching threshold value when the scheme matching is carried out through the intelligent analysis model;
and the scheme integration module is used for integrating the schemes of the reserved matching schemes and outputting the analysis matching schemes.
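The screening performed by the matching threshold and matching scheme reserving modules amounts to a filter over candidate schemes. A minimal sketch follows, assuming each candidate is a dictionary with a numeric `match` field; the field name and the descending sort are illustrative choices, not part of the application.

```python
def screen_schemes(candidates, threshold):
    """Keep candidates whose matching value meets the threshold, best first."""
    kept = [c for c in candidates if c["match"] >= threshold]
    return sorted(kept, key=lambda c: c["match"], reverse=True)
```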
Further, the system further comprises:
the calling data reading module is used for reading real-time tool calling data through the tool calling module, wherein the tool calling module is a tool constraint module of the intelligent analysis model;
the constraint data generation module is used for generating tool constraint data based on the tool call data reading result;
the scheme execution analysis module is used for carrying out scheme execution analysis on the reserved matching scheme through the tool constraint data;
and the analysis matching scheme acquisition module is used for acquiring the analysis matching scheme according to the analysis result and the matching value.
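The claims describe the final integration as a weighted average in which each reserved scheme's weight is derived from its matching value and an execution effect factor. One way to read that is sketched below. The product `match * effect` as the raw weight, the `params` dictionary of numeric parameters, and the function name are assumptions for illustration.

```python
def integrate_schemes(schemes):
    """Combine reserved schemes into one by a weighted average of their
    numeric parameters.

    Each scheme is a dict with 'match' (matching value), 'effect' (execution
    effect factor from the scheme execution analysis) and 'params'.
    Returns (weights, merged_params).
    """
    raw = [s["match"] * s["effect"] for s in schemes]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one scheme needs a positive weight")
    weights = [r / total for r in raw]
    keys = schemes[0]["params"].keys()
    merged = {k: sum(w * s["params"][k] for w, s in zip(weights, schemes))
              for k in keys}
    return weights, merged
```

For example, two schemes with raw weights 0.9 and 0.3 receive normalized weights 0.75 and 0.25, so a parameter with values 100 and 200 is merged to 125.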
Further, the system further comprises:
the analysis time node acquisition module is used for acquiring analysis time nodes for acquiring the information to be processed;
the occupation analysis module is used for carrying out occupation analysis on the historical tool use of the time node and generating initialization information;
the module initialization module is used for carrying out module initialization on the tool calling module through the initialization information;
and the constraint analysis module is used for carrying out constraint analysis on the tool call data reading result based on the module initialization result and generating the tool constraint data.
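The occupation analysis of historical tool use at the analysis time node could, under simple assumptions, look like the sketch below. Here history is assumed to be a list of `(hour, tool, load)` samples with `load` between 0 and 1, and the initialization information is reduced to a set of tool use weights proportional to each tool's historically free capacity at that hour. The application does not specify this rule.

```python
from collections import defaultdict


def initialization_weights(history, hour):
    """Derive tool use weights for the given hour from historical load samples.

    Tools that were historically less occupied at this hour get larger weights.
    Returns an empty dict when no history exists for the hour.
    """
    loads = defaultdict(list)
    for sample_hour, tool, load in history:
        if sample_hour == hour:
            loads[tool].append(load)
    if not loads:
        return {}
    free = {tool: max(0.0, 1.0 - sum(v) / len(v)) for tool, v in loads.items()}
    total = sum(free.values())
    if total == 0:
        # Every tool was fully occupied: fall back to equal weights.
        return {tool: 1.0 / len(free) for tool in free}
    return {tool: f / total for tool, f in free.items()}
```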
Further, the system further comprises:
the system occupation reading module is used for connecting a data processing system and reading the system occupation of the data processing system to generate real-time system occupation information;
the fitting analysis module is used for executing fitting analysis on the analysis matching scheme and generating estimated system occupation information according to the real-time system occupation information and fitting analysis results;
the occupation information judging module is used for judging whether the estimated system occupation information meets a preset occupation constraint threshold;
the delay processing instruction generation module is used for generating a delay processing instruction when the estimated system occupation information meets the preset occupation constraint threshold;
and the aggregation segmentation result analysis module is used for controlling the analysis matching scheme to carry out data analysis on the aggregation segmentation result through the delay processing instruction.
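The fitting analysis that produces the estimated system occupation information is not detailed in the application. As one hedged possibility, the sketch below fits a straight line to past observations of segment size against the extra occupation it caused, and adds the predicted extra occupation to the real-time reading. The history format, the use of ordinary least squares, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def estimate_occupation(realtime, segment_mb, history):
    """Estimate occupation (fraction in [0, 1]) if the scheme runs now.

    history: list of (segment_mb, extra_occupation) pairs from earlier runs.
    """
    slope, intercept = fit_line([h[0] for h in history], [h[1] for h in history])
    extra = max(0.0, slope * segment_mb + intercept)
    return min(1.0, realtime + extra)
```

The result can then be compared with the preset occupation constraint threshold to decide whether a delay processing instruction is needed.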
Further, the system further comprises:
the analysis feedback generation module is used for recording analysis identification information and generating analysis feedback through the analysis identification information;
the source characteristic adjustment module is used for adjusting the source characteristic of the information source data through the analysis feedback to obtain a source characteristic adjustment result;
and the data processing module is used for carrying out data processing on the information source data according to the source characteristic adjustment result.
Further, the system further comprises:
the information to be processed judging module is used for judging whether the information to be processed has the analysis stability requirement of the information;
the identification calling module is used for calling the analysis stable identification when the analysis stable requirement exists;
the scheme matching module is used for carrying out stability analysis scheme matching on the data information with the analysis stability identifier to obtain an identifier analysis scheme;
and the scheme optimization module is used for carrying out scheme optimization of the analysis matching scheme through the identification analysis scheme.
From the foregoing detailed description of the high-speed analysis method for information to be processed, those skilled in the art can clearly understand the high-speed analysis system of this embodiment. Because the system disclosed in this embodiment corresponds to the method, its description is relatively brief; for the relevant details, refer to the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for high-speed analysis of information to be processed, the method comprising:
step S100: the method comprises the steps of interactively obtaining information to be processed, and reading basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type;
step S200: obtaining information source data of the information to be processed, and reading source identification data;
step S300: performing source characteristic analysis based on the information source data to generate analysis adaptation constraint, wherein the analysis adaptation constraint comprises analysis tool constraint and analysis mode constraint;
step S400: setting a traversing control parameter through the identification data and the basic attribute information, and carrying out information traversing of the information to be processed through the traversing control parameter to complete aggregation and segmentation of data, wherein an aggregation and segmentation result is provided with a clustering identification;
step S500: inputting the cluster identification and the analysis adaptation constraint into an intelligent analysis model, and outputting an analysis matching scheme, wherein the method comprises the following steps:
step S510: setting a screening matching threshold value of a scheme;
step S520: when the scheme matching is carried out through the intelligent analysis model, if the matching value of the matching scheme can meet the screening matching threshold value, the corresponding matching scheme is reserved;
step S530: and carrying out scheme integration on the reserved matching scheme, outputting the analysis matching scheme, and further comprising:
step S531: reading real-time tool call data through a tool call module, wherein the tool call module is a tool constraint module of the intelligent analysis model and is responsible for monitoring performance indexes, load conditions and availability of analysis tools in real time;
step S532: generating tool constraint data based on a tool call data reading result, wherein the tool constraint data is data which integrates various indexes of the same analysis tool after the converted tool call data is subjected to aggregation treatment;
step S533: carrying out scheme execution analysis on the reserved matching scheme through the tool constraint data, and particularly calculating a scheme execution effect and a resource consumption estimated value by combining the tool constraint data;
step S534: obtaining the analysis matching scheme according to the analysis result and the matching value, specifically: assigning a weight to each reserved matching scheme, wherein the weight is calculated comprehensively from the matching value and the execution effect factor of the scheme, and integrating the sorted and weighted matching schemes into one analysis matching scheme by a weighted average method;
step S600: and executing data analysis of the aggregation segmentation result through the analysis matching scheme.
2. The method of claim 1, wherein the step S532 further comprises:
step S5321: acquiring an analysis time node for obtaining the information to be processed, wherein the analysis time node is the time for starting analysis;
step S5322: carrying out historical tool use occupation analysis on the time node to generate initialization information, wherein the initialization information comprises suggested initial tool configuration and tool use weights;
step S5323: carrying out module initialization on the tool calling module through the initialization information;
step S5324: and carrying out constraint analysis on the tool call data reading result based on the module initialization result, and generating the tool constraint data.
3. The method of claim 2, wherein the step S600 further comprises:
step S610: connecting to a data processing system, and reading the system occupation of the data processing system to generate real-time system occupation information, wherein the real-time system occupation information comprises CPU (central processing unit) utilization rate, memory utilization condition and disk I/O (input/output);
step S620: performing fitting analysis on the analysis matching scheme, and generating estimated system occupation information according to the real-time system occupation information and fitting analysis results;
step S630: judging whether the estimated system occupation information meets a preset occupation constraint threshold, wherein the preset occupation constraint threshold is set according to the capacity and performance requirements of the system;
step S640: when the estimated system occupation information meets the preset occupation constraint threshold, generating a delay processing instruction;
step S650: and controlling the analysis matching scheme to carry out data analysis on the aggregation segmentation result through the delay processing instruction.
4. The method of claim 1, wherein step S700 further comprises:
step S710: recording analysis identification information, wherein the analysis identification information is identification information related to analysis in the analysis process, and generating analysis feedback through the analysis identification information;
step S720: performing source characteristic adjustment on the information source data through the analysis feedback to obtain a source characteristic adjustment result, wherein the source characteristic adjustment result comprises characteristics after data format and data structure adjustment;
step S730: and carrying out data processing on the information source data according to the source characteristic adjustment result, wherein the data processing comprises the following steps: and updating the analysis adaptation constraint and the analysis matching scheme according to the adjusted source characteristics, and analyzing the adjusted information source data by using the updated analysis matching scheme.
5. The method of claim 1, wherein step S800 further comprises:
step S810: judging whether the information to be processed has an analysis stability requirement of the information, wherein the analysis stability requirement is a requirement of high real-time performance and high accuracy, and the analysis stability requirement is a set value according to basic attribute information and application scenes of the information to be processed;
step S820: when the analysis stability requirement exists, calling an analysis stability identifier, wherein the analysis stability identifier is an analysis stability identifier added for information to be processed;
step S830: performing stability analysis scheme matching on the data information with the analysis stability mark to obtain a mark analysis scheme, wherein the mark analysis scheme is that the stability analysis scheme matched with the analysis stability requirement is screened out by combining the intelligent analysis model with the analysis stability mark;
step S840: and carrying out scheme optimization of the analysis matching scheme through the identification analysis scheme, wherein the scheme optimization comprises analysis of analysis tool constraint and analysis mode constraint in the identification analysis scheme, and adjustment of the original analysis matching scheme.
6. A high-speed analysis system for information to be processed, the system for implementing the method of claim 1, comprising:
the to-be-processed information acquisition module is used for interactively acquiring information to be processed and reading basic attribute information of the information to be processed, wherein the basic attribute information comprises information size and information data type;
the information source acquisition module is used for acquiring information source data of the information to be processed and reading identification data of the source;
the source characteristic analysis module is used for carrying out source characteristic analysis based on the information source data and generating analysis adaptation constraints, wherein the analysis adaptation constraints comprise analysis tool constraints and analysis mode constraints;
the information traversing module is used for setting traversing control parameters through the identification data and the basic attribute information, and carrying out information traversing of the information to be processed through the traversing control parameters to complete aggregation and segmentation of the data, wherein an aggregation and segmentation result is provided with a clustering mark;
the matching scheme acquisition module is used for inputting the cluster identification and the analysis adaptation constraint into an intelligent analysis model and outputting an analysis matching scheme;
and the data analysis module is used for executing data analysis of the aggregation segmentation result through the analysis matching scheme.
CN202310631953.7A 2023-05-31 2023-05-31 High-speed analysis method for information to be processed Active CN116628451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310631953.7A CN116628451B (en) 2023-05-31 2023-05-31 High-speed analysis method for information to be processed

Publications (2)

Publication Number Publication Date
CN116628451A CN116628451A (en) 2023-08-22
CN116628451B true CN116628451B (en) 2023-11-14

Family

ID=87597134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310631953.7A Active CN116628451B (en) 2023-05-31 2023-05-31 High-speed analysis method for information to be processed

Country Status (1)

Country Link
CN (1) CN116628451B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116956363B (en) * 2023-09-20 2023-12-05 微网优联科技(成都)有限公司 Data management method and system based on cloud computer technology

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108288A (en) * 2018-01-09 2018-06-01 北京奇艺世纪科技有限公司 A kind of daily record data analytic method, device and equipment
CN108427709A (en) * 2018-01-25 2018-08-21 朗新科技股份有限公司 A kind of multi-source mass data processing system and method
EP3640861A1 (en) * 2018-10-17 2020-04-22 Capital One Services, LLC Systems and methods for parsing log files using classification and a plurality of neural networks
CN112749543A (en) * 2020-12-22 2021-05-04 浙江吉利控股集团有限公司 Matching method, device, equipment and storage medium for information analysis process
CN112910902A (en) * 2021-02-04 2021-06-04 浙江大华技术股份有限公司 Data analysis method and device, electronic equipment and computer readable storage medium
CN113111140A (en) * 2021-05-12 2021-07-13 国家海洋信息中心 Method for rapidly analyzing multi-source marine business observation data
CN113657088A (en) * 2021-08-16 2021-11-16 北京百度网讯科技有限公司 Interface document analysis method and device, electronic equipment and storage medium
CN114598597A (en) * 2022-02-24 2022-06-07 烽台科技(北京)有限公司 Multi-source log analysis method and device, computer equipment and medium
WO2022116425A1 (en) * 2020-12-03 2022-06-09 平安科技(深圳)有限公司 Method and system for data lineage analysis, computer device, and storage medium
CN115437877A (en) * 2022-08-18 2022-12-06 华南理工大学 Online analysis method and system for multi-source log, electronic equipment and storage medium
CN115618083A (en) * 2022-10-27 2023-01-17 北京亚鸿世纪科技发展有限公司 Method and device for multi-source heterogeneous data normalization
CN115828888A (en) * 2022-11-18 2023-03-21 贵州电网有限责任公司遵义供电局 Method for semantic analysis and structurization of various weblogs
CN116010499A (en) * 2022-12-29 2023-04-25 绿盟科技集团股份有限公司 Method and device for determining analysis rule and electronic equipment
CN116189436A (en) * 2023-03-17 2023-05-30 佛山市众合科技有限公司 Multi-source data fusion algorithm based on big data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8949240B2 (en) * 2012-07-03 2015-02-03 General Instrument Corporation System for correlating metadata

Also Published As

Publication number Publication date
CN116628451A (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant