CN117539948B - Service data retrieval method and device based on deep neural network - Google Patents

Service data retrieval method and device based on deep neural network

Info

Publication number
CN117539948B
Authority
CN
China
Prior art keywords
data
structured
neural network
deep neural
raster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410033954.6A
Other languages
Chinese (zh)
Other versions
CN117539948A (en)
Inventor
张少军
李晓朋
刘科检
何宇
王宬
樊超
侯建鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Lingkong Electronic Technology Co Ltd
Original Assignee
Xian Lingkong Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Lingkong Electronic Technology Co Ltd
Priority to CN202410033954.6A
Publication of CN117539948A
Application granted
Publication of CN117539948B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a service data retrieval method and device based on a deep neural network, and relates to the technical field of simulation data processing. The method comprises the following steps: performing information rasterization on acquired original business data to obtain raster data, wherein the original business data is multidimensional time-varying data; integrating the raster data into a structured dataset; analyzing the structured dataset and generating a corresponding data view; and outputting a feature vector of the data view through a trained deep neural network model, and performing partition matching between the feature vector and the user's search content to obtain a search result. The method addresses the problem that existing simulation systems struggle to balance data processing capability against user experience: it preserves data processing capability while improving user experience, and can therefore meet changing market and user demands.

Description

Service data retrieval method and device based on deep neural network
Technical Field
The application relates to the technical field of simulation data processing, in particular to a service data retrieval method and device based on a deep neural network.
Background
A simulation system is a complex computer system used mainly for training, research, and prediction. By simulating the variables and interactions of a real environment, such a system helps scientists and other professionals understand what may happen in a particular scenario. The results can support product design improvement, weather prediction, pilot training, military training, financial market analysis, and the like. Beyond directly simulating the real world, simulation systems are also applied in virtual reality and video games, where they mimic real-world physical laws and environments to create an immersive experience for the user.
In current simulation system integration, business data is diverse and changes dynamically, which places high demands on data processing capability. Moreover, users now pay more attention to the experience a simulation system delivers, that is, they place higher demands on the speed at which simulation data is processed and on the way it is presented. However, existing simulation systems struggle to balance data processing capability against user experience, and therefore struggle to meet changing market and user demands.
Disclosure of Invention
By providing a service data retrieval method based on a deep neural network, the embodiments of the present application solve the problem that existing simulation systems struggle to balance data processing capability against user experience.
In a first aspect, an embodiment of the present application provides a service data retrieval method based on a deep neural network, including: performing information rasterization processing on the acquired original business data to obtain raster data; wherein the original business data is multidimensional time-varying data; integrating the raster data into a structured dataset; analyzing the structured dataset and generating a corresponding data view; and outputting the feature vector of the data view through the trained deep neural network model, and carrying out partition matching according to the feature vector and the search content of the user to obtain a search result.
With reference to the first aspect, in a first possible implementation manner, before performing information rasterization processing on the obtained original service data to obtain raster data, the method further includes: performing primary preprocessing on the acquired original service data; the primary preprocessing comprises the operations of cleaning, denoising and smoothing the original business data.
With reference to the first aspect, in a second possible implementation manner, the performing information rasterization processing on the acquired original service data to obtain raster data includes: defining an information grid rule according to the current business requirements or objectives of the system; and dividing the original service data according to the determined information grid rule to obtain raster data, and setting identifiers for the raster data.
With reference to the first aspect, in a third possible implementation manner, the integrating the raster data into a structured dataset includes: performing secondary preprocessing on the raster data, and integrating the raster data after the secondary preprocessing into structured data; extracting data features from the structured data; and constructing the structured dataset with each piece of structured data and its corresponding data features as a data element.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the generating a corresponding data view includes: designing an adapted data structure for each of the data elements in the structured dataset; and generating the corresponding data view according to the data structure of each data element.
With reference to the third possible implementation manner of the first aspect, in a fifth possible implementation manner, before the generating a corresponding data view, the method further includes: determining statistical information of the structured data, and supplementing the statistical information into the data features.
With reference to the first aspect, in a sixth possible implementation manner, after the obtaining a search result, the method further includes: performing local visual display of the search result.
In a second aspect, an embodiment of the present application provides a service data retrieval device based on a deep neural network, including: the rasterizing module is used for carrying out information rasterizing processing on the acquired original business data to obtain raster data; wherein the original business data is multidimensional time-varying data; an integration module for integrating the raster data into a structured dataset; the generation module is used for analyzing the structured data set and generating a corresponding data view; and the matching module is used for outputting the feature vector of the data view through the trained deep neural network model, and carrying out partition matching on the feature vector and the user search content to obtain a search result.
In a third aspect, embodiments of the present application provide an apparatus, including: a processor; a memory for storing processor-executable instructions; the processor, when executing the executable instructions, implements a method as described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing a computer program or instructions which, when executed, cause a method as described in the first aspect or any one of the possible implementations of the first aspect to be implemented.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
According to the embodiments of the present application, rasterizing the original business data reduces high-dimensional data to a low-dimensional space, which lowers computational complexity and storage requirements; constructing the structured dataset allows the structured data to be classified; and generating a data view presents the data to the user more intuitively. The method thereby addresses the problem that existing simulation systems struggle to balance data processing capability against user experience: it preserves data processing capability while improving user experience, and can meet continuously changing market and user demands.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments of the present application or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a service data retrieval method based on a deep neural network according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a service data retrieval device based on a deep neural network according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Some of the techniques involved in the embodiments of the present application are described below to aid understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, for the sake of clarity and conciseness, descriptions of well-known functions and constructions are omitted in the following description.
Fig. 1 is a flowchart of a service data retrieval method based on a deep neural network according to an embodiment of the present application, including steps 101 to 104. Fig. 1 shows only one possible execution order and does not represent the only execution order of the method; the steps shown in Fig. 1 may be executed in parallel or in reverse order, provided the final result can still be achieved.
Step 101: Perform information rasterization on the acquired original service data to obtain raster data. Specifically, the original service data is multidimensional time-varying data, which may be system data that changes in real time within a system, or system data that changes in response to user input or parameter changes. Such data has no uniform format, changes dynamically, and is difficult to retrieve.
In the embodiment of the application, the original service data is subjected to information rasterization to obtain raster data. Before rasterization, the original service data undergoes primary preprocessing, which comprises cleaning, denoising, and smoothing the original service data to improve its quality.
Data cleaning removes errors and inconsistencies from the original business data and handles missing values. Errors may result from data entry mistakes, such as incorrect digits or letters. Inconsistencies may arise from differences in data format or units. A missing value occurs when the value of some feature in the dataset was not recorded. These steps ensure the consistency and integrity of the data and provide an accurate basis for subsequent analysis.
Denoising uses filtering and smoothing methods to eliminate or reduce the influence of noise in the original business data, such as measurement errors and sensor faults. Smoothing the original business data with a moving average, exponential smoothing, or a similar method reduces fluctuations and evens out the data, so that it better reflects the real situation.
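The primary preprocessing described above can be sketched in a few lines. The following is a minimal illustration only, assuming a one-dimensional series in which missing samples are marked `None`; the function names, the forward-fill strategy, and the window and `alpha` values are assumptions for the example, not details taken from the application.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Replace missing samples (None) with the previous valid sample (forward fill)."""
    filled: List[float] = []
    last = next(v for v in values if v is not None)  # seed with the first valid value
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average; early samples average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


raw = [1.0, None, 3.0, 10.0, 3.0]          # 10.0 plays the role of a noise spike
clean = fill_missing(raw)
smooth_ma = moving_average(clean, window=2)
smooth_exp = exponential_smoothing(clean, alpha=0.5)
```

Both smoothers damp the spike at the fourth sample; a larger window or a smaller `alpha` smooths more strongly at the cost of responsiveness.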
In addition, those skilled in the art may partition the original business data, for example with a k-means or DBSCAN clustering algorithm. To keep subsequent retrieval efficient, the partitions should be as uniform as possible. An index may also be built for each partition to speed up subsequent retrieval.
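As a rough illustration of partitioning plus a per-partition index, the sketch below runs a tiny one-dimensional k-means and then builds a lookup from partition label to record positions. It is a sketch under stated assumptions: the deterministic centroid initialization, the function names, and the sample values are all chosen for the example and are not specified in the application.

```python
from typing import Dict, List


def kmeans_1d(points: List[float], k: int, iterations: int = 20) -> List[int]:
    """Tiny 1-D k-means (Lloyd's algorithm) with deterministic initial centroids."""
    lo, hi = min(points), max(points)
    # Spread the initial centroids evenly over the value range (illustrative choice).
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def build_partition_index(labels: List[int]) -> Dict[int, List[int]]:
    """Map each partition label to the positions of its records."""
    index: Dict[int, List[int]] = {}
    for pos, lab in enumerate(labels):
        index.setdefault(lab, []).append(pos)
    return index


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
labels = kmeans_1d(values, k=2)
partition_index = build_partition_index(labels)
```

With the index in place, a later retrieval step can visit only the records of the partitions it needs instead of scanning the whole dataset.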
In the embodiment of the application, the information grid rule is defined according to the current business requirement or objective of the system. The information grid rule includes parameters such as the block size and block interval used to divide the original service data.
The original business data is then divided according to the determined information grid rule to obtain raster data, and identifiers are set for the raster data. Specifically, the rasterization rule is determined from the business requirement or objective, and the original service data is rasterized according to that rule.
For example, if the business requirement is to analyze user behavior data, the original business data is partitioned by user permission level so that the behavior patterns of users with different permissions can be understood. If the business requirement is to analyze equipment performance, the data is segmented by equipment category or attribute so that the performance parameters of different equipment can be understood. If the objective is to plan an optimal path, all candidate paths are first classified, and the data of each class is then summarized and ranked to obtain the final optimal path.
After rasterization, each grid cell is assigned a unique, non-repeating identifier, which improves the efficiency and accuracy of data processing.
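One way to picture a grid rule with a block size, a block interval, and unique cell identifiers is the sketch below. It assumes the records are `(timestamp, value)` pairs and interprets the block interval as the step between the starts of adjacent cells; that interpretation, the class names, and the identifier format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GridRule:
    block_size: int      # number of consecutive samples per grid cell
    block_interval: int  # step between the starts of adjacent cells (assumed meaning)


@dataclass
class GridCell:
    cell_id: str                        # unique, non-repeating identifier
    samples: List[Tuple[float, float]]  # (timestamp, value) pairs


def rasterize(records: List[Tuple[float, float]], rule: GridRule) -> List[GridCell]:
    """Sort records by time and cut them into cells according to the rule."""
    ordered = sorted(records, key=lambda r: r[0])
    cells = []
    for n, start in enumerate(range(0, len(ordered), rule.block_interval)):
        block = ordered[start:start + rule.block_size]
        cells.append(GridCell(cell_id=f"cell-{n:04d}", samples=block))
    return cells


records = [(3.0, 0.3), (0.0, 0.1), (1.0, 0.2), (2.0, 0.5), (4.0, 0.9)]
cells = rasterize(records, GridRule(block_size=2, block_interval=2))
```

Setting `block_interval` smaller than `block_size` would produce overlapping cells, while a larger interval would skip samples; equal values tile the data without gaps.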
Information rasterization can convert unstructured or semi-structured data into structured data. Applying it to multidimensional time-varying data reduces high-dimensional data to a low-dimensional space, which lowers computational complexity and storage requirements.
Further, in one embodiment of the present application, a database or distributed storage system may be used to store and manage the grid cells and their associated information.
Step 102: Integrate the raster data into a structured dataset. In the embodiment of the application, the divided raster data undergoes secondary preprocessing, and the preprocessed raster data is integrated into structured data. Specifically, data integration must ensure that the integrated data is consistent, which is why the raster data is preprocessed a second time. The data is cleaned to eliminate noise, errors, and inconsistencies. This includes format conversion, unit conversion, or coordinate system conversion, so that the raster data conforms to the required format and accuracy. In addition, the temporal consistency of the raster data must be considered, so that data collected at different points in time is properly aligned and integrated.
Data integration must also ensure data integrity, that is, the raster data must contain all the information needed to meet the application requirements. This includes handling missing values, supplementing missing attribute information, and deleting duplicate or irrelevant data. Attention must also be paid to data quality, so that the data is accurate, reliable, and trustworthy. Where necessary, those skilled in the art may perform sampling inspection, statistical analysis, and the like to assess the overall quality of the data.
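The secondary preprocessing steps (unit conversion, temporal alignment, and duplicate removal) can be combined in a single pass. The sketch below is illustrative only: the conversion table, the record layout `(timestamp, value, unit)`, and the rounding-based alignment are assumptions rather than details from the application.

```python
from typing import Dict, List, Tuple

# Hypothetical conversion factors to a common unit (metres).
TO_METRES: Dict[str, float] = {"m": 1.0, "km": 1000.0}


def normalise(records: List[Tuple[float, float, str]], step: float) -> List[Tuple[float, float]]:
    """Convert units, snap timestamps to a shared grid, and drop duplicates."""
    seen = set()
    result = []
    for timestamp, value, unit in records:
        aligned = round(timestamp / step) * step  # temporal alignment
        converted = value * TO_METRES[unit]        # unit conversion
        key = (aligned, converted)
        if key not in seen:                        # de-duplication
            seen.add(key)
            result.append(key)
    return sorted(result)


raw = [(0.1, 2.0, "km"), (0.9, 500.0, "m"), (1.1, 500.0, "m"), (2.0, 1.0, "km")]
structured = normalise(raw, step=1.0)
```

The two 500 m readings taken at 0.9 s and 1.1 s snap to the same aligned timestamp and collapse into one record, which is the kind of duplicate the text asks to remove.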
The accessibility and availability of the data must also be guaranteed. Designing a suitable data structure and storage mode for the raster data lets users conveniently access and use the dataset.
Data features are then extracted from the structured data. This step includes extracting the data features of the structured data, managing those features, and designing easily understood data tables, charts, and the like.
Data feature management is a key step in improving the accessibility of structured data. Data features are data that describe the data (that is, metadata), including its source, creation time, modification history, and so on. Managing the data features makes it easier for a user to find and use the structured data that is needed. In one embodiment of the present application, a data feature search engine may be designed so that a user can search for relevant data features by entering keywords. In addition, visual reports and charts can be generated from the data features to help a user understand the corresponding structured data more intuitively.
A structured dataset is constructed with each piece of structured data and its corresponding data features as a data element. That is, the raster data divided according to the information grid rule is converted into corresponding structured data, and several structured datasets are constructed with the structured data and its data features as data elements.
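A data element that pairs structured data with its descriptive features, together with a keyword search over those features, might look like the following. The class name `DataElement`, the feature keys, and the sample values are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataElement:
    data: List[float]                                       # one piece of structured data
    features: Dict[str, str] = field(default_factory=dict)  # descriptive features


def search_by_feature(dataset: List[DataElement], keyword: str) -> List[DataElement]:
    """Return elements whose feature values contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [e for e in dataset if any(needle in v.lower() for v in e.features.values())]


dataset = [
    DataElement([0.1, 0.2], {"source": "radar-sim", "created": "2024-01-10"}),
    DataElement([5.0, 5.5], {"source": "flight-sim", "created": "2024-01-11"}),
]
hits = search_by_feature(dataset, "Radar")
```

Keeping the features alongside the data means the search never has to inspect the payload itself, only the much smaller descriptive record.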
Step 103: Analyze the structured dataset and generate a corresponding data view. In the embodiment of the application, analyzing the structured dataset can uncover latent patterns and trends in the data, reveal the meaning behind the data, and support decision making.
In embodiments of the present application, the structured dataset may be analyzed using descriptive statistical analysis, correlation analysis, regression analysis, cluster analysis, and the like. In addition, the analysis results can be visualized to show the distribution, trends, and relationships in the structured dataset intuitively, which makes outliers, trend changes, and potential problems in the data easier to find.
An adapted data structure is designed for each data element in the structured dataset. In the embodiment of the application, a general-purpose data structure for the structured dataset is designed first; it can store various types of structured data. The general data structure includes information such as the name, data type, and timestamp of the structured data. The name identifies the structured data, the data type describes its nature, and the timestamp tracks when the structured data was created and used.
A data structure is then defined for each type of structured data in the structured dataset. For example, if the structured data is text data, a data structure capable of storing character strings is defined for it; if the structured data is image data, a data structure capable of storing pixel values is defined for it.
In addition, those skilled in the art can add further data features to the data structure to expose more information about the structured data. For example, a quality field may be added for evaluating the reliability of the corresponding structured data.
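The general data structure and its type-specific variants map naturally onto a small class hierarchy. The following sketch assumes Python dataclasses; the class names, the row-major pixel layout, and the `[0, 1]` range of the quality score are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaseRecord:
    name: str                        # identifies the structured data
    data_type: str                   # describes the nature of the data
    timestamp: float                 # creation/use time, seconds since the epoch
    quality: Optional[float] = None  # optional reliability score in [0, 1]


@dataclass
class TextRecord(BaseRecord):
    text: str = ""                   # payload for text data


@dataclass
class ImageRecord(BaseRecord):
    width: int = 0
    height: int = 0
    pixels: Optional[List[int]] = None  # row-major pixel values

    def pixel(self, x: int, y: int) -> int:
        """Return the pixel value at column x, row y."""
        return self.pixels[y * self.width + x]


note = TextRecord(name="log-1", data_type="text", timestamp=0.0, text="engine start")
img = ImageRecord(name="frame-1", data_type="image", timestamp=1.0, quality=0.9,
                  width=2, height=2, pixels=[0, 64, 128, 255])
```

Because every record shares the base fields, code that only needs the name, type, or timestamp can treat text and image records uniformly.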
In the embodiment of the application, the statistical information of the structured data is determined and added to the data features. The statistical information includes the average, maximum, and minimum of the structured data, among others. The average is an important measure of the central tendency of the data and indicates the overall level of the structured dataset. The maximum and minimum indicate the range of the data and help reveal possible outliers.
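Supplementing the data features with summary statistics takes only a few lines. In this minimal sketch the feature dictionary layout and the function name are assumptions.

```python
from statistics import mean
from typing import Dict, List


def add_statistics(values: List[float], features: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the features extended with mean, maximum, and minimum."""
    enriched = dict(features)
    enriched.update({"mean": mean(values), "max": max(values), "min": min(values)})
    return enriched


features = add_statistics([2.0, 4.0, 9.0], {"count": 3})
```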
A corresponding data view is generated according to the data structure of each data element. In the embodiment of the application, a suitable visualization method is selected according to the type and data features of the structured data. Data visualization methods include line charts, bar charts, pie charts, scatter plots, and the like. Those skilled in the art can choose a suitable visualization method or tool according to the actual situation. For example, a bar chart can compare different structured data; a line chart can show how structured data changes over time; and a pie chart can show the proportions within the structured data. If the structured data is relatively complex or difficult to understand, more complex chart types such as heat maps or scatter plots may also be selected, without limitation. Furthermore, the scale and proportions of the structured data need to be considered when constructing a data view of the structured dataset.
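The chart-selection guidance above amounts to a lookup from analysis purpose to chart type. The sketch below encodes it as such; the purpose labels and the fallback to a plain table are assumptions added for the example.

```python
def choose_chart(purpose: str) -> str:
    """Map an analysis purpose to a chart type, following the examples in the text."""
    table = {
        "comparison": "bar chart",
        "trend": "line chart",
        "proportion": "pie chart",
        "correlation": "scatter plot",
        "density": "heat map",
    }
    return table.get(purpose, "table")  # fall back to a plain table


chart_for_trend = choose_chart("trend")
```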
Step 104: Output the feature vector of the data view through the trained deep neural network model, and perform partition matching between the feature vector and the user's search content to obtain a search result. Specifically, because multidimensional time-varying data changes constantly, a conventional deep neural network model requires substantial computing resources and time to train and update. In the embodiment of the application, the deep neural network model is optimized with incremental learning: parameters are updated step by step during training, so that the model can adapt to a continuously changing multidimensional time-varying data environment, which improves its practicality. The specific methods are as follows.
With online learning, a dynamic machine learning method, the deep neural network model is trained as follows: each time new input data is received, it is fed together with the current parameters into a gradient descent algorithm to update the parameters of the model. This keeps the model up to date with the latest data while helping to avoid overfitting. In particular, online learning updates the parameters of the deep neural network model in real time as new input data arrives, enabling the model to adapt to changing input data distributions.
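The per-sample update rule of online learning can be shown on a model far smaller than a deep network. The sketch below uses a one-variable linear model purely as a stand-in; the class name, learning rate, and synthetic data stream are assumptions, and a real system would apply the same idea to the network's parameters.

```python
from typing import List, Tuple


class OnlineLinearModel:
    """Linear model y = w*x + b updated one sample at a time by gradient descent."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        """One gradient step on the squared error 0.5 * (prediction - y)**2."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel(learning_rate=0.1)
# Synthetic stream drawn from y = 2x + 1, replayed to mimic continuously arriving data.
stream: List[Tuple[float, float]] = [(x / 10, 2 * (x / 10) + 1) for x in range(10)] * 200
for x, y in stream:  # each new sample triggers an immediate parameter update
    model.update(x, y)
```

No sample is stored after its update, which is what lets this style of training keep pace with data that never stops arriving.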
With ensemble learning, several sub-models are combined into an integrated deep neural network model. When new input data arrives, each sub-model in the integrated model makes its own prediction, and the predictions are then fused to obtain the final result. This approach exploits the strengths of each sub-model and improves the accuracy and stability of the system. Ensemble learning methods include bagging, boosting, stacking, and the like.
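Prediction fusion can be as simple as averaging the sub-model outputs. The sketch below shows only that fusion step, with three hypothetical sub-models written as plain functions; averaging is one common fusion rule and is an assumption here, since the text does not fix a particular one.

```python
from statistics import mean
from typing import Callable, List, Sequence

SubModel = Callable[[float], float]


def ensemble_predict(models: Sequence[SubModel], x: float) -> float:
    """Fuse sub-model outputs by simple averaging (one common fusion rule)."""
    return mean(m(x) for m in models)


# Three hypothetical sub-models with slightly different biases.
sub_models: List[SubModel] = [
    lambda x: 2 * x + 0.9,
    lambda x: 2 * x + 1.0,
    lambda x: 2 * x + 1.1,
]
prediction = ensemble_predict(sub_models, 3.0)
```

The individual errors of the first and third sub-models cancel in the average, which is the stabilizing effect the text attributes to ensembles.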
With transfer learning, the deep neural network model is trained by taking a pre-trained model as the starting point and fine-tuning it on the new input data. This can reduce training time and improve the performance of the deep neural network model. The core idea is to apply knowledge learned on one task to another. For example, if the original business data is text data, a word vector model pre-trained on a large text corpus (such as Word2Vec or GloVe) is used as the initial word embedding, and the embedding is then fine-tuned on the specific text classification task, which reduces training time and improves classification performance.
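The benefit of starting from pre-trained parameters can be seen on a toy problem. The sketch below trains the same linear model for the same number of steps from two starting points: parameters assumed to come from a related source task, and zeros. The model, the "pre-trained" values, and the step count are all hypothetical and stand in for a real pre-trained network.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]


def mse(w: float, b: float, data: Data) -> float:
    """Mean squared error of y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w: float, b: float, data: Data, steps: int, lr: float = 0.1) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


target_data: Data = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]

# Hypothetical parameters "pre-trained" on a related source task.
pretrained_w, pretrained_b = 1.8, 0.9

fine_tuned = train(pretrained_w, pretrained_b, target_data, steps=20)
from_scratch = train(0.0, 0.0, target_data, steps=20)

loss_fine_tuned = mse(*fine_tuned, target_data)
loss_scratch = mse(*from_scratch, target_data)
```

With an equal training budget, the fine-tuned model ends with the lower loss because it begins closer to the target solution.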
With adaptive learning, the deep neural network model is trained by dynamically adjusting hyperparameters such as the learning rate and the regularization coefficient according to the distribution and volume of the new data. This allows the model to maintain good performance on input data of varying size and complexity. For example, an adaptive learning rate optimizer (such as Adam or RMSprop) adjusts the learning rate automatically to better optimize the model parameters during training. In addition, an adaptive regularization method can adjust the regularization coefficient according to the complexity of the deep neural network model and the distribution of the new input data, which helps avoid overfitting or underfitting.
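To make the idea of an adaptive learning rate concrete, the sketch below implements the standard Adam update for a single scalar parameter and uses it to minimize a simple quadratic. The objective function and the hyperparameter values are assumptions for the example; in practice a framework's built-in optimizer would normally be used.

```python
import math
from typing import Tuple


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t)
```

The effective step size is the base rate divided by a running estimate of the gradient magnitude, so it adapts per parameter without manual retuning.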
It should be noted that the training method of the four deep neural network models is only one embodiment of the present application, and those skilled in the art may select an appropriate model training method according to the specific situation of the data view.
The data view is taken as input data and fed into the trained deep neural network model to obtain the corresponding feature vectors, and the feature vectors are matched against the search content of the user to obtain a search result.
In the embodiment of the present application, if the original service data was partitioned during the primary preprocessing, the search content may be matched against the feature vectors partition by partition in sequence. To improve retrieval efficiency, those skilled in the art may also match a plurality of partitions in parallel, after which the matching results are summarized into a search result and presented to the user. When the matching results are summarized, methods such as a greedy algorithm or a backtracking algorithm may be adopted so that the matching results follow a certain order. The search result may also be displayed visually, including by local visual display.
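Parallel partition matching can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the matching score, a top-k heap merge is swapped in for the greedy or backtracking summarization mentioned above, and the function names and toy feature vectors are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_partition(partition, query, k):
    """Return the k best (score, item_id) pairs within one partition."""
    scored = ((cosine(vec, query), item_id) for item_id, vec in partition.items())
    return heapq.nlargest(k, scored)


def search(partitions, query, k=3):
    """Match all partitions in parallel, then merge into one ranked result."""
    with ThreadPoolExecutor() as pool:
        per_partition = pool.map(lambda p: match_partition(p, query, k), partitions)
        merged = [hit for hits in per_partition for hit in hits]
    return heapq.nlargest(k, merged)


# Toy partitions mapping item identifiers to feature vectors (assumption).
partitions = [
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"c": [1.0, 1.0], "d": [-1.0, 0.0]},
]
results = search(partitions, [1.0, 0.0], k=2)
```

Because each partition returns only its own top k hits, the final merge handles at most k entries per partition regardless of partition size.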
In addition, those skilled in the art can provide the user with search modes such as keyword search and fuzzy query.
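The two search modes can be sketched as follows, assuming the searchable items are plain text labels. Substring matching for keyword search and the standard-library `difflib` similarity for fuzzy query are illustrative choices, not methods stated in the application; the record strings are hypothetical.

```python
import difflib


def keyword_search(records, keyword):
    """Case-insensitive substring match over record labels."""
    kw = keyword.lower()
    return [r for r in records if kw in r.lower()]


def fuzzy_search(records, query, cutoff=0.6):
    """Return up to five records similar to the query, best match first."""
    return difflib.get_close_matches(query, records, n=5, cutoff=cutoff)


# Hypothetical record labels used only for illustration.
records = ["device temperature", "device pressure", "user login"]

exact_hits = keyword_search(records, "Device")
fuzzy_hits = fuzzy_search(records, "devise temperture")  # misspelled on purpose
```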
Although the present application provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive effort. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When executed by an actual device or client product, the methods of the embodiments or the accompanying drawings may be performed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment).
As shown in Fig. 2, the embodiment of the application further provides a service data retrieval device 200 based on the deep neural network. The device comprises a rasterizing module 201, an integration module 202, a generation module 203 and a matching module 204, which are described in detail below.
The rasterizing module 201 is configured to perform information rasterization processing on the acquired original service data to obtain raster data, wherein the original service data is multi-dimensional time-varying data. Before the information rasterization, the original service data undergoes primary preprocessing, which comprises cleaning, denoising and smoothing the original service data so as to improve its quality.
Data cleaning removes errors, inconsistencies and missing data from the original business data. Errors may result from data entry mistakes, such as entering incorrect numbers or letters. Inconsistencies may be due to differences in data format or units. Missing data arises when the value of some feature in the data set is not recorded. Through these steps, the consistency and integrity of the data can be ensured, providing an accurate basis for subsequent analysis.
Denoising eliminates or reduces, through filtering and smoothing methods, the influence of noise in the original business data such as measurement errors and sensor faults. Smoothing the original business data with a moving average method, an exponential smoothing method or the like reduces fluctuations in the data and makes the data distribution more uniform, so that the data better reflects the real situation.
In addition, the original business data can be partitioned by those skilled in the art. Illustratively, the original business data may be partitioned here using a k-means clustering algorithm, a DBSCAN clustering algorithm, or the like. To ensure subsequent retrieval efficiency, the partitions should be as uniform as possible. Those skilled in the art can also construct an index for each partition to speed up subsequent retrieval.
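The two smoothing methods named above can be sketched as follows. This is a minimal illustration; the window size and the smoothing factor `alpha` are hypothetical parameters that the application does not specify.

```python
def moving_average(values, window):
    """Simple moving average; the output has len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def exponential_smoothing(values, alpha):
    """Exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]  # the first smoothed value is the first observation
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```

A larger window or a smaller `alpha` smooths more strongly but reacts more slowly to real changes in the data.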
In the embodiment of the application, the information raster rule is defined according to the current service requirement or target of the system. The information raster rule comprises parameters such as the block size and the block interval of the original service data.
Once the information raster rule is determined, the original business data is divided according to the rule to obtain raster data, and identifiers are set for the raster data.
Illustratively, if the business requirement is to analyze user behavior data, the original business data needs to be partitioned according to user authority so as to understand the behavior patterns of users with different authorities. If the business requirement is to analyze equipment performance, the data is segmented according to the category or attributes of the equipment so as to understand the performance parameters of different equipment. If the target is to plan an optimal path, all paths to be planned are first classified, and the data of each class are then summarized and sorted to obtain the final optimal path.
Each grid obtained by rasterization is assigned a non-repeating identifier that corresponds uniquely to that grid, which improves the efficiency and accuracy of data processing.
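Dividing the data by block size and block interval and assigning identifiers can be sketched as follows. This is only an illustration: "block interval" is interpreted here as the distance between the start positions of consecutive blocks, which is an assumption, and `RasterCell` and `rasterize` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RasterCell:
    cell_id: int    # non-repeating identifier of this grid cell
    start: int      # index of the first record in the cell
    records: tuple  # the records that fall into the cell


def rasterize(records, block_size, block_interval):
    """Split time-ordered records into cells according to the raster rule."""
    if block_size < 1 or block_interval < 1:
        raise ValueError("block_size and block_interval must be positive")
    ids = count()  # yields 0, 1, 2, ... so identifiers never repeat
    cells = []
    for start in range(0, len(records), block_interval):
        block = tuple(records[start:start + block_size])
        cells.append(RasterCell(next(ids), start, block))
    return cells


cells = rasterize(list(range(10)), block_size=4, block_interval=4)
```

With `block_interval` smaller than `block_size` the cells overlap, and with a larger interval some records are skipped; which behavior is wanted depends on the service requirement.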
The information rasterization can convert unstructured or semi-structured data into structured data, and high-dimensional data can be reduced to a low-dimensional space by performing information rasterization on multi-dimensional time-varying data, so that the computational complexity and the storage space are reduced.
The integration module 202 is used to integrate the raster data into a structured dataset. The integration module 202 is specifically configured to perform secondary preprocessing on the divided raster data and to integrate the raster data after the secondary preprocessing into structured data. Specifically, data integration must ensure the consistency of the integrated data, which requires secondary preprocessing of the raster data. The data is cleaned to eliminate noise, errors and inconsistencies. This includes performing format conversion, unit conversion or coordinate system conversion on the raster data so that the raster data conforms to the desired format and accuracy requirements. In addition, the timing consistency of the raster data needs to be considered to ensure that data collected at different points in time are properly aligned and integrated together.
Furthermore, data integration must also ensure data integrity, that is, the raster data must contain all the information necessary to meet the application requirements. This includes processing missing values, supplementing missing attribute information, and deleting duplicate or unrelated data. In addition, attention is paid to the quality of the data to ensure its accuracy, reliability and credibility. Those skilled in the art will appreciate that sampling inspection, statistical analysis and the like may be performed on the data as necessary to assess its overall quality.
The accessibility and availability of the data also need to be guaranteed. By designing a corresponding data structure and storage mode for the raster data, the data set can be conveniently accessed and used by a user.
Data features are then extracted from the structured data. This step comprises extracting the data features of the structured data, managing the data features, designing easily understood data tables and charts, and the like.
Data feature management of structured data is a key step in improving the accessibility of structured data. Data features are metadata, that is, data describing the structured data, including information such as the source of the data, creation time and modification history. By managing the data features, it is easier for a user to find and use the structured data that is needed. In one embodiment of the present application, a data feature search engine may be designed, and a user may search for relevant data features by entering keywords. In addition, visual reports and charts can be generated from the data features to help a user understand the corresponding structured data more intuitively.
A structured data set is constructed by taking each piece of structured data and its corresponding data features as a data element. That is, the raster data divided according to the information raster rule is converted into corresponding structured data, and a plurality of structured data sets are constructed by taking the structured data and its data features as data elements.
The generation module 203 is configured to analyze the structured dataset and generate a corresponding data view. The generating module 203 is specifically configured to analyze the structured dataset to mine potential rules and trends of the data, understand meaning behind the data, and provide support for decision making.
The structured dataset may be analyzed in embodiments of the present application using descriptive statistical analysis, correlation analysis, regression analysis, cluster analysis, and the like. In addition, the analysis result of the structured data set can be visualized, and the distribution, trend and relationship of the structured data set can be intuitively displayed, so that abnormal values, trend changes or potential problems in the data can be more easily found.
An adapted data structure is designed for each data element in the structured dataset. In the embodiment of the application, a general data structure for the structured data set is designed first, and this data structure can store various types of structured data. The general data structure includes information such as the name, data type and timestamp of the structured data. The name is used to identify the structured data, the data type is used to understand the nature of the structured data, and the timestamp is used to track the time of creation and use of the structured data.
A data structure is then defined for each type of structured data in the structured dataset. For example, if the structured data is text data, a data structure capable of storing character strings is defined for it; if the structured data is image data, a data structure capable of storing pixel values is defined for it.
In addition, one skilled in the art can add some additional data features to the structured data in the data structure to reveal more data information. For example, a quality field may be added for evaluating the reliability of the corresponding structured data.
In the embodiment of the application, the statistical information of the structured data is determined and added to the data features. The statistical information includes the average value, the maximum value, the minimum value and the like of the structured data. The average value is an important index of the central tendency of the data in the structured dataset and can represent the overall level of the structured dataset. The maximum and minimum values indicate the distribution range of the structured dataset, as well as possible outliers.
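The general data structure and the added statistical information can be sketched together as follows. This is a minimal illustration under assumptions: the class name `DataElement`, its field names and the sample values are hypothetical, and numeric values are assumed so that statistics can be computed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean


@dataclass
class DataElement:
    name: str        # identifies the structured data
    data_type: str   # describes the nature of the structured data
    values: list     # the structured data itself (numeric in this sketch)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    features: dict = field(default_factory=dict)  # data features (metadata)

    def add_statistics(self):
        """Add the average, maximum and minimum to the data features."""
        self.features.update(
            mean=mean(self.values),
            max=max(self.values),
            min=min(self.values),
        )


element = DataElement("cpu_load", "float", [0.2, 0.4, 0.9])
element.features["quality"] = 0.95  # optional quality field for reliability
element.add_statistics()
```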
A corresponding data view is generated according to the data structure of each data element. In the embodiment of the application, a suitable visualization method is selected according to the type and the data features of the structured data. Data visualization methods include line charts, bar charts, pie charts, scatter plots and the like. Those skilled in the art can choose a suitable visualization method or tool according to the actual situation. Illustratively, different structured data are compared with a bar chart; the change trend of the structured data is displayed with a line chart; and the proportions of the structured data are shown in a pie chart. If the structured data is relatively complex or difficult to understand, those skilled in the art may also select more complex chart types, such as heat maps and scatter plots, without limitation. Furthermore, the scale and proportion of the structured data need to be considered when constructing a data view of the structured data set.
The matching module 204 is configured to output a feature vector of the data view through the trained deep neural network model, and perform partition matching according to the feature vector and the user search content, so as to obtain a search result. The matching module 204 is specifically configured to optimize the deep neural network model by adopting an incremental learning method, and gradually update parameters in the model training process, so that the model can adapt to a continuously-changing multidimensional time-varying data environment, thereby improving the practicability of the deep neural network model. The specific method is as follows.
The deep neural network model is trained by online learning, a dynamic machine learning method: each time new input data is received, the new input data and the current parameters are fed together into a gradient descent algorithm so as to update the parameters of the deep neural network model. In this way, the deep neural network model can be kept in its best state while over-fitting is avoided. In particular, online learning can update the parameters of the deep neural network model in real time as new input data arrives, enabling the deep neural network model to adapt to a changing input data distribution.
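The per-sample parameter update of online learning can be sketched as follows. For brevity, a linear model with squared error stands in for the deep neural network; this substitution, the function name `sgd_step`, the learning rate and the toy data stream are all assumptions made for illustration.

```python
def sgd_step(weights, x, y, lr=0.1):
    """One gradient descent update from a single new sample (x, y).

    The loss is 0.5 * (prediction - y) ** 2, so the gradient with respect
    to each weight is (prediction - y) * x_i.
    """
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    return [w - lr * error * xi for w, xi in zip(weights, x)]


# Parameters are updated each time a new sample arrives from the stream.
weights = [0.0, 0.0]
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)] * 50  # toy data stream
for x, y in stream:
    weights = sgd_step(weights, x, y)
```

After the stream has been consumed, `weights` is close to `[2.0, -1.0]`, the values that reproduce the toy targets.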
The deep neural network model is trained by ensemble learning, in which a plurality of sub-models are combined to form an integrated deep neural network model. When new input data arrives, each sub-model in the integrated deep neural network model makes its own prediction, and the prediction results are then fused to obtain the final prediction result. The method makes full use of the advantages of each sub-model and improves the accuracy and stability of the system. Illustratively, the ensemble learning methods here include Bagging, Boosting, Stacking and the like.
The deep neural network model is trained by transfer learning, in which a pre-trained model is used as a basis and fine-tuned on the new input data. The method can reduce training time and improve the performance of the deep neural network model. The core idea is to apply knowledge learned from one task to another. For example, if the original business data is text data, a word vector model (such as Word2Vec or GloVe) pre-trained on a large amount of text data is used as the initial word embedding, and the word embedding is then fine-tuned on a specific text classification task, so that training time is reduced and classification performance is improved.
The deep neural network model is trained by self-adaptive learning, in which hyperparameters such as the learning rate and the regularization coefficient are dynamically adjusted according to the distribution characteristics and the quantity changes of the new data. The method enables the deep neural network model to maintain good performance in the face of input data of different sizes and complexities. For example, the learning rate is automatically adjusted using an adaptive learning rate optimizer (e.g., Adam or RMSprop) to better optimize model parameters during training. In addition, an adaptive regularization method can be adopted, in which the regularization coefficient is adjusted according to the complexity of the deep neural network model and the distribution of the new input data, so that over-fitting or under-fitting is avoided.
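How an adaptive learning rate optimizer scales each update can be sketched with a plain-Python version of the Adam update rule. This is an illustration only: the hyperparameter values are commonly used defaults rather than values from the application, and the quadratic objective is a hypothetical stand-in for a real training loss.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g       # running mean of gradients
        vi = beta2 * vi + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = mi / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = vi / (1 - beta2 ** t)           # bias-corrected second moment
        # The effective step size shrinks where gradients have been large.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# Toy objective (assumption): minimize f(p) = (p - 3) ** 2, gradient 2 * (p - 3).
params, m, v = [0.0], [0.0], [0.0]
for t in range(1, 501):
    grads = [2 * (params[0] - 3.0)]
    params, m, v = adam_step(params, grads, m, v, t)
```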
It should be noted that the above four training methods for the deep neural network model are only one embodiment of the present application, and those skilled in the art may select an appropriate model training method according to the specific situation of the data view.
The data view is taken as input data and fed into the trained deep neural network model to obtain the corresponding feature vectors, and the feature vectors are matched against the search content of the user to obtain a search result.
In the embodiment of the present application, if the original service data was partitioned during the primary preprocessing, the search content may be matched against the feature vectors partition by partition in sequence. To improve retrieval efficiency, those skilled in the art may also match a plurality of partitions in parallel, after which the matching results are summarized and presented to the user. When the matching results are summarized, methods such as a greedy algorithm or a backtracking algorithm may be adopted so that the matching results follow a certain order. The matching results may also be displayed visually, and the search result may be presented by local visual display.
In addition, those skilled in the art can provide the user with search modes such as keyword search and fuzzy query.
Some of the modules of the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The apparatus or module set forth in the embodiments of the application may be implemented in particular by a computer chip or entity, or by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. The functions of the modules may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or a combination of sub-units.
The methods, apparatus or modules described herein may be implemented in computer readable program code, and a controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers or embedded microcontrollers. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer readable program code, it is entirely possible to achieve the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included therein for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for achieving the various functions may be regarded both as software modules implementing the methods and as structures within the hardware component.
The embodiment of the application also provides equipment, which comprises: a processor; a memory for storing processor-executable instructions; the processor, when executing the executable instructions, implements a method as described in embodiments of the present application.
The embodiments also provide a non-transitory computer readable storage medium having stored thereon a computer program or instructions which, when executed, cause a method as described in the embodiments of the present application to be implemented.
In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, each module may exist alone, or two or more modules may be integrated into one module.
The storage medium includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a cache memory, a hard disk drive (HDD), or a memory card. The memory may be used to store computer program instructions.
From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus necessary hardware. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, or may be embodied in the implementation of data migration. The computer software product may be stored on a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., comprising instructions for causing a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to perform the methods described in various embodiments or portions of embodiments herein.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. All or portions of the present application can be used in a number of general purpose or special purpose computer system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the present application; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions.

Claims (9)

1. A service data retrieval method based on a deep neural network is characterized by comprising the following steps:
performing information rasterization processing on the acquired original business data to obtain raster data; wherein the original business data is multidimensional time-varying data;
the raster data is integrated into a structured dataset, in particular as follows: performing secondary preprocessing on the raster data, and integrating the raster data after the secondary preprocessing into structured data; extracting data features based on the structured data to obtain data features; constructing the structured data set according to each structured data and the corresponding data characteristic thereof as a data element;
analyzing the structured dataset and generating a corresponding data view;
optimizing the deep neural network model by adopting an incremental learning method, and gradually updating parameters in the training process of the deep neural network model; and outputting the feature vector of the data view through the trained deep neural network model, and carrying out partition matching according to the feature vector and the search content of the user to obtain a search result.
2. The method of claim 1, wherein before performing information rasterization on the obtained original service data to obtain raster data, the method further comprises:
performing primary preprocessing on the acquired original service data; the primary preprocessing comprises the operations of cleaning, denoising and smoothing the original business data.
3. The method of claim 1, wherein the performing information rasterization on the acquired original service data to obtain raster data includes:
defining information grid rules according to the current business requirements or targets of the system;
determining the information grid rule, dividing the original service data to obtain grid data, and setting identifiers for the grid data.
4. The method of claim 1, wherein the generating the corresponding data view comprises:
designing an adapted data structure for each of the data elements in the structured dataset;
and generating the corresponding data view according to the data structure of each data element.
5. The method of claim 1, wherein prior to generating the corresponding data view, further comprising:
and determining statistical information of the structured data, and supplementing the statistical information into the data characteristics.
6. The method according to claim 1, wherein after obtaining the search result, further comprising:
carrying out local visual display on the search result.
7. A deep neural network-based service data retrieval device, comprising:
the rasterizing module is used for carrying out information rasterizing processing on the acquired original business data to obtain raster data; wherein the original business data is multidimensional time-varying data;
the integration module is used for integrating the raster data into a structured data set, and specifically comprises the following steps: performing secondary preprocessing on the raster data, and integrating the raster data after the secondary preprocessing into structured data; extracting data features based on the structured data to obtain data features; constructing the structured data set according to each structured data and the corresponding data characteristic thereof as a data element;
the generation module is used for analyzing the structured data set and generating a corresponding data view;
the matching module is used for optimizing the deep neural network model by adopting an incremental learning method and gradually updating parameters in the training process of the deep neural network model; and outputting the feature vector of the data view through the trained deep neural network model, and carrying out partition matching according to the feature vector and the search content of the user to obtain a search result.
8. An apparatus for performing a deep neural network-based business data retrieval method, comprising:
a processor;
a memory for storing processor-executable instructions;
the processor, when executing the executable instructions, implements the method of any one of claims 1 to 6.
9. A non-transitory computer readable storage medium storing a computer program or instructions which, when executed, cause the method of any one of claims 1 to 6 to be implemented.
CN202410033954.6A 2024-01-10 2024-01-10 Service data retrieval method and device based on deep neural network Active CN117539948B (en)

Publications (2)

Publication Number Publication Date
CN117539948A CN117539948A (en) 2024-02-09
CN117539948B true CN117539948B (en) 2024-04-05

Family

ID=89794287

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant