CN117056060B - Big data information processing method based on deep learning - Google Patents

Big data information processing method based on deep learning

Info

Publication number
CN117056060B
Authority
CN
China
Prior art keywords: data, host server, cloud, server, model
Prior art date
Legal status
Active
Application number
CN202311317069.2A
Other languages
Chinese (zh)
Other versions
CN117056060A
Inventor
王文雅
Current Assignee
Beijing Youtejie Information Technology Co ltd
Original Assignee
Beijing Youtejie Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youtejie Information Technology Co ltd filed Critical Beijing Youtejie Information Technology Co ltd
Priority to CN202311317069.2A
Publication of CN117056060A
Application granted
Publication of CN117056060B


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 18/2113 — Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 9/5038 — Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 9/5072 — Partitioning or combining of resources; Grid computing
    • G06N 3/08 — Neural networks; Learning methods
    • G06F 2209/484 — Indexing scheme relating to G06F 9/48: Precedence
    • G06F 2209/5017 — Indexing scheme relating to G06F 9/50: Task decomposition
    • G06F 2209/5021 — Indexing scheme relating to G06F 9/50: Priority

Abstract

The application relates to a big data information processing method based on deep learning, in which big data information processing is carried out between a host server and a cloud server. Big data processing tasks issued by the host server are processed in a distributed manner by the distributed computing model MapReduce deployed on the cloud. This avoids the technical defects of the distributed network of a Hadoop cluster structure and removes the need for numerous physical extension servers and switches, greatly saving the cost and time of a distributed computing network architecture. The distributed computing model MapReduce improves the processing speed of big data, reduces the operating pressure on the host server, and improves responsiveness to big data. In addition, the cloud uses a feature model to extract and rank the features of the data blocks of the big data and performs distributed computation according to the ranking, so that the application of the big data computation results can be further refined according to the feature-value ranking, providing high-value commercial data information for users such as enterprises.

Description

Big data information processing method based on deep learning
Technical Field
The disclosure relates to the technical field of big data application, in particular to a big data information processing method, a big data information processing system and electronic equipment based on deep learning.
Background
Big data refers to information assets that are massive, fast-growing, and diversified, and that require new processing modes to provide stronger decision-making power, insight discovery, and process optimization. It has four defining characteristics: massive data scale, rapid data flow, diverse data types, and low value density.
With the continuous growth of big data, the analysis, ranking, and value extraction of multidimensional information have become vital. Mining and analyzing valuable data from big data maximizes its conversion into value and provides high-value commercial data information for users such as enterprises.
The information processing flow of big data generally comprises the following steps:
1. and (3) data acquisition: collecting data to be analyzed, wherein the data generally comprises structured data, semi-structured data and unstructured data;
2. data cleaning: processing, cleaning, removing weight and the like on the acquired data;
3. data conversion: converting the data into a format suitable for analysis;
4. and (3) data mining: using statistical methods and algorithms to find useful information, patterns and trends from the data;
5. data analysis: analyzing the mined data to obtain useful information and conclusions, and giving data-based suggestions and solutions;
6. Data application: data computation and sharing are applied to the analysis results, and so on.
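As a rough illustration of steps 2-4 above, the following sketch (all function and field names are hypothetical, not from this application) shows cleaning, conversion, and a trivial mining pass over toy records:

```python
# Minimal sketch of steps 2-4 of the flow above (cleaning, conversion,
# mining). Names and data are illustrative, not from the patent.
from collections import Counter

def clean(records):
    """Step 2: drop empty records and deduplicate exact repeats."""
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if r and key not in seen:
            seen.add(key)
            out.append(r)
    return out

def convert(records):
    """Step 3: normalize field names to lowercase for analysis."""
    return [{k.lower(): v for k, v in r.items()} for r in records]

def mine(records, field):
    """Step 4: find the most frequent value of a field."""
    return Counter(r[field] for r in records if field in r).most_common(1)

raw = [{"City": "Beijing"}, {"city": "Beijing"}, {"CITY": "Shanghai"}, {}]
data = convert(clean(raw))
print(mine(data, "city"))  # most frequent city after cleaning/conversion
```

In a real pipeline these stages would, of course, run over distributed storage rather than in-memory lists; the sketch only fixes the order of operations.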
However, to obtain valuable big data information, useful information must be extracted in combination with data feature extraction; and if deep analysis and mining of the data information are required, data processing must also be controlled down to deep data details:
On the one hand, the response efficiency and cost of data processing imposed by the analysis model of the big data processing flow must be considered: useful and valuable information should be extracted in a refined way, low-value and even valueless information eliminated, and the data volume reduced.
On the other hand, the supporting capability of the hardware and software for big data processing must be considered. To reduce hardware and software costs, the data servers deployed by existing enterprise users generally use many low-cost servers to process massive big data. For example, the distributed network of a common Hadoop cluster structure shown in fig. 1 performs distributed big data computation: the racks are interconnected through fiber-optic high-speed switches, each rack holds 30-40 servers with a 1 GB switch, and the servers connect upward to a core switch or router (1 GB or above). Although the distributed network of the Hadoop cluster structure can process big data tasks in a decentralized way and thus reduce the operating pressure on the host, it causes the following problems:
First, the many low-cost extension servers have low performance and low operating speed, and the total control cost of the extensions is high; second, the host's data scheduling is too scattered and complicated, and the total time consumed across the extensions is large; third, the computable data volume is limited, and when files are too large the uploading and downloading of data is very time-consuming.
Disclosure of Invention
In order to solve the problems, the application provides a big data information processing method, a big data information processing system and electronic equipment based on deep learning.
In one aspect of the application, a big data information processing method based on deep learning is provided, and the method is realized based on data interaction between a host server and a cloud server, and comprises the following steps:
constructing a distributed computing model on a cloud server, and generating a cloud computing list CCL of the distributed computing model;
establishing access communication between a host server and the cloud server, and backing up the cloud computing list CCL to the host server;
importing big data M to be processed into the host server, and classifying the big data information with a classification model pre-deployed on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
sending the data blocks {m1, m2, m3, ...} to the cloud server, and ranking, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...};
and distributing each data block to the distributed computing model according to the feature ranking sequence, carrying out data processing, and feeding back the processing result to the host server in sequence by the cloud server.
As an optional embodiment of the present application, optionally, the distributed computing model is preferably MapReduce.
As an optional embodiment of the present application, optionally, generating a cloud computing list CCL of the distributed computing model includes:
presetting a cloud computing format list;
recording model IDs of the computing models of all distributed deployments, and sequentially writing the model IDs of the computing models of all distributed deployments into the cloud computing format list;
and after carrying out identity statistics and identification on the cloud computing format list by the cloud server, storing the cloud computing format list as a cloud computing list CCL of the distributed computing model.
As an optional embodiment of the present application, optionally, establishing access communication between a host server and the cloud server includes:
The host server initiates an access request for establishing a big data interaction communication link to the cloud server, wherein the access request comprises host identity information, security address information and the data field of big data M to be processed of the host server;
the cloud server receives and analyzes the access request, verifies the host server and judges:
(1) Whether the host identity information of the host server is qualified or not;
(2) Whether the security address information of the host server has address security authentication or not;
(3) Whether the data field of the big data M to be processed by the host server conforms to the cloud server's technical service field;
if conditions (1)-(3) are all satisfied, sending feedback information accepting the access request to the host server;
and the host server establishes a big data interaction communication link with the cloud server based on an IP protocol according to the feedback information of the cloud server.
As an optional embodiment of the present application, optionally, backing up the cloud computing list CCL to the host server includes:
the cloud server sends an announcement for backing up the cloud computing list CCL to the host server, and judges whether feedback of the host server is received in a preset time or not:
If receiving feedback from the host server within a preset time, sharing the cloud computing list CCL to the host server;
and the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
As an optional embodiment of the present application, optionally, backing up the cloud computing list CCL to the host server, further includes:
if feedback from the host server is not received within the preset time, sending a big data field qualification notice to the host server, and sharing the cloud computing list CCL with the host server;
and the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
As an optional embodiment of the present application, optionally, sending the data blocks {m1, m2, m3, ...} to the cloud server and ranking the data features p of each of the data blocks {m1, m2, m3, ...} comprises the following steps:
the cloud server receives the data blocks {m1, m2, m3, ...} and randomly imports each of them into a pre-deployed deep learning model;
extracting, with the deep learning model, the data features p of each data block;
and ranking the data features p of each data block with a feature importance assessment tool corresponding to the deep learning model, to obtain the feature ranking sequence Mp of the data blocks.
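The extraction-and-ranking steps above can be sketched as follows. This is a hedged illustration: the application does not specify the deep learning model or the importance assessment tool, so a plain numeric feature vector and a variance-based importance score are used as stand-ins.

```python
# Hypothetical sketch of the feature-ranking step: extract a feature vector p
# per data block and rank blocks by an importance score. Variance stands in
# for the patent's unspecified feature importance assessment tool.

def extract_features(block):
    """Stand-in for the pre-deployed deep learning model."""
    return [float(x) for x in block]

def importance(p):
    """Variance of the feature vector, used here as the importance score."""
    mean = sum(p) / len(p)
    return sum((x - mean) ** 2 for x in p) / len(p)

def rank_blocks(blocks):
    """Return block indices ordered by descending importance (sequence Mp)."""
    scored = [(i, importance(extract_features(b))) for i, b in enumerate(blocks)]
    return [i for i, _ in sorted(scored, key=lambda t: -t[1])]

blocks = [[1, 1, 1], [0, 10, 20], [3, 4, 5]]
Mp = rank_blocks(blocks)
print(Mp)  # indices of blocks, most important first
```

Any real importance measure (e.g. gradient-based attribution for a neural model) would slot into `importance` without changing the ranking logic.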
As an optional embodiment of the present application, optionally, distributing each data block to the distributed computing model according to ranking, performing data processing, and feeding back, by the cloud server, a processing result to the host server sequentially, including:
reading the feature ranking sequence Mp, and arranging it in tiers to obtain a plurality of tier-distributed ranking subsequences Mp0;
distributing each ranking sub-sequence Mp0 to the distributed computing model, and enabling each distributed computing model to process the data block corresponding to one ranking sub-sequence Mp0 respectively;
each calculation model outputs a data processing result corresponding to the data block, and binds the data processing result with a model ID of the calculation model for tracking and inquiring the data block;
And collecting all the data processing results by the cloud server, and orderly feeding back to the host server according to the characteristic ranking sequence Mp.
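The distribution steps above can be sketched as follows, assuming a fixed tier size; the model IDs and result values are illustrative only.

```python
# Sketch of the distribution step: arrange Mp into tiered subsequences Mp0,
# give each tier to one compute model, bind each result to that model's ID
# for tracking/query, and feed results back to the host in Mp order.

def split_into_tiers(Mp, size):
    """Arrange the ranking sequence Mp into tier subsequences Mp0."""
    return [Mp[i:i + size] for i in range(0, len(Mp), size)]

def process(block_id):
    """Stand-in for one distributed compute model processing a block."""
    return f"result-of-{block_id}"

def distribute_and_collect(Mp, model_ids, size=2):
    bound = {}  # block_id -> (model_id, result), enabling tracking/query
    for model_id, tier in zip(model_ids, split_into_tiers(Mp, size)):
        for block_id in tier:
            bound[block_id] = (model_id, process(block_id))
    # collect all results and feed back in feature-ranking (Mp) order
    return [(b, *bound[b]) for b in Mp]

out = distribute_and_collect([3, 1, 2, 0], ["model-A", "model-B"])
print(out)
```

Because results are keyed by block and bound to a model ID, the host can later query which model processed which block, as the description requires.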
In another aspect of the present application, a system for implementing the deep learning-based big data information processing method is provided, including:
the cloud computing model construction module is used for constructing a distributed computing model on a cloud server and generating a cloud computing list CCL of the distributed computing model;
the access communication establishing module is used for establishing access communication between a host server and the cloud server and backing up the cloud computing list CCL to the host server;
the classification module is used for importing big data M to be processed into the host server, and classifying the big data information with a classification model pre-deployed on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
a ranking module, configured to send the data blocks {m1, m2, m3, ...} to the cloud server, and rank, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...};
and the distribution calculation module is used for distributing each data block to the distributed calculation model according to the feature ranking sequence, carrying out data processing, and feeding back the processing result to the host server in sequence by the cloud server.
In another aspect of the present application, an electronic device is further provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the one deep learning based big data information processing method when executing the executable instructions.
The invention has the technical effects that:
according to the method and the device, big data information processing is conducted between the host server and the cloud server, the distributed computing model MapReduce deployed on the cloud is utilized to conduct distributed processing on big data processing tasks issued by the host server, the whole big data processing process is conducted between the host server and the cloud server, technical defects caused by a distributed network of a Hadoop cluster structure are avoided, a plurality of physical extensions and switches are not needed, cost and time of a distributed computing network architecture are greatly saved, the processing speed of big data can be greatly improved by utilizing the distributed computing model MapReduce of the cloud technology, operation pressure of the host server is greatly reduced, and response requirements on the big data are improved. According to the scheme, the characteristic model is used for carrying out characteristic extraction and ranking on the data blocks of the big data, distribution calculation is carried out according to the ranking, the calculation model can be orderly arranged according to the characteristic value ranking, the calculation results of the valuable data characteristics are preferably output, the big data is output and applied according to the characteristic ranking, the application degree of the big data calculation results is further refined, the processing results of the big data are optimally utilized, and business data information with high value is provided for users such as enterprises.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram of the framework of a prior-art Hadoop cluster structure;
FIG. 2 shows a schematic flow chart of an embodiment of the invention;
FIG. 3 is a schematic diagram illustrating the composition of an application between a host server and a cloud server according to the present invention;
fig. 4 shows a flow chart for establishing a communication link for the present invention;
FIG. 5 is a schematic diagram of a model application for big data processing using a deep learning model in accordance with the present invention;
fig. 6 shows a schematic application diagram of the electronic device of the invention.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, well known means, elements, and circuits have not been described in detail so as not to obscure the present disclosure.
Example 1
As shown in fig. 2, in one aspect of the present application, a big data information processing method based on deep learning is provided, which is implemented based on data interaction between a host server and a cloud server, and includes the following steps:
s1, constructing a distributed computing model on a cloud server, and generating a cloud computing list CCL of the distributed computing model;
s2, establishing access communication between a host server and the cloud server, and backing up the cloud computing list CCL to the host server;
S3, importing big data M to be processed into the host server, and classifying the big data information with a classification model pre-deployed on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
S4, sending the data blocks {m1, m2, m3, ...} to the cloud server, and ranking, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...};
And S5, distributing each data block to the distributed computing model according to the feature ranking sequence, performing data processing, and feeding back the processing results to the host server in sequence by the cloud server.
In this method, big data information processing is carried out between the host server and the cloud server. The distributed computing model MapReduce deployed on the cloud processes the big data processing tasks issued by the host server in a distributed manner, so the whole big data processing process takes place between the two servers. Meanwhile, the cloud uses a feature model to extract and rank the features of the big data's data blocks and performs distributed computation according to the ranking, further refining the application of the big data computation results and optimizing the use of the processing results.
The respective steps will be specifically described below.
As an alternative embodiment of the present application, as shown in fig. 3, optionally, the distributed computing model is preferably MapReduce.
Advantages of MapReduce in cloud computing:
MapReduce can break data into small blocks and process the blocks in parallel across multiple compute nodes, thereby enabling distributed computing. This distributed approach can greatly increase processing speed and can handle large-scale data sets. Because MapReduce adopts a distributed computing mode, compute nodes can easily be added or removed for horizontal scaling, so larger data sets can be processed, and the computation can be completed by many low-cost compute nodes, reducing management cost.
MapReduce is therefore preferred as a distributed computing model on the cloud for performing big data processing tasks issued by the host server.
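The MapReduce programming model described above can be illustrated with a minimal in-process word-count sketch; this only shows the map/shuffle/reduce pattern, not a real cloud deployment, which would run on a framework such as Hadoop MapReduce or Spark.

```python
# Minimal in-process MapReduce sketch: split data into blocks, map each block
# to key-value pairs, shuffle by key, then reduce per key.
from collections import defaultdict

def map_phase(block):
    """Map: emit (word, 1) pairs for one data block."""
    return [(word, 1) for word in block.split()]

def shuffle(mapped):
    """Shuffle: group all emitted values by key across blocks."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in groups.items()}

blocks = ["big data big", "data cloud", "big cloud cloud"]
counts = reduce_phase(shuffle(map_phase(b) for b in blocks))
print(counts)
```

In a cluster, each `map_phase` call would run on a separate compute node, which is what makes the model horizontally scalable.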
As an optional embodiment of the present application, optionally, generating a cloud computing list CCL of the distributed computing model includes:
presetting a cloud computing format list;
recording model IDs of the computing models of all distributed deployments, and sequentially writing the model IDs of the computing models of all distributed deployments into the cloud computing format list;
and after carrying out identity statistics and identification on the cloud computing format list by the cloud server, storing the cloud computing format list as a cloud computing list CCL of the distributed computing model.
The distributed computing model is formed by a plurality of distributed computing model networking, and distributed deployment is carried out on a cloud server. In order to facilitate management and task allocation of the cloud server to each distributed and deployed computing model, and unified collection and data feedback of processing results including subsequent data blocks, the cloud server counts model IDs of each computing model, and the model IDs of each computing model may be recorded by the cloud server at the time of deployment and stored in a cloud computing format list preset on the cloud server.
The cloud computing format list is the list used to record the model ID of each computing model; the model IDs can be written into it one by one. After the models are deployed, the cloud server writes their model IDs in sequence. A background administrator of the cloud server then counts and identifies the model IDs on the list, judging whether the model IDs of all deployed models have been written in and whether each model's identity, attributes, and so on are properly deployed in place. If the verification passes, the list is stored on the cloud as the cloud computing list CCL of the distributed computing model. Based on the CCL, model deployment and model management can then be carried out according to the model IDs, including binding the computation results of the subsequently deployed distributed computing models and retrieving their data processing results.
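A minimal sketch of the CCL generation just described might look like this; the dictionary fields and the uniqueness check are assumptions standing in for the unspecified "identity statistics and identification" step.

```python
# Hedged sketch of CCL generation: write each deployed model's ID into a
# preset format list, run a simple identity check (IDs present and unique),
# then store the verified list as the cloud computing list CCL.

def build_ccl(deployed_models):
    """deployed_models: iterable of dicts with a 'model_id' key (assumed)."""
    format_list = []                           # preset cloud computing format list
    for model in deployed_models:
        format_list.append(model["model_id"])  # write model IDs in sequence
    # stand-in for identity statistics and identification by the cloud server
    if not format_list:
        raise ValueError("no deployed computing models recorded")
    if len(set(format_list)) != len(format_list):
        raise ValueError("duplicate model ID in cloud computing format list")
    return {"name": "CCL", "model_ids": format_list}  # stored as the CCL

ccl = build_ccl([{"model_id": "mr-001"}, {"model_id": "mr-002"}])
print(ccl["model_ids"])
```

The returned structure is what would later be shared to the host server and used to bind processing results to model IDs.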
As an optional embodiment of the present application, optionally, establishing access communication between a host server and the cloud server includes:
the host server initiates an access request for establishing a big data interaction communication link to the cloud server, wherein the access request comprises host identity information, security address information and the data field of big data M to be processed of the host server;
The cloud server receives and analyzes the access request, verifies the host server and judges:
(1) Whether the host identity information of the host server is qualified or not;
(2) Whether the security address information of the host server has address security authentication or not;
(3) Whether the data field of the big data M to be processed by the host server conforms to the cloud server's technical service field;
if conditions (1)-(3) are all satisfied, sending feedback information accepting the access request to the host server;
and the host server establishes a big data interaction communication link with the cloud server based on an IP protocol according to the feedback information of the cloud server.
As shown in fig. 4, in this scheme a big data processing task is issued on the host server, which initiates a communication access request to the cloud server according to its address. The cloud server parses the access request to obtain the host server's identity information and address information and the data processing field of the big data to be processed. It then verifies these three items and judges whether they meet the cloud server's requirements: if the identity is qualified, the address has security authentication, and the big data field to be processed by the host server is a processing field the current cloud server is adapted to, the cloud server sends the host server feedback accepting the access request. After receiving the feedback, the host server establishes a communication access link with the cloud server based on the IP protocol, and subsequent big data tasks, including response feedback of big data processing results, are delivered through the constructed big data interaction communication link.
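The three-part check performed by the cloud server can be sketched as below; the request fields mirror the description, while all concrete values are illustrative.

```python
# Sketch of the cloud server's access verification: (1) host identity,
# (2) address security authentication, (3) data-field match against the
# cloud's service fields. Values are illustrative, not from the patent.

def verify_access(request, cloud):
    checks = (
        request["host_id"] in cloud["qualified_hosts"],         # check (1)
        request["security_address"] in cloud["authenticated"],  # check (2)
        request["data_field"] in cloud["service_fields"],       # check (3)
    )
    return all(checks)  # feedback accepting the request only if all pass

cloud = {
    "qualified_hosts": {"host-42"},
    "authenticated": {"10.0.0.7"},
    "service_fields": {"e-commerce"},
}
request = {"host_id": "host-42", "security_address": "10.0.0.7",
           "data_field": "e-commerce"}
ok = verify_access(request, cloud)
print(ok)
```

Only when `verify_access` returns true would the IP-based big data interaction communication link be established.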
Based on the big data interaction communication link, the host server can also provide the view of the big data processing process to the cloud server in real time, the cloud server can collect the progress of each model according to the model ID, count the progress as a progress table and feed back the progress table to the host server, and the host server monitors the big data processing progress of the cloud server according to the progress table, so that the monitoring efficiency is improved.
As an optional embodiment of the present application, optionally, backing up the cloud computing list CCL to the host server includes:
the cloud server sends an announcement for backing up the cloud computing list CCL to the host server, and judges whether feedback of the host server is received in a preset time or not:
if receiving feedback from the host server within a preset time, sharing the cloud computing list CCL to the host server;
and the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
The cloud server can send the host server an announcement asking whether to back up the cloud computing list CCL, so that the host server backs up the CCL. The host server can then check the model IDs of all the computing models and, according to the list, see the processing task of the computing model corresponding to each model ID.
Meanwhile, the host server can conveniently compare tasks between its copy and the cloud computing list on the cloud server, including synchronizing computing data and tasks, which facilitates data checking and unified management between the host server and the cloud server.
As an optional embodiment of the present application, optionally, backing up the cloud computing list CCL to the host server, further includes:
if feedback from the host server is not received within the preset time, sending a big data field qualification notice to the host server, and sharing the cloud computing list CCL to the host server;
and the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
After the cloud server sends the announcement, it judges whether the host server receives the announcement and returns a feedback response within the predetermined time.
If feedback arrives within the preset time, the host server acquires the cloud computing list CCL and stores it in its database. If no feedback from the host server is received within the specified time, the cloud server actively sends a big data field qualification notice to the host server and shares the list with it.
Of course, if the earlier verification already found that the host server's big data technical field does not meet the cloud server's requirements, the cloud server directly rejects the request.
Having the cloud server actively announce and actively share the list according to the technical field qualification can bring further economic benefits to the cloud server.
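Both branches of this backup flow — share on feedback, or share after a qualification notice when the feedback times out — can be sketched with a timed queue read. The timeout value and message names are illustrative assumptions.

```python
# Hedged sketch of the CCL backup flow: the cloud server announces the backup,
# waits for host feedback within a preset time, and shares the list either
# way — directly on feedback, or after a field qualification notice on timeout.

import queue

def backup_ccl(ccl: list, feedback_q: "queue.Queue", timeout: float = 1.0) -> str:
    """Return which branch delivered the CCL to the host server."""
    try:
        feedback_q.get(timeout=timeout)  # wait for the host's feedback response
        return "shared_after_feedback"
    except queue.Empty:
        # No feedback in time: send a field qualification notice, then share anyway.
        return "shared_after_qualification_notice"

q = queue.Queue()
q.put("ack")  # host responds within the preset time
print(backup_ccl(["CM-001", "CM-002"], q))                            # shared_after_feedback
print(backup_ccl(["CM-001", "CM-002"], queue.Queue(), timeout=0.05))  # shared_after_qualification_notice
```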
After the secure communication is established, the host server begins issuing big data processing tasks.
FIG. 5 is a schematic diagram of a data processing system according to the present embodiment.
According to the scheme, the host server classifies the big data and then distributes the processing tasks to the cloud server, which performs distributed computation on the classified data blocks {m1, m2, m3, ...}. Thanks to the host's pre-classification, the cloud server knows clearly the data properties of the various data blocks {m1, m2, m3, ...}.
S3, importing big data M to be processed into the host server, and performing information classification on the big data by using a classification model deployed in advance on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
the host server can classify the big data M to be processed in advance, so that classified management and result statistics are conveniently carried out on the big data M according to classification, and the management difficulty of the big data is reduced.
The big data M to be processed may be classified in advance using a classification model from among machine learning models, such as text classification or image classification, and divided into data blocks {m1, m2, m3, ...}.
For example, commercial shopping big data may be divided, according to user portraits or shopping behavior attributes such as purchasing behavior or repurchase, into data blocks with different shopping behavior attributes, yielding {short-term shopping crowd, mid-term shopping crowd, long-term shopping crowd, unintentional shopping crowd, ...}.
The classification model may be specifically set by the host administrator according to the data type of the big data M.
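The host-side pre-classification step can be sketched as below. The keyword rule stands in for the real classification model, and all record contents and labels are made-up illustrative assumptions.

```python
# Illustrative sketch of splitting big data M into data blocks {m1, m2, m3, ...}
# on the host server. classify() is a toy stand-in for the pre-deployed
# classification model; labels and records are assumptions.

def classify(record: str) -> str:
    """Toy stand-in for the host server's classification model."""
    if "refund" in record:
        return "after_sales"
    if "order" in record:
        return "purchase"
    return "browse"

def split_into_blocks(big_data: list) -> dict:
    """Group records into data blocks keyed by their classification."""
    blocks = {}
    for record in big_data:
        blocks.setdefault(classify(record), []).append(record)
    return blocks

M = ["order #1", "page view", "refund #2", "order #3"]
blocks = split_into_blocks(M)
print(sorted(blocks))           # ['after_sales', 'browse', 'purchase']
print(len(blocks["purchase"]))  # 2
```

Each resulting block is then managed and counted by its classification, which is what lowers the management difficulty described above.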
As an alternative embodiment of the present application, optionally, sending the data blocks {m1, m2, m3, ...} to the cloud server and ranking, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...} includes the following steps:
The cloud server receives the data blocks {m1, m2, m3, ...}, and randomly imports each of the data blocks {m1, m2, m3, ...} into a pre-deployed deep learning model;
extracting, by the deep learning model, the data features of each data block to obtain the data features p of each data block;
and carrying out feature ranking on the data features p of each data block by using a feature importance assessment tool corresponding to the deep learning model to obtain a feature ranking sequence Mp of each data block.
After classifying the big data, the host server may send the data block to the cloud server, and send the data processing task of the data block to the cloud server.
The cloud server executes the data processing task and distributes each data block to each distributed-deployed computing model for data computation. When the cloud server distributes data tasks, a pre-deployed deep learning model, such as a convolutional neural network or a recurrent neural network, can be used to extract the data features of each data block. Randomly importing the blocks into the deep learning model, such as an RNN model, avoids the data being insufficiently scattered due to block concentration, so the random import mode improves the data distribution.
For the manner of extracting data features with a deep learning model such as an RNN, reference may be made to the application principles of RNN neural networks, which are not described in detail in this embodiment.
After the data features of each data block are extracted, in order to process the data blocks of the big data in order of value and feature importance, the scheme preferentially extracts the data information corresponding to high-value data features and uses a feature importance evaluation tool to rank the data features of each data block.
The scheme adopts feature importance assessment tools such as random forests and gradient boosting to rank the data features of each data block, yielding a feature ranking sequence per block. From these sequences, the importance of the data in each block can be quickly identified: an administrator can learn the importance of the data features contained in each data block from the rankings and extract valuable information from the blocks in order of the feature importance the ranking represents. The higher a block ranks, the more valuable the data information it contains, and its information is extracted first.
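The ranking step can be sketched as follows. The importance scores are assumed to come from a tool of the kind named above (e.g. a random forest's per-feature importances); the scores, feature names and block IDs here are made-up illustrative numbers, not outputs of a real model.

```python
# Sketch of feature ranking: rank features within each data block by an
# assumed importance score, then order the blocks into a ranking sequence Mp
# by their strongest feature. All scores below are illustrative assumptions.

def rank_features(importances: dict) -> list:
    """Features of one block, most important first."""
    return sorted(importances, key=importances.get, reverse=True)

def build_Mp(block_importances: dict) -> list:
    """Blocks ordered by the importance of their strongest feature."""
    top = {b: max(imp.values()) for b, imp in block_importances.items()}
    return sorted(top, key=top.get, reverse=True)

block_features = {
    "m1": {"recency": 0.7, "frequency": 0.2},
    "m2": {"recency": 0.4, "monetary": 0.5},
    "m3": {"frequency": 0.9},
}
print(rank_features(block_features["m1"]))  # ['recency', 'frequency']
print(build_Mp(block_features))             # ['m3', 'm1', 'm2']
```

Ordering blocks by a single "strongest feature" score is one possible aggregation; the patent does not pin down how per-feature importances combine into the block-level sequence Mp.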
For extracting the value information of a specific data block, when the host server issues tasks it can synchronously send the elements, indexes, requirements and the like of the data extraction to the cloud server, so that the cloud server extracts the corresponding value information according to the host server's elements, indexes and requirements.
As an optional embodiment of the present application, optionally, distributing each data block to the distributed computing model according to ranking, performing data processing, and feeding back, by the cloud server, a processing result to the host server sequentially, including:
reading the characteristic ranking sequence Mp, and arranging the characteristic ranking sequence Mp according to steps to obtain a plurality of ranking subsequences Mp0 distributed in steps;
distributing each ranking sub-sequence Mp0 to the distributed computing model, and enabling each distributed computing model to process the data block corresponding to one ranking sub-sequence Mp0 respectively;
each calculation model outputs a data processing result corresponding to the data block, and binds the data processing result with a model ID of the calculation model for tracking and inquiring the data block;
And collecting all the data processing results by the cloud server, and orderly feeding back to the host server according to the characteristic ranking sequence Mp.
Subsequently, the cloud server arranges the computing models to perform the specific analysis of each ranked data block: the ranked blocks are arranged into steps, and according to this step-type arrangement, the blocks of the earlier steps are distributed and computed first, so their processing information is obtained first. For example, the data block ranked first forms the first step and is processed first; the three data blocks ranked "2, 3 and 4" form the second step, and three computing models then process them; and so on.
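The step ("ladder") partition can be sketched as a simple cut of the ranked sequence; the step sizes used here are an illustrative assumption, since the patent does not fix them.

```python
# Sketch of the step arrangement: cut the ranked sequence Mp into ranking
# subsequences Mp0, dispatching each step before the next, with one computing
# model per block in the step. Step sizes [1, 3, 2] are an assumption.

def to_steps(Mp: list, sizes: list) -> list:
    """Cut the ranked blocks into step-distributed subsequences Mp0."""
    steps, i = [], 0
    for n in sizes:
        steps.append(Mp[i:i + n])
        i += n
    return steps

Mp = ["m1", "m2", "m3", "m4", "m5", "m6"]
steps = to_steps(Mp, [1, 3, 2])
for step in steps:
    # earlier steps hold higher-ranked blocks and are processed first
    print(step)
# ['m1'] then ['m2', 'm3', 'm4'] then ['m5', 'm6']
```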
After each computing model finishes, the cloud server collects the results in the same ranking order and binds each data processing result to the model ID of the computing model that produced it, which facilitates unified management of the results. Meanwhile, if a data processing error is found, the erroneous computing model can be tracked and queried by its model ID, allowing it to be located quickly and providing a tracking and query function for subsequent data checks.
After the cloud server collects all the computation results, it feeds them back to the host server, which aggregates all the computation results according to the classification to obtain the processing result of the big data.
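The binding and aggregation described above can be sketched as follows; the record fields, model IDs and result strings are illustrative assumptions.

```python
# Sketch of result binding and host-side aggregation: each computing model's
# output is bound to its model ID for tracking, and the host aggregates the
# bound results per data block. All field names are assumptions.

def bind(model_id: str, block_id: str, result: str) -> dict:
    """Bind one data processing result to the model ID that produced it."""
    return {"model_id": model_id, "block": block_id, "result": result}

outputs = [
    bind("CM-001", "m1", "high-value segment"),
    bind("CM-002", "m2", "mid-value segment"),
]

# Host-side aggregation keyed by block; keeping model_id lets an erroneous
# result be traced back to the computing model that produced it.
aggregated = {o["block"]: (o["model_id"], o["result"]) for o in outputs}
print(aggregated["m1"])  # ('CM-001', 'high-value segment')
```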
In this way, valuable information extraction for the different data blocks is obtained, reading from the most valuable information first according to the ranking order, thereby optimizing big data processing.
Therefore, by performing the distributed, feature-ranked computation between the host and the cloud server, the technical drawbacks of the distributed network of a Hadoop cluster structure are avoided: no additional physical extensions and switches are needed, which greatly saves the cost and time of a distributed computing network structure, while the distributed computing model MapReduce of cloud technology greatly improves the processing speed of big data, greatly reduces the computing pressure on the host server, and improves responsiveness to big data. The scheme uses the feature model to extract and rank the features of the data blocks and distributes computation according to the ranking, so the computing models can be arranged in order of feature value and the computation results of valuable data features are output preferentially. Outputting and applying the big data according to the feature ranking further refines the application of the big data computation results, makes optimal use of the processing results, and provides high-value business data information to users such as enterprises.
For the distributed computing model MapReduce, reference may be made to prior-art technical descriptions of MapReduce.
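For orientation, the map → shuffle → reduce pattern that MapReduce relies on can be shown with a minimal in-process word count. This is a single-process illustration only; a real deployment would spread the map and reduce tasks across cloud nodes.

```python
# Minimal in-process MapReduce sketch (word count) illustrating the
# map -> shuffle -> reduce pattern of the distributed computing model.

from collections import defaultdict

def map_phase(records):
    """Map: emit (word, 1) pairs from each input record."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the grouped values per key."""
    return {key: sum(values) for key, values in grouped.items()}

data = ["big data", "big model"]
counts = reduce_phase(shuffle(map_phase(data)))
print(counts["big"])  # 2
```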
It should be apparent to those skilled in the art that all or part of the above embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
Example 2
Based on the implementation principle of embodiment 1, another aspect of the present application proposes a system for implementing the big data information processing method based on deep learning, including:
the cloud computing model construction module is used for constructing a distributed computing model on a cloud server and generating a cloud computing list CCL of the distributed computing model;
the access communication establishing module is used for establishing access communication between a host server and the cloud server and backing up the cloud computing list CCL to the host server;
the classification module is used for importing big data M to be processed into the host server, and performing information classification on the big data by utilizing a classification model which is deployed in advance on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
a ranking module, configured to send the data blocks {m1, m2, m3, ...} to the cloud server, and rank, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...};
and the distribution calculation module is used for distributing each data block to the distributed calculation model according to the feature ranking sequence, carrying out data processing, and feeding back the processing result to the host server in sequence by the cloud server.
The functions and interaction processes and principles of the above modules are described in detail in embodiment 1, which is not repeated.
The modules or steps of the invention described above may be implemented in a general-purpose computing system; they may be centralized in a single computing system or distributed across a network of computing systems; they may be implemented in program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system; they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Example 3
As shown in fig. 6, in another aspect, the present application further proposes an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the one deep learning based big data information processing method when executing the executable instructions.
Embodiments of the present disclosure provide for an electronic device that includes a processor and a memory for storing processor-executable instructions. Wherein the processor is configured to implement any one of the deep learning-based big data information processing methods described above when executing the executable instructions.
Here, it should be noted that the number of processors may be one or more. Meanwhile, in the electronic device of the embodiment of the disclosure, an input system and an output system may be further included. The processor, the memory, the input system, and the output system may be connected by a bus, or may be connected by other means, which is not specifically limited herein.
The memory is a computer-readable storage medium that can be used to store software programs, computer-executable programs, and various modules, such as: the embodiment of the disclosure relates to a program or a module corresponding to a big data information processing method based on deep learning. The processor executes various functional applications and data processing of the electronic device by running software programs or modules stored in the memory.
The input system may be used to receive an input digital or signal. Wherein the signal may be a key signal generated in connection with user settings of the device/terminal/server and function control. The output system may include a display device such as a display screen.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (6)

1. The big data information processing method based on deep learning is realized based on data interaction between a host server and a cloud server, and is characterized by comprising the following steps:
constructing a distributed computing model on a cloud server, and generating a cloud computing list CCL of the distributed computing model;
establishing access communication between a host server and the cloud server, and backing up the cloud computing list CCL to the host server;
importing big data M to be processed into the host server, and carrying out information classification on the big data by utilizing a classification model which is deployed in advance on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
transmitting the data blocks {m1, m2, m3, ...} to the cloud server, and ranking, by the cloud server, data features p of each of the data blocks {m1, m2, m3, ...};
distributing each data block to the distributed computing model according to the feature ranking sequence, performing data processing, and feeding back processing results to the host server in sequence by the cloud server;
generating a cloud computing list CCL of the distributed computing model, comprising:
Presetting a cloud computing format list;
recording model IDs of the computing models of all distributed deployments, and sequentially writing the model IDs of the computing models of all distributed deployments into the cloud computing format list;
after carrying out identity statistics and identification on the cloud computing format list by the cloud server, storing the cloud computing format list as a cloud computing list CCL of the distributed computing model;
establishing access communication between a host server and the cloud server, comprising:
the host server initiates an access request for establishing a big data interaction communication link to the cloud server, wherein the access request comprises host identity information, security address information and the data field of big data M to be processed of the host server;
the cloud server receives and analyzes the access request, verifies the host server and judges:
(1) Whether the host identity information of the host server is qualified or not;
(2) Whether the security address information of the host server has address security authentication or not;
(3) Whether the data field of the big data M to be processed of the host server accords with the cloud technical service field of the cloud server or not;
if all the steps (1) - (3) are met, sending feedback information for receiving the access request to the host server;
The host server establishes a big data interaction communication link with the cloud server based on an IP protocol according to the feedback information of the cloud server;
transmitting the data blocks {m1, m2, m3, ...} to the cloud server, and ranking, by the cloud server, data features p of each of the data blocks {m1, m2, m3, ...}, comprising:
the cloud server receives the data blocks {m1, m2, m3, ...}, and randomly imports each of the data blocks {m1, m2, m3, ...} into a pre-deployed deep learning model;
extracting data features of each data block by using the deep learning model, and extracting data features p of each data block;
using a feature importance assessment tool corresponding to the deep learning model to perform feature ranking on the data features p of each data block to obtain a feature ranking sequence Mp of each data block;
distributing each data block to the distributed computing model according to the ranking, performing data processing, and feeding back processing results to the host server in sequence by the cloud server, wherein the method comprises the following steps:
reading the characteristic ranking sequence Mp, and arranging the characteristic ranking sequence Mp according to steps to obtain a plurality of ranking subsequences Mp0 distributed in steps;
Distributing each ranking sub-sequence Mp0 to the distributed computing model, and enabling each distributed computing model to process the data block corresponding to one ranking sub-sequence Mp0 respectively;
each calculation model outputs a data processing result corresponding to the data block, and binds the data processing result with a model ID of the calculation model for tracking and inquiring the data block;
and collecting all the data processing results by the cloud server, and orderly feeding back to the host server according to the characteristic ranking sequence Mp.
2. The deep learning-based big data information processing method according to claim 1, wherein the distributed computing model is preferably MapReduce.
3. The deep learning-based big data information processing method of claim 1, wherein backing up the cloud computing list CCL to the host server comprises:
the cloud server sends an announcement for backing up the cloud computing list CCL to the host server, and judges whether feedback of the host server is received in a preset time or not:
if receiving feedback from the host server within a preset time, sharing the cloud computing list CCL to the host server;
And the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
4. The deep learning-based big data information processing method of claim 3, wherein backing up the cloud computing list CCL to the host server, further comprises:
if feedback from the host server is not received within the preset time, sending a big data field qualification notice to the host server, and sharing the cloud computing list CCL to the host server;
and the host server receives and reads the cloud computing list CCL, obtains the model IDs of the computing models of each distributed deployment, and stores the model IDs in a host database.
5. A system for implementing the deep learning-based big data information processing method of any one of claims 1 to 4, comprising:
the cloud computing model construction module is used for constructing a distributed computing model on a cloud server and generating a cloud computing list CCL of the distributed computing model;
the access communication establishing module is used for establishing access communication between a host server and the cloud server and backing up the cloud computing list CCL to the host server;
the classification module is used for importing big data M to be processed into the host server, and performing information classification on the big data by utilizing a classification model which is deployed in advance on the host server to obtain a plurality of data blocks {m1, m2, m3, ...};
a ranking module, configured to send the data blocks {m1, m2, m3, ...} to the cloud server, and rank, by the cloud server, the data features p of each of the data blocks {m1, m2, m3, ...};
and the distribution calculation module is used for distributing each data block to the distributed calculation model according to the feature ranking sequence, carrying out data processing, and feeding back the processing result to the host server in sequence by the cloud server.
6. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the deep learning based big data information processing method of any of claims 1-4 when executing the executable instructions.
CN202311317069.2A 2023-10-12 2023-10-12 Big data information processing method based on deep learning Active CN117056060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311317069.2A CN117056060B (en) 2023-10-12 2023-10-12 Big data information processing method based on deep learning


Publications (2)

Publication Number Publication Date
CN117056060A CN117056060A (en) 2023-11-14
CN117056060B true CN117056060B (en) 2024-01-09

Family

ID=88663131


Country Status (1)

Country Link
CN (1) CN117056060B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833516A (en) * 2012-08-23 2012-12-19 深圳先进技术研究院 Cloud computing-based intelligent helmet network system and method for processing video information
WO2014169381A1 (en) * 2013-04-18 2014-10-23 International Business Machines Corporation Extending infrastructure security to services in a cloud computing environment
CN111857523A (en) * 2020-08-04 2020-10-30 吉林师范大学 Computer big data processing acquisition method, system, equipment and medium
CN112073499A (en) * 2020-09-02 2020-12-11 浪潮云信息技术股份公司 Dynamic service method of multi-machine type cloud physical server
CN116302574A (en) * 2023-05-23 2023-06-23 北京前景无忧电子科技股份有限公司 Concurrent processing method based on MapReduce
CN116431282A (en) * 2023-03-29 2023-07-14 度小满科技(北京)有限公司 Cloud virtual host server management method, device, equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant