CN112434201A - Big data based data visualization method and big data cloud server

Info

Publication number: CN112434201A
Application number: CN202011410341.8A
Authority: CN (China)
Prior art keywords: data, visualization, visual, scene, information
Legal status: Granted
Other languages: Chinese (zh)
Other versions: CN112434201B
Inventor: 高慧军
Current Assignee: Dingdian Software Co ltd Fujian
Original Assignee: Individual

Application events:
    • Application filed by Individual
    • Priority to CN202011410341.8A (granted as CN112434201B)
    • Priority to CN202110552767.5A (published as CN113282813A)
    • Priority to CN202110553640.5A (published as CN113220955A)
    • Publication of CN112434201A
    • Application granted
    • Publication of CN112434201B
    • Current legal status: Active
    • Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/904: Browsing; Visualisation therefor
    • G06F16/906: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

According to the big data based data visualization method and the big data cloud server, a target data set is divided into a plurality of first to-be-processed data subsets, the first data service type corresponding to each first to-be-processed data subset is determined, and the visualization demand information corresponding to each first data service type is then determined from the visualization demand information of each first to-be-processed data subset. On this basis, the visualization recognition result of the visualization demand information corresponding to each first data service type can be obtained, and different visualization processing threads are invoked to perform visualization processing on each first to-be-processed data subset. Because different visualization processing threads can be adapted to different first to-be-processed data subsets, both the first data service type and the visualization demand information of each first to-be-processed data subset are taken into account when determining the visualization recognition results corresponding to different visualization demand information. In this way, classified visualization processing of different first to-be-processed data subsets can be realized, complicated and variable user requirements can be met, and the intelligence of data visualization is improved.

Description

Big data based data visualization method and big data cloud server
Technical Field
The application relates to the technical field of big data visualization, in particular to a big data based data visualization method and a big data cloud server.
Background
The rapid development of big data has brought great convenience to people's work and daily life. To further increase the value of data, the data must be conveyed and communicated clearly and effectively, and data visualization is an effective technique for doing so. Data visualization presents otherwise dry data in a form that users can understand and accept more easily, and thereby serves society. However, common data visualization technologies struggle to meet complicated and variable user requirements, so the intelligence of data visualization remains low.
Disclosure of Invention
A first aspect of the present application discloses a big data based data visualization method, the method comprising:
when data visualization processing is carried out on a target data set to be visualized, dividing the target data set into a plurality of first data subsets to be processed;
inputting each first to-be-processed data subset into a data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set; the data service classification network is obtained by training based on a plurality of second to-be-processed data subsets included in a first sample data set and at least one second data service type marked in the first sample data set;
determining the visualization demand information corresponding to each first data service type in the target data set according to the first data service type corresponding to each first data subset to be processed and the visualization demand information of each first data subset to be processed;
identifying the visual demand information corresponding to each first data service type in the target data set to obtain a visual identification result of the visual demand information corresponding to each first data service type in the target data set; and calling different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result.
In a preferred embodiment, the method further comprises:
dividing the first sample data set into a plurality of second to-be-processed data subsets, and marking at least one second data service type in the first sample data set;
identifying at least one third data service type from the first sample data set according to first data source information corresponding to a plurality of second to-be-processed data subsets included in the first sample data set;
and performing network model training according to at least one third data service type identified from the first sample data set, the marked at least one second data service type and the plurality of second data subsets to be processed to obtain the data service classification network.
In a preferred embodiment, the identifying, according to first data source information corresponding to a plurality of second to-be-processed data subsets included in the first sample data set, at least one third data traffic type from the first sample data set includes:
acquiring first data source information corresponding to each second to-be-processed data subset in the first sample data set;
for first data source information of each second to-be-processed data subset, determining a plurality of second data source information corresponding to the first sample data set;
identifying at least one third data service type from the first sample data set according to a plurality of second data source information corresponding to the first sample data set;
wherein, for the first data source information of each second to-be-processed data subset, determining a plurality of second data source information corresponding to the first sample data set includes: determining a data source time sequence weight of each second to-be-processed data subset according to the first data source information of each second to-be-processed data subset; determining first data source similarity between the first data source information of any two second to-be-processed data subsets according to the first data source information of each second to-be-processed data subset; according to the data source time sequence weight of each second data subset to be processed, carrying out time sequence information characteristic weight weighting on first data source information of which the first data source similarity exceeds a preset similarity value in the first data source information of each second data subset to be processed to obtain a plurality of second data source information;
wherein the identifying at least one third data traffic type from the first sample data set according to the plurality of second data source information corresponding to the first sample data set includes: for each second data source information, determining a second data source similarity between the second data source information and each specified data service type according to the second data source information; selecting a specified data service type with the highest data source similarity between the second data source information and each specified data service type from each specified data service type according to the second data source similarity between the second data source information and each specified data service type; and taking the selected specified data service type as a third data service type corresponding to the second data source information.
In a preferred embodiment, before each first subset of data to be processed is input into the data traffic classification network to obtain the first data traffic type corresponding to each first subset of data to be processed in the target data set, the method further includes:
acquiring a second sample data set, wherein at least one fourth data service type and visual requirement information corresponding to each fourth data service type are marked in the second sample data set;
inputting the second sample data set into the data service classification network, and outputting at least one fifth data service type of the second sample data set and visual requirement information corresponding to each fifth data service type;
performing data classification test on the data service classification network according to the at least one fourth data service type, the visual demand information corresponding to each fourth data service type, and the visual demand information corresponding to the at least one fifth data service type and each fifth data service type; when the data classification test of the data service classification network is successful, executing the step of inputting each first to-be-processed data subset into the data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set;
wherein, the performing a data classification test on the data traffic classification network according to the at least one fourth data traffic type and the visual demand information corresponding to each fourth data traffic type, and the at least one fifth data traffic type and the visual demand information corresponding to each fifth data traffic type includes: when the at least one fourth data service type is matched with the at least one fifth data service type, and the visual demand information corresponding to each fourth data service type is matched with the visual demand information corresponding to each fifth data service type, determining that the data classification test of the data service classification network is successful;
the determining, according to the first data service type corresponding to each first to-be-processed data subset and the visualization demand information of each first to-be-processed data subset, the visualization demand information corresponding to each first data service type in the target data set includes: for each first data service type in the target data set, according to at least one first to-be-processed data subset corresponding to the first data service type, taking the visualization demand information of the at least one first to-be-processed data subset corresponding to the first data service type as the visualization demand information corresponding to the first data service type.
In a preferred embodiment, identifying the visualization demand information corresponding to each first data service type in the target data set to obtain a visualization identification result of the visualization demand information corresponding to each first data service type in the target data set includes:
receiving a plurality of data visualization labels of visualization demand information corresponding to each first data service type in the target data set, and acquiring at least one visualization scene feature corresponding to at least one data visualization scene; the at least one visualization scenario feature describes a scenario attribute of each of the at least one data visualization scenario;
according to the at least one visualization scene feature, determining a matching data visualization scene and a matching scene description value for each data visualization tag in the plurality of data visualization tags from the at least one data visualization scene; the matching scene description value represents a probability value of the matching data visualization scene being a correct data visualization scene corresponding to each data visualization label;
selecting a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization label according to the matching scene description value;
acquiring data visualization indexes and visualization compatibility information of the candidate data visualization scenes, and determining visualization demand identification indexes of the candidate data visualization scenes on the basis of the visualization compatibility information, the data visualization indexes and candidate visualization scene characteristics corresponding to the candidate data visualization scenes; the visual demand identification index represents an information dimension index which corresponds to the candidate data visual scene and is used for demand information identification;
determining whether the candidate data visualization scene is a data visualization scene to be identified according to the visualization demand identification index;
when the candidate visualization scene is the visualization scene of the data to be identified, pairing visualization demand information corresponding to each first data service type in the target data set with the corresponding visualization scene of the data to be identified, and identifying the visualization demand information under the paired visualization scene of the data to be identified to obtain a visualization identification result;
wherein each data visualization tag has tag pointing information and tag type information; the determining a matching data visualization scene and a matching scene description value for each data visualization tag of the plurality of data visualization tags from the at least one data visualization scene according to the at least one visualization scene feature includes:
analyzing scene pointing information of a data visualization scene and scene type information of the data visualization scene corresponding to the at least one data visualization scene from the at least one visualization scene feature;
calculating at least one tag scene pairing coefficient of each data visualization tag and the at least one data visualization scene according to the tag pointing information of each data visualization tag, the tag type information of each data visualization tag, the scene pointing information of the data visualization scene of the at least one data visualization scene, and the scene type information of the data visualization scene of the at least one data visualization scene;
selecting the maximum tag scene matching coefficient from the at least one tag scene matching coefficient; taking a data visualization scene corresponding to the maximum tag scene pairing coefficient in the at least one data visualization scene as the matching data visualization scene of each data visualization tag, and taking the maximum tag scene pairing coefficient as the matching scene description value of each data visualization tag;
wherein, according to the matching scene description value, selecting a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization tag includes:
selecting one or more current data visualization tags corresponding to the current matching data visualization scene from each data visualization tag; the one or more current data visualization tags are data visualization tags matched with the current matching data visualization scenes, and the current matching data visualization scenes are any data visualization scenes in the matching data visualization scenes corresponding to each data visualization tag;
comparing one or more current matching scene description values corresponding to the one or more current data visualization tags with preset scene description value thresholds respectively to obtain one or more scene description value comparison results; the one or more scene description value comparison results represent whether the one or more current matching scene description values are smaller than the preset scene description value threshold value;
and when the one or more scene description value comparison results represent that the one or more current matching scene description values are all smaller than the preset scene description value threshold value, taking the current matching data visualization scene as the candidate data visualization scene.
In a preferred embodiment, the determining a visualization demand identification indicator of the candidate data visualization scene based on the visualization compatibility information, the data visualization indicator, and the candidate visualization scene feature corresponding to the candidate data visualization scene includes:
constructing a first business visual identification index by using the visual compatible information; constructing a second business visual identification index by using the data visual index and the candidate visual scene characteristics;
calculating the visual demand identification index according to the first business visual identification index and the second business visual identification index;
wherein the visual compatibility information comprises: visual data defect values and visual data deviation values; the constructing a first business visual identification index by using the visual compatible information includes: when the sum of the visual data defect value and the visual data deviation value is larger than a preset compatibility evaluation threshold value, constructing a first business visual identification index by using the visual data defect value and the visual data deviation value; when the sum of the visual data defect value and the visual data deviation value is less than or equal to the preset compatibility evaluation threshold value, determining the obtained first preset business visual identification index as the first business visual identification index;
wherein the data visualization indicators comprise: visualization area ratio and visualization data difference degree; constructing a second business visual identification index by using the data visual index and the candidate visual scene characteristics, wherein the second business visual identification index comprises the following steps: when the visual area ratio is larger than a set area ratio, taking the obtained second preset service visual identification index as the second service visual identification index; when the visualization area ratio is smaller than or equal to the set area ratio, analyzing feature difference information from the candidate visualization scene features, and constructing the second business visualization identification index by using the feature difference information and the visualization data difference degree; wherein the feature difference information characterizes whether a difference feature exists in the candidate visual scene features;
the constructing the first business visual identification index by using the visual data defect value and the visual data deviation value includes: calculating a visual compatible sequence value by using the visual data defect value and the visual data deviation value; and constructing the visual identification index of the first service according to the corresponding relation between the visual compatible sequence value and a preset sequence value.
In a preferred embodiment, the acquiring data visualization indicators and visualization compatibility information of the candidate data visualization scenario includes:
counting the data visualization duration and the visualization area ratio of the candidate data visualization scene from historical data visualization records; the data visualization duration represents the accumulated value of the effective visualization durations matched with the candidate data visualization scene, and the visualization area ratio represents the number of effective visualization durations that are matched with the candidate data visualization scene and whose matching scene description values for all data visualization labels are smaller than the preset scene description value threshold; the historical data visualization records are obtained by mining a plurality of historical data visualization labels corresponding to historical visualization time intervals;
determining the visualization data difference degree according to the visualization area ratio and the data visualization duration, and forming the data visualization index by using the visualization area ratio and the visualization data difference degree;
and counting visual data defect values and visual data deviation values of the candidate data visual scenes from the historical data visual records, and forming the visual compatible information by using the visual data defect values and the visual data deviation values.
In a preferred embodiment, based on the visual recognition result, invoking different visual processing threads to perform visual processing on each first to-be-processed data subset includes:
determining visual area distribution information, visual data form information and visual interface updating frequency information of a visual identification result;
determining first thread matching information corresponding to the visual identification result based on the visual interface updating frequency information of the visual identification result and visual interface updating frequency information of a reference visual identification result, wherein the reference visual identification result comprises three visual interface updating frequencies with different updating triggering conditions, the updating frequency mean value of the included visual interface updating frequencies is greater than a set updating frequency value, and the result generation time of the reference visual identification result is before the result generation time of the visual identification result;
determining thread calling indication information corresponding to the visual recognition result based on visual area allocation information and visual data form information of the visual recognition result, visual error reporting information and visual feedback information corresponding to a last visual recognition result and the first thread matching information, wherein the thread calling indication information at least comprises second thread matching information, and the thread calling indication information corresponding to the visual recognition result refers to thread calling indication information of a big data cloud server when the visual recognition result is determined;
if the thread matching difference degree between the first thread matching information and the second thread matching information is larger than a set difference degree, determining that the visual identification result is a key visual identification result, and determining third thread matching information, corresponding visual error reporting indication information and visual feedback indication information of all key visual identification results in a target data set based on the first thread matching information and the thread calling indication information;
and determining a visualization processing thread corresponding to the visual recognition result based on the third thread matching information, the corresponding visual error reporting indication information and the visual feedback indication information of all key visual recognition results in the target data set, and running the visualization processing thread to perform visualization processing on each first to-be-processed data subset.
A second aspect of the application discloses a big data cloud server, which comprises a processing engine, a network module and a memory; the processing engine and the memory communicate via the network module, and the processing engine reads the computer program from the memory and runs it to perform the method of the first aspect.
A third aspect of the present application discloses a computer-readable signal medium having stored thereon a computer program which, when executed, implements the method of the first aspect.
Compared with the prior art, the big data based data visualization method and the big data cloud server provided by the embodiments of the invention have the following technical effects. A plurality of first to-be-processed data subsets of the target data set are obtained by division, the first data service type corresponding to each first to-be-processed data subset is determined, and the visualization demand information corresponding to each first data service type in the target data set is then determined from the visualization demand information of each first to-be-processed data subset, so that the visualization recognition result of the visualization demand information corresponding to each first data service type can be obtained and different visualization processing threads can be invoked to perform visualization processing on each first to-be-processed data subset. Because different visualization processing threads can be adapted to different first to-be-processed data subsets, both the first data service type and the visualization demand information of each first to-be-processed data subset are taken into account when determining the visualization recognition results corresponding to different visualization demand information. In this way, classified visualization processing of different first to-be-processed data subsets can be realized, complicated and variable user requirements can be met, and the intelligence of data visualization is improved.
In the description that follows, additional features will be set forth, in part, in the description. These features will be in part apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or use. The features of the present application may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations particularly pointed out in the detailed examples that follow.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
The methods, systems, and/or processes of the figures are further described in accordance with the exemplary embodiments. These exemplary embodiments will be described in detail with reference to the drawings. These exemplary embodiments are non-limiting, and like reference numerals represent similar structures throughout the several views of the drawings.
FIG. 1 is a block diagram of an exemplary big data based data visualization system, shown in accordance with some embodiments of the present invention.
FIG. 2 is a schematic diagram illustrating the hardware and software components of an exemplary big data cloud server, according to some embodiments of the invention.
FIG. 3 is a flow diagram illustrating an exemplary big data based data visualization method and/or process according to some embodiments of the invention.
FIG. 4 is a block diagram of an exemplary big data based data visualization device, shown in accordance with some embodiments of the present invention.
Detailed Description
The inventor has found that, in order to ensure that data visualization can meet complicated and variable user requirements and to improve the intelligence of data visualization, the data to be visualized needs to be split rather than being subjected to mechanical, monolithic visualization processing as a whole. The inventor therefore innovatively provides a big data based data visualization method and a big data cloud server: a target data set to be visualized is split into first to-be-processed data subsets, and classified visualization processing is performed on the first to-be-processed data subsets, so that data visualization can meet complicated and variable user requirements and the intelligence of data visualization is improved.
In order to better understand the technical solutions of the present invention, the following detailed descriptions of the technical solutions of the present invention are provided with the accompanying drawings and the specific embodiments, and it should be understood that the specific features in the embodiments and the examples of the present invention are the detailed descriptions of the technical solutions of the present invention, and are not limitations of the technical solutions of the present invention, and the technical features in the embodiments and the examples of the present invention may be combined with each other without conflict.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant guidance. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, systems, compositions, and/or circuits have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the invention.
These and other features, functions, methods of execution, and combinations of functions and elements of related elements of structure, as well as the economies of manufacture disclosed in the present application, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this application. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It should also be understood that the drawings are not to scale.
Flowcharts are used herein to illustrate the implementations performed by systems according to embodiments of the present application. It should be expressly understood that the processes performed by the flowcharts may be performed out of order. Rather, these implementations may be performed in the reverse order or simultaneously. In addition, at least one other implementation may be added to the flowchart. One or more implementations may be deleted from the flowchart.
Fig. 1 is a block diagram illustrating an exemplary big data based data visualization system 300 according to some embodiments of the present invention. The big data based data visualization system 300 may include a big data cloud server 100 and a user terminal 200. The user terminal 200 may be an intelligent terminal such as a mobile phone, a tablet computer, or a notebook computer, which is not limited herein.
In some embodiments, as shown in fig. 2, big data cloud server 100 may include a processing engine 110, a network module 120, and a memory 130, processing engine 110 and memory 130 communicating through network module 120.
Processing engine 110 may process the relevant information and/or data to perform one or more of the functions described herein. For example, in some embodiments, processing engine 110 may include at least one processing engine (e.g., a single core processing engine or a multi-core processor). By way of example only, the Processing engine 110 may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physical Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller Unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
The network module 120 may facilitate the exchange of information and/or data. In some embodiments, the network module 120 may be any type of wired or wireless network, or a combination thereof. Merely by way of example, the network module 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a wireless personal area network, a Near Field Communication (NFC) network, and the like, or any combination thereof. In some embodiments, the network module 120 may include at least one network access point. For example, the network module 120 may include wired or wireless network access points, such as base stations and/or network access points.
The memory 130 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or the like. The memory 130 is used for storing a program, and the processing engine 110 executes the program after receiving an execution instruction.
It is to be understood that the configuration shown in fig. 2 is merely illustrative, and that big data cloud server 100 may also include more or fewer components than shown in fig. 2, or have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
Fig. 3 is a flowchart illustrating an exemplary big data based data visualization method and/or process according to some embodiments of the present invention, which is applied to the big data cloud server 100 in fig. 1, and may specifically include the contents described in the following steps S31 to S34.
Step S31, when performing data visualization processing on a target data set to be visualized, dividing the target data set into a plurality of first data subsets to be processed.
For example, the target data set may be a plurality of pieces of service data acquired, collected, or organized in advance by the big data cloud server. The service data may cover a wide range of fields, such as blockchain finance, online payment, smart city management, Internet of Things monitoring, intelligent production, smart parks, cloud gaming platforms, and online game services, which are not limited herein.
Step S32, inputting each first to-be-processed data subset into the data traffic classification network, and obtaining a first data traffic type corresponding to each first to-be-processed data subset in the target data set.
For example, the data traffic classification network is trained based on a plurality of second to-be-processed data subsets included in a first sample data set and at least one second data traffic type labeled in the first sample data set. The data traffic classification network may be a convolutional neural network, a classifier, a support vector machine, or the like, which is not limited herein. The sample data set is used for training the model.
Step S33, determining the visualization demand information corresponding to each first data traffic type in the target data set according to the first data traffic type corresponding to each first to-be-processed data subset and the visualization demand information of each first to-be-processed data subset.
For example, the visualization demand information may be sent by the user terminal to a big data cloud server. The visualization demand information of different first to-be-processed data subsets may be different, and the visualization demand information corresponding to each first data service type may also be different.
Step S34, identifying the visual requirement information corresponding to each first data service type in the target data set to obtain a visual identification result of the visual requirement information corresponding to each first data service type in the target data set; and calling different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result.
For example, the visualization recognition result is used to describe more appropriate visualization processing guidance information corresponding to the visualization demand information. The visualization processing threads may be configured in the big data cloud server in advance and may include data tabulation processing, data histogram processing, data pie-chart processing, data node-graph processing, data line-chart processing, and the like. It can be understood that the first data service types of different first to-be-processed data subsets may differ, and therefore the visualization processing manner adapted to different first to-be-processed data subsets may also differ.
In actual implementation, by applying the above steps S31 to S34, a plurality of first to-be-processed data subsets of the target data set can be obtained by division, the first data service type corresponding to each first to-be-processed data subset is determined, and the visualization demand information corresponding to each first data service type in the target data set is then determined from the visualization demand information of each first to-be-processed data subset. On this basis, the visualization recognition result of the visualization demand information corresponding to each first data service type can be obtained, so that different visualization processing threads are invoked to perform visualization processing on each first to-be-processed data subset. Because different visualization processing threads can be adapted to different first to-be-processed data subsets, both the first data service type and the visualization demand information of each first to-be-processed data subset are taken into account when determining the visualization recognition results corresponding to different visualization demand information. In this way, classified visualization processing of different first to-be-processed data subsets can be realized, complicated and variable user requirements can be met, and the intelligence of data visualization is improved.
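For ease of understanding only, the following Python sketch illustrates one possible arrangement of steps S31 to S34. The helper names (split_dataset, classify_subset, recognize_demand, the thread table) and the keyword-based logic inside them are hypothetical placeholders, not the claimed implementation.

```python
# Minimal sketch of the S31-S34 pipeline; every name is a hypothetical placeholder
# used only to illustrate the data flow described above.
from collections import defaultdict

def split_dataset(target_dataset, subset_size=100):
    """S31: divide the target data set into first to-be-processed data subsets."""
    return [target_dataset[i:i + subset_size]
            for i in range(0, len(target_dataset), subset_size)]

def classify_subset(subset):
    """S32 stand-in: the trained data traffic classification network would go here."""
    return "online_payment" if any("pay" in str(r) for r in subset) else "iot_monitoring"

def group_demand_info(subsets, demand_info_per_subset):
    """S33: collect the visualization demand information per first data service type
    (demand information is assumed to be short text, one entry per subset)."""
    grouped = defaultdict(list)
    for subset, demand in zip(subsets, demand_info_per_subset):
        grouped[classify_subset(subset)].append(demand)
    return grouped

VISUALIZATION_THREADS = {            # S34: pre-configured visualization processing threads
    "table": lambda s: f"table({len(s)} rows)",
    "histogram": lambda s: f"histogram({len(s)} rows)",
    "pie": lambda s: f"pie({len(s)} rows)",
}

def recognize_demand(demand_info_list):
    """S34 stand-in: map demand information to a visualization recognition result."""
    text = " ".join(demand_info_list)
    return "histogram" if "distribution" in text else "table"

def visualize(target_dataset, demand_info_per_subset):
    subsets = split_dataset(target_dataset)
    grouped = group_demand_info(subsets, demand_info_per_subset)
    results = {t: recognize_demand(d) for t, d in grouped.items()}
    return [VISUALIZATION_THREADS[results[classify_subset(s)]](s) for s in subsets]
```

In a real deployment, classify_subset would be replaced by the trained data traffic classification network and the thread table by the visualization processing threads pre-configured in the big data cloud server.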
In the following, some alternative embodiments will be described, which should be understood as examples and not as technical features essential for implementing the present solution.
In some examples, to ensure classification accuracy and reliability of a data traffic classification network, accurate model training of the data traffic classification network is required. To achieve this, the training method of the data traffic classification network described with respect to step S32 may include the following steps S21-S23.
Step S21, dividing the first sample data set into a plurality of second to-be-processed data subsets, and marking at least one second data service type in the first sample data set.
Step S22, identifying at least one third data traffic type from the first sample data set according to first data source information corresponding to a plurality of second to-be-processed data subsets included in the first sample data set.
For example, the data source information is used to record the data source of each second to-be-processed data subset. The third data traffic type is different from the second data traffic type: the third data traffic type is not marked and may be a traffic type having variability, whereas the second data traffic type and the first data traffic type may not have variability.
Step S23, performing network model training according to at least one third data service type identified from the first sample data set, at least one labeled second data service type, and the plurality of second to-be-processed data subsets, to obtain the data service classification network.
It can be understood that by implementing the above steps S21-S23, variability of data traffic types can be taken into account, so as to implement accurate model training on the data traffic classification network, and ensure classification accuracy and reliability of the data traffic classification network.
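A minimal training sketch follows, assuming numeric second to-be-processed data subsets, dictionary-shaped first data source information, one marked label per subset, and a nearest-centroid classifier standing in for the network model of step S23; none of these choices are prescribed by the method.

```python
# Hypothetical sketch of steps S21-S23: training a tiny stand-in for the
# data traffic classification network from a labeled first sample data set.
from statistics import mean

def extract_features(subset):
    """Toy feature vector for a second to-be-processed data subset (assumption)."""
    values = [float(v) for v in subset]
    return (mean(values), max(values) - min(values))

def identify_third_types(subsets, source_info):
    """S22 stand-in: derive extra (unmarked, possibly variable) traffic types from
    the first data source information of each subset."""
    return {i: f"variable_{info['origin']}" for i, info in enumerate(source_info)
            if info.get("volatile", False)}

def train_traffic_classifier(subsets, labeled_types, source_info):
    """S23: fold the identified third types into the label set and fit a
    nearest-centroid classifier (a placeholder for the real network model)."""
    labels = dict(enumerate(labeled_types))          # marked second data service types
    labels.update(identify_third_types(subsets, source_info))
    grouped = {}
    for i, subset in enumerate(subsets):
        grouped.setdefault(labels[i], []).append(extract_features(subset))
    centroids = {t: tuple(map(mean, zip(*feats))) for t, feats in grouped.items()}

    def classify(subset):
        f = extract_features(subset)
        return min(centroids,
                   key=lambda t: sum((a - b) ** 2 for a, b in zip(centroids[t], f)))
    return classify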
Further, the identifying at least one third data traffic type from the first sample data set according to the first data source information corresponding to the plurality of second to-be-processed data subsets included in the first sample data set in step S22 may include steps S221 to S223.
Step S221, obtaining first data source information corresponding to each second to-be-processed data subset in the first sample data set.
Step S222, for the first data source information of each second to-be-processed data subset, determining a plurality of second data source information corresponding to the first sample data set.
For example, the second data source information may be global source information corresponding to the first sample data set, and the first data source information may be local source information corresponding to the first sample data set. Or it may be understood that the timeliness of the second data source information is better than the timeliness of the first data source information.
Step S223, identifying at least one third data service type from the first sample data set according to the plurality of second data source information corresponding to the first sample data set.
Thus, the timeliness of the third data traffic type can be ensured through the above-mentioned steps S221 to S223.
Further, the determining, for the first data source information of each second to-be-processed data subset, the second data source information corresponding to the first sample data set in step S222 may include steps S2221 to S2223.
Step S2221, determine a data source timing weight of each second to-be-processed data subset according to the first data source information of each second to-be-processed data subset.
For example, the data source timing weight may be used to characterize the timeliness of the data source of the second subset of data to be processed.
Step S2222, determine a first data source similarity between the first data source information of any two second to-be-processed data subsets according to the first data source information of each second to-be-processed data subset.
For example, data source similarity may characterize similarity and correlation between information from different data sources.
Step S2223, according to the data source timing weight of each second to-be-processed data subset, perform timing information feature weight weighting on the first data source information of which the first data source similarity exceeds a preset similarity value in the first data source information of each second to-be-processed data subset, so as to obtain a plurality of second data source information.
By such design, a plurality of second data source information can be accurately obtained based on the steps S2221 to S2223.
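The following sketch shows one hedged reading of steps S2221 to S2223, under the assumption that first data source information is represented as a timestamp plus a feature vector and that similarity is measured by cosine similarity; both representations are illustrative only.

```python
import math

def timing_weight(first_source_info, now=1_700_000_000):
    """S2221 stand-in: fresher data sources receive larger time sequence weights."""
    age = max(now - first_source_info["timestamp"], 1)
    return 1.0 / math.log(age + math.e)

def source_similarity(a, b):
    """S2222 stand-in: cosine similarity between two first data source feature vectors."""
    dot = sum(x * y for x, y in zip(a["features"], b["features"]))
    norm = (math.sqrt(sum(x * x for x in a["features"]))
            * math.sqrt(sum(y * y for y in b["features"])))
    return dot / norm if norm else 0.0

def second_source_info(first_infos, similarity_threshold=0.8):
    """S2223: weight the first data source information whose pairwise similarity
    exceeds the preset value by the data source timing weights."""
    second = []
    for i, info in enumerate(first_infos):
        if any(source_similarity(info, other) > similarity_threshold
               for j, other in enumerate(first_infos) if j != i):
            w = timing_weight(info)
            second.append({"timestamp": info["timestamp"],
                           "features": [w * x for x in info["features"]]})
    return second
```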
Further, the identifying at least one third data service type from the first sample data set according to the plurality of second data source information corresponding to the first sample data set in step S223 may include the following steps S2231 to S2233.
Step S2231, for each second data source information, determining a second data source similarity between the second data source information and each specified data traffic type according to the second data source information.
For example, the specified data traffic type may be adjusted according to actual situations, and is not limited herein.
Step S2232, according to a second data source similarity between the second data source information and each designated data service type, selecting a designated data service type with the highest data source similarity between the designated data service type and the second data source information from the designated data service types.
Step S2233, using the selected designated data service type as a third data service type corresponding to the second data source information.
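Steps S2231 to S2233 can be pictured as a nearest-profile lookup. The sketch below assumes that each specified data service type is described by a reference feature vector, which the method itself does not require.

```python
def nearest_specified_type(second_info, specified_types):
    """S2231-S2233 sketch: pick, for one piece of second data source information, the
    specified data service type whose reference profile is most similar to it.
    `specified_types` maps a type name to a reference feature vector (assumption)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    return max(specified_types,
               key=lambda t: cosine(second_info["features"], specified_types[t]))

profiles = {"payment": [1.0, 0.1], "monitoring": [0.1, 1.0]}   # illustrative profiles
print(nearest_specified_type({"features": [0.9, 0.2]}, profiles))   # prints "payment"
```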
In a possible embodiment, before inputting each first to-be-processed data subset into the data traffic classification network as described in step S32 to obtain the content of the first data traffic type corresponding to each first to-be-processed data subset in the target data set, the method may further include the following steps S41-S43.
Step S41, a second sample data set is obtained, where at least one fourth data service type and the visualization requirement information corresponding to each fourth data service type are marked in the second sample data set.
It will be appreciated that the second sample data set and the first sample data set are different sample data sets, the second sample data set may be a test set and the first sample data set may be a training set.
Step S42, inputting the second sample data set into the data traffic classification network, and outputting at least one fifth data traffic type of the second sample data set and the visual requirement information corresponding to each fifth data traffic type.
For example, the fourth data traffic type and the fifth data traffic type are used to implement a test of the data traffic classification network.
Step S43, performing a data classification test on the data service classification network according to the at least one fourth data service type and the visual demand information corresponding to each fourth data service type, and the at least one fifth data service type and the visual demand information corresponding to each fifth data service type; and when the data classification test of the data service classification network is successful, executing the step of inputting each first to-be-processed data subset into the data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set.
It can be understood that based on the above steps S41-S43, a test of the data traffic classification model can be implemented, thereby ensuring the model stability and classification accuracy of the data traffic classification model.
Further, the performing, by the step S43, a data classification test on the data traffic classification network according to the at least one fourth data traffic type and the visual requirement information corresponding to each fourth data traffic type, and the at least one fifth data traffic type and the visual requirement information corresponding to each fifth data traffic type may include: and when the at least one fourth data service type is matched with the at least one fifth data service type, and the visual demand information corresponding to each fourth data service type is matched with the visual demand information corresponding to each fifth data service type, determining that the data classification test of the data service classification network is successful.
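The matching condition of step S43 might be checked, for example, as follows; treating "matching" as exact set and per-type equality is an assumption, since the method leaves the matching criterion open.

```python
def classification_test_passes(fourth_types, fourth_demands, fifth_types, fifth_demands):
    """Sketch of the S43 test: succeed when the marked (fourth) types match the
    predicted (fifth) types and the demand information matches per type.
    `*_demands` map a type to its visualization demand information (assumption)."""
    if set(fourth_types) != set(fifth_types):
        return False
    return all(fourth_demands.get(t) == fifth_demands.get(t) for t in fourth_types)
```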
In some embodiments, the determining the visualization demand information corresponding to each first data traffic type in the target data set according to the first data traffic type corresponding to each first subset of data to be processed and the visualization demand information of each first subset of data to be processed, which is described in step S33, may include the following: for each first data service type in the target data set, according to at least one first to-be-processed data subset corresponding to the first data service type, taking the visualization demand information of the at least one first to-be-processed data subset corresponding to the first data service type as the visualization demand information corresponding to the first data service type.
In some examples, the identifying the visualization demand information corresponding to each first data traffic type in the target data set in step S34 to obtain the visualization identification result of the visualization demand information corresponding to each first data traffic type in the target data set may include the following steps S341 to S346.
Step S341, receiving a plurality of data visualization tags of visualization demand information corresponding to each first data service type in the target data set, and obtaining at least one visualization scene feature corresponding to at least one data visualization scene; the at least one visualization scenario feature describes a scenario attribute of each of the at least one data visualization scenario.
Step S342, determining a matching data visualization scene and a matching scene description value for each data visualization tag of the plurality of data visualization tags from the at least one data visualization scene according to the at least one visualization scene feature; the matching scenario description value represents a probability value of the matching data visualization scenario being a correct data visualization scenario corresponding to each data visualization tag.
Step S343, according to the matching scene description value, selecting a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization tag.
Step S344, acquiring data visualization indexes and visualization compatibility information of the candidate data visualization scenes, and determining visualization demand identification indexes of the candidate data visualization scenes based on the visualization compatibility information, the data visualization indexes and candidate visualization scene features corresponding to the candidate data visualization scenes; and the visual demand identification index represents an information dimension index which is corresponding to the candidate data visual scene and is used for demand information identification.
Step S345, determining whether the candidate data visualization scene is a to-be-identified data visualization scene according to the visualization demand identification indicator.
Step S346, when the candidate visualization scene is the visualization scene of the data to be identified, pairing the visualization demand information corresponding to each first data service type in the target data set with the corresponding visualization scene of the data to be identified, and identifying the visualization demand information under the paired visualization scene of the data to be identified to obtain the visualization identification result.
By means of the design, through the execution of the steps S341 to S346, the multiple data visualization tags of the visualization demand information and the visualization scene features corresponding to the obtained data visualization scenes can be comprehensively analyzed, so that the data visualization scenes to be identified are determined, and then the visualization demand information and the data visualization scenes to be identified are paired, so that the visualization demand information after pairing can be identified, and the high matching performance of the visualization identification result and the actual service scene is ensured.
In some examples, the step S342 of determining a matching data visualization scenario and a matching scenario description value for each data visualization tag of the plurality of data visualization tags from the at least one data visualization scenario according to the at least one visualization scenario feature may include the following steps S3421 to S3423.
Step S3421, analyzing scene pointing information of the data visualization scene and scene type information of the data visualization scene corresponding to the at least one data visualization scene from the at least one visualization scene feature.
Step S3422, calculating at least one tag scene pairing coefficient of each data visualization tag and the at least one data visualization scene according to the tag pointing information of each data visualization tag, the tag type information of each data visualization tag, the scene pointing information of the data visualization scene of the at least one data visualization scene, and the scene type information of the data visualization scene of the at least one data visualization scene.
Step S3423, selecting the maximum tag scene matching coefficient from the at least one tag scene matching coefficient; and taking the data visualization scene corresponding to the maximum tag scene pairing coefficient in the at least one data visualization scene as the matching data visualization scene of each data visualization tag, and taking the maximum tag scene pairing coefficient as the matching scene description value of each data visualization tag.
Therefore, the matching data visualization scene and the matching scene description value can be ensured to be consistent with the actual data visualization scene, and dynamic combination with the actual business scene is realized. In addition, the description values can comprehensively summarize the corresponding data or information, thereby reducing the processing pressure on the big data cloud server.
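As an illustration of steps S3421 to S3423, the sketch below assumes that tag pointing information and scene pointing information are feature vectors and that the tag type and scene type are simple labels; the particular pairing coefficient (cosine similarity plus a type-match bonus) is invented for illustration.

```python
def pairing_coefficient(tag, scene):
    """Hypothetical tag-scene pairing coefficient: pointing-direction similarity plus
    a bonus when the tag type information and the scene type information agree."""
    dot = sum(a * b for a, b in zip(tag["pointing"], scene["pointing"]))
    na = sum(a * a for a in tag["pointing"]) ** 0.5
    nb = sum(b * b for b in scene["pointing"]) ** 0.5
    direction = dot / (na * nb) if na and nb else 0.0
    return direction + (0.5 if tag["type"] == scene["type"] else 0.0)

def best_matching_scene(tag, scenes):
    """S3423: the scene with the largest pairing coefficient becomes the matching data
    visualization scene, and that coefficient is the matching scene description value."""
    coefficients = [(pairing_coefficient(tag, s), s) for s in scenes]
    value, scene = max(coefficients, key=lambda c: c[0])
    return scene, value
```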
Further, the step S343 of picking out a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization tag according to the matching scene description value may include steps S3431 to S3433.
Step S3431, selecting one or more current data visualization tags corresponding to the current matching data visualization scene from each data visualization tag; the one or more current data visualization tags are data visualization tags matched with the current matching data visualization scenes, and the current matching data visualization scenes are any data visualization scenes in the matching data visualization scenes corresponding to each data visualization tag.
Step S3432, comparing one or more current matching scene description values corresponding to the one or more current data visualization tags with preset scene description value thresholds respectively to obtain one or more scene description value comparison results; the one or more scene description value comparison results represent whether the one or more current matching scene description values are less than the preset scene description value threshold.
Step S3433, when the one or more scene description value comparison results indicate that the one or more current matching scene description values are all smaller than the preset scene description value threshold, taking the current matching data visualization scene as the candidate data visualization scene.
By such design, when the above steps S3431 to S3433 are applied, it can be ensured that the candidate data visualization scene matches the actual data visualization scene.
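Under the assumption that the matching results are held as (tag, scene, description value) triples and that the preset scene description value threshold is 0.8, steps S3431 to S3433 can be sketched as follows; the data layout and the threshold are illustrative only.

```python
def pick_candidate_scenes(matches, threshold=0.8):
    """Steps S3431-S3433: for every matched scene, collect the description values of its
    current tags and keep the scene as a candidate only if all of them fall below the
    preset scene description value threshold."""
    by_scene = {}
    for tag, scene, value in matches:
        by_scene.setdefault(scene, []).append(value)
    return [scene for scene, values in by_scene.items()
            if all(v < threshold for v in values)]

matches = [("sales_tag", "dashboard", 0.72), ("kpi_tag", "dashboard", 0.65),
           ("trend_tag", "report", 0.91)]
print(pick_candidate_scenes(matches))  # ['dashboard']
```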
Further, the determining of the visualization demand identification index of the candidate data visualization scene based on the visualization compatibility information, the data visualization index and the candidate visualization scene feature corresponding to the candidate data visualization scene, which is described in step S344, may include step S3441 and step S3442.
Step S3441, constructing a first business visual identification index by using the visual compatible information; and constructing a second business visual identification index by using the data visual index and the candidate visual scene characteristics.
Step S3442, calculating the visual demand identification index according to the first business visual identification index and the second business visual identification index.
In this way, by performing the above steps S3441 and S3442, the visual demand recognition index can be accurately and comprehensively calculated.
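Step S3442 leaves the combination rule open; one possible reading is a simple weighted fusion of the two indexes, sketched below with assumed weights.

```python
def demand_recognition_index(first_index, second_index, w1=0.6, w2=0.4):
    """Step S3442 sketch: fuse the first and second business visual identification
    indexes into the visualization demand identification index; the weights are
    assumptions, not values taken from the patent."""
    return w1 * first_index + w2 * second_index

print(demand_recognition_index(0.8, 0.5))  # ~0.68
```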
In some examples, the visualization compatibility information includes: visual data defect values and visual data deviation values. Based on this, the constructing of the first business visual identification index by using the visualization compatibility information described in step S3441 may include steps S3441a and S3441b.
Step S3441a, when the sum of the visual data defect value and the visual data deviation value is greater than a preset compatibility evaluation threshold, constructing the first business visual identification index by using the visual data defect value and the visual data deviation value.
Step S3441b, when the sum of the visual data defect value and the visual data deviation value is less than or equal to the preset compatibility evaluation threshold, determining the obtained first preset business visual identification index as the first business visual identification index.
For example, the defect value and the deviation value measure errors and anomalies that may occur during the visualization process.
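A minimal sketch of the branch in steps S3441a and S3441b follows, assuming the index is built as a normalized sum when the compatibility evaluation threshold is exceeded; the threshold, the normalization and the preset fallback value are all assumptions.

```python
def first_business_index(defect, deviation, compat_threshold=1.0, preset_index=0.5):
    """Steps S3441a/S3441b sketch: when defect + deviation exceeds the compatibility
    evaluation threshold, build the index from the two values (a normalized sum is an
    assumed construction); otherwise fall back to a first preset index."""
    total = defect + deviation
    if total > compat_threshold:
        return total / (1.0 + total)  # keeps the assumed index in (0, 1)
    return preset_index

print(first_business_index(0.7, 0.6))  # branch S3441a, ~0.565
print(first_business_index(0.2, 0.3))  # branch S3441b, preset value 0.5
```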
In some examples, the data visualization index includes: the visualization area ratio and the visualization data difference degree. Based on this, the constructing of the second business visualization identification index by using the data visualization index and the candidate visualization scene features described in step S3441 may include steps S3441c and S3441d.
Step S3441c, when the visualization area ratio is greater than a set area ratio, taking the obtained second preset business visualization identification index as the second business visualization identification index.
Step S3441d, when the visualization area ratio is less than or equal to the set area ratio, analyzing feature difference information from the candidate visualization scene features, and constructing the second business visualization identification index by using the feature difference information and the visualization data difference degree; wherein the feature difference information characterizes whether a difference feature exists in the candidate visual scene features; wherein the difference features characterize a difference condition during the visualization process.
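Steps S3441c and S3441d can be read analogously, as in the sketch below; the area ratio threshold, the second preset index, the difference-feature flag and the penalty applied when a difference feature is present are illustrative assumptions.

```python
def second_business_index(area_ratio, difference_degree, scene_features,
                          area_threshold=0.6, preset_index=0.4):
    """Steps S3441c/S3441d sketch: a large visualization area ratio falls back to a
    second preset index; otherwise feature-difference information and the visualization
    data difference degree are combined. The flag name and the penalty are assumptions."""
    if area_ratio > area_threshold:
        return preset_index
    has_difference = scene_features.get("has_difference", False)
    penalty = 0.2 if has_difference else 0.0  # assumed cost of a difference feature
    return max(0.0, difference_degree - penalty)

print(second_business_index(0.7, 0.5, {"has_difference": True}))  # preset branch -> 0.4
print(second_business_index(0.3, 0.5, {"has_difference": True}))  # ~0.3
```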
Further, the constructing the first business visual identification index by using the visual data defect value and the visual data deviation value described in step S3441a may include: calculating a visual compatible sequence value by using the visual data defect value and the visual data deviation value; and constructing the visual identification index of the first service according to the corresponding relation between the visual compatible sequence value and a preset sequence value.
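The sequence-value refinement above might be realized as in the following sketch; the discretization rule and the preset correspondence table are invented for illustration and are not taken from the patent.

```python
def first_index_from_sequence(defect, deviation):
    """Sketch of the sequence-value refinement of step S3441a: derive a visual compatible
    sequence value from the defect and deviation values, then map it to the index through
    a preset correspondence table. The discretization and the table are invented here."""
    sequence_value = round(defect * 10) + round(deviation * 10)  # assumed discretization
    preset_table = {0: 0.1, 5: 0.3, 10: 0.6, 15: 0.8, 20: 0.9}   # assumed correspondence
    nearest = min(preset_table, key=lambda k: abs(k - sequence_value))
    return preset_table[nearest]

print(first_index_from_sequence(0.7, 0.6))  # sequence value 13 maps to the 15 -> 0.8 entry
```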
Further, the acquiring of the data visualization index and the visualization compatibility information of the candidate data visualization scenario described in step S344 may further include the following steps (1) to (3).
(1) Counting the data visualization duration and the visualization area ratio of the candidate data visualization scene from historical data visualization records; the data visualization duration characterizes the accumulated value of the effective visualization durations matched with the candidate data visualization scene, and the visualization area ratio characterizes the number of effective visualization durations that are matched with the candidate data visualization scene and whose matching scene description values corresponding to all the data visualization tags are smaller than the preset scene description value threshold; the historical data visualization records are obtained by mining based on a plurality of historical data visualization tags corresponding to historical visualization time periods.
(2) Determining the visualization data difference degree according to the visualization area ratio and the data visualization duration, and forming the data visualization index by using the visualization area ratio and the visualization data difference degree.
(3) Counting the visual data defect value and the visual data deviation value of the candidate data visualization scene from the historical data visualization records, and forming the visualization compatibility information by using the visual data defect value and the visual data deviation value.
For example, the number of effective visualization durations may be understood as the number of segments of effective visualization duration.
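Steps (1) to (3) amount to simple aggregation over the historical data visualization records; the sketch below assumes each record stores one segment of effective visualization duration together with its description value and its defect and deviation figures, and the difference-degree formula is an assumption.

```python
from dataclasses import dataclass

@dataclass
class HistoryRecord:
    scene: str
    effective_duration: float  # one segment of effective visualization duration
    description_value: float   # matching scene description value for that segment
    defect: float
    deviation: float

def build_indicators(records, scene, threshold=0.8):
    """Steps (1)-(3) sketch: accumulate effective durations, count qualifying segments as
    the visualization area ratio, derive a difference degree, and collect the visualization
    compatibility information. The difference-degree formula is an assumption."""
    rows = [r for r in records if r.scene == scene]
    duration = sum(r.effective_duration for r in rows)
    area_ratio = sum(1 for r in rows if r.description_value < threshold)
    difference_degree = area_ratio / duration if duration else 0.0  # assumed definition
    return {
        "duration": duration,
        "area_ratio": area_ratio,
        "difference_degree": difference_degree,
        "compatibility": {"defect": sum(r.defect for r in rows),
                          "deviation": sum(r.deviation for r in rows)},
    }

history = [HistoryRecord("dashboard", 12.0, 0.7, 0.1, 0.05),
           HistoryRecord("dashboard", 8.0, 0.9, 0.2, 0.10)]
print(build_indicators(history, "dashboard"))
```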
In some examples, the step S34 of invoking different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result may include the following steps S34a to S34e.
Step S34a, determining the visualization area allocation information, the visualization data form information, and the visualization interface update frequency information of the visualization recognition result.
Step S34b, determining first thread matching information corresponding to the visualization recognition result based on the visualization interface update frequency information of the visualization recognition result and the visualization interface update frequency information of a reference visualization recognition result; the reference visualization recognition result contains three visualization interface update frequencies with different update trigger conditions, the mean value of these update frequencies is greater than a set update frequency value, and the result generation time of the reference visualization recognition result precedes the result generation time of the visualization recognition result.
Step S34c, determining thread calling indication information corresponding to the visualization recognition result based on the visualization area allocation information and visualization data form information of the visualization recognition result, the visualization error reporting information and visualization feedback information corresponding to the previous visualization recognition result, and the first thread matching information; the thread calling indication information includes at least second thread matching information, and the thread calling indication information corresponding to the visualization recognition result refers to the thread calling indication information of the big data cloud server at the time the visualization recognition result is determined.
Step S34d, if the thread matching difference between the first thread matching information and the second thread matching information is greater than a set difference, determining that the visualization recognition result is a key visualization recognition result, and determining third thread matching information, corresponding visualization error reporting indication information, and visualization feedback indication information of all key visualization recognition results in the target data set based on the first thread matching information and the thread calling indication information.
Step S34e, determining a visualization processing thread corresponding to the visualization recognition result based on the third thread matching information, the corresponding visualization error reporting indication information, and the visualization feedback indication information of all the key visualization recognition results in the target data set, and running the visualization processing thread to perform visualization processing on each first to-be-processed data subset.
With this design, by implementing steps S34a to S34e, the third thread matching information, the corresponding visualization error reporting indication information, and the visualization feedback indication information of the key visualization recognition results are fully taken into account before the first to-be-processed data subsets are visualized, so that an adapted visualization processing thread is determined; in this way, classified visualization processing of different first to-be-processed data subsets can be realized, complex and changeable user requirements can be met, and the intelligence of data visualization is improved.
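A hedged sketch of steps S34a to S34e follows; the update-frequency, area-allocation and feedback metrics, the way the matching information is computed, and the two-thread pool are all placeholders used only to make the control flow concrete.

```python
def choose_processing_thread(result, reference, previous, thread_pool,
                             difference_threshold=0.3):
    """Steps S34a-S34e sketch: derive first thread matching information from the two
    update-frequency profiles, derive second matching information from area allocation,
    data form and previous feedback, flag key recognition results, and pick a thread.
    Every metric and the selection rule are illustrative assumptions."""
    def mean(xs):
        return sum(xs) / len(xs)
    # S34b: first thread matching information (difference of mean update frequencies).
    first_match = abs(mean(result["update_freqs"]) - mean(reference["update_freqs"]))
    # S34c: second thread matching information inside the thread calling indication info.
    second_match = (result["area_allocation"] * 0.5
                    + result["form_score"] * 0.3
                    + previous["feedback_score"] * 0.2)
    # S34d: a key visualization recognition result when the matching infos diverge enough.
    is_key = abs(first_match - second_match) > difference_threshold
    # S34e: run a priority thread for key results, a default thread otherwise.
    return thread_pool["priority"] if is_key else thread_pool["default"]

thread_pool = {"default": "thread-basic", "priority": "thread-priority"}
result = {"update_freqs": [2.0, 3.0, 4.0], "area_allocation": 0.6, "form_score": 0.5}
reference = {"update_freqs": [1.0, 1.0, 1.0]}
previous = {"feedback_score": 0.4}
print(choose_processing_thread(result, reference, previous, thread_pool))  # thread-priority
```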
In an alternative embodiment, the method may further include the following step S35: outputting the first to-be-processed data subsets that have undergone the visualization processing. For example, the output mode may be display on different platforms or different terminals, which is not limited herein; in this way, the data can be presented in a form that users can receive and understand, improving the readability and understandability of the data.
Fig. 4 is a block diagram illustrating an exemplary big data based data visualization device 140, according to some embodiments of the present invention, the big data based data visualization device 140 may include the following functional modules.
The data dividing module 141 is configured to, when performing data visualization processing on a target data set to be visualized, divide the target data set into a plurality of first data subsets to be processed.
A data classification module 142, configured to input each first to-be-processed data subset into a data service classification network, to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set; the data traffic classification network is trained based on a plurality of second to-be-processed data subsets included in a first sample data set and at least one second data traffic type marked in the first sample data set.
A requirement determining module 143, configured to determine, according to the first data service type corresponding to each first subset of data to be processed and the visualization requirement information of each first subset of data to be processed, visualization requirement information corresponding to each first data service type in the target data set.
The visualization processing module 144 is configured to identify the visualization demand information corresponding to each first data service type in the target data set, so as to obtain a visualization identification result of the visualization demand information corresponding to each first data service type in the target data set; and calling different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result.
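The module layout could be wired together as in the sketch below; the interleaved splitting rule and the collaborator callables passed to the constructor are placeholders standing in for modules 141 to 144 and are not defined by the patent.

```python
class BigDataVisualizationDevice:
    """Sketch of the module layout of Fig. 4; the classifier, demand resolver and thread
    runner passed in are hypothetical collaborators standing in for modules 142-144."""

    def __init__(self, classifier, demand_resolver, thread_runner):
        self.classifier = classifier            # data classification module 142
        self.demand_resolver = demand_resolver  # requirement determining module 143
        self.thread_runner = thread_runner      # visualization processing module 144

    def run(self, target_dataset, n_subsets=4):
        # Data dividing module 141: split the target data set into first to-be-processed subsets.
        subsets = [target_dataset[i::n_subsets] for i in range(n_subsets)]
        # Module 142: obtain the first data service type for each subset.
        types = [self.classifier(s) for s in subsets]
        # Module 143: map each service type to its visualization demand information.
        demands = self.demand_resolver(subsets, types)
        # Module 144: recognize the demand information and run a matching processing thread.
        return [self.thread_runner(subset, demands[t]) for subset, t in zip(subsets, types)]

device = BigDataVisualizationDevice(
    classifier=lambda s: "sales" if len(s) % 2 else "ops",
    demand_resolver=lambda subsets, types: {t: f"demand:{t}" for t in types},
    thread_runner=lambda subset, demand: (demand, len(subset)),
)
print(device.run(list(range(10))))
```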
For details of the above apparatus embodiment, reference may be made to the description of the method embodiment.
Based on the same inventive concept, a system embodiment is also provided, and the description about the system embodiment can be as follows.
A big data based data visualization system comprises a big data cloud server and a user terminal which are communicated with each other; wherein the big data cloud server is configured to:
when data visualization processing is carried out on a target data set to be visualized, dividing the target data set into a plurality of first data subsets to be processed;
inputting each first to-be-processed data subset into a data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set; the data service classification network is obtained by training based on a plurality of second to-be-processed data subsets included in a first sample data set and at least one second data service type marked in the first sample data set;
determining the visualization demand information corresponding to each first data service type in the target data set according to the first data service type corresponding to each first data subset to be processed and the visualization demand information of each first data subset to be processed;
identifying the visual demand information corresponding to each first data service type in the target data set to obtain a visual identification result of the visual demand information corresponding to each first data service type in the target data set; and calling different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result.
For details of the above system embodiment, reference may be made to the description of the method embodiment.
It should be understood that, for technical terms that are not explicitly defined above, a person skilled in the art can deduce and unambiguously determine their meaning from the above disclosure; for example, for terms such as values, coefficients, weights, indexes and factors, a person skilled in the art can infer the meaning from the logical relationships between the preceding and following passages, and the value ranges of these quantities can be selected according to the actual situation, for example 0 to 1, 1 to 10, or 50 to 100, which is not limited herein.
A person skilled in the art can likewise unambiguously determine the preset, reference, predetermined, set and target technical features or terms, such as thresholds, threshold intervals and threshold ranges, from the above disclosure. For technical feature terms that are not further explained, a person skilled in the art can clearly and completely implement the technical solution by reasonably and unambiguously deriving it from the logical relationships in the preceding and following paragraphs. Prefixes of such terms, such as "first", "second", "previous", "next", "current", "history", "latest", "best", "target", "specified" and "real-time", can be unambiguously derived from the context, and so can suffixes such as "list", "feature", "sequence", "set", "matrix", "unit", "element" and "track".
The foregoing disclosure of embodiments of the present invention will be apparent to those skilled in the art. It should be understood that the process by which a person skilled in the art derives and analyzes unexplained technical terms is based on the contents described in the present application, and therefore the above contents do not constitute an inventive judgment on the overall scheme.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific terminology to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of at least one embodiment of the present application may be combined as appropriate.
In addition, those skilled in the art will recognize that the various aspects of the application may be illustrated and described in terms of several patentable species or contexts, including any new and useful combination of procedures, machines, articles, or materials, or any new and useful modifications thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "unit", "component", or "system". Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in at least one computer readable medium.
A computer readable signal medium may comprise a propagated data signal with computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, and the like, or any suitable combination. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the execution of aspects of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET and Python, conventional procedural programming languages such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP and ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order of the process elements and sequences described herein, the use of numbers or letters, or other designations are not intended to limit the order of the processes and methods unless otherwise indicated in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it should be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments; on the contrary, they are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware means, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
It should also be appreciated that in the foregoing description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of at least one embodiment of the invention. However, this method of disclosure is not intended to imply that more features are required than are expressly recited in the claims; rather, claimed subject matter may lie in less than all features of a single embodiment disclosed above.

Claims (10)

1. A big data based data visualization method, the method comprising:
when data visualization processing is carried out on a target data set to be visualized, dividing the target data set into a plurality of first data subsets to be processed;
inputting each first to-be-processed data subset into a data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set; the data service classification network is obtained by training based on a plurality of second to-be-processed data subsets included in a first sample data set and at least one second data service type marked in the first sample data set;
determining the visualization demand information corresponding to each first data service type in the target data set according to the first data service type corresponding to each first data subset to be processed and the visualization demand information of each first data subset to be processed;
identifying the visual demand information corresponding to each first data service type in the target data set to obtain a visual identification result of the visual demand information corresponding to each first data service type in the target data set; and calling different visualization processing threads to perform visualization processing on each first to-be-processed data subset based on the visualization recognition result.
2. The method of claim 1, further comprising:
dividing the first sample data set into a plurality of second to-be-processed data subsets, and marking at least one second data service type in the first sample data set;
identifying at least one third data service type from the first sample data set according to first data source information corresponding to a plurality of second to-be-processed data subsets included in the first sample data set;
and performing network model training according to at least one third data service type identified from the first sample data set, the marked at least one second data service type and the plurality of second data subsets to be processed to obtain the data service classification network.
3. The method according to claim 2, wherein the identifying at least one third data traffic type from the first sample data set according to the first data source information corresponding to the plurality of second to-be-processed data subsets included in the first sample data set comprises:
acquiring first data source information corresponding to each second to-be-processed data subset in the first sample data set;
for first data source information of each second to-be-processed data subset, determining a plurality of second data source information corresponding to the first sample data set;
identifying at least one third data service type from the first sample data set according to a plurality of second data source information corresponding to the first sample data set;
wherein, for the first data source information of each second to-be-processed data subset, determining a plurality of second data source information corresponding to the first sample data set includes: determining a data source time sequence weight of each second to-be-processed data subset according to the first data source information of each second to-be-processed data subset; determining first data source similarity between the first data source information of any two second to-be-processed data subsets according to the first data source information of each second to-be-processed data subset; according to the data source time sequence weight of each second data subset to be processed, carrying out time sequence information characteristic weight weighting on first data source information of which the first data source similarity exceeds a preset similarity value in the first data source information of each second data subset to be processed to obtain a plurality of second data source information;
wherein the identifying at least one third data traffic type from the first sample data set according to the plurality of second data source information corresponding to the first sample data set includes: for each second data source information, determining a second data source similarity between the second data source information and each specified data service type according to the second data source information; selecting a specified data service type with the highest data source similarity between the second data source information and each specified data service type from each specified data service type according to the second data source similarity between the second data source information and each specified data service type; and taking the selected specified data service type as a second data service type corresponding to the second data source information.
4. The method according to any of claims 1 to 3, wherein before inputting each first subset of data to be processed into the data traffic classification network and obtaining the first data traffic type corresponding to each first subset of data to be processed in the target data set, the method further comprises:
acquiring a second sample data set, wherein at least one fourth data service type and visual requirement information corresponding to each fourth data service type are marked in the second sample data set;
inputting the second sample data set into the data service classification network, and outputting at least one fifth data service type of the second sample data set and visual requirement information corresponding to each fifth data service type;
performing data classification test on the data service classification network according to the at least one fourth data service type, the visual demand information corresponding to each fourth data service type, and the visual demand information corresponding to the at least one fifth data service type and each fifth data service type; when the data classification test of the data service classification network is successful, executing the step of inputting each first to-be-processed data subset into the data service classification network to obtain a first data service type corresponding to each first to-be-processed data subset in the target data set;
wherein, the performing a data classification test on the data traffic classification network according to the at least one fourth data traffic type and the visual demand information corresponding to each fourth data traffic type, and the at least one fifth data traffic type and the visual demand information corresponding to each fifth data traffic type includes: when the at least one fourth data service type is matched with the at least one fifth data service type, and the visual demand information corresponding to each fourth data service type is matched with the visual demand information corresponding to each fifth data service type, determining that the data classification test of the data service classification network is successful;
the determining, according to the first data service type corresponding to each first to-be-processed data subset and the visualization demand information of each first to-be-processed data subset, the visualization demand information corresponding to each first data service type in the target data set includes: for each first data service type in the target data set, according to at least one first to-be-processed data subset corresponding to the first data service type, taking the visualization demand information of the at least one first to-be-processed data subset corresponding to the first data service type as the visualization demand information corresponding to the first data service type.
5. The method according to claim 1, wherein identifying the visual requirement information corresponding to each first data service type in the target data set to obtain a visual identification result of the visual requirement information corresponding to each first data service type in the target data set includes:
receiving a plurality of data visualization labels of visualization demand information corresponding to each first data service type in the target data set, and acquiring at least one visualization scene feature corresponding to at least one data visualization scene; the at least one visualization scenario feature describes a scenario attribute of each of the at least one data visualization scenario;
according to the at least one visualization scene feature, determining a matching data visualization scene and a matching scene description value for each data visualization tag in the plurality of data visualization tags from the at least one data visualization scene; the matching scene description value represents a probability value of the matching data visualization scene being a correct data visualization scene corresponding to each data visualization label;
selecting a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization label according to the matching scene description value;
acquiring data visualization indexes and visualization compatibility information of the candidate data visualization scenes, and determining visualization demand identification indexes of the candidate data visualization scenes on the basis of the visualization compatibility information, the data visualization indexes and candidate visualization scene characteristics corresponding to the candidate data visualization scenes; the visual demand identification index represents an information dimension index which corresponds to the candidate data visual scene and is used for demand information identification;
determining whether the candidate data visualization scene is a data visualization scene to be identified according to the visualization demand identification index;
when the candidate visualization scene is the visualization scene of the data to be identified, pairing visualization demand information corresponding to each first data service type in the target data set with the corresponding visualization scene of the data to be identified, and identifying the visualization demand information under the paired visualization scene of the data to be identified to obtain a visualization identification result;
wherein each data visualization tag has tag pointing information and tag type information; the determining a matching data visualization scene and a matching scene description value for each data visualization tag of the plurality of data visualization tags from the at least one data visualization scene according to the at least one visualization scene feature includes:
analyzing scene pointing information of a data visualization scene and scene type information of the data visualization scene corresponding to the at least one data visualization scene from the at least one visualization scene feature;
calculating at least one tag scene pairing coefficient of each data visualization tag and the at least one data visualization scene according to the tag pointing information of each data visualization tag, the tag type information of each data visualization tag, the scene pointing information of the data visualization scene of the at least one data visualization scene, and the scene type information of the data visualization scene of the at least one data visualization scene;
selecting the maximum tag scene pairing coefficient from the at least one tag scene pairing coefficient; taking a data visualization scene corresponding to the maximum tag scene pairing coefficient in the at least one data visualization scene as the matching data visualization scene of each data visualization tag, and taking the maximum tag scene pairing coefficient as the matching scene description value of each data visualization tag;
wherein, according to the matching scene description value, selecting a candidate data visualization scene from the matching data visualization scenes corresponding to each data visualization tag includes:
selecting one or more current data visualization tags corresponding to the current matching data visualization scene from each data visualization tag; the one or more current data visualization tags are data visualization tags matched with the current matching data visualization scenes, and the current matching data visualization scenes are any data visualization scenes in the matching data visualization scenes corresponding to each data visualization tag;
comparing one or more current matching scene description values corresponding to the one or more current data visualization tags with preset scene description value thresholds respectively to obtain one or more scene description value comparison results; the one or more scene description value comparison results represent whether the one or more current matching scene description values are smaller than the preset scene description value threshold value;
and when the one or more scene description value comparison results represent that the one or more current matching scene description values are all smaller than the preset scene description value threshold value, taking the current matching data visualization scene as the candidate data visualization scene.
6. The method of claim 5, wherein determining the visualization demand identification indicator of the candidate data visualization scenario based on the visualization compatibility information, the data visualization indicator, and a candidate visualization scenario feature corresponding to the candidate data visualization scenario comprises:
constructing a first business visual identification index by using the visual compatible information; constructing a second business visual identification index by using the data visual index and the candidate visual scene characteristics;
calculating the visual demand identification index according to the first business visual identification index and the second business visual identification index;
wherein the visual compatibility information comprises: visual data defect values and visual data deviation values; the constructing a first business visual identification index by using the visual compatible information includes: when the sum of the visual data defect value and the visual data deviation value is larger than a preset compatibility evaluation threshold value, constructing a first business visual identification index by using the visual data defect value and the visual data deviation value; when the sum of the visual data defect value and the visual data deviation value is less than or equal to the preset compatibility evaluation threshold value, determining the obtained first preset business visual identification index as the first business visual identification index;
wherein the data visualization indicators comprise: visualization area ratio and visualization data difference degree; constructing a second business visual identification index by using the data visual index and the candidate visual scene characteristics, wherein the second business visual identification index comprises the following steps: when the visual area ratio is larger than a set area ratio, taking the obtained second preset service visual identification index as the second service visual identification index; when the visualization area ratio is smaller than or equal to the set area ratio, analyzing feature difference information from the candidate visualization scene features, and constructing the second business visualization identification index by using the feature difference information and the visualization data difference degree; wherein the feature difference information characterizes whether a difference feature exists in the candidate visual scene features;
the constructing the first business visual identification index by using the visual data defect value and the visual data deviation value includes: calculating a visual compatible sequence value by using the visual data defect value and the visual data deviation value; and constructing the visual identification index of the first service according to the corresponding relation between the visual compatible sequence value and a preset sequence value.
7. The method of claim 6, wherein the obtaining of the data visualization index and visualization compatibility information of the candidate data visualization scenario comprises:
counting the data visualization duration and the visualization area ratio of the candidate data visualization scene from historical data visualization records; the data visualization duration characterizes the accumulated value of the effective visualization durations matched with the candidate data visualization scene, and the visualization area ratio characterizes the number of effective visualization durations that are matched with the candidate data visualization scene and whose matching scene description values corresponding to all the data visualization tags are smaller than a preset scene description value threshold; the historical data visualization records are obtained by mining based on a plurality of historical data visualization tags corresponding to historical visualization time periods;
determining the visualization data difference degree according to the visualization area ratio and the data visualization duration, and forming the data visualization index by using the visualization area ratio and the visualization data difference degree;
and counting visual data defect values and visual data deviation values of the candidate data visual scenes from the historical data visual records, and forming the visual compatible information by using the visual data defect values and the visual data deviation values.
8. The method according to any one of claims 1 to 7, wherein based on the visual recognition result, invoking a different visual processing thread to perform visual processing on each first to-be-processed data subset comprises:
determining visual area distribution information, visual data form information and visual interface updating frequency information of a visual identification result;
determining first thread matching information corresponding to the visual identification result based on the visual interface updating frequency information of the visual identification result and visual interface updating frequency information of a reference visual identification result, wherein the reference visual identification result comprises three visual interface updating frequencies with different updating triggering conditions, the updating frequency mean value of the included visual interface updating frequencies is greater than a set updating frequency value, and the result generation time of the reference visual identification result is before the result generation time of the visual identification result;
determining thread calling indication information corresponding to the visual recognition result based on visual area allocation information and visual data form information of the visual recognition result, visual error reporting information and visual feedback information corresponding to a last visual recognition result and the first thread matching information, wherein the thread calling indication information at least comprises second thread matching information, and the thread calling indication information corresponding to the visual recognition result refers to thread calling indication information of a big data cloud server when the visual recognition result is determined;
if the thread matching difference degree between the first thread matching information and the second thread matching information is larger than a set difference degree, determining that the visual identification result is a key visual identification result, and determining third thread matching information, corresponding visual error reporting indication information and visual feedback indication information of all key visual identification results in a target data set based on the first thread matching information and the thread calling indication information;
and determining a visual processing thread corresponding to the visual recognition result based on third thread matching information, corresponding visual error reporting indication information and visual feedback indication information of all key visual recognition results in the target data set, and operating the visual thread to perform visual processing on each first to-be-processed data subset.
9. A big data cloud server, characterized by comprising a processing engine, a network module and a memory; the processing engine and the memory communicate through the network module, and the processing engine reads a computer program from the memory and runs it to perform the method of any one of claims 1-8.
10. A computer-readable signal medium, on which a computer program is stored which, when executed, implements the method of any one of claims 1-8.
CN202011410341.8A 2020-12-04 2020-12-04 Big data based data visualization method and big data cloud server Active CN112434201B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011410341.8A CN112434201B (en) 2020-12-04 2020-12-04 Big data based data visualization method and big data cloud server
CN202110552767.5A CN113282813A (en) 2020-12-04 2020-12-04 Data visualization method applied to big data and big data cloud server
CN202110553640.5A CN113220955A (en) 2020-12-04 2020-12-04 Big data visualization method and big data cloud server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011410341.8A CN112434201B (en) 2020-12-04 2020-12-04 Big data based data visualization method and big data cloud server

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202110553640.5A Division CN113220955A (en) 2020-12-04 2020-12-04 Big data visualization method and big data cloud server
CN202110552767.5A Division CN113282813A (en) 2020-12-04 2020-12-04 Data visualization method applied to big data and big data cloud server

Publications (2)

Publication Number Publication Date
CN112434201A true CN112434201A (en) 2021-03-02
CN112434201B CN112434201B (en) 2021-07-16

Family

ID=74691893

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110553640.5A Withdrawn CN113220955A (en) 2020-12-04 2020-12-04 Big data visualization method and big data cloud server
CN202011410341.8A Active CN112434201B (en) 2020-12-04 2020-12-04 Big data based data visualization method and big data cloud server
CN202110552767.5A Withdrawn CN113282813A (en) 2020-12-04 2020-12-04 Data visualization method applied to big data and big data cloud server

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110553640.5A Withdrawn CN113220955A (en) 2020-12-04 2020-12-04 Big data visualization method and big data cloud server

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110552767.5A Withdrawn CN113282813A (en) 2020-12-04 2020-12-04 Data visualization method applied to big data and big data cloud server

Country Status (1)

Country Link
CN (3) CN113220955A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579829A (en) * 2022-01-10 2022-06-03 江西卫生职业学院 Computer network data visualization method and device and computer equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555961A (en) * 2024-01-12 2024-02-13 浙江同花顺智能科技有限公司 Visual generation method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016079909A1 (en) * 2014-11-19 2016-05-26 日本電気株式会社 Visualizing device, visualizing method and visualizing program
CN106156350A (en) * 2016-07-25 2016-11-23 恒安嘉新(北京)科技有限公司 The big data analysing method of a kind of visualization and system
CN108512712A (en) * 2017-02-27 2018-09-07 中国移动通信有限公司研究院 The visible processing method and device of a kind of business and network quality data
US10339181B1 (en) * 2018-04-25 2019-07-02 Sas Institute Inc. Techniques for visualizing clustered datasets
JP2020035252A (en) * 2018-08-31 2020-03-05 大阪瓦斯株式会社 Restaurant business climate visualization system
CN111177495A (en) * 2019-12-05 2020-05-19 北京永洪商智科技有限公司 Method for intelligently identifying data content and generating corresponding industry report
CN111625567A (en) * 2020-04-27 2020-09-04 盎通科技(上海)有限公司 Data model matching method, device, computer system and readable storage medium
CN111831721A (en) * 2020-09-15 2020-10-27 北京东方通科技股份有限公司 Processing method, system and equipment for spatial data visualization
CN111949720A (en) * 2020-08-24 2020-11-17 陈顺发 Data analysis method based on big data and artificial intelligence and cloud data server

Also Published As

Publication number Publication date
CN112434201B (en) 2021-07-16
CN113220955A (en) 2021-08-06
CN113282813A (en) 2021-08-20


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210629

Address after: 350001 building 13, area a, software park, 89 software Avenue, Tongpan Road, Gulou District, Fuzhou City, Fujian Province

Applicant after: Dingdian Software Co.,Ltd. Fujian

Address before: Room 709, West Building 2, area C, 88 Dapu Road, Lianyungang Economic and Technological Development Zone, Jiangsu 222000

Applicant before: Gao Huijun

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant