CN116521738B - Data processing method, system, electronic device and storage medium - Google Patents

Publication number
CN116521738B
Authority
CN
China
Prior art keywords
data
dictionary
request
signal
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310506289.3A
Other languages
Chinese (zh)
Other versions
CN116521738A (en
Inventor
赵乾宽
李继威
祝诗恩
严玮
马骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zero Beam Technology Co ltd
Original Assignee
Zero Beam Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zero Beam Technology Co ltd filed Critical Zero Beam Technology Co ltd
Priority to CN202310506289.3A priority Critical patent/CN116521738B/en
Publication of CN116521738A publication Critical patent/CN116521738A/en
Application granted granted Critical
Publication of CN116521738B publication Critical patent/CN116521738B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a data processing method, a system, an electronic device and a storage medium. The data processing method comprises the following steps: determining, according to a data acquisition request, a request dictionary corresponding to the data acquisition request; querying a data dictionary, using the data acquisition time range included in the request dictionary as an index, to obtain initial data, wherein the data dictionary stores data in dictionary form; generating a condition dimension table according to a signal format included in the request dictionary; and associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request. Because the data dictionary stores data in dictionary form, the storage space occupied by the stored data is greatly reduced. Because the condition dimension table is generated from the signal format included in the request dictionary and is then associated with the initial data, data processing efficiency is improved and queries are faster.

Description

Data processing method, system, electronic device and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method, system, electronic device, and storage medium.
Background
Accurate and efficient storage, processing and extraction of Internet-of-Vehicles big data is a precondition for data applications such as quickly locating vehicle problems, vehicle data analysis, data mining and vehicle simulation. With the rapid development of the intelligent automobile industry, technology for conveniently and efficiently retrieving and extracting specified vehicle data is increasingly urgently needed by automobile enterprises. However, current vehicle signals are stored in complex formats, occupy a large amount of space, and are slow and inefficient to query, so the required data cannot be obtained quickly and effectively. Conventional signal storage, query and extraction technology therefore cannot meet the production and usage requirements of the intelligent automobile industry, and how to efficiently obtain the required data from the mass data of tens of thousands of vehicles is an urgent problem to be solved.
Disclosure of Invention
Accordingly, one of the technical problems to be solved by the embodiments of the present application is to provide a data processing method, system, electronic device and storage medium for overcoming or avoiding the above-mentioned problems.
Based on the above object, the present application provides a data processing method, including: determining a request dictionary corresponding to a data acquisition request according to the data acquisition request; taking a data acquisition time range included in the request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form; generating a condition dimension table according to a signal format included in the request dictionary; and associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request.
Optionally, the request dictionary further includes a signal type, the signal format is a signal format corresponding to the signal type, and the method further includes: determining an output mode of the output data according to the signal type; if the output mode is the first mode, continuing to execute the step of associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request; and if the output mode is a second mode, aggregating the signal names and the signal values of the same data attribute in the initial data, and continuously executing the step of associating the condition dimension table with the initial data according to the aggregated initial data to obtain the output data returned in response to the data acquisition request.
Optionally, the step of performing data query in the data dictionary with the data collection time range included in the request dictionary as an index to obtain initial data includes: judging whether the history table and the real-time table need to be subjected to sub-table query operation according to the data acquisition time range included in the request dictionary; if so, taking the data acquisition time range as an index, looking up a table in the history table to obtain history data, and looking up a table in the real-time table to obtain real-time data; and de-duplicating the queried real-time data, and combining the historical data with the de-duplicated real-time data to obtain the initial data.
Optionally, the associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request includes: and correlating the condition dimension table with the initial data, and cleaning the correlated initial data according to a preset data cleaning rule to obtain the output data.
Optionally, the data cleaning rule includes at least one of the following: if two pieces of data at the same moment have the same rule, the same channel name, the same acquisition time and the same event number, and the signal values corresponding to the same signal name are also the same, one of the two pieces of data is retained; if two pieces of data at the same moment have the same rule, the same channel name, the same acquisition time and the same event number, but the signal values corresponding to the same signal name are different, both pieces of data are deleted.
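By way of illustration only, the two cleaning rules might be sketched in Python as follows. The record layout (the keys `rule`, `channel`, `time`, `event`, `signal` and `value`) and the function name `clean` are assumptions made for this sketch and are not defined by the application.

```python
from collections import defaultdict

# Hypothetical record layout: these field names are assumptions for illustration.
KEY_FIELDS = ("rule", "channel", "time", "event", "signal")


def clean(records):
    """Apply both cleaning rules to a list of dict records."""
    # Group records that agree on rule, channel name, time, event number
    # and signal name.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in KEY_FIELDS)].append(rec)

    cleaned = []
    for recs in groups.values():
        values = {r["value"] for r in recs}
        if len(values) == 1:
            # Same key and same signal value: retain a single copy.
            cleaned.append(recs[0])
        # Same key but conflicting signal values: every copy is dropped.
    return cleaned
```

Grouping by the shared key first means each record is inspected once, and conflicting records are removed together rather than one at a time.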
Optionally, the associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request includes: broadcasting the condition dimension table to a plurality of distributed nodes, and carrying out parallel association operation on the initial data according to the condition dimension table by the distributed nodes to obtain a plurality of groups of association data; and merging the associated data to obtain the output data.
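As a non-limiting sketch, the broadcast-and-merge association might look as follows in Python. Threads stand in for the distributed nodes, and the names `associate_partition` and `broadcast_join`, the `signal` key and the representation of the condition dimension table as a set of signal names are all assumptions of this sketch; an actual deployment would typically rely on a distributed computing framework.

```python
from concurrent.futures import ThreadPoolExecutor


def associate_partition(partition, dim_table):
    """Associate one partition of the initial data with the dimension table."""
    return [row for row in partition if row["signal"] in dim_table]


def broadcast_join(initial_data, dim_table, n_nodes=4):
    """Simulate broadcasting a small dimension table to n_nodes workers."""
    # Split the initial data into one partition per simulated node.
    partitions = [initial_data[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Every worker receives the same dimension table (the "broadcast").
        groups = list(pool.map(
            lambda part: associate_partition(part, dim_table), partitions))
    # Merge the groups of associated data into the output data.
    return [row for group in groups for row in group]
```

Because only the small condition dimension table is copied to each node, the large initial data never has to be moved between nodes before the association.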
Optionally, the data dictionary includes at least one of the following corresponding to the data: vehicle number, acquisition time, acquisition relative time, vehicle signal; or the request dictionary includes an identification value and a request body, the request body including: data acquisition time range, vehicle number, signal type and signal format corresponding to the signal type.
The embodiment of the application also provides a data processing system, which comprises: the request module is used for determining a request dictionary corresponding to the data acquisition request according to the data acquisition request; the query module is used for carrying out data query in a data dictionary by taking a data acquisition time range included in the request dictionary as an index to obtain initial data, and the data dictionary is used for storing data in a dictionary form; the generating module is used for generating a condition dimension table according to the signal format included in the request dictionary; and the output module is used for associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request.
The embodiment of the application also provides electronic equipment, which comprises: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method.
The embodiment of the application also provides a computer storage medium, on which a computer program is stored, which when being executed by a processor, implements the method described in the above embodiment.
According to the above technical solution, a request dictionary corresponding to the data acquisition request is determined according to the data acquisition request; the data acquisition time range included in the request dictionary is used as an index to query the data dictionary and obtain initial data, wherein the data dictionary stores data in dictionary form; a condition dimension table is generated according to the signal format included in the request dictionary; and the condition dimension table is associated with the initial data to obtain output data returned in response to the data acquisition request. Storing data in dictionary form in the data dictionary simplifies the data storage format and greatly reduces the storage space occupied. Generating the condition dimension table from the signal format included in the request dictionary and associating it with the initial data speeds up queries and improves data processing efficiency, so that the required data can be obtained quickly and effectively and retrieval performance is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a flow chart of a data processing method according to a first embodiment of the application;
FIG. 2 is a diagram illustrating a data request method according to a first embodiment of the present application;
FIG. 3 is a flow chart of a data processing method according to a second embodiment of the present application;
FIG. 4 is a flow chart of a data processing method according to a third embodiment of the present application;
FIG. 5 is a diagram illustrating a data output method according to a third embodiment of the present application;
FIG. 6 is a flowchart of a data distributed processing method according to a fourth embodiment of the present application;
FIG. 7 is a diagram illustrating a data association method according to a fourth embodiment of the present application;
FIG. 8 is a diagram of a fifth embodiment of a data processing system;
FIG. 9 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present application.
Detailed Description
In order to better understand the technical solutions in the embodiments of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the present application, shall fall within the scope of protection of the embodiments of the present application.
It should be noted that any technical solution for implementing the embodiments of the present application does not necessarily need to achieve all the advantages mentioned above at the same time. The implementation of the embodiments of the present application will be further described below with reference to the accompanying drawings.
Example 1
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application, as shown in fig. 1, the data processing method includes the following steps:
Step S101, determining a request dictionary corresponding to a data acquisition request according to the data acquisition request;
Specifically, the request conditions in the data acquisition request include the signal requirement of the request, the output mode of the data once the request has been completed, and the like. The output mode includes an output path and an output file configuration, and the output file configuration predefines the file format of the data output. For example, the output file format may organize the output by the number of rows per file, the data acquisition time, the signal type, the total number of output files, and the like, which makes the output data more convenient for the data user. The request dictionary is the user's signal requirement stored in dictionary form. Storing the request conditions in dictionary form reduces the coupling between the data body and the data attributes, greatly saves storage space, and simplifies the service logic to some extent.
It should be noted that, when a user initiates a data acquisition request, the signal requirement of the request may be stored in a data table in the form of a request dictionary. The request dictionary may be stored in any row (or column) of the data table, and an identification value corresponding to the request dictionary is generated. When data is acquired, the corresponding request dictionary can then be matched in the data table according to the identification value, and the data required by the user can be obtained. Storing the signal requirement in the data table as a request dictionary has two benefits. First, the data acquisition request is recorded and preserved. Second, storing the signal requirement in the database as a request dictionary and retrieving the request dictionary from the database by its identification value become two independent programs, which facilitates fault finding and maintenance, avoids transmitting complex data between them, and helps keep the system stable.
Optionally, the request dictionary includes an identification value and a request body, the request body including: data acquisition time range, vehicle number, signal type, and signal format corresponding to signal type.
Specifically, each request dictionary includes an identification value that identifies the request dictionary of the user's data acquisition request, so that a unique request dictionary can be determined from the database. For example, the identification value may be a PK (Primary Key) value, that is, the primary key value in the database, which must uniquely identify each piece of data in the data table, i.e., each request dictionary.
The request body is the signal condition in the user's signal requirement and includes a data acquisition time range, a vehicle number, a signal type, and a signal format corresponding to the signal type. The vehicle number may be an identification code of the vehicle. The signal type represents the type of the signal, such as a CAN signal, a LIN signal, a SOME/IP signal, a DCTP signal and the like. The signal format is the format condition that a signal of the corresponding signal type needs to satisfy.
For example, fig. 2 is a schematic diagram of a data request method according to an embodiment of the present application. As shown in fig. 2, the user's request dictionary, that is, the data request in the figure, is obtained from the database according to the unique PK value of the request. The request dictionary may take the form of a single character string, which is inconvenient for some programs to use directly. To make it easy to extract each object in the request dictionary, the request dictionary may be converted into objects, so that the state of each object in the request dictionary is held in memory and can be read back later. Meanwhile, the request dictionary of the request is also written to a separate file, which is output under the root path of the OSS (Object Storage Service) used for data output as part of the output data and saved as the request file, and the root path of the file is recorded.
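Purely as an illustrative sketch, storing a request dictionary under a PK value and later converting the stored character string into an object might look as follows in Python. The in-memory SQLite table `requests`, the JSON encoding, the class `RequestBody`, its field names and the sample values are all assumptions of this sketch rather than features disclosed by the application.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class RequestBody:
    """Object form of the request body (field names are assumptions)."""
    time_range: tuple      # (start, end) of the data acquisition time range
    vehicle_number: str
    signal_type: str
    signal_format: dict


def save_request(conn, body):
    """Store the request dictionary as one JSON string and return its PK."""
    cur = conn.execute("INSERT INTO requests (body) VALUES (?)",
                       (json.dumps(body),))
    conn.commit()
    return cur.lastrowid


def load_request(conn, pk):
    """Fetch the request dictionary by PK and convert it into an object."""
    row = conn.execute("SELECT body FROM requests WHERE id = ?",
                       (pk,)).fetchone()
    if row is None:
        raise KeyError(pk)
    raw = json.loads(row[0])
    return RequestBody(tuple(raw["time_range"]), raw["vehicle_number"],
                       raw["signal_type"], raw["signal_format"])


# Usage with made-up sample values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, body TEXT)")
pk = save_request(conn, {
    "time_range": ["2023-05-01T00:00:00", "2023-05-02T00:00:00"],
    "vehicle_number": "V001",
    "signal_type": "CAN",
    "signal_format": {"CAN1": ["speed", "rpm"]},
})
req = load_request(conn, pk)
```

Keeping `save_request` and `load_request` separate mirrors the two independent programs described above: one records the request, the other retrieves it by its identification value.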
Step S102, taking a data acquisition time range included in a request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form;
Specifically, the data dictionary is signal data stored in dictionary form: when signal data collected during data acquisition is stored in the database, it can be stored in dictionary form. The collected signal data is preprocessed and aggregated along its data dimensions, so that signals with the same dimensions are aggregated and stored in one field as key-value pairs. Here the data dimensions are signal attributes such as the signal acquisition time, the vehicle number and the signal type, and a key-value pair may take the form "signal name-signal value". For example, if a set of vehicle signal data shares the same vehicle number, signal acquisition time and signal type and differs only in signal name and signal value, the vehicle number, signal acquisition time and signal type can be recorded once for the whole set, while each signal name and signal value is recorded individually in the form "signal name-signal value". Storing data in dictionary form saves storage space, benefits later processing stages such as data extraction and data association, avoids repeated matching on multiple dimensions, speeds up data processing and saves processing time.
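The dictionary-form storage described above can be illustrated, without limitation, by the following Python sketch. The flat input layout `(vehicle, time, signal_type, name, value)` and the function name `to_dictionary_form` are assumptions chosen for the example.

```python
from collections import defaultdict


def to_dictionary_form(rows):
    """Aggregate flat signal rows that share the same dimensions.

    Each input row is (vehicle, time, signal_type, name, value); rows with
    the same vehicle number, acquisition time and signal type are merged
    into one record holding a {signal name: signal value} field.
    """
    grouped = defaultdict(dict)
    for vehicle, time, signal_type, name, value in rows:
        grouped[(vehicle, time, signal_type)][name] = value
    return [
        {"vehicle": v, "time": t, "signal_type": s, "signals": signals}
        for (v, t, s), signals in grouped.items()
    ]
```

In this form the shared dimensions are recorded once per group, and only the signal names and values vary inside the `signals` field.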
Specifically, the data in the database is usually ordered chronologically, and data collected at different acquisition times may reside in different data tables. Using the acquisition time of the data as an index therefore makes it possible to quickly locate where the requested data is stored, with little risk of missing data.
Optionally, judging whether the history table and the real-time table need to be subjected to sub-table query operation according to the data acquisition time range included in the request dictionary; if so, taking the data acquisition time range as an index, looking up a table in a history table to obtain history data, and looking up a table in a real-time table to obtain real-time data; and de-duplicating the queried real-time data, and combining the historical data with the de-duplicated real-time data to obtain initial data.
Specifically, the data in the database may be classified by data acquisition time into two types, historical data and real-time data. Historical data is data for which preliminary deduplication, processing and filtering have already been completed; it may be stored separately in the history table according to a preset rule. Real-time data is data for which preliminary processing has not started or has not yet finished; it may be stored in the real-time table. The acquisition time that separates historical data from real-time data is determined according to the actual situation and is not limited in this embodiment.
Specifically, based on the data acquisition time range included in the request dictionary and the current dividing time between the history table and the real-time table, it is determined whether a sub-table query (that is, querying the history table and the real-time table separately) should be performed so that the data storage location can be found quickly. In addition, for queried data that is historical data, the subsequent steps of preliminary deduplication, processing, filtering and the like can be omitted.
In this embodiment, whether a sub-table query needs to be performed on the history table and the real-time table is determined according to the data acquisition time range included in the request dictionary. If so, the history table and the real-time table are each queried using the data acquisition time range as an index, the queried real-time data is deduplicated, and the historical data is merged with the deduplicated real-time data to obtain the initial data. The sub-table query allows the location of the required data to be found quickly, which speeds up data processing. For queried data that is historical data, the subsequent preliminary deduplication, processing and filtering steps can be omitted, which saves query and processing time and improves data query efficiency.
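A minimal Python sketch of the sub-table query is given below for illustration only. It assumes that the history table holds rows earlier than a dividing time `split_time` and the real-time table holds the remaining rows, that tables are plain lists of dict rows, and that duplicates are rows identical in all of `vehicle`, `time`, `signal` and `value`; none of these details is fixed by the embodiment.

```python
def dedupe(rows):
    """Drop exact duplicate rows while preserving their order."""
    seen, out = set(), []
    for r in rows:
        key = (r["vehicle"], r["time"], r["signal"], r["value"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def query_initial_data(history, realtime, start, end, split_time):
    """Query the history and real-time tables for rows in [start, end]."""
    def in_range(rows):
        return [r for r in rows if start <= r["time"] <= end]

    # The history table is only consulted if the range starts before the
    # dividing time; its rows are assumed to be deduplicated already.
    hist = in_range(history) if start < split_time else []
    # The real-time table is only consulted if the range reaches the
    # dividing time; its rows still need deduplication.
    real = dedupe(in_range(realtime)) if end >= split_time else []
    # Merge historical data with the deduplicated real-time data.
    return hist + real
```

When the requested range lies entirely on one side of the dividing time, only one table is read, which is the case in which no sub-table query is needed.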
Optionally, the data dictionary includes at least one of the following for data correspondence: vehicle number, acquisition time, acquisition relative time, vehicle signal.
Specifically, the acquisition relative time accounts for time deviations in data acquisition, such as the delay from the sensor to the controller. When data is queried using the data acquisition time range as an index, this time deviation needs to be taken into account: both the acquisition time and the acquisition relative time must fall within the data acquisition time range for the data to be matched by the query.
Step S103, generating a condition dimension table according to the signal format included in the request dictionary;
Specifically, the condition dimension table is a dimension table generated according to the request dictionary, and the signal format is the format of the requested signal in the request dictionary. Different signal formats generally correspond to different acquisition protocols, so different condition dimension tables can be generated for different signal formats, and the specific rows and columns of each condition dimension table are laid out according to the signal attributes of that signal format.
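For illustration only, generating a condition dimension table from a signal format might be sketched in Python as follows. The embodiment does not define the concrete structure of a signal format, so the mapping from channel name to requested signal names used here, the column names and the function name `build_dimension_table` are all hypothetical.

```python
def build_dimension_table(signal_type, signal_format):
    """Expand a (hypothetical) signal format into dimension-table rows.

    signal_format maps a channel name to the list of requested signal
    names; the result has one row per requested signal.
    """
    return [
        {"signal_type": signal_type, "channel": channel, "signal": name}
        for channel, names in signal_format.items()
        for name in names
    ]
```

A different signal format, for example one keyed by service identifier instead of channel, would lead to a table with different columns, which is why the table is generated per signal format.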
Step S104, the condition dimension table is associated with the initial data to obtain output data returned in response to the data acquisition request.
Specifically, the condition dimension table is associated with the initial data so that the signal names and signal values in the condition dimension table correspond one-to-one with the signal names and signal values in the initial data. This yields the output data returned in response to the data acquisition request, which is output according to the output mode preset by the user. When the data is output, the request dictionary stored in the data table and the output data can be placed in one folder for output, which makes the output data more convenient for the data user.
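One possible reading of the association step, given here only as a hedged sketch, is an inner join on signal type and signal name between the condition dimension table and dictionary-form initial data. The record layouts and the function name `join_dimension` are assumptions of this example, and the embodiment may associate on further attributes.

```python
def join_dimension(dim_table, initial_data):
    """Keep only the signals of the initial data that the table requests."""
    # Build a lookup set once so that each signal is matched in O(1).
    wanted = {(d["signal_type"], d["signal"]) for d in dim_table}
    output = []
    for rec in initial_data:
        for name, value in rec["signals"].items():
            if (rec["signal_type"], name) in wanted:
                output.append({"vehicle": rec["vehicle"], "time": rec["time"],
                               "signal": name, "value": value})
    return output
```

Because each initial-data record already groups its signals under shared dimensions, the shared dimensions are compared once per record rather than once per signal.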
As can be seen from the above technical solution, in this embodiment, a request dictionary corresponding to a data acquisition request is determined according to the data acquisition request; the data acquisition time range included in the request dictionary is used as an index to query the data dictionary and obtain initial data, wherein the data dictionary stores data in dictionary form; a condition dimension table is generated according to the signal format included in the request dictionary; and the condition dimension table is associated with the initial data to obtain output data returned in response to the data acquisition request. Storing data in dictionary form in the data dictionary simplifies the data storage format and greatly reduces the storage space occupied. Generating the condition dimension table from the signal format included in the request dictionary and associating it with the initial data speeds up queries and improves data processing efficiency, so that the required data can be obtained quickly and effectively and retrieval performance is greatly improved.
Example two
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present application. This embodiment differs from the above embodiment in that the data is aggregated in order to improve data processing efficiency. As shown in fig. 3, the data processing method includes the following steps:
Step S301, determining a request dictionary corresponding to a data acquisition request according to the data acquisition request, wherein the request dictionary comprises signal types, and the signal formats are signal formats corresponding to the signal types;
Step S302, taking a data acquisition time range included in a request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form;
Step S303, generating a condition dimension table according to the signal format included in the request dictionary;
Step S304, determining an output mode of output data according to the signal type;
Specifically, different output modes may be adopted depending on the complexity of the signal type. For data of complex signal types, such as CAN signals and LIN signals, the signal format is complex and the number of signals is large, so the subsequent association of the initial data with the condition dimension table is complicated and involves a large data volume; the data can therefore be aggregated first. For simpler signal types, such as SOME/IP signals and DCTP signals, the signal format and the subsequent association operation are simpler, so the initial data can be associated with the condition dimension table directly and then output, without aggregation, which reduces the number of operation steps and saves the cost of aggregation.
It should be noted that whether a signal type is complex is defined according to the actual situation and is not limited herein.
Step S3041, if the output mode is the first mode, the condition dimension table is associated with the initial data to obtain the output data returned in response to the data acquisition request;
Specifically, if the signal type of the signal is simpler, the first output mode may be adopted, and the condition dimension table and the initial data are associated to output the data.
Step S3042, if the output mode is the second mode, aggregating the signal names and the signal values of the same data attribute in the initial data, and associating the condition dimension table with the initial data according to the aggregated initial data to obtain the output data returned in response to the data acquisition request.
Specifically, if the signal type of the signal is more complex, the second output mode may be adopted, in which the initial data is aggregated according to its attribute information. The attribute information consists of the data attributes other than the signal name, including the vehicle number, the acquisition time, the acquisition script number and the like. Data with the same attributes can be aggregated into one row to reduce the overall data volume. For example, if three rows in the data table of the initial data have the same acquisition time, vehicle number and so on, they can be stored as one row in the form "acquisition time, vehicle number, …, {signal name 1: signal value 1, signal name 2: signal value 2, signal name 3: signal value 3}". Aggregating the initial data in this way avoids repeated matching during the subsequent association of the condition dimension table with the initial data, which saves association time and reduces the data burden.
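The choice between the two output modes and the aggregation of the second mode can be sketched, by way of example only, in Python as follows. Treating CAN and LIN as the complex signal types follows the examples given above, but the set `COMPLEX_TYPES`, the row layout and the function names are assumptions of this sketch, since the embodiment leaves the definition of a complex signal type open.

```python
# Assumption: which signal types count as "complex" is configurable.
COMPLEX_TYPES = {"CAN", "LIN"}


def output_mode(signal_type):
    """Return "second" for complex signal types, otherwise "first"."""
    return "second" if signal_type in COMPLEX_TYPES else "first"


def aggregate(rows):
    """Merge rows sharing acquisition time and vehicle number into one row."""
    merged = {}
    for r in rows:
        key = (r["time"], r["vehicle"])
        merged.setdefault(
            key, {"time": r["time"], "vehicle": r["vehicle"], "signals": {}})
        merged[key]["signals"][r["signal"]] = r["value"]
    return list(merged.values())


def prepare(rows, signal_type):
    """Aggregate only in the second mode; pass rows through in the first."""
    return aggregate(rows) if output_mode(signal_type) == "second" else rows
```

The result of `prepare` would then be associated with the condition dimension table, so that in the second mode the shared attributes are matched once per aggregated row.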
As can be seen from the above technical solution, in this embodiment a request dictionary corresponding to a data acquisition request is determined according to the data acquisition request. The data acquisition time range included in the request dictionary is then used as an index to query the data dictionary and obtain initial data, a condition dimension table is generated according to the signal format included in the request dictionary, and the output mode of the output data is determined according to the signal type. If the output mode is the first mode, the condition dimension table is associated with the initial data to obtain the output data returned in response to the data acquisition request. If the output mode is the second mode, the signal names and signal values with the same data attributes in the initial data are aggregated, and the condition dimension table is associated with the aggregated initial data to obtain the output data returned in response to the data acquisition request. In this embodiment, the data output mode is determined by the signal type, and complex signals are aggregated, which reduces the data volume of the initial data, saves storage space, reduces the data burden, avoids repeated matching when the condition dimension table is subsequently associated with the initial data, and saves time during association.
Example III
Fig. 4 is a flowchart of a data processing method according to an embodiment of the present application, which differs from the above embodiments in that the output data is cleaned. As shown in Fig. 4, the data processing method includes the following steps:
Step S401, determining a request dictionary corresponding to the data acquisition request according to the data acquisition request;
Step S402, taking a data acquisition time range included in a request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form;
Step S403, generating a condition dimension table according to the signal format included in the request dictionary;
Step S404, associating the condition dimension table with the initial data, and cleaning the associated initial data according to a preset data cleaning rule to obtain output data.
Specifically, after the condition dimension table is associated with the initial data, erroneous data such as dirty data, problem data, and incomplete data may be present. Such data may be in an illegal format, or may result from irregular encoding and ambiguous business logic in the source system. It is meaningless to the actual business and may make the data inaccurate and inconsistent. Furthermore, duplicate and other illegal data may cause abnormal system behavior and sometimes serious faults, and even erroneous data that has not yet been exposed may cause unpredictable fatal errors, so the potential harm is considerable. Therefore, the data needs to be cleaned before the output data is obtained. In addition, before the data is cleaned, the associated data can be filtered again according to the signal requirements in the user's data request to ensure the accuracy of the output data.
Optionally, if two pieces of data at the same time have the same rule, the same channel name, the same acquisition time, the same event number and the same signal value corresponding to the same signal name, one piece of data is reserved; and deleting the two pieces of data when the two pieces of data at the same moment have the same rule, the same channel name, the same acquisition time and the same event number and the signal values corresponding to the same signal name are different.
Specifically, the rule here refers to a conversion rule for a signal value, for example, a data rule under which a vehicle speed signal value is converted into binary form, or under which a switch state is identified by the number "0" or "1". If two pieces of data at the same time have the same rule, the same channel name, the same acquisition time, the same event number, and the same signal value for the same signal name, they can be judged to be dirty data, that is, data that does not meet the requirements and cannot be analyzed directly, and only one of them is kept. If two pieces of data at the same time have the same rule, the same channel name, the same acquisition time, and the same event number, but different signal values for the same signal name, they can be judged to be problem data, and both can be deleted directly to avoid unnecessary loss caused by subsequent use of the data. In addition, when a plurality of queried signals at the same time belong to the same rule and have the same channel, the same acquisition timestamp, and the same event number, but do not all appear together, the data can be judged to be incomplete data and can be removed when the file is output.
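The first two cleaning rules can be sketched in Python as follows. This is a minimal illustration, assuming each record is a dictionary with the hypothetical fields `rule`, `channel`, `acquisition_time`, `event_number`, `signal_name` and `signal_value`; the helper `clean_rows` is not part of the embodiment, and the incomplete-data rule is deliberately left out.

```python
from collections import defaultdict

# Fields that must all match for two records to be compared (assumed names).
KEY_FIELDS = ("rule", "channel", "acquisition_time", "event_number", "signal_name")


def clean_rows(rows):
    """Keep one copy of exact duplicates and drop groups with conflicting values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in KEY_FIELDS)].append(row)

    cleaned = []
    for group in groups.values():
        distinct_values = {row["signal_value"] for row in group}
        if len(distinct_values) == 1:
            # Unique record, or dirty data (identical duplicates): keep one.
            cleaned.append(group[0])
        # Otherwise problem data (same key, different values): delete all of it.
    return cleaned
```

Grouping by the full key first means each record is inspected once, and the keep-or-delete decision depends only on how many distinct signal values share that key.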
In this embodiment, the data cleaning rules are based on whether the rule, channel name, acquisition time, event number, signal name, and signal value of the data are the same. These criteria are relatively comprehensive, so erroneous data can be filtered out well and its interference with subsequent work on the data is avoided.
It should be noted that the data cleaning rules need to be set in advance according to the actual situation and are not limited herein.
Specifically, to make it easier for the user to view and analyze the output data, after the associated initial data is cleaned, the data can be sliced according to the output mode required by the user. Files can be output according to the number of vehicles, the number of rows per file, the data acquisition time, the signal type, the total number of output files, and the like, and information such as the signal type and the number of vehicles can be selected as directory names with a hierarchical relationship.
Fig. 5 is a schematic diagram of a data output method according to an embodiment of the present application. As shown in Fig. 5, after the associated initial data is cleaned, the data is sliced according to the output mode required by the user to obtain sliced data. The sliced data is output as CSV (Comma-Separated Values) files, that is, text files, and the request dictionary, stored as a JSON (JavaScript Object Notation) file, is output together with them under the OSS (Object Storage Service) root path.
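The slicing and file output can be sketched in Python as follows. This is only an illustration under stated assumptions: a local directory stands in for the OSS root path, the directory hierarchy is signal type then vehicle number, the slice size is a row count, and the names `write_sliced_output`, `request.json` and `part-0000.csv` are hypothetical.

```python
import csv
import json
from pathlib import Path


def write_sliced_output(rows, request_dictionary, root, rows_per_file=2):
    """Write rows as CSV slices under root/<signal type>/<vehicle number>/."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # The request dictionary is stored next to the data as a JSON file.
    (root / "request.json").write_text(
        json.dumps(request_dictionary, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )

    # Group rows by the two directory levels assumed for this sketch.
    by_directory = {}
    for row in rows:
        key = (row["signal_type"], row["vehicle_number"])
        by_directory.setdefault(key, []).append(row)

    written = []
    for (signal_type, vehicle), group in by_directory.items():
        # Replace "/" so that a type such as SOME/IP stays one directory level.
        out_dir = root / signal_type.replace("/", "_") / vehicle
        out_dir.mkdir(parents=True, exist_ok=True)
        for part, start in enumerate(range(0, len(group), rows_per_file)):
            path = out_dir / f"part-{part:04d}.csv"
            with path.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
                writer.writeheader()
                writer.writerows(group[start:start + rows_per_file])
            written.append(path)
    return written
```

Slicing by a fixed row count keeps each file small enough to open directly, while the directory names carry the hierarchy so that a user can navigate to one signal type or vehicle without reading any file.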
As can be seen from the above technical solution, in this embodiment a request dictionary corresponding to a data acquisition request is determined according to the data acquisition request. The data acquisition time range included in the request dictionary is then used as an index to query the data dictionary and obtain initial data, a condition dimension table is generated according to the signal format included in the request dictionary, the condition dimension table is associated with the initial data, and the associated initial data is cleaned according to a preset data cleaning rule to obtain output data. In this embodiment, the data is cleaned according to preset cleaning rules, so erroneous data can be filtered out well, its interference with subsequent work on the data is avoided, and the safety of the output data is guaranteed.
Example IV
Fig. 6 is a flowchart of a distributed data processing method according to an embodiment of the present application. As shown in Fig. 6, the distributed data processing method includes the following steps:
Step S601, determining a request dictionary corresponding to the data acquisition request according to the data acquisition request;
Step S602, taking a data acquisition time range included in a request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form;
Step S603, generating a condition dimension table according to the signal format included in the request dictionary;
Step S604, broadcasting a condition dimension table to a plurality of distributed nodes, and performing parallel association operation on initial data according to the condition dimension table by the distributed nodes to obtain a plurality of groups of association data;
Specifically, before the condition dimension table is associated with the initial data, the condition dimension table can be broadcast to a plurality of distributed nodes. This process can be implemented with Spark SQL, the Spark module for processing structured data, which can also act as a distributed SQL engine. When association is performed in broadcast mode, the data of the condition dimension table is first collected on the Spark driver, and the driver then broadcasts it to each executor, where the subsequent parallel association operation is executed. Broadcasting speeds up data processing, facilitates the subsequent parallel association of data, and improves data query efficiency.
Fig. 7 is a schematic diagram of a data association method according to an embodiment of the present application, taking as an example a request dictionary whose signal type is a complex signal type. As shown in Fig. 7, according to the data attributes of the initial data, the signal names and signal values with the same data attributes are aggregated to form first aggregated data. To prevent incomplete aggregation, the first aggregated data may be processed a second time and aggregated again to form second aggregated data. A condition dimension table is generated according to the signal format included in the request dictionary and broadcast to a plurality of distributed nodes, the distributed nodes perform parallel association operations on the aggregated initial data according to the condition dimension table to obtain multiple groups of associated data, and the associated data is cleaned according to the preset data cleaning rules to obtain cleaned data.
Step S605, merging the associated data to obtain output data.
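Steps S604 and S605 can be illustrated with a plain-Python simulation of a broadcast association. This sketch does not use Spark: worker threads stand in for the distributed nodes, a shared lookup dictionary stands in for the broadcast condition dimension table, and the join key `signal_name` as well as the helpers `associate_partition` and `broadcast_associate` are assumptions made for illustration. In Spark itself, a comparable effect is typically requested with a broadcast join hint such as `pyspark.sql.functions.broadcast`.

```python
from concurrent.futures import ThreadPoolExecutor


def associate_partition(partition, dimension_index):
    """Inner-join one partition of initial data against the shared dimension table."""
    joined = []
    for row in partition:
        dimension_row = dimension_index.get(row["signal_name"])
        if dimension_row is not None:
            joined.append({**row, **dimension_row})
    return joined


def broadcast_associate(initial_rows, dimension_rows, node_count=3):
    """Simulate S604 (broadcast and parallel association) and S605 (merge)."""
    # "Broadcast": every worker reads the same small lookup table.
    dimension_index = {d["signal_name"]: d for d in dimension_rows}
    # Split the large initial data into one partition per simulated node.
    partitions = [initial_rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        groups = list(pool.map(
            lambda part: associate_partition(part, dimension_index), partitions))
    # Merge the groups of associated data into the output data.
    return [row for group in groups for row in group]
```

Only the small dimension table is copied to every worker, so the large initial data never has to be moved between nodes for the association; the output order follows the partitioning rather than the input order.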
According to the technical scheme, a request dictionary corresponding to the data acquisition request is determined according to the data acquisition request, the data acquisition time range included in the request dictionary is taken as an index, data inquiry is carried out in the data dictionary, initial data are obtained, a condition dimension table is generated according to a signal format included in the request dictionary, the condition dimension table is broadcast to a plurality of distributed nodes, parallel association operation is carried out on the initial data according to the condition dimension table through the distributed nodes, multiple groups of association data are obtained, and the association data are combined, so that output data are obtained. The plurality of distributed nodes perform parallel association operation on the initial data according to the condition dimension table, so that the data processing speed can be improved, and the data query efficiency can be improved.
Example V
Fig. 8 is a schematic diagram of a data processing system according to an embodiment of the present application. As shown in Fig. 8, the system includes:
A request module 801, configured to determine a request dictionary corresponding to the data acquisition request according to the data acquisition request;
A query module 802, configured to query data in a data dictionary with a data acquisition time range included in the request dictionary as an index, to obtain initial data, where the data dictionary is used to store data in a dictionary form;
A generating module 803, configured to generate a condition dimension table according to the signal format included in the request dictionary;
An output module 804, configured to associate the condition dimension table with the initial data, and obtain output data returned in response to the data acquisition request.
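The four modules can be pictured as a small pipeline. The Python sketch below is illustrative only and rests on several assumptions: the request body layout (`time_range`, `vehicle_number`, `signal_type`, `signal_format`), an inclusive time range, filtering by vehicle number during the query, `signal_name` as the association key, and all function names are hypothetical rather than taken from the embodiment.

```python
import uuid


def request_module(request):
    """801: wrap a raw request into a request dictionary (identification value + body)."""
    return {"id": str(uuid.uuid4()), "body": dict(request)}


def query_module(request_dictionary, data_dictionary):
    """802: select records whose acquisition time lies in the requested range."""
    body = request_dictionary["body"]
    start, end = body["time_range"]  # assumed inclusive on both ends
    return [record for record in data_dictionary
            if start <= record["acquisition_time"] <= end
            and record["vehicle_number"] == body["vehicle_number"]]


def generating_module(request_dictionary):
    """803: build one condition dimension table row per requested signal."""
    body = request_dictionary["body"]
    return [{"signal_name": name, "signal_type": body["signal_type"], **attributes}
            for name, attributes in body["signal_format"].items()]


def output_module(dimension_table, initial_data):
    """804: associate (inner join) the dimension table with the initial data."""
    index = {row["signal_name"]: row for row in dimension_table}
    return [{**record, **index[record["signal_name"]]}
            for record in initial_data if record["signal_name"] in index]
```

Calling the four functions in order reproduces the flow of the method embodiments: the request dictionary drives both the query and the dimension table, and the association keeps only the signals that were actually requested.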
According to the above technical solution, a request dictionary corresponding to the data acquisition request is determined according to the data acquisition request; the data acquisition time range included in the request dictionary is used as an index to query the data dictionary and obtain initial data, wherein the data dictionary is used for storing data in a dictionary form; a condition dimension table is generated according to the signal format included in the request dictionary; and the condition dimension table is associated with the initial data to obtain the output data returned in response to the data acquisition request. First, because the data dictionary stores data in a dictionary form, the data storage format is simplified and the storage space occupied by the stored data is greatly reduced. Second, because the condition dimension table is generated from the signal format included in the request dictionary corresponding to the data acquisition request and is then associated with the initial data, the query is accelerated, data processing efficiency is improved, the required data can be acquired quickly and effectively, and search performance is greatly improved.
Example VI
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The embodiments of the present application do not limit the specific implementation of the electronic device.
The electronic device may include: a processor 902, a communication interface (Communications Interface) 904, a memory 906, and a communication bus 908.
Wherein:
The processor 902, the communication interface 904, and the memory 906 communicate with each other via the communication bus 908.
A communication interface 904 for communicating with other electronic devices or servers.
The processor 902 is configured to execute the program 910, and may specifically perform relevant steps in the foregoing method embodiments.
In particular, the program 910 may include program code including computer-operating instructions.
The processor 902 may be a CPU, an ASIC (Application-Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the electronic device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
A memory 906 for storing a program 910. Memory 906 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 910 may be specifically configured to cause the processor 902 to perform operations corresponding to the methods described in any of the foregoing method embodiments.
For the specific implementation of each step in the program 910, reference may be made to the corresponding steps and the corresponding descriptions of the units in the above method embodiments, which have the corresponding beneficial effects and are not described again herein. It will be clear to those skilled in the art that, for convenience and brevity of description, for the specific working procedures of the apparatus and modules described above, reference may be made to the corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
The embodiment of the application also provides a computer storage medium, on which a computer program is stored, the program being executed by a processor to implement operations corresponding to any one of the above-mentioned method embodiments.
It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present application may be split into more components/steps, or two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the objects of the embodiments of the present application.
The methods according to the embodiments of the present application described above may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a memory component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, performs the methods described herein. Furthermore, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only for illustrating the embodiments of the present application, but not for limiting the embodiments of the present application, and various changes and modifications may be made by one skilled in the relevant art without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also fall within the scope of the embodiments of the present application, and the scope of the embodiments of the present application should be defined by the claims.

Claims (10)

1. A method of data processing, comprising:
Determining a request dictionary corresponding to a data acquisition request according to the data acquisition request, wherein the request dictionary comprises an identification value and a request body, and the request body comprises: a data acquisition time range, a vehicle number, a signal type, and a signal format corresponding to the signal type;
taking a data acquisition time range included in the request dictionary as an index, and carrying out data query in a data dictionary to obtain initial data, wherein the data dictionary is used for storing data in a dictionary form;
generating a condition dimension table according to a signal format included in the request dictionary, including: generating different condition dimension tables according to different signal formats, and planning specific rows and columns in the condition dimension tables according to signal attributes in the same signal format, wherein the condition dimension tables are dimension tables generated according to the request dictionary, and the signal formats are formats of signals requested in the request dictionary;
And associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request.
2. The method of claim 1, wherein the request dictionary further comprises a signal type, the signal format being a signal format corresponding to the signal type, the method further comprising:
Determining an output mode of the output data according to the signal type;
If the output mode is the first mode, continuing to execute the step of associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request;
And if the output mode is a second mode, aggregating the signal names and the signal values of the same data attribute in the initial data, and continuously executing the step of associating the condition dimension table with the initial data according to the aggregated initial data to obtain the output data returned in response to the data acquisition request.
3. The method according to claim 1, wherein the performing data query in the data dictionary with the data collection time range included in the request dictionary as an index to obtain initial data includes:
judging whether the history table and the real-time table need to be subjected to sub-table query operation according to the data acquisition time range included in the request dictionary;
If so, taking the data acquisition time range as an index, looking up a table in the history table to obtain history data, and looking up a table in the real-time table to obtain real-time data;
and de-duplicating the queried real-time data, and combining the historical data with the de-duplicated real-time data to obtain the initial data.
4. The method of claim 1, wherein associating the condition dimensional table with the initial data results in output data returned in response to the data acquisition request, comprising:
And correlating the condition dimension table with the initial data, and cleaning the correlated initial data according to a preset data cleaning rule to obtain the output data.
5. The method of claim 4, wherein the data cleansing rules comprise at least one of:
If two pieces of data at the same moment have the same rule, the same channel name, the same acquisition time, the same event number and the same signal value corresponding to the same signal name, one piece of data is reserved;
And deleting the two pieces of data when the two pieces of data at the same moment have the same rule, the same channel name, the same acquisition time and the same event number and the signal values corresponding to the same signal name are different.
6. The method of claim 1, wherein associating the condition dimensional table with the initial data results in output data returned in response to the data acquisition request, comprising:
broadcasting the condition dimension table to a plurality of distributed nodes, and carrying out parallel association operation on the initial data according to the condition dimension table by the distributed nodes to obtain a plurality of groups of association data;
and merging the associated data to obtain the output data.
7. The method of claim 1, wherein the data dictionary comprises at least one of the following for data correspondence: vehicle number, acquisition time, acquisition relative time, vehicle signal.
8. A data processing system, comprising:
The request module is used for determining a request dictionary corresponding to the data acquisition request according to the data acquisition request, wherein the request dictionary comprises an identification value and a request body, and the request body comprises: a data acquisition time range, a vehicle number, a signal type, and a signal format corresponding to the signal type;
The query module is used for carrying out data query in a data dictionary by taking a data acquisition time range included in the request dictionary as an index to obtain initial data, and the data dictionary is used for storing data in a dictionary form;
A generating module, configured to generate a condition dimension table according to a signal format included in the request dictionary, including: generating different condition dimension tables according to different signal formats, and planning specific rows and columns in the condition dimension tables according to signal attributes in the same signal format, wherein the condition dimension tables are dimension tables generated according to the request dictionary, and the signal formats are formats of signals requested in the request dictionary;
And the output module is used for associating the condition dimension table with the initial data to obtain output data returned in response to the data acquisition request.
9. An electronic device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
The memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method of any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium has stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202310506289.3A 2023-05-06 2023-05-06 Data processing method, system, electronic device and storage medium Active CN116521738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310506289.3A CN116521738B (en) 2023-05-06 2023-05-06 Data processing method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310506289.3A CN116521738B (en) 2023-05-06 2023-05-06 Data processing method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN116521738A CN116521738A (en) 2023-08-01
CN116521738B true CN116521738B (en) 2024-05-14

Family

ID=87389956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310506289.3A Active CN116521738B (en) 2023-05-06 2023-05-06 Data processing method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116521738B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant