CN117436444A - Tag-based data processing method, device and computer-readable storage medium - Google Patents

Publication number
CN117436444A
Authority
CN
China
Prior art keywords
data
tag
processed
label
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311757112.7A
Other languages
Chinese (zh)
Other versions
CN117436444B (en
Inventor
巩怀志
于腾波
张惠玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhicheng Software Technology Service Co ltd
Shenzhen Smart City Technology Development Group Co ltd
Original Assignee
Shenzhen Zhicheng Software Technology Service Co ltd
Shenzhen Smart City Technology Development Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhicheng Software Technology Service Co ltd, Shenzhen Smart City Technology Development Group Co ltd filed Critical Shenzhen Zhicheng Software Technology Service Co ltd
Priority to CN202311757112.7A priority Critical patent/CN117436444B/en
Publication of CN117436444A publication Critical patent/CN117436444A/en
Application granted granted Critical
Publication of CN117436444B publication Critical patent/CN117436444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a tag-based data processing method, a device and a computer-readable storage medium, and belongs to the technical field of big data. The tag-based data processing method comprises the following steps: acquiring a data source, a data attribute and an uploading timestamp of data to be processed; determining, according to the data source and based on a preset tag system, the tag level to which the data to be processed belongs; determining, according to the data attribute and the data features corresponding to the uploading timestamp, a target sub-tag corresponding to the data to be processed under the tag level; and determining the data type of the data to be processed in the digital twin system based on the target sub-tag. With this method, the data in the digital twin system can be finely managed, and the functional module that receives the data can conveniently execute the corresponding simulation task.

Description

Tag-based data processing method, device and computer-readable storage medium
Technical Field
The present invention relates to the field of big data, and in particular, to a tag-based data processing method, apparatus, and computer-readable storage medium.
Background
A digital twin is a simulation technology that maps a physical space into a virtual space by making full use of data such as sensor data, operation history data and physical models, thereby enabling real-time supervision and prediction of the physical entity. A digital twin system often needs to classify various types of data and route each item to the corresponding functional module according to its data type.
In the prior art, classification is usually performed only according to the data source. However, for some unstructured data, such as real numbers, integers, discrete values or other derived data uploaded by a sensor, the actual meaning of each item is difficult to distinguish during processing, so the data cannot be divided finely enough to meet the data processing requirements of complex digital twin scenes.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main object of the invention is to provide a tag-based data processing method, device and computer-readable storage medium, aiming to solve the technical problem that existing processing approaches have difficulty dividing the data in a digital twin system more finely.
To achieve the above object, the present invention provides a data processing method based on a tag, the data processing method based on a tag comprising the steps of:
acquiring a data source, a data attribute and an uploading time stamp of data to be processed;
determining, according to the data source and based on a preset tag system, a tag level to which the data to be processed belongs;
determining a target sub-tag corresponding to the data to be processed under the tag level according to the data attribute and the data characteristic corresponding to the uploading time stamp;
and determining the data type of the data to be processed in the digital twin system based on the target sub-tag.
Optionally, before the step of determining, according to the data source and based on a preset tag system, a tag level to which the data to be processed belongs, the method further includes:
acquiring a data source of the digital twin system and data processing logic of each functional module;
setting a tag rule according to the data source and the data processing logic;
and creating the preset tag system of the digital twin system according to the tag rule and the data source.
Optionally, the step of determining, according to the data source and based on a preset tag system, a tag level to which the data to be processed belongs includes:
creating a corresponding tag hierarchy topology according to the preset tag system;
constructing a hierarchical multi-tag classification model based on the tag hierarchy topology;
extracting data source features of the data to be processed according to the data source;
and predicting, based on the hierarchical multi-tag classification model and according to the data source features, the tag level to which the data to be processed belongs.
Optionally, the step of predicting, based on the hierarchical multi-tag classification model and according to the data source features, the tag level to which the data to be processed belongs includes:
matching the data source features with nodes in the tag hierarchy topology based on the hierarchical multi-tag classification model;
and determining the tag level to which the data to be processed belongs according to the probability value or the binary classification result of whether the data source features belong to each node.
Optionally, the step of determining, according to the data attribute and the data feature corresponding to the uploading timestamp, a target sub-tag corresponding to the data to be processed under the tag level includes:
converting the data attribute and the data characteristic corresponding to the uploading timestamp into a corresponding characteristic vector;
after adding a preset identifier to the feature vector and performing padding, inputting the feature vector into a tag classification model;
and determining a classification result of the data to be processed under the label level based on the label classification model and the feature vector so as to determine a target sub-label corresponding to the data to be processed.
Optionally, before the step of converting the data feature corresponding to the data attribute and the uploading timestamp into the corresponding feature vector, the method further includes:
setting model evaluation indexes and labeling a training sample set according to a label system of the current digital twin system;
training an initial tag classification model based on the training sample set;
based on the model evaluation index, evaluating the model parameters of the initial label classification model after training;
and when the model parameters meet the model evaluation indexes, storing the model parameters to obtain the label classification model.
Optionally, after the step of determining the data type of the data to be processed in the digital twin system based on the target sub-tag, the method further includes:
determining a verification rule of the data to be processed according to the data type;
checking the integrity of the data to be processed based on the verification rule;
and after the data to be processed passes the verification, integrating and storing the data to be processed according to the processing requirement of the digital twin system.
Optionally, after the step of determining the data type of the data to be processed in the digital twin system based on the target sub-tag, the method further includes:
based on the data type, accessing the data to be processed into a corresponding functional module in the twin system;
and constructing and executing a simulation task based on the data processing logic of the functional module and the data to be processed.
In addition, to achieve the above object, the present invention also provides a tag-based data processing apparatus including: a memory, a processor, and a tag-based data processing program stored on the memory and executable on the processor, the tag-based data processing program being configured to implement the steps of the tag-based data processing method described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a tag-based data processing program which, when executed by a processor, implements the steps of the tag-based data processing method as described above.
An embodiment of the invention provides a tag-based data processing method. The method first acquires a data source, a data attribute and an uploading timestamp of the data to be processed, and determines, according to the data source and based on a preset tag system of the current digital twin system, the tag level to which the data to be processed belongs. It then extracts data features of the data to be processed from the data attribute and the uploading timestamp, determines the target sub-tag to which the data to be processed belongs according to the similarity between the data features and the features of the sub-tags under the current tag level, and finally determines, through the target sub-tag, the data type to which the data to be processed belongs in the digital twin system.
With this method, the data uploaded from the physical scene to the digital twin system can be finely managed, and the functional module that receives the data can conveniently execute its tasks.
Drawings
FIG. 1 is a flowchart of a first embodiment of a tag-based data processing method according to the present invention;
FIG. 2 is a flowchart of a second embodiment of a tag-based data processing method according to the present invention;
FIG. 3 is a flowchart of a third embodiment of a tag-based data processing method according to the present invention;
FIG. 4 is a schematic diagram of a terminal structure of a hardware running environment according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present invention provides a tag-based data processing method. Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the tag-based data processing method of the present invention.
In this embodiment, the tag-based data processing method includes:
Step S10, acquiring a data source, a data attribute and an uploading timestamp of the data to be processed.
In this embodiment, the data to be processed refers to data uploaded by sensors, industrial controllers, mobile terminals or monitoring devices in a physical scene. A digital twin system is constructed by mapping the data uploaded by each entity object in the physical scene into a virtual environment, so that the full life cycle of each entity device in the scene can be reflected in the digital twin system. The data uploaded by an entity device may be structured data (such as device information, fault data, maintenance records and position information) or unstructured data (such as sensor data, video pictures and communication information). Structured data strictly follows a data format and length specification and can be stored and managed in a database. Unstructured data has an irregular or incomplete structure and no predefined data model, so it is inconvenient to store and manage directly in a database. As a result, unstructured data uploaded from the physical scene is difficult to manage finely when it is accessed into the digital twin system.
The data source of the data to be processed comprises information about the entity object that uploads the data and the manner in which the entity object collects it, such as observation, statistics, recording, or measurement and calculation; collection can also be divided into real-time collection, timed collection and the like. A data attribute is a characteristic or property describing the data to be processed and is usually represented by a data field. Attributes are mainly divided into nominal, binary, ordinal, numerical, discrete and continuous attributes, and one piece of data to be processed can contain several data attributes. The uploading timestamp records the time taken from the start of collection of the data to be processed to its upload to the system. From the uploading timestamp, the upload time span can be calculated and the time range and duration of the data become known, so the data can be analyzed periodically, for example the change in data volume over different periods of a day or the change in data across the working days of a week.
Taking rainfall as the data to be processed as an example, the data source is a raindrop sensor whose collection mode is timed measurement and calculation. The data collected by the raindrop sensor include whether it is raining, the raindrop size and the raindrop density, so the data attributes of the rainfall include binary attributes and numerical attributes. The interval at which the raindrop sensor uploads data depends on the configured collection frequency, and the span covered by the timestamp is generally the inverse of the collection frequency.
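The rainfall example can be pictured as a small record structure. The following is only an illustrative sketch: the class name PendingRecord, its field names and the sample values are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingRecord:
    """One item of data to be processed (all field names are hypothetical)."""
    source_object: str      # entity object that uploaded the data
    collection_mode: str    # how the entity object collected the data
    attributes: dict        # attribute name -> (attribute kind, value)
    collected_at: datetime  # start of collection
    uploaded_at: datetime   # moment of upload to the system

    def upload_span(self) -> timedelta:
        """Time span covered by the uploading timestamp."""
        return self.uploaded_at - self.collected_at


def expected_span(frequency_hz: float) -> timedelta:
    """The span is generally the inverse of the collection frequency."""
    return timedelta(seconds=1.0 / frequency_hz)


# Assumed sample values for a raindrop sensor sampling every 2 seconds.
record = PendingRecord(
    source_object="raindrop sensor",
    collection_mode="timed measurement and calculation",
    attributes={
        "is_raining": ("binary", True),
        "drop_size_mm": ("numerical", 2.5),
        "drop_density": ("numerical", 120.0),
    },
    collected_at=datetime(2023, 12, 1, 8, 0, 0),
    uploaded_at=datetime(2023, 12, 1, 8, 0, 2),
)
```

Here record.upload_span() and expected_span(0.5) both describe a two-second span, which matches the statement that the span is the inverse of the collection frequency.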
In this embodiment, acquiring the data source, the data attribute and the uploading timestamp of the data to be processed makes it possible to obtain a more comprehensive description of the data features, which helps the subsequent classification and tag determination of the data to be processed.
Step S20, determining, according to the data source and based on a preset tag system, the tag level to which the data to be processed belongs.
In this embodiment, the tag level to which the data to be processed belongs is first preliminarily determined from the preset tag system by means of the data source. The preset tag system is a tag system set in advance for the current digital twin system and can be understood as a set of hierarchical tag structures for classifying and organizing the data, models or entities in the digital twin system. It is generally composed of a plurality of tags that have hierarchical relationships or connections with one another. The tag system may include a plurality of tag levels; each tag level is formed by a plurality of tags at the same level, and each of those tags has a plurality of corresponding sub-tags. For example, in a digital twin system for an intelligent building, a tag system may be established to describe the various parts of the building. The tags forming the first tag level include floor, room type, device type and so on, and each tag has several sub-tags: the floor tag includes floor serial number tags, the room type tag includes conference room, office, tea room and so on, and the device type tag includes office equipment, fire protection equipment, lighting equipment and so on. With the preset tag system, the various data and information in the building can be conveniently classified and managed.
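As a rough illustration of the intelligent-building example, a two-level tag system could be held in a nested mapping. The names TAG_SYSTEM, sub_tags and parent_of, and the concrete floor entries, are assumptions made for this sketch; the patent does not prescribe a storage format.

```python
# Hypothetical preset tag system for the intelligent-building example.
# Top-level keys form the first tag level; each maps to its sub-tags.
TAG_SYSTEM = {
    "floor": ["floor 1", "floor 2", "floor 3"],
    "room type": ["conference room", "office", "tea room"],
    "device type": [
        "office equipment",
        "fire protection equipment",
        "lighting equipment",
    ],
}


def sub_tags(tag: str) -> list:
    """Return the sub-tags under a first-level tag, or [] if the tag is unknown."""
    return list(TAG_SYSTEM.get(tag, []))


def parent_of(sub_tag: str):
    """Return the first-level tag that owns a sub-tag, or None if not found."""
    for tag, children in TAG_SYSTEM.items():
        if sub_tag in children:
            return tag
    return None
```

A lookup such as parent_of("office") returns "room type", which is the kind of parent and child relationship that the hierarchical model described next relies on.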
Further, when the tag level is preliminarily determined from the preset tag system using the data source, the classification can be performed by a hierarchical multi-tag classification model. Before that, a training data set is labeled according to the tag levels of the digital twin system, and the model parameters are learned from the labeled training data set by a training algorithm. As mentioned above, the data source includes information about the entity object that uploads the data to be processed and the manner in which the entity object collects it. A training sample set is constructed by collecting all entity objects in the physical scene that can upload data together with the data they upload, and the hierarchical multi-tag classification model is constructed from the preset tag system. In the hierarchical multi-tag classification task, the tag set built from the tag system of the digital twin system is first organized into a tag hierarchy topology, which may be a tree structure or a graph structure: each tag is a node, and each edge represents a hierarchical relationship between tags. The hierarchical multi-tag classification model must consider both which specific tags apply to the data and the hierarchical relationships among those tags. The uploaded data undergoes feature extraction and preprocessing and is converted into a numerical representation that a machine learning model can process, such as a vector or a matrix.
In hierarchical multi-tag classification, for new unlabeled data (that is, the data to be processed), the data source features are extracted and the trained model matches them against the nodes of the tag hierarchy topology, which completes the prediction of the tag level. The prediction result may be a probability value or a binary classification result for each tag level. According to the hierarchical relationships between tags, the prediction result is passed from an upper-level tag to its lower-level tags, so that the data type can be determined from the sub-tags under the current tag level. For example, if an upper-level tag is predicted to be the positive class, all of its sub-tags are also predicted to be the positive class; this transfer may be based on a probability threshold or on other transfer rules.
During this tag transfer, either threshold delivery or rule delivery can be selected according to the characteristics of the data to be processed. Threshold delivery may be used when the tags of the data to be processed are independent of each other and have no clearly defined hierarchy. In threshold delivery, the prediction result of each tag carries a probability value indicating how likely the tag is to be the positive class. A probability threshold can be set; when the probability value of an upper-level tag exceeds the threshold, all of its sub-tags are also predicted to be the positive class, which indicates that the target sub-tag of the current data lies within that tag level. Rule delivery may be used when a clear hierarchy exists among the tags of the data to be processed and certain constraints must be satisfied. In this case, the prediction result of the upper tag level affects the prediction results of the lower-level tags. For example, in a multi-level classification task, an upper-level tag is considered positive only when all of its sub-tags appear in the prediction result. This means the data to be processed has the features corresponding to all sub-tags in the current tag level, and more than one target sub-tag will be determined subsequently.
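The two delivery modes can be contrasted with a minimal sketch. The HIERARCHY mapping, the function names and the default threshold of 0.5 are assumptions chosen for illustration; the patent does not fix a threshold value or a data structure.

```python
# Hypothetical tag hierarchy: upper-level tag -> its sub-tags.
HIERARCHY = {
    "device type": ["lighting equipment", "fire protection equipment"],
}


def threshold_delivery(parent_probs: dict, threshold: float = 0.5) -> set:
    """Mark all sub-tags positive when their parent's probability exceeds the threshold."""
    positive = set()
    for parent, prob in parent_probs.items():
        if prob > threshold:
            positive.add(parent)
            positive.update(HIERARCHY.get(parent, []))
    return positive


def rule_delivery(parent: str, predicted_sub_tags: set) -> bool:
    """Treat the parent as positive only if every one of its sub-tags was predicted."""
    children = HIERARCHY.get(parent, [])
    return bool(children) and all(child in predicted_sub_tags for child in children)
```

With threshold delivery, a probability of 0.8 for "device type" marks both sub-tags as candidates, while rule delivery accepts "device type" only when both sub-tags already appear in the prediction.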
It should be noted that before the preset tag system is used to find the tag level to which the data to be processed belongs, tag rules need to be set according to the data sources of the digital twin system and the data processing logic of each functional module, and the tag system of the current digital twin system is then created from the tag rules and the data sources. Tag rules are the specific rules and logic used in a digital twin system to classify and mark data or entity devices. They may include, but are not limited to, condition judgments, classification logic and marking methods, and they assign data to the corresponding categories or tags according to its features or attributes. For example, in a digital twin system for manufacturing, tag rules may be formulated to classify the state of a device as normal, warning, failure and so on according to operating parameters such as temperature, pressure and vibration, so that the device can be monitored and managed effectively. The tag rules carry out the specific classification and marking logic, while the tag system provides the structured framework for classification and marking.
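A tag rule of the kind described for the manufacturing example might look like the sketch below. The function name equipment_state, the units and every numeric limit are illustrative assumptions only; real limits would come from the equipment and the functional module's data processing logic.

```python
def equipment_state(temperature_c: float, pressure_kpa: float, vibration_mm_s: float) -> str:
    """Classify equipment state from operating parameters.

    Each entry holds (measured value, warning limit, failure limit).
    All limits are assumed values for illustration.
    """
    limits = {
        "temperature": (temperature_c, 70.0, 90.0),
        "pressure": (pressure_kpa, 500.0, 650.0),
        "vibration": (vibration_mm_s, 4.5, 7.1),
    }
    # The most severe condition wins: check failure limits first.
    if any(value >= fail for value, _warn, fail in limits.values()):
        return "failure"
    if any(value >= warn for value, warn, _fail in limits.values()):
        return "warning"
    return "normal"
```

The returned string plays the role of the tag assigned by the rule, and the tag system decides where that tag sits in the hierarchy.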
In this embodiment, the data source is used to preliminarily lock the tag level of the data to be processed. The hierarchical multi-tag classification model can thereby take better account of the correlations and dependencies between tags, which makes it easier to determine the sub-tag under the current tag level to which the data actually belongs. The data to be processed can then be finely managed whether it is text, image, biological information or sensor data.
Step S30, determining a target sub-tag corresponding to the data to be processed under the tag level according to the data attribute and the data features corresponding to the uploading timestamp.
The tag level to which the data to be processed belongs was determined in the previous step; the main purpose of this step is to determine the target sub-tag corresponding to the data to be processed under that tag level. Data features of the data to be processed can be extracted from the data attributes and the uploading timestamp. As described above, one piece of data to be processed can contain several data attributes (nominal, binary, ordinal, numerical, discrete, continuous and so on), and the data features extracted from the data attributes are used to determine which attributes the data has. The uploading timestamp serves as a feature that represents the temporal nature of the data, such as its creation time, update time or the time at which an event occurred. From the timestamp, time-related features can be extracted, such as the season, weekday or weekend, and day or night of the upload, as well as the distribution and trend of the data over different periods. These features can be used to analyze the trend and periodicity of the data over time, providing more information and clues for further analysis and application of the data.
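The time-related features mentioned above can be derived with the standard library. This is a minimal sketch under stated assumptions: the function name time_features, the northern-hemisphere meteorological seasons and the 06:00 to 18:00 definition of daytime are choices made for this example, not values given in the patent.

```python
from datetime import datetime


def time_features(uploaded_at: datetime) -> dict:
    """Derive coarse calendar features from an uploading timestamp."""
    # Assumption: northern-hemisphere meteorological seasons
    # (Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn).
    seasons = ["winter", "spring", "summer", "autumn"]
    season = seasons[uploaded_at.month % 12 // 3]
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    day_type = "weekend" if uploaded_at.weekday() >= 5 else "weekday"
    # Assumption: daytime runs from 06:00 up to, but not including, 18:00.
    period = "day" if 6 <= uploaded_at.hour < 18 else "night"
    return {"season": season, "day_type": day_type, "period": period}
```

The resulting dictionary can be rendered as text, which fits the text-based feature representation described in the following paragraph.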
The data features are first represented as text content and then converted into corresponding feature sequences. The reason is that the data uploaded to the digital twin system comes in many formats, and creating a separate tag classification model for each format would be difficult to implement. Converting the task into text classification, which is easier to handle, makes the identification of the target sub-tag simpler and more accurate.
The target sub-tag corresponding to the data to be processed under the current tag level can then be determined from the data features of the data attributes and the uploading timestamp. A tag classification model that has achieved the required performance on the training set can be used to perform the target sub-tag classification task. Based on the tag delivery result of the previous step, the tag classification model can determine which sub-tags are included under the tag level, and the subsequent marking is performed over those sub-tags. The text content representing the data features is converted into a text-form feature sequence, which is brought to a uniform dimension by processing such as padding and then input into the tag classification model. The tag classification model classifies the feature sequence and determines the target sub-tag corresponding to the data to be processed. Optionally, the tag classification model may be BERT, RoBERTa, ERNIE or the like; the modeling principle of the tag classification model of the present invention is described below using the BERT model as an example.
Since text classification involves only one sequence, constructing the data set only requires building the token sequence corresponding to the original text, adding a [CLS] symbol at the head and a [SEP] symbol at the tail as input, and filling the remaining positions with 0. The [CLS] symbol represents the whole sequence for classification or sentence-level tasks; after fine-tuning on the corresponding downstream task, its hidden state can be used for tasks such as text classification and sentence relation judgment. The [SEP] symbol separates different sentences or text fragments. In multi-sentence tasks, several sentences can be concatenated with [SEP] so that the model can distinguish them, and its presence helps the model learn the relationships between sentences during pre-training. Through sample augmentation of the data set, about 500 samples are guaranteed under each tag. The data is then split into a training set, a test set and a validation set in the ratio 0.9:0.05:0.05. Next, the data set is batched and the original data samples are tokenized, and a dictionary is constructed from the tokenization result. The dictionary can also be built directly from an open-source vocab.txt file, in which the index of each token corresponds one-to-one to that token's embedding vector in the model. Further, the tokenized text sequence is converted into a token ID sequence according to the dictionary, [CLS] and [SEP] symbols are added at the head and tail of the sequence, and padding is applied, that is, the empty positions in the sequence are filled with 0 and a corresponding padding mask vector is generated.
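The input construction described above can be sketched in plain Python. The toy VOCAB and its IDs are hypothetical stand-ins for a real vocab.txt (where each token's line index is its ID), and the helper names encode and split_counts are assumptions for this example.

```python
PAD_ID = 0

# Hypothetical toy vocabulary; a real one would be read from vocab.txt.
VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "rain": 4, "sensor": 5, "binary": 6,
}


def encode(tokens, max_len):
    """Map tokens to IDs, wrap with [CLS]/[SEP], pad with 0 and build the padding mask.

    Assumes max_len >= 2 so that both special symbols fit.
    """
    # Truncate so that [CLS] and [SEP] still fit within max_len.
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens[: max_len - 2]]
    ids = [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]
    pad = max_len - len(ids)
    mask = [1] * len(ids) + [0] * pad   # 1 marks a real token, 0 marks padding
    return ids + [PAD_ID] * pad, mask


def split_counts(n, ratios=(0.9, 0.05, 0.05)):
    """Sizes of the training, test and validation sets for n samples."""
    n_test = round(n * ratios[1])
    n_val = round(n * ratios[2])
    return n - n_test - n_val, n_test, n_val
```

For a tag with 500 samples, split_counts(500) yields 450 training, 25 test and 25 validation samples, consistent with the 0.9:0.05:0.05 ratio.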
Finally, during model training, only the token sequence and the padding mask vector need to be fed to the BERT model. During training, model evaluation indexes also need to be selected according to actual requirements to ensure the effectiveness of the model. The evaluation indexes can be precision, recall, F1 value and the like: precision is the proportion of samples predicted as positive that are actually positive, recall is the proportion of actually positive samples that the model predicts as positive, and the F1 value is the harmonic mean of precision and recall, which evaluates the two together. The trained tag classification model can provide services in the form of an interface.
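The three evaluation indexes follow directly from the counts of true positives, false positives and false negatives. The sketch below computes them for a single positive class; the function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For a multi-tag setting these values would typically be computed per tag and then averaged, but the averaging scheme is not specified in the source.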
In this embodiment, by performing the above steps, a tag classification model for performing a data classification task of the digital twin system may be constructed, the model providing services in an interface form, and generating a tag result according to data uploaded by a data source by connecting the data source in the digital twin system.
Step S40, determining the data type to which the data to be processed belongs in the digital twin system based on the target sub-tag.
In this embodiment, the data type of the data to be processed may be determined from the target sub-tag. The target sub-tag may include information about the source, format, structure, content and so on of the data. Based on the attributes or fields of the target sub-tag carried by the data, the data type to which the data to be processed belongs in the digital twin system can be preliminarily determined. The accuracy and integrity of the preliminarily determined data type may then be verified. Different data types have different verification rules, through which the correctness and integrity of the data can be confirmed. When verification passes, the data are integrated, stored and processed according to the processing requirements of the digital twin system.
Specifically, for numerical data, the check rule is to verify whether the data is of a numerical type and whether it lies within a reasonable range, for example checking whether a temperature is a real number or whether a speed is a positive number. For text data, the check rule is to verify whether the data is of text type and whether its length meets the requirements, for example checking whether the name field is a character string or whether the description field does not exceed a specified number of characters. For enumeration data, the check rule is to verify whether the data belongs to a predefined list of enumeration values, for example verifying whether the device status is one of "normal", "failure" or "shutdown". For boolean data, the check rule is to verify whether the data is boolean, that is, only True or False. The integrity of the data to be processed is checked against these rules: for each data type, the data is checked item by item for conformance with the corresponding check rule. If the data does not satisfy a check rule, it can be marked as abnormal or missing, and the corresponding information is recorded for subsequent processing. After the data to be processed passes verification, it is integrated and stored according to the processing requirements of the digital twin system. An appropriate storage means, such as a relational database, a NoSQL database or a file system, is selected according to the nature and usage scenario of the data, and the structure and consistency of the data are ensured for subsequent data analysis and modeling.
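The check rules above can be sketched as small per-type validators. This is a minimal sketch under stated assumptions: the field names, the 200-character limit and the `RULES` table are hypothetical and only mirror the examples given in the text.

```python
import math
from typing import Any, Callable, Dict, Optional, Sequence


def check_numeric(value: Any, low: Optional[float] = None,
                  high: Optional[float] = None) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if isinstance(value, float) and not math.isfinite(value):
        return False
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def check_text(value: Any, max_len: int) -> bool:
    return isinstance(value, str) and len(value) <= max_len


def check_enum(value: Any, allowed: Sequence[str]) -> bool:
    return value in allowed


def check_bool(value: Any) -> bool:
    return isinstance(value, bool)


# Hypothetical rule table: one check rule per field.
RULES: Dict[str, Callable[[Any], bool]] = {
    "temperature": check_numeric,
    "speed": lambda v: check_numeric(v) and v > 0,
    "name": lambda v: isinstance(v, str),
    "description": lambda v: check_text(v, max_len=200),
    "device_status": lambda v: check_enum(v, ("normal", "failure", "shutdown")),
    "online": check_bool,
}


def validate(record: Dict[str, Any]) -> Dict[str, str]:
    """Check a record item by item; return the fields marked missing or abnormal."""
    problems: Dict[str, str] = {}
    for field_name, rule in RULES.items():
        if field_name not in record:
            problems[field_name] = "missing"
        elif not rule(record[field_name]):
            problems[field_name] = "abnormal"
    return problems


record = {"temperature": 21.5, "speed": -3, "name": "pump-1",
          "description": "ok", "device_status": "idle"}
print(validate(record))
# {'speed': 'abnormal', 'device_status': 'abnormal', 'online': 'missing'}
```

An empty result means the record passed every rule and can be handed on for integration and storage.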
In this embodiment, by using tag classification, the data in the digital twin system can be managed at a fine granularity, management differences between structured data and unstructured data are eliminated, and subsequent operations such as storing the data and accessing it into functional modules are facilitated.
Further, referring to fig. 2, a second embodiment of the tag-based data processing method of the present invention, before step S20, further includes the following steps:
Step S50, acquiring a data source of the digital twin system and data processing logic of each functional module.
Step S60, setting a label rule according to the data source and the data processing logic.
Step S70, creating the preset tag system of the digital twin system according to the tag rule and the data source.
In this embodiment, the data source of the digital twin system may be real-time data collected by the sensor, historical data generated by the device, information in an external database, and so on. The functional module of the digital twin system at least comprises the aspects of data acquisition, data processing, modeling simulation, analysis prediction, decision support and the like. The functional module can acquire data from each data source, further process the data according to modeling requirements after necessary data cleaning, denoising, conversion and storage, establish a mathematical model corresponding to the real scene, and simulate by using the mathematical model to simulate the running state, response and change trend of each part of the real scene. Therefore, when a corresponding tag system is formulated for the digital twin system, the tag rule needs to be set according to the data source and the data processing logic, so that the tag system of the current digital twin system can be established according to the tag rule and the data source, wherein the tag rule is used for defining the meaning represented by the tag.
Optionally, as a possible implementation, when the tag rules are formulated according to the data source and the data processing logic, source tags may be set according to the type of the data source: real-time data collected by sensors may be given tags such as "real-time data" and "sensor data", historical data generated by devices may be given tags such as "historical data" and "device data", and information in an external database may be given tags such as "external database" and "database information". Data processing tag rules are set according to the data processing logic, with tags such as "data preprocessing", "data integration" and "modeling data preparation", while data generated by modeling and simulation may be tagged "derived data", "simulation result" and the like.
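One way to picture such tag rules is as two lookup tables, one keyed by data source type and one keyed by processing stage. The sketch below is illustrative only: the dictionary keys, the function name `tags_for` and the grouping of tags under each stage are assumptions, while the tag strings themselves come from the examples above.

```python
from typing import Dict, List, Optional

# Tags assigned according to the type of data source.
SOURCE_TAG_RULES: Dict[str, List[str]] = {
    "sensor": ["real-time data", "sensor data"],
    "device": ["historical data", "device data"],
    "external_database": ["external database", "database information"],
}

# Tags assigned according to the data processing logic (stage keys are hypothetical).
PROCESSING_TAG_RULES: Dict[str, List[str]] = {
    "preprocessing": ["data preprocessing"],
    "integration": ["data integration"],
    "model_preparation": ["modeling data preparation"],
    "simulation": ["derived data", "simulation result"],
}


def tags_for(source_type: str, processing_stage: Optional[str] = None) -> List[str]:
    """Collect the tags that the rules assign to a source type and optional stage."""
    tags = list(SOURCE_TAG_RULES.get(source_type, []))
    if processing_stage is not None:
        tags += PROCESSING_TAG_RULES.get(processing_stage, [])
    return tags


print(tags_for("sensor", "preprocessing"))
# ['real-time data', 'sensor data', 'data preprocessing']
```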
Further, when the tag system is formulated, a tag vocabulary is created that contains all the tags available under the tag rules, so that the tag model's understanding of the tags is unified and consistent. A tag application guide is written according to the previously defined tag rules; it describes how to correctly select and apply tags in different situations, how tags may be used in combination, and the mapping relations among the tags of each level. The tag model is then trained according to the application guide so that the model is fully adapted to the classification and use of the tags. Meanwhile, feedback from training the model on the tag system is collected, and the tag system is adjusted and optimized where necessary.
In the embodiment, by setting corresponding tag systems for different data sources and data processing links, the data in the digital twin system can be better organized and managed, and subsequent modeling, analysis and decision support are facilitated. When the specific type of data is required to be searched or used, the data can be searched and screened according to the labels, so that the availability and the query efficiency of the data are improved.
Further, referring to fig. 3, a third embodiment of the tag-based data processing method according to the present invention further includes the following steps after step S40:
Step S80, based on the data type, accessing the data to be processed into a corresponding functional module in the digital twin system.
And step S90, constructing and executing a simulation task based on the data processing logic of the functional module and the data to be processed.
In this embodiment, once the data type has been determined, the data to be processed may be accessed into the corresponding functional module in the digital twin system, and the functional module constructs and executes the corresponding simulation task according to its own data processing logic. It should be noted that an association relationship needs to be established in advance between the functional modules and the tags, that is, the types of data each functional module needs to access must be marked beforehand, so that when data to be processed arrives, it can be accessed into the corresponding functional module according to the association relationship. One functional module may be associated with a plurality of tags, and one tag may be associated with a plurality of functional modules. Because digital twins are widely applied, two application scenarios are described below to aid understanding of the technical solution of the present invention.
Taking the digital twin system of a logistics warehouse as an example, the system can monitor the position and state of equipment, goods and personnel in the warehouse in real time, enabling overall tracking and monitoring of the logistics process, and its simulation and optimization functions can optimize the warehouse layout, equipment scheduling and resource utilization, thereby improving logistics efficiency and reducing cost. Suppose that, after tag classification, the level tags matched by three batches of newly uploaded data to be processed are (warehouse goods, warehouse goods area, goods quantity), (staff, personnel information, working condition) and (freight mode, freight channel, freight tool) respectively. From these tags it can be determined that the functional module to be accessed is the planning module for logistics warehouse capacity, freight capacity and operation flow. After the data are accessed, the planning module can construct simulation tasks for warehouse capacity, freight capacity and operation flow based on historical data and the newly uploaded data, and perform predictive analysis by creating a prediction model, which helps plan the capacity, freight capacity and operation flow of the logistics warehouse to meet future demand. It should be noted that once the simulation task has been constructed, subsequent predictions can access the prediction model directly without rebuilding it.
In addition, the digital twin system also includes a functional module for monitoring equipment state, which uses the classified and uploaded equipment operation data to detect and diagnose potential equipment faults in time and to provide maintenance suggestions, reducing the downtime caused by equipment faults.
Optionally, in another possible implementation, taking the digital twin system of a parking lot as an example, the system can monitor the number of vehicles in the parking lot, the occupancy of the parking spaces and the state of the parking lot equipment in real time, so as to realize comprehensive monitoring and management of the parking lot. The digital twin system comprises five functional modules: parking space navigation and guidance, payment settlement, reservation and prediction, safety monitoring and alarm, and data analysis and decision support. Parking space navigation and guidance displays real-time parking space information to users, helps them quickly find available parking spaces, and provides vehicle navigation guidance. Payment settlement supports automatic calculation and online payment of the parking fee, so that users can settle the fee through the system, which provides a convenient payment mode and reduces manual operations during parking. Reservation and prediction predicts and analyzes parking demand based on historical and real-time data, helps users reserve parking spaces, reduces the time spent searching for a space, and allows parking lot resources to be planned in advance. Safety monitoring and alarm mainly combines monitoring cameras and sensor technology to monitor the safety of the parking lot; when an abnormal condition (such as intrusion or fire) occurs, the system can raise an alarm and notify the relevant personnel. Data analysis and decision support collects and analyzes parking lot data to help managers make data-driven decisions, optimize the parking space layout, and improve parking efficiency and user satisfaction.
In this case, after tag classification, the level tags matched by the two batches of uploaded data to be processed are (sensor, temperature sensor) and (monitoring video, monitoring area) respectively, and the association relationship between tags and functional modules confirms that the data should be uploaded to the safety monitoring and alarm functional module.
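The many-to-many association between tags and functional modules can be sketched as a mapping from each module to the set of tags it accepts. The matching rule used here (a batch is routed to every module whose tag set contains all of the batch's level tags), the second module's tag set and the tag "parking space occupancy" are assumptions for illustration; the text does not specify them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical association table: one module may accept many tags,
# and one tag (here "sensor") may be associated with several modules.
MODULE_TAGS: Dict[str, Set[str]] = {
    "safety monitoring and alarm": {
        "sensor", "temperature sensor", "monitoring video", "monitoring area",
    },
    "data analysis and decision support": {"sensor", "parking space occupancy"},
}


def route(batch_tags: Tuple[str, ...]) -> List[str]:
    """Return the modules whose associated tags cover every tag of the batch."""
    wanted = set(batch_tags)
    return sorted(name for name, tags in MODULE_TAGS.items() if wanted <= tags)


print(route(("sensor", "temperature sensor")))
# ['safety monitoring and alarm']
print(route(("monitoring video", "monitoring area")))
# ['safety monitoring and alarm']
```

Under this assumed rule, both batches from the parking lot example reach the safety monitoring and alarm module, while a batch tagged only "sensor" would reach both modules.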
In this embodiment, by accessing the data with the determined data type into the functional module in the digital twin system, it can be ensured that the digital twin system can accurately simulate the system running state in the real world, and provide reliable support for the decision maker.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a tag-based data processing device in the hardware running environment according to an embodiment of the present invention.
As shown in fig. 4, the tag-based data processing apparatus may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
It will be appreciated by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the tag-based data processing apparatus, and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
As shown in fig. 4, an operating system, a data storage module, a network communication module, a user interface module, and a tag-based data processing program may be included in the memory 1005 as one type of storage medium.
In the tag-based data processing device shown in fig. 4, the network interface 1004 is mainly used for data communication with other devices; the user interface 1003 is mainly used for data interaction with a user; the processor 1001, the memory 1005 in the tag-based data processing apparatus of the present invention may be provided in a tag-based data processing apparatus which calls a tag-based data processing program stored in the memory 1005 through the processor 1001 and performs the steps of:
acquiring a data source, a data attribute and an uploading time stamp of data to be processed;
determining a label level to which the data to be processed belongs based on a preset label system according to the data source;
determining a target sub-tag corresponding to the data to be processed under the tag level according to the data attribute and the data characteristic corresponding to the uploading time stamp;
and determining the data type of the data to be processed in the digital twin system based on the target sub-tag.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
acquiring a data source of the digital twin system and data processing logic of each functional module;
setting a tag rule according to the data source and the data processing logic;
and creating the preset tag system of the digital twin system according to the tag rule and the data source.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
creating a corresponding label hierarchical topological structure according to the preset label system;
constructing a hierarchical multi-label classification model based on the label hierarchical topology structure;
extracting data source characteristics of the data to be processed according to the data source;
and predicting the label level to which the data to be processed belongs according to the data source characteristics based on the level multi-label classification model.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
matching the data source features with nodes in the label hierarchical topology based on the hierarchical multi-label classification model;
and determining the label level to which the data to be processed belongs according to the probability value or the binary classification result of whether the source characteristic belongs to the node.
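The node-matching step can be pictured as a top-down walk over the label hierarchy in which a child node is explored only when the probability that the data source features belong to it reaches a threshold. The sketch below is an assumption-laden illustration, not the model described in the embodiments: the `TagNode` class, the stub scoring functions, the feature names and the 0.5 threshold are all hypothetical stand-ins for a trained hierarchical multi-label classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class TagNode:
    name: str
    # Returns the probability that the features belong to this node.
    score: Callable[[Features], float]
    children: List["TagNode"] = field(default_factory=list)


def predict_tags(node: TagNode, features: Features, threshold: float = 0.5,
                 prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return every root-to-node tag path whose nodes all pass the threshold."""
    paths: List[Tuple[str, ...]] = []
    for child in node.children:
        # Binary decision per node: accept if the probability is high enough.
        if child.score(features) >= threshold:
            path = prefix + (child.name,)
            deeper = predict_tags(child, features, threshold, path)
            paths.extend(deeper if deeper else [path])
    return paths


# Stub scorers read precomputed probabilities from the feature dictionary.
root = TagNode("root", lambda f: 1.0, [
    TagNode("warehouse goods", lambda f: f.get("is_inventory", 0.0), [
        TagNode("warehouse goods area", lambda f: f.get("has_area", 0.0)),
    ]),
    TagNode("staff", lambda f: f.get("is_personnel", 0.0)),
])

print(predict_tags(root, {"is_inventory": 0.9, "has_area": 0.8, "is_personnel": 0.1}))
# [('warehouse goods', 'warehouse goods area')]
```

Because more than one child may pass the threshold at any level, the function can return several paths, which is what makes the prediction multi-label.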
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
converting the data attribute and the data characteristic corresponding to the uploading timestamp into a corresponding characteristic vector;
after adding a preset identifier to the feature vector and performing filling processing, inputting the feature vector into a tag classification model;
and determining a classification result of the data to be processed under the label level based on the label classification model and the feature vector so as to determine a target sub-label corresponding to the data to be processed.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
setting model evaluation indexes and labeling a training sample set according to a label system of the current digital twin system;
training an initial tag classification model based on the training sample set;
based on the model evaluation index, evaluating the model parameters of the initial label classification model after training;
and when the model parameters meet the model evaluation indexes, storing the model parameters to obtain the label classification model.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
determining a verification rule of the data to be processed according to the data type;
checking the integrity of the data to be processed based on the checking rule;
and after the data to be processed passes the verification, integrating and storing the data to be processed according to the processing requirement of the digital twin system.
Further, the tag-based data processing apparatus calls the tag-based data processing program stored in the memory 1005 through the processor 1001, and performs the following steps:
based on the data type, accessing the data to be processed into a corresponding functional module in the twin system;
and constructing and executing a simulation task based on the data processing logic of the functional module and the data to be processed.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of embodiments, it will be clear to a person skilled in the art that the above embodiment method may be implemented by means of software plus a necessary general hardware platform, but may of course also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A tag-based data processing method, characterized in that the tag-based data processing method comprises the steps of:
acquiring a data source, a data attribute and an uploading time stamp of data to be processed;
determining a label level to which the data to be processed belongs based on a preset label system according to the data source;
determining a target sub-tag corresponding to the data to be processed under the tag level according to the data attribute and the data characteristic corresponding to the uploading time stamp;
and determining the data type of the data to be processed in the digital twin system based on the target sub-tag.
2. The tag-based data processing method according to claim 1, wherein before the step of determining a tag hierarchy to which the data to be processed belongs based on a preset tag hierarchy according to the data source, the method further comprises:
Acquiring a data source of the digital twin system and data processing logic of each functional module;
setting a tag rule according to the data source and the data processing logic;
and creating the preset tag system of the digital twin system according to the tag rule and the data source.
3. The tag-based data processing method of claim 1, wherein the step of determining a tag hierarchy to which the data to be processed belongs based on a preset tag hierarchy according to the data source comprises:
creating a corresponding label hierarchical topological structure according to the preset label system;
constructing a hierarchical multi-label classification model based on the label hierarchical topology structure;
extracting data source characteristics of the data to be processed according to the data source;
and predicting the label level to which the data to be processed belongs according to the data source characteristics based on the level multi-label classification model.
4. A tag-based data processing method according to claim 3, wherein the step of predicting the tag hierarchy to which the data to be processed belongs from the data source characteristics based on the hierarchical multi-tag classification model comprises:
Matching the data source features with nodes in the label hierarchical topology based on the hierarchical multi-label classification model;
and determining the label level to which the data to be processed belongs according to the probability value or the binary classification result of whether the source characteristic belongs to the node.
5. The tag-based data processing method according to claim 1, wherein the step of determining a target sub-tag corresponding to the data to be processed under the tag level according to the data attribute and the data feature corresponding to the uploading timestamp comprises:
converting the data attribute and the data characteristic corresponding to the uploading timestamp into a corresponding characteristic vector;
after adding a preset identifier to the feature vector and performing filling processing, inputting the feature vector into a tag classification model;
and determining a classification result of the data to be processed under the label level based on the label classification model and the feature vector so as to determine a target sub-label corresponding to the data to be processed.
6. The tag-based data processing method of claim 5, wherein prior to the step of converting the data characteristic corresponding to the uploading timestamp based on the data attribute to a corresponding characteristic vector, further comprising:
Setting model evaluation indexes and labeling a training sample set according to a label system of the current digital twin system;
training an initial tag classification model based on the training sample set;
based on the model evaluation index, evaluating the model parameters of the initial label classification model after training;
and when the model parameters meet the model evaluation indexes, storing the model parameters to obtain the label classification model.
7. The tag-based data processing method of claim 1, wherein after the step of determining the data type to which the data to be processed belongs in the digital twin system based on the target sub-tag, further comprising:
determining a verification rule of the data to be processed according to the data type;
checking the integrity of the data to be processed based on the checking rule;
and after the data to be processed passes the verification, integrating and storing the data to be processed according to the processing requirement of the digital twin system.
8. The tag-based data processing method of claim 1, wherein after the step of determining the data type to which the data to be processed belongs in the digital twin system based on the target sub-tag, further comprising:
Based on the data type, accessing the data to be processed into a corresponding functional module in the twin system;
and constructing and executing a simulation task based on the data processing logic of the functional module and the data to be processed.
9. A tag-based data processing apparatus, the tag-based data processing apparatus comprising: a memory, a processor and a tag-based data handler stored on the memory and executable on the processor, the tag-based data handler being configured to implement the steps of the tag-based data processing method of any one of claims 1 to 8.
10. A computer-readable storage medium, on which a tag-based data processing program is stored, which, when being executed by a processor, implements the steps of the tag-based data processing method according to any one of claims 1 to 8.
CN202311757112.7A 2023-12-20 2023-12-20 Tag-based data processing method, device and computer-readable storage medium Active CN117436444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311757112.7A CN117436444B (en) 2023-12-20 2023-12-20 Tag-based data processing method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311757112.7A CN117436444B (en) 2023-12-20 2023-12-20 Tag-based data processing method, device and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN117436444A true CN117436444A (en) 2024-01-23
CN117436444B CN117436444B (en) 2024-04-02

Family

ID=89558611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311757112.7A Active CN117436444B (en) 2023-12-20 2023-12-20 Tag-based data processing method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN117436444B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301111A (en) * 2021-04-09 2021-08-24 厦门攸信信息技术有限公司 Digital twinning method, edge computing device, mobile terminal and storage medium
CN114782588A (en) * 2022-06-23 2022-07-22 四川见山科技有限责任公司 Real-time drawing method and system for road names in digital twin city
CN115237923A (en) * 2022-08-03 2022-10-25 常州纺织服装职业技术学院 Digital database storage method based on cloud platform
CN115311027A (en) * 2022-10-11 2022-11-08 工业云制造(四川)创新中心有限公司 Supply chain management method and system based on digital twin
US20230058169A1 (en) * 2020-04-28 2023-02-23 Strong Force Tp Portfolio 2022, Llc System for representing attributes in a transportation system digital twin
CN116089886A (en) * 2023-02-17 2023-05-09 中国工商银行股份有限公司 Information processing method, device, equipment and storage medium
CN116187771A (en) * 2022-11-16 2023-05-30 中国电建集团华东勘测设计研究院有限公司 Scene flood control plan management system and method based on structural decomposition
CN116414089A (en) * 2019-02-14 2023-07-11 罗克韦尔自动化技术公司 AI extension and smart model validation for industrial digital twinning
CN117034582A (en) * 2023-07-28 2023-11-10 广州明珞装备股份有限公司 Digital twin modeling method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN117436444B (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant