CN106933826B - Data preprocessing method and device - Google Patents


Info

Publication number
CN106933826B
Authority
CN
China
Prior art keywords
message
cache
processor
data
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201511017135.XA
Other languages
Chinese (zh)
Other versions
CN106933826A (en)
Inventor
占义忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201511017135.XA priority Critical patent/CN106933826B/en
Priority to PCT/CN2016/085161 priority patent/WO2016197924A1/en
Publication of CN106933826A publication Critical patent/CN106933826A/en
Application granted granted Critical
Publication of CN106933826B publication Critical patent/CN106933826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Devices For Executing Special Programs (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a data preprocessing method comprising: acquiring data to be processed when a data preprocessing instruction is received; mapping the acquired data into an input message corresponding to a preset message model; and performing logic processing on the input message by using a preset processor to obtain an output message, wherein the logic processing comprises extracting field information of the input message and processing and transforming the field information to obtain the output message. The invention also discloses a data preprocessing device. The invention improves the efficiency of data preprocessing and reduces its operating cost.

Description

Data preprocessing method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a data preprocessing method and apparatus.
Background
With the large-scale application of technologies such as computers and the Internet of Things in various fields, the volume of generated data keeps increasing. Because data sources produce data of many types and formats, the data must be preprocessed before the actual processing operations are executed. Because the types and formats, and therefore the structures, of the data differ, different software has to be developed for each kind of data: each type of data is preprocessed separately, and after preprocessing the data are gathered into a processor for subsequent processing. Preprocessing data with separately developed software makes the preprocessing process time-consuming, inefficient and costly to operate.
Disclosure of Invention
The invention mainly aims to provide a data preprocessing method and a data preprocessing device, and aims to solve the technical problems of time consumption, low efficiency and high operation cost of a data preprocessing process in a traditional preprocessing mode.
In order to achieve the above object, the present invention provides a data preprocessing method, which includes the following steps:
when a data preprocessing instruction is received, acquiring data to be processed;
mapping the acquired data into an input message corresponding to a preset message model;
performing logic processing on the input message by using a preset processor to obtain an output message, wherein the performing logic processing on the input message by using the processor comprises: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
Preferably, the step of logically processing the input message by using a preset processor to obtain an output message includes:
determining a type of the input message;
acquiring a message execution stream corresponding to the determined type according to a mapping relation between the preset type and the message execution stream;
determining the processor based on the acquired message execution flow, wherein the determined processor comprises an output message processor;
and performing logic processing on the input message according to the determined processor to obtain an output message.
Preferably, when the determined processor further includes a temporary message processor and/or a cache message processor, the step of performing logic processing on the input message according to the determined processor to obtain an output message includes:
determining attributes corresponding to the temporary message processor and/or the cache message processor;
when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, performing logic processing on the input message according to the temporary message processor to obtain a temporary message, and/or performing logic processing on the input message according to the cache message processor to obtain a cache message;
and performing logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
Preferably, while the temporary message and/or the cache message is being logically processed according to the output message processor to obtain the output message, if a storage instruction for the temporary message is received, the following step is performed:
storing the temporary message into a preset cache region to generate a cache message, so that when a data preprocessing instruction is next received, the cache message in the cache region is used as reference data for a new input message.
Preferably, the data preprocessing method further includes:
deleting part of cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting a cache message in the cache region when the interval between the access time point of that cache message and the current time point reaches a preset time length.
In addition, in order to achieve the above object, the present invention further provides a data preprocessing apparatus, including:
the acquisition module is used for acquiring data to be processed when a data preprocessing instruction is received;
the mapping module is used for mapping the acquired data into an input message corresponding to a preset message model;
a processing module, configured to perform logic processing on the input message by using a preset processor to obtain an output message, where performing logic processing on the input message by the processor includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
Preferably, the processing module comprises:
a first determining submodule for determining a type of the input message;
the obtaining submodule is used for obtaining the message execution flow corresponding to the determined type according to the mapping relation between the preset type and the message execution flow;
a second determining submodule for determining the processor based on the acquired message execution flow, wherein the determined processor includes an output message processor;
and the processing submodule is used for carrying out logic processing on the input message according to the determined processor so as to obtain an output message.
Preferably, when the determined processor further includes a temporary message processor and/or a cache message processor, the processing submodule includes:
the determining unit is used for determining the corresponding attribute of the temporary message processor and/or the cache message processor;
the first processing unit is used for performing logic processing on the input message according to the temporary message processor to obtain a temporary message and/or performing logic processing on the input message according to the cache message processor to obtain a cache message when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute;
and the second processing unit is used for carrying out logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
Preferably, the data preprocessing apparatus further includes:
and the storage unit is used for storing the temporary message into a preset cache region to generate a cache message if a storage instruction of the temporary message is received, so that the cache message in the cache region is used as the referred data of a new input message when a data preprocessing instruction is received next time.
Preferably, the data preprocessing apparatus further includes:
the deleting module is used for deleting part of the cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting a cache message in the cache region when the interval between the access time point of that cache message and the current time point reaches a preset time length.
With the data preprocessing method and device provided by the invention, when a data preprocessing instruction is received, data to be processed are acquired, the acquired data are mapped into an input message corresponding to a preset message model, and a preset processor performs logic processing on the input message to obtain an output message, where the logic processing comprises extracting field information of the input message and processing and transforming the field information to obtain the output message. Data of different types are mapped into input messages corresponding to a message model, that is, the data are structured, so that field information can be extracted from the input message and logically processed to finally obtain the output message. Different software is therefore no longer needed for preprocessing when the types and structures of the data differ.
Drawings
FIG. 1 is a flow chart illustrating a data preprocessing method according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart illustrating a preferred embodiment of the present invention for logically processing the input message by using a predetermined processor to obtain an output message;
FIG. 3 is a flow diagram illustrating a preferred embodiment of the present invention for logically processing the incoming message according to the determined processor;
FIG. 4 is a functional block diagram of a data preprocessing apparatus according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of a refinement function module of the processing module of FIG. 4;
FIG. 6 is a schematic diagram of a refinement function module of the processing submodule of FIG. 5;
FIG. 7 is a schematic diagram of an implementation scenario of the present invention;
FIG. 8 is a diagram illustrating a default message execution flow according to the present invention;
FIG. 9 is a diagram illustrating the height values corresponding to the messages in the message execution flow of FIG. 8.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a data preprocessing method.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data preprocessing method according to a first embodiment of the present invention.
The embodiment provides a data preprocessing method, which includes:
step S10, when a data preprocessing instruction is received, acquiring data to be processed;
In this embodiment, when the data preprocessing instruction is received, the data to be processed are acquired. The data to be processed include data from the data source; in addition, a temporary message or a cache message generated in a previous data preprocessing pass may be obtained and used as reference data for the data to be processed. Further, a temporary message generated in the previous pass may be stored in a preset cache space to become a cache message, and the cache message in the cache region is then used as reference data for the data to be processed in the current pass.
Step S20, mapping the acquired data into an input message corresponding to a preset message model;
In this embodiment, data are abstracted into messages during data preprocessing, and the abstracted messages are then preprocessed. To abstract data into a message, a preset message model is loaded and the data are mapped into an input message corresponding to that model: each piece of field information contained in the data is extracted, the extracted field information is arranged in the form prescribed by the preset message model, and the arranged field information is taken as the input message. Specifically, the input message is composed of a set of consecutive fields, and a field may be of a simple data type or a complex combined data type. The structural content of the message model comprises:
1. Message name and message type. The message name is a character string beginning with a letter and must not contain special characters such as spaces, ',' and '-'. Because message names are referenced by one another within the model, the name must be globally unique and should convey the meaning of the message; the message name also serves as the file name of the model configuration. The message type is the encoding mode of the message model, which may be a fixed-format encoding such as TLV (Type-Length-Value) encoding, or a customized complex encoding.
2. Logical conditional expressions of the message. The model is configured with logic expressions that control the flow of message processing, such as a creation conditional expression processor and a deletion conditional expression processor for cache messages. When the creation conditional expression is satisfied, the message is created and cached; when the deletion conditional expression is satisfied, the corresponding message is deleted from the cache after the flow has been processed.
3. Composition of the message. A message consists of fields.
The message model is usually expressed in XML (Extensible Markup Language), and, because the message model may change dynamically, it may be stored in a relational database. In this embodiment, the input message may be call signaling of a mobile network or a user's Internet access record.
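The mapping of step S20 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names FieldSpec, MessageModel and map_to_input_message, the sample CallSignaling model, and the choice to truncate over-long strings to the field length are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldSpec:
    name: str        # unique within one message model
    type: str        # "int" or "string" in this sketch
    length: int = 0  # for strings: maximum length (0 means unlimited here)


@dataclass
class MessageModel:
    name: str  # globally unique model name
    fields: List[FieldSpec] = field(default_factory=list)


def map_to_input_message(model: MessageModel, record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields named by the model and arrange them in model order."""
    message: Dict[str, Any] = {}
    for spec in model.fields:
        value = record.get(spec.name)
        if value is not None and spec.type == "int":
            value = int(value)
        elif value is not None and spec.type == "string":
            text = str(value)
            # Assumption: over-long strings are truncated to the maximum length.
            value = text[: spec.length] if spec.length else text
        message[spec.name] = value
    return message


# Hypothetical model for a call-signaling record.
call_model = MessageModel(
    name="CallSignaling",
    fields=[FieldSpec("caller", "string", 16), FieldSpec("duration", "int")],
)
raw = {"duration": "42", "caller": "13800000000", "ignored": "x"}
input_message = map_to_input_message(call_model, raw)
```

Source keys that the model does not name are dropped, and the resulting message always lists its fields in the order the model declares them, whatever the order of the raw record.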
The structural content of a field comprises:
1. Field name, field type, field length and the like. The field name is a character string beginning with a letter and must not contain special characters such as spaces, ',' and '-'; it must be unique within each message model and should convey the meaning of the field. The field type may be a basic type such as integer or character string, or a compound type; integers may be further divided into single-byte, double-byte, four-byte, long integer and so on. The field length lies in a range such as [0, 65535]; note that for a character string field the length denotes the maximum length of the string.
2. Processor of the field. A message field newly generated during data preprocessing can be obtained by processing or computing on already generated message fields, and such processing and computation can be abstracted into general processors. Complex processing logic can be implemented by a built-in processor, while general logic can be implemented by an expression processor. The expression processor is logic processing code that supports online editing and compiling, and an expression implies the association relations and processing logic of the messages. Each expression may include several sub-expressions separated by ':' or another special symbol; when the first sub-expression fails, the second is processed, and so on, so that processing of subsequent sub-expressions stops only once one succeeds.
It will be appreciated that the expression processor may be dynamically compiled into machine code rather than interpreted, so that processors are both flexibly configurable and efficient to execute, and that built-in processors and expression processors may be combined with each other into combined processors. To make the field description of the model more intuitive, the processor is placed in the model configuration as an attribute of the field.
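The fallback behaviour of sub-expressions described above can be sketched as follows. The patent does not define the expression syntax or what counts as a failure, so this sketch makes two assumptions: sub-expressions are modelled as plain Python callables rather than a ':'-separated string, and a sub-expression "fails" when it raises an exception or returns None. The names run_expression and subscriber_id are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

SubExpression = Callable[[Dict[str, Any]], Any]


def run_expression(sub_expressions: List[SubExpression],
                   message: Dict[str, Any]) -> Optional[Any]:
    """Try each sub-expression in order and return the first successful result."""
    for sub in sub_expressions:
        try:
            result = sub(message)
        except (KeyError, ValueError, TypeError):
            continue  # this sub-expression failed; fall through to the next one
        if result is not None:
            return result  # success: later sub-expressions are not evaluated
    return None  # every sub-expression failed


# Hypothetical field processor: prefer "msisdn", fall back to "imsi".
subscriber_id = [lambda m: m["msisdn"], lambda m: m["imsi"]]

first = run_expression(subscriber_id, {"msisdn": "13800000000", "imsi": "460001234567890"})
second = run_expression(subscriber_id, {"imsi": "460001234567890"})
third = run_expression(subscriber_id, {})
```

Here `first` comes from the first sub-expression, `second` from the fallback, and `third` is None because both sub-expressions fail.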
Step S30, performing logic processing on the input message by using a preset processor to obtain an output message, where the performing logic processing on the input message by the processor includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
In this embodiment, a preset processor performs logic processing on the input message to obtain an output message: the field information of the input message is extracted and then logically processed, that is, processed and transformed into a message in the form to be output, which is taken as the output message.
To aid understanding of the present solution, an exemplary application scenario is as follows. When the data preprocessing system starts, the preset message models are loaded; referring to FIG. 7, these comprise an input message model, an intermediate message model (covering temporary messages and cache messages), and a preprocessed output message model. After the message models are loaded, the data are mapped into an internal structure that can be located directly and efficiently, yielding input messages. Configuration information may be loaded at this point and converted into cache messages; the configuration information may exist in various forms, such as a relational database, a properties file or an XML file. Cache messages carry keywords, and a mapping between keywords and cache messages can be built internally so that cache messages can be looked up by keyword or by condition matching. Cache messages can be used for big-data preprocessing such as data backfill and data reduction. Meanwhile, a message receiving service is loaded and started, that is, the object to which the preprocessed data will be transmitted is determined. The message receiving service may be a network receiving service over the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP); the protocol is selected according to the actual network scenario. Finally, the input message is processed by the preset processor: the field information in the input message is extracted and logically processed to obtain a message in the form to be output, and the output message is output.
In the data preprocessing method provided in this embodiment, when a data preprocessing instruction is received, data to be processed are acquired, the acquired data are mapped into an input message corresponding to a preset message model, and a preset processor performs logic processing on the input message to obtain an output message, where the logic processing comprises extracting field information of the input message and processing and transforming the field information to obtain the output message. Data of different types are mapped into input messages corresponding to a message model, that is, the data are structured, so that field information can be extracted from the input message and logically processed to finally obtain the output message. Different software is therefore no longer needed for preprocessing when the types and structures of the data differ.
Further, in order to improve the flexibility of data preprocessing, a second embodiment of the data preprocessing method of the present invention is proposed based on the first embodiment, and in this embodiment, referring to fig. 2, the step S30 includes:
step S31, determining the type of the input message;
step S32, according to the preset mapping relation between the type and the message execution flow, obtaining the message execution flow corresponding to the determined type;
step S33, determining the processor based on the acquired message execution flow, wherein the determined processor includes an output message processor;
In this embodiment, the type of the input message is determined first; the types include a call type, an Internet access type and the like. After the type of the input message is determined, the message execution flow corresponding to that type is obtained according to the preset mapping relation between types and message execution flows. It can be understood that the system stores in advance which message type corresponds to which execution flow. Referring to FIG. 8, if the type of the input message is A, the corresponding execution flow is A-B-C-E-D. In this case, the processors corresponding to the execution flow include intermediate message processors and an output message processor, where the intermediate message processors include cache message processors and temporary message processors. Note that each input message type has exactly one execution flow.
And step S34, performing logic processing on the input message according to the determined processor to obtain an output message.
In this embodiment, again referring to FIG. 8, when the type of the input message is E, the message execution flow is E-D. The data preprocessing process then involves only the input message and the output message, with no intermediate message; the determined processor is an output message processor, which logically processes and transforms the input message E to obtain the output message D.
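Steps S31 to S33 amount to a lookup from message type to execution flow, followed by reading the processors off that flow. A minimal sketch, using only the two flows named in the text (A-B-C-E-D and E-D), is given below. The names EXECUTION_FLOWS and resolve_processors are hypothetical, and treating the last message of a flow as the output and everything between input and output as intermediate is an assumption drawn from the FIG. 8 examples.

```python
from typing import Dict, List, Tuple

# One execution flow per input message type, as stated for FIG. 8.
EXECUTION_FLOWS: Dict[str, List[str]] = {
    "A": ["A", "B", "C", "E", "D"],
    "E": ["E", "D"],
}


def resolve_processors(message_type: str) -> Tuple[List[str], str]:
    """Return (intermediate message processors, output message processor)."""
    flow = EXECUTION_FLOWS[message_type]  # raises KeyError for unknown types
    intermediate = flow[1:-1]  # messages between the input and the output
    output = flow[-1]          # the last message in the flow is the output
    return intermediate, output


flow_a = resolve_processors("A")
flow_e = resolve_processors("E")
```

For type E the intermediate list is empty, which corresponds to the case above where only an output message processor is determined.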
In this embodiment, referring to fig. 3, when the determined processor further includes a temporary message processor and/or a cache message processor, the step S34 includes:
step S341, determining attributes corresponding to the temporary message handler and/or the cache message handler;
step S342, when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, performing logic processing on the input message according to the temporary message processor to obtain a temporary message, and/or performing logic processing on the input message according to the cache message processor to obtain a cache message;
step S343, performing logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
In this embodiment, after the processor is determined, the attribute corresponding to the processor is determined. That is, when the determined processor further includes a temporary message processor and/or a cache message processor, the attribute corresponding to the temporary message processor and/or the cache message processor is determined. When that attribute is a creatable attribute, the input message is logically processed according to the temporary message processor to obtain a temporary message, and/or according to the cache message processor to obtain a cache message; the temporary message and/or the cache message is then logically processed according to the output message processor to obtain the output message. It will be understood that every message other than the input message, namely the temporary message, the cache message and the output message, has a creation condition processor, which is used to create the message. For example, when the attribute corresponding to the creation condition processor of the temporary message is a creatable attribute, the input message generates the temporary message through that creation condition processor; when the attribute is a non-creatable attribute, no temporary message is generated from the input message. Further, the cache message also has an update condition processor and a deletion condition processor; only the cache message processor has these two. The update condition processor is used to update the cache message, and the deletion condition processor is used to delete aged cache messages.
To aid understanding of this embodiment, again referring to FIG. 8, when the type of the input message is A, the message execution flow is A-B-C-E-D. After the message execution flow is obtained, whether to generate a cache message or an output message, and whether to delete a message after one message execution flow has been processed, may be determined according to the creation conditions and deletion conditions in the message model. That is, whether a message can be generated is determined by the attribute of the corresponding processor in the message execution flow: if the attribute of a processor is a non-creatable attribute, that processor cannot generate message content. For example, if the type of the input message is A and the execution flow is A-B-C-E-D, and only the processor corresponding to message C has a creatable attribute while the processors of all other messages have non-creatable attributes, the output message finally generated is C.
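The effect of the creatable attribute on an execution flow can be sketched as a simple filter. This is only an illustration of the example above; the function name generated_messages and the representation of attributes as a dictionary of booleans are assumptions.

```python
from typing import Dict, List


def generated_messages(flow: List[str], creatable: Dict[str, bool]) -> List[str]:
    """Return the messages after the input whose processors are creatable."""
    # flow[0] is the input message itself, which has no creation condition.
    return [name for name in flow[1:] if creatable.get(name, False)]


flow = ["A", "B", "C", "E", "D"]
only_c = generated_messages(flow, {"C": True})
all_created = generated_messages(flow, {"B": True, "C": True, "E": True, "D": True})
```

With only C creatable, C is the single message generated, matching the example in the text; with every processor creatable, all of B, C, E and D are generated in flow order.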
If the attributes of the processors corresponding to all messages are creatable attributes, the height value of each message must be determined first, and the output order of the messages is determined from these height values. Specifically, the message models are loaded first and the internal mapping relations are established. A message execution flow is then established according to the reference relations between messages: a message that does not depend on any other message is regarded as an input message, and the relation graph is traversed with that message as the entry point to find the list of reference relations, as in Table 1 below:
[Table 1: reference relations between messages; the table is an image in the original publication and is not reproduced here.]
Among messages A, B, C, D and E, A does not depend on any other message and is therefore an input message, so the reference relations can be traversed with A as the entry point. The height of each message equals the largest height among the messages it depends on, plus one. Sorting by height yields a dependency list in which every depended-on message precedes the messages that depend on it; the output message D can then be obtained, as shown in FIG. 9.
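The height rule can be sketched as follows. Because Table 1 is not available, the dependency relations in DEPENDS_ON are hypothetical, chosen only so that the result matches the A-B-C-E-D flow named in the text; assigning height 0 to a message with no dependencies is also an assumption, and the sketch does not detect cyclic references.

```python
from typing import Dict, List

# Hypothetical reference relations: message -> messages it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "E": ["C"],
    "D": ["B", "E"],
}


def message_heights(depends_on: Dict[str, List[str]]) -> Dict[str, int]:
    """Height = 1 + the largest height among dependencies (0 if there are none)."""
    heights: Dict[str, int] = {}

    def height(name: str) -> int:
        if name not in heights:
            deps = depends_on[name]
            heights[name] = 1 + max(height(d) for d in deps) if deps else 0
        return heights[name]

    for name in depends_on:
        height(name)
    return heights


heights = message_heights(DEPENDS_ON)
# Sorting by height guarantees every dependency precedes its dependents.
execution_order = sorted(heights, key=heights.get)
```

Under these assumed relations the heights are A=0, B=1, C=2, E=3, D=4, so the sorted order reproduces the flow A-B-C-E-D.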
Further, in order to improve flexibility of data preprocessing, a third embodiment of the data preprocessing method according to the present invention is provided based on the second embodiment, and in this embodiment, when the step S343 is executed, if a storage instruction of a temporary message is received, the following steps are executed:
and storing the temporary message into a preset cache region to generate a cache message, so that when a data preprocessing instruction is received next time, the cache message in the cache region is used as the referred data of a new input message.
In this embodiment, while the temporary message and the cache message are being logically processed according to the output message processor to obtain an output message, if a storage instruction for the temporary message is received, the temporary message is stored into a preset cache region to generate a cache message. Referring to FIG. 7, the temporary message flows into the cache message processor and is stored in the preset cache region as a cache message. It can be understood that a cache message generated by the cache message processor from the input message is likewise stored in the preset cache region. When a data preprocessing instruction is next received, the cache messages in the cache region can be used as reference data for a new input message; because the temporary or cache messages generated in the current pass are available as reference data in the next pass, the efficiency of data processing is improved.
In this embodiment, the database or some configured data may be dynamically loaded into the cache, so that the data may be referred to in the preprocessing process.
Further, in order to improve the flexibility of data preprocessing, a fourth embodiment of the data preprocessing method according to the present invention is proposed based on the third embodiment, and in this embodiment, the data preprocessing method further includes:
deleting part of cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting a cache message in the cache region when the interval between the access time point of that cache message and the current time point reaches a preset time length.
In this embodiment, the number of cache messages may keep growing as more messages are processed, so cache messages generally need a maximum-count limit, a storage period, and an eviction and deletion mechanism. Therefore, in this embodiment, cache messages in the cache region are deleted periodically, which prevents them from occupying too much capacity and thereby reducing the operating efficiency of data preprocessing.
Further, the cache messages have a corresponding timing service, and each cache message is bound to timer information, so that cache messages can be deleted automatically even when the cache region holds a massive amount of data.
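The two deletion rules of this embodiment can be illustrated with a minimal Python sketch. The class name `MessageCache`, the parameters `max_messages` and `max_idle_seconds`, and the injectable `clock` are assumptions made for illustration only and do not appear in the original text. The sketch also simplifies the capacity rule by deleting only as many messages as are needed to get back under the limit, whereas the embodiment only says that part of the cache messages is deleted.

```python
import time
from collections import OrderedDict


class MessageCache:
    """Hypothetical cache region with a capacity limit and an idle-time limit."""

    def __init__(self, max_messages, max_idle_seconds, clock=time.monotonic):
        self.max_messages = max_messages
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        # key -> (last access time, message), ordered from oldest to newest access
        self._entries = OrderedDict()

    def put(self, key, message):
        self._entries[key] = (self.clock(), message)
        self._entries.move_to_end(key)
        # Capacity rule: the messages with the earliest access time are deleted first.
        while len(self._entries) > self.max_messages:
            self._entries.popitem(last=False)

    def get(self, key):
        self.expire()
        if key not in self._entries:
            return None
        _, message = self._entries[key]
        self._entries[key] = (self.clock(), message)  # refresh the access time
        self._entries.move_to_end(key)
        return message

    def expire(self):
        # Time rule: delete messages whose last access is at least
        # max_idle_seconds in the past.
        now = self.clock()
        while self._entries:
            key, (accessed, _) = next(iter(self._entries.items()))
            if now - accessed < self.max_idle_seconds:
                break
            del self._entries[key]
```

Keeping the entries ordered by access time means that both rules only ever inspect the oldest entry, which is one possible way of keeping deletion cheap when the cache region is large.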
The invention further provides a data preprocessing device.
Referring to fig. 4, fig. 4 is a functional block diagram of a data preprocessing apparatus according to a first embodiment of the present invention.
It should be emphasized that the functional block diagram of fig. 4 is merely an exemplary diagram of a preferred embodiment, and those skilled in the art can easily add new functional blocks around the functional blocks of the data preprocessing apparatus shown in fig. 4; the names of the functional modules are self-defined names which are only used for assisting in understanding the program functional blocks of the data preprocessing device and are not used for limiting the technical scheme of the invention, and the core of the technical scheme of the invention is the functions which are to be achieved by the functional modules with the respective defined names.
This embodiment proposes a data preprocessing apparatus, which includes:
the acquiring module 10 is configured to acquire data to be processed when a data preprocessing instruction is received;
In this embodiment, when the data preprocessing instruction is received, the acquiring module 10 acquires the data to be processed from the data source. During data preprocessing, a temporary message or cache message generated in a previous data preprocessing process may also be obtained and used as the referred data of the data to be processed. Further, a temporary message generated in the previous data preprocessing process may be stored in a preset cache space as a cache message, and the cache messages in the cache region are then used as the referred data of the data to be processed during the current data preprocessing.
The mapping module 20 is configured to map the acquired data into an input message corresponding to a preset message model;
In this embodiment, data is abstracted into messages during data preprocessing, and the abstracted messages are then preprocessed. Data is abstracted into a message by loading a preset message model; that is, the mapping module 20 maps the data into an input message corresponding to the message model. The mapping module 20 does this as follows: each piece of field information contained in the data is extracted, the extracted field information is arranged in the form defined by the preset message model, and the arranged field information is taken as the input message. Specifically, the input message is composed of a group of consecutive fields, and a field can be of a simple data type or a complex composite data type. The structural content of the message model comprises:
1. Message name and message type. The message name is a character string beginning with a letter; it may not contain special characters such as spaces, ',' and '-'. Because message names are referenced from one another within the model, the name must be globally unique and should convey the meaning of the message; the message name also serves as the file name of the model configuration. The message type is the encoding mode of the message model, which may be a fixed-format encoding such as TLV (Type-Length-Value) encoding, or some customized complex encoding.
2. Logical conditional expressions of the message. The model is configured with logical expressions that control the message processing flow, such as a creation conditional expression processor and a deletion conditional expression processor for cache messages: when the creation conditional expression is satisfied, the message is created and cached, and when the deletion conditional expression is satisfied, the corresponding message is deleted from the cache after the flow has been processed.
3. The composition of the message. A message consists of fields.
The message model is usually expressed in XML (Extensible Markup Language), and, since the message model may change dynamically, it may also be stored in a relational database. In this embodiment, the input message may be call signaling of a mobile network, or an internet access record of a user.
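As an illustration of a message model expressed in XML, the following Python sketch loads a small model and checks the naming rule for messages. The element and attribute names (`message`, `field`, `name`, `type`, `create`, `delete`, `length`), the sample message `CallRecord` and the helper `load_model` are all hypothetical; the original text does not specify the XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical model file content; the schema shown here is an assumption.
MODEL_XML = """
<message name="CallRecord" type="TLV" create="caller != ''" delete="state == 'released'">
  <field name="caller" type="string" length="32"/>
  <field name="duration" type="int32" length="4"/>
</message>
"""

# A name begins with a letter and contains no whitespace, ',' or '-'.
NAME_PATTERN = re.compile(r"^[A-Za-z][^\s,\-]*$")


def load_model(xml_text):
    """Parse one message model into a plain dictionary."""
    root = ET.fromstring(xml_text)
    name = root.get("name", "")
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid message name: {name!r}")
    fields = [
        {"name": f.get("name"), "type": f.get("type"), "length": int(f.get("length"))}
        for f in root.findall("field")
    ]
    return {
        "name": name,
        "type": root.get("type"),      # encoding mode, e.g. TLV
        "create": root.get("create"),  # creation conditional expression
        "delete": root.get("delete"),  # deletion conditional expression
        "fields": fields,
    }
```

Here the conditional expressions are kept as plain strings; how they are compiled and evaluated is outside the scope of this sketch.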
The structural content of a field comprises:
1. Field name, field type, field length, and so on. The field name is a character string beginning with a letter; it may not contain special characters such as spaces, ',' and '-', and it must be unique within each message model and should convey the meaning of the field. The field type can be a basic type such as integer or character string, or a compound field type; integers are further divided into single-byte, double-byte, four-byte, long integer, and so on. The field length is, for example, a value from 0 to 65535; note that when the field is of the character string type, the length represents the maximum length of the string.
2. Processor of the field. A message field newly generated during data preprocessing can be obtained by processing or operating on message fields that have already been generated, and such processing and operations can be abstracted into general processors. Complex processing logic can be implemented by a built-in processor, while general logic processing can be implemented by an expression processor. The expression processor is logic processing code that supports online editing and compiling, and an expression implies the association relationships and processing logic of the message. Each expression may include a plurality of sub-expressions separated by ":" or another special symbol; when the first sub-expression fails, the second sub-expression is processed, and so on through the subsequent sub-expressions until one succeeds.
It will be appreciated that the expression processor system may be dynamically compiled into machine code rather than interpreted for execution, thereby ensuring that the processor is flexibly configurable and efficiently executable, and that the built-in processor and the expression processor may be combined with each other into a combined processor. In order to make the field description of the model more intuitive, the processor is put into the model configuration as an attribute of the field.
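The fallback behaviour of sub-expressions can be sketched as follows. This is a deliberately simplified, interpreted illustration rather than the dynamically compiled expression processor described above: each sub-expression is assumed to be the name of a registered callable, and a failure is assumed to be signalled by a `LookupError` or `ValueError`. The names `REGISTRY`, `evaluate`, `from_cache`, `from_input` and `constant_unknown`, and the fields `imsi` and `msisdn`, are hypothetical.

```python
def from_cache(ctx):
    # Fails with KeyError when the input has no "imsi" or nothing is cached for it.
    return ctx["cache"][ctx["input"]["imsi"]]


def from_input(ctx):
    # Fails with KeyError when the input carries no "msisdn" field.
    return ctx["input"]["msisdn"]


def constant_unknown(ctx):
    return "unknown"


REGISTRY = {
    "from_cache": from_cache,
    "from_input": from_input,
    "constant_unknown": constant_unknown,
}


def evaluate(expression, ctx):
    """Try each ':'-separated sub-expression in order until one succeeds."""
    for name in expression.split(":"):
        processor = REGISTRY[name.strip()]
        try:
            return processor(ctx)
        except (LookupError, ValueError):
            continue  # this sub-expression failed, fall through to the next one
    raise LookupError(f"no sub-expression of {expression!r} succeeded")
```

For example, `evaluate("from_cache:from_input:constant_unknown", ctx)` prefers a cached value, falls back to the value carried by the input message, and finally to a constant.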
A processing module 30, configured to perform logic processing on the input message by using a preset processor to obtain an output message, where the performing logic processing on the input message by the processor includes: and extracting field information of the input message, and processing and deforming the field information to obtain an output message.
In this embodiment, the processing module 30 performs logic processing on the input message by using a preset processor to obtain an output message: the field information of the input message is first extracted, the field information is then logically processed and transformed into a message in the form to be output, and that message is taken as the output message.
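The mapping and processing steps can be sketched in Python for one possible TLV layout. The layout (a 1-byte type followed by a 2-byte big-endian length), the tag numbers in `FIELD_TAGS`, the field names and the functions `decode_tlv` and `to_output` are assumptions for illustration; the original text names TLV only as one possible encoding and does not fix its layout.

```python
import struct

# Hypothetical tag numbers and field names.
FIELD_TAGS = {1: "caller", 2: "callee", 3: "duration"}


def decode_tlv(data):
    """Map raw bytes to an input message (field name -> raw value).

    Assumes each item is a 1-byte type, a 2-byte big-endian length and a value.
    """
    fields, offset = {}, 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        offset += length
        if tag in FIELD_TAGS:  # unknown tags are skipped
            fields[FIELD_TAGS[tag]] = value
    return fields


def to_output(message):
    """Extract field information and transform it into the output form."""
    seconds = int.from_bytes(message["duration"], "big")
    return {
        "caller": message["caller"].decode("ascii"),
        "callee": message["callee"].decode("ascii"),
        "minutes": -(-seconds // 60),  # duration rounded up to whole minutes
    }
```

The same `to_output` step could be applied to input messages decoded from any other encoding, as long as they are mapped to the same field names.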
To better understand the present solution, an exemplary application scenario is as follows. When the data preprocessing system is started, a preset message model is loaded; referring to FIG. 7, the message model includes an input message model, an intermediate message model (covering temporary messages and cache messages), and a preprocessed output message model. After the message model is loaded, the data is mapped into an internal structure that can be located directly and efficiently, yielding the input messages. Configuration information can be loaded at this point and converted into cache messages. The configuration information can exist in various forms, such as a relational database, an attribute file or an XML file. Cache messages carry keywords, and a mapping from keywords to cache messages can be built internally so that lookup by keyword or by condition matching is possible. Cache messages may be used for big-data preprocessing such as data backfill and data reduction. Meanwhile, a message receiving service is loaded and started; that is, the object to which the preprocessed data is to be transmitted is determined. The message receiving service may be a network receiving service over the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP), with the protocol chosen according to the actual network scenario. Finally, the input message is processed by a preset processor: the field information in the input message is extracted and logically processed to obtain a message in the form to be output, and the output message is output.
The data preprocessing device provided in this embodiment acquires data to be processed when a data preprocessing instruction is received, maps the acquired data into an input message corresponding to a preset message model, and performs logic processing on the input message by using a preset processor to obtain an output message, where the logic processing performed by the processor includes extracting field information of the input message and processing and deforming the field information to obtain the output message. Because data of different types is mapped into input messages corresponding to a message model, field information can be extracted from the input messages and logically processed to obtain the output message regardless of the structure of the original data, and different software is not needed for preprocessing when the types and structures of the data differ.
Further, in order to improve the flexibility of data preprocessing, a second embodiment of the data preprocessing apparatus of the present invention is proposed based on the first embodiment, and in this embodiment, referring to fig. 5, the processing module 30 includes:
a first determining submodule 31 for determining a type of the input message;
the obtaining submodule 32 is configured to obtain, according to a mapping relationship between a preset type and a message execution stream, a message execution stream corresponding to the determined type;
a second determining submodule 33, configured to determine a processor based on the acquired message execution flow, wherein the determined processor includes an output message processor;
In this embodiment, the first determining sub-module 31 first determines the type of the input message, where the type includes a call type, an internet type, and the like. After the type is determined, the obtaining sub-module 32 obtains the message execution flow corresponding to the determined type according to the preset mapping relationship between types and message execution flows. It can be understood that the execution flow corresponding to each message type, that is, which message corresponds to which execution flow, is stored in the system in advance. With particular reference to FIG. 8, when the type of the input message is A, the execution flow corresponding to the input message is A-B-C-E-D. In this case the processors corresponding to the execution flow include intermediate message processors and an output message processor, where the intermediate message processors include a cache message processor and an interim message processor. Note that there is only one execution flow for each input message type.
a processing submodule 34, configured to perform logic processing on the input message according to the determined processor to obtain an output message.
In this embodiment, referring again to FIG. 8, when the type of the input message is E, the message execution flow is E-D. The data preprocessing process then involves only the input message and the output message, with no intermediate message; the processor determined by the second determining sub-module 33 is the output message processor, which logically processes and transforms the input message E to obtain the output message D.
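The lookup from input message type to execution flow, and from execution flow to processors, can be sketched as two table lookups. The flows A-B-C-E-D and E-D are taken from the description of FIG. 8; the assignment of B, C and E to temporary or cache messages in `MESSAGE_KINDS` is an assumption, since the text does not state which intermediate message is of which kind, and the names `EXECUTION_FLOWS` and `processors_for` are hypothetical.

```python
# One execution flow per input message type (flows as described for FIG. 8).
EXECUTION_FLOWS = {
    "A": ["A", "B", "C", "E", "D"],
    "E": ["E", "D"],
}

# Assumed kinds of the non-input messages; only "D" being the output
# message is stated in the text.
MESSAGE_KINDS = {"B": "temporary", "C": "cache", "E": "temporary", "D": "output"}


def processors_for(input_type):
    """Return (message name, processor kind) pairs for the given input type."""
    flow = EXECUTION_FLOWS[input_type]
    # The first entry is the input message itself; every later message
    # in the flow is produced by a processor of the corresponding kind.
    return [(name, MESSAGE_KINDS[name]) for name in flow[1:]]
```

With these tables, `processors_for("E")` yields only the output message processor, while `processors_for("A")` also yields intermediate message processors.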
In this embodiment, referring to fig. 6, when the determined processor further includes a temporary message processor and/or a cache message processor, the processing sub-module 34 includes:
a determining unit 34, configured to determine an attribute corresponding to the interim message handler and/or the cache message handler;
a first processing unit 35, configured to, when the attribute corresponding to the interim message processor and/or the cache message processor is a creatable attribute, perform logic processing on the input message according to the interim message processor to obtain an interim message, and/or perform logic processing on the input message according to the cache message processor to obtain a cache message;
and a second processing unit 36, configured to perform logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
In this embodiment, after the processor is determined, the determining unit 34 determines the attribute corresponding to the processor. That is, when the determined processor further includes an interim message processor and/or a cache message processor, the attribute corresponding to the interim message processor and/or the cache message processor is determined. When that attribute is a creatable attribute, the first processing unit 35 performs logic processing on the input message according to the interim message processor to obtain an interim message, and/or performs logic processing on the input message according to the cache message processor to obtain a cache message, and the second processing unit 36 performs logic processing on the interim message and/or the cache message according to the output message processor to obtain an output message. It will be understood that every message other than the input message may have a creation condition handler; that is, the interim message, the cache message and the output message all have a creation condition handler, which is used for creating the message. For example, when the attribute corresponding to the creation condition handler of the interim message is a creatable attribute, the interim message can be generated from the input message; when it is a non-creatable attribute, the interim message cannot be generated. Further, the cache message also has an update condition handler and a deletion condition handler; only the cache message processor has these two handlers. The update condition handler is used for updating the cache message, and the deletion condition handler is used for deleting aged cache messages.
To better understand this embodiment, referring again to FIG. 8, when the type of the input message is A, the execution flow is A-B-C-E-D. After the message execution flow is obtained, whether to generate the cache message or the output message, and whether to delete a message after one message execution flow has been processed, may also be determined according to the creation condition and deletion condition in the message model. That is, whether a message can be generated is determined by the attribute of the corresponding processor in the message execution flow: if the attribute of a processor is a non-creatable attribute, that processor cannot generate message content. For example, if the type of the input message is A and, in the corresponding message execution flow, only the processor corresponding to message C has a creatable attribute while the processors corresponding to all other messages have non-creatable attributes, the output message finally generated is C.
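How a creation condition gates message generation can be sketched as follows. The representation of a flow as a list of `(name, can_create, build)` triples, the function `run_flow`, the example list `STEPS` and the field `caller` are all hypothetical; in the embodiment the conditions would come from the conditional expressions configured in the message model.

```python
def run_flow(input_message, steps):
    """Walk an execution flow and create only the messages whose
    creation condition holds.

    steps: list of (name, can_create, build) in execution-flow order, where
    can_create and build both receive the messages created so far.
    """
    created = {"input": input_message}
    for name, can_create, build in steps:
        if can_create(created):  # creatable attribute
            created[name] = build(created)
        # otherwise the message is simply not generated
    return created


# Example flow: temporary message B is created only when the input carries a
# caller, and output message D is created only when B exists.
STEPS = [
    ("B",
     lambda m: "caller" in m["input"],
     lambda m: {"caller": m["input"]["caller"].strip()}),
    ("D",
     lambda m: "B" in m,
     lambda m: {"caller": m["B"]["caller"], "source": "B"}),
]
```

Update and deletion conditions of cache messages could be handled in the same style with two further predicates per step; they are omitted here to keep the sketch short.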
If the attribute of the processor corresponding to each message is a creatable attribute, the height value of each message needs to be determined first, and the output order of the messages is determined according to these height values. That is, the message model is loaded first and the internal mapping relationships are established. A message execution flow is then established according to the reference relationships between the messages: when a message does not depend on any other message, it is regarded as an input message, and the relationship graph is traversed with that message as the entry of the reference relationship to find the list of reference relationships, as shown in Table 1 below:
Message    Dependent message list
A          (none)
B          A
C          A
D          E, A, C
E          B, A
Of the messages A, B, C, D and E, message A does not depend on any other message, so it is an input message, and the reference relationship can be traversed using A as the entry. The height of each message equals the largest height among the messages it depends on, plus one. The messages are then sorted by height to obtain a dependency linked list in which every depended-on message is placed before the messages that depend on it (see FIG. 9). The output message finally obtained is D.
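The height rule can be checked against Table 1 with a short Python sketch. Two details are assumptions not fixed by the text: a message with no dependencies is given height 0, and messages of equal height are ordered alphabetically. The names `DEPENDENCIES`, `heights` and `execution_order` are hypothetical.

```python
# Dependency lists from Table 1.
DEPENDENCIES = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["E", "A", "C"],
    "E": ["B", "A"],
}


def heights(deps):
    """Height of a message = largest height of its dependencies, plus one.

    A message without dependencies (an input message) gets height 0.
    """
    known = {}

    def height(name, visiting=()):
        if name in visiting:
            raise ValueError(f"cyclic dependency involving {name}")
        if name not in known:
            known[name] = 1 + max(
                (height(dep, visiting + (name,)) for dep in deps[name]),
                default=-1,
            )
        return known[name]

    return {name: height(name) for name in deps}


def execution_order(deps):
    """Sort messages by height so that depended-on messages come first."""
    h = heights(deps)
    return sorted(deps, key=lambda name: (h[name], name))


print(heights(DEPENDENCIES))          # {'A': 0, 'B': 1, 'C': 1, 'D': 3, 'E': 2}
print(execution_order(DEPENDENCIES))  # ['A', 'B', 'C', 'E', 'D']
```

The resulting order A, B, C, E, D agrees with the execution flow A-B-C-E-D described for FIG. 8, with D, the message of greatest height, as the final output message.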
Further, in order to improve flexibility of data preprocessing, a third embodiment of the data preprocessing device according to the present invention is proposed based on the second embodiment, and in this embodiment, the data preprocessing device further includes:
and the storage unit is used for storing the temporary message into a preset cache region to generate a cache message if a storage instruction of the temporary message is received, so that the cache message in the cache region is used as the referred data of a new input message when a data preprocessing instruction is received next time.
In this embodiment, if a storage instruction for the temporary message is received while the temporary message and the cache message are being logically processed by the output message processor to obtain the output message, the temporary message is stored in a preset cache region to generate a cache message. Referring to FIG. 7, the temporary message flows into the cache message processor and is stored in the preset cache region as a cache message. It can be understood that a cache message generated by the cache message processor from the input message is also stored in the preset cache region. When a data preprocessing instruction is next received, the acquiring module 10 may use the cache messages in the cache region as the referred data of a new input message. Because the temporary message or cache message generated in this round can be referenced the next time data is processed, the data processing efficiency is improved.
In this embodiment, the database or some configured data may be dynamically loaded into the cache, so that the data may be referred to in the preprocessing process.
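The reuse of a stored temporary message in a later preprocessing round can be sketched as a simple data backfill. The function `preprocess`, the flag `store_temporary` (standing in for the storage instruction), the dictionary used as cache region and the fields `user` and `cell` are assumptions for illustration.

```python
def preprocess(record, cache_region, store_temporary=False):
    """One preprocessing round with optional storage of the temporary message."""
    # Temporary message: a normalised view of the input (hypothetical fields).
    temporary = {"user": record["user"], "cell": record.get("cell")}

    # Backfill a missing field from a cache message of an earlier round.
    cached = cache_region.get(temporary["user"])
    if temporary["cell"] is None and cached is not None:
        temporary["cell"] = cached["cell"]

    # Storage instruction: keep the temporary message as a cache message so
    # that the next round can refer to it.
    if store_temporary and temporary["cell"] is not None:
        cache_region[temporary["user"]] = dict(temporary)

    # Output message.
    return {"user": temporary["user"], "cell": temporary["cell"] or "unknown"}
```

For instance, after a first call with `{"user": "u1", "cell": "c7"}` and `store_temporary=True`, a later record for the same user that lacks the `cell` field is completed from the cache region.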
Further, in order to improve the flexibility of data preprocessing, a fourth embodiment of the data preprocessing device according to the present invention is proposed based on the third embodiment, and in this embodiment, the data preprocessing device further includes:
the deleting module is used for deleting part of the cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting a cache message in the cache region when the interval between the access time point of that cache message and the current time point reaches a preset time length.
In this embodiment, the number of cache messages may keep growing as more messages are processed, so cache messages generally need a maximum-count limit, a storage period, and an eviction and deletion mechanism. Therefore, in this embodiment, the deleting module deletes cache messages in the cache region periodically, which prevents them from occupying too much capacity and thereby reducing the operating efficiency of data preprocessing.
Further, the cache messages have a corresponding timing service, and each cache message is bound to timer information, so that cache messages can be deleted automatically even when the cache region holds a massive amount of data.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data preprocessing method, characterized in that the data preprocessing method comprises the steps of:
when a data preprocessing instruction is received, acquiring data to be processed;
mapping the data to be processed, which is acquired when a data preprocessing instruction is received, into an input message corresponding to a preset message model;
determining the type of the input message, and acquiring a message execution stream corresponding to the type of the input message according to a mapping relation between a preset type and the message execution stream;
determining a processor for performing logic processing on the input message based on the acquired message execution flow;
performing logic processing on the input message by using the processor determined based on the acquired message execution flow to obtain an output message, wherein the performing logic processing on the input message by the processor comprises: and extracting field information of the input message, and processing and deforming the field information to obtain an output message.
2. The data pre-processing method of claim 1, wherein the processor comprises an output message processor.
3. The data preprocessing method of claim 2, wherein when the processor further comprises an interim message processor and/or a cache message processor, the step of logically processing the input message with the processor to obtain the output message comprises:
determining attributes corresponding to the temporary message processor and/or the cache message processor;
when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, performing logic processing on the input message according to the temporary message processor to obtain a temporary message, and/or performing logic processing on the input message according to the cache message processor to obtain a cache message;
and performing logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
4. The data preprocessing method of claim 3, wherein, while the interim message and/or the cache message is logically processed according to the output message processor to obtain the output message, if a storage instruction of the interim message is received, the following steps are performed:
and storing the temporary message into a preset cache region to generate a cache message, so that when a data preprocessing instruction is received next time, the cache message in the cache region is used as the referred data of a new input message.
5. The data preprocessing method of claim 4, wherein the data preprocessing method further comprises:
deleting part of cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting the cache message when the distance between the access time point with the cache message in the cache region and the current time point reaches a preset time length.
6. A data preprocessing apparatus, characterized in that the data preprocessing apparatus comprises:
the acquisition module is used for acquiring data to be processed when a data preprocessing instruction is received;
the mapping module is used for mapping the data to be processed, which is acquired when a data preprocessing instruction is received, into an input message corresponding to a preset message model;
a processing module, configured to determine a type of the input message, obtain a message execution stream corresponding to the type of the input message according to a mapping relationship between a preset type and the message execution stream, determine, based on the obtained message execution stream, a processor for performing logic processing on the input message, and perform logic processing on the input message by using the processor determined based on the obtained message execution stream to obtain an output message, where performing logic processing on the input message by the processor includes: and extracting field information of the input message, and processing and deforming the field information to obtain an output message.
7. The data pre-processing apparatus of claim 6, wherein the processor comprises an output message processor.
8. The data pre-processing apparatus of claim 7, wherein when the processor further comprises an interim message processor and/or a cache message processor, the processing module comprises:
the determining unit is used for determining the corresponding attribute of the temporary message processor and/or the cache message processor;
the first processing unit is used for performing logic processing on the input message according to the temporary message processor to obtain a temporary message and/or performing logic processing on the input message according to the cache message processor to obtain a cache message when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute;
and the second processing unit is used for carrying out logic processing on the temporary message and/or the cache message according to the output message processor to obtain an output message.
9. The data preprocessing apparatus of claim 8, wherein the data preprocessing apparatus further comprises:
and the storage unit is used for storing the temporary message into a preset cache region to generate a cache message if a storage instruction of the temporary message is received, so that the cache message in the cache region is used as the referred data of a new input message when a data preprocessing instruction is received next time.
10. The data preprocessing apparatus of claim 9, wherein the data preprocessing apparatus further comprises:
the deleting module is used for deleting part of the cache messages in the cache region when the message storage capacity value of the cache region reaches a preset capacity value, wherein the access time of the deleted cache messages is earlier than that of the undeleted cache messages;
or deleting the cache message when the distance between the access time point with the cache message in the cache region and the current time point reaches a preset time length.
CN201511017135.XA 2015-12-29 2015-12-29 Data preprocessing method and device Active CN106933826B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201511017135.XA CN106933826B (en) 2015-12-29 2015-12-29 Data preprocessing method and device
PCT/CN2016/085161 WO2016197924A1 (en) 2015-12-29 2016-06-07 Data preprocessing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511017135.XA CN106933826B (en) 2015-12-29 2015-12-29 Data preprocessing method and device

Publications (2)

Publication Number Publication Date
CN106933826A CN106933826A (en) 2017-07-07
CN106933826B CN106933826B (en) 2020-11-27

Family

ID=57503016

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201511017135.XA Active CN106933826B (en) 2015-12-29 2015-12-29 Data preprocessing method and device

Country Status (2)

Country Link
CN (1) CN106933826B (en)
WO (1) WO2016197924A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525235B (en) * 2022-11-04 2023-09-08 上海威固信息技术股份有限公司 Data operation method and system based on storage structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101808175A (en) * 2009-02-13 2010-08-18 华为技术有限公司 Ticket converting method and device
CN104156395A (en) * 2014-07-14 2014-11-19 上海东方延华节能技术服务股份有限公司 Data storage system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080259930A1 (en) * 2007-04-20 2008-10-23 Johnston Simon K Message Flow Model of Interactions Between Distributed Services
CN101272376A (en) * 2008-05-06 2008-09-24 中兴通讯股份有限公司 Message resolution method
US20140365403A1 (en) * 2013-06-07 2014-12-11 International Business Machines Corporation Guided event prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
数字图书馆个性化服务与Web日志挖掘数据预处理技术;柳胜国;《现代情报》;20070730;第65-67页 *

Also Published As

Publication number Publication date
WO2016197924A1 (en) 2016-12-15
CN106933826A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN109739894B (en) Method, device, equipment and storage medium for supplementing metadata description
US11899681B2 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN110704479A (en) Task processing method and device, electronic equipment and storage medium
CN111045655A (en) Page rendering method and device, rendering server and storage medium
CN110347399B (en) Data processing method, real-time computing system and information system
CN110955646A (en) Data storage and query method, device, equipment and medium
CN110888672B (en) Expression engine implementation method and system based on metadata architecture
CN113051460A (en) Elasticsearch-based data retrieval method and system, electronic device and storage medium
CN102882988A (en) Method, device and equipment for acquiring address information of resource information
CN106933826B (en) Data preprocessing method and device
US11449461B2 (en) Metadata-driven distributed dynamic reader and writer
WO2019000897A1 (en) Data acquisition method and device
JP6323461B2 (en) Server apparatus, client apparatus, information processing method, and recording medium
CN112307061A (en) Method and device for querying data
CN110955712A (en) Development API processing method and device based on multiple data sources
WO2023092981A1 (en) Streaming data processing method, rule plug-in, and streaming data processing module and system
CN107508705B (en) Resource tree construction method of HTTP element and computing equipment
CN113407702B (en) Employee cooperation relationship intensity quantization method, system, computer and storage medium
CN112528593B (en) Document processing method, device, electronic equipment and storage medium
CN115525671A (en) Data query method, device, equipment and storage medium
JPWO2015052967A1 (en) Server apparatus, client apparatus, information processing method, and recording medium
CN117349332B (en) Method and device for generating application programming interface API and electronic equipment
CN113297306B (en) Data processing method and device
CN117251416B (en) File scanning method, device, computer equipment and storage medium
CN117010358A (en) Message card generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant