WO2016197924A1 - Data preprocessing method and apparatus - Google Patents

Data preprocessing method and apparatus (数据预处理方法及装置)

Info

Publication number
WO2016197924A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
processor
cache
data
temporary
Prior art date
Application number
PCT/CN2016/085161
Other languages
English (en)
French (fr)
Inventor
占义忠
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2016197924A1 publication Critical patent/WO2016197924A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management

Definitions

  • This document relates to, but is not limited to, the field of data processing, and in particular to a data preprocessing method and apparatus.
  • With the large-scale application of computers, the Internet and the Internet of Things in many fields, the amount of data generated has also increased. Because the data produced by data sources comes in many types and formats, the data must be pre-processed before it is processed, and the specific processing operations on the data are performed only after pre-processing. Since the types and formats of the data differ, the structures of the data are not the same; for different data, different software has to be developed to pre-process each kind of data separately, and only after pre-processing is finished is the data gathered into the processor for subsequent processing operations. Pre-processing the data with separately developed software makes the data pre-processing process time-consuming, inefficient and costly to operate.
  • The embodiments of the present invention provide a data preprocessing method and apparatus, which can reduce the time required for the data preprocessing process, improve efficiency, and reduce operating cost.
  • The input message is logically processed by a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the extracted field information to obtain the output message.
  • Optionally, the step of logically processing the input message with the preset processor to obtain the output message includes: determining the type of the input message; acquiring the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows; determining the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
  • and logically processing the input message according to the determined processor to obtain the output message.
  • Optionally, when the determined processor further includes a temporary message processor and/or a cache message processor, the step of logically processing the input message according to the determined processor to obtain an output message includes:
  • determining the attribute corresponding to the temporary message processor and/or the cache message processor; when that attribute is a creatable attribute, logically processing the input message according to the temporary message processor to obtain a temporary message, and/or logically processing the input message according to the cache message processor to obtain a cache message;
  • logically processing the temporary message and/or the cache message according to the output message processor to obtain the output message.
  • Optionally, while the output message is obtained, if a storage instruction for the temporary message is received, the temporary message is stored in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time, the cached message in the cache area can be used as referenced data for the new input message.
  • Optionally, the data preprocessing method further includes: when the message storage capacity of the cache area reaches a preset capacity value, deleting part of the cached messages in the cache area, where the access time of the deleted cached messages is earlier than that of the cached messages that are not deleted; or deleting cached messages whose access time is a preset duration or more before the current time.
  • The embodiment of the present invention further provides a data preprocessing apparatus, where the data preprocessing apparatus includes:
  • an acquiring module, configured to acquire data to be processed when a data preprocessing instruction is received;
  • a mapping module, configured to map the acquired data to be processed into an input message corresponding to a preset message model;
  • a processing module, configured to logically process the input message with a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
  • Optionally, the processing module includes:
  • a first determining submodule, configured to determine the type of the input message;
  • an acquiring submodule, configured to acquire the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows;
  • a second determining submodule, configured to determine the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
  • a processing submodule, configured to logically process the input message according to the determined processor to obtain the output message.
  • Optionally, when the determined processor further includes a temporary message processor and/or a cache message processor, the processing submodule includes:
  • a determining unit, configured to determine the attribute corresponding to the temporary message processor and/or the cache message processor;
  • a first processing unit, configured to, when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically process the input message according to the temporary message processor to obtain a temporary message, and/or logically process the input message according to the cache message processor to obtain a cache message;
  • a second processing unit, configured to logically process the temporary message and/or the cache message according to the output message processor to obtain the output message.
  • Optionally, the data preprocessing apparatus further includes:
  • a storage unit, configured to, if a storage instruction for the temporary message is received, store the temporary message in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time,
  • the cached message in the cache area is used as referenced data for the new input message.
  • Optionally, the data preprocessing apparatus further includes:
  • a deleting module, configured to delete part of the cached messages in the cache area when the message storage capacity of the cache area reaches a preset capacity value, where the access time of the deleted cached messages is earlier than that of the cached messages that are not deleted; or to delete cached messages whose access time is a preset duration or more before the current time.
  • With the data preprocessing method and apparatus, when a data preprocessing instruction is received, the data to be processed is acquired and mapped into an input message corresponding to a preset message model, and the input message is logically processed by a preset processor to obtain an output message,
  • where the processor logically processing the input message includes: extracting field information of the input message and processing and transforming the field information to obtain the output message.
  • This achieves mapping different types of data into input messages corresponding to the message model, that is, de-structuring the data to extract the field information of the input message, logically processing the extracted field information, and finally obtaining
  • the output message, instead of requiring different software to pre-process each kind of data separately when the type and structure of the data differ.
  • The embodiment of the invention improves the efficiency of data preprocessing and reduces the operating cost of data preprocessing.
  • FIG. 1 is a schematic flowchart of a data preprocessing method according to an alternative embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of logically processing the input message with a preset processor to obtain an output message according to an alternative embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of logically processing the input message according to the determined processor according to an alternative embodiment of the present invention;
  • FIG. 4 is a schematic diagram of functional modules of a data preprocessing apparatus according to an alternative embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a refinement function module of the processing module of FIG. 4;
  • FIG. 6 is a schematic diagram of a refinement function module of the processing sub-module of FIG. 5;
  • FIG. 7 is a schematic diagram of an implementation scenario according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a preset message execution flow according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of height values corresponding to each message in the message execution flow of FIG. 8.
  • the invention provides a data preprocessing method.
  • FIG. 1 is a schematic flowchart of a first embodiment of a data preprocessing method according to the present invention.
  • This embodiment provides a data preprocessing method, where the data preprocessing method includes:
  • Step S10: acquire data to be processed when a data preprocessing instruction is received;
  • In this embodiment, the data to be processed includes data generated by a data source. It is worth noting that, during data preprocessing, the temporary message or cache message generated by the previous data preprocessing process may be acquired and used as referenced data of the data to be processed (that is, the temporary message or cache message is referenced when processing the data to be processed).
  • Optionally, the temporary message generated during the previous data preprocessing process may first be stored
  • in a preset cache space as a cached message, and then, when data preprocessing is performed this time, the cached message in the cache area is used as the referenced data of the data to be processed.
  • Step S20: map the acquired data to be processed into an input message corresponding to a preset message model;
  • In this embodiment, the data preprocessing process first abstracts the data to be processed into a message and then pre-processes the abstracted message; the data is abstracted into a message by loading a preset message model
  • and mapping the data to be processed into an input message corresponding to the message model.
  • Mapping the data to be processed into the input message corresponding to the message model includes:
  • extracting each piece of field information contained in the data to be processed, arranging the extracted field information according to the form of the preset message model, and finally taking the arranged field information as the input message.
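  • As a rough illustration of this mapping step, the sketch below (Python, with a hypothetical call record and a dict-based message model — neither the field names nor the model format are specified by the patent) extracts the fields named in a preset model from a raw record and arranges them in the model's order to form the input message.

```python
# Minimal sketch of step S20: map raw data to an input message according
# to a preset message model. The model format and field names here are
# illustrative assumptions, not the patent's actual schema.

RAW_RECORD = {"caller": "13800000000", "callee": "13900000000",
              "duration": "125", "extra": "ignored"}

CALL_MODEL = {                      # hypothetical "call" message model
    "name": "CallRecord",
    "fields": [("caller", str), ("callee", str), ("duration", int)],
}

def map_to_input_message(raw: dict, model: dict) -> dict:
    """Extract each field named by the model and arrange the values in the
    model's order; the arranged fields form the input message."""
    message = {"__type__": model["name"]}
    for field_name, field_type in model["fields"]:
        message[field_name] = field_type(raw[field_name])
    return message

if __name__ == "__main__":
    print(map_to_input_message(RAW_RECORD, CALL_MODEL))
    # -> {'__type__': 'CallRecord', 'caller': '13800000000', ...}
```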
  • the input message is composed of a set of consecutive fields, and the field may be a simple data type or a complex combined data type.
  • The structure of the message model includes:
  • 1. Message name and message type: the message name is a string beginning with a letter and cannot contain special characters (for example, spaces, '.', '-', and so on). Because message names need to reference each other in the message model, the message name must be globally unique and should convey the meaning of the message it describes.
  • The message name also serves as the file name of the model configuration.
  • The message type covers the encoding mode of the message model; the encoding mode may be a fixed-format encoding such as TLV (Type-Length-Value, an encoding format) encoding, or some custom complex encoding, and so on.
  • 2. Logical conditional expressions of the message: the message model is configured with some logical expressions that control how the message is processed, such as the creation conditional expression processor and the deletion conditional expression processor of a cache message. Only when the creation conditional expression is satisfied
  • is the cache message created and cached.
  • When the deletion conditional expression is satisfied, the corresponding cached message is deleted from the cache after the flow has been processed.
  • 3. Composition of the message: a message consists of fields.
  • Typically, the message model is expressed in XML (Extensible Markup Language).
  • Considering that the message model may change dynamically, the message model can be saved in a relational database.
  • In this embodiment, the input message may be call signaling of a mobile network, or an Internet access record of a user.
  • The structure of a field in the message includes: field name, field type, field length, etc. The field name is a string beginning with a letter and cannot contain special characters.
  • For example, special characters include spaces, '.', '-', etc.; the field name must be unique within each message model and should convey the meaning of the field;
  • The field type may be a basic field type such as integer or string, or a composite field type, where integers can be divided into single-byte, double-byte, four-byte, long integer, and so on; the field length is, for example, in [0, 65535]. It is worth noting that when the field is of string type, the field length represents the maximum length of the string.
  • The processor of a field: a message field newly generated during data preprocessing can be obtained by processing or computing on message fields that have already been generated, and such processing or computation can be abstracted into general-purpose processors.
  • Complex processing logic can be implemented by built-in processors, while general logic processing can be implemented by expression processors.
  • An expression processor is logic processing code that supports online editing and compilation.
  • An expression implies the association between messages and the processing logic.
  • Each expression may contain multiple subexpressions separated by ':' or another special symbol; when the first subexpression fails, the second subexpression is processed, and so on, until one succeeds, at which point the remaining subexpressions are no longer processed. Understandably, the expression processor is dynamically compiled into machine code rather than interpreted, which ensures that processors can be configured flexibly while executing efficiently.
  • A built-in processor and an expression processor can be combined with each other into a composite processor. To make the field description of the model more intuitive, the processor is placed into the model configuration as a field attribute.
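  • The ':'-separated fallback semantics described above can be sketched as follows. This is only an interpretation of the stated behaviour (try each subexpression in turn, stop at the first that succeeds), not the patent's actual expression syntax or its compile-to-machine-code implementation; the field names and sample expression are invented for illustration.

```python
# Sketch of an expression processor with ':'-separated subexpressions:
# the first subexpression that evaluates without error and yields a
# non-None value wins, and the remaining subexpressions are skipped.
# A real implementation per the text would compile rather than eval.

def run_expression(expression: str, fields: dict):
    for sub in expression.split(":"):
        try:
            value = eval(sub, {"__builtins__": {}}, dict(fields))
        except Exception:
            continue                 # this subexpression failed; try the next
        if value is not None:
            return value             # first success stops further processing
    return None

fields = {"msisdn": None}            # hypothetical message fields
# Prefer msisdn, then caller_id, then a literal default (illustrative only).
print(run_expression("msisdn : caller_id : 'unknown'", fields))  # -> 'unknown'
```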
  • step S30 the input message is logically processed by using a preset processor to obtain an output message, wherein the processor performing logical processing on the input message includes: extracting field information of the input message, and The field information is processed and deformed to obtain an output message.
  • the input message is logically processed by using a preset processor to obtain an output message, that is, the field information of the input message is first extracted, and then the field information is logically processed, that is, The field information is processed and deformed to obtain a message in a form to be output, and the message in the form to be output is used as an output message.
  • To better understand the solution, an example application scenario is as follows: when the data preprocessing system starts, it loads the preset message models.
  • Referring to FIG. 7, the message models include an input message model, an intermediate message model (including temporary messages and cache messages) and a pre-processed output message model.
  • After the message models are loaded, the data to be processed is mapped into an internal structure that can be located directly and efficiently, and an input message is obtained.
  • Optionally, configuration information can be loaded at this time and converted into cache messages; the configuration information can exist in many forms, such as relational databases, property files, XML files, etc.
  • The cache messages have keywords, and a mapping between keywords and cache messages is established internally, allowing lookup by keyword or by conditional matching.
  • Cache messages may be configured for big-data preprocessing such as data backfilling and data reduction.
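  • A minimal sketch of such a keyword-indexed store of cache messages (exact lookup by keyword, plus a simple conditional match) might look like the following; the class and method names, and the sample records, are assumptions made for illustration rather than APIs from the patent.

```python
# Sketch of a cache-message store: an internal keyword -> message map
# supports exact keyword lookup, and a caller-supplied predicate supports
# conditional-match lookup, as described for cache messages loaded from
# configuration.

class CacheMessageStore:
    def __init__(self):
        self._by_keyword = {}            # keyword -> cached message (a dict)

    def put(self, keyword: str, message: dict) -> None:
        self._by_keyword[keyword] = message

    def get(self, keyword: str):
        return self._by_keyword.get(keyword)         # lookup by keyword

    def match(self, predicate):
        # conditional matching: every cached message satisfying the predicate
        return [m for m in self._by_keyword.values() if predicate(m)]

store = CacheMessageStore()
store.put("cell:4601", {"cell_id": "4601", "region": "north"})
store.put("cell:4602", {"cell_id": "4602", "region": "south"})
print(store.get("cell:4601"))
print(store.match(lambda m: m["region"] == "south"))
```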
  • At the same time, the message receiving service is loaded, that is, the object to which data is to be transmitted after preprocessing is determined, and the message receiving service is started.
  • The message receiving service may be a network receiving service of the UDP (User Datagram Protocol) protocol or the TCP (Transmission Control Protocol) protocol, and the corresponding protocol is selected according to the actual network scenario.
  • Finally, the input message is processed by the preset processor to extract the field information in the input message, the extracted field information is logically processed to obtain the message in the form to be output, and the output message is finally output.
  • With the data preprocessing method of this embodiment, when a data preprocessing instruction is received, the data to be processed is acquired and mapped into an input message corresponding to a preset message model, and the input message is logically processed by a preset processor
  • to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message and processing and transforming the field information to obtain the output message.
  • This achieves mapping different types of data into input messages corresponding to the message model, that is, de-structuring the data to extract the field information of the input message, logically processing the extracted field information, and finally obtaining the output message,
  • instead of requiring different software to pre-process each kind of data separately when the type and structure of the data differ.
  • The embodiment of the present invention improves the efficiency of data preprocessing and reduces the operating cost of data preprocessing.
  • Optionally, in order to improve the flexibility of data preprocessing, a second embodiment of the data preprocessing method is proposed based on the first embodiment. In this embodiment, referring to FIG. 2, step S30 includes:
  • Step S31: determine the type of the input message;
  • In this step, the corresponding schema model (i.e., the preset message model)
  • may be found according to the input message number or a unique identifier, and the type of the message is obtained from the schema model.
  • Step S32: acquire the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows;
  • Step S33: determine the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
  • In this embodiment, the type of the input message is determined first; the type includes a call type, an Internet access type, and so on. After the type of the input message is determined, the message execution flow corresponding to the determined type is acquired according to the preset mapping between types and message execution flows. It can be understood that the system stores in advance the message types and the message execution flow corresponding to each type, that is, which message type corresponds to which message execution flow. Referring specifically to FIG. 8, when the type of the input message is defined as A, the message execution flow corresponding to that type is A-B-C-E-D.
  • In this case, the processors corresponding to the message execution flow include intermediate message processors and an output message processor, and the intermediate message processors include a cache message processor and/or a temporary message processor. It is worth noting that there is only one message execution flow for each input message type.
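  • The type-to-flow lookup can be pictured with the small sketch below, which hard-codes the A-B-C-E-D and E-D flows from FIG. 8; treating the last node of the flow as the output message and the others as intermediate messages is an assumption made for illustration.

```python
# Sketch: pre-stored mapping from input-message type to its single message
# execution flow (cf. FIG. 8), and derivation of the processors involved.

EXECUTION_FLOWS = {
    "A": ["A", "B", "C", "E", "D"],   # input type A -> flow A-B-C-E-D
    "E": ["E", "D"],                  # input type E -> flow E-D
}

def processors_for(input_type: str):
    flow = EXECUTION_FLOWS[input_type]     # only one flow per input type
    return {
        "flow": flow,
        "intermediate": flow[1:-1],        # intermediate messages, if any
        "output": flow[-1],                # produced by the output processor
    }

print(processors_for("A"))   # intermediates B, C, E; output D
print(processors_for("E"))   # no intermediates; output D
```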
  • Step S34 performing logical processing on the input message according to the determined processor to obtain an output message.
  • In this embodiment, likewise referring to FIG. 8, when the type of the input message is E, the message execution flow is E-D.
  • In this case, the data preprocessing process includes only the input message and the output message and contains no intermediate message.
  • The determined processor is the output message processor, which logically processes the input message E to obtain the output message D.
  • In this embodiment, referring to FIG. 3, when the determined processor further includes a temporary message processor and/or a cache message processor, step S34 includes:
  • Step S341: determine the attribute corresponding to the temporary message processor and/or the cache message processor;
  • Step S342: when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically process the input message according to the temporary message processor to obtain a temporary message, and/or logically process the input message according to the cache message processor to obtain a cache message;
  • Step S343: logically process the temporary message and/or the cache message according to the output message processor to obtain an output message.
  • In this embodiment, after the processor is determined, the attribute corresponding to the processor is determined first; that is, when the determined processor further includes a temporary message processor and/or a cache message processor, the attribute corresponding to the temporary message processor and/or the cache message processor is determined.
  • When that attribute is a creatable attribute, the input message is logically processed according to the temporary message processor to obtain a temporary message, and/or logically processed according to the cache message processor to obtain a cache message,
  • and the temporary message and/or the cache message are then logically processed according to the output message processor to obtain the output message.
  • It can be understood that, apart from the input message, every other message may have a creation condition processor, which is configured to create the message. For example, when the attribute corresponding to the creation condition processor of the temporary message is a creatable attribute, the input message can generate a temporary message through the creation condition processor corresponding to the temporary message processor;
  • when that attribute is an uncreatable attribute, the input message does not generate a temporary message.
  • Optionally, the cache message also has an update condition processor and a deletion condition processor; that is, only the cache message processor has a corresponding update condition processor and deletion condition processor, where the update condition processor is configured to update the cache message and the deletion condition processor is configured to delete aged cache messages.
  • To better understand this embodiment, likewise referring to FIG. 8, when the type of the input message is A, the message execution flow is A-B-C-E-D. It can be understood that, after the message execution flow is obtained, whether to generate a cache message or an output message, or whether to delete a message once a message execution flow has been processed, can also be decided according to the creation conditions and deletion conditions in the message model.
  • That is, whether a message can be generated is determined according to the attribute of the corresponding processor in the message execution flow: if the attribute of a processor is an uncreatable attribute, that processor cannot generate message content. When the type of the input message is A,
  • the execution flow is A-B-C-E-D; if only the attribute of the processor corresponding to message C is a creatable attribute, and the attribute of the processor corresponding to every other message is an uncreatable attribute, the finally generated output message is C.
  • If the attribute of the processor corresponding to every message is a creatable attribute, the height value corresponding to each message needs to be determined first (as shown in FIG. 9, WA denotes the height value of node A, WB the height value of node B, WC the height value of node C, and WE the height value of node E).
  • The output order of the messages is determined according to the height value of each message. That is, the message model is first loaded to establish the internal mapping relationships, and the message execution flow is established according to the reference relationships between messages. When a message does not depend on any other message, it can be regarded as an input message; that message is then used as the entry point of the reference relationships, the relationship graph is traversed, and the list of reference relationships is found, as shown in Table 1 — Message / Dependency list: A: (none); B: A; C: A; D: E, A, C; E: B, A.
  • Among the five messages A, B, C, D and E and their dependencies, A does not depend on any other message, so it is an input message; A can therefore be used as the entry to traverse the reference relationships. The height of each message is equal to the maximum height among the messages it depends on plus one. The messages are sorted by their height values, and a dependency list is finally obtained in which depended-on messages are ranked first. Referring to FIG. 9, the output message is finally found to be D.
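  • The height rule just described (height of a message = 1 + the maximum height of the messages it depends on, with dependency-free messages treated as inputs) can be sketched directly from Table 1. The dictionary below encodes exactly those dependencies; assigning height 0 to inputs is an assumption, since the text only defines the recursive step.

```python
# Sketch of the height/ordering rule from Table 1: height(msg) =
# max(height of its dependencies) + 1; inputs (no dependencies) are given
# height 0 here. Sorting by height keeps depended-on messages first.

DEPENDS_ON = {            # Table 1: message -> messages it depends on
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["E", "A", "C"],
    "E": ["B", "A"],
}

def heights(depends_on: dict) -> dict:
    h = {}
    def height(msg: str) -> int:
        if msg not in h:
            deps = depends_on[msg]
            h[msg] = 0 if not deps else max(height(d) for d in deps) + 1
        return h[msg]
    for msg in depends_on:
        height(msg)
    return h

h = heights(DEPENDS_ON)
order = sorted(DEPENDS_ON, key=h.get)
print(h)        # {'A': 0, 'B': 1, 'C': 1, 'D': 3, 'E': 2}
print(order)    # A first (input message), D last (the final output message)
```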
  • a third embodiment of the data preprocessing method of the present invention is proposed based on the second embodiment.
  • In this embodiment, while step S343 is performed, if a storage instruction for the temporary message
  • is received, the following step is performed:
  • storing the temporary message in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time, the cached message in the cache area is used as referenced data for the new input message.
  • In this embodiment, while the temporary message and the cache message are logically processed according to the output message processor to obtain the output message, if a storage instruction for the temporary message is received, the temporary message is stored in a preset cache area to generate a cache message.
  • Referring to FIG. 7, the temporary message flows into the cache message processor and is stored in the preset cache area to generate a cache message.
  • It can be understood that the cache message generated by logically processing the input message according to the cache message processor is also stored in the preset cache area, and when a data preprocessing instruction is received next time, the cached message in the cache area can be used as referenced data for the new input
  • message. In this way, when data is processed next time, the temporary message or cache message generated this time can be used as referenced data, which improves the efficiency of data processing.
  • In this embodiment, data in a database or some configured data can be dynamically loaded into the cache so that it can be used as referenced data during preprocessing.
  • Optionally, in order to improve the flexibility of data preprocessing, a fourth embodiment of the data preprocessing method of the present invention is proposed based on the third embodiment.
  • In this embodiment, the data preprocessing method further includes: when the message storage capacity of the cache area reaches a preset capacity value, deleting part of the cached messages in the cache area, where the access time of the deleted cached messages is earlier than that of the cached messages that are not deleted; or, deleting cached messages whose access time is a preset duration or more before the current time.
  • Cached messages may accumulate as messages are continuously processed; therefore, cached messages generally need a maximum-count limit, a retention period and an elimination/deletion mechanism.
  • Periodically deleting the cached messages in the cache area prevents them from occupying too much capacity and thereby lowering the operating efficiency of data preprocessing.
  • By periodically deleting cached messages, the storage space of the cache area is increased and the operating efficiency of the system is improved.
  • Optionally, the cached messages have a corresponding timing service, and each cached message itself is bound with timer information, so that cached messages can be deleted automatically when there is a massive amount of data in the cache area.
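  • The two deletion policies above (evict the entries with the oldest access times once a capacity cap is reached, and periodically drop entries whose last access is older than a preset age) can be sketched as follows; the capacity figure, the age limit and the class name are illustrative assumptions, not values from the patent.

```python
# Sketch of the two cache-eviction rules: (1) when the number of cached
# messages exceeds a preset capacity, delete the entry with the earliest
# access time; (2) on a timer, delete entries not accessed for a preset
# duration. The limits used here are invented for the example.
import time

class EvictingCache:
    def __init__(self, max_entries: int = 4, max_age_seconds: float = 3600.0):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self._entries = {}          # keyword -> (last_access_time, message)

    def put(self, keyword, message):
        self._entries[keyword] = (time.time(), message)
        if len(self._entries) > self.max_entries:
            # rule 1: drop the entry whose access time is the earliest
            oldest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest]

    def get(self, keyword):
        if keyword in self._entries:
            _, message = self._entries[keyword]
            self._entries[keyword] = (time.time(), message)  # refresh access time
            return message
        return None

    def purge_aged(self):
        # rule 2: timer-driven deletion of entries older than max_age_seconds
        cutoff = time.time() - self.max_age_seconds
        for k in [k for k, (t, _) in self._entries.items() if t < cutoff]:
            del self._entries[k]
```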
  • the above method can be implemented by a server.
  • Embodiments of the present invention also provide a computer readable storage medium storing computer executable instructions for performing any of the methods described above.
  • the invention further provides a data preprocessing apparatus.
  • FIG. 4 is a schematic diagram of functional modules of a first embodiment of a data pre-processing apparatus according to the present invention.
  • It should be emphasized that, for those skilled in the art, the functional block diagram shown in FIG. 4 is merely an example of an alternative embodiment, and those skilled in the art can easily supplement new functional modules around the functional modules of the data preprocessing apparatus shown in FIG. 4.
  • The name of each functional module is a custom name, used only to assist in understanding the program function blocks of the data preprocessing apparatus, and is not intended to limit the technical solution of the present invention;
  • the core of the technical solution is the function that each custom-named functional module is intended to achieve.
  • This embodiment provides a data preprocessing apparatus, where the data preprocessing apparatus includes:
  • the obtaining module 10 is configured to acquire data to be processed when receiving the data preprocessing instruction
  • In this embodiment, upon receiving the data preprocessing instruction, the acquiring module 10 acquires the data to be processed. The data to be processed includes data generated by a data source; it is worth noting that, during data preprocessing, the temporary message or cache message generated by the previous data preprocessing process may be acquired and used as referenced data of the data to be processed.
  • Optionally, the temporary message generated during the previous data preprocessing process may first be stored in a preset cache space as a cached message, and then, when data preprocessing is performed this time, the cached message in the cache area is used as the referenced data of the data to be processed.
  • the mapping module 20 is configured to map the acquired data to be processed into an input message corresponding to the preset message model
  • In this embodiment, the data preprocessing process first abstracts the data to be processed into a message and then pre-processes the abstracted message; the data is abstracted into a message by loading a preset message model.
  • That is, the mapping module 20 maps the data to be processed into an input message corresponding to the message model. The mapping module 20 is configured to implement this mapping in the following manner: extracting each piece of field information contained in the data to be processed, arranging the extracted field information according to the form of the preset message model, and finally taking the arranged field information as the input message.
  • the input message is composed of a set of consecutive fields, and the field may be a simple data type or a complex combined data type.
  • the structure of the message model includes:
  • The message name is a string beginning with a letter and cannot contain special characters (for example, spaces, '.', '-', and so on). Because message names need to reference each other in the message model, the message model name must be globally unique and should convey the meaning of the message it describes.
  • The message name also serves as the file name of the model configuration.
  • The message type covers the encoding mode of the message model; the encoding mode may be a fixed-format encoding such as TLV (Type-Length-Value, an encoding format) encoding, or some custom complex encoding, and so on.
  • The logical conditional expressions of the message: the message model is configured with some logical expressions that control how the message is processed, such as the creation conditional expression processor and the deletion conditional expression processor of a cache message. Only when the creation conditional expression is satisfied
  • is the cache message created and cached; when the deletion conditional
  • expression is satisfied, the corresponding cached message is deleted from the cache after the flow has been processed;
  • the message consists of fields.
  • the message model is expressed in XML (Extensible Markup Language).
  • the message model can be saved in a relational database.
  • the input message may be call signaling of the mobile network, or may be an online record of the user.
  • The field name is a string beginning with a letter and cannot contain special characters.
  • For example, special characters include spaces, '.', '-', etc.; the field name must be unique within each message model and should convey the meaning of the field;
  • The field type may be a basic field type such as integer or string, or a composite field type, where integers can be divided into single-byte, double-byte, four-byte, long integer, and so on; the field length is, for example, in [0, 65535]. It is worth noting that when the field is of string type, the field length represents the maximum length of the string.
  • The processor of a field: a message field newly generated during data preprocessing can be obtained by processing or computing on message fields that have already been generated, and such processing or computation can be abstracted into general-purpose processors.
  • Complex processing logic can be implemented by built-in processors, while general logic processing can be implemented by expression processors.
  • An expression processor is logic processing code that supports online editing and compilation.
  • An expression implies the association between messages and the processing logic.
  • Each expression may contain multiple subexpressions separated by ':' or another special symbol; when the first subexpression fails, the second subexpression is processed, and so on, until one succeeds, at which point the remaining subexpressions are no longer processed. Understandably, the expression processor is dynamically compiled into machine code rather than interpreted, which ensures that processors can be configured flexibly while executing efficiently.
  • A built-in processor and an expression processor can be combined with each other into a composite processor. To make the field description of the model more intuitive, the processor is placed into the model configuration as a field attribute.
  • the processing module 30 is configured to perform logical processing on the input message by using a preset processor to obtain an output message, where the processor performing logical processing on the input message includes: extracting field information of the input message And processing and deforming the field information to obtain an output message.
  • In this embodiment, the processing module 30 logically processes the input message with the preset processor to obtain the output message; that is, the field information of the input message is first extracted, and then the field information
  • is logically processed, i.e., processed and transformed, to obtain a message in the form to be output, and the message in the form to be output is taken as the output message.
  • To better understand the solution, an example application scenario is as follows: when the data preprocessing system starts, it loads the preset message models.
  • Referring to FIG. 7, the message models include an input message model, an intermediate message model (including temporary messages and cache messages) and a pre-processed output message model.
  • After the message models are loaded, the data to be processed is mapped into an internal structure that can be located directly and efficiently, and an input message is obtained.
  • Optionally, configuration information can be loaded at this time and converted into cache messages; the configuration information can exist in various forms, such as relational databases, attribute files, XML files, etc.
  • The cache messages have keywords, and a mapping between keywords and cache messages is established internally, allowing lookup by keyword or by conditional matching.
  • Cache messages may be configured for big-data preprocessing such as data backfilling and data reduction. At the same time, the message receiving service is loaded, that is, the object to which data is to be transmitted after preprocessing is determined, and the message receiving service is started; the message receiving service may be a network receiving service of the UDP (User Datagram Protocol) protocol or the TCP (Transmission Control Protocol) protocol, and the corresponding protocol is selected according to the actual network scenario. Finally, the input message is processed by the preset processor to extract the field information in the input message, the extracted field information is logically processed to obtain the message in the form to be output, and the output message is finally output.
  • With the data preprocessing apparatus of this embodiment, when a data preprocessing instruction is received, the data to be processed is acquired and mapped into an input message corresponding to a preset message model, and the input message is logically processed by a preset processor
  • to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message and processing and transforming the field information to obtain the output message.
  • This achieves mapping different types of data into input messages corresponding to the message model, that is, de-structuring the data to extract the field information of the input message, logically processing the extracted field information, and finally obtaining the output message,
  • instead of requiring different software to pre-process each kind of data separately when the type and structure of the data differ.
  • The embodiment of the present invention improves the efficiency of data preprocessing and reduces the operating cost of data preprocessing.
  • Optionally, in order to improve the flexibility of data preprocessing, a second embodiment of the data preprocessing apparatus of the present invention is proposed based on the first embodiment. In this embodiment, referring to FIG. 5, the processing module 30 includes:
  • a first determining submodule 31, configured to determine a type of the input message
  • the obtaining sub-module 32 is configured to acquire a message execution flow corresponding to the determined type according to a mapping relationship between the preset type and the message execution flow;
  • a second determining submodule 33, configured to determine the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
  • In this embodiment, the first determining submodule 31 first determines the type of the input message; the type includes a call type, an Internet access type, and so on. After the type of the input message is determined, the acquiring submodule 32 acquires the message execution flow corresponding to the determined type according to the preset mapping between types and message execution flows.
  • It can be understood that the system stores in advance the message types and the message execution flow corresponding to each type, that is, which message type corresponds to which
  • message execution flow. Referring specifically to FIG. 8, when the type of the input message is defined as A, the message execution flow corresponding to that type is A-B-C-E-D.
  • In this case, the processors corresponding to the message execution flow include intermediate message
  • processors and an output message processor, and the intermediate message processors include a cache message processor and/or a temporary message processor. It is worth noting that there is only one message execution flow for each input message type.
  • The processing submodule 34 is configured to logically process the input message according to the determined processor to obtain the output message.
  • In this embodiment, likewise referring to FIG. 8, when the type of the input message is E, the message execution flow is E-D.
  • In this case, the data preprocessing process includes only the input message and the output message and contains no intermediate message.
  • The processor determined by the second determining submodule 33 is the output message processor, which logically processes the input message E to obtain the output message D.
  • Optionally, when the determined processor further includes a temporary message processor and/or a cache message processor, referring to FIG. 6, the processing submodule 34 includes:
  • a determining unit 34, configured to determine the attribute corresponding to the temporary message processor and/or the cache message processor;
  • a first processing unit 35, configured to, when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically process the input message according to the temporary message processor to obtain a temporary message, and/or logically process the input message according to the cache message processor to obtain a cache message;
  • a second processing unit 36, configured to logically process the temporary message and/or the cache message according to the output message processor to obtain the output message.
  • In this embodiment, the determining unit 34 first determines the attribute corresponding to the processor; that is, when the determined processor further includes a temporary message processor and/or a cache message processor, the attribute corresponding to the temporary message processor and/or the cache message processor is determined. When that attribute is a creatable attribute, the first processing unit 35 logically processes the input message according to the temporary message processor to obtain a temporary message, and/or logically processes the input message according to the cache message processor to obtain a cache message, and the second processing unit 36 logically processes the temporary message and/or the cache message according to the output message processor to obtain the output message.
  • It can be understood that, apart from the input message, every other message may have a creation condition processor, which is configured to create the message. For example, when the attribute corresponding to the creation condition processor of the temporary message is a creatable attribute, the input message can generate a temporary message through the creation condition processor corresponding to the temporary message processor;
  • when that attribute is an uncreatable attribute, the input message does not generate a temporary message.
  • Optionally, the cache message also has an update condition processor and a deletion condition processor; that is, only the cache message processor has a corresponding update condition processor and deletion condition processor, where the update condition processor is configured to update the cache message and the deletion condition processor is configured to delete aged cache messages.
  • To better understand this embodiment, likewise referring to FIG. 8, when the type of the input message is A, the message execution flow is A-B-C-E-D; it can be understood that, after the message execution flow is obtained, whether to generate a cache message or an output message, or whether to delete a message once a message execution flow has been processed, can also be decided according to the creation conditions and deletion conditions in the message model.
  • That is, whether a message can be generated is determined according to the attribute of the corresponding processor in the message execution flow: if the attribute of a processor is an uncreatable attribute, that processor cannot generate message content.
  • When the type of the input message is A,
  • the execution flow is A-B-C-E-D; if only the attribute
  • of the processor corresponding to message C is a creatable attribute, and the attribute of the processor corresponding to every other message is an uncreatable attribute, the finally generated output message is C.
  • If the attribute of the processor corresponding to every message is a creatable attribute, the height value corresponding to each message needs to be determined first, and the output order of the messages is determined according to the height value of each message. That is, the message model is first loaded to establish the internal mapping relationships, and the message execution flow is established according to the reference relationships between messages. When a message does not depend on any other message, it can be regarded as an input message; that message is then used as the entry point of the reference relationships, the relationship graph is traversed, and the list of reference relationships is found, as shown in Table 1 above.
  • Among the five messages A, B, C, D and E and their dependencies, A does not depend on any other message, so it is an input message; A can therefore be used as the entry to traverse the reference relationships. The height of each message is equal to the maximum height among the messages it depends on plus one. The messages are sorted by their height values, and a dependency list is finally obtained in which depended-on messages are ranked first. Referring to FIG. 9, the output message is finally found to be D.
  • the third embodiment of the data pre-processing apparatus of the present invention is proposed based on the second embodiment.
  • the data pre-processing apparatus further includes:
  • a storage unit, configured to, if a storage instruction for the temporary message is received, store the temporary message in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time,
  • the cached message in the cache area is used as referenced data for the new input message.
  • In this embodiment, while the temporary message and the cache message are logically processed according to the output message processor
  • to obtain the output message, if a storage instruction for the temporary message is received, the temporary message is stored in a preset cache area to generate a cache message. Referring to FIG. 7, the temporary message flows into the cache message processor and is stored in the preset cache area to generate a cache message. It can be understood that the cache message generated by logically processing the input message according to the cache message processor is also stored in the preset cache area; when a data preprocessing instruction is received next time, the acquiring module 10 may use the cached message in the cache area as referenced data for the new input message, so that when data is processed next time, the temporary message or cache message generated this time can be used as referenced data, which improves the efficiency of data processing.
  • In this embodiment, data in a database or some configured data can be dynamically loaded into the cache so that it can be used as referenced data during preprocessing.
  • the fourth embodiment of the data pre-processing apparatus of the present invention is proposed based on the third embodiment.
  • the data pre-processing apparatus further includes:
  • a deleting module, configured to delete part of the cached messages in the cache area when the message storage capacity of the cache area reaches a preset capacity value, where the access time of the deleted cached messages is earlier than that of the cached messages that are not deleted; or to delete cached messages whose access time is a preset duration or more before the current time.
  • Cached messages may accumulate as messages are continuously processed; therefore, cached messages generally need a maximum-count limit, a retention period and an elimination/deletion mechanism. For this reason, the deleting module is used in this embodiment.
  • By periodically deleting the cached messages in the cache area, the deleting module prevents the cached messages from occupying too much capacity and thereby lowering the efficiency of data preprocessing.
  • Periodically deleting cached messages increases the storage space of the cache area and improves the operating efficiency of the system.
  • Optionally, the cached messages have a corresponding timing service, and each cached message itself is bound with timer information, so that cached messages can be deleted automatically when there is a massive amount of data in the cache area.
  • Each module/unit in the foregoing embodiments may be implemented in the form of hardware, for example by an integrated circuit that implements its corresponding function, or in the form of a software function module, for example by a processor executing programs/instructions stored in a memory to implement its corresponding function.
  • The invention is not limited to any specific combination of hardware and software.
  • The methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better
  • implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk or
  • an optical disc), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
  • the above technical solution improves the efficiency of data preprocessing and reduces the operating cost of data preprocessing.

Abstract

A data preprocessing method: when a data preprocessing instruction is received, data to be processed is acquired; the acquired data to be processed is mapped into an input message corresponding to a preset message model; the input message is logically processed by a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.

Description

Data preprocessing method and apparatus — Technical field
This document relates to, but is not limited to, the field of data processing, and in particular to a data preprocessing method and apparatus.
Background art
With the large-scale application of technologies such as computers, the Internet and the Internet of Things in many fields, the amount of data generated has also increased. Because the data produced by data sources comes in many types and formats, the data must be pre-processed before it is processed, and the specific processing operations on the data are performed only after pre-processing. Since the types and formats of the data differ, the structures of the data are not the same; for different data, different software has to be developed to pre-process each kind of data separately, and only after pre-processing is finished is the data gathered into the processor for subsequent processing operations. Pre-processing the data through separately developed software makes the data pre-processing process time-consuming, inefficient and costly to operate.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The embodiments of the present invention provide a data preprocessing method and apparatus, which can reduce the time required for the data preprocessing process, improve efficiency, and reduce operating cost.
A data preprocessing method provided by an embodiment of the present invention includes the following steps:
acquiring data to be processed when a data preprocessing instruction is received;
mapping the acquired data to be processed into an input message corresponding to a preset message model;
logically processing the input message with a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
Optionally, the step of logically processing the input message with the preset processor to obtain the output message includes:
determining the type of the input message;
acquiring the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows;
determining the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
logically processing the input message according to the determined processor to obtain the output message.
Optionally, when the determined processor further includes a temporary message processor and/or a cache message processor, the step of logically processing the input message according to the determined processor to obtain an output message includes:
determining the attribute corresponding to the temporary message processor and/or the cache message processor;
when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically processing the input message according to the temporary message processor to obtain a temporary message, and/or logically processing the input message according to the cache message processor to obtain a cache message;
logically processing the temporary message and/or the cache message according to the output message processor to obtain the output message.
Optionally, while the temporary message and/or the cache message are logically processed according to the output message processor to obtain the output message, if a storage instruction for the temporary message is received, the following step is performed:
storing the temporary message in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time, the cached message in the cache area is used as referenced data for the new input message.
Optionally, the data preprocessing method further includes:
when the message storage capacity of the cache area reaches a preset capacity value, deleting part of the cached messages in the cache area, where the access time of the deleted cached messages is earlier than the access time of the cached messages that are not deleted;
or, deleting cached messages in the cache area whose access time is a preset duration or more before the current time.
An embodiment of the present invention further provides a data preprocessing apparatus, where the data preprocessing apparatus includes:
an acquiring module, configured to acquire data to be processed when a data preprocessing instruction is received;
a mapping module, configured to map the acquired data to be processed into an input message corresponding to a preset message model;
a processing module, configured to logically process the input message with a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
Optionally, the processing module includes:
a first determining submodule, configured to determine the type of the input message;
an acquiring submodule, configured to acquire the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows;
a second determining submodule, configured to determine the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
a processing submodule, configured to logically process the input message according to the determined processor to obtain the output message.
Optionally, when the determined processor further includes a temporary message processor and/or a cache message processor, the processing submodule includes:
a determining unit, configured to determine the attribute corresponding to the temporary message processor and/or the cache message processor;
a first processing unit, configured to, when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically process the input message according to the temporary message processor to obtain a temporary message, and/or logically process the input message according to the cache message processor to obtain a cache message;
a second processing unit, configured to logically process the temporary message and/or the cache message according to the output message processor to obtain the output message.
Optionally, the data preprocessing apparatus further includes:
a storage unit, configured to, if a storage instruction for the temporary message is received, store the temporary message in a preset cache area to generate a cache message, so that when a data preprocessing instruction is received next time, the cached message in the cache area is used as referenced data for the new input message.
Optionally, the data preprocessing apparatus further includes:
a deleting module, configured to delete part of the cached messages in the cache area when the message storage capacity of the cache area reaches a preset capacity value, where the access time of the deleted cached messages is earlier than the access time of the cached messages that are not deleted;
or, to delete cached messages in the cache area whose access time is a preset duration or more before the current time.
With the data preprocessing method and apparatus provided by the embodiments of the present invention, when a data preprocessing instruction is received, the data to be processed is acquired and mapped into an input message corresponding to a preset message model, and the input message is logically processed by a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message and processing and transforming the field information to obtain the output message. This realizes mapping different types of data into input messages corresponding to the message model, that is, de-structuring the data to extract the field information of the input message, logically processing the extracted field information, and finally obtaining the output message, instead of requiring different software to pre-process each kind of data separately when the type and structure of the data differ. The embodiments of the present invention improve the efficiency of data preprocessing and reduce the operating cost of data preprocessing.
Other aspects will become apparent upon reading and understanding the accompanying drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a data preprocessing method according to an alternative embodiment of the present invention;
FIG. 2 is a schematic flowchart of logically processing the input message with a preset processor to obtain an output message according to an alternative embodiment of the present invention;
FIG. 3 is a schematic flowchart of logically processing the input message according to the determined processor according to an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of functional modules of a data preprocessing apparatus according to an alternative embodiment of the present invention;
FIG. 5 is a schematic diagram of refined functional modules of the processing module in FIG. 4;
FIG. 6 is a schematic diagram of refined functional modules of the processing submodule in FIG. 5;
FIG. 7 is a schematic diagram of an implementation scenario according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a preset message execution flow according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the height value corresponding to each message in the message execution flow of FIG. 8.
Embodiments of the present invention
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit the present invention.
The present invention provides a data preprocessing method.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the data preprocessing method of the present invention.
This embodiment provides a data preprocessing method, and the data preprocessing method includes:
Step S10: acquire data to be processed when a data preprocessing instruction is received;
In this embodiment, the data to be processed includes data generated by a data source. It is worth noting that, during data preprocessing, the temporary message or cache message generated by the previous data preprocessing process may be acquired and used as referenced data of the data to be processed (that is, the temporary message or cache message is referenced when processing the data to be processed). Optionally, the temporary message generated during the previous data preprocessing process may first be stored in a preset cache space and become a cached message, and then, when data preprocessing is performed this time, the cached message in the cache area is used as the referenced data of the data to be processed.
Step S20: map the acquired data to be processed into an input message corresponding to a preset message model;
In this embodiment, the data preprocessing process first abstracts the data to be processed into a message and then pre-processes the abstracted message. The way of abstracting the data to be processed into a message includes: loading a preset message model and mapping the data to be processed into an input message corresponding to the message model.
Mapping the data to be processed into the input message corresponding to the message model includes:
extracting each piece of field information contained in the data to be processed, arranging the extracted field information according to the form of the preset message model, and finally taking the arranged field information as the input message.
Optionally, the input message consists of a group of consecutive fields, and a field may be a simple data type or a complex combined data type. The structure of the message model includes:
1. Message name and message type;
where the message name is a string beginning with a letter and cannot contain special characters (for example, spaces, '.', '-', and so on). Because message names need to reference each other in the message model, the message name must be globally unique and should convey the meaning of the message it describes; the message name also serves as the file name of the model configuration. The message type covers the encoding mode of the message model; the encoding mode may be a fixed-format encoding such as TLV (Type-Length-Value, an encoding format) encoding, or some custom complex encoding, and so on.
2. Logical conditional expressions of the message: the message model is configured with some logical expressions to control how the message is processed, such as the creation conditional expression processor and the deletion conditional expression processor of a cache message. Only when the creation conditional expression is satisfied is the cache message created and cached; when the deletion conditional expression is satisfied, the corresponding cached message is deleted from the cache after the flow has been processed.
3. Composition of the message: a message consists of fields. Typically, the message model is expressed in XML (Extensible Markup Language); considering that the message model may change dynamically, the message model can be saved in a relational database. In this embodiment, the input message may be call signaling of a mobile network, or an Internet access record of a user.
The structure of a field in the message includes:
1. Field name, field type, field length, etc.;
where the field name is a string beginning with a letter and cannot contain special characters (for example, spaces, '.', '-', and so on); the field name must be unique within each message model and should convey the meaning of the field. The field type may be a basic field type such as integer or string, or a composite field type, where integers can be divided into single-byte, double-byte, four-byte, long integer, and so on; the field length is, for example, in [0, 65535]. It is worth noting that when the field is of string type, the field length represents the maximum length of the string.
2. Processor of the field: a message field newly generated during data preprocessing can be obtained by processing or computing on message fields that have already been generated, and such processing or computation can be abstracted into general-purpose processors. Complex processing logic can be implemented by built-in processors, while general logic processing can be implemented by expression processors; an expression processor is logic processing code that supports online editing and compilation, and an expression implies the association between messages and the processing logic. Each expression may contain multiple subexpressions separated by ':' or another special symbol; when the first subexpression fails, the second subexpression is processed, and so on, until one succeeds, at which point the remaining subexpressions are no longer processed. Understandably, the expression processor is dynamically compiled into machine code rather than interpreted, which ensures that processors can be configured flexibly while executing efficiently; built-in processors and expression processors can be combined with each other into composite processors. To make the field description of the model more intuitive, the processor is placed into the model configuration as a field attribute.
Step S30: logically process the input message with a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message, and processing and transforming the field information to obtain the output message.
In this embodiment, the input message is logically processed by the preset processor to obtain the output message; that is, the field information of the input message is first extracted, and then the field information is logically processed, i.e., processed and transformed, to obtain a message in the form to be output, and the message in the form to be output is taken as the output message.
To better understand the solution, an example application scenario is as follows: when the data preprocessing system starts, it loads the preset message models. Referring to FIG. 7, the message models include an input message model, an intermediate message model (including temporary messages and cache messages) and a pre-processed output message model. After the message models are loaded, the data to be processed is mapped into an internal structure that can be located directly and efficiently, and an input message is obtained. Optionally, configuration information can be loaded at this time and converted into cache messages; the configuration information can exist in many forms, such as relational databases, property files, XML files, etc. Cache messages have keywords, and a mapping between keywords and cache messages is established internally, allowing lookup by keyword or by conditional matching. Cache messages can be configured for big-data preprocessing such as data backfilling and data reduction. At the same time, a message receiving service is loaded, that is, the object to which data is to be transmitted after preprocessing is determined, and the message receiving service is started; the message receiving service may be a network receiving service of the UDP (User Datagram Protocol) protocol or the TCP (Transmission Control Protocol) protocol, and the corresponding protocol is selected according to the actual network scenario. Finally, the input message is processed by the preset processor to extract the field information in the input message, the extracted field information is logically processed to obtain the message in the form to be output, and the output message is finally output.
With the data preprocessing method of this embodiment, when a data preprocessing instruction is received, the data to be processed is acquired and mapped into an input message corresponding to a preset message model, and the input message is logically processed by a preset processor to obtain an output message, where the processor logically processing the input message includes: extracting field information of the input message and processing and transforming the field information to obtain the output message. This realizes mapping different types of data into input messages corresponding to the message model, that is, de-structuring the data to extract the field information of the input message, logically processing the extracted field information, and finally obtaining the output message, instead of requiring different software to pre-process each kind of data separately when the type and structure of the data differ. The embodiment of the present invention improves the efficiency of data preprocessing and reduces the operating cost of data preprocessing.
Optionally, in order to improve the flexibility of data preprocessing, a second embodiment of the data preprocessing method of the present invention is proposed based on the first embodiment. In this embodiment, referring to FIG. 2, step S30 includes:
Step S31: determine the type of the input message;
In this step, the corresponding schema model (i.e., the preset message model) may be found according to the input message number or a unique identifier, and the type of the message is obtained from the schema model.
Step S32: acquire the message execution flow corresponding to the determined type according to a preset mapping between types and message execution flows;
Step S33: determine the processor based on the acquired message execution flow, where the determined processor includes an output message processor;
In this embodiment, the type of the input message is determined first; the type includes a call type, an Internet access type, and so on. After the type of the input message is determined, the message execution flow corresponding to the determined type is acquired according to the preset mapping between types and message execution flows. It can be understood that the system stores in advance the message types and the message execution flow corresponding to each message type, that is, which message type corresponds to which message execution flow. Referring specifically to FIG. 8, if the type of the input message is defined as A, the message execution flow corresponding to the type of the input message is A-B-C-E-D. It can be understood that in this case the processors corresponding to the message execution flow include intermediate message processors and an output message processor, and the intermediate message processors include a cache message processor and/or a temporary message processor. It is worth noting that there is only one message execution flow for each input message type.
Step S34: logically process the input message according to the determined processor to obtain the output message.
In this embodiment, likewise referring to FIG. 8, when the type of the input message is E, the message execution flow is E-D. In this case, the data preprocessing process includes only the input message and the output message and contains no intermediate message; the determined processor is the output message processor, which logically processes the input message E to obtain the output message D.
In this embodiment, referring to FIG. 3, when the determined processor further includes a temporary message processor and/or a cache message processor, step S34 includes:
Step S341: determine the attribute corresponding to the temporary message processor and/or the cache message processor;
Step S342: when the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, logically process the input message according to the temporary message processor to obtain a temporary message, and/or logically process the input message according to the cache message processor to obtain a cache message;
Step S343: logically process the temporary message and/or the cache message according to the output message processor to obtain an output message.
In this embodiment, after the processor is determined, the attribute corresponding to the processor is determined first; that is, when the determined processor further includes a temporary message processor and/or a cache message processor, the attribute corresponding to the temporary message processor and/or the cache message processor is determined. When the attribute corresponding to the temporary message processor and/or the cache message processor is a creatable attribute, the input message is logically processed according to the temporary message processor to obtain a temporary message, and/or the input message is logically processed according to the cache message processor to obtain a cache message, and the temporary message and/or the cache message are logically processed according to the output message processor to obtain the output message. It can be understood that, apart from the input message, every other message may have a creation condition processor; that is, the temporary message, the cache message and the output message all have creation condition processors, and the creation condition processor is configured to create the message. For example, when the attribute corresponding to the creation condition processor of the temporary message is a creatable attribute, the input message can generate a temporary message through the creation condition processor corresponding to the temporary message processor; when that attribute is an uncreatable attribute, the input message does not generate a temporary message. Optionally, the cache message also has an update condition processor and a deletion condition processor; that is, only the cache message processor has a corresponding update condition processor and deletion condition processor, where the update condition processor is configured to update the cache message and the deletion condition processor is configured to delete aged cache messages. To better understand this embodiment, likewise referring to FIG. 8, when the type of the input message is A, the message execution flow is A-B-C-E-D. It can be understood that, after the message execution flow is obtained, whether to generate a cache message or an output message, or whether to delete a message once a message execution flow has been processed, can also be decided according to the creation conditions and deletion conditions in the message model. That is, whether a message can be generated is determined according to the attribute of the corresponding processor in the message execution flow: if the attribute of a processor is an uncreatable attribute, that processor cannot generate message content. When the type of the input message is A, the execution flow is A-B-C-E-D; if only the attribute of the processor corresponding to message C is a creatable attribute, and the attribute of the processor corresponding to every other message is an uncreatable attribute, the finally generated output message is C.
If the attribute of the processor corresponding to every message is a creatable attribute, the height value corresponding to each message needs to be determined first (as shown in FIG. 9, WA denotes the height value of node A, WB the height value of node B, WC the height value of node C, and WE the height value of node E), and the output order of the messages is determined according to the height value of each message. That is, the message model is first loaded to establish the internal mapping relationships, and the message execution flow is established according to the reference relationships between messages. When a message does not depend on any other message, it can be regarded as an input message; that message is then used as the entry point of the reference relationships, the relationship graph is traversed, and the list of reference relationships is found, as shown in Table 1 below:
Table 1
Message    Dependency list
A          (none)
B          A
C          A
D          E, A, C
E          B, A
Among the five messages A, B, C, D and E and their dependencies, A does not depend on any other message, so it is an input message. A can therefore be used as the entry to traverse the reference relationships; the height of each message is equal to the maximum height among the messages it depends on plus one. The messages are sorted by their height values, and a dependency list is finally obtained that guarantees that depended-on messages are ranked first. Referring to FIG. 9, the output message is finally found to be D.
可选地,为了提高数据预处理的灵活性,基于第二实施例提出本发明数据预处理方法的第三实施例,在本实施例中,所述执行步骤S343的同时,若接收到临时消息的存储指令,执行以下步骤:
将所述临时消息存储到预设的缓存区域中,以生成缓存消息,以供下次接收到数据预处理指令时,将所述缓存区域中的缓存消息作为新的输入消息的被引用数据。
在本实施例中,在根据所述输出消息处理器对所述临时消息及所述缓存消息进行逻辑处理,以得到输出消息的同时,若接收到临时消息的存储指令,将所述临时消息存储到预设的缓存区域中,以生成缓存消息,可参照图7,将临时消息流到缓存消息处理器中,以存储到预设的缓存区域中生成缓存消息,可以理解的是,根据所述缓存消息处理器对所述输入消息进行逻辑处理生成的缓存消息也是存储到预设的缓存区域中,在下次接收到数据预处理指令时,可将所述缓存区域中的缓存消息作为新的输入消息的被引用数据,实现了下次处理数据时,可将本次产生的临时消息或者缓存消息作为被引用数据,提高了数据处理的效率。
本实施例中,数据库中或者是一些配置化的数据可以动态加载到缓存中,以便预处理过程中可以作为被引用的数据。
可选地,为了提高数据预处理的灵活性,基于第三实施例提出本发明数 据预处理方法的第四实施例,在本实施例中,所述数据预处理方法还包括:
在所述缓存区域的消息存储容量值达到预设容量值时,删除所述缓存区域中的部分缓存消息,其中,删除的缓存消息的访问时间早于未删除的缓存消息的访问时间;
或者,删除缓存区域中访问时间点距离当前时间点达到预设时长的缓存消息。
在本实施例中,缓存消息可能会随着消息的不断处理而越来越多,因此,缓存消息一般需要存在最大数量限制、保存周期和淘汰删除机制。本实施例中通过定时删除所述缓存区域中的缓存消息,避免了缓存区域中的缓存消息占用过多的容量而降低数据预处理的运行效率;也就是说,通过定时删除缓存消息,增大了缓存区域的可用存储空间,提高了系统的运行效率。
可选地,缓存消息存在对应的定时服务,每个缓存消息本身绑定定时器信息,使得当缓存区域中存在海量数据时,可以自动删除缓存消息。
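缓存消息的最大数量限制与基于访问时长的淘汰删除机制可以用如下示意代码理解(其中容量上限、时长阈值等数值仅为示例性假设,并非对本实施例的限定):

```python
# 示意:缓存消息的淘汰删除机制(假设性草图):容量达到预设值时删除访问时间最早的缓存消息,
# 并可删除距最近访问已达到预设时长的缓存消息
import time

class EvictingCache:
    def __init__(self, max_size=1000, max_idle_seconds=3600):
        self.max_size = max_size
        self.max_idle_seconds = max_idle_seconds
        self.entries = {}                 # 关键字 -> (缓存消息, 最近访问时间)

    def put(self, key, message):
        if len(self.entries) >= self.max_size:
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]      # 删除访问时间早于未删除缓存消息的那部分缓存消息
        self.entries[key] = (message, time.time())

    def get(self, key):
        if key in self.entries:
            msg, _ = self.entries[key]
            self.entries[key] = (msg, time.time())   # 更新最近访问时间
            return msg
        return None

    def evict_expired(self, now=None):
        """类似每个缓存消息绑定的定时器:删除访问时间点距当前时间点达到预设时长的缓存消息。"""
        now = now or time.time()
        expired = [k for k, (_, t) in self.entries.items() if now - t >= self.max_idle_seconds]
        for k in expired:
            del self.entries[k]
```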
上述方法可以通过服务器实现。
本发明实施例还提出了一种计算机可读存储介质,存储有计算机可执行指令,计算机可执行指令用于执行上述描述的任意一个方法。
本发明进一步提供一种数据预处理装置。
参照图4,图4为本发明数据预处理装置第一实施例的功能模块示意图。
需要强调的是,对本领域的技术人员来说,图4所示功能模块图仅仅是一个可选实施例的示例图,本领域的技术人员围绕图4所示的数据预处理装置的功能模块,可轻易进行新的功能模块的补充;每一个功能模块的名称是自定义名称,仅用于辅助理解该数据预处理装置的每一个程序功能块,不用于限定本发明的技术方案,本发明技术方案的核心是,每一个自定义名称的功能模块所要达成的功能。
本实施例提出一种数据预处理装置,所述数据预处理装置包括:
获取模块10,设置为在接收到数据预处理指令时,获取待处理的数据;
在本实施例中,在接收到所述数据预处理指令时,所述获取模块10获取所述待处理的数据,所述待处理数据包括:数据源产生的数据,值得注意的是,在数据预处理过程中,可获取上一次数据预处理过程产生的临时消息或缓存消息,并将所述临时消息或缓存消息作为待处理数据的被引用数据,可选地,在上一次数据预处理过程中产生的临时消息可先存储到预设的缓存空间中成为缓存消息,然后在本次进行数据预处理时,将所述缓存区域中的缓存消息作为待处理数据的被引用数据。
映射模块20,设置为将获取的待处理的数据映射为预设消息模型对应的输入消息;
在本实施例中,数据预处理过程中是先将待处理的数据抽象成消息,再对抽象后的消息进行预处理,将待处理的数据抽象成消息的方式包括:通过加载预设的消息模型,即所述映射模块20将待处理的数据映射为所述消息模型对应的输入消息,所述映射模块20是设置为采用以下方式实现将待处理数据映射为所述消息模型对应的输入消息:提取待处理的数据中包含的每一个字段信息,根据预设的消息模型,将提取的每一个字段信息按照所述消息模型的形式进行排列,最终将排列好的字段信息作为输入消息。
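下面给出映射模块20将待处理数据按消息模型映射为输入消息的一段示意代码(消息模型结构、字段名与取值均为举例假设,仅用于说明"按模型逐字段提取并排列"的方式):

```python
# 示意:将待处理的数据按预设消息模型映射为输入消息(假设性草图)
def map_to_input_message(raw_record, model):
    """按消息模型中字段的排列顺序提取每一个字段信息,排列为输入消息。"""
    message = {"__model__": model["name"]}
    for field in model["fields"]:                       # 模型定义了字段名称、类型、长度等
        value = raw_record.get(field["name"])
        if field["type"] == "int" and value is not None:
            value = int(value)
        message[field["name"]] = value
    return message

call_model = {"name": "CallRecord",
              "fields": [{"name": "caller", "type": "string"},
                         {"name": "callee", "type": "string"},
                         {"name": "duration", "type": "int"}]}
raw = {"caller": "13800000000", "callee": "13900000000", "duration": "120"}
input_message = map_to_input_message(raw, call_model)   # 得到可直接高效定位字段的内部结构
```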
可选地,所述输入消息是由一组连续的字段组成,字段可以是简单的数据类型,也可以是复杂的组合数据类型。而所述消息模型的结构内容包括:
1、消息名称、消息类型;
其中,所述消息名称是以字母开头的字符串,不能包含特殊字符,例如空格、‘.’、‘-’等,并且由于消息名称需要在消息模型中相互引用,因此消息模型名称必须全局唯一,且能代表所描述消息的含义,消息名称同时也作为模型配置的文件名;而所述消息类型包括所述消息模型的编码方式,所述消息模型的编码方式可以是固定格式编码,如TLV(Type-Length-Value,一种编码格式)编码,也可以是自定义的一些复杂编码等等。
2、消息的逻辑条件表达式:消息模型中会配置一些逻辑表达式来控制消息被处理的流程,比如缓存消息的创建条件表达式处理器和删除条件表达式处理器,当创建条件表达式满足时,缓存消息才会被创建并缓存,删除条件表达式满足时,相应的缓存消息在流程处理完就会从缓存中删除;
3、消息的组成:消息由字段组成。通常,消息模型用XML(Extensible Markup Language,可扩展标记语言)表示,考虑到消息模型动态变更,可以用关系数据库保存消息模型。本实施例中,所述输入消息可以为移动网络的呼叫信令,也可以是用户的上网记录。
而消息中的字段的结构内容包括:
1、字段名称、字段类型、字段长度等;
其中,所述字段名称是以字母开头的字符串,不能包含特殊字符,例如空格、‘.’、‘-’等,而且字段名称在每个消息模型内必须唯一,且能代表字段的含义;而所述字段类型可以是基本的字段类型如整型、字符串等,也可以是复合的字段类型,其中,整型可以分为单字节、双字节、四字节和长整型等;而字段长度的取值范围例如为[0,65535],值得注意的是,当字段是字符串类型时,字段长度表示字符串的最大长度。
2、字段的处理器:数据预处理过程新生成的消息字段可以由已经生成的消息字段经过加工或运算得到,这种加工或运算可以抽象出通用的处理器。对于复杂的加工逻辑可以通过内置处理器实现,一般的逻辑处理可以通过表达式处理器实现,表达式处理器是支持在线编辑和编译的逻辑处理代码,表达式隐含了消息的关联关系和处理逻辑。其中,每个表达式中可以包含多个子表达式,每个子表达式之间用“:”或其他特殊符号分隔,当第一个子表达式处理失败时,就会处理第二个子表达式,依此类推,直到某个子表达式处理成功才停止处理后续的子表达式。可以理解的是,表达式处理器会被系统动态地编译成机器码而不是解释执行,这样既可以保证处理器能灵活配置,又能高效执行,内置处理器和表达式处理器相互可以组合成组合处理器。为了使模型的字段描述更加直观,将处理器作为字段的属性放到模型配置中。
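作为对表达式处理器"子表达式回退处理"的一个示意,下面给出一段假设性的代码草图(这里用Python的compile/eval近似"编译后执行",实际系统的编译与执行方式以具体实现为准;示例中的字段名与表达式仅为举例):

```python
# 示意:表达式处理器的子表达式回退处理(假设性草图)
# 子表达式之间以分隔符分隔,前一个处理失败时尝试下一个,返回非空结果即视为成功
class ExpressionProcessor:
    def __init__(self, expression, separator=":"):
        self.sub_exprs = [compile(e.strip(), "<expr>", "eval")
                          for e in expression.split(separator)]

    def process(self, message):
        for code in self.sub_exprs:
            try:
                value = eval(code, {}, dict(message))    # 以消息字段作为表达式中的变量
                if value is not None:
                    return value                          # 某个子表达式成功即停止后续处理
            except Exception:
                continue                                  # 处理失败则回退到下一个子表达式
        return None

# 例如:优先将duration字段转为整数,失败时回退为缺省值0(字段名与表达式仅为举例)
proc = ExpressionProcessor("int(duration) : 0")
print(proc.process({"duration": "120"}))   # -> 120
print(proc.process({"duration": "bad"}))   # -> 0(第一个子表达式失败,回退到第二个)
```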
处理模块30,设置为采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息,其中,所述处理器对所述输入消息进行逻辑处理包括:提取所述输入消息的字段信息,并对所述字段信息进行加工变形,以得到输出消息。
在本实施例中,所述处理模块30采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息,也就是说先提取所述输入消息的字段信息,然后对所述字段信息进行逻辑处理,即将所述字段信息进行加工变形,得到待输出形式的消息,并将所述待输出的形式的消息作为输出消息。
为更好理解本方案,举例应用场景如下:数据预处理系统启动时会加载预设的消息模型,参照图7,消息模型包括输入消息模型、中间消息(包括临时消息和缓存消息)模型和预处理后的输出消息模型。加载完所述消息模型后,将待处理的数据映射为可以直接高效定位的内部结构,得到输入消息,可选地,此时可加载配置信息,将加载的配置信息转化为缓存消息,配置信息可以以多种形式存在,比如关系数据库、属性文件、XML文件等,缓存消息存在关键字,在内部会建立关键字和缓存消息的映射,允许按关键字查找或条件匹配查找。缓存消息可以设置为数据回填、数据归约等大数据预处理,同时,加载消息接收服务,即确定数据预处理后待传输的对象,并启动所述消息接收服务,所述消息接收服务可以是UDP(User Datagram Protocol,用户数据报协议)协议或TCP(Transmission Control Protocol,传输控制协议)协议的网络接收服务,具体根据实际网络场景选择相应的协议。最后,通过预设的处理器对所述输入消息进行处理,以提取所述输入消息中的字段信息,并对提取的所述字段信息进行逻辑加工,以得到待输出形式的消息,最终将所述输出消息进行输出。
本实施例提出的数据预处理装置,在接收到数据预处理指令时,获取待处理的数据,将获取的所述数据映射为预设消息模型对应的输入消息,采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息,其中,所述处理器对所述输入消息进行逻辑处理包括:提取所述输入消息的字段信息,并对所述字段信息进行加工变形,以得到输出消息,实现了将不同类型的数据映射为消息模型对应的输入消息,也就是将数据去结构化,以提取出输入消息的字段信息,并对提取的字段信息进行逻辑处理,最终得到输出消息,而不是在数据的类型结构不同时,需要不同的软件分别对所述数据进行预处理,本发明实施例提高了数据预处理的效率,并降低了数据预处理的操作成本。
可选地,为了提高数据预处理的灵活性,基于第一实施例提出本发明数据预处理装置的第二实施例,在本实施例中,参照图5,所述处理模块30包括:
第一确定子模块31,设置为确定所述输入消息的类型;
获取子模块32,设置为根据预设的类型与消息执行流的映射关系,获取确定的类型对应的消息执行流;
第二确定子模块33,设置为基于获取的所述消息执行流确定处理器,其中,确定的所述处理器包括输出消息处理器;
在本实施例中,所述第一确定子模块31先确定所述输入消息的类型,所述类型包括呼叫类型、上网类型等等,在确定输入消息的类型后,所述获取子模块32根据预设的类型与消息执行流的映射关系,获取确定的类型对应的消息执行流,可以理解的是,系统中事先存储了消息类型与消息类型对应的消息执行流,即哪种消息对应哪种消息执行流,具体参照图8,将输入消息的类型定义为A,则可知道输入消息的类型对应的消息执行流为A-B-C-E-D,可以理解,此时所述消息执行流对应的处理器包括中间消息处理器和输出消息处理器,而所述中间消息处理器包括缓存消息处理器和/或临时消息处理器。值得注意的是,每个输入消息类型对应的消息执行流只有一种。
处理子模块34,设置为根据确定的所述处理器对所述输入消息进行逻辑处理,以得到输出消息。
本实施例中,同样参照图8,当所述输入消息的类型为E时,所述消息执行流为E-D,此时,可知道数据预处理过程中仅包括输入消息和输出消息,不包含中间消息,此时,所述第二确定子模块33确定的所述处理器为输出消息处理器,所述输出消息处理器对所述输入消息E进行逻辑处理,以得到输出消息D。
在本实施例中,参照图6,在确定的所述处理器还包括临时消息处理器和/或缓存消息处理器时,所述处理子模块34包括:
确定单元34,设置为确定所述临时消息处理器和/或所述缓存消息处理器对应的属性;
第一处理单元35,设置为在所述临时消息处理器和/或所述缓存消息处理器对应的属性为可创建属性时,根据所述临时消息处理器对所述输入消息进行逻辑处理以得到临时消息,和/或根据所述缓存消息处理器对所述输入消息进行逻辑处理以得到缓存消息;
第二处理单元36,设置为根据所述输出消息处理器对所述临时消息和/或所述缓存消息进行逻辑处理,以得到输出消息。
在本实施例中,在确定了所述处理器后,所述确定单元34先确定所述处理器对应的属性,即在确定的所述处理器还包括临时消息处理器和/或缓存消息处理器时,确定所述临时消息处理器和/或所述缓存消息处理器对应的属性,在所述临时消息处理器和/或所述缓存消息处理器对应的属性为可创建属性时,所述第一处理单元35根据所述临时消息处理器对所述输入消息进行逻辑处理以得到临时消息,和/或根据所述缓存消息处理器对所述输入消息进行逻辑处理以得到缓存消息,所述第二处理单元36根据所述输出消息处理器对所述临时消息和/或所述缓存消息进行逻辑处理,以得到输出消息。可以理解的是,除了输入消息外,其他消息都可以存在创建条件处理器,即所述临时消息、缓存消息以及所述输出消息都存在创建条件处理器,而所述创建条件处理器设置为创建消息,例如,在临时消息的创建条件处理器对应的属性为可创建属性时,所述输入消息通过所述临时消息处理器对应的创建条件处理器可生成临时消息;在临时消息的创建条件处理器对应的属性为不可创建属性时,所述输入消息不生成临时消息。可选地,所述缓存消息还存在更新条件处理器和删除条件处理器,即仅仅所述缓存消息处理器才对应更新条件处理器和删除条件处理器,所述更新条件处理器设置为更新缓存消息,所述删除条件处理器设置为删除老化的缓存消息。

本实施例中,为更好理解本实施例,同样参照图8,当所述输入消息的类型为A时,所述消息执行流为A-B-C-E-D,可以理解的是,在得到所述消息执行流后,还可以根据消息模型中的创建条件和删除条件,决定是否生成该缓存消息或输出消息,或者一个消息执行流处理完后,是否删除该消息。即根据所述消息执行流中对应的处理器的属性确定是否能生成消息,若处理器的属性为不可创建属性,则所述处理器无法生成消息内容,在所述输入消息的类型为A时,所述执行流为A-B-C-E-D,若此时仅仅是消息C对应的处理器的属性为可创建属性,其它每一个消息对应的处理器的属性为不可创建属性,则最终生成的输出消息为C。
若每一个消息对应的处理器的属性都为可创建属性,则需要先确定每一个消息对应的高度值,按照每一个消息的高度值确定每一个消息的输出顺序。即,首先加载消息模型,建立内部映射关系。根据消息之间的引用关系建立消息执行流,当一个消息没有依赖其他消息时,可以认为是输入消息,那么就以该消息作为引用关系的入口,遍历关系图,找到引用关系的列表。如下表1:
消息 依赖消息列表
A  
B A
C A
D E,A,C
E B,A
表1示出了A、B、C、D和E五个消息的依赖关系,A没有依赖任何其他消息,所以它是输入消息,那么就可以以A为入口,遍历引用关系,每个消息的高度等于其所依赖的消息中高度最大值加一。根据每个消息的高度值大小进行排序,最后得到一个依赖链表,保证被依赖的消息排在前面,可参照图9,最终,可得知输出消息为D。
可选地,为了提高数据预处理的灵活性,基于第二实施例提出本发明数据预处理装置的第三实施例,在本实施例中,所述数据预处理装置还包括:
存储单元,设置为若接收到临时消息的存储指令,将所述临时消息存储到预设的缓存区域中,以生成缓存消息,以供下次接收到数据预处理指令时,将所述缓存区域中的缓存消息作为新的输入消息的被引用数据。
在本实施例中,在根据所述输出消息处理器对所述临时消息及所述缓存消息进行逻辑处理,以得到输出消息的同时,若接收到临时消息的存储指令,将所述临时消息存储到预设的缓存区域中,以生成缓存消息,可参照图7,将临时消息流到缓存消息处理器中,以存储到预设的缓存区域中生成缓存消息,可以理解的是,根据所述缓存消息处理器对所述输入消息进行逻辑处理生成的缓存消息也是存储到预设的缓存区域中,在下次接收到数据预处理指令时,所述获取模块10可将所述缓存区域中的缓存消息作为新的输入消息的被引用数据,实现了下次处理数据时,可将本次产生的临时消息或者缓存消息作为被引用数据,提高了数据处理的效率。
本实施例中,数据库中或者是一些配置化的数据可以动态加载到缓存中,以便预处理过程中可以作为被引用的数据。
可选地,为了提高数据预处理的灵活性,基于第三实施例提出本发明数据预处理装置的第四实施例,在本实施例中,所述数据预处理装置还包括:
删除模块,设置为在所述缓存区域的消息存储容量值达到预设容量值时,删除所述缓存区域中的部分缓存消息,其中,删除的缓存消息的访问时间早于未删除的缓存消息的访问时间;
或者,删除缓存区域中访问时间点距离当前时间点达到预设时长的缓存消息。
在本实施例中,缓存消息可能会随着消息的不断处理而越来越多,因此,缓存消息一般需要存在最大数量限制、保存周期和淘汰删除机制。本实施例中所述删除模块通过定时删除所述缓存区域中的缓存消息,避免了缓存区域中的缓存消息占用过多的容量而降低数据预处理的运行效率;也就是说,通过定时删除缓存消息,增大了缓存区域的可用存储空间,提高了系统的运行效率。
可选地,缓存消息存在对应的定时服务,每个缓存消息本身绑定定时器信息,使得当缓存区域中存在海量数据时,可以自动删除缓存消息。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其它变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其它要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件(例如处理器)完成,所述程序可以存储于计算机可读存储介质中,如只读存储器、磁盘或光盘等。可选地,上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地,上述实施例中的各模块/单元可以采用硬件的形式实现,例如通过集成电路来实现其相应功能,也可以采用软件功能模块的形式实现,例如通过处理器执行存储于存储器中的程序/指令来实现其相应功能。本发明不限于任何特定形式的硬件和软件的结合。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。
以上仅为本发明的优选实施例,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其它相关的技术领域,均同理包括在本发明的专利保护范围内。
工业实用性
上述技术方案提高了数据预处理的效率,并降低了数据预处理的操作成本。

Claims (10)

  1. 一种数据预处理方法,所述数据预处理方法包括以下步骤:
    在接收到数据预处理指令时,获取待处理的数据;
    将获取的待处理的数据映射为预设消息模型对应的输入消息;
    采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息,其中,所述处理器对所述输入消息进行逻辑处理包括:提取所述输入消息的字段信息,并对所述字段信息进行加工变形,以得到输出消息。
  2. 如权利要求1所述的数据预处理方法,其中,所述采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息的步骤包括:
    确定所述输入消息的类型;
    根据预设的类型与消息执行流的映射关系,获取确定的类型对应的消息执行流;
    基于获取的所述消息执行流确定所述处理器,其中,确定的所述处理器包括输出消息处理器;
    根据确定的所述处理器对所述输入消息进行逻辑处理,以得到输出消息。
  3. 如权利要求2所述的数据预处理方法,其中,在确定的所述处理器还包括临时消息处理器和/或缓存消息处理器时,所述根据确定的所述处理器对所述输入消息进行逻辑处理,以得到输出消息的步骤包括:
    确定所述临时消息处理器和/或所述缓存消息处理器对应的属性;
    在所述临时消息处理器和/或所述缓存消息处理器对应的属性为可创建属性时,根据所述临时消息处理器对所述输入消息进行逻辑处理以得到临时消息,和/或根据所述缓存消息处理器对所述输入消息进行逻辑处理以得到缓存消息;
    根据所述输出消息处理器对所述临时消息和/或所述缓存消息进行逻辑处理,以得到输出消息。
  4. 如权利要求3所述的数据预处理方法,所述根据所述输出消息处理器对所述临时消息和/或所述缓存消息进行逻辑处理,以得到输出消息的同时,若接收到临时消息的存储指令,执行以下步骤:
    将所述临时消息存储到预设的缓存区域中,以生成缓存消息,以供下次接收到数据预处理指令时,将所述缓存区域中的缓存消息作为新的输入消息的被引用数据。
  5. 如权利要求4所述的数据预处理方法,所述数据预处理方法还包括:
    在所述缓存区域的消息存储容量值达到预设容量值时,删除所述缓存区域中的部分缓存消息,其中,删除的缓存消息的访问时间早于未删除的缓存消息的访问时间;
    或者,删除所述缓存区域中访问时间点距离当前时间点达到预设时长的缓存消息。
  6. 一种数据预处理装置,所述数据预处理装置包括:
    获取模块,设置为在接收到数据预处理指令时,获取待处理的数据;
    映射模块,设置为将获取的待处理的数据映射为预设消息模型对应的输入消息;
    处理模块,设置为采用预设的处理器对所述输入消息进行逻辑处理,以得到输出消息,其中,所述处理器对所述输入消息进行逻辑处理包括:提取所述输入消息的字段信息,并对所述字段信息进行加工变形,以得到输出消息。
  7. 如权利要求6所述的数据预处理装置,其中,所述处理模块包括:
    第一确定子模块,设置为确定所述输入消息的类型;
    获取子模块,设置为根据预设的类型与消息执行流的映射关系,获取确定的类型对应的消息执行流;
    第二确定子模块,设置为基于获取的所述消息执行流确定所述处理器,其中,确定的所述处理器包括输出消息处理器;
    处理子模块,设置为根据确定的所述处理器对所述输入消息进行逻辑处理,以得到输出消息。
  8. 如权利要求7所述的数据预处理装置,其中,在确定的所述处理器还包括临时消息处理器和/或缓存消息处理器时,所述处理子模块包括:
    确定单元,设置为确定所述临时消息处理器和/或所述缓存消息处理器对应的属性;
    第一处理单元,设置为在所述临时消息处理器和/或所述缓存消息处理器对应的属性为可创建属性时,根据所述临时消息处理器对所述输入消息进行逻辑处理以得到临时消息,和/或根据所述缓存消息处理器对所述输入消息进行逻辑处理以得到缓存消息;
    第二处理单元,设置为根据所述输出消息处理器对所述临时消息和/或所述缓存消息进行逻辑处理,以得到输出消息。
  9. 如权利要求8所述的数据预处理装置,所述数据预处理装置还包括:
    存储单元,设置为若接收到临时消息的存储指令,将所述临时消息存储到预设的缓存区域中,以生成缓存消息,以供下次接收到数据预处理指令时,将所述缓存区域中的缓存消息作为新的输入消息的被引用数据。
  10. 如权利要求9所述的数据预处理装置,所述数据预处理装置还包括:
    删除模块,设置为在所述缓存区域的消息存储容量值达到预设容量值时,删除所述缓存区域中的部分缓存消息,其中,删除的缓存消息的访问时间早于未删除的缓存消息的访问时间;
    或者,删除所述缓存区域中访问时间点距离当前时间点达到预设时长的缓存消息。
PCT/CN2016/085161 2015-12-29 2016-06-07 数据预处理方法及装置 WO2016197924A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511017135.X 2015-12-29
CN201511017135.XA CN106933826B (zh) 2015-12-29 2015-12-29 数据预处理方法及装置

Publications (1)

Publication Number Publication Date
WO2016197924A1 true WO2016197924A1 (zh) 2016-12-15

Family

ID=57503016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/085161 WO2016197924A1 (zh) 2015-12-29 2016-06-07 数据预处理方法及装置

Country Status (2)

Country Link
CN (1) CN106933826B (zh)
WO (1) WO2016197924A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525235A (zh) * 2022-11-04 2022-12-27 上海威固信息技术股份有限公司 一种基于存储结构的数据运算方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101272376A (zh) * 2008-05-06 2008-09-24 中兴通讯股份有限公司 一种消息解析方法
US20080259930A1 (en) * 2007-04-20 2008-10-23 Johnston Simon K Message Flow Model of Interactions Between Distributed Services
CN101808175A (zh) * 2009-02-13 2010-08-18 华为技术有限公司 话单转换方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140365403A1 (en) * 2013-06-07 2014-12-11 International Business Machines Corporation Guided event prediction
CN104156395A (zh) * 2014-07-14 2014-11-19 上海东方延华节能技术服务股份有限公司 数据存储系统

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080259930A1 (en) * 2007-04-20 2008-10-23 Johnston Simon K Message Flow Model of Interactions Between Distributed Services
CN101272376A (zh) * 2008-05-06 2008-09-24 中兴通讯股份有限公司 一种消息解析方法
CN101808175A (zh) * 2009-02-13 2010-08-18 华为技术有限公司 话单转换方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525235A (zh) * 2022-11-04 2022-12-27 上海威固信息技术股份有限公司 一种基于存储结构的数据运算方法及系统
CN115525235B (zh) * 2022-11-04 2023-09-08 上海威固信息技术股份有限公司 一种基于存储结构的数据运算方法及系统

Also Published As

Publication number Publication date
CN106933826B (zh) 2020-11-27
CN106933826A (zh) 2017-07-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16806822

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16806822

Country of ref document: EP

Kind code of ref document: A1