CN114385694A - Data processing method and device, computer equipment and storage medium

Info

Publication number: CN114385694A
Application number: CN202111579929.0A
Authority: CN
Inventors: 冯天驰; 肖林岩; 李邕; 张龙达
Original assignee: Hunan Caixin Digital Technology Co ltd
Current assignee: Hunan Caixin Digital Technology Co ltd
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2022-04-22

Abstract

The embodiments of the application belong to the technical field of computer information management and relate to a data processing method and device, computer equipment, and a storage medium. In the method, the original data to be processed is analyzed to obtain each item of structured data, the operator rule corresponding to each item of structured data is obtained, and the operator rule is applied to the corresponding structured data to obtain a target processing result, so that diverse data is processed in a unified manner into internal data with a clear content definition.

Description

Data processing method and device, computer equipment and storage medium

Technical Field

The present application relates to the technical field of computer information management in artificial intelligence, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.

Background

In the field of computer applications, information management by computers has become a major application, and data processing is an essential means for information management.

Existing data processing methods process data according to fixed logic, for example filter conditions, operations on a field or between fields (such as addition, subtraction, multiplication, division, counting, and averaging), and extraction of character strings at given positions and of given lengths within a field, thereby achieving the purpose of data processing.

However, the applicant has found that conventional data processing methods are generally not intelligent. Because the data processing modules in existing enterprise data platforms or data middle platform systems are designed and built for processing internal data, conventional methods can only be applied to internal data with a clear content definition. They are difficult to apply to external data from industry and core enterprises, which has diverse definitions, complex types, and frequent changes. Conventional data processing methods therefore suffer from poor compatibility.

Disclosure of Invention

An object of the embodiments of the present application is to provide a data processing method, an apparatus, a computer device, and a storage medium, so as to solve the problem of poor compatibility of the conventional data processing method.

In order to solve the above technical problem, an embodiment of the present application provides a data processing method, which adopts the following technical solutions:

acquiring original data to be processed;

performing data analysis operation on the original data to obtain structured data;

if the structured data further comprises personalized data, performing data conversion operation on the personalized data, and taking the personalized data after the data conversion operation as the structured data;

reading an operator rule base, and acquiring operator rules corresponding to the structured data in the operator rule base;

and processing the structured data according to the operator rule to obtain a target processing result.

In order to solve the above technical problem, an embodiment of the present application further provides a data processing apparatus, which adopts the following technical solutions:

the data acquisition module is used for acquiring original data to be processed;

the data analysis module is used for carrying out data analysis operation on the original data to obtain structured data;

the format conversion module is used for carrying out data conversion operation on the personalized data if the structured data also comprises the personalized data, and taking the personalized data after the data conversion operation as the structured data;

the operator acquisition module is used for reading an operator rule base and acquiring an operator rule corresponding to the structured data from the operator rule base;

and the first processing module is used for processing the structured data according to the operator rule to obtain a target processing result.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:

comprising a memory and a processor, wherein the memory stores computer readable instructions, and the processor executes the computer readable instructions to realize the steps of the data processing method.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:

the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the data processing method as described above.

Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:

The application provides a data processing method comprising: acquiring original data to be processed; performing a data analysis operation on the original data to obtain structured data; if the structured data further comprises personalized data, performing a data conversion operation on the personalized data and taking the converted personalized data as the structured data; reading an operator rule base and acquiring the operator rules corresponding to the structured data from the operator rule base; and processing the structured data according to the operator rules to obtain a target processing result. The original data to be processed is analyzed to obtain each item of structured data, and the operator rule corresponding to each item is obtained and applied to it to obtain the target processing result. Further processing can be realized by combining subsequent operator rules, which completes the processing from various unstructured data to internal data with a clear content definition. When the structured data also includes personalized data with diverse definitions, complex types, and frequent changes, the data conversion operation yields structured data that meets the format requirements, after which the operator rules are obtained and the processing is carried out. The method can therefore handle external data from industry and core enterprises that is complex in type and changes frequently, which effectively improves the compatibility of data processing.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flowchart of an implementation of a data processing method according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of another specific implementation of a data processing method according to an embodiment of the present application;

FIG. 4 is a flowchart of one embodiment of step S203 in FIG. 2;

FIG. 5 is a flowchart of an implementation of obtaining a semantic analysis model according to an embodiment of the present application;

FIG. 6 is a flowchart of an implementation of a first method for obtaining a feature expression vector according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a data processing device according to a second embodiment of the present application;

FIG. 8 is a schematic structural diagram of another specific implementation of a data processing device according to a second embodiment of the present application;

FIG. 9 is a schematic block diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that the data processing method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the data processing apparatus is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Continuing to refer to fig. 2, a flowchart of an implementation of the data processing method according to the first embodiment of the present application is shown, and for convenience of description, only the portions related to the present application are shown.

The data processing method comprises the following steps:

Step S201: acquiring original data to be processed.

In the embodiment of the application, personalized data preprocessing in the enterprise data platform or data middle platform is used to preprocess partner-specific data that comes from the core enterprises of external industry partners and needs to enter the enterprise data platform or data middle platform for processing. This enables the platform to handle external data that is varied in type and changes frequently, which improves its data processing efficiency and system stability and greatly reduces cost.

In the embodiment of the present application, input data of an enterprise data platform or data middle platform is currently processed in two ways:

1) Processing at the data access (ODM) layer, including: parsing JSON strings, decomposing a character string into a plurality of fields, and writing them into the corresponding data table records; and using NLP to process text, performing word segmentation so that the text can be structured and recorded.

2) Processing at each layer of the data middle platform or data warehouse (including DW, PDM, MDM, CDM, and the like), where data is processed according to fixed logic, such as filter conditions, operations on a field itself or between fields (such as addition, subtraction, multiplication, division, counting, and averaging), and extraction of character strings at given positions and of given lengths within a field.

These data processing rules are hard-coded in the enterprise data system. If a rule needs to be adjusted, the system must be redeveloped, and related functions such as the scheduling of data batch processing tasks must be adjusted as well.

The data processing modules in existing enterprise data platforms or data middle platform systems are designed and built for processing internal data. The number of data sources is fixed, and the systems and data content of the data sources are clearly defined; internal data sources are supplied stably and seldom change their content or fields; and even when a change occurs, the relevant parties are informed in advance and the data middle platform makes the corresponding modifications. Such modules are difficult to apply to external data from industry and core enterprises, which has diverse definitions, complex types, and frequent changes. The difficulty is mainly reflected in the following:

1) Hard-coded programs and inflexible data processing: the data processing code is hard-coded, so once an external data structure changes, the existing program needs to be modified or redeveloped.

2) External data changes cannot be sensed in time, which produces a large amount of erroneous data: the system executes the processing program mechanically and takes no responsibility for the data results, so abnormal results caused by external data changes go unnoticed unless a processing failure raises an error. Such processing produces a large amount of erroneous data, and correct data and erroneous data are mixed together, which seriously affects data quality.

3) High cost: whenever a change occurs, the existing program needs to be modified and redeveloped. Frequent external data changes (which are very common) require frequent development, and the subsequent testing, deployment, and related function adjustments demand a large investment of labor and time.

In the embodiment of the application, the original data to be processed refers to the data that needs to be processed. It may be internal data with a clear content definition, or it may be external data from industry and core enterprises with diverse definitions, complex types, and frequent changes.

Step S202: performing a data analysis operation on the original data to obtain structured data.

In this embodiment of the present application, the data parsing operation may be implemented by structuring a standard JSON string, so as to obtain the structured data.
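By way of illustration only, the parsing step may be sketched as follows. The sketch assumes that the original data arrives as a JSON string and that "structured data" means a flat mapping from field paths to values; the function name `parse_raw_data` and the dotted-path naming are hypothetical and are not taken from the application.

```python
import json


def parse_raw_data(raw: str) -> dict:
    """Parse a JSON string and flatten nested objects into dotted field names."""
    flat = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(f"{prefix}[{index}]", child)
        else:
            # Leaf value: record it under its full field path.
            flat[prefix] = value

    walk("", json.loads(raw))
    return flat


# Hypothetical input: a nested order record from an external partner.
record = parse_raw_data('{"order": {"id": "A00123", "items": [{"qty": 2}]}}')
```

Each leaf of the JSON document becomes one field, so later steps can look up an operator rule per field name.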

Step S203: if the structured data further comprises personalized data, performing a data conversion operation on the personalized data, and taking the personalized data after the data conversion operation as the structured data.

In the embodiment of the application, the structured data comprises general data, whose format meets preset requirements, and personalized data, which has diverse definitions, complex types, and frequent changes. The data is classified according to the characteristics of the data from external industries and core enterprises; that is, the classification is required to distinguish the general data from the personalized data.

In the embodiment of the present application, if the data access content or data format of a data source changes, a data conversion operation may be performed on the structured data. Specifically, the data conversion may convert characters into numbers through NLP. It should be understood that this example of the data conversion operation is only for convenience of understanding and is not intended to limit the present application.
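The conversion of characters into numbers may be sketched as follows. The application performs this step through NLP; the sketch substitutes a simple rule-based stand-in, and the `MAGNITUDES` table and both function names are hypothetical.

```python
import re

# Hypothetical magnitude words; the application would use an NLP model here.
MAGNITUDES = {"thousand": 1_000, "million": 1_000_000}

_NUMBER = re.compile(r"(-?\d[\d,]*(?:\.\d+)?)\s*([a-z]+)?", re.IGNORECASE)


def to_number(text: str):
    """Convert text such as '1,250' or '3.5 million' to a float, else None."""
    match = _NUMBER.fullmatch(text.strip())
    if match is None:
        return None
    unit = (match.group(2) or "").lower()
    if unit and unit not in MAGNITUDES:
        return None  # Unknown trailing word: leave the field untouched.
    value = float(match.group(1).replace(",", ""))
    return value * MAGNITUDES.get(unit, 1)


def convert_personalized(fields: dict) -> dict:
    """Replace convertible string fields with numbers; keep the rest as-is."""
    converted = {}
    for name, value in fields.items():
        number = to_number(value) if isinstance(value, str) else None
        converted[name] = value if number is None else number
    return converted
```

Fields that cannot be converted are passed through unchanged, so identifiers such as `"A00123"` are not silently turned into numbers.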

Step S204: reading the operator rule base, and acquiring the operator rule corresponding to the structured data from the operator rule base.

In the embodiment of the present application, the operator rule base is mainly used for storing the operator rules corresponding to different structured data. For example, operator A is responsible for extracting the second through sixth characters of field a and converting them into a numerical value; as another example, operator B is responsible for accumulating and summing the upstream data. It should be understood that these examples of operator rules are only for convenience of understanding and are not intended to limit the present application.
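A minimal sketch of such a rule base is given below, assuming an in-memory dictionary keyed by field name. The names `OPERATOR_RULE_BASE` and `get_operator_rule`, and the key `"a_total"`, are hypothetical; the application does not specify how the rule base is stored.

```python
from typing import Callable, Dict


def operator_a(value: str) -> float:
    """Operator A: convert characters 2 through 6 of a field to a number."""
    return float(value[1:6])


def operator_b(values) -> float:
    """Operator B: accumulate and sum the upstream data."""
    return float(sum(values))


# Hypothetical rule base: maps a structured-data field name to its operator.
OPERATOR_RULE_BASE: Dict[str, Callable] = {
    "a": operator_a,
    "a_total": operator_b,
}


def get_operator_rule(field_name: str) -> Callable:
    """Return the configured operator, or raise KeyError if none exists."""
    return OPERATOR_RULE_BASE[field_name]


value = get_operator_rule("a")("X00123Z")
```

Because rules are looked up by name at run time, a changed external format can in principle be handled by editing the rule base rather than the processing code.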

Step S205: processing the structured data according to the operator rule to obtain a target processing result.

In the embodiment of the application, after the operator rules are obtained, different operators are combined according to the processing order, so that different complex processing rules can be realized. For example, combining operator A and operator B sums the values converted from the second through sixth characters of field a. After the configuration is completed, the processing program of the data processing module is generated automatically.
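The combination of operators in processing order may be sketched as function composition. The helper `compose` and the sample field values are hypothetical; the sketch only illustrates how operator A followed by operator B yields the sum described above.

```python
from functools import reduce


def operator_a(values):
    """Convert characters 2 through 6 of each field value to a number."""
    return [float(v[1:6]) for v in values]


def operator_b(numbers):
    """Accumulate and sum the upstream data."""
    return sum(numbers)


def compose(*operators):
    """Build one processing function that applies the operators in order."""
    return lambda data: reduce(lambda acc, op: op(acc), operators, data)


# The generated "processing program" for the configured order A then B.
process = compose(operator_a, operator_b)
total = process(["X00123Z", "Y00077Q"])
```

Reordering or extending the argument list of `compose` changes the processing rule without modifying any individual operator.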

In the embodiment of the present application, the processing operation refers to processing the structured data according to the processing program, so as to obtain a processing result.

In an embodiment of the present application, a data processing method is provided, including: acquiring original data to be processed; performing a data analysis operation on the original data to obtain structured data; if the structured data further comprises personalized data, performing a data conversion operation on the personalized data and taking the converted personalized data as the structured data; reading an operator rule base and acquiring the operator rules corresponding to the structured data from the operator rule base; and processing the structured data according to the operator rules to obtain a target processing result. The original data to be processed is analyzed to obtain each item of structured data, and the operator rule corresponding to each item is obtained and applied to it to obtain the target processing result. Further processing can be realized by combining subsequent operator rules, which completes the processing from various unstructured data to internal data with a clear content definition. When the structured data also includes personalized data with diverse definitions, complex types, and frequent changes, the data conversion operation yields structured data that meets the format requirements, after which the operator rules are obtained and the processing is carried out. The method can therefore handle external data from industry and core enterprises that is complex in type and changes frequently, which effectively improves the compatibility of data processing.

Continuing to refer to fig. 3, a flow chart of another specific implementation of the data processing method according to the first embodiment of the present application is shown, and for convenience of description, only the portions related to the present application are shown.

In some optional implementations of this embodiment, after step S205, the method further includes:

Step S301: performing a result verification operation on the target processing result according to a preset verification rule to obtain a verification result.

In the embodiment of the present application, the check rule is set according to the processing results of the data tables accessed at the ODM layer. The preset check rule may include distribution statistics of the processing result data (such as maximum, minimum, average, and count), null value and abnormal value checks, checks on the change amplitude of numerical data over time, and the like.
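The kinds of checks listed above may be sketched as follows. The thresholds (`lower`, `upper`, `max_change`) and the function name are hypothetical defaults chosen for illustration, not values from the application.

```python
from statistics import mean


def check_results(values, previous_mean=None, max_change=0.5, lower=0.0, upper=1e6):
    """Return a list of problems found in one column of processing results."""
    problems = []
    present = [v for v in values if v is not None]
    # Null value check.
    if len(present) < len(values):
        problems.append("null values present")
    if not present:
        return problems
    # Abnormal value check based on the observed minimum and maximum.
    if min(present) < lower or max(present) > upper:
        problems.append("value outside expected range")
    # Time-series check: relative change of the mean against the last batch.
    if previous_mean:
        change = abs(mean(present) - previous_mean) / abs(previous_mean)
        if change > max_change:
            problems.append("mean changed too much since last batch")
    return problems
```

An empty list corresponds to a verification result in which no abnormal data exists.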

Step S302: if the verification result is that no abnormal data exists, taking the target processing result as the final processing result.

In the embodiment of the present application, the abnormal data refers to result data in which the processing result definitely does not conform to the business logic.

Step S303: if the verification result is that abnormal data exists, acquiring the abnormal structured data and the abnormal operator rules corresponding to the abnormal data according to the association relation among the structured data, the operator rules, and the target processing result.

In the embodiment of the present application, when the verification result is that abnormal data exists in the target processing result, it indicates that the operator rules for part of the structured data need to be adjusted.

Step S304: performing a rule correction operation on the abnormal operator rule according to the correction rule to obtain a corrected operator rule.

In this embodiment of the present application, the correcting rule may be to obtain another set of operator rules according to the above abnormal structured data, and the rule correcting operation may be to update the abnormal operator rules to the another set of operator rules.

Step S305: processing the abnormal structured data according to the corrected operator rule to obtain corrected processing data.

In the embodiment of the application, after the new operator rule is obtained, the abnormal structured data is re-run, so that result data meeting the check rule is obtained.

Step S306: taking the normal data in the target processing result together with the corrected processing data as the final processing result.
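Steps S301 to S306 may be sketched as the following loop. The record keys stand in for the association relation between structured data, operator rule, and result; the function names, the fallback rule, and the sample records are hypothetical.

```python
def safe_float(text, start, end):
    """Convert a slice of text to a number, or None if it is not numeric."""
    try:
        return float(text[start:end])
    except ValueError:
        return None


def process_with_correction(records, primary_rule, corrected_rule, is_abnormal):
    """Apply primary_rule, then re-run abnormal records with corrected_rule."""
    final = {}
    for key, value in records.items():
        result = primary_rule(value)
        if is_abnormal(result):
            # The key links the abnormal result back to its structured data.
            result = corrected_rule(value)
        final[key] = result
    return final


# Hypothetical case: the partner moved the number one character to the left.
records = {"r1": "X00123Z", "r2": "00456ZZ"}
final = process_with_correction(
    records,
    primary_rule=lambda v: safe_float(v, 1, 6),
    corrected_rule=lambda v: safe_float(v, 0, 5),
    is_abnormal=lambda result: result is None,
)
```

Only the abnormal record is re-run; the normal result for `r1` is kept as produced by the original rule.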

If the verification result is that suspicious data exists, the suspicious structured data corresponding to the suspicious data is obtained according to the association relation, and the suspicious structured data is stored in a cache region to facilitate manual screening.

In the embodiment of the present application, the suspicious data refers to data that is determined to be in question when the processing result does not meet the check rule.

In the embodiment of the present application, after the suspicious structured data is stored in the cache region, it may be reviewed manually to confirm that no error exists; alternatively, a delay time may be set, and once the delay time has elapsed, the suspicious structured data may be confirmed as normal data.
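The cache region with manual review and a delay time may be sketched as a small class. The class name, its methods, and the injectable `clock` parameter are hypothetical; the application does not describe the cache at this level of detail.

```python
import time


class SuspiciousCache:
    """Holds suspicious records until reviewed or until a delay elapses."""

    def __init__(self, delay_seconds, clock=time.monotonic):
        self.delay_seconds = delay_seconds
        self.clock = clock
        self._items = {}  # record id -> (record, time stored)

    def store(self, record_id, record):
        self._items[record_id] = (record, self.clock())

    def approve(self, record_id):
        """Manual review found no error: release the record as normal data."""
        return self._items.pop(record_id)[0]

    def release_expired(self):
        """Release every record whose delay time has elapsed."""
        now = self.clock()
        expired = [
            key
            for key, (_, stored_at) in self._items.items()
            if now - stored_at >= self.delay_seconds
        ]
        return {key: self._items.pop(key)[0] for key in expired}
```

Passing the clock as a parameter keeps the delay logic testable without waiting for real time to pass.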

Continuing to refer to fig. 4, a flowchart of one embodiment of step S203 of fig. 2 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.

In some optional implementation manners of this embodiment, step S203 specifically includes:

Step S401: inputting the personalized data into a semantic analysis model to perform a word sense recognition operation to obtain real word sense information.

In the embodiment of the application, the semantic analysis model is a pre-trained deep recognition network model, and the semantic analysis model can acquire the real meaning of the personalized data by analyzing the associated text content.

In the embodiment of the present application, the real word sense information refers to a real word sense of an ambiguous word predicted by the semantic analysis model based on the associated text information, so as to avoid a situation of erroneous judgment.

Step S402: taking the real word sense information as the structured data.

In the embodiment of the application, because personalized data has diverse definitions, complex types, and frequent changes, input errors, words with two senses, and similar situations often occur.

With continued reference to fig. 5, a flowchart for implementing obtaining a semantic analysis model provided in an embodiment of the present application is shown, and for convenience of description, only the relevant portions of the present application are shown.

In some optional implementation manners of the first embodiment of the present application, before the step S401, the method further includes: step S501, step S502, step S503, step S504, step S505, and step S506.

In step S501, a sample text is obtained in the local database, and each participle included in the sample text is determined.

In this embodiment of the present application, a plurality of texts may be obtained from the local database, and a training set formed by the obtained plurality of texts is determined, so that each text in the training set may be used as a sample text.

In this embodiment of the present application, when determining the participles included in the sample text, the sample text may first be subjected to word segmentation processing to obtain each participle included in the sample text. Any word segmentation method may be adopted; each character in the sample text may also be treated as a participle. It should be understood that this example of word segmentation processing is only for convenience of understanding and is not intended to limit the present application.
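The two segmentation options mentioned above may be sketched as follows. Whitespace splitting is used here only as a placeholder for "any word segmentation method"; the function name and its parameter are hypothetical.

```python
def segment(text, by_character=False):
    """Split a sample text into participles (tokens)."""
    if by_character:
        # Treat every non-space character as one participle.
        return [ch for ch in text if not ch.isspace()]
    # Placeholder segmenter: split on whitespace.
    return text.split()
```

Character-level segmentation needs no dictionary, which is why it is a common fallback for text without explicit word boundaries.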

In step S502, a word vector corresponding to each participle is determined based on the semantic analysis model to be trained.

In the embodiment of the present application, the semantic analysis model may include at least four layers: a semantic representation layer, an attribute characterization layer, an attribute relevance representation layer, and a classification layer.

In the embodiment of the present application, the semantic representation layer at least includes a sub-model for outputting a bidirectional semantic representation vector, such as a BERT (Bidirectional Encoder Representations from Transformers) model. Each participle can be input into the semantic representation layer of the semantic analysis model, and the bidirectional semantic representation vector output by the semantic representation layer for each participle is obtained and serves as the word vector corresponding to that participle. It should be understood that models for outputting the bidirectional semantic representation vector include other models besides the BERT model described above, and that this example is only for convenience of understanding and is not intended to limit the present application.

In step S503, semantic attributes are obtained from the local database, and a first feature expression vector of the sample text related to the semantic attributes is determined according to the attention matrix corresponding to the semantic attributes, which is included in the semantic analysis model to be trained, and the word vector corresponding to each participle.

In this embodiment of the present application, a word vector corresponding to each participle may be input to an attribute characterization layer in a semantic analysis model, the attention matrix corresponding to the semantic attribute included in the attribute characterization layer is used to perform attention weighting on the word vector corresponding to each participle, and a first feature expression vector of the sample text related to the semantic attribute is determined according to the word vector corresponding to each participle after the attention weighting.
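One possible form of this attention weighting is sketched below in plain Python. It assumes that the attention matrix supplies one query vector per semantic attribute and that the weighted word vectors are summed; the application does not fix these details, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_representation(word_vectors, attribute_query):
    """Attention-pool word vectors into one vector for a single attribute."""
    # Score each word vector against the attribute's query vector.
    scores = [
        sum(q * x for q, x in zip(attribute_query, vec)) for vec in word_vectors
    ]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    # Weighted sum of the word vectors, one output value per dimension.
    return [
        sum(w * vec[d] for w, vec in zip(weights, word_vectors))
        for d in range(dim)
    ]
```

Calling the function once per attribute yields one first feature expression vector per semantic attribute.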

In step S504, a second feature representation vector of the sample text related to the semantic attributes is determined according to the self-attention matrix included in the semantic analysis model to be trained for representing the correlation between different semantic attributes and the first feature representation vector.

In the embodiment of the present application, the first feature expression vector of the sample text related to each semantic attribute may be input to the attribute relevance representation layer in the semantic analysis model, the first feature expression vector of the sample text related to each semantic attribute may be self-attention weighted by the above-mentioned self-attention matrix included in the attribute relevance representation layer, and a second feature expression vector of the sample text related to each semantic attribute may be determined according to each self-attention weighted first feature expression vector.
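A minimal sketch of the self-attention weighting follows. It assumes that the self-attention matrix holds one row of correlation scores per semantic attribute and that each row is normalized with a softmax before mixing the first vectors; this reading, and the names used, are assumptions rather than details given in the application.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relate_attributes(first_vectors, correlation):
    """Mix per-attribute vectors using row-normalized correlation scores."""
    dim = len(first_vectors[0])
    second_vectors = []
    for row in correlation:
        weights = softmax(row)
        # Each second vector is a weighted blend of all first vectors.
        second_vectors.append(
            [
                sum(w * vec[d] for w, vec in zip(weights, first_vectors))
                for d in range(dim)
            ]
        )
    return second_vectors
```

A row with one dominant score leaves that attribute's vector almost unchanged, while a flat row averages over all attributes.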

In step S505, a classification result output by the semantic analysis model to be trained is determined according to the semantic analysis model to be trained and the second feature expression vector, where the classification result includes the semantic attribute to which the sample text belongs and the emotion polarity corresponding to that semantic attribute.

In the embodiment of the application, the classification layer at least comprises a hidden layer, a fully connected layer, and a softmax layer.

In the embodiment of the application, the second feature representation vectors of the sample text related to each semantic attribute can be sequentially input into the hidden layer, the fully connected layer, and the softmax layer of the classification layer. The sample text is classified according to each second feature representation vector and the classification parameters corresponding to each semantic attribute contained in the hidden layer, the fully connected layer, and the softmax layer, so that the classification result output by the classification layer is obtained.
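The hidden layer, fully connected layer, and softmax layer may be sketched in plain Python as follows. The `tanh` activation and all weight values are assumptions made for illustration; the application does not specify them.

```python
import math


def dense(vector, weights, bias):
    """Affine layer: one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def classify(vector, hidden_w, hidden_b, out_w, out_b):
    """Hidden layer (tanh), fully connected layer, then softmax."""
    hidden = [math.tanh(h) for h in dense(vector, hidden_w, hidden_b)]
    logits = dense(hidden, out_w, out_b)
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters: 2 inputs, 2 hidden units, 3 output classes.
probabilities = classify(
    [1.0, -1.0],
    hidden_w=[[1.0, 0.0], [0.0, 1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    out_b=[0.0, 0.0, 0.0],
)
```

The output is a probability distribution over classes, from which the predicted class is the index of the largest entry.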

In the embodiment of the present application, the classification result at least includes the semantic attribute to which the sample text belongs and the emotion polarity corresponding to the semantic attribute to which the sample text belongs.

In the embodiment of the present application, the emotion polarity can be quantified by a numerical value: the closer the value is to 1, the more positive the emotion polarity; the closer the value is to -1, the more negative the emotion polarity; and the closer the value is to 0, the more neutral the emotion polarity.
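The classification layer described for step S505 can be sketched as follows. This is a minimal illustration only: the tanh activation, the three classes (negative, neutral, positive), the parameter dictionary `params`, and the polarity score computed as the positive probability minus the negative probability are all assumptions made for the example and are not specified in the application.

```python
import math

def linear(x, weights, bias):
    """Dense layer: weights holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(second_vector, params):
    """Hidden layer -> full connection layer -> softmax layer."""
    hidden = [math.tanh(v) for v in
              linear(second_vector, params["w_hidden"], params["b_hidden"])]
    logits = linear(hidden, params["w_fc"], params["b_fc"])
    probs = softmax(logits)            # assumed order: negative, neutral, positive
    polarity = probs[2] - probs[0]     # assumed score in [-1, 1]
    return probs, polarity

# Hypothetical parameters chosen by hand for illustration, not trained values.
params = {
    "w_hidden": [[1.0, 0.0], [0.0, 1.0]],
    "b_hidden": [0.0, 0.0],
    "w_fc": [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]],
    "b_fc": [0.0, 0.0, 0.0],
}
probs, polarity = classify([2.0, 0.5], params)
```

With these hand-picked parameters, a positive first component pushes the score towards 1 and a negative one pushes it towards -1, which matches the quantification described above.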

In step S506, the model parameters in the semantic analysis model are adjusted according to the classification result and the labels preset in the sample text, so as to obtain the semantic analysis model.

In the embodiment of the present application, the model parameters to be adjusted at least include the classification parameters described above, and may further include the attention matrix and the self-attention matrix described above. The model parameters in the semantic analysis model can be adjusted by using a traditional training method. That is, the loss (hereinafter referred to as a first loss) corresponding to the classification result is determined directly according to the classification result and the label preset for the sample text, and the model parameters in the semantic analysis model are adjusted by using the first loss as the training target, so as to complete the training of the semantic analysis model.
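The application does not name the loss function used as the first loss. The sketch below assumes a cross-entropy loss over softmax outputs and a single gradient-descent step on a hypothetical output bias, purely to illustrate how a classification result and a preset label can drive a parameter adjustment.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label_index):
    """Assumed first loss: negative log-probability of the labelled class."""
    return -math.log(probs[label_index])

def sgd_step_on_bias(logits, bias, label_index, lr=0.5):
    """One gradient step on the output bias. For softmax with cross-entropy,
    the gradient with respect to each logit is (probability - one_hot_label)."""
    probs = softmax([l + b for l, b in zip(logits, bias)])
    grad = [p - (1.0 if i == label_index else 0.0) for i, p in enumerate(probs)]
    return [b - lr * g for b, g in zip(bias, grad)]

logits = [0.2, 0.1, -0.3]   # hypothetical classifier outputs for one sample
bias = [0.0, 0.0, 0.0]      # hypothetical adjustable parameter
label = 2                   # index of the class preset as the sample's label

loss_before = cross_entropy(softmax(logits), label)
bias = sgd_step_on_bias(logits, bias, label)
loss_after = cross_entropy(softmax([l + b for l, b in zip(logits, bias)]), label)
```

In a full model the same gradient would be propagated further back to the classification parameters, the attention matrix and the self-attention matrix.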

In the embodiment of the application, because the self-attention matrix for representing the correlation between different semantic attributes is added to the semantic analysis model, the semantic analysis model obtained by training by adopting the traditional training method can analyze the semantics of the text to be analyzed more accurately.

In some optional implementations of the first embodiment of the present application, the step S502 specifically includes the following steps:

and inputting each participle into a semantic representation layer of a semantic analysis model to obtain a bidirectional semantic representation vector corresponding to each participle output by the semantic representation layer as a word vector corresponding to each participle.

In an embodiment of the application, the semantic representation layer comprises at least a sub-model for outputting the bi-directional semantic representation vector, the sub-model comprising a BERT model.

Continuing to refer to fig. 6, a flowchart of an implementation of the method for obtaining the first feature representation vector provided in the first embodiment of the present application is shown; for convenience of description, only the portions relevant to the present application are shown.

In some optional implementation manners of the first embodiment of the present application, step S503 specifically includes: step S601, step S602, and step S603.

In step S601, the word vector corresponding to each participle is input to the attribute characterization layer in the semantic analysis model.

In the embodiment of the present application, the attribute characterization layer includes at least an attention matrix corresponding to each semantic attribute.

In step S602, attention weighting is performed on the word vector corresponding to each participle through the attention matrix corresponding to the semantic attributes included in the attribute characterization layer, so as to obtain a weighted word vector.

In step S603, a first feature representation vector of the sample text relating to semantic attributes is determined based on the weighted word vector.

In this embodiment, the first feature expression vector may characterize the probability that the sample text relates to the semantic attribute and the emotion polarity on the semantic attribute.
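Steps S601 to S603 can be sketched as follows. The application does not give the weighting formula, so this example assumes that each attribute's attention matrix reduces to a single query vector, that attention weights come from a softmax over dot products, and that the weighted word vectors are summed. The attribute names and the `attribute_queries` structure are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_feature_vectors(word_vectors, attribute_queries):
    """For each semantic attribute, attention-weight the word vectors (S602)
    and sum them into one first feature representation vector (S603)."""
    dim = len(word_vectors[0])
    features = {}
    for attribute, query in attribute_queries.items():
        weights = softmax([dot(query, w) for w in word_vectors])
        features[attribute] = [
            sum(weight * w[i] for weight, w in zip(weights, word_vectors))
            for i in range(dim)
        ]
    return features

# Toy word vectors for three participles and two hypothetical attributes.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attribute_queries = {"price": [2.0, 0.0], "service": [0.0, 2.0]}
first = first_feature_vectors(word_vectors, attribute_queries)
```

Each attribute is weighted independently here, which is why the resulting vectors carry no information about how attributes relate to one another.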

In some optional implementation manners of the first embodiment of the present application, step S504 specifically includes: step S701, step S702, and step S703.

In step S701, the first feature representation vector is input to the attribute correlation representation layer in the semantic analysis model.

In the embodiment of the present application, the attribute correlation representation layer in the semantic analysis model includes at least a self-attention matrix, which is used for representing the correlation between different semantic attributes. The self-attention matrix may take the following form: the element R_ij in the matrix represents the correlation between the i-th semantic attribute and the j-th semantic attribute; the stronger the correlation, the larger the value of R_ij, and vice versa.

In step S702, a first feature representation vector of the sample text related to each semantic attribute is self-attention weighted by a self-attention matrix included in the attribute relevance representation layer for representing relevance between different semantic attributes, so as to obtain a weighted feature representation vector.

In step S703, a second feature representation vector of the sample text relating to each semantic attribute is determined based on the weighted feature representation vector.

In the embodiment of the present application, the second feature representation vector may also represent the probability that the sample text relates to each semantic attribute and the emotion polarity on that semantic attribute. The two vectors differ, however, in how they are obtained. The first feature representation vector is obtained by weighting the word vectors with the attention matrices corresponding to the individual semantic attributes, which are independent of each other; therefore, the probability that the sample text relates to each semantic attribute and the emotion polarity on that semantic attribute, as characterized by the first feature representation vector, do not take the correlation between different semantic attributes into account. The second feature representation vector is obtained by weighting the first feature representation vector with the self-attention matrix that represents the correlation between different semantic attributes, which is equivalent to introducing the correlation between different semantic attributes as a factor through the self-attention matrix; therefore, the probability that the sample text relates to each semantic attribute and the emotion polarity on that semantic attribute, as characterized by the second feature representation vector, do take the correlation between different semantic attributes into account.
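The self-attention weighting of steps S701 to S703 can be illustrated with a small sketch. It assumes that each row of the matrix R is normalized to sum to 1 and that the second vector for attribute i is the R-weighted sum of all first vectors. The application states only that R_ij grows with the correlation between attributes i and j, so the normalization and the summation are assumptions of this example.

```python
def second_feature_vectors(first_vectors, correlation):
    """first_vectors: list of K first feature representation vectors.
    correlation: K x K matrix R with non-negative entries and non-zero row
    sums, where R[i][j] is the correlation between attributes i and j."""
    k = len(first_vectors)
    dim = len(first_vectors[0])
    second = []
    for i in range(k):
        row_sum = sum(correlation[i])
        weights = [r / row_sum for r in correlation[i]]  # assumed normalization
        second.append([
            sum(weights[j] * first_vectors[j][d] for j in range(k))
            for d in range(dim)
        ])
    return second

first_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
R = [
    [1.0, 0.0, 0.0],  # attribute 0 is treated as uncorrelated with the others
    [0.0, 1.0, 1.0],  # attributes 1 and 2 are treated as correlated
    [0.0, 1.0, 1.0],
]
second = second_feature_vectors(first_vectors, R)
```

In this toy case the vector for attribute 0 is unchanged, while the vectors for attributes 1 and 2 are pulled towards each other, which is the effect of introducing cross-attribute correlation.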

In summary, the present application provides a data processing method, including: acquiring original data to be processed; performing a data analysis operation on the original data to obtain structured data; if the structured data further includes personalized data, performing a data conversion operation on the personalized data, and taking the personalized data after the data conversion operation as the structured data; reading an operator rule base, and acquiring the operator rule corresponding to the structured data from the operator rule base; and processing the structured data according to the operator rule to obtain a target processing result. The original data to be processed is analyzed to obtain each piece of structured data, and the corresponding operator rule is obtained for each piece of structured data, so that the operator rule processes the corresponding structured data to obtain the target processing result. Further data processing is realized by combining subsequent data processing operator rules, which finally completes the processing from various unstructured data to internal data with a definite content definition. In addition, when the structured data further includes personalized data whose data definitions are diverse, whose types are complex and which changes frequently, structured data meeting the format requirements is obtained by performing the data conversion operation on the personalized data, and the subsequent operator rule acquisition and processing operations are then performed. This completes the processing of industry and core enterprise data that comes from the outside, has diverse data definitions and complex types, and changes frequently, and effectively improves the compatibility of data processing. Meanwhile, semantic analysis is carried out in combination with the context of an ambiguous word to obtain the actual meaning of the word before the subsequent structured data processing operation is carried out, so that misjudgment is effectively avoided, and subsequent unnecessary correction operations are avoided in turn.
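The overall flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the original data is assumed to be a JSON string, the operator rule base is modeled as a dictionary from field name to function, and the test for personalized data and the conversion routine are hypothetical stand-ins for the semantic analysis model.

```python
import json

# Hypothetical operator rule base: field name -> processing function.
OPERATOR_RULES = {
    "amount": lambda v: v / 10000,          # e.g. yuan -> ten thousand yuan
    "name": lambda v: v.strip().upper(),
}

def parse(raw):
    """Data analysis operation: turn the raw payload into field/value pairs."""
    return json.loads(raw)

def is_personalized(field, value):
    """Assumed check: a numeric field delivered as free text needs conversion."""
    return field == "amount" and isinstance(value, str)

def convert(value):
    """Data conversion operation, illustrative only: '1,200 yuan' -> 1200.0."""
    digits = "".join(ch for ch in value if ch.isdigit() or ch == ".")
    return float(digits)

def process(raw):
    structured = parse(raw)
    result = {}
    for field, value in structured.items():
        if is_personalized(field, value):
            value = convert(value)
        rule = OPERATOR_RULES.get(field)   # look up the matching operator rule
        result[field] = rule(value) if rule else value
    return result

target = process('{"name": " acme ", "amount": "1,200 yuan"}')
```

Keeping the rules in a lookup table rather than in the processing loop is what lets new external formats be supported by adding or changing rules instead of rewriting the processing program.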

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions instructing the relevant hardware. The instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Example two

With further reference to fig. 7, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data processing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 7, the data processing apparatus 200 of the present embodiment includes: a data obtaining module 201, a data analysis module 202, a format conversion module 203, an operator obtaining module 204 and a first processing module 205. Wherein:

a data obtaining module 201, configured to obtain original data to be processed;

the data analysis module 202 is configured to perform data analysis operation on the original data to obtain structured data;

the format conversion module 203 is configured to perform data conversion operation on the personalized data if the structured data further includes the personalized data, and use the personalized data after the data conversion operation as the structured data;

the operator obtaining module 204 is configured to read an operator rule base, and obtain an operator rule corresponding to the structured data from the operator rule base;

and the first processing module 205 is configured to perform processing operation on the structured data according to the operator rule to obtain a target processing result.

2) Errors caused by data changes cannot be detected, resulting in a large amount of erroneous data: the system mechanically executes the processing program without being responsible for the data results, and result anomalies caused by external data changes cannot be perceived unless a processing failure raises an error. Such a processing method may produce a large amount of erroneous data, with correct data and erroneous data mixed together, which seriously affects data quality.

In an embodiment of the present application, there is provided a data processing apparatus 200 including: a data obtaining module 201, configured to obtain original data to be processed; a data analysis module 202, configured to perform a data analysis operation on the original data to obtain structured data; a format conversion module 203, configured to perform a data conversion operation on the personalized data if the structured data further includes personalized data, and use the personalized data after the data conversion operation as the structured data; an operator obtaining module 204, configured to read an operator rule base and obtain the operator rule corresponding to the structured data from the operator rule base; and a first processing module 205, configured to process the structured data according to the operator rule to obtain a target processing result. The original data to be processed is analyzed to obtain each piece of structured data, and the corresponding operator rule is obtained for each piece of structured data, so that the operator rule processes the corresponding structured data to obtain the target processing result. Further data processing is realized by combining subsequent data processing operator rules, which finally completes the processing from various unstructured data to internal data with a definite content definition. In addition, when the structured data further includes personalized data whose data definitions are diverse, whose types are complex and which changes frequently, structured data meeting the format requirements is obtained by performing the data conversion operation on the personalized data, and the subsequent operator rule acquisition and processing operations are then performed. This completes the processing of industry and core enterprise data that comes from the outside, has diverse data definitions and complex types, and changes frequently, and effectively improves the compatibility of data processing.

Continuing to refer to fig. 8, a schematic structural diagram of another specific implementation of the data processing device according to the second embodiment of the present application is shown, and for convenience of description, only the portions related to the present application are shown.

In some optional implementations of the present embodiment, the data processing apparatus 200 further includes: a result checking module 206, a result normality module 207, a result exception module 208, a rule correcting module 209, a second processing module 210, and a result confirmation module 211, wherein:

the result checking module 206 is configured to perform a result checking operation on the target processing result according to a preset checking rule to obtain a checking result;

the result normality module 207 is configured to take the target processing result as a final processing result if the checking result indicates that there is no abnormal data;

the result exception module 208 is configured to, if the checking result indicates that there is abnormal data, obtain the abnormal structured data and the abnormal operator rule corresponding to the abnormal data according to the association relationship among the structured data, the operator rule and the target processing result;

the rule correcting module 209 is configured to perform a rule correcting operation on the abnormal operator rule according to a correction rule to obtain a corrected operator rule;

the second processing module 210 is configured to process the abnormal structured data according to the corrected operator rule to obtain corrected processing data;

and the result confirmation module 211 is configured to take the normal data of the target processing result and the corrected processing data as the final processing result.
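The cooperation of modules 206 to 211 can be sketched as follows. The check rule (results must be non-negative), the operator rule and the correction rule are hypothetical examples chosen for illustration; the application does not define their concrete form.

```python
def verify(result):
    """Preset check rule (assumed): a processed value must be non-negative."""
    return result >= 0

def correct(rule):
    """Correction rule (assumed): wrap the abnormal rule so results clamp at 0."""
    return lambda value: max(rule(value), 0)

def run_with_correction(records, rule):
    # Keep the association between structured data, operator rule and result,
    # so abnormal results can be traced back to their inputs and rules.
    associations = [(value, rule, rule(value)) for value in records]
    final = []
    for value, used_rule, result in associations:
        if verify(result):
            final.append(result)                    # normal data is kept as-is
        else:
            corrected_rule = correct(used_rule)     # rule correction operation
            final.append(corrected_rule(value))     # reprocess abnormal data
    return final

rule = lambda v: v - 100   # hypothetical operator rule that can go negative
final = run_with_correction([250, 40, 100], rule)
```

Only the abnormal record is reprocessed; results that pass the check are carried into the final result untouched.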

In some optional implementations of this embodiment, the data processing apparatus 200 further includes: a suspicious results module, wherein:

and the suspicious result module is used for acquiring suspicious structured data corresponding to the suspicious data according to the association relation if the verification result indicates that the suspicious data exist, and storing the suspicious structured data into a cache region so as to facilitate manual screening.
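The handling of suspicious data can be sketched in the same style. The threshold used to flag a result as suspicious and the use of a plain list as the cache region are assumptions made for this example.

```python
def check(result):
    """Hypothetical check rule: unusually large values are suspicious."""
    return "suspicious" if result > 1000 else "normal"

def triage(associations, check_fn):
    """Route each (structured_data, result) pair by its check outcome."""
    final, review_cache = [], []
    for structured, result in associations:
        status = check_fn(result)
        if status == "normal":
            final.append(result)
        elif status == "suspicious":
            # Trace back to the structured data and hold it for manual screening.
            review_cache.append(structured)
    return final, review_cache

associations = [({"id": 1}, 120), ({"id": 2}, 5000)]
final, review_cache = triage(associations, check)
```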

In some optional implementation manners of this embodiment, the format conversion module 203 specifically includes: a word meaning identification submodule and a structured data determining submodule, wherein:

the word meaning identification submodule is used for inputting the personalized data into the semantic analysis model to carry out word meaning identification operation to obtain real word meaning information;

and the structured data determining submodule is used for taking the real word sense information as structured data.

In some optional implementation manners of the second embodiment of the present application, the format conversion module 203 further includes: a word segmentation determining module, a word vector determining module, a first feature expression vector determining module, a second feature expression vector determining module, a classification result determining module and a model obtaining module. Wherein:

the word segmentation determining module is used for acquiring a sample text from a local database and determining each word segmentation contained in the sample text;

the word vector determining module is used for determining a word vector corresponding to each participle based on the semantic analysis model to be trained;

the first feature expression vector determining module is used for acquiring semantic attributes from a local database, and determining a first feature expression vector of the sample text related to the semantic attributes according to an attention matrix corresponding to the semantic attributes and a word vector corresponding to each participle contained in a semantic analysis model to be trained;

the second feature expression vector determining module is used for determining a second feature expression vector of the sample text related to the semantic attributes according to a self-attention matrix which is contained in the semantic analysis model to be trained and is used for expressing the correlation among different semantic attributes and the first feature expression vector;

the classification result determining module is used for determining a classification result output by the semantic analysis model to be trained according to the semantic analysis model to be trained and the second feature expression vector, and the classification result comprises a semantic attribute to which the sample text belongs and an emotion polarity corresponding to the semantic attribute to which the sample text belongs;

and the model acquisition module is used for adjusting model parameters in the semantic analysis model according to the classification result and the preset label of the sample text to obtain the semantic analysis model.

In some optional implementations of the second embodiment of the present application, the word vector determining module specifically includes: and a semantic representation submodule. Wherein:

and the semantic representation submodule is used for inputting each participle into a semantic representation layer of the semantic analysis model to obtain a bidirectional semantic representation vector which corresponds to each participle output by the semantic representation layer and is used as a word vector corresponding to each participle.

In some optional implementations of the second embodiment of the present application, the first feature expression vector determining module specifically includes: an attribute characterization sub-module, an attention weighting sub-module, and a first feature representation vector determination sub-module. Wherein:

the attribute characterization submodule is used for inputting the word vector corresponding to each participle into an attribute characterization layer in the semantic analysis model;

the attention weighting submodule is used for carrying out attention weighting on the word vector corresponding to each participle through an attention matrix corresponding to the semantic attributes contained in the attribute characterization layer to obtain a weighted word vector;

a first feature representation vector determination submodule for determining a first feature representation vector of the sample text relating to semantic attributes on the basis of the weighted word vector.

In some optional implementations of the second embodiment of the present application, the second feature expression vector determining module specifically includes: an attribute relevance representation submodule, a self-attention weighting submodule and a second feature representation vector determination submodule. Wherein:

the attribute relevance representation submodule is used for inputting the first feature representation vector to an attribute relevance representation layer in the semantic analysis model;

the self-attention weighting submodule is used for carrying out self-attention weighting on a first feature representation vector of the sample text related to each semantic attribute through a self-attention matrix which is contained in the attribute relevance representation layer and used for representing the relevance between different semantic attributes to obtain a weighted feature representation vector;

a second feature representation vector determination sub-module for determining a second feature representation vector of the sample text relating to each semantic attribute based on the weighted feature representation vectors.

In summary, the present application provides a data processing apparatus 200, comprising: a data obtaining module 201, configured to obtain original data to be processed; a data analysis module 202, configured to perform a data analysis operation on the original data to obtain structured data; a format conversion module 203, configured to perform a data conversion operation on the personalized data if the structured data further includes personalized data, and use the personalized data after the data conversion operation as the structured data; an operator obtaining module 204, configured to read an operator rule base and obtain the operator rule corresponding to the structured data from the operator rule base; and a first processing module 205, configured to process the structured data according to the operator rule to obtain a target processing result. The original data to be processed is analyzed to obtain each piece of structured data, and the corresponding operator rule is obtained for each piece of structured data, so that the operator rule processes the corresponding structured data to obtain the target processing result. Further data processing is realized by combining subsequent data processing operator rules, which finally completes the processing from various unstructured data to internal data with a definite content definition. In addition, when the structured data further includes personalized data whose data definitions are diverse, whose types are complex and which changes frequently, structured data meeting the format requirements is obtained by performing the data conversion operation on the personalized data, and the subsequent operator rule acquisition and processing operations are then performed. This completes the processing of industry and core enterprise data that comes from the outside, has diverse data definitions and complex types, and changes frequently, and effectively improves the compatibility of data processing. Meanwhile, semantic analysis is carried out in combination with the context of an ambiguous word to obtain the actual meaning of the word before the subsequent structured data processing operation is carried out, so that misjudgment is effectively avoided, and subsequent unnecessary correction operations are avoided in turn.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 300 includes a memory 310, a processor 320, and a network interface 330 communicatively coupled to each other via a system bus. It is noted that only the computer device 300 having components 310-330 is shown, but it is understood that not all of the shown components are required and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 310 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 310 may be an internal storage unit of the computer device 300, such as a hard disk or an internal memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 300. Of course, the memory 310 may also include both internal and external storage devices of the computer device 300. In this embodiment, the memory 310 is generally used for storing an operating system installed on the computer device 300 and various types of application software, such as the computer readable instructions of the data processing method. In addition, the memory 310 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 320 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 320 is generally operative to control overall operation of the computer device 300. In this embodiment, the processor 320 is configured to execute the computer readable instructions stored in the memory 310 or process data, for example, execute the computer readable instructions of the data processing method.

The network interface 330 may include a wireless network interface or a wired network interface, and the network interface 330 is generally used to establish a communication connection between the computer device 300 and other electronic devices.

The computer device provided by the application analyzes the original data to be processed to obtain each piece of structured data, and obtains the corresponding operator rule for each piece of structured data, so that the operator rule processes the corresponding structured data to obtain a target processing result. Further data processing is realized by combining subsequent data processing operator rules, which finally completes the processing from various unstructured data to internal data with a definite content definition. In addition, when the structured data further includes personalized data whose data definitions are diverse, whose types are complex and which changes frequently, structured data meeting the format requirements is obtained by performing a data conversion operation on the personalized data, and the subsequent operator rule acquisition and processing operations are then performed. This completes the processing of industry and core enterprise data that comes from the outside, has diverse data definitions and complex types, and changes frequently, and effectively improves the compatibility of data processing.

The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the data processing method as described above.

The computer-readable storage medium provided by the application analyzes the original data to be processed to obtain each piece of structured data, and obtains the corresponding operator rule for each piece of structured data, so that the operator rule processes the corresponding structured data to obtain a target processing result. Further data processing is realized by combining subsequent data processing operator rules, which finally completes the processing from various unstructured data to internal data with a definite content definition. In addition, when the structured data further includes personalized data whose data definitions are diverse, whose types are complex and which changes frequently, structured data meeting the format requirements is obtained by performing a data conversion operation on the personalized data, and the subsequent operator rule acquisition and processing operations are then performed. This completes the processing of industry and core enterprise data that comes from the outside, has diverse data definitions and complex types, and changes frequently, and effectively improves the compatibility of data processing.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application and do not limit its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or that equivalents may be substituted for some of their features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are within the protection scope of the present application.

Claims

1. A data processing method is characterized by comprising the following steps:

acquiring original data to be processed;

2. The data processing method of claim 1, further comprising, after the step of performing the processing operation on the structured data according to the operator rule to obtain the target processing result, the steps of:

performing result verification operation on the target processing result according to a preset verification rule to obtain a verification result;

if the verification result is that abnormal data does not exist, taking the target processing result as a final processing result;

if the verification result is that abnormal data exists, acquiring abnormal structured data and an abnormal operator rule corresponding to the abnormal data according to the association relation among the structured data, the operator rule and the target processing result;

carrying out a rule correction operation on the abnormal operator rule according to a correction rule to obtain a corrected operator rule;

carrying out the processing operation on the abnormal structured data according to the corrected operator rule to obtain corrected processing data;

and taking the normal data of the target processing result and the corrected processing data as the final processing result.
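The verification and correction steps of claim 2 can be sketched as follows, under stated assumptions. The association relation is modeled simply as a shared list index among record, rule and result. The predicate `is_abnormal`, the correction function `correct_rule` and the function name `finalize` are hypothetical names introduced for illustration and are not taken from the application.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], float]


def finalize(
    records: List[Record],
    rules: List[Rule],
    is_abnormal: Callable[[float], bool],
    correct_rule: Callable[[Rule], Rule],
) -> List[float]:
    """Verify each target result and reprocess only the abnormal ones."""
    # Index i associates records[i], rules[i] and targets[i] (assumed relation).
    targets = [rule(record) for record, rule in zip(records, rules)]
    final: List[float] = []
    for i, value in enumerate(targets):
        if not is_abnormal(value):
            final.append(value)  # normal data is kept unchanged
            continue
        corrected_rule = correct_rule(rules[i])    # rule correction operation
        final.append(corrected_rule(records[i]))   # corrected processing data
    return final


def net_amount(record: Record) -> float:
    """Hypothetical operator rule: gross amount minus tax."""
    return record["gross"] - record["tax"]


def clamp_at_zero(rule: Rule) -> Rule:
    """Hypothetical correction rule: never let the result fall below zero."""
    return lambda record: max(rule(record), 0.0)
```

A call such as `finalize(records, [net_amount, net_amount], lambda v: v < 0, clamp_at_zero)` would leave non-negative results untouched and recompute only the records whose result was flagged.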

3. The data processing method of claim 1, further comprising, after the step of performing the processing operation on the structured data according to the operator rule to obtain the target processing result, the steps of:

and if the verification result is that suspicious data exists, obtaining suspicious structured data corresponding to the suspicious data according to the association relation, and storing the suspicious structured data into a cache region so as to carry out manual screening.
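As a rough illustration of claim 3, suspicious results can be routed to a holding area for manual screening while the remaining results continue through the pipeline. The in-memory list used as the cache region, the predicate `is_suspicious` and the function name `split_suspicious` are assumptions; the application does not specify how the cache region is implemented.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def split_suspicious(
    records: List[Record],
    results: List[float],
    is_suspicious: Callable[[float], bool],
) -> Tuple[List[float], List[Record]]:
    """Return accepted results and the cached source records of suspicious ones."""
    accepted: List[float] = []
    cache: List[Record] = []  # stands in for the cache region
    # records[i] and results[i] are linked by index (assumed association relation).
    for record, result in zip(records, results):
        if is_suspicious(result):
            cache.append(record)  # keep the source record for manual screening
        else:
            accepted.append(result)
    return accepted, cache
```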

4. The data processing method according to claim 1, wherein if the structured data further includes personalized data, the step of performing a data transformation operation on the personalized data and using the personalized data after the data transformation operation as the structured data specifically includes the following steps:

inputting the personalized data into a semantic analysis model to perform a word sense recognition operation to obtain real word sense information;

and taking the real word sense information as the structured data.

5. The data processing method of claim 4, wherein before the step of inputting the personalized data into a semantic analysis model for word sense recognition operation to obtain real word sense information, the method further comprises:

obtaining a sample text from the local database, and determining each word segment contained in the sample text;

determining a word vector corresponding to each word segment based on a semantic analysis model to be trained;

obtaining semantic attributes from the local database, and determining a first feature representation vector of the sample text related to the semantic attributes according to an attention matrix corresponding to the semantic attributes in the semantic analysis model to be trained and the word vector corresponding to each word segment;

determining a second feature representation vector of the sample text related to the semantic attributes according to the first feature representation vector and a self-attention matrix that is contained in the semantic analysis model to be trained and represents the correlation among different semantic attributes;

determining a classification result output by the semantic analysis model to be trained according to the semantic analysis model to be trained and the second feature representation vector, wherein the classification result comprises a semantic attribute to which the sample text belongs and an emotion polarity corresponding to the semantic attribute to which the sample text belongs;

and adjusting model parameters in the semantic analysis model according to the classification result and the preset label of the sample text to obtain the semantic analysis model.

6. The data processing method according to claim 5, wherein the step of determining the word vector corresponding to each word segment based on the semantic analysis model to be trained specifically comprises:

and inputting each word segment into a semantic representation layer of the semantic analysis model to obtain a bidirectional semantic representation vector that is output by the semantic representation layer for each word segment, and taking the bidirectional semantic representation vector as the word vector corresponding to that word segment.
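The two attention steps recited in claim 5 can be illustrated with a small pure-Python sketch of the forward pass only. The claims do not give formulas, so the dot-product scoring, the softmax normalization and the function names below are assumptions. Word vectors are taken as given inputs here, and the classification and parameter adjustment steps are not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Vector, vectors: Matrix) -> Vector:
    """Combine the vectors using one weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


def first_representation(attention: Matrix, word_vectors: Matrix) -> Matrix:
    """One row per semantic attribute: attention-weighted sum of word vectors."""
    rows: Matrix = []
    for attribute_query in attention:
        weights = softmax([dot(attribute_query, w) for w in word_vectors])
        rows.append(weighted_sum(weights, word_vectors))
    return rows


def second_representation(self_attention: Matrix, first: Matrix) -> Matrix:
    """Mix attribute rows with row-normalized attribute-to-attribute weights."""
    return [weighted_sum(softmax(row), first) for row in self_attention]
```

In this sketch each row of `attention` acts as a query for one semantic attribute, and each row of `self_attention` states how strongly one attribute draws on the others, so the second representation lets correlated attributes share evidence before classification.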

7. A data processing apparatus, comprising:

8. The data processing apparatus of claim 7, further comprising:

a result verification module, configured to carry out a result verification operation on the target processing result according to a preset verification rule to obtain a verification result;

a result normal module, configured to take the target processing result as a final processing result if the verification result indicates that no abnormal data exists;

a result abnormal module, configured to acquire abnormal structured data and an abnormal operator rule corresponding to the abnormal data according to the association relation among the structured data, the operator rule and the target processing result if the verification result indicates that abnormal data exists;

a rule correction module, configured to carry out a rule correction operation on the abnormal operator rule according to a correction rule to obtain a corrected operator rule;

a second processing module, configured to carry out the processing operation on the abnormal structured data according to the corrected operator rule to obtain corrected processing data;

and a result confirmation module, configured to take the normal data of the target processing result and the corrected processing data as the final processing result.

9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the data processing method according to any one of claims 1 to 6.

10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 6.