CN112380214A - Data processing method and device and electronic equipment - Google Patents

Data processing method and device and electronic equipment

Info

Publication number
CN112380214A
Authority
CN
China
Prior art keywords
data
field
matching
standard
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011272746.XA
Other languages
Chinese (zh)
Inventor
刘伟
李晓宇
周宇
张焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ultrapower Intelligent Data Technology Co ltd
Original Assignee
Beijing Ultrapower Intelligent Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ultrapower Intelligent Data Technology Co ltd
Priority to CN202011272746.XA
Publication of CN112380214A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and device and electronic equipment. The method comprises the following steps: splicing the original data of a plurality of first original fields to obtain spliced data, wherein each first original field is an associated field of a first standard field; matching the spliced data with the association table of the first standard field to obtain a matching result; and determining standardized data of the first standard field according to the matching result. Because the association table is matched against the spliced data rather than against a single original field, and the standardized data are determined from the matching result, a good data standardization effect can still be obtained when the original data are too disordered or partly missing, and the method has high robustness and efficiency.

Description

Data processing method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
Currently, big data storage products and big data applications oriented to various industries emerge in an endless stream, and data processing technologies face new challenges in the big data era. For example, data generated by different channels differ in format, which hinders the collection and merging of the data.
In some schemes, regular expressions are used to standardize the data, but the applicable scenarios of this approach are limited, and it cannot meet the requirement when the data content is too messy or missing.
Disclosure of Invention
The embodiment of the application provides a data processing method and device and electronic equipment, and can realize effective standardized processing on disordered data.
The embodiment of the application adopts the following technical scheme:
In a first aspect, an embodiment of the present application provides a data processing method, including: splicing the original data of a plurality of first original fields to obtain spliced data, wherein each first original field is an associated field of a first standard field; matching the spliced data with the association table of the first standard field to obtain a matching result; and determining standardized data of the first standard field according to the matching result.
Optionally, the first standard field has a plurality of association tables, and matching the spliced data with the association table of the first standard field to obtain a matching result includes: matching the spliced data with each association table in sequence according to the matching priority of the association tables, wherein each item of an association table records standardized data of the first standard field; stopping matching once a matching item is obtained, and taking the obtained matching item as the matching result; and, if no matching item is obtained in any association table, stopping matching and obtaining a matching result of no matching item.
Optionally, determining the standardized data of the first standard field according to the matching result includes: if the matching result is no matching item, determining that the standardized data of the first standard field is a null value; and the data processing method further includes: generating error information corresponding to the first standard field, so that the data can be checked according to the error information.
Optionally, determining the standardized data of the first standard field according to the matching result includes: if the matching result is no matching item, extracting, by using a regular expression, the original data of the first original field directly associated with the first standard field to obtain the standardized data of the first standard field.
Optionally, the data processing method further includes: extracting the original data of a second original field by using a regular expression to obtain the standardized data of a second standard field.
Optionally, the association table records multiple sets of standardized data, each set including standardized data of the first standard field and standardized data of a plurality of third standard fields; and the data processing method further includes: determining the standardized data of the third standard fields according to the matching result.
Optionally, the data processing method further includes, before all the steps: loading a standard configuration file, wherein the standard configuration file records the association relation between the standard fields and the original fields and the association tables of each standard field.
Optionally, the standard configuration file further records a storage path of the original data file and a storage path of the result file, and the data processing method further includes: reading the original data of the original fields according to the storage path of the original data file; and writing the obtained standardized data into the corresponding standard fields of the result file.
In a second aspect, an embodiment of the present application further provides a data processing apparatus, configured to implement the data processing method described above.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a data processing method as described in any one of the above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the data processing method as described in any one of the above.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
The association table is used to match the spliced data obtained by splicing the original data, and the standardized data are determined according to the matching result, so that a good data standardization effect can still be obtained when the original data are too disordered or missing, with high robustness and efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical idea of the present application is to exploit the relevance that exists between data contents. For example, once the series of a car is determined, the brand, manufacturer and country of the car can be determined as well. Thus, regardless of whether the brand, manufacturer and country are directly shown in the raw data, they can be confirmed as long as the series of the car can be identified from the raw data. Applied to data standardization, this means that the brand, manufacturer and country information in the original data does not need to be processed separately, and the standardized values can still be obtained even if that information is lost or excessively disordered, which greatly improves the robustness and efficiency of data processing.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step S110, splicing the original data of a plurality of first original fields to obtain spliced data, wherein each first original field is an associated field of the first standard field.
For example, automobile information may be crawled from various automobile websites by a crawler program, and the crawled information such as the vehicle series and the brand and manufacturer may be recorded as original data in a vehicle series field and a brand-manufacturer field, where the vehicle series field and the brand-manufacturer field are original fields.
Even for the same type of information, such as the vehicle series, the data crawled from different automobile websites may differ in format, so standardization is required. For example, the raw data in the vehicle series field needs to be recorded in the standard vehicle series field after standardization. In this case, the vehicle series field is an associated field of the standard vehicle series field.
As described above, after the series of the automobile is determined, the brand, manufacturer, and country of the automobile can be determined at the same time without considering whether the original data contains such information, that is, the series field (original field) is also an associated field of the standard brand field (standard field).
That is, the associated field means that the raw data in the first raw field is directly or indirectly helpful for the determination of the normalized data in the first standard field.
By similar logic, the brand-manufacturer field can be determined to be an associated field of both the standard brand field and the standard manufacturer field, and the vehicle series field, the brand-manufacturer field and the vehicle style field are all associated fields of the standard country field; these cases are not analyzed one by one here.
Although the above example is described in terms of car information, it can be understood that the technical solution of the present application can also be applied to data in other fields, for example house information (where streets, cities and the like are related to each other); further examples are not given here.
Splicing the original data can also be understood as merging or fusing the original data; it can be implemented by any existing technique, which is not limited in the present application.
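As a non-limiting illustration, the splicing of step S110 may be sketched in Python as follows. The application does not prescribe a splicing technique, so the function name `splice`, the space separator, the skipping of blank values, and the field names in the usage note are all assumptions made for this sketch.

```python
from typing import List, Mapping, Optional


def splice(record: Mapping[str, Optional[str]], fields: List[str], sep: str = " ") -> str:
    """Join the raw values of the given original fields into one string.

    Missing or blank values are skipped, so an absent original field
    does not prevent the remaining fields from being spliced.
    """
    parts = []
    for name in fields:
        value = record.get(name)
        if value is not None and value.strip():
            parts.append(value.strip())
    return sep.join(parts)
```

For a record whose hypothetical `series` field is empty, `splice(record, ["series", "brand_maker", "style"])` still yields a string built from the other two fields, which is what allows the later matching step to succeed when one original field is missing.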
Step S120, matching the spliced data with the association table of the first standard field to obtain a matching result.
For example, if the first standard field is the standard vehicle series field, the association table may be a vehicle series table, which records the brand, manufacturer and country of each vehicle series. The association table can support both direct matching and fuzzy matching.
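The application does not specify how direct matching and fuzzy matching are carried out. One possible reading, given only as a sketch, treats direct matching as a case-insensitive substring test and fuzzy matching as a similarity comparison between each whitespace-separated token of the spliced data and the table keys, using `difflib.get_close_matches` from the Python standard library. The function names, the cutoff of 0.8 and the sample keys are assumptions.

```python
import difflib
from typing import Iterable, List, Optional


def direct_match(spliced: str, keys: List[str]) -> Optional[str]:
    """Return the first key that occurs verbatim (ignoring case) in the spliced data."""
    lowered = spliced.lower()
    for key in keys:
        if key.lower() in lowered:
            return key
    return None


def fuzzy_match(spliced: str, keys: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the key most similar to any token of the spliced data, if similar enough."""
    by_lower = {key.lower(): key for key in keys}
    for token in spliced.lower().split():
        close = difflib.get_close_matches(token, list(by_lower), n=1, cutoff=cutoff)
        if close:
            return by_lower[close[0]]
    return None


def match(spliced: str, keys: Iterable[str]) -> Optional[str]:
    """Try direct matching first and fall back to fuzzy matching."""
    key_list = list(keys)
    return direct_match(spliced, key_list) or fuzzy_match(spliced, key_list)


# Hypothetical keys of a vehicle series table (placeholder names).
SERIES_KEYS = ["SeriesAlpha", "SeriesBeta"]
```

With these placeholder keys, a misspelled token such as `SeriezAlpha` is not found by direct matching but is close enough to `SeriesAlpha` to be returned by the fuzzy step.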
Step S130, determining the standardized data of the first standard field according to the matching result.
Therefore, in the method shown in fig. 1, the association table is used for matching the spliced data after splicing the original data, and the standardized data is determined according to the matching result, so that a good data standardization effect can be still obtained under the condition that the original data is too disordered or missing, and the method has high robustness and efficiency.
In an embodiment of the application, in the data processing method, the first standard field has a plurality of association tables, and matching the spliced data with the association table of the first standard field to obtain a matching result includes: matching the spliced data with each association table in sequence according to the matching priority of the association tables, wherein each item of an association table records standardized data of the first standard field; stopping matching once a matching item is obtained, and taking the obtained matching item as the matching result; and, if no matching item is obtained in any association table, stopping matching and obtaining a matching result of no matching item.
For example, the association tables of the standard brand field are a vehicle series table and a brand table; the priority of the vehicle series table is the first level, and the priority of the brand table is the second level. The spliced data is obtained by splicing the original data of three fields: the vehicle series field, the brand-manufacturer field and the vehicle style field.
During matching, the spliced data is first matched with the vehicle series table; if a matching item is obtained, matching stops and the matching item is output as the matching result. If no matching item is obtained in the vehicle series table, the spliced data is matched with the brand table; if a matching item is obtained, matching stops and the matching item is output as the matching result. If no matching item is obtained even after the brand table has been traversed, matching stops and the matching result is no matching item.
Similarly, the association tables of the standard country field are the vehicle series table and a country table, with the vehicle series table at the first priority level and the country table at the second; the association tables of the standard manufacturer field are the vehicle series table and a manufacturer table, with the vehicle series table at the first priority level and the manufacturer table at the second. In both cases the spliced data can be obtained by splicing the original data of the vehicle series field, the brand-manufacturer field and the vehicle style field. The matching process is similar to the previous example and is not described in detail here.
Each item of an association table records standardized data of the first standard field, so that obtaining a matching item automatically completes the standardization of the original data.
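The priority-ordered matching described above may be sketched in Python as follows, for the standard brand field. The table layout (a list of `(priority, table name, entries)` tuples), the substring test used as the matching rule, and all sample names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each table is (priority, table name, {lookup key: standardized brand}).
Table = Tuple[int, str, Dict[str, str]]


def match_by_priority(spliced: str, tables: List[Table]) -> Optional[Tuple[str, str]]:
    """Match the spliced data against each table in priority order.

    Returns (table name, standardized value) for the first matching item,
    or None when no table yields a matching item.
    """
    lowered = spliced.lower()
    for _priority, name, entries in sorted(tables, key=lambda t: t[0]):
        for key, standardized in entries.items():
            if key.lower() in lowered:
                return name, standardized  # stop at the first matching item
    return None


# Hypothetical association tables of the standard brand field.
BRAND_FIELD_TABLES: List[Table] = [
    (2, "brand_table", {"BrandX": "BrandX", "BrandY": "BrandY"}),
    (1, "series_table", {"SeriesA": "BrandX"}),
]
```

In this sketch, spliced data that mentions both `BrandY` and `SeriesA` resolves through the first-level series table, so the brand recorded for the series takes precedence over a possibly mislabeled brand in the raw data.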
When the matching result is a matching item, the standardized data of the first standard field can be determined directly; when the matching result is no matching item, the case can be handled in either of two ways.
For example, in an embodiment of the present application, in the data processing method, the determining the normalized data of the first standard field according to the matching result includes: if the matching result is no matching item, determining that the standardized data of the first standard field is a null value; the data processing method further comprises: error information corresponding to the first standard field is generated to enable data checking according to the error information.
This is suitable where the original data format is too cluttered, for example for brands, manufacturers and countries. That is, the standardized data may not be obtainable directly from the original data and manual intervention may be required. In that case, error information corresponding to the first standard field, such as an error code, can be generated; different standard fields may have different error codes, and a data processor can carry out manual processing or data inspection according to the error code.
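A minimal Python sketch of this first handling is given below. The error code values, the default code and the names `FieldResult` and `resolve` are hypothetical; the application only states that an error code may be generated per standard field.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical error codes, one per standard field.
ERROR_CODES = {
    "standard_brand": "E101",
    "standard_manufacturer": "E102",
    "standard_country": "E103",
}


@dataclass
class FieldResult:
    name: str                    # name of the standard field
    value: Optional[str]         # standardized data, or None for a null value
    error: Optional[str] = None  # error code for later manual checking


def resolve(name: str, matched: Optional[str]) -> FieldResult:
    """Use the matching item if there is one; otherwise emit a null value and an error code."""
    if matched is not None:
        return FieldResult(name, matched)
    return FieldResult(name, None, ERROR_CODES.get(name, "E000"))
```

Records whose result carries a non-empty `error` can then be collected for manual processing, grouped by error code.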
In another embodiment of the present application, in the data processing method, the determining the normalized data of the first standard field according to the matching result includes: and if the matching result is no matching item, extracting the original data of the first original field directly associated with the first standard field by using the regular expression to obtain the standardized data of the first standard field.
This is suitable where the raw data itself is fairly regular, such as the vehicle series. During matching, the original data of the three original fields (the vehicle series field, the brand-manufacturer field and the vehicle style field) can be spliced to obtain spliced data, and the spliced data is matched with the association tables of the standard vehicle series field. If no matching item is obtained, a regular expression can be applied to the original data in the vehicle series field, which is directly associated with the standard vehicle series field, to extract the standardized data of the standard vehicle series field.
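The second handling, in which a regular expression is applied only after table matching has failed, may be sketched in Python as follows. The pattern `SERIES_PATTERN`, the field names and the substring matching rule are assumptions; a real deployment would use whatever expression suits its own raw data.

```python
import re
from typing import List, Mapping, Optional

# Hypothetical pattern: a run of letters, digits and hyphens starting with a letter.
SERIES_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def standardize_series(record: Mapping[str, Optional[str]], table_keys: List[str]) -> Optional[str]:
    """Match spliced data against the series table, then fall back to a regular expression."""
    spliced = " ".join(
        value for value in (record.get("series"), record.get("brand_maker"), record.get("style")) if value
    )
    lowered = spliced.lower()
    for key in table_keys:
        if key.lower() in lowered:
            return key  # matching item found in the association table
    # No matching item: extract from the directly associated original field only.
    found = SERIES_PATTERN.search(record.get("series") or "")
    return found.group(0) if found else None
```

Note that the fallback reads only the vehicle series field and not the spliced data, since that is the original field directly associated with the standard vehicle series field.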
In another embodiment of the present application, in the data processing method, the data processing method further includes: and extracting the original data from the second original field by using a regular expression to obtain the standardized data of the second standard field.
For example, the price and the vehicle style are generally difficult to infer from other information, but this information is usually recorded with a certain regularity, so it can be extracted directly with a regular expression. For example, the raw data of the price field is extracted with a regular expression to obtain the standardized data of the standard price field, which removes invalid price information from the raw data and unifies the unit (for example, prices are unified to xx ten thousand yuan); the result may be a specific value such as 46.98 ten thousand yuan or a price interval such as 23.88-40.98 ten thousand yuan. If the raw data of the second original field is null or invalid, the standardized data of the second standard field is null.
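The price extraction may be sketched in Python as follows. The sketch assumes that raw prices are written with the unit 万 (ten thousand yuan), optionally as a range separated by `-` or `~`, and that the standardized form ends in 万元; the pattern and the function name are hypothetical and do not cover raw data in other units.

```python
import re
from typing import Optional

# One number, optionally followed by a second number forming a range, then the unit.
PRICE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)(?:\s*[-~]\s*(\d+(?:\.\d+)?))?\s*万")


def standardize_price(raw: Optional[str]) -> Optional[str]:
    """Return 'x万元' or 'x-y万元', or None when the raw price is null or invalid."""
    if not raw:
        return None
    found = PRICE_PATTERN.search(raw)
    if found is None:
        return None  # invalid price information is dropped
    low, high = found.group(1), found.group(2)
    return f"{low}-{high}万元" if high else f"{low}万元"
```

Surrounding text such as a label before the number or a suffix after the unit is discarded, and a raw value without any recognizable price yields a null result.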
In another embodiment of the present application, in the data processing method, the association table records multiple sets of standardized data, where each set of standardized data includes standardized data of the first standard field and standardized data of multiple third standard fields; the data processing method further comprises: and determining the normalized data of the third standard field according to the matching result.
For example, the association table may record the brand, manufacturer and country corresponding to each vehicle series. A matching item then records not only the standardized data of the first standard field (the standard vehicle series field) but also the standardized data of the third standard fields (the standard brand field, the standard manufacturer field and the standard country field). Once the matching item is obtained, the standardized data of the standard brand field, the standard manufacturer field and the standard country field can therefore be determined directly, without matching against the association tables of each of those fields, which saves the standardization time of the three fields and greatly improves efficiency.
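As an illustration of a matching item that carries several standard fields at once, the following Python sketch stores one set of standardized data per vehicle series. The table contents are placeholders, and the dictionary layout and the name `fill_from_match` are assumptions.

```python
from typing import Dict

# Hypothetical vehicle series table: each item holds one set of standardized data.
SERIES_TABLE: Dict[str, Dict[str, str]] = {
    "SeriesA": {
        "standard_series": "SeriesA",
        "standard_brand": "BrandX",
        "standard_manufacturer": "MakerX",
        "standard_country": "CountryX",
    },
}


def fill_from_match(spliced: str, table: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return every standardized field recorded in the matching item, or {} if none matches."""
    lowered = spliced.lower()
    for key, group in table.items():
        if key.lower() in lowered:
            return dict(group)  # one match fills the series, brand, manufacturer and country
    return {}
```

A single lookup thus fills four standard fields, which is the source of the time saving described above.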
In another embodiment of the present application, in the data processing method, before all the steps, the data processing method further includes: and loading a standard configuration file, wherein the association relation between the standard fields and the original fields and the association table of each standard field are recorded in the standard configuration file.
Specifically, the technical solution of the present application may be implemented by executing a JAR (Java Archive) package from a command prompt, and a standard configuration file written in advance may be loaded during execution.
In another embodiment of the present application, in the data processing method, the standard configuration file further records a storage path of the original data file and a storage path of the result file, and the data processing method further includes: reading the original data of the original fields according to the storage path of the original data file; and writing the obtained standardized data into the corresponding standard fields of the result file.
In some embodiments, the standard configuration file may be a file in xml format, and the result file and the raw data file may be files in txt format.
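The application does not disclose the schema of the standard configuration file. Purely as an illustration of the information it is said to record (the field associations, the association tables with their priorities, and the two storage paths), a hypothetical xml layout and a loader are sketched below in Python with the standard `xml.etree.ElementTree` module. All element names, attribute names and file names are invented for this sketch, and Python is used only for illustration rather than as the JAR implementation mentioned above.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Hypothetical configuration; every element and attribute name is an assumption.
CONFIG_XML = """
<standardization>
  <input path="raw_data.txt"/>
  <output path="result.txt"/>
  <standardField name="standard_brand">
    <originalField>series</originalField>
    <originalField>brand_maker</originalField>
    <originalField>style</originalField>
    <associationTable name="brand_table" priority="2"/>
    <associationTable name="series_table" priority="1"/>
  </standardField>
</standardization>
"""


def load_config(xml_text: str) -> Dict[str, Any]:
    """Parse the storage paths and, per standard field, its original fields and ordered tables."""
    root = ET.fromstring(xml_text.strip())
    fields: Dict[str, Any] = {}
    for node in root.findall("standardField"):
        tables = sorted(
            (int(t.get("priority")), t.get("name")) for t in node.findall("associationTable")
        )
        fields[node.get("name")] = {
            "original_fields": [o.text for o in node.findall("originalField")],
            "tables": [name for _priority, name in tables],  # highest priority first
        }
    return {
        "input_path": root.find("input").get("path"),
        "output_path": root.find("output").get("path"),
        "fields": fields,
    }
```

Loading the configuration once before all other steps gives the splicing step its list of original fields and the matching step its tables in priority order.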
An embodiment of the present application further provides a data processing apparatus, which is configured to implement the data processing method according to any of the above embodiments.
Specifically, in one embodiment of the present application, the data processing apparatus 200 may include a splicing unit 210, a matching unit 220, and a determination unit 230. The splicing unit 210 is configured to splice the original data of the plurality of first original fields to obtain spliced data. Wherein the first original field is an associated field of the first standard field. The matching unit 220 is configured to match the splicing data with the association table of the first standard field to obtain a matching result. The determining unit 230 is configured to determine normalized data of the first standard field according to the matching result.
In an embodiment of the application, in the data processing apparatus 200, the matching unit 220 is configured to sequentially match the splicing data with each association table according to the matching priority of the association table; each item of the association table records standardized data of a first standard field; stopping matching after the matching item is obtained, and taking the obtained matching item as a matching result; and if no matching item is obtained in each association table, stopping matching and obtaining a matching result without the matching item.
In an embodiment of the application, in the data processing apparatus 200, the determining unit 230 is configured to determine that the normalized data of the first standard field is a null value if the matching result is no matching entry; error information corresponding to the first standard field is generated to enable data checking according to the error information.
In an embodiment of the application, in the data processing apparatus 200, the determining unit 230 is configured to, if the matching result is no matching item, extract, by using a regular expression, the original data of the first original field directly associated with the first standard field to obtain the normalized data of the first standard field.
In an embodiment of the present application, in the data processing apparatus 200, the determining unit 230 is configured to extract the raw data from the second raw field by using a regular expression, so as to obtain the normalized data of the second standard field.
In an embodiment of the present application, in the data processing apparatus 200, the association table records a plurality of sets of standardized data, each set of standardized data includes standardized data of a first standard field and standardized data of a plurality of third standard fields; a determining unit 230, configured to determine normalized data of the third standard field according to the matching result.
In an embodiment of the present application, the data processing apparatus 200 further includes a loading unit, configured to load a standard configuration file, where the association relationship between the standard field and the original field and an association table of each standard field are recorded in the standard configuration file.
In an embodiment of the present application, in the data processing apparatus 200, the standard configuration file further records a storage path of the original data file and a storage path of the result file, the splicing unit 210 is configured to read the original data of the original field according to the storage path of the original data file, and the determining unit 230 is configured to write the obtained standardized data into the corresponding standardized field of the result file.
It should be noted that, the embodiments of the data processing apparatus may be implemented by referring to the embodiments of the data processing method, and are not described herein again.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 3, at a hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads a corresponding computer program from the non-volatile memory into the internal memory and then runs the computer program, thereby forming the data processing device on a logic level. The processor executes the program stored in the memory, and is specifically used for executing:
splicing the original data of the first original fields to obtain spliced data; the first original field is an associated field of the first standard field; matching the splicing data with the association table of the first standard field to obtain a matching result; and determining standardized data of the first standard field according to the matching result.
The data processing method performed by the data processing apparatus may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the data processing method may be completed by hardware integrated logic circuits in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the data processing method disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the data processing method in combination with its hardware.
The electronic device may also execute the data processing method of FIG. 1 and implement the functions of the data processing method in the embodiment shown in FIG. 1, which are not described again here.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, where the one or more programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the data processing method in the embodiment shown in fig. 1, and are specifically configured to perform:
splicing the original data of the first original fields to obtain spliced data, where the first original fields are the fields associated with the first standard field; matching the spliced data against the association table of the first standard field to obtain a matching result; and determining standardized data of the first standard field according to the matching result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A data processing method, comprising:
splicing the original data of the first original fields to obtain spliced data; the first original field is an associated field of a first standard field;
matching the spliced data with the association table of the first standard field to obtain a matching result;
and determining standardized data of the first standard field according to the matching result.
2. The data processing method of claim 1, wherein there are a plurality of association tables for the first standard field, and the matching the spliced data with the association table of the first standard field to obtain a matching result comprises:
matching the spliced data with each association table in sequence according to the matching priority of the association tables, wherein standardized data of the first standard field is recorded in each item of each association table;
stopping matching after the matching item is obtained, and taking the obtained matching item as a matching result;
and if no matching item is obtained in each association table, stopping matching and obtaining a matching result without the matching item.
3. The data processing method of claim 2, wherein said determining standardized data of the first standard field according to the matching result comprises:
if the matching result is no matching item, determining that the standardized data of the first standard field is a null value;
the data processing method further comprises:
generating error information corresponding to the first standard field to enable data checking according to the error information.
4. The data processing method of claim 2, wherein said determining standardized data of the first standard field according to the matching result comprises:
and if the matching result is no matching item, extracting the original data of the first original field directly associated with the first standard field by using a regular expression to obtain the standardized data of the first standard field.
5. The data processing method of claim 1, wherein the data processing method further comprises:
and extracting the original data from the second original field by using a regular expression to obtain the standardized data of the second standard field.
6. The data processing method of claim 1, wherein the association table records a plurality of sets of standardized data, each set of standardized data including standardized data of the first standard field and standardized data of a plurality of third standard fields;
the data processing method further comprises:
and determining standardized data of a third standard field according to the matching result.
7. The data processing method according to any one of claims 1 to 6, further comprising, before all the steps:
and loading a standard configuration file, wherein the association relationship between the standard fields and the original fields and the association table of each standard field are recorded in the standard configuration file.
8. The data processing method of claim 7, wherein the standard configuration file further records a storage path of the original data file and a storage path of the result file, the data processing method further comprising:
reading original data of an original field according to a storage path of the original data file;
writing the obtained standardized data into the corresponding standard fields of the result file.
9. A data processing apparatus, characterized in that the data processing apparatus is used for implementing the data processing method of any one of claims 1 to 8.
10. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the data processing method of any of claims 1 to 8.
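The priority-ordered matching of claim 2 and the no-match handling of claims 3 and 4 can be sketched as follows. This Python sketch is illustrative only and rests on several assumptions: a smaller priority number means the table is tried earlier, each association table is a dictionary keyed by the spliced data, and each item is a group of standardized values, as in claim 6. Claims 3 and 4 are alternatives; trying the regular-expression extraction first and falling back to a null value plus error information is a choice made for this sketch. All names and sample values (`vendor`, `model`, `resolve`) are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Each association table maps spliced data to a group of standardized values.
AssociationTable = Dict[str, Dict[str, str]]


def match_by_priority(spliced: str,
                      tables: List[Tuple[int, AssociationTable]]) -> Optional[Dict[str, str]]:
    """Try the tables in ascending priority number and stop at the first matching item."""
    for _, table in sorted(tables, key=lambda pair: pair[0]):
        item = table.get(spliced)
        if item is not None:
            return item
    return None  # no matching item in any association table


def resolve(spliced: str,
            tables: List[Tuple[int, AssociationTable]],
            direct_raw: str,
            fallback_pattern: str,
            errors: List[str]) -> Dict[str, Optional[str]]:
    """Return standardized data, using a regex fallback and then a null value."""
    item = match_by_priority(spliced, tables)
    if item is not None:
        return dict(item)
    # Fallback in the manner of claim 4: extract from the directly associated field.
    found = re.search(fallback_pattern, direct_raw)
    if found:
        return {"vendor": found.group(0)}
    # Fallback in the manner of claim 3: null value plus error information.
    errors.append(f"no matching item for {spliced!r}")
    return {"vendor": None}


# Hypothetical tables: (priority, table)
tables = [
    (2, {"cisco|c3750": {"vendor": "Cisco", "model": "Catalyst 3750"}}),
    (1, {"hw|s5700": {"vendor": "Huawei", "model": "S5700"}}),
]
errors: List[str] = []
matched = resolve("cisco|c3750", tables, "cisco", r"[A-Za-z]+", errors)
extracted = resolve("juniper-x|??", tables, "juniper-x", r"[A-Za-z]+", errors)
unresolved = resolve("123|456", tables, "123", r"[A-Za-z]+", errors)
```

Because a matched item carries a whole group of values, the standardized data of further standard fields (here `model`) is obtained from the same matching result without a second lookup.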
CN202011272746.XA 2020-11-13 2020-11-13 Data processing method and device and electronic equipment Pending CN112380214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011272746.XA CN112380214A (en) 2020-11-13 2020-11-13 Data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112380214A true CN112380214A (en) 2021-02-19

Family

ID=74582430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011272746.XA Pending CN112380214A (en) 2020-11-13 2020-11-13 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112380214A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064720A (en) * 2021-11-15 2022-02-18 中国建设银行股份有限公司 Heterogeneous stream data processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062509B1 (en) * 2000-05-22 2006-06-13 Instill Corporation System and method for product data standardization
CN109189769A (en) * 2018-08-14 2019-01-11 平安医疗健康管理股份有限公司 Data standardization processing method, device, computer equipment and storage medium
CN110442617A (en) * 2019-06-27 2019-11-12 华迪计算机集团有限公司 A kind of method and system carrying out dynamic processing to statistical data based on administration cell
CN110515999A (en) * 2019-08-27 2019-11-29 北京百度网讯科技有限公司 General record processing method, device, electronic equipment and storage medium
CN111047419A (en) * 2019-12-31 2020-04-21 广州探途天下科技有限公司 Vehicle type standardization method and related device

Similar Documents

Publication Publication Date Title
CN108170656B (en) Template creating method, document creating method, rendering method and rendering device
CN112307509A (en) Desensitization processing method, equipment, medium and electronic equipment
CN108241720B (en) Data processing method, device and computer readable storage medium
CN110955714A (en) Method and device for converting unstructured text into structured text
CN107609011B (en) Database record maintenance method and device
CN114297204A (en) Data storage and retrieval method and device for heterogeneous data source
CN112380214A (en) Data processing method and device and electronic equipment
CN111310137A (en) Block chain associated data evidence storing method and device and electronic equipment
CN109558548B (en) Method for eliminating CSS style redundancy and related product
CN110647463B (en) Method and device for restoring test breakpoint and electronic equipment
CN113282541B (en) File calling method and device and electronic equipment
CN113986739A (en) Monitoring method and device for website memory leakage, storage medium and processor
CN111459474B (en) Templated data processing method and device
CN114138745A (en) Data integration method and device, storage medium and processor
CN111858619B (en) Data self-circulation method and device and electronic equipment
CN110018844B (en) Management method and device of decision triggering scheme and electronic equipment
CN114416442A (en) Hardware change detection method and device, electronic equipment and readable storage medium
CN110554867B (en) Application processing method and device
CN112416753A (en) Method, system and equipment for standardized management of urban brain application scene data
CN110750271A (en) Service aggregation, method and device for executing aggregated service and electronic equipment
CN108459879B (en) Method for preventing terminal from crashing and terminal
CN115880085A (en) Transaction reconciliation method, device, electronic equipment and storage medium
CN114625595B (en) Method, device and system for rechecking dynamic configuration information of service system
CN117435662B (en) File import intelligent analysis method and system
CN117874002A (en) Method and system for heterogeneous data migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210219
