CN116150198A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN116150198A
Authority
CN
China
Prior art keywords
data
service
information
target
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210985682.0A
Other languages
Chinese (zh)
Inventor
李智
郭剑霓
吴海英
郭江
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202210985682.0A
Publication of CN116150198A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a data processing method and device, electronic equipment and storage medium, wherein the method comprises the following steps: obtaining service data of a target service object in a streaming mode; acquiring an object identifier of a target service object from service data, and acquiring object information corresponding to the target service object from a key-value type target database according to the object identifier; carrying out validity check on the service data according to the object information, and storing the service data as target data under the condition that the check is passed; the target database is used for storing information records corresponding to a plurality of service objects at the current moment, the information records are in a key value pair format, the key value pair takes the object identification of the service object as a key, and takes the object information of the service object as a value. According to the embodiment of the application, the data matching processing can be rapidly, efficiently and accurately carried out on the service data generated by the service application, so that the service data meeting the user requirements is obtained.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of internet technology, service applications continuously generate a large amount of service data while running. To make it easier for users to monitor service conditions, a data matching application is generally required to screen, from the service data generated by the service applications, the service data that meets user requirements, and to display it for users to view. For example, in an e-commerce promotion scenario, the electronic device may typically use the data matching application to perform matching processing on the merchandise order data generated by the e-commerce application, so as to present sales data for different types of merchandise to the user.
Currently, data matching processing is generally performed on the service data generated by a service application based on a big data processing framework, for example, the Spark framework. However, since Spark is based on a batch processing mechanism, this approach generally suffers from data processing delay, low efficiency, and inaccurate matching results.
Disclosure of Invention
The application provides a data processing method and device, electronic equipment and a storage medium, so as to quickly, efficiently and accurately perform data matching processing on service data generated by service application and obtain service data meeting user requirements.
In a first aspect, the present application provides a data processing method, including:
obtaining service data of a target service object in a streaming mode;
acquiring an object identifier of the target service object from the service data, and acquiring object information corresponding to the target service object from a key-value type target database according to the object identifier, wherein the object information comprises information for indicating whether the target service object is a valid object;
carrying out validity check on the service data according to the object information, and storing the service data as target data under the condition that the check is passed;
the target database is used for storing information records corresponding to a plurality of service objects at the current moment, wherein the information records are in a key value pair format, and the key value pair takes an object identifier of the service object as a key and takes object information of the service object as a value.
In a second aspect, the present application provides a data processing apparatus comprising:
a service data acquisition unit, configured to obtain the service data of a target service object in a streaming manner;
an object information obtaining unit, configured to obtain an object identifier of the target service object from the service data, and obtain object information corresponding to the target service object from a key-value type target database according to the object identifier, where the object information includes information for indicating whether the target service object is a valid object;
a verification unit, configured to perform a validity check on the service data according to the object information, and store the service data as target data when the check is passed;
the target database is used for storing information records corresponding to a plurality of service objects at the current moment, wherein the information records are in a key value pair format, and the key value pair takes an object identifier of the service object as a key and takes object information of the service object as a value.
In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, one or more of the computer programs being executable by the at least one processor to enable the at least one processor to perform the data processing method of the first aspect described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the data processing method of the first aspect described above.
According to the embodiment provided by the application, the electronic equipment acquires the service data of the target service object through streaming, and then acquires the object information corresponding to the target service object from the key-value type target database according to the object identification of the target service object in the service data, so that the electronic equipment can carry out validity check on the service data according to the object information, and under the condition that the verification is passed, the service data is stored as the target data.
Because the service data is acquired in a streaming mode and subjected to matching processing, the method provided by the embodiment of the application can timely perform data matching processing so as to acquire target data quickly and with low delay; in addition, when the service data is matched, the method provided by the embodiment of the application obtains the object information of the target service object from a Key-Value (KV) target database according to the object identification of the target service object in the service data, and performs validity check on the service data based on the object information.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, and not constitute a limitation to the application. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
FIG. 1 is a schematic diagram of a data matching process in the related art;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of obtaining an information record according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of acquiring source information data according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of acquiring service data according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a framework for data processing according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present application, the following description of exemplary embodiments of the present application is made with reference to the accompanying drawings, in which various details of embodiments of the present application are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the absence of conflict, embodiments and features of embodiments herein may be combined with one another.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and this application and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Please refer to FIG. 1, which is a schematic diagram of a data matching process in the related art. As shown in FIG. 1, in the related art, service data generated by a service application is generally processed based on a big data processing framework, such as the Spark framework. Specifically, when data matching is performed, the service data to be matched that was generated in the current period for a service object is first obtained from a service database, such as a MySQL database, according to a preset processing period. A first memory table is then generated through Spark SQL. The object identifier of the service object in the service data cached in the first memory table is then associated and matched against the data records in an object information table of a data warehouse, for example, a Hive database, so as to determine whether the service object corresponding to the service data is a valid object. If so, the service data is determined to be valid data and is stored for display by a terminal device; otherwise, the service data is regarded as invalid data and is filtered out. Here, the preset processing period may be 10 minutes; the service object corresponding to the service data may be a service user or a service product; and the object information table is used for storing the object information of service objects, which includes information indicating whether a service object is a valid object.
With continued reference to FIG. 1, the object information of a service object is not fixed but may change dynamically at any time. Therefore, in the related art, while the matching process is performed, the object information of the service object is also obtained from the service database according to a preset processing period, and a second memory table is generated through Spark SQL. The second memory table is then associated with the object information table in the data warehouse, for example, the Hive database, to determine the update type of the object information of the service object, and the data records in the object information table are updated according to the determined update type.
In the process of implementing the present application, the inventor found that, although the above data matching method based on a big data processing framework can complete the data matching process, it relies on a batch processing mechanism; that is, both the acquisition of service data and the updating of object information are performed according to a preset data processing period. Even if the data processing period is shortened or otherwise adjusted, this approach still suffers from processing delay, and the delay may in turn cause incorrect matching results because the data records in the object information table are not updated in time. In addition, since the data warehouse is used for persistently storing information, all fields of the object information of a service object usually need to be stored; in practice, however, not all of this information is used for data matching, so this approach occupies a large amount of storage. In addition, when service data is matched against the data records in the object information table of the data warehouse based on the Spark framework, the Spark calculation engine must be used for the association calculation, and each run must sequentially execute the following steps: starting the engine, acquiring computing resources, reading data from disk, performing the association calculation, and writing the association result to disk. This often occupies a large amount of resources, consumes considerable time, and is therefore inefficient. Finally, after the object information table is updated according to the obtained object information, the table needs to be repaired once, and frequent table repairs may place a heavy burden on the big data cluster and affect its performance.
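The delay problem described above can be illustrated with a minimal sketch. It assumes a hypothetical `InfoChange` event type and a refresh that happens only at multiples of the processing period; these names and the sample timestamps are illustrative assumptions and do not come from the related art itself. The sketch compares the validity table seen by a periodic batch job with the true state at the same moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoChange:
    """Hypothetical change event for the object information of a service object."""
    time_s: int       # when the change happened in the service application
    object_id: str
    is_valid: bool


def snapshot_at(changes, time_s):
    """Return the validity of each object as of time_s (the latest change wins)."""
    state = {}
    for change in sorted(changes, key=lambda c: c.time_s):
        if change.time_s <= time_s:
            state[change.object_id] = change.is_valid
    return state


def batch_view(changes, now_s, period_s):
    """Validity table as seen by a job that refreshes only once per period."""
    last_refresh = (now_s // period_s) * period_s
    return snapshot_at(changes, last_refresh)


PERIOD_S = 600  # the 10-minute processing period used as an example above
changes = [
    InfoChange(time_s=0, object_id="user-1", is_valid=True),
    # The object becomes invalid shortly after the refresh at 600 s.
    InfoChange(time_s=610, object_id="user-1", is_valid=False),
]

stale = batch_view(changes, now_s=900, period_s=PERIOD_S)
fresh = snapshot_at(changes, time_s=900)
```

In this sketch, service data for `user-1` that arrives at 900 s would still be matched as valid by the batch view, although the object was invalidated at 610 s.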
To solve at least one of the above problems, an embodiment of the present application provides a data processing method, please refer to fig. 2, which is a flowchart of a data processing method provided in an embodiment of the present application. The method can be applied to the electronic equipment, the electronic equipment can be a server, and the server can be a physical server or a virtual server; of course, with the continuous progress of the technology, the electronic device may also be a terminal device, that is, the method may also be applied to the terminal device alone, for example, may be applied to an edge terminal device in an edge computing scenario, which is not limited in particular herein.
As shown in fig. 2, the data processing method provided in the embodiment of the present application may include the following steps S201 to S203, which are described in detail below.
In step S201, service data of the target service object is obtained in a streaming manner.
The target business object can be any business object in business application.
In the embodiment of the present application, the target service object and the service data of the target service object may be different according to different application scenarios.
For example, when the method is applied to a financial lending scenario, that is, to processing business data generated by a financial application, a business object may be any business user using the financial application, that is, a customer who uses a business such as lending, financial management, etc. of the financial application, and the business user may be an individual user or an enterprise user, which is not particularly limited herein; in this scenario, the service data corresponding to the service object may be, for example, pay-out amount data, which may be, for example, in the form of (data identifier, user identifier, generation time, pay-out amount).
For another example, when the method is applied to an e-commerce scenario, that is, to processing business data generated by an e-commerce application, the business object may be a business product and/or a business user in the e-commerce application; correspondingly, in the scene, the service data of the service object can be data such as order data, browsing data, attention data and the like corresponding to the service product and/or the service user.
That is, in the embodiment of the present application, depending on the application scenario, the service object may be at least one of a service user and a service product, and the service data may be at least one of pay-out amount data corresponding to the service user and order data corresponding to the service product, which is not particularly limited herein.
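As an illustration of the pay-out amount data described for the financial lending scenario, the following minimal sketch models a record of the form (data identifier, user identifier, generation time, pay-out amount). The class name, field names, and sample values are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PayoutRecord:
    """Hypothetical model of (data identifier, user identifier, generation time, pay-out amount)."""
    data_id: str
    user_id: str
    generated_at: datetime
    payout_amount: float

    @property
    def object_id(self) -> str:
        # In this scenario the service object is the service user,
        # so the user identifier serves as the object identifier.
        return self.user_id


def parse_payout(raw: tuple) -> PayoutRecord:
    """Convert a raw 4-tuple of strings into a typed record."""
    data_id, user_id, generated_at, amount = raw
    return PayoutRecord(
        data_id=data_id,
        user_id=user_id,
        generated_at=datetime.fromisoformat(generated_at),
        payout_amount=float(amount),
    )


# Sample values are made up for illustration.
record = parse_payout(("d-001", "user-1", "2022-08-16T10:30:00", "1500.00"))
```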
In this embodiment of the present application, obtaining the service data of the target service object in a streaming manner means acquiring the service data by means of stream processing.
In stream processing, a data producer writes the data records it generates into an ordered data stream, and a data consumer continuously reads the data records from that stream in the same order. Stream processing usually has no preset start or end; instead, the data records generated by the data producer are processed in real time through a series of event nodes. Here, the data producer may be, for example, the service application that generates the service data in the embodiment of the present application, and the data consumer may be, for example, a data processing application in the electronic device that executes the method described in the embodiment of the present application.
That is, to address the data processing delay caused by the batch processing mechanism when data matching is performed based on a distributed processing framework in the related art, such as the Spark framework, the embodiment of the application acquires the service data generated by the service application in a streaming manner, so that the service data is processed continuously and without interruption, thereby achieving the technical effect of reducing processing delay.
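The producer and consumer roles described above can be sketched as follows. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the ordered data stream, the `_END` sentinel exists solely so that the demonstration terminates (a real stream has no preset end), and the sample records are made up.

```python
import queue
import threading

_END = object()  # sentinel used only so that this demonstration can terminate


def producer(stream, records):
    """Stand-in for the service application: writes records into the ordered stream."""
    for record in records:
        stream.put(record)
    stream.put(_END)


def consumer(stream, handle):
    """Stand-in for the data processing application: handles each record as it arrives."""
    while True:
        record = stream.get()  # blocks until the next record is available
        if record is _END:
            break
        handle(record)


stream = queue.Queue()  # FIFO, so records are consumed in the order they were written
received = []
records = [("d1", "user-1", 100.0), ("d2", "user-2", 250.0), ("d3", "user-1", 75.5)]

worker = threading.Thread(target=consumer, args=(stream, received.append))
worker.start()
producer(stream, records)
worker.join()
```

Each record is handled as soon as it is available rather than waiting for a processing period to elapse, which is the property the streaming approach relies on.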
Step S202, obtaining an object identifier of the target service object from the service data, and obtaining object information corresponding to the target service object from a key-value type target database according to the object identifier, wherein the object information comprises information for indicating whether the target service object is a valid object; the target database is used for storing information records corresponding to a plurality of service objects at the current moment, the information records are in a key-value pair format, and each key-value pair takes the object identifier of a service object as the key and the object information of the service object as the value.
The object identifier is information for uniquely identifying a service object; for example, the object identifier may be a universally unique identifier (UUID) corresponding to the service object.
A key-value type database is a non-relational database that stores data using a simple key-value method. In general, a key-value type database stores data as a set of key-value pairs with the keys serving as unique identifiers; both keys and values can be anything from simple objects to complex composite objects. Common key-value type databases include Redis and Memcached.
The Redis database supports a relatively large number of value types, and those data types support complex operations such as push, pop, add, remove, intersection, union, and difference. Redis also provides a data updating mechanism through which the key-value records it manages can be updated efficiently at the millisecond level. For these reasons, in the embodiment of the present application, the target database is preferably a Redis database; of course, with the continuous progress of technology, the target database may also be another key-value type database with better performance, which is not particularly limited herein.
The information records of the plurality of service objects stored in the target database may be key-value pairs in the form of <object identifier, object information>, wherein the object information includes at least a validity identifier indicating whether the service object is valid. That is, in the embodiment of the present application, the object information may include other data items in addition to the validity identifier of the service object. For example, for an individual user among the service users, the object information may include the name, age, registration time, whether the individual user is a valid user, and so on; for an enterprise user among the service users, the object information may include data items such as the enterprise name, registration time, and whether the enterprise user is a valid user.
It should be noted that, in the embodiment of the present application, in order to improve the accuracy of the matching result, the information records of the plurality of service objects stored in the target database may be the records corresponding to the service objects at the current moment, that is, the latest records as of the current moment. For example, when a change in the information of a service object in the service application is detected, the information record of that service object in the target database can be updated in real time, so as to ensure that the target database always stores the latest data records.
In addition, data matching often requires only the validity identifier of the service object. Therefore, in this embodiment of the present application, in order to reduce storage occupation, the object information in the information record of a service object stored in the target database may be just the validity identifier of the service object; that is, the information record of the service object may take the form <object identifier, validity identifier>.
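The two record forms described above can be compared with a minimal sketch. The JSON encoding, the "1"/"0" encoding of the validity identifier, and the sample field names and values are all assumptions made for illustration; the application does not prescribe a particular encoding.

```python
import json

# Hypothetical object information for an individual user; the field names
# and values are illustrative assumptions.
full_info = {
    "name": "Alice",
    "age": 30,
    "registration_time": "2022-01-01T00:00:00",
    "is_valid": True,
}


def full_record(object_id, info):
    """Build an <object identifier, object information> pair."""
    return object_id, json.dumps(info, sort_keys=True)


def minimal_record(object_id, info):
    """Build an <object identifier, validity identifier> pair."""
    return object_id, "1" if info["is_valid"] else "0"


def is_valid(value):
    """Read the validity identifier from either record form."""
    if value in ("0", "1"):
        return value == "1"
    return bool(json.loads(value)["is_valid"])


full_key, full_value = full_record("user-1", full_info)
min_key, min_value = minimal_record("user-1", full_info)
```

Both forms answer the same validity question, but the minimal form stores a single character per object instead of the whole information record.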
In the related art, the object information of a service object used for data matching processing is stored in an object information table of a data warehouse, such as a Hive database. Whenever a data record in the object information table needs to be used, a series of processes must be performed to operate on that record, and each time a data record in the object information table is updated, a table repair operation must be performed. This may not only make data matching too inefficient but also affect the performance and stability of the big data cluster.
In order to solve the problem, in this embodiment of the present application, considering that service data generated by a service application generally includes an object identifier of a service object, it may be considered that a Key (Key) of the service object is the object identifier, and object information of the service object is a Value (Value) corresponding to the Key, so that when performing data matching processing, an electronic device executing the method in this embodiment of the present application may quickly query, by means of Get (Key), object information of the target service object from the target database according to the object identifier of the target service object included in the service data after acquiring the service data of the target service object in the step S201, so as to verify the service data according to the object information.
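The Get(Key) lookup described above can be sketched as follows. The `InMemoryKV` class is a hypothetical in-memory stand-in for the key-value type target database, and the `object_id` field name is an assumption; with a real Redis deployment, a client library would issue the corresponding GET command instead.

```python
from typing import Optional


class InMemoryKV:
    """Hypothetical in-memory stand-in for a key-value type target database."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key is absent.
        return self._data.get(key)


def lookup_object_info(store: InMemoryKV, service_data: dict) -> Optional[str]:
    """Extract the object identifier from the service data and fetch its object information."""
    object_id = service_data["object_id"]  # assumed field name
    return store.get(object_id)


store = InMemoryKV()
store.set("user-1", "1")  # "1" is an assumed encoding of "valid"

info = lookup_object_info(store, {"data_id": "d1", "object_id": "user-1", "amount": 100.0})
missing = lookup_object_info(store, {"data_id": "d2", "object_id": "user-9", "amount": 50.0})
```

The lookup is a single keyed read, in contrast to the association calculation against an object information table described for the related art.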
Step S203, the validity of the service data is checked according to the object information, and the service data is stored as target data when the verification is passed.
After the service data of the target service object is obtained in step S201, and the object information of the target service object is obtained from the target database according to the object identifier of the target service object in step S202, a validity check can be performed on the service data according to the object information, and the service data is stored as the target data when the check is passed, so that the service application can obtain the target data in real time and display it for a user to view, or perform downstream task processing based on the target data.
As described in the above step S202, in the embodiment of the present application, the object information in the information record stored in the target database may be a validity identifier for indicating whether the service object is valid, and in this embodiment, the obtaining, according to the object identifier, the object information corresponding to the target service object from the key-value type target database in the above step S202 may be: and acquiring a validity identification corresponding to the object identification from the target database.
After the validity identifier is obtained, performing the validity check on the service data according to the object information may be: performing the validity check on the service data according to the validity identifier; determining that the service data passes the check when the validity identifier indicates that the target service object is valid; and determining that the service data fails the check when the validity identifier indicates that the target service object is invalid.
As can be seen from the above description, the data processing method provided by the embodiment of the present application acquires and processes the service data in a streaming manner, so data matching processing can be performed on the service data in time and the target data can be obtained quickly and with low delay. In addition, when matching processing is performed on the service data, the method obtains the object information corresponding to the target service object from the key-value type target database according to the object identifier of the target service object in the service data. Because the target database stores the information records, in key-value pair format, corresponding to a plurality of service objects at the current moment, obtaining the corresponding object information from the target database based on the object identifier and performing the validity check on the service data based on that object information can improve the accuracy of the matching result and help ensure the stability of the data cluster. Furthermore, because the object information of a service object in the target database may consist only of its validity identifier, this approach can also greatly reduce storage occupation compared with the related art.
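Steps S201 to S203 can be put together in a minimal sketch. It makes several assumptions that are not stated in the application: a plain dictionary stands in for the key-value type target database, the validity identifier is encoded as "1" or "0", the `object_id` field name is hypothetical, and an object identifier that is absent from the target database is treated as failing the check.

```python
from typing import Dict, Iterable, List, Optional


def validity_check(validity_flag: Optional[str]) -> bool:
    """Pass only when the flag marks the object as valid (assumed encoding: "1")."""
    # Assumption: a missing record (None) is treated as a failed check.
    return validity_flag == "1"


def process_stream(service_stream: Iterable[dict], target_db: Dict[str, str]) -> List[dict]:
    """Match each piece of service data against the target database as it arrives."""
    target_data = []
    for service_data in service_stream:            # S201: obtain service data in a streaming manner
        object_id = service_data["object_id"]      # S202: extract the object identifier
        validity_flag = target_db.get(object_id)   # S202: look up the object information by key
        if validity_check(validity_flag):          # S203: validity check
            target_data.append(service_data)       # S203: store as target data
    return target_data


# Illustrative data; identifiers and amounts are made up.
target_db = {"user-1": "1", "user-2": "0"}
service_stream = iter([
    {"data_id": "d1", "object_id": "user-1", "amount": 100.0},
    {"data_id": "d2", "object_id": "user-2", "amount": 250.0},
    {"data_id": "d3", "object_id": "user-3", "amount": 30.0},
    {"data_id": "d4", "object_id": "user-1", "amount": 75.5},
])

target_data = process_stream(service_stream, target_db)
stored_ids = [item["data_id"] for item in target_data]
```

Here the data for `user-2` is filtered out because its validity identifier marks it as invalid, and the data for `user-3` is filtered out only because of the assumption made above about missing records.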
Please refer to fig. 3, which is a flowchart illustrating a process of obtaining an information record according to an embodiment of the present application. As shown in fig. 3, in the embodiment of the present application, to ensure accuracy of a data processing result, an information record corresponding to any business object stored in a target database at a current time may be obtained by: step S301, source information data to be processed is obtained in a streaming mode, wherein the source information data comprises an object identifier and object information of a service object; and step S302, updating the source information data into the target database based on a mechanism that the target database operates on the data stored in the memory through a single thread.
The source information data may be information about any service object that the data processing application acquires in a streaming manner. In other words, while a service application is running, whenever the information of a service object is updated, for example when a service object is added, deleted or changed, the service application may push that information to the data processing application as the source information data of the service object. The data processing application obtains the source information data in a streaming manner and automatically updates it into the target database, for example a Redis database, based on the mechanism by which the target database operates on data stored in memory through a single thread. In this way, the source information data to be processed is obtained continuously and without interruption, and is updated into the Redis database at millisecond-level speed based on the data update mechanism of the Redis database, so that the object information of the service objects is as up to date as possible when data processing is performed and errors in the data processing results are avoided.
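As a rough sketch of how an add, change or delete event might be applied to the key-value store, consider the following. The event layout (`op`, `object_id`, `validity`) is hypothetical, and a dictionary again stands in for the target database; with a Redis database the two branches would correspond to the `SET` and `DEL` commands.

```python
from typing import Dict


def apply_source_info(db: Dict[str, str], event: dict) -> None:
    """Apply one piece of source information data to the key-value store.

    The "op" field and its values are assumptions made for this sketch.
    """
    op = event["op"]
    if op in ("add", "update"):
        # Adding and changing are the same overwrite in a key-value store.
        db[event["object_id"]] = event["validity"]
    elif op == "delete":
        # Deleting a key that does not exist is treated as a no-op.
        db.pop(event["object_id"], None)
    else:
        raise ValueError(f"unknown update type: {op!r}")
```

Since every update is a single-key write, applying the events one at a time in arrival order keeps the store consistent with the latest pushed information.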
Please refer to fig. 4, which is a flowchart illustrating a process of acquiring source information data according to an embodiment of the present application. As shown in fig. 4, the streaming acquisition of the source information data to be processed may include: step S401, source information data pushed by a first service application is obtained, wherein the first service application is any application for generating the source information data; step S402, generating a first task message according to the source information data and writing the first task message into a first message queue; step S403, the first task message is acquired from the first message queue by using the first stream computing job, so as to obtain the source information data in a stream manner.
The first message queue is configured to buffer the source information data pushed by the first service application, so that an electronic device implementing the method of the embodiment of the present application can process the source information data piece by piece in a stream processing manner. The first message queue may be, for example, a Kafka message queue.
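Steps S402 and S403 can be illustrated with an in-process queue. This is only a sketch: Python's standard `queue.Queue` stands in for the message queue, the JSON message layout with a `payload` field is an assumption, and the consumer stops when the queue is empty, whereas a real stream computing job would keep waiting for new messages.

```python
import json
import queue
from typing import Iterator


def write_task_message(q: queue.Queue, source_info: dict) -> None:
    """S402: wrap pushed source information data in a task message and enqueue it."""
    q.put(json.dumps({"payload": source_info}))


def stream_task_messages(q: queue.Queue) -> Iterator[dict]:
    """S403: yield source information data piece by piece, in arrival order."""
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            # A long-running job would block here instead of returning.
            return
        yield json.loads(message)["payload"]
```

The queue decouples the pushing service application from the consuming job, so each side can run at its own pace.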
The first service application may be any application for generating source information data, the service application may be an application associated with a second service application for generating raw data to be matched, for providing object information of a service object to the second service application through a preset data interface, for example, the first service application may be a user information management application for managing registration, login, information maintenance, and the like of an individual user and/or an enterprise user, the second service application may be a financial lending application, the second service application obtains information of the user based on the first service application, and provides a payment process to the user; of course, the first service application and the second service application may be the same service application, and the same service application is integrated with the user information management function and other service functions at the same time, which is not limited herein.
In an embodiment of the present application, the first stream computing job may be a stream computing (Flink) job implemented based on the Apache Flink distributed processing framework. Because Flink offers high throughput, low latency and high performance, and supports checkpoints, a transaction mechanism and exactly-once semantics, using the first stream computing job to obtain the source information data from the first message queue in a streaming manner and to update the information records in the target database ensures that the processing state can be recovered when a fault occurs and avoids the problem of processing data repeatedly.
Similar to the process of acquiring the source information data in a streaming manner, in the embodiment of the present application the service data of the target service object may also be acquired in a streaming manner through similar processing. Please refer to fig. 5, which is a schematic flowchart of acquiring the service data provided in the embodiment of the present application. As shown in fig. 5, acquiring the service data of the target service object in a streaming manner in step S201 may include: step S501, acquiring service data pushed by a second service application, wherein the second service application is any application for generating the original data; step S502, generating a second task message according to the service data and writing the second task message into a second message queue; and step S503, acquiring the second task message from the second message queue by using the second stream computing job, so as to obtain the service data in a streaming manner.
The second stream computing job may be a job that is different from the first stream computing job and is implemented based on the Apache Flink distributed processing framework. It is used for acquiring the service data of the target service object in a streaming manner, acquiring the object information of the target service object according to the object identifier of the target service object in the service data after the service data is acquired, and performing the validity check on the service data according to the object information.
The second message queue may also be a Kafka message queue. In addition, the first message queue and the second message queue may be the same message queue or different message queues. In the case that they are the same message queue, a field identifying the data type is added to the corresponding first task message and second task message according to the different types of the original data and the source information data pushed by the service applications, so that the first stream computing job and the second stream computing job each acquire and process the task messages corresponding to their data type.
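The shared-queue case can be sketched as follows, assuming a hypothetical `data_type` field with the values `"source_info"` and `"service_data"`; the present application does not name the field or its values.

```python
from typing import Iterable, List

SOURCE_INFO = "source_info"    # handled by the first stream computing job
SERVICE_DATA = "service_data"  # handled by the second stream computing job


def make_task_message(data_type: str, payload: dict) -> dict:
    """Attach a data-type field so both kinds of message can share one queue."""
    return {"data_type": data_type, "payload": payload}


def select_payloads(messages: Iterable[dict], data_type: str) -> List[dict]:
    """Return, in order, the payloads of the messages a given job should process."""
    return [m["payload"] for m in messages if m["data_type"] == data_type]
```

Each job filters on the field and ignores messages of the other type, so the relative order of the messages it does process is preserved.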
In some embodiments, the service data that passes the check is stored as target data in a real-time database used for real-time data analysis, for example a Doris database, so that once the target data is obtained it can be quickly displayed to a user for viewing.
The Doris database is a modern massively parallel processing (MPP) analytical database that can return query results with sub-second response times, and thus effectively supports real-time data analysis.
In addition, in the embodiment of the present application, after storing the service data passing the verification as the target data, the method further includes: the target data is displayed.
That is, to allow the user to view a specific service in real time, after the target data is obtained through the above processing, the target data may be directly presented for the user to view.
For example, in the case that the second stream computing job is a Flink job, after the target data is obtained through the check, the target data can be stored into a Doris database for real-time data analysis by calling a Flink sink method, so that the terminal device can display the target data in real time for a user to view, and the user can conveniently determine from the displayed target data whether the operation strategy of the service application needs to be adjusted.
In addition, in the embodiment of the present application, when the electronic device runs the data processing application for the first time, or when the data processing application becomes abnormal, the information records stored in the target database need to be initialized. In this case, information records may be obtained from a data warehouse storing information records of a plurality of service objects, and the information records in the target database may be initialized based on them. The data warehouse may be a database for permanently storing data, for example a Hive database. In such an embodiment, the method further comprises: storing the information records corresponding to the plurality of service objects stored in the target database at the current moment into the data warehouse at a preset time interval. That is, while the data processing application is running, the object information corresponding to the plurality of service objects stored in the target database at the current moment is synchronized to the data warehouse at a preset time interval, for example once every 10 minutes, to facilitate recovery from abnormalities.
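The periodic synchronization and the initialization from the data warehouse can be sketched as follows. The class and method names are hypothetical, dictionaries stand in for both the target database and the data warehouse, and the current time is passed in explicitly so that the interval logic is easy to follow.

```python
from typing import Dict, Optional


class SnapshotSync:
    """Copies the in-memory records to a warehouse stand-in at a fixed interval."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval_seconds = interval_seconds
        self.last_sync: Optional[float] = None
        self.warehouse: Dict[str, str] = {}  # stand-in for the data warehouse

    def maybe_sync(self, db: Dict[str, str], now: float) -> bool:
        """Store a full copy of the records if the preset interval has elapsed."""
        due = self.last_sync is None or now - self.last_sync >= self.interval_seconds
        if due:
            self.warehouse = dict(db)
            self.last_sync = now
        return due

    def restore(self) -> Dict[str, str]:
        """Return records for initializing the target database on (re)start."""
        return dict(self.warehouse)
```

With a 10-minute interval, a restart loses at most the updates made since the last synchronization; those would then have to arrive again through the stream.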
To facilitate understanding of the data processing method provided in the embodiments of the present application, please refer to fig. 6, which is a schematic diagram of a data processing framework provided in the embodiments of the present application. The data processing method is described below with reference to fig. 6. In fig. 6, the second service application for generating the service data of the target service object and the first service application for generating the source information data are the same service application, and that service application is a financial application. The description takes the business object as a business customer, the business data as payoff amount data, and the source information data as the customer information of the business customer.
As shown in fig. 6, in the embodiment of the present application, the business data layer may be configured to decouple business data processing from the data matching processing, so that the data matching processing can handle business data corresponding to different businesses and different business scenarios. The financial application in the business data layer is configured to generate customer information data corresponding to business customers and payoff amount data corresponding to different business customers, and to push the customer information data to message queue 1 and the payoff amount data to message queue 2 shown in fig. 6, where message queue 1 and message queue 2 may be Kafka message queues. The data processing application in the electronic device may set up stream computing job 1 and stream computing job 2 to obtain the customer information data and the payoff amount data, respectively, in a streaming manner; by continuously and uninterruptedly obtaining the corresponding customer information data and payoff amount data, data processing can be performed with low delay, high efficiency and accuracy. Stream computing job 1 and stream computing job 2 may be Flink jobs.
Specifically, referring to fig. 6, while the data processing application is running, stream computing job 2 may execute step S201 above, that is, obtain the payoff amount data of the business customer in a streaming manner by obtaining the task messages in message queue 2 as a stream. After the payoff amount data is obtained, the job may generate a corresponding first cache object, that is, a BUFFER object, in memory, and execute step S202 above, that is, obtain the corresponding customer information from the key-value type target database, for example the Redis database shown in fig. 6, according to the customer identifier of the business customer in the payoff amount data. It then executes step S203 above, that is, checks whether the payoff amount data is valid according to the customer information, and if the check passes, the payoff amount data may be stored as target data. For example, as shown in fig. 6, the payoff amount data that passes the check may be stored in a real-time table of the Doris database, so that the terminal device can display the payoff amount data on a dashboard.
In addition, as shown in fig. 6, in order to ensure that the information record of the service object stored in the target database is the latest information record at the current moment, during the running process of the data processing application, the streaming computing job 1 may execute the above step S301, that is, acquire the client information data updated in real time by the service application in a manner of streaming acquiring the task message in the message queue 1; after the client information data is obtained, it may also generate a corresponding second cache object in the memory, and as shown in fig. 6, the above step S302 is executed, that is, based on a mechanism that the target database operates on the data stored in the memory through a single thread, the update type of the corresponding information record is automatically determined, so as to complete the update of the client information data.
It should be noted that, in fig. 6, the method is applied to a financial lending scenario, for example, to a financial application, and in practical implementation, the method may also be applied to other scenarios separately or simultaneously, for example, may also be applied to an e-commerce scenario, and the matching processing of the product order data is completed by acquiring the product order data, the user information data, and the like generated by the e-commerce application, so that the corresponding target data is displayed according to the user requirement with low delay, high efficiency, and accuracy, which is not described herein again.
In addition, it should be noted that, while the data matching processing is being executed, as shown in fig. 6, the information records of the plurality of service objects stored in the target database may also be stored into the data warehouse, for example the Hive database, at a preset time interval, so that the data in the target database, that is, the Redis database, can be initialized when a restart is required under special circumstances. The detailed processing procedure is not repeated here.
In summary, in the data processing method provided by the embodiment of the present application, the service data of the target service object is obtained in a streaming manner and is matched against the information records in the target database, which stores the information records corresponding to the plurality of service objects at the current moment, so that the data matching processing can be completed in a timely, efficient and accurate manner. Moreover, because the data matching processing is performed by stream computing jobs developed on the Apache Flink framework, the processing parallelism can be adjusted dynamically according to the data flow rate during the data matching processing, so that the processing adapts to the data flow rate and the data cluster can be conveniently scaled out and in.
It will be appreciated that the above method embodiments of the present application may be combined with one another to form combined embodiments without departing from their principles and logic; for reasons of space, these are not described in detail in the present application. It will be appreciated by those skilled in the art that, in the above methods of the embodiments, the particular order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present application further provides a data processing apparatus, an electronic device and a computer-readable storage medium, each of which may be used to implement any of the data processing methods provided in the present application. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
Fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Referring to fig. 7, an embodiment of the present application provides a data processing apparatus, which may be applied to an electronic device, the data processing apparatus 700 includes: a service data acquisition unit 701, an object information acquisition unit 702, and a verification unit 703.
The service data acquiring unit 701 is configured to obtain service data of a target service object in a streaming manner.
The object information obtaining unit 702 is configured to obtain an object identifier of a target service object from service data, and obtain object information corresponding to the target service object from a key-value-type target database according to the object identifier, where the object information includes information for indicating whether the target service object is an effective object, and the target database is configured to store information records corresponding to a plurality of service objects at a current moment, where the information records are in a key-value pair format, and the key-value pair uses an object identifier of the service object as a key and uses object information of the service object as a value.
The verification unit 703 is configured to perform validity verification on the service data according to the object information, and store the service data as target data if the verification passes.
In some embodiments, the apparatus 700 further comprises a source information data acquisition unit configured to: obtain source information data to be processed in a streaming manner, wherein the source information data comprises an object identifier and object information of a service object; and update the source information data into the target database based on the mechanism by which the target database operates on data stored in memory through a single thread.
In some embodiments, the source information data obtaining unit may be configured to, when obtaining source information data to be processed in a streaming manner: acquiring source information data pushed by a first service application, wherein the first service application is any application for generating the source information data; generating a first task message according to the source information data and writing the first task message into a first message queue; the first task message is obtained from the first message queue using the first stream computation job to obtain source information data in a stream.
In some embodiments, the service data obtaining unit 701 may be configured to, when obtaining service data of a target service object in a streaming manner: acquiring service data pushed by a second service application, wherein the second service application is any application for generating the service data; generating a second task message according to the service data and writing the second task message into a second message queue; and acquiring a second task message from a second message queue by using a second stream computing job to obtain service data in a stream mode.
In some embodiments, the object information in the information record of any business object stored in the target database is a validity identifier for indicating whether the business object is valid or not; the object information obtaining unit 702 may be configured to, when obtaining object information corresponding to a target service object from a key-value type target database according to an object identifier: and acquiring the validity identification corresponding to the object identification from the target database.
In some embodiments, the verification unit 703 may be configured to, when performing the validity check on the service data according to the object information: perform the validity check on the service data according to the validity identifier; determine that the service data passes the check in the case that the validity identifier indicates that the target service object is valid; and determine that the service data fails the check in the case that the validity identifier indicates that the target service object is invalid.
In some embodiments, the information record of the initial time in the target database is obtained from a data warehouse storing information records of a plurality of business objects, the data warehouse being a database for permanently storing data; the apparatus 700 further comprises a persistent storage unit for: and storing the information records corresponding to the plurality of business objects stored in the target database at the current moment into a data warehouse according to a preset time interval.
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Referring to fig. 8, an embodiment of the present application provides an electronic device, including: at least one processor 801; at least one memory 802, and one or more I/O interfaces 803, coupled between the processor 801 and the memory 802; the memory 802 stores one or more computer programs executable by the at least one processor 801, and the one or more computer programs are executed by the at least one processor 801 to enable the at least one processor 801 to perform the data processing method described above.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program realizes the data processing method when being executed by a processor. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present application also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the above-described data processing method.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present application may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information for computer readable program instructions, which may execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will therefore be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present application as set forth in the following claims.

Claims (10)

1. A data processing method, comprising:
obtaining service data of a target service object in a streaming mode;
acquiring an object identifier of the target service object from the service data, and acquiring object information corresponding to the target service object from a key-value type target database according to the object identifier, wherein the object information comprises information indicating whether the target service object is a valid object;
performing a validity check on the service data according to the object information, and storing the service data as target data if the check is passed;
wherein the target database is used for storing information records corresponding to a plurality of service objects at the current moment, the information records are in a key-value pair format, and each key-value pair takes the object identifier of a service object as the key and the object information of the service object as the value.
2. The method according to claim 1, wherein the information record corresponding to any service object stored in the target database at the current moment is obtained by:
obtaining source information data to be processed in a streaming mode, wherein the source information data comprises an object identifier and object information of a service object; and
updating the source information data into the target database based on a mechanism by which the target database operates, through a single thread, on data stored in memory.
3. The method of claim 2, wherein the streaming acquisition of source information data to be processed comprises:
acquiring the source information data pushed by a first service application, wherein the first service application is any application for generating the source information data;
generating a first task message according to the source information data and writing the first task message into a first message queue;
and acquiring the first task message from the first message queue by using a first stream computing job, so as to obtain the source information data in a streaming mode.
4. The method of claim 1, wherein the streaming acquisition of the service data of the target service object comprises:
acquiring the service data pushed by a second service application, wherein the second service application is any application for generating the service data;
generating a second task message according to the service data and writing the second task message into a second message queue;
and acquiring the second task message from the second message queue by using a second stream computing job, so as to obtain the service data in a streaming mode.
5. The method according to claim 1, wherein the object information in the information record of any service object stored in the target database is a validity identifier indicating whether the service object is valid;
the acquiring of the object information corresponding to the target service object from the key-value type target database according to the object identifier comprises:
acquiring the validity identifier corresponding to the object identifier from the target database.
6. The method of claim 5, wherein the validity check on the service data according to the object information comprises:
performing the validity check on the service data according to the validity identifier;
determining that the service data passes the check if the validity identifier indicates that the target service object is valid; and
determining that the service data fails the check if the validity identifier indicates that the target service object is invalid.
7. The method of claim 1, wherein the information records in the target database at the initial moment are obtained from a data warehouse storing information records of the plurality of service objects, the data warehouse being a database for persistently storing data;
the method further comprises:
storing, at a preset time interval, the information records corresponding to the service objects stored in the target database at the current moment into the data warehouse.
8. A data processing apparatus, comprising:
a service data acquisition unit, configured to acquire service data of a target service object in a streaming mode;
an object information acquisition unit, configured to acquire an object identifier of the target service object from the service data, and to acquire object information corresponding to the target service object from a key-value type target database according to the object identifier, wherein the object information comprises information indicating whether the target service object is a valid object;
a verification unit, configured to perform a validity check on the service data according to the object information, and to store the service data as target data if the check is passed;
wherein the target database is used for storing information records corresponding to a plurality of service objects at the current moment, the information records are in a key-value pair format, and each key-value pair takes the object identifier of a service object as the key and the object information of the service object as the value.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data processing method according to any of claims 1-7.
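
For illustration only, the flow recited in claims 1 and 4 to 7 might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation: a plain dictionary stands in for the key-value type target database, the standard-library queue stands in for the second message queue, and a simple loop stands in for the second stream computing job. All names (target_db, task_queue, push_service_data, is_valid, run_stream_job, snapshot) and the record fields are hypothetical, and treating an unknown object identifier as invalid is an assumption the claims do not state.

```python
import queue

# Hypothetical in-memory stand-in for the key-value type target database:
# key = object identifier, value = validity identifier.
target_db = {"obj-1": True, "obj-2": False}

# Hypothetical stand-in for the second message queue.
task_queue = queue.Queue()


def push_service_data(record):
    """Wrap pushed service data in a task message and write it to the queue."""
    task_queue.put({"payload": record})


def is_valid(record, db):
    """Look up the object identifier and return its validity identifier.

    Treating an unknown identifier as invalid is an assumption of this sketch.
    """
    return db.get(record["object_id"], False)


def run_stream_job(db):
    """Drain the queue and return the records that pass the validity check."""
    stored = []
    while not task_queue.empty():
        record = task_queue.get()["payload"]
        if is_valid(record, db):
            stored.append(record)  # kept as target data
    return stored


def snapshot(db):
    """Copy the current information records, as a periodic save to the
    data warehouse might do; the real storage step is not modelled here."""
    return dict(db)


push_service_data({"object_id": "obj-1", "amount": 10})
push_service_data({"object_id": "obj-2", "amount": 20})
push_service_data({"object_id": "obj-3", "amount": 30})
target_data = run_stream_job(target_db)
warehouse_copy = snapshot(target_db)
print(target_data)
```

In this sketch only the record for obj-1 is kept, because obj-2 is flagged invalid and obj-3 has no information record. The lookup is a single key access, which is the property the key-value format is meant to provide.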
CN202210985682.0A 2022-08-17 2022-08-17 Data processing method and device, electronic equipment and storage medium Pending CN116150198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210985682.0A CN116150198A (en) 2022-08-17 2022-08-17 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210985682.0A CN116150198A (en) 2022-08-17 2022-08-17 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116150198A true CN116150198A (en) 2023-05-23

Family

ID=86360573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210985682.0A Pending CN116150198A (en) 2022-08-17 2022-08-17 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116150198A (en)

Similar Documents

Publication Publication Date Title
US10121169B2 (en) Table level distributed database system for big data storage and query
US10083202B2 (en) Verifying data consistency
US10866973B2 (en) Test data management
US11036713B2 (en) Sending notifications in a multi-client database environment
CN108804306B (en) Method and system for automatic test system
US10877962B2 (en) Deferred update of database hashcode in blockchain
CN108647357B (en) Data query method and device
US10621003B2 (en) Workflow handling in a multi-tenant cloud environment
US9852232B2 (en) Automating event trees using analytics
US20130185086A1 (en) Generation of sales leads using customer problem reports
US20190073394A1 (en) Automatically aggregating data in database tables
WO2023098462A1 (en) Improving performance of sql execution sequence in production database instance
CN116150198A (en) Data processing method and device, electronic equipment and storage medium
US11768741B2 (en) Replicating changes written by a transactional virtual storage access method
US10931781B2 (en) Mobile device cache updating
US20210034590A1 (en) Ledger-based machine learning
US11169979B2 (en) Database-documentation propagation via temporal log backtracking
US10572838B2 (en) Operational data rationalization
US10922312B2 (en) Optimization of data processing job execution using hash trees
US10754876B2 (en) Cloning of a system
US20200104391A1 (en) Ensuring integrity of records in a not only structured query language database
US11176108B2 (en) Data resolution among disparate data sources
CN116578572A (en) Service date changing method, device, equipment, storage medium and program product
CN117609381A (en) Data synchronization method and device, electronic equipment and computer readable storage medium
CN113760870A (en) Method, device and equipment for processing service data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination