CN111881301B

CN111881301B - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN111881301B
Application number: CN202010680770.0A
Authority: CN
Inventors: 杨双全; 张倩; 张阳; 谢奕; 刘畅; 张雪婷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-07-15
Filing date: 2020-07-15
Publication date: 2024-06-04
Anticipated expiration: 2040-07-15
Also published as: CN111881301A

Abstract

The application discloses a data processing method and apparatus, an electronic device and a storage medium. It relates to the field of knowledge graphs and can be used in cloud platforms or cloud computing. The specific implementation scheme is as follows. A data processing method comprises: acquiring first data from a first database, wherein the first data comprises a plurality of sub-data; identifying sub-data having a target relationship among the plurality of sub-data, and generating target data, wherein the target data comprises the sub-data having the target relationship and relationship information of the target relationship; and storing the target data into a relationship graph. The data processing method and apparatus, the electronic device and the storage medium reduce the long time that the prior art takes to analyze the associations between first data from different data sources.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of knowledge graphs within data processing, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of data acquisition technology, designated personnel may acquire first data related to case handling from different data sources. Because the first data from each data source generally have different data structures, once the first data have been acquired, the personnel must manually work out the associations between the first data from the different sources in order to extract key evidence from them.

Disclosure of Invention

The application provides a data processing method, a data processing device, electronic equipment and a storage medium.

In a first aspect of the present application, there is provided a data processing method, including:

Acquiring first data from a first database, wherein the first data comprises a plurality of sub-data;

Identifying sub-data with a target relationship in the plurality of sub-data, and generating target data, wherein the target data comprises the sub-data with the target relationship and relationship information of the target relationship;

and storing the target data into a relationship graph.

In a second aspect of the present application, there is provided a data processing apparatus, comprising:

The acquisition module is used for acquiring first data from a first database, wherein the first data comprises a plurality of sub-data;

The identification module is used for identifying sub-data having a target relationship among the plurality of sub-data and generating target data, wherein the target data comprises the sub-data having the target relationship and relationship information of the target relationship;

and the storage module is used for storing the target data into a relationship graph.

In a third aspect of the present application, there is provided an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method.

In a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method.

The technology provided by the application reduces the long time that the prior art takes to analyze the associations between first data from different data sources.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

The drawings are provided for a better understanding of the present application and are not to be construed as limiting it. In the drawings:

FIG. 1 is a first flowchart of a data processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a direct single-time relationship according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a direct persistent relationship according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an indirect single-time relationship according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an indirect persistent relationship according to an embodiment of the present application;

FIG. 6 is a second flowchart of a data processing method according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of importing first data into a relationship extraction model according to an embodiment of the present application;

FIG. 8 is a third flowchart of a data processing method according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 10 is a block diagram of an electronic device for implementing the data processing method of an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Referring to FIG. 1, FIG. 1 shows a data processing method provided in an embodiment of the present application, which includes the following steps.

Step S101, acquiring first data from a first database, wherein the first data comprises a plurality of sub-data.

The first database may store first data from a single data source or from a plurality of different data sources. The first data may be data collected by data collection devices, Internet data, or data collected manually offline by designated personnel: for example, image data captured by a monitoring device at a road traffic intersection, content posted by users on the Internet, or identity information and face information of transient persons collected manually by designated personnel in public places.

The sub-data in the first data are the individual data items carried in the first data. For example, suppose the first data is generated by a monitoring device from a photo it has taken: the photo contains the license plate number of a vehicle and the face features of the driver, and the monitoring device records the time at which the photo was taken and its own ID number, from which the shooting location can be determined. In this case, the first data may include the following sub-data: the license plate number of the vehicle, the face features of the driver, the ID number of the monitoring device, and so on. The shooting location can serve as attribute information of the monitoring device ID number, and the shooting time can serve as attribute information of the license plate number. As another example, when the first data is identity information, face information and similar data of transient persons collected manually by designated personnel in a public place, the first data may include the following sub-data: the identity information of the transient person, a face photo, the collection time, the collection location, and so on.

Specifically, the first data generated by each data source may have a fixed data structure, while the data structures of the first data generated by different data sources may differ. The first data may include field descriptions and the specific data corresponding to each field description. For example, the generated first data may be: [license plate number of the vehicle: XXX1; face features of the driver: XXX2; ID number of the monitoring device: XXX3], where "license plate number of the vehicle", "face features of the driver" and "ID number of the monitoring device" are field descriptions, and XXX1, XXX2 and XXX3 are the specific data corresponding to those field descriptions, namely the sub-data. Generating data with a fixed structure makes the first data easy to identify later.
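Because the first data follows a fixed "field description: specific data" layout, splitting a record into its sub-data can be very simple. The sketch below assumes the bracketed, semicolon-separated text form used in the example above; the function name `parse_record` is hypothetical, and values are assumed not to contain ";".

```python
def parse_record(record):
    """Split a '[field: value; field: value]' record into {field description: sub-data}."""
    body = record.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]  # drop the enclosing brackets
    sub_data = {}
    for item in body.split(";"):
        if not item.strip():
            continue  # tolerate a trailing ';'
        field, _, value = item.partition(":")  # split at the first ':' only
        sub_data[field.strip()] = value.strip()
    return sub_data


record = ("[license plate number of the vehicle: XXX1; "
          "face features of the driver: XXX2; "
          "ID number of the monitoring device: XXX3]")
print(parse_record(record))
```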

Step S102, identifying sub-data having a target relationship among the plurality of sub-data, and generating target data, wherein the target data comprises the sub-data having the target relationship and relationship information of the target relationship.

The target relationship may be a general relationship type, such as an association relationship, a direct relationship or an indirect relationship, so that the relationships between sub-data of many different types of first data can all be expressed by the target relationship and reflected in the same relationship graph.

The first data generally carries associations between its sub-data. For example, when the first data is [license plate number of the vehicle: XXX1; face features of the driver: XXX2; ID number of the monitoring device: XXX3], the license plate number and the driver's face features were both captured directly by the monitoring device, so a direct relationship exists between XXX1 and XXX3 and between XXX2 and XXX3; determining these relationships makes it possible to determine, for example, the track information of the target vehicle and the target person. Because the license plate number and the driver's face features appear in the same photo, that is, they are connected through the monitoring device, an indirect relationship exists between XXX1 and XXX2; obtaining this relationship makes it easy to determine whether the person driving the vehicle is its owner. In this case, the identified target data may include at least one of the following: [XXX1, XXX3, direct relationship], [XXX2, XXX3, direct relationship], [XXX1, XXX2, indirect relationship], and so on.
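The example above can be expressed as a small rule applied to one monitoring record. This is a sketch under an assumed rule, not the patent's extraction model: each captured item is treated as directly related to the capturing device, and every pair of items in the same photo as indirectly related through it. The field names and `extract_target_data` are hypothetical.

```python
from itertools import combinations


def extract_target_data(sub_data, device_field, captured_fields):
    """Return [sub-data, sub-data, relationship] triples for one monitoring record.

    Hypothetical rule: each captured item is directly related to the device
    that captured it, and any two items captured in the same photo are
    indirectly related through that device.
    """
    device = sub_data[device_field]
    captured = [sub_data[f] for f in captured_fields if f in sub_data]
    triples = [[value, device, "direct"] for value in captured]
    triples += [[a, b, "indirect"] for a, b in combinations(captured, 2)]
    return triples


record = {"license_plate": "XXX1", "driver_face": "XXX2", "device_id": "XXX3"}
for triple in extract_target_data(record, "device_id", ["license_plate", "driver_face"]):
    print(triple)
```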

Step S103, storing the target data into a relationship graph.

Specifically, because the target data includes sub-data having a target relationship and relationship information of that target relationship, when the target data is stored into the relationship graph, each sub-data in the target data may be used as a connection node, and connection nodes may be joined by connection lines that show the target relationship between the sub-data, so that the graph clearly presents how the sub-data are related. In addition, different types of target relationships may be represented by different types of lines, such as straight lines, broken lines and thickened lines.

In this way, the relationship graph reflects the associations between the sub-data carried in first data from different sources. When first data from different sources contain some of the same sub-data, that sub-data shares a single connection node in the relationship graph, so associations between the different first data are established automatically. For example, after first target data [A, B, direct relationship] from a first data source and second target data [A, C, direct relationship] from a second data source are stored in the relationship graph, sub-data B and sub-data C become indirectly related through sub-data A. This establishes an association between the first target data and the second target data and mines the hidden relationship [B, C, indirect relationship] between them, which is displayed directly in the relationship graph.
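A minimal in-memory sketch of this shared-node behaviour: keying nodes by the sub-data value means identical sub-data from different sources land on the same node, and a pair of unlinked nodes with a common neighbour reveals the hidden indirect relationship. The class and method names are hypothetical, and a real deployment would use a graph store rather than nested dicts.

```python
from collections import defaultdict
from itertools import combinations


class RelationshipGraph:
    """Minimal in-memory relationship graph with one node per distinct sub-data."""

    def __init__(self):
        # node -> {neighbour node: relationship information}
        self.edges = defaultdict(dict)

    def store(self, a, b, relationship):
        # Identical sub-data map to the same key, so target data from
        # different sources automatically share a connection node.
        self.edges[a][b] = relationship
        self.edges[b][a] = relationship

    def hidden_relations(self):
        """Return (x, y, 'indirect', via) for unlinked pairs sharing a neighbour."""
        found = set()
        for middle, neighbours in self.edges.items():
            for x, y in combinations(sorted(neighbours), 2):
                if y not in self.edges[x]:
                    found.add((x, y, "indirect", middle))
        return found


graph = RelationshipGraph()
graph.store("A", "B", "direct")  # first target data, first data source
graph.store("A", "C", "direct")  # second target data, second data source
print(graph.hidden_relations())
```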

In this embodiment, sub-data having the target relationship are identified in the first data and stored in the relationship graph, so that the associations between the sub-data are represented by the graph. Once all the sub-data having the target relationship in the first data have been stored, the relationship graph shows the associations between the first data intuitively, regardless of whether the first data share the same source or the same structure. The associations among the first data are thus mined automatically, which improves the efficiency of association analysis on the first data.

The data processing method provided by the embodiment of the application can be used in a cloud platform or cloud computing. For example, when a server in a cloud platform needs to perform association analysis on first data of different sources based on service requirements, the data processing method provided by the embodiment of the application can be adopted for analysis.

Optionally, the relationship graph includes a plurality of connection nodes, each storing different sub-data, and storing the target data into the relationship graph includes:

Determining, in the relationship graph, target connection nodes in one-to-one correspondence with the sub-data in the target data, wherein a target connection node is a connection node storing sub-data in the target data;

and connecting, in the relationship graph, the target connection nodes corresponding to the sub-data having the target relationship.

Specifically, each sub-data is stored in a connection node of its own, and connection nodes having the target relationship are joined by connection lines, so that the associations among the sub-data are shown in the relationship graph. Here, connection nodes having the target relationship are nodes whose stored sub-data have the target relationship.

The same sub-data may appear in different types of first data: for example, a specific object may be sub-data in first data collected by a monitoring device, in first data collected on site by designated personnel, or in first data related to that object collected from the network. Because all sub-data are stored in the same relationship graph, when target data from several first data each include a certain sub-data, those target data share one connection node after being stored, so that the associations between the first data are represented through that node.

In this embodiment, the associations between the sub-data of each first data are embodied as connection nodes and connection lines in the relationship graph, so the associations between the first data can be grasped intuitively. Moreover, the relationship graph can be generated automatically by the electronic device, which makes association analysis of the first data considerably more efficient than the manual analysis used in the prior art.

Optionally, each connection node may store attribute information of its sub-data. Take a monitoring device photographing a road as an example: when the sub-data stored in a connection node is the license plate number of a vehicle, the time at which the license plate number was captured may also be stored in the node, either the most recent capture time or every capture time in the capture history. The license plate number then serves as the identity of the connection node, and the stored times serve as attribute information of the sub-data. It should be understood that different target data having the same sub-data means that the identities of the sub-data are the same, although their attribute information may differ. After determining from the relationship graph that certain first data are associated, the user may query the attribute information of the sub-data to determine exactly which data link those first data.
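The distinction between a node's identity and its attribute information can be sketched as follows. `ConnectionNode`, `upsert_node` and the `keep_history` flag are hypothetical names; the flag switches between the two storage options mentioned above (every capture time versus only the latest one).

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionNode:
    identity: str                                    # e.g. a license plate number
    attributes: list = field(default_factory=list)   # e.g. capture times


def upsert_node(nodes, identity, attribute=None, keep_history=True):
    """Return the node whose identity matches, creating it if absent.

    keep_history=True appends every capture time; False keeps only the latest.
    """
    node = nodes.get(identity)
    if node is None:
        node = nodes[identity] = ConnectionNode(identity)
    if attribute is not None:
        if keep_history:
            node.attributes.append(attribute)
        else:
            node.attributes = [attribute]
    return node


nodes = {}
upsert_node(nodes, "PLATE-001", "2020-07-01 08:00")
upsert_node(nodes, "PLATE-001", "2020-07-02 09:30")
print(nodes["PLATE-001"])
```

Two captures of the same plate share one node (same identity) while accumulating different attribute values.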

Optionally, determining, in the relationship graph, the target connection nodes in one-to-one correspondence with the sub-data in the target data includes:

Creating a first target node in the relationship graph if the plurality of connection nodes do not include the first target node, wherein the first target node is a connection node storing any sub-data in the target data;

and determining the first target node as the target connection node.

It should be noted that in the case where the plurality of connection nodes includes the first target node, the first target node is directly determined as the target connection node without additional creation.

The process of storing target data in the relationship graph is further described below with a specific embodiment, in which the target data includes first sub-data and second sub-data between which the target relationship exists. First, all connection nodes of the relationship graph are searched for the first sub-data and the second sub-data. If first target nodes corresponding to both the first sub-data and the second sub-data exist in the graph, the two first target nodes are each determined as target connection nodes. If only one of them exists, for example a first target node storing the first sub-data, one connection node is added to the graph and the second sub-data is stored in it. If neither exists, two connection nodes are added and the first sub-data and the second sub-data are stored in them respectively. Once the target connection nodes have been determined, they are joined by a connection line, which completes the storage of the target data in the relationship graph.
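The three cases above (both nodes exist, one exists, neither exists) reduce to a get-or-create step per sub-data followed by a single connection. The sketch below uses plain sets as a stand-in for the relationship graph; `store_target_data` is a hypothetical name.

```python
def store_target_data(nodes, edges, first, second, relationship):
    """Store one [first, second, relationship] target data item.

    nodes: set of sub-data already held by connection nodes.
    edges: set of (sub-data, sub-data, relationship) connection lines.
    Returns the sub-data for which a new connection node had to be created.
    """
    created = []
    for sub_data in (first, second):
        if sub_data not in nodes:  # no first target node yet, so create one
            nodes.add(sub_data)
            created.append(sub_data)
    edges.add((first, second, relationship))  # connect the two target connection nodes
    return created


nodes, edges = {"A"}, set()
print(store_target_data(nodes, edges, "A", "B", "direct"))    # only B is new
print(store_target_data(nodes, edges, "C", "D", "indirect"))  # both are new
print(store_target_data(nodes, edges, "A", "B", "direct"))    # both already exist
```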

Optionally, before identifying sub-data having the target relationship among the plurality of sub-data and generating the target data, the method further includes:

determining the target relationship, wherein the target relationship is a relationship selected by a user from at least two preset relationships, and the at least two preset relationships comprise a direct relationship and an indirect relationship.

The direct relationship may include a direct single-time relationship and a direct persistent relationship. When a direct relationship exists between two sub-data in the first data, whether it is single-time or persistent can be further determined from how often it occurs: if the direct relationship occurs only once within a period of time, the two sub-data may be considered to have a direct single-time relationship; if it recurs once every preset interval within that period, they may be considered to have a direct persistent relationship. For example, when monitoring with a garage camera, if only one record of a first license plate is identified within half a year, the camera ID and the first license plate are determined to have a direct single-time relationship; if the camera generates an identification record of a second license plate every day within half a year, the camera ID and the second license plate are determined to have a direct persistent relationship. FIG. 2 and FIG. 3 show, respectively, a direct single-time relationship and a direct persistent relationship between A1 and A2. Identifying a direct single-time relationship between A1 and A2 requires no duration setting; it suffices to identify that single occurrence. To identify a direct persistent relationship between A1 and A2, a duration may be set and the number of direct relationships between A1 and A2 within that duration counted; when the count exceeds a preset threshold, a direct persistent relationship between A1 and A2 is considered to exist.

Subdividing the direct relationship into direct single-time and direct persistent relationships further refines the relationships between the first data.
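The duration-and-threshold test described above can be sketched as a count over a time window. This follows the threshold variant only (not the "once every preset interval" variant), and `classify_direct` and its parameters are hypothetical. The text does not say how to label counts between 2 and the threshold, so the sketch returns `None` for them.

```python
from datetime import datetime, timedelta


def classify_direct(occurrences, window_start, duration, threshold):
    """Classify a direct relationship between two sub-data, e.g. camera ID and plate.

    occurrences: datetimes at which the direct relationship was observed.
    Returns "direct persistent" if the count inside the window exceeds
    `threshold` (assumed >= 1), "direct single-time" if it occurred exactly
    once, and None otherwise (a case the text leaves open).
    """
    window_end = window_start + duration
    count = sum(1 for t in occurrences if window_start <= t < window_end)
    if count > threshold:
        return "direct persistent"
    if count == 1:
        return "direct single-time"
    return None


start = datetime(2020, 1, 1)
half_year = timedelta(days=182)
daily = [start + timedelta(days=i) for i in range(180)]  # second license plate
once = [start + timedelta(days=40)]                      # first license plate
print(classify_direct(daily, start, half_year, threshold=30))
print(classify_direct(once, start, half_year, threshold=30))
```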

Correspondingly, the indirect relationship may include an indirect single-time relationship and an indirect persistent relationship. Their difference is explained below using a camera capturing license plate numbers as an example. As discussed above, when a camera captures a driver and a license plate number in the same photo, an indirect relationship is established between the driver and the license plate number through the camera. An indirect single-time relationship may mean that, among the records of several vehicles captured within a period of time, only one photo shows the indirect relationship between a first driver and the license plate number. Correspondingly, if every photo in the records of a certain vehicle being captured multiple times within a period of time shows the indirect relationship between a second driver and the license plate number, an indirect persistent relationship may be considered to exist between the second driver and the license plate number. FIG. 4 and FIG. 5 show, respectively, an indirect single-time relationship and an indirect persistent relationship between A1 and A2, which are related through B1. The difference between the indirect single-time and indirect persistent relationships parallels that between the direct single-time and direct persistent relationships and is not repeated here.
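Since the text says the indirect case parallels the direct one, the sketch below reuses the same count-versus-threshold rule, applied to co-occurrences in photos and keyed by the intermediate camera (B1 in FIG. 4 and FIG. 5). The threshold rule and the name `indirect_relationships` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def indirect_relationships(photos, threshold):
    """Classify indirect relationships formed through the capturing camera.

    photos: list of (camera_id, set of sub-data seen together in one photo).
    Returns {(a, b, camera_id): "indirect single-time" | "indirect persistent"}.
    """
    counts = Counter()
    for camera_id, seen in photos:
        for a, b in combinations(sorted(seen), 2):
            counts[(a, b, camera_id)] += 1  # a and b linked via camera_id
    result = {}
    for key, n in counts.items():
        if n > threshold:
            result[key] = "indirect persistent"
        elif n == 1:
            result[key] = "indirect single-time"
    return result


photos = [("cam1", {"driver1", "plate9"})] + [("cam1", {"driver2", "plate9"})] * 5
print(indirect_relationships(photos, threshold=3))
```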

Specifically, a relationship extraction model can be built to identify and extract the target data automatically. Before creating a relationship extraction task in the model, a user can select one of the preset relationships as the target relationship to be analyzed; alternatively, all preset relationships can be selected at once, each serving as a target relationship, so that the generated relationship graph fully reflects all associations between the first data.
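Selecting target relationships from the preset set can be modeled as an enumeration plus a filter over extracted triples. This is only an illustrative sketch: `PresetRelationship` and `filter_by_target` are hypothetical, and only the two top-level preset relationships named in the text are included.

```python
from enum import Enum


class PresetRelationship(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


def filter_by_target(triples, selected):
    """Keep only [a, b, relationship] triples whose relationship the user selected."""
    wanted = {relationship.value for relationship in selected}
    return [t for t in triples if t[2] in wanted]


triples = [["XXX1", "XXX3", "direct"], ["XXX1", "XXX2", "indirect"]]
print(filter_by_target(triples, [PresetRelationship.DIRECT]))
print(filter_by_target(triples, list(PresetRelationship)))  # all preset relationships
```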

In this embodiment, several preset relationships are provided for the user to choose from, so that when new data needs to be brought into the relationship graph, the user can quickly analyze it. Because bringing new data into the association analysis requires no new development, the efficiency of association analysis on the first data is effectively improved.

Optionally, after storing the target data into the relationship graph, the method further includes:

Receiving a first query instruction from a user, wherein the first query instruction comprises a second target node and the target relationship, and the second target node is any one of the plurality of connection nodes;

and querying the relationship graph based on the first query instruction to obtain a target relationship graph, wherein the target relationship graph comprises the second target node and those of the plurality of connection nodes that have the target relationship with the second target node.

Each data source typically produces a growing amount of first data every day, and the relationship graph usually has to represent the associations between all first data generated by many different data sources over a long period, so its connection nodes and connection lines are usually complex. For this reason, the embodiment of the application can build indexes for the relationship graph, and a user can enter constraints in a query instruction to retrieve part of the relationship graph for display. For example, to study the driving track of a certain vehicle, the user can enter the vehicle's license plate number with the direct relationship as the target relationship; this returns the IDs of all monitoring devices that captured the license plate number. The locations of those monitoring devices are determined from their IDs and connected in chronological order, which gives the driving track of the vehicle over a given period.
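The driving-track query can be sketched in two parts: an index keyed by (node, relationship) answers the first query instruction, and the matched devices are then ordered by capture time. The index layout, the device-to-location table and the per-device capture times are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(edges):
    """Index connection lines by (node, relationship) for fast lookup."""
    index = defaultdict(set)
    for a, b, relationship in edges:
        index[(a, relationship)].add(b)
        index[(b, relationship)].add(a)
    return index


def query(index, node, relationship):
    """Return the connection nodes linked to `node` by `relationship`."""
    return index.get((node, relationship), set())


def trajectory(device_ids, device_location, capture_time):
    """Order device locations by capture time to approximate a driving track."""
    ordered = sorted(device_ids, key=lambda d: capture_time[d])
    return [device_location[d] for d in ordered]


edges = [("PLATE-1", "cam1", "direct"), ("PLATE-1", "cam2", "direct"),
         ("PLATE-1", "driver", "indirect"), ("PLATE-2", "cam3", "direct")]
index = build_index(edges)
devices = query(index, "PLATE-1", "direct")
locations = {"cam1": "Gate A", "cam2": "Bridge B"}
times = {"cam1": "2020-07-01 09:00", "cam2": "2020-07-01 08:15"}
print(trajectory(devices, locations, times))
```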

In this embodiment, making the relationship graph queryable allows a user to quickly find all first data related to a target object in the graph.

Optionally, because the first data of each data source is usually updated in real time, a scheduling task can be set up to keep the latest associations between the first data in the relationship graph: newly added first data is extracted from each data source periodically, identified once extracted, and the resulting target data is stored in the relationship graph, so that the relationship graph is updated promptly as new first data is generated.

Specifically, when the scheduling task is set up, a scheduling rule may be configured. The scheduling rule may specify, for example, whether each import of first data is a full import or an incremental import, the scheduling frequency, and other related rules.
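A scheduling rule with a full/incremental switch and a frequency might look like the sketch below. `SchedulingRule`, `select_batch` and `next_run` are hypothetical; incremental import is assumed to mean "records created after the last run".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SchedulingRule:
    incremental: bool    # True: import only new first data; False: full import
    interval: timedelta  # scheduling frequency


def select_batch(records, rule, last_run):
    """records: list of (created_at, first_data). Return the data to import now."""
    if not rule.incremental or last_run is None:
        return [data for _, data in records]
    return [data for created_at, data in records if created_at > last_run]


def next_run(rule, last_run):
    return last_run + rule.interval


records = [(datetime(2020, 7, 1), "r1"), (datetime(2020, 7, 3), "r2")]
daily_incremental = SchedulingRule(incremental=True, interval=timedelta(days=1))
print(select_batch(records, daily_incremental, last_run=datetime(2020, 7, 2)))
```

On the first run (`last_run=None`) the sketch falls back to a full import, which is one reasonable choice rather than something the text prescribes.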

Optionally, when the number of sub-data included in the first data exceeds a first preset value, the first data may be split into a plurality of pieces of sub-first data; each piece is identified separately to obtain corresponding target data, and the target data thus obtained are stored in the relationship graph. For example, when a piece of first data records everything captured by a certain camera within one month, it can be split by day, which avoids task execution failures caused by an oversized first data file.
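Splitting by day once a size limit is exceeded can be sketched with a grouping on the calendar date. The record layout `(captured_at, sub_data)` and the name `split_first_data` are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def split_first_data(records, first_preset_value):
    """records: list of (captured_at, sub_data) in one piece of first data.

    If it holds more than `first_preset_value` items, split it by calendar day
    into pieces of sub-first data that can be identified independently.
    """
    if len(records) <= first_preset_value:
        return [records]
    by_day = defaultdict(list)
    for captured_at, sub_data in records:
        by_day[captured_at.date()].append((captured_at, sub_data))
    return [by_day[day] for day in sorted(by_day)]


records = [(datetime(2020, 7, 1, 8), "XXX1"), (datetime(2020, 7, 1, 9), "XXX2"),
           (datetime(2020, 7, 2, 8), "XXX3")]
for piece in split_first_data(records, first_preset_value=2):
    print(len(piece), piece[0][0].date())
```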

Optionally, identifying sub-data having a target relationship among the plurality of sub-data and generating the target data includes:

Performing a deduplication operation on the plurality of sub-data to obtain a plurality of target sub-data;

Identifying target sub-data having the target relationship among the plurality of target sub-data, and generating target data, wherein the target data comprises the target sub-data having the target relationship and relationship information of the target relationship;

and storing the target data into a relationship graph.

Performing the deduplication operation on the plurality of sub-data includes: selecting identical sub-data from the plurality of sub-data, acquiring the generation time points of the selected identical sub-data, and, when the interval between the generation time points of two identical sub-data is smaller than a second preset value, keeping only one of the two, thereby completing deduplication. The generation time point of a sub-data may be the time at which the data acquisition device acquired it. Deduplicating identical sub-data generated close together in time prevents repeated data produced within a short period from distorting the analysis result.
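A sketch of the time-window deduplication, assuming each sub-data arrives as `(value, generated_at)` and that a new occurrence is compared against the most recently kept occurrence of the same value (the text only speaks of "two identical sub-data", so this pairing is an assumption).

```python
from datetime import datetime, timedelta


def deduplicate(sub_data_items, second_preset_value):
    """sub_data_items: list of (value, generated_at).

    Drops an occurrence if the same value was kept less than
    `second_preset_value` earlier; returns the kept items in time order.
    """
    kept = []
    last_kept = {}
    for value, generated_at in sorted(sub_data_items, key=lambda item: item[1]):
        previous = last_kept.get(value)
        if previous is not None and generated_at - previous < second_preset_value:
            continue  # near-duplicate generated shortly after a kept one
        kept.append((value, generated_at))
        last_kept[value] = generated_at
    return kept


t0 = datetime(2020, 7, 1, 8, 0, 0)
items = [("XXX1", t0), ("XXX1", t0 + timedelta(seconds=5)),
         ("XXX1", t0 + timedelta(minutes=10)), ("XXX2", t0 + timedelta(seconds=1))]
print(deduplicate(items, second_preset_value=timedelta(seconds=30)))
```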

It should be noted that this embodiment differs from the above embodiments only in that a deduplication step is added before identifying target sub-data having the target relationship among the plurality of target sub-data. The other steps of this embodiment may be the same as those of the above embodiments and achieve the same beneficial effects; to avoid repetition, they are not described again here.

Referring to FIG. 6, FIG. 6 is a second flowchart of a data processing method according to an embodiment of the present application, which specifically includes the following.

Before the first data is processed, a user may set up a scheduling task in advance. The first data is then acquired from the first database according to the scheduling task and passed to the relationship extraction model, which extracts the target data from the first data; the extracted target data is stored in the relationship graph in the form of a data flow, completing the processing of the first data. The structure of the first data of each data source may be analyzed in advance to determine the relationships between different fields (that is, sub-data) in that source's first data, and these field relationships are then input into the relationship extraction model. When the relationship extraction model subsequently analyzes first data, it can identify sub-data having the target relationship according to the structure of the first data and thereby obtain the target data. As shown in FIG. 7, the first data of each data source may be imported into the relationship extraction model in a unified, adaptive manner, and the model automatically identifies the first data from each data source.
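The idea of analyzing each source's structure in advance and feeding the resulting field relationships to the extraction model can be sketched as a per-source rule table. `SOURCE_SCHEMAS`, its source and field names, and `extract` are hypothetical stand-ins for the relationship extraction model, not its actual design.

```python
# Hypothetical field-relationship rules, worked out in advance per data source.
SOURCE_SCHEMAS = {
    "traffic_camera": [
        ("license_plate", "device_id", "direct"),
        ("driver_face", "device_id", "direct"),
        ("license_plate", "driver_face", "indirect"),
    ],
    "field_collection": [
        ("identity", "face_photo", "direct"),
    ],
}


def extract(source, record, targets):
    """Apply the source's rules to one record, keeping only target relationships."""
    triples = []
    for field_a, field_b, relationship in SOURCE_SCHEMAS.get(source, []):
        if relationship in targets and field_a in record and field_b in record:
            triples.append([record[field_a], record[field_b], relationship])
    return triples


record = {"license_plate": "XXX1", "driver_face": "XXX2", "device_id": "XXX3"}
print(extract("traffic_camera", record, targets={"direct", "indirect"}))
```

Adding a new data source then only means adding an entry to the rule table, which matches the point that new data needs no new development.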

Referring to FIG. 8, when new data needs to be brought into the relationship graph, a user may select a target relationship in a relationship selection module and then set the scheduling rule for the corresponding first data in a task registration module. Based on the information entered by the user, the system automatically generates the configuration required by the data flow task and registers the data flow task. The first data in the first database is then fed into the system, completing the analysis of the associations between the first data.

Referring to fig. 9, fig. 9 shows a data processing apparatus 900 according to an embodiment of the present application. The apparatus includes:

an obtaining module 901, configured to obtain first data from a first database, where the first data includes a plurality of sub-data;

an identifying module 902, configured to identify sub-data having a target relationship among the plurality of sub-data and generate target data, where the target data includes the sub-data having the target relationship and relationship information of the target relationship;

and a storage module 903, configured to store the target data into a relationship graph.

Optionally, the relationship graph includes a plurality of connection nodes, each of which stores a different sub-data, and the storage module 903 includes:

a determining submodule, configured to determine, in the relationship graph, target connection nodes in one-to-one correspondence with the sub-data in the target data, where the target connection nodes are the connection nodes storing the sub-data in the target data;

and a connection submodule, configured to connect, in the relationship graph, the target connection nodes corresponding to the sub-data having the target relationship.

Optionally, the determining submodule includes:

a creating unit, configured to create a first target node in the relationship graph when the plurality of connection nodes do not include the first target node, where the first target node is a connection node storing any piece of sub-data in the target data;

and a determining unit, configured to determine the first target node as the target connection node.
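The cooperation of the determining submodule, creating unit and connection submodule can be sketched as a get-or-create lookup followed by edge insertion. This is a minimal in-memory sketch. `RelationshipGraph`, integer node ids and undirected edges are assumptions, not the storage actually used. Each distinct sub-data value maps to exactly one connection node, so the same sub-data arriving from different sources reuses the same node.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class RelationshipGraph:
    """Hypothetical in-memory relationship graph: one connection node per
    distinct sub-data value, edges labelled with relationship information."""

    def __init__(self) -> None:
        self.nodes: Dict[str, int] = {}  # sub-data -> node id
        self.edges: Dict[int, Set[Tuple[str, int]]] = defaultdict(set)

    def target_node(self, sub_data: str) -> int:
        # Determining submodule: reuse the node already storing this sub-data.
        # Creating unit: if no such node exists, create it first.
        if sub_data not in self.nodes:
            self.nodes[sub_data] = len(self.nodes)
        return self.nodes[sub_data]

    def store(self, head: str, relation: str, tail: str) -> None:
        # Connection submodule: link the two target connection nodes
        # (edges are treated as undirected in this sketch).
        a, b = self.target_node(head), self.target_node(tail)
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))


if __name__ == "__main__":
    g = RelationshipGraph()
    g.store("A", "called", "B")  # from one source, e.g. a call log
    g.store("B", "paid", "C")    # from another source, e.g. bank transfers
    print(g.nodes)               # "B" is stored once and shared
```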

Optionally, the apparatus further comprises:

a determining module, configured to determine the target relationship, where the target relationship is selected by the user from at least two preset relationships, the at least two preset relationships including a direct relationship and an indirect relationship.
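Claim 1 defines an indirect relationship through a shared direct neighbour: if sub-data A is directly related to both B and C, then B and C are indirectly related. It also distinguishes single from continuous relationships, where a continuous one recurs once every preset duration within a preset time period. The sketch below implements the first rule literally. It also covers one possible reading of the second, which is an assumption: the period is split into consecutive windows of the preset duration, and the relationship counts as continuous only if every window contains at least one occurrence. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def indirect_pairs(direct: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """B and C are indirectly related if some A is directly related to both."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in direct:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    pairs: Set[FrozenSet[str]] = set()
    for common in neighbours.values():
        # Every pair of distinct neighbours of the same node is indirect.
        for b, c in combinations(sorted(common), 2):
            pairs.add(frozenset((b, c)))
    return pairs


def is_continuous(timestamps: Iterable[int], start: int, end: int,
                  period: int) -> bool:
    """Assumed reading: the relationship occurs in every consecutive window
    of length `period` within [start, end)."""
    if period <= 0 or end <= start:
        raise ValueError("need period > 0 and end > start")
    hit = {(t - start) // period for t in timestamps if start <= t < end}
    return len(hit) == math.ceil((end - start) / period)


if __name__ == "__main__":
    print(indirect_pairs([("A", "B"), ("A", "C")]))
    print(is_continuous([0, 5, 12, 25], start=0, end=30, period=10))
```

Following the claim literally, a pair can be indirectly related even if it is also directly related. Whether such pairs should be excluded is not stated in the source.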

Optionally, the apparatus further comprises:

a receiving module, configured to receive a first query instruction from a user, where the first query instruction includes a second target node and the target relationship, and the second target node is any one of the plurality of connection nodes;

and a query module, configured to query the relationship graph based on the first query instruction to obtain a target relationship graph, where the target relationship graph includes the second target node and those connection nodes, among the plurality of connection nodes, that have the target relationship with the second target node.
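The query module's behaviour can be sketched with a plain adjacency map of direct relationships. The names `query`, `"direct"` and `"indirect"` are hypothetical. For a direct query the result is the second target node plus its neighbours. For an indirect query it is the node plus the nodes reachable through one shared direct neighbour, following the definition in claim 1.

```python
from typing import Dict, Set


def query(adjacency: Dict[str, Set[str]], node: str, relation: str) -> Set[str]:
    """Return the nodes of the target relationship graph for `node`:
    the node itself plus the nodes having `relation` with it (sketch)."""
    direct = adjacency.get(node, set())
    if relation == "direct":
        related = set(direct)
    elif relation == "indirect":
        # Neighbours of neighbours, excluding the queried node itself.
        related = {c for b in direct for c in adjacency.get(b, set())} - {node}
    else:
        raise ValueError(f"unknown relation: {relation}")
    return {node} | related


if __name__ == "__main__":
    adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
    print(sorted(query(adj, "B", "indirect")))
```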

The data processing apparatus 900 provided in this embodiment can implement each process of the method embodiments shown in figs. 1 to 8 and achieve the same beneficial effects. To avoid repetition, details are not described here again.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

Fig. 10 is a block diagram of an electronic device for implementing the data processing method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit the implementations of the present application described and/or claimed herein.

As shown in fig. 10, the electronic device includes one or more processors 1001, a memory 1002, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1001 is taken as an example in fig. 10.

The memory 1002 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the data processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the data processing method provided by the present application.

As a non-transitory computer-readable storage medium, the memory 1002 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the data processing method in the embodiments of the present application (e.g., the obtaining module 901, the identifying module 902, and the storage module 903 shown in fig. 9). By running the non-transitory software programs, instructions, and modules stored in the memory 1002, the processor 1001 executes the various functional applications and data processing of the server, i.e., implements the data processing method in the above method embodiments.

The memory 1002 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function. The data storage area may store data created through use of the electronic device for the data processing method, and the like. In addition, the memory 1002 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1002 optionally includes memories located remotely from the processor 1001, and these remote memories may be connected via a network to the electronic device for the data processing method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the data processing method may further include an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 10.

The input device 1003, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the data processing method. The output device 1004 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solutions of the embodiments of the present application, sub-data having the target relationship is identified in the first data and stored in the relationship graph, so that the association relationships among the sub-data are embodied in the graph. Once all sub-data having the target relationship in the first data has been stored, the relationship graph can intuitively show the association relationships among the first data, regardless of whether the first data comes from the same source or has the same composition. The association relationships among the first data are thus mined automatically, which improves the efficiency of association analysis on the first data.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A data processing method, comprising:

storing the target data into a relationship graph;

the relationship graph is used for representing the association relationship between the sub-data carried by the first data of different sources, and the same sub-data in the first data of different sources share the same connecting node; the connection nodes shared by the first data with different sources are used for representing the association relation between the corresponding first data with different sources;

The relationship graph includes a plurality of connection nodes, each connection node storing a different sub-data, the storing the target data into the relationship graph includes: determining target connection nodes corresponding to the sub-data in the target data one by one in the relation map, wherein the target connection nodes are connection nodes for storing the sub-data in the target data; in the relation map, connecting the target connection nodes corresponding to the sub-data with the target relation;

the target relationship is a relationship selected by a user from at least two preset relationships, wherein the at least two preset relationships comprise a direct relationship and an indirect relationship, and if sub-data A and sub-data B have a direct relationship and sub-data A and sub-data C have a direct relationship, then sub-data B and sub-data C have an indirect relationship; the direct relationship comprises a direct single relationship and a direct continuous relationship, and the indirect relationship comprises an indirect single relationship and an indirect continuous relationship, wherein, for a preset relationship, a relationship between two pieces of sub-data that is generated once every preset duration within a preset time period is a continuous relationship;

determining the relation existing between different fields in the first data of each data source, wherein the fields are sub-data, inputting the determined relation between the different fields in the first data of each data source into a relation extraction model, and identifying the sub-data with the target relation according to the structure of the first data when the relation extraction model analyzes the first data to obtain target data;

When new data is required to be introduced into the relation graph, a corresponding scheduling rule of the first data is formulated in the task registration module based on a target relation selected by a user in the relation selection module, configuration required by a data flow task is automatically generated based on information filled by the user, and a data volume task is registered.

2. The method of claim 1, wherein the determining, in the relationship graph, a target connection node that corresponds one-to-one to sub-data in the target data comprises:

And determining the first target node as the target connection node.

3. The method of claim 1, wherein prior to identifying sub-data of the plurality of sub-data that has a target relationship and generating target data, the method further comprises:

and determining the target relation.

4. The method of claim 1, wherein after storing the target data in a relationship graph, the method further comprises:

5. A data processing apparatus comprising:

the storage module is used for storing the target data into a relation map;

The relationship graph includes a plurality of connection nodes, each connection node storing a different sub-data, the storage module including: the determining submodule is used for determining target connection nodes corresponding to the sub-data in the target data one by one in the relation graph, wherein the target connection nodes are connection nodes for storing the sub-data in the target data; a connection sub-module, configured to connect, in the relationship graph, the target connection node corresponding to the sub-data having the target relationship;

6. The apparatus of claim 5, wherein the determination submodule comprises:

7. The apparatus of claim 5, wherein the apparatus further comprises:

and the determining module is used for determining the target relation.

8. The apparatus of claim 5, wherein the apparatus further comprises:

9. An electronic device, comprising:

at least one processor; and

A memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.

10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.